Run the static workflow of EnGens

  • Remember to align the structures (set align = True when constructing EnGen).

  • Make sure your binding_site_selstr generalizes across structures whose residues may differ, for example because of mutations.

  • The same applies to featurization: do not use all-atom featurization, since different residues have different numbers of atoms.

  • Do not use TICA or HDE.

  • Do not use VAMP nets to select features.

The last two items are meant only for time-series data (MD trajectories), not for a set of static structures.
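
To make the featurization point concrete, here is a minimal sketch in plain Python (no EnGens calls). It shows that an all-atom feature vector changes length when a residue is substituted, while a backbone-only selection does not. The residue table, the helper n_coordinates, and the example substitution are hypothetical and serve only as an illustration.

```python
# Heavy-atom names for a few residue types (standard PDB atom naming).
HEAVY_ATOMS = {
    "GLY": ["N", "CA", "C", "O"],
    "ALA": ["N", "CA", "C", "O", "CB"],
    "HIS": ["N", "CA", "C", "O", "CB", "CG", "ND1", "CD2", "CE1", "NE2"],
    "ARG": ["N", "CA", "C", "O", "CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"],
}
BACKBONE = ("N", "CA", "C", "O")


def n_coordinates(sequence, backbone_only):
    """Count Cartesian coordinates (3 per atom) for a list of residue names."""
    total = 0
    for resname in sequence:
        atoms = HEAVY_ATOMS[resname]
        if backbone_only:
            atoms = [name for name in atoms if name in BACKBONE]
        total += 3 * len(atoms)
    return total


wild_type = ["ALA", "HIS", "GLY"]
mutant = ["ALA", "ARG", "GLY"]  # hypothetical HIS -> ARG substitution

# Feature-vector lengths for (wild type, mutant)
all_atom = (n_coordinates(wild_type, False), n_coordinates(mutant, False))
backbone = (n_coordinates(wild_type, True), n_coordinates(mutant, True))

print("all-atom lengths:", all_atom)   # lengths differ between the two
print("backbone lengths:", backbone)   # lengths match
```

With mismatched lengths the per-structure feature vectors cannot be stacked into one matrix, which is why a selection restricted to atoms shared by every residue type is the safer choice for a static ensemble.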

[1]:
from engens.core.PrepStatic import *
Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
[2]:
pdbIds =  "2rd0 3hhm 3hiz 4jps \
5swg 5swo 5swp 5swr 5swt 5sx8 5sx9 5sxa  \
5sxb 5sxc 5sxd 5sxe 5sxf 5sxi 5sxj 5sxk \
5t8f 5ubt 6g6w 6pyr 6pyu \
5uk8 5ukj 5ul1 5xgh 5xgi 5xgj 6nct \
4a55 2y3a \
5dxu 5m6u 5t8f 5ubt 6g6w 6pyr 6pyu".split()
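
The list above repeats five codes (5t8f, 5ubt, 6g6w, 6pyr, 6pyu), so it holds 41 entries for 36 distinct structures. If that is unintended, a minimal sketch of order-preserving de-duplication in plain Python follows. The names pdb_ids and unique_pdb_ids are illustrative, and whether PrepStatic de-duplicates its input internally is not something this notebook shows.

```python
# Same codes as in the cell above, written with implicit string concatenation.
pdb_ids = (
    "2rd0 3hhm 3hiz 4jps "
    "5swg 5swo 5swp 5swr 5swt 5sx8 5sx9 5sxa "
    "5sxb 5sxc 5sxd 5sxe 5sxf 5sxi 5sxj 5sxk "
    "5t8f 5ubt 6g6w 6pyr 6pyu "
    "5uk8 5ukj 5ul1 5xgh 5xgi 5xgj 6nct "
    "4a55 2y3a "
    "5dxu 5m6u 5t8f 5ubt 6g6w 6pyr 6pyu"
).split()

# dict keys keep insertion order, so this drops repeats but keeps the
# first occurrence of each code in its original position.
unique_pdb_ids = list(dict.fromkeys(pdb_ids))

print(len(pdb_ids), "codes,", len(unique_pdb_ids), "unique")
```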
[3]:
prep_class = PrepStatic(pdb_codes=pdbIds, dst_folder="./test_PI3K_new")
======================================================
STEP 1 - Downloading renumbered pdbs and fixing files
======================================================
100%|██████████| 41/41 [00:00<00:00, 1611.49it/s]
Found existing test_PI3K_new/structure_output/2rd0_renum.pdb1
Found existing test_PI3K_new/structure_output/2rd0_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/3hhm_renum.pdb1
Found existing test_PI3K_new/structure_output/3hhm_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/3hiz_renum.pdb1
Found existing test_PI3K_new/structure_output/3hiz_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/4jps_renum.pdb1
Found existing test_PI3K_new/structure_output/4jps_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swg_renum.pdb1
Found existing test_PI3K_new/structure_output/5swg_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swo_renum.pdb1
Found existing test_PI3K_new/structure_output/5swo_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swp_renum.pdb1
Found existing test_PI3K_new/structure_output/5swp_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swr_renum.pdb1
Found existing test_PI3K_new/structure_output/5swr_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swt_renum.pdb1
Found existing test_PI3K_new/structure_output/5swt_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sx8_renum.pdb1
Found existing test_PI3K_new/structure_output/5sx8_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sx9_renum.pdb1
Found existing test_PI3K_new/structure_output/5sx9_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxa_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxa_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxb_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxb_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxc_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxc_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxd_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxd_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxe_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxe_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxf_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxf_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxi_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxi_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxj_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxj_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxk_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxk_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5t8f_renum.pdb1
Found existing test_PI3K_new/structure_output/5t8f_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5ubt_renum.pdb1
Found existing test_PI3K_new/structure_output/5ubt_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6g6w_renum.pdb1
Found existing test_PI3K_new/structure_output/6g6w_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6pyr_renum.pdb1
Found existing test_PI3K_new/structure_output/6pyr_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6pyu_renum.pdb1
Found existing test_PI3K_new/structure_output/6pyu_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5uk8_renum.pdb1
Found existing test_PI3K_new/structure_output/5uk8_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5ukj_renum.pdb1
Found existing test_PI3K_new/structure_output/5ukj_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5ul1_renum.pdb1
Found existing test_PI3K_new/structure_output/5ul1_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5xgh_renum.pdb1
Found existing test_PI3K_new/structure_output/5xgh_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5xgi_renum.pdb1
Found existing test_PI3K_new/structure_output/5xgi_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5xgj_renum.pdb1
Found existing test_PI3K_new/structure_output/5xgj_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6nct_renum.pdb1
Found existing test_PI3K_new/structure_output/6nct_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/4a55_renum.pdb1
Found existing test_PI3K_new/structure_output/4a55_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/2y3a_renum.pdb1
Found existing test_PI3K_new/structure_output/2y3a_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5dxu_renum.pdb1
Found existing test_PI3K_new/structure_output/5dxu_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5m6u_renum.pdb1
Found existing test_PI3K_new/structure_output/5m6u_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5t8f_renum.pdb1
Found existing test_PI3K_new/structure_output/5t8f_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5ubt_renum.pdb1
Found existing test_PI3K_new/structure_output/5ubt_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6g6w_renum.pdb1
Found existing test_PI3K_new/structure_output/6g6w_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6pyr_renum.pdb1
Found existing test_PI3K_new/structure_output/6pyr_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6pyu_renum.pdb1
Found existing test_PI3K_new/structure_output/6pyu_renum_fixed.pdb
======================================================
Successfull download and fixing of files. Location: test_PI3K_new/structure_output
======================================================
======================================================
STEP 2 - Downloading matedata associated with given codes
======================================================

Fetching the pdb and uniprot metadata for codes

Fetching metadata for PDB entries - pdb id and related entity ids

200
Fetching metadata for PDB entries - entity ids and related uniprot ids
200
Fetching metadata for PDB entries - instance ids and related sequences
200
Fetching metadata for UniProt ids

PDB code associated metadata:
    pdb_id    entity_id    accession    database    asym_ids    first_asym_id    instance_id
--  --------  -----------  -----------  ----------  ----------  ---------------  -------------
 0  2rd0      2RD0_1       P42336       UniProt     ['A']       A                2RD0.A
 1  2rd0      2RD0_2       P27986       UniProt     ['B']       B                2RD0.B
 2  3hhm      3HHM_1       P42336       UniProt     ['A']       A                3HHM.A
 3  3hhm      3HHM_2       P27986       UniProt     ['B']       B                3HHM.B
 4  3hiz      3HIZ_1       P42336       UniProt     ['A']       A                3HIZ.A
 5  3hiz      3HIZ_2       P27986       UniProt     ['B']       B                3HIZ.B
 6  4jps      4JPS_1       P42336       UniProt     ['A']       A                4JPS.A
 7  4jps      4JPS_2       P27986       UniProt     ['B']       B                4JPS.B
 8  5swg      5SWG_1       P42336       UniProt     ['A']       A                5SWG.A
 9  5swg      5SWG_2       P27986       UniProt     ['B']       B                5SWG.B
10  5swo      5SWO_1       P42336       UniProt     ['A']       A                5SWO.A
11  5swo      5SWO_2       P27986       UniProt     ['B']       B                5SWO.B
12  5swp      5SWP_1       P42336       UniProt     ['A']       A                5SWP.A
13  5swp      5SWP_2       P27986       UniProt     ['B']       B                5SWP.B
14  5swr      5SWR_1       P42336       UniProt     ['A']       A                5SWR.A
15  5swr      5SWR_2       P27986       UniProt     ['B']       B                5SWR.B
16  5swt      5SWT_1       P42336       UniProt     ['A']       A                5SWT.A
17  5swt      5SWT_2       P27986       UniProt     ['B']       B                5SWT.B
18  5sx8      5SX8_1       P42336       UniProt     ['A']       A                5SX8.A
19  5sx8      5SX8_2       P27986       UniProt     ['B']       B                5SX8.B
20  5sx9      5SX9_1       P42336       UniProt     ['A']       A                5SX9.A
21  5sx9      5SX9_2       P27986       UniProt     ['B']       B                5SX9.B
22  5sxa      5SXA_1       P42336       UniProt     ['A']       A                5SXA.A
23  5sxa      5SXA_2       P27986       UniProt     ['B']       B                5SXA.B
24  5sxb      5SXB_1       P42336       UniProt     ['A']       A                5SXB.A
25  5sxb      5SXB_2       P27986       UniProt     ['B']       B                5SXB.B
26  5sxc      5SXC_1       P42336       UniProt     ['A']       A                5SXC.A
27  5sxc      5SXC_2       P27986       UniProt     ['B']       B                5SXC.B
28  5sxd      5SXD_1       P42336       UniProt     ['A']       A                5SXD.A
29  5sxd      5SXD_2       P27986       UniProt     ['B']       B                5SXD.B
30  5sxe      5SXE_1       P42336       UniProt     ['A']       A                5SXE.A
31  5sxe      5SXE_2       P27986       UniProt     ['B']       B                5SXE.B
32  5sxf      5SXF_1       P42336       UniProt     ['A']       A                5SXF.A
33  5sxf      5SXF_2       P27986       UniProt     ['B']       B                5SXF.B
34  5sxi      5SXI_1       P42336       UniProt     ['A']       A                5SXI.A
35  5sxi      5SXI_2       P27986       UniProt     ['B']       B                5SXI.B
36  5sxj      5SXJ_1       P42336       UniProt     ['A']       A                5SXJ.A
37  5sxj      5SXJ_2       P27986       UniProt     ['B']       B                5SXJ.B
38  5sxk      5SXK_1       P42336       UniProt     ['A']       A                5SXK.A
39  5sxk      5SXK_2       P27986       UniProt     ['B']       B                5SXK.B
40  5t8f      5T8F_1       O00329       UniProt     ['A']       A                5T8F.A
41  5t8f      5T8F_2       P23727       UniProt     ['B']       B                5T8F.B
42  5ubt      5UBT_1       O00329       UniProt     ['A']       A                5UBT.A
43  5ubt      5UBT_2       P27986       UniProt     ['B']       B                5UBT.B
44  6g6w      6G6W_1       O00329       UniProt     ['A']       A                6G6W.A
45  6g6w      6G6W_2       P23727       UniProt     ['B']       B                6G6W.B
46  6pyr      6PYR_1       O00329       UniProt     ['A']       A                6PYR.A
47  6pyr      6PYR_2       P27986       UniProt     ['B']       B                6PYR.B
48  6pyu      6PYU_1       O00329       UniProt     ['A']       A                6PYU.A
49  6pyu      6PYU_2       P27986       UniProt     ['B']       B                6PYU.B
50  5uk8      5UK8_1       P42336       UniProt     ['A']       A                5UK8.A
51  5uk8      5UK8_2       P27986       UniProt     ['B']       B                5UK8.B
52  5ukj      5UKJ_1       P42336       UniProt     ['A']       A                5UKJ.A
53  5ukj      5UKJ_2       P27986       UniProt     ['B']       B                5UKJ.B
54  5ul1      5UL1_1       P42336       UniProt     ['A']       A                5UL1.A
55  5ul1      5UL1_2       P27986       UniProt     ['B']       B                5UL1.B
56  5xgh      5XGH_1       P42336       UniProt     ['A']       A                5XGH.A
57  5xgh      5XGH_2       P27986       UniProt     ['B']       B                5XGH.B
58  5xgi      5XGI_1       P42336       UniProt     ['A']       A                5XGI.A
59  5xgi      5XGI_2       P27986       UniProt     ['B']       B                5XGI.B
60  5xgj      5XGJ_1       P42336       UniProt     ['A']       A                5XGJ.A
61  5xgj      5XGJ_2       P27986       UniProt     ['B']       B                5XGJ.B
62  6nct      6NCT_1       P42336       UniProt     ['A']       A                6NCT.A
63  6nct      6NCT_2       P27986       UniProt     ['B']       B                6NCT.B
64  4a55      4A55_1       P42337       UniProt     ['A']       A                4A55.A
65  4a55      4A55_2       P27986       UniProt     ['B']       B                4A55.B
66  2y3a      2Y3A_1       Q8BTI9       UniProt     ['A']       A                2Y3A.A
67  2y3a      2Y3A_2       O08908       UniProt     ['B']       B                2Y3A.B
68  5dxu      5DXU_1       O00329       UniProt     ['A']       A                5DXU.A
69  5dxu      5DXU_2       P23727       UniProt     ['B']       B                5DXU.B
70  5m6u      5M6U_1       O00329       UniProt     ['A']       A                5M6U.A
71  5m6u      5M6U_2       P27986       UniProt     ['B']       B                5M6U.B
72  5t8f      5T8F_1       O00329       UniProt     ['A']       A                5T8F.A
73  5t8f      5T8F_2       P23727       UniProt     ['B']       B                5T8F.B
74  5ubt      5UBT_1       O00329       UniProt     ['A']       A                5UBT.A
75  5ubt      5UBT_2       P27986       UniProt     ['B']       B                5UBT.B
76  6g6w      6G6W_1       O00329       UniProt     ['A']       A                6G6W.A
77  6g6w      6G6W_2       P23727       UniProt     ['B']       B                6G6W.B
78  6pyr      6PYR_1       O00329       UniProt     ['A']       A                6PYR.A
79  6pyr      6PYR_2       P27986       UniProt     ['B']       B                6PYR.B
80  6pyu      6PYU_1       O00329       UniProt     ['A']       A                6PYU.A
81  6pyu      6PYU_2       P27986       UniProt     ['B']       B                6PYU.B

UNIPROT metadata:
    accession_id    id           full_name
--  --------------  -----------  ------------------------------------------------------------------------------
 0  P42336          PK3CA_HUMAN  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
 1  P27986          P85A_HUMAN   Phosphatidylinositol 3-kinase regulatory subunit alpha
 2  O00329          PK3CD_HUMAN  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform
 3  P23727          P85A_BOVIN   Phosphatidylinositol 3-kinase regulatory subunit alpha
 4  P42337          PK3CA_MOUSE  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
 5  Q8BTI9          PK3CB_MOUSE  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform
 6  O08908          P85B_MOUSE   Phosphatidylinositol 3-kinase regulatory subunit beta

======================================================
STEP 3 - Define domains and map UNIPROT accessions to the domains
======================================================
Attention - this step requires user input! (press enter to continue)

Select the number of domains/chains in your complex: 2
Total of #2 domains!

Map each domain to uniprot accession (by their accession_id).

Map uniprot accessions to DOMAIN0
Choose from:
    accession_id    id           full_name
--  --------------  -----------  ------------------------------------------------------------------------------
 0  P42336          PK3CA_HUMAN  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
 1  P27986          P85A_HUMAN   Phosphatidylinositol 3-kinase regulatory subunit alpha
 2  O00329          PK3CD_HUMAN  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform
 3  P23727          P85A_BOVIN   Phosphatidylinositol 3-kinase regulatory subunit alpha
 4  P42337          PK3CA_MOUSE  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
 5  Q8BTI9          PK3CB_MOUSE  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform
 6  O08908          P85B_MOUSE   Phosphatidylinositol 3-kinase regulatory subunit beta

Input indices of uniprot metadata separated by space (e.g., '0 3 5' )
 to select all input string all
0
[0]
Selected for DOMAIN0:
    accession_id    id           full_name
--  --------------  -----------  ------------------------------------------------------------------------------
 0  P42336          PK3CA_HUMAN  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform

Map uniprot accessions to DOMAIN1
Choose from:
    accession_id    id           full_name
--  --------------  -----------  ------------------------------------------------------------------------------
 0  P27986          P85A_HUMAN   Phosphatidylinositol 3-kinase regulatory subunit alpha
 1  O00329          PK3CD_HUMAN  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform
 2  P23727          P85A_BOVIN   Phosphatidylinositol 3-kinase regulatory subunit alpha
 3  P42337          PK3CA_MOUSE  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
 4  Q8BTI9          PK3CB_MOUSE  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform
 5  O08908          P85B_MOUSE   Phosphatidylinositol 3-kinase regulatory subunit beta

Input indices of uniprot metadata separated by space (e.g., '0 3 5' )
 to select all input string all
0
[0]
Selected for DOMAIN1:
    accession_id    id          full_name
--  --------------  ----------  ------------------------------------------------------
 0  P27986          P85A_HUMAN  Phosphatidylinositol 3-kinase regulatory subunit alpha

WARNING: uniprot id O00329 not mapped to any domain
WARNING: uniprot id P23727 not mapped to any domain
WARNING: uniprot id P42337 not mapped to any domain
WARNING: uniprot id Q8BTI9 not mapped to any domain
WARNING: uniprot id O08908 not mapped to any domain

UNIPROT metadata mapped to domains:
    domain    acc_id    id           full_name
--  --------  --------  -----------  ------------------------------------------------------------------------------
 0  DOMAIN0   P42336    PK3CA_HUMAN  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
 1  DOMAIN1   P27986    P85A_HUMAN   Phosphatidylinositol 3-kinase regulatory subunit alpha
 2            O00329    PK3CD_HUMAN  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform
 3            P23727    P85A_BOVIN   Phosphatidylinositol 3-kinase regulatory subunit alpha
 4            P42337    PK3CA_MOUSE  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
 5            Q8BTI9    PK3CB_MOUSE  Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform
 6            O08908    P85B_MOUSE   Phosphatidylinositol 3-kinase regulatory subunit beta

DOMAIN metadata:
    domain_name    uniprot_id
--  -------------  ------------
 0  DOMAIN0        P42336
 1  DOMAIN1        P27986

======================================================
STEP 4 - Defining final selection for processing
======================================================

Attention - this step requires user input! (press enter to continue)

Your current inputs contain 2 domains with the following associated uniprots
    domain_name    uniprot_id    accession_name
--  -------------  ------------  ------------------------------------------------------------------------------
 0  DOMAIN0        P42336        Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
 1  DOMAIN1        P27986        Phosphatidylinositol 3-kinase regulatory subunit alpha

Please choose the domain-accession pairs you want to consider for your main analysis
Input indices of domain-uniprot metadata separated by space (e.g., '0 3 5' )0 1
Selected:
    domain_name    uniprot_id    accession_name
--  -------------  ------------  ------------------------------------------------------------------------------
 0  DOMAIN0        P42336        Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
 1  DOMAIN1        P27986        Phosphatidylinositol 3-kinase regulatory subunit alpha
['DOMAIN0-P42336', 'DOMAIN1-P27986']

PDB codes satisfying the above selection (containing all domains with any of the related uniprot accessions)

['2rd0' '3hhm' '3hiz' '4jps' '5swg' '5swo' '5swp' '5swr' '5swt' '5sx8'
 '5sx9' '5sxa' '5sxb' '5sxc' '5sxd' '5sxe' '5sxf' '5sxi' '5sxj' '5sxk'
 '5uk8' '5ukj' '5ul1' '5xgh' '5xgi' '5xgj' '6nct']
With following metadata:
--  ----  -------  ------
 0  2rd0  DOMAIN0  P42336
 1  2rd0  DOMAIN1  P27986
 2  3hhm  DOMAIN0  P42336
 3  3hhm  DOMAIN1  P27986
 4  3hiz  DOMAIN0  P42336
 5  3hiz  DOMAIN1  P27986
 6  4jps  DOMAIN0  P42336
 7  4jps  DOMAIN1  P27986
 8  5swg  DOMAIN0  P42336
 9  5swg  DOMAIN1  P27986
10  5swo  DOMAIN0  P42336
11  5swo  DOMAIN1  P27986
12  5swp  DOMAIN0  P42336
13  5swp  DOMAIN1  P27986
14  5swr  DOMAIN0  P42336
15  5swr  DOMAIN1  P27986
16  5swt  DOMAIN0  P42336
17  5swt  DOMAIN1  P27986
18  5sx8  DOMAIN0  P42336
19  5sx8  DOMAIN1  P27986
20  5sx9  DOMAIN0  P42336
21  5sx9  DOMAIN1  P27986
22  5sxa  DOMAIN0  P42336
23  5sxa  DOMAIN1  P27986
24  5sxb  DOMAIN0  P42336
25  5sxb  DOMAIN1  P27986
26  5sxc  DOMAIN0  P42336
27  5sxc  DOMAIN1  P27986
28  5sxd  DOMAIN0  P42336
29  5sxd  DOMAIN1  P27986
30  5sxe  DOMAIN0  P42336
31  5sxe  DOMAIN1  P27986
32  5sxf  DOMAIN0  P42336
33  5sxf  DOMAIN1  P27986
34  5sxi  DOMAIN0  P42336
35  5sxi  DOMAIN1  P27986
36  5sxj  DOMAIN0  P42336
37  5sxj  DOMAIN1  P27986
38  5sxk  DOMAIN0  P42336
39  5sxk  DOMAIN1  P27986
40  5uk8  DOMAIN0  P42336
41  5uk8  DOMAIN1  P27986
42  5ukj  DOMAIN0  P42336
43  5ukj  DOMAIN1  P27986
44  5ul1  DOMAIN0  P42336
45  5ul1  DOMAIN1  P27986
46  5xgh  DOMAIN0  P42336
47  5xgh  DOMAIN1  P27986
48  5xgi  DOMAIN0  P42336
49  5xgi  DOMAIN1  P27986
50  5xgj  DOMAIN0  P42336
51  5xgj  DOMAIN1  P27986
52  6nct  DOMAIN0  P42336
53  6nct  DOMAIN1  P27986
--  ----  -------  ------

Discarding the following entries (that do not contain selected domains:
{'5t8f', '4a55', '6g6w', '5dxu', '5m6u', '6pyu', '6pyr', '5ubt', '2y3a'}
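
The selection rule reported above (keep a PDB entry only if it contains every selected domain with one of that domain's accessions) can be sketched in plain Python. This is an illustrative reimplementation on four entries copied from the PDB metadata table, not the EnGens code; the names entry_accessions, selected, and keeps_entry are hypothetical.

```python
# UniProt accessions found in each entry (chains A and B), taken from
# the "PDB code associated metadata" table.
entry_accessions = {
    "2rd0": {"P42336", "P27986"},
    "5ubt": {"O00329", "P27986"},
    "4a55": {"P42337", "P27986"},
    "2y3a": {"Q8BTI9", "O08908"},
}

# Accessions accepted for each domain, as chosen in STEP 3 and STEP 4.
selected = {"DOMAIN0": {"P42336"}, "DOMAIN1": {"P27986"}}


def keeps_entry(accessions, selected_domains):
    """True if every selected domain is matched by at least one accession."""
    return all(accessions & allowed for allowed in selected_domains.values())


kept = [pdb for pdb, acc in entry_accessions.items() if keeps_entry(acc, selected)]
discarded = [pdb for pdb in entry_accessions if pdb not in kept]

print("kept:", kept)
print("discarded:", discarded)
```

Under this rule 5ubt is dropped even though it contains the P27986 regulatory subunit, because its catalytic chain maps to O00329 (the delta isoform) rather than the P42336 accession selected for DOMAIN0.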

======================================================
STEP 5- Extracting coordinates associated with given domains (per-accession)
======================================================
test_PI3K_new/structure_output/DOMAIN0/P42336/2rd0_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/2rd0_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/3hhm_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/3hhm_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/3hiz_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/3hiz_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/4jps_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/4jps_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swg_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swg_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swo_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swo_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swp_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swp_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swr_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swr_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swt_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swt_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sx8_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sx8_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sx9_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sx9_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxa_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxa_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxb_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxb_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxc_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxc_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxd_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxd_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxe_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxe_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxf_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxf_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxi_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxi_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxj_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxj_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxk_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxk_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5uk8_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5uk8_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5ukj_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5ukj_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5ul1_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5ul1_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5xgh_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5xgh_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5xgi_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5xgi_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5xgj_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5xgj_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/6nct_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/6nct_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/2rd0_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/2rd0_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/3hhm_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/3hhm_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/3hiz_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/3hiz_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/4jps_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/4jps_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swg_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swg_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swo_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swo_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swp_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swp_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swr_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swr_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swt_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swt_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sx8_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sx8_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sx9_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sx9_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxa_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxa_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxb_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxb_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxc_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxc_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxd_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxd_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxe_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxe_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxf_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxf_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxi_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxi_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxj_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxj_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxk_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxk_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5uk8_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5uk8_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5ukj_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5ukj_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5ul1_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5ul1_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5xgh_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5xgh_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5xgi_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5xgi_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5xgj_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5xgj_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/6nct_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/6nct_renum_fixed_B.pdb
Extracted per-accession domain coordinates
DOMAIN metadata:
    domain_name    uniprot_id    full_name                                                                       pdb_files_dir
--  -------------  ------------  ------------------------------------------------------------------------------  ---------------------------------------------
 0  DOMAIN0        P42336        Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform  test_PI3K_new/structure_output/DOMAIN0/P42336
 1  DOMAIN1        P27986        Phosphatidylinositol 3-kinase regulatory subunit alpha                          test_PI3K_new/structure_output/DOMAIN1/P27986
======================================================
STEP 6- Creating domain alignments (accoarding to uniprot accession numbering)
======================================================
Creating MSA for domain DOMAIN0 based on P42336 uniprot accession

Extracted MSA for domain DOMAIN0 based on P42336 uniprot accession to test_PI3K_new/sequence_output/DOMAIN0/P42336/msa.fasta
Residue range 1-5028
Check out the MSA at test_PI3K_new/sequence_output/DOMAIN0/P42336/msa.fasta
Check out the MSA visualized at test_PI3K_new/sequence_output/DOMAIN0/P42336/msa.html
Creating MSA for domain DOMAIN1 based on P27986 uniprot accession

Extracted MSA for domain DOMAIN1 based on P27986 uniprot accession to test_PI3K_new/sequence_output/DOMAIN1/P27986/msa.fasta
Residue range 322-600
Check out the MSA at test_PI3K_new/sequence_output/DOMAIN1/P27986/msa.fasta
Check out the MSA visualized at test_PI3K_new/sequence_output/DOMAIN1/P27986/msa.html
Extracted per-accession domain alignments
======================================================
STEP 7- Extracting per-accession domain MCS
======================================================
Successfully extracted per-accession domain MCS!
======================================================
STEP 8 - Processing multi-accession domains (requires mTM-align)
======================================================
Multi-accession domains (to be processed): []
Successfully processed multi-accession domain MCS! (if present)
======================================================
STEP 9 - Combining the final MCS data
======================================================

Not the case of multi-accession - MCS extracted based on accession residue numbers

Step 9.1 - copy all MCS to test_PI3K_new/structure_output/final_mcs for processing and extract backbones
Successfully copied all cleaned MCS domains to test_PI3K_new/structure_output/final_mcs!

Step 9.2 - combine multiple domains from same pdb id to single files
2rd0
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_2rd0_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_2rd0_renum_fixed_B_mcs_bb.pdb']
3hhm
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_3hhm_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_3hhm_renum_fixed_B_mcs_bb.pdb']
3hiz
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_3hiz_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_3hiz_renum_fixed_B_mcs_bb.pdb']
4jps
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_4jps_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_4jps_renum_fixed_B_mcs_bb.pdb']
5swg
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swg_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swg_renum_fixed_B_mcs_bb.pdb']
5swo
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swo_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swo_renum_fixed_B_mcs_bb.pdb']
5swp
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swp_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swp_renum_fixed_B_mcs_bb.pdb']
5swr
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swr_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swr_renum_fixed_B_mcs_bb.pdb']
5swt
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swt_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swt_renum_fixed_B_mcs_bb.pdb']
5sx8
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sx8_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sx8_renum_fixed_B_mcs_bb.pdb']
5sx9
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sx9_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sx9_renum_fixed_B_mcs_bb.pdb']
5sxa
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxa_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxa_renum_fixed_B_mcs_bb.pdb']
5sxb
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxb_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxb_renum_fixed_B_mcs_bb.pdb']
5sxc
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxc_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxc_renum_fixed_B_mcs_bb.pdb']
5sxd
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxd_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxd_renum_fixed_B_mcs_bb.pdb']
5sxe
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxe_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxe_renum_fixed_B_mcs_bb.pdb']
5sxf
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxf_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxf_renum_fixed_B_mcs_bb.pdb']
5sxi
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxi_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxi_renum_fixed_B_mcs_bb.pdb']
5sxj
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxj_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxj_renum_fixed_B_mcs_bb.pdb']
5sxk
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxk_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxk_renum_fixed_B_mcs_bb.pdb']
5uk8
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5uk8_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5uk8_renum_fixed_B_mcs_bb.pdb']
5ukj
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5ukj_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5ukj_renum_fixed_B_mcs_bb.pdb']
5ul1
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5ul1_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5ul1_renum_fixed_B_mcs_bb.pdb']
5xgh
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5xgh_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5xgh_renum_fixed_B_mcs_bb.pdb']
5xgi
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5xgi_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5xgi_renum_fixed_B_mcs_bb.pdb']
5xgj
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5xgj_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5xgj_renum_fixed_B_mcs_bb.pdb']
6nct
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_6nct_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_6nct_renum_fixed_B_mcs_bb.pdb']
Successfully combined all cleaned MCS domains to complex PDBs in test_PI3K_new/structure_output/final_mcs!

Step 9.3 - combine multiple files to single EnGens input
Successfully combined all cleaned MCS domains to complex PDBs in test_PI3K_new/structure_output/final_mcs!
======================================================
Successfully completed PDB pre-processing
======================================================
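The file names printed in Step 9.2 follow a regular pattern (DOMAIN<k>_<accession>_<pdb id>_renum_fixed_<chain>_mcs_bb.pdb). As a minimal sketch, assuming that pattern holds for every file in final_mcs, the per-structure grouping shown in the log can be reproduced with a regular expression. The helper name group_by_pdb_id and the pattern itself are inferred from the output above and are not part of the EnGens API.

```python
import re
from collections import defaultdict

# Pattern inferred from the log above (an assumption, not an EnGens API):
# DOMAIN<k>_<UniProt accession>_<PDB id>_renum_fixed_<chain>_mcs_bb.pdb
MCS_NAME = re.compile(
    r"(?P<domain>DOMAIN\d+)_(?P<accession>[A-Z0-9]+)_(?P<pdb_id>[0-9a-z]{4})"
    r"_renum_fixed_(?P<chain>[A-Za-z0-9]+)_mcs_bb\.pdb$"
)


def group_by_pdb_id(paths):
    """Map each PDB id to a list of (domain, accession, chain) tuples."""
    groups = defaultdict(list)
    for path in paths:
        match = MCS_NAME.search(path)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        groups[match["pdb_id"]].append(
            (match["domain"], match["accession"], match["chain"])
        )
    return dict(groups)


example_paths = [
    "test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_2rd0_renum_fixed_A_mcs_bb.pdb",
    "test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_2rd0_renum_fixed_B_mcs_bb.pdb",
    "test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_6nct_renum_fixed_A_mcs_bb.pdb",
]
groups = group_by_pdb_id(example_paths)
print(groups["2rd0"])
```

A structure for which only one domain appears in the mapping (6nct in this toy input) is easy to spot this way before the files are combined.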
[4]:
from engens.core.EnGens import *
[6]:
engen = EnGen(prep_class.final_traj_file, prep_class.final_pdb_files[0], align=True)

Aligning trajectory: 100%|██████████| 1/1 [00:00<00:00,  6.37it/s]
Cleaning files...: 100%|██████████| 1/1 [00:00<00:00, 3452.10it/s]

Extract features from the PDB files

Input: reference PDB and trajectory

Output: featurized trajectory


Steps:

  1. Load reference PDB and trajectory in the EnGen object

  2. Provide set of featurizations of interest (or use default)

  3. Evaluate the different featurizations (optional)

  4. Choose the best featurization

  5. Extract those features

[7]:
# required imports
import engens.core.FeatureSelector as fs
import pickle
import mdshare
import mdtraj
import numpy as np
import nglview
from IPython.display import Javascript, display
import json

Step 1 - load the structure and trajectory

Provide the path to the files with the reference trajectory and topology. (You can use any format that mdtraj.load will take as input).

Optionally, provide a subset of the structure that you will use for featurization (e.g. the binding site) as an atom selection string or a list of atom indices.
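A selection that names residues by number and is restricted to backbone atoms stays valid when a residue is mutated, because backbone atom names do not depend on the residue type. The sketch below builds such a string in the mdtraj selection language (resSeq and backbone are mdtraj keywords). The helper backbone_selection and the residue numbers are hypothetical placeholders, and how the string is handed to EnGen is not shown here, so check the EnGens documentation for the exact argument.

```python
def backbone_selection(residue_numbers):
    """Build an mdtraj selection string for the backbone atoms of given residues."""
    if not residue_numbers:
        raise ValueError("need at least one residue number")
    # Deduplicate and sort so the same residues always give the same string.
    residues = " or ".join(f"resSeq {n}" for n in sorted(set(residue_numbers)))
    return f"({residues}) and backbone"


# Placeholder residue numbers, for illustration only.
selstr = backbone_selection([42, 10, 11, 10])
print(selstr)
```

Selecting by residue name, or by side-chain atom names, would not generalize in the same way, since both change when a residue is mutated.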

[7]:

# visualize the trajectory (optional - if the trajectory is too large, skip this step)
nglwidget = engen.show_animated_traj()
nglwidget.clear_representations()
nglwidget.add_ball_and_stick()
nglwidget.center()
nglwidget

Step 2 - select different featurizations

Here we select ways to featurize the trajectory. Any PyEMMA trajectory featurization can be used in this step, and any of the parameters of the respective featurizations can be provided. Users can also use the default initialization, which includes three sets of features: (1) amino-acid pairwise distances; (2) torsion angles; and (3) amino-acid pairwise distances together with the torsion angles.
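To make the first feature set concrete, the sketch below computes pairwise distances between one representative point per residue for two toy structures. It is a plain-Python illustration under that simplifying assumption, not the PyEMMA or EnGens implementation. It shows that n residues always yield n(n-1)/2 features, so structures sharing the same residues give feature vectors of equal length regardless of mutations.

```python
from itertools import combinations
from math import dist


def pairwise_distances(coords):
    """Return the distance for every unordered pair of points, in a fixed order."""
    return [dist(a, b) for a, b in combinations(coords, 2)]


# Toy "structures": one representative point (e.g. the C-alpha atom) per residue.
structure_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 12.0)]

features_a = pairwise_distances(structure_a)
features_b = pairwise_distances(structure_b)

# Both vectors have 3 * (3 - 1) / 2 = 3 entries and can be stacked row by row.
print(len(features_a), len(features_b))
```

An all-atom featurization would not have this property, because residues of different types contribute different numbers of atoms.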

[ ]:
# remove any existing featurizers
engen.reset_featurizers()
# initialize default features
engen.init_featurizers_default()
description = engen.describe_featurizers()
print(description)

Step 3 - evaluate the featurizations

This step is optional. We recommend evaluating the featurizations and picking the best one using PyEMMA's implementation of the VAMP approach.

This helps you choose a set of features with which to proceed to the next Workflow.

This evaluation is not an option for crystal-structure input: VAMP scoring is only meant for time-series data (MD trajectories).

Step 4 - pick the featurization

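We suggest using the featurization that gives the highest VAMP2 score from the analysis above. For crystal-structure input, where that analysis is not available, pick a featurization by its index instead by setting feat_num in the cell below. Then run the cell.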

[ ]:
# apply features
engen.apply_featurizations()
# print possible features
print(engen.describe_featurizers())
# select the number of the desired feature
feat_num = 0
# initialize selector
featsel = fs.UserFeatureSelection(feat_num, engen)
# select the feature
featsel.select_feature()

Step 5 - save the results as input for Workflow2 - dimensionality reduction

[ ]:
# save the results for next workflow
with open("wf1_resulting_EnGen.pickle", "wb") as file:
    pickle.dump(engen, file, -1)
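The next workflow has to read this file back. The sketch below shows the save and load round trip with the standard pickle module, using a small dictionary as a stand-in for the EnGen object so that it runs without EnGens installed. The helpers save_object and load_object are hypothetical names. A protocol of -1, as used above, selects the same protocol as pickle.HIGHEST_PROTOCOL.

```python
import pickle
import tempfile
from pathlib import Path


def save_object(obj, path):
    """Pickle obj to path with the highest available protocol."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, pickle.HIGHEST_PROTOCOL)


def load_object(path):
    """Load a pickled object; only use this on files you created yourself."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Stand-in for the EnGen object (an assumption made to keep the sketch runnable).
stand_in = {"feat_num": 0}

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "wf1_resulting_EnGen.pickle"
    save_object(stand_in, target)
    restored = load_object(target)

print(restored == stand_in)
```

Loading the real file requires the same EnGens classes to be importable in the session that reads it, since pickle stores references to classes rather than their code.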