Run the static workflow of EnGens
Remember to align the trajectory (set align=True when constructing the EnGen object).
Make sure your binding_site_selstr selection generalizes across structures with different, possibly mutated, residues (for example, select by residue number and backbone atoms rather than by residue name).
The same holds for featurization: do not use all-atom featurization, since different residues have different numbers of atoms.
Do not use TICA/HDE, and do not use VAMPnets to select features: these methods are meant only for time-series data (MD trajectories).
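All-atom featurization fails across mutants because of how coordinate features are built. A feature vector of flattened coordinates has three entries per atom. A His at some position and an Arg at the same position in a mutant therefore give vectors of different lengths, and the structures can no longer be stacked into one feature matrix. The pure-Python sketch below is illustrative only: the function `feature_dim` and the residue tables are hypothetical and are not part of EnGens.

```python
# Heavy atoms per residue type (standard PDB atom names); only the two
# residue types needed for this illustration are listed.
HEAVY_ATOMS = {
    "HIS": ["N", "CA", "C", "O", "CB", "CG", "ND1", "CD2", "CE1", "NE2"],
    "ARG": ["N", "CA", "C", "O", "CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"],
}
BACKBONE = {"N", "CA", "C", "O"}


def feature_dim(sequence, backbone_only):
    """Length of a flattened xyz feature vector for the given residues."""
    n_atoms = 0
    for resname in sequence:
        atoms = HEAVY_ATOMS[resname]
        if backbone_only:
            atoms = [a for a in atoms if a in BACKBONE]
        n_atoms += len(atoms)
    return 3 * n_atoms  # x, y, z per atom


wild_type = ["HIS"]  # hypothetical binding-site residue
mutant = ["ARG"]     # the same position carrying a His -> Arg mutation

print(feature_dim(wild_type, False), feature_dim(mutant, False))  # 30 33
print(feature_dim(wild_type, True), feature_dim(mutant, True))    # 12 12
```

Restricting the selection to backbone atoms keeps the feature length fixed, whatever the side-chain identity.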
[1]:
from engens.core.PrepStatic import *
Warning: importing 'simtk.openmm' is deprecated. Import 'openmm' instead.
[2]:
# PDB codes of PI3K catalytic/regulatory subunit complexes (list each code only once)
pdbIds = "2rd0 3hhm 3hiz 4jps \
5swg 5swo 5swp 5swr 5swt 5sx8 5sx9 5sxa \
5sxb 5sxc 5sxd 5sxe 5sxf 5sxi 5sxj 5sxk \
5t8f 5ubt 6g6w 6pyr 6pyu \
5uk8 5ukj 5ul1 5xgh 5xgi 5xgj 6nct \
4a55 2y3a \
5dxu 5m6u".split()
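In the run below, the input list held 41 codes, but 5t8f, 5ubt, 6g6w, 6pyr and 6pyu were each listed twice (36 unique entries). As a result, their files and metadata rows were reported twice. A small standard-library check can catch this before calling PrepStatic. `find_duplicates` and `unique_codes` are hypothetical helper names.

```python
from collections import Counter


def find_duplicates(codes):
    """Return PDB codes that occur more than once (case-insensitive), in first-seen order."""
    counts = Counter(code.lower() for code in codes)
    repeated = []
    for code in codes:
        key = code.lower()
        if counts[key] > 1 and key not in repeated:
            repeated.append(key)
    return repeated


def unique_codes(codes):
    """Drop repeated codes (case-insensitive) while keeping the original order."""
    return list(dict.fromkeys(code.lower() for code in codes))


codes = "2rd0 5t8f 5ubt 4a55 5t8f 5ubt".split()
print(find_duplicates(codes))  # ['5t8f', '5ubt']
print(unique_codes(codes))     # ['2rd0', '5t8f', '5ubt', '4a55']
```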
[3]:
prep_class = PrepStatic(pdb_codes=pdbIds, dst_folder="./test_PI3K_new")
======================================================
STEP 1 - Downloading renumbered pdbs and fixing files
======================================================
100%|██████████| 41/41 [00:00<00:00, 1611.49it/s]
Found existing test_PI3K_new/structure_output/2rd0_renum.pdb1
Found existing test_PI3K_new/structure_output/2rd0_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/3hhm_renum.pdb1
Found existing test_PI3K_new/structure_output/3hhm_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/3hiz_renum.pdb1
Found existing test_PI3K_new/structure_output/3hiz_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/4jps_renum.pdb1
Found existing test_PI3K_new/structure_output/4jps_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swg_renum.pdb1
Found existing test_PI3K_new/structure_output/5swg_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swo_renum.pdb1
Found existing test_PI3K_new/structure_output/5swo_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swp_renum.pdb1
Found existing test_PI3K_new/structure_output/5swp_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swr_renum.pdb1
Found existing test_PI3K_new/structure_output/5swr_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5swt_renum.pdb1
Found existing test_PI3K_new/structure_output/5swt_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sx8_renum.pdb1
Found existing test_PI3K_new/structure_output/5sx8_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sx9_renum.pdb1
Found existing test_PI3K_new/structure_output/5sx9_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxa_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxa_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxb_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxb_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxc_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxc_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxd_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxd_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxe_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxe_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxf_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxf_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxi_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxi_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxj_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxj_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5sxk_renum.pdb1
Found existing test_PI3K_new/structure_output/5sxk_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5t8f_renum.pdb1
Found existing test_PI3K_new/structure_output/5t8f_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5ubt_renum.pdb1
Found existing test_PI3K_new/structure_output/5ubt_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6g6w_renum.pdb1
Found existing test_PI3K_new/structure_output/6g6w_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6pyr_renum.pdb1
Found existing test_PI3K_new/structure_output/6pyr_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6pyu_renum.pdb1
Found existing test_PI3K_new/structure_output/6pyu_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5uk8_renum.pdb1
Found existing test_PI3K_new/structure_output/5uk8_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5ukj_renum.pdb1
Found existing test_PI3K_new/structure_output/5ukj_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5ul1_renum.pdb1
Found existing test_PI3K_new/structure_output/5ul1_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5xgh_renum.pdb1
Found existing test_PI3K_new/structure_output/5xgh_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5xgi_renum.pdb1
Found existing test_PI3K_new/structure_output/5xgi_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5xgj_renum.pdb1
Found existing test_PI3K_new/structure_output/5xgj_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6nct_renum.pdb1
Found existing test_PI3K_new/structure_output/6nct_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/4a55_renum.pdb1
Found existing test_PI3K_new/structure_output/4a55_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/2y3a_renum.pdb1
Found existing test_PI3K_new/structure_output/2y3a_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5dxu_renum.pdb1
Found existing test_PI3K_new/structure_output/5dxu_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5m6u_renum.pdb1
Found existing test_PI3K_new/structure_output/5m6u_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5t8f_renum.pdb1
Found existing test_PI3K_new/structure_output/5t8f_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/5ubt_renum.pdb1
Found existing test_PI3K_new/structure_output/5ubt_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6g6w_renum.pdb1
Found existing test_PI3K_new/structure_output/6g6w_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6pyr_renum.pdb1
Found existing test_PI3K_new/structure_output/6pyr_renum_fixed.pdb
Found existing test_PI3K_new/structure_output/6pyu_renum.pdb1
Found existing test_PI3K_new/structure_output/6pyu_renum_fixed.pdb
======================================================
Successfull download and fixing of files. Location: test_PI3K_new/structure_output
======================================================
======================================================
STEP 2 - Downloading matedata associated with given codes
======================================================
Fetching the pdb and uniprot metadata for codes
Fetching metadata for PDB entries - pdb id and related entity ids
200
Fetching metadata for PDB entries - entity ids and related uniprot ids
200
Fetching metadata for PDB entries - instance ids and related sequences
200
Fetching metadata for UniProt ids
PDB code associated metadata:
pdb_id entity_id accession database asym_ids first_asym_id instance_id
-- -------- ----------- ----------- ---------- ---------- --------------- -------------
0 2rd0 2RD0_1 P42336 UniProt ['A'] A 2RD0.A
1 2rd0 2RD0_2 P27986 UniProt ['B'] B 2RD0.B
2 3hhm 3HHM_1 P42336 UniProt ['A'] A 3HHM.A
3 3hhm 3HHM_2 P27986 UniProt ['B'] B 3HHM.B
4 3hiz 3HIZ_1 P42336 UniProt ['A'] A 3HIZ.A
5 3hiz 3HIZ_2 P27986 UniProt ['B'] B 3HIZ.B
6 4jps 4JPS_1 P42336 UniProt ['A'] A 4JPS.A
7 4jps 4JPS_2 P27986 UniProt ['B'] B 4JPS.B
8 5swg 5SWG_1 P42336 UniProt ['A'] A 5SWG.A
9 5swg 5SWG_2 P27986 UniProt ['B'] B 5SWG.B
10 5swo 5SWO_1 P42336 UniProt ['A'] A 5SWO.A
11 5swo 5SWO_2 P27986 UniProt ['B'] B 5SWO.B
12 5swp 5SWP_1 P42336 UniProt ['A'] A 5SWP.A
13 5swp 5SWP_2 P27986 UniProt ['B'] B 5SWP.B
14 5swr 5SWR_1 P42336 UniProt ['A'] A 5SWR.A
15 5swr 5SWR_2 P27986 UniProt ['B'] B 5SWR.B
16 5swt 5SWT_1 P42336 UniProt ['A'] A 5SWT.A
17 5swt 5SWT_2 P27986 UniProt ['B'] B 5SWT.B
18 5sx8 5SX8_1 P42336 UniProt ['A'] A 5SX8.A
19 5sx8 5SX8_2 P27986 UniProt ['B'] B 5SX8.B
20 5sx9 5SX9_1 P42336 UniProt ['A'] A 5SX9.A
21 5sx9 5SX9_2 P27986 UniProt ['B'] B 5SX9.B
22 5sxa 5SXA_1 P42336 UniProt ['A'] A 5SXA.A
23 5sxa 5SXA_2 P27986 UniProt ['B'] B 5SXA.B
24 5sxb 5SXB_1 P42336 UniProt ['A'] A 5SXB.A
25 5sxb 5SXB_2 P27986 UniProt ['B'] B 5SXB.B
26 5sxc 5SXC_1 P42336 UniProt ['A'] A 5SXC.A
27 5sxc 5SXC_2 P27986 UniProt ['B'] B 5SXC.B
28 5sxd 5SXD_1 P42336 UniProt ['A'] A 5SXD.A
29 5sxd 5SXD_2 P27986 UniProt ['B'] B 5SXD.B
30 5sxe 5SXE_1 P42336 UniProt ['A'] A 5SXE.A
31 5sxe 5SXE_2 P27986 UniProt ['B'] B 5SXE.B
32 5sxf 5SXF_1 P42336 UniProt ['A'] A 5SXF.A
33 5sxf 5SXF_2 P27986 UniProt ['B'] B 5SXF.B
34 5sxi 5SXI_1 P42336 UniProt ['A'] A 5SXI.A
35 5sxi 5SXI_2 P27986 UniProt ['B'] B 5SXI.B
36 5sxj 5SXJ_1 P42336 UniProt ['A'] A 5SXJ.A
37 5sxj 5SXJ_2 P27986 UniProt ['B'] B 5SXJ.B
38 5sxk 5SXK_1 P42336 UniProt ['A'] A 5SXK.A
39 5sxk 5SXK_2 P27986 UniProt ['B'] B 5SXK.B
40 5t8f 5T8F_1 O00329 UniProt ['A'] A 5T8F.A
41 5t8f 5T8F_2 P23727 UniProt ['B'] B 5T8F.B
42 5ubt 5UBT_1 O00329 UniProt ['A'] A 5UBT.A
43 5ubt 5UBT_2 P27986 UniProt ['B'] B 5UBT.B
44 6g6w 6G6W_1 O00329 UniProt ['A'] A 6G6W.A
45 6g6w 6G6W_2 P23727 UniProt ['B'] B 6G6W.B
46 6pyr 6PYR_1 O00329 UniProt ['A'] A 6PYR.A
47 6pyr 6PYR_2 P27986 UniProt ['B'] B 6PYR.B
48 6pyu 6PYU_1 O00329 UniProt ['A'] A 6PYU.A
49 6pyu 6PYU_2 P27986 UniProt ['B'] B 6PYU.B
50 5uk8 5UK8_1 P42336 UniProt ['A'] A 5UK8.A
51 5uk8 5UK8_2 P27986 UniProt ['B'] B 5UK8.B
52 5ukj 5UKJ_1 P42336 UniProt ['A'] A 5UKJ.A
53 5ukj 5UKJ_2 P27986 UniProt ['B'] B 5UKJ.B
54 5ul1 5UL1_1 P42336 UniProt ['A'] A 5UL1.A
55 5ul1 5UL1_2 P27986 UniProt ['B'] B 5UL1.B
56 5xgh 5XGH_1 P42336 UniProt ['A'] A 5XGH.A
57 5xgh 5XGH_2 P27986 UniProt ['B'] B 5XGH.B
58 5xgi 5XGI_1 P42336 UniProt ['A'] A 5XGI.A
59 5xgi 5XGI_2 P27986 UniProt ['B'] B 5XGI.B
60 5xgj 5XGJ_1 P42336 UniProt ['A'] A 5XGJ.A
61 5xgj 5XGJ_2 P27986 UniProt ['B'] B 5XGJ.B
62 6nct 6NCT_1 P42336 UniProt ['A'] A 6NCT.A
63 6nct 6NCT_2 P27986 UniProt ['B'] B 6NCT.B
64 4a55 4A55_1 P42337 UniProt ['A'] A 4A55.A
65 4a55 4A55_2 P27986 UniProt ['B'] B 4A55.B
66 2y3a 2Y3A_1 Q8BTI9 UniProt ['A'] A 2Y3A.A
67 2y3a 2Y3A_2 O08908 UniProt ['B'] B 2Y3A.B
68 5dxu 5DXU_1 O00329 UniProt ['A'] A 5DXU.A
69 5dxu 5DXU_2 P23727 UniProt ['B'] B 5DXU.B
70 5m6u 5M6U_1 O00329 UniProt ['A'] A 5M6U.A
71 5m6u 5M6U_2 P27986 UniProt ['B'] B 5M6U.B
72 5t8f 5T8F_1 O00329 UniProt ['A'] A 5T8F.A
73 5t8f 5T8F_2 P23727 UniProt ['B'] B 5T8F.B
74 5ubt 5UBT_1 O00329 UniProt ['A'] A 5UBT.A
75 5ubt 5UBT_2 P27986 UniProt ['B'] B 5UBT.B
76 6g6w 6G6W_1 O00329 UniProt ['A'] A 6G6W.A
77 6g6w 6G6W_2 P23727 UniProt ['B'] B 6G6W.B
78 6pyr 6PYR_1 O00329 UniProt ['A'] A 6PYR.A
79 6pyr 6PYR_2 P27986 UniProt ['B'] B 6PYR.B
80 6pyu 6PYU_1 O00329 UniProt ['A'] A 6PYU.A
81 6pyu 6PYU_2 P27986 UniProt ['B'] B 6PYU.B
UNIPROT metadata:
accession_id id full_name
-- -------------- ----------- ------------------------------------------------------------------------------
0 P42336 PK3CA_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
1 P27986 P85A_HUMAN Phosphatidylinositol 3-kinase regulatory subunit alpha
2 O00329 PK3CD_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform
3 P23727 P85A_BOVIN Phosphatidylinositol 3-kinase regulatory subunit alpha
4 P42337 PK3CA_MOUSE Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
5 Q8BTI9 PK3CB_MOUSE Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform
6 O08908 P85B_MOUSE Phosphatidylinositol 3-kinase regulatory subunit beta
======================================================
STEP 3 - Define domains and map UNIPROT accessions to the domains
======================================================
Attention - this step requires user input! (press enter to continue)
Select the number of domains/chains in your complex: 2
Total of #2 domains!
Map each domain to uniprot accession (by their accession_id).
Map uniprot accessions to DOMAIN0
Choose from:
accession_id id full_name
-- -------------- ----------- ------------------------------------------------------------------------------
0 P42336 PK3CA_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
1 P27986 P85A_HUMAN Phosphatidylinositol 3-kinase regulatory subunit alpha
2 O00329 PK3CD_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform
3 P23727 P85A_BOVIN Phosphatidylinositol 3-kinase regulatory subunit alpha
4 P42337 PK3CA_MOUSE Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
5 Q8BTI9 PK3CB_MOUSE Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform
6 O08908 P85B_MOUSE Phosphatidylinositol 3-kinase regulatory subunit beta
Input indices of uniprot metadata separated by space (e.g., '0 3 5' )
to select all input string all
0
[0]
Selected for DOMAIN0:
accession_id id full_name
-- -------------- ----------- ------------------------------------------------------------------------------
0 P42336 PK3CA_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
Map uniprot accessions to DOMAIN1
Choose from:
accession_id id full_name
-- -------------- ----------- ------------------------------------------------------------------------------
0 P27986 P85A_HUMAN Phosphatidylinositol 3-kinase regulatory subunit alpha
1 O00329 PK3CD_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform
2 P23727 P85A_BOVIN Phosphatidylinositol 3-kinase regulatory subunit alpha
3 P42337 PK3CA_MOUSE Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
4 Q8BTI9 PK3CB_MOUSE Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform
5 O08908 P85B_MOUSE Phosphatidylinositol 3-kinase regulatory subunit beta
Input indices of uniprot metadata separated by space (e.g., '0 3 5' )
to select all input string all
0
[0]
Selected for DOMAIN1:
accession_id id full_name
-- -------------- ---------- ------------------------------------------------------
0 P27986 P85A_HUMAN Phosphatidylinositol 3-kinase regulatory subunit alpha
WARNING: uniprot id O00329 not mapped to any domain
WARNING: uniprot id P23727 not mapped to any domain
WARNING: uniprot id P42337 not mapped to any domain
WARNING: uniprot id Q8BTI9 not mapped to any domain
WARNING: uniprot id O08908 not mapped to any domain
UNIPROT metadata mapped to domains:
domain acc_id id full_name
-- -------- -------- ----------- ------------------------------------------------------------------------------
0 DOMAIN0 P42336 PK3CA_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
1 DOMAIN1 P27986 P85A_HUMAN Phosphatidylinositol 3-kinase regulatory subunit alpha
2 O00329 PK3CD_HUMAN Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit delta isoform
3 P23727 P85A_BOVIN Phosphatidylinositol 3-kinase regulatory subunit alpha
4 P42337 PK3CA_MOUSE Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
5 Q8BTI9 PK3CB_MOUSE Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit beta isoform
6 O08908 P85B_MOUSE Phosphatidylinositol 3-kinase regulatory subunit beta
DOMAIN metadata:
domain_name uniprot_id
-- ------------- ------------
0 DOMAIN0 P42336
1 DOMAIN1 P27986
======================================================
STEP 4 - Defining final selection for processing
======================================================
Attention - this step requires user input! (press enter to continue)
Your current inputs contain 2 domains with the following associated uniprots
domain_name uniprot_id accession_name
-- ------------- ------------ ------------------------------------------------------------------------------
0 DOMAIN0 P42336 Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
1 DOMAIN1 P27986 Phosphatidylinositol 3-kinase regulatory subunit alpha
Please choose the domain-accession pairs you want to consider for your main analysis
Input indices of domain-uniprot metadata separated by space (e.g., '0 3 5' )
0 1
Selected:
domain_name uniprot_id accession_name
-- ------------- ------------ ------------------------------------------------------------------------------
0 DOMAIN0 P42336 Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform
1 DOMAIN1 P27986 Phosphatidylinositol 3-kinase regulatory subunit alpha
['DOMAIN0-P42336', 'DOMAIN1-P27986']
PDB codes satisfying the above selection (containing all domains with any of the related uniprot accessions)
['2rd0' '3hhm' '3hiz' '4jps' '5swg' '5swo' '5swp' '5swr' '5swt' '5sx8'
'5sx9' '5sxa' '5sxb' '5sxc' '5sxd' '5sxe' '5sxf' '5sxi' '5sxj' '5sxk'
'5uk8' '5ukj' '5ul1' '5xgh' '5xgi' '5xgj' '6nct']
With following metadata:
-- ---- ------- ------
0 2rd0 DOMAIN0 P42336
1 2rd0 DOMAIN1 P27986
2 3hhm DOMAIN0 P42336
3 3hhm DOMAIN1 P27986
4 3hiz DOMAIN0 P42336
5 3hiz DOMAIN1 P27986
6 4jps DOMAIN0 P42336
7 4jps DOMAIN1 P27986
8 5swg DOMAIN0 P42336
9 5swg DOMAIN1 P27986
10 5swo DOMAIN0 P42336
11 5swo DOMAIN1 P27986
12 5swp DOMAIN0 P42336
13 5swp DOMAIN1 P27986
14 5swr DOMAIN0 P42336
15 5swr DOMAIN1 P27986
16 5swt DOMAIN0 P42336
17 5swt DOMAIN1 P27986
18 5sx8 DOMAIN0 P42336
19 5sx8 DOMAIN1 P27986
20 5sx9 DOMAIN0 P42336
21 5sx9 DOMAIN1 P27986
22 5sxa DOMAIN0 P42336
23 5sxa DOMAIN1 P27986
24 5sxb DOMAIN0 P42336
25 5sxb DOMAIN1 P27986
26 5sxc DOMAIN0 P42336
27 5sxc DOMAIN1 P27986
28 5sxd DOMAIN0 P42336
29 5sxd DOMAIN1 P27986
30 5sxe DOMAIN0 P42336
31 5sxe DOMAIN1 P27986
32 5sxf DOMAIN0 P42336
33 5sxf DOMAIN1 P27986
34 5sxi DOMAIN0 P42336
35 5sxi DOMAIN1 P27986
36 5sxj DOMAIN0 P42336
37 5sxj DOMAIN1 P27986
38 5sxk DOMAIN0 P42336
39 5sxk DOMAIN1 P27986
40 5uk8 DOMAIN0 P42336
41 5uk8 DOMAIN1 P27986
42 5ukj DOMAIN0 P42336
43 5ukj DOMAIN1 P27986
44 5ul1 DOMAIN0 P42336
45 5ul1 DOMAIN1 P27986
46 5xgh DOMAIN0 P42336
47 5xgh DOMAIN1 P27986
48 5xgi DOMAIN0 P42336
49 5xgi DOMAIN1 P27986
50 5xgj DOMAIN0 P42336
51 5xgj DOMAIN1 P27986
52 6nct DOMAIN0 P42336
53 6nct DOMAIN1 P27986
-- ---- ------- ------
Discarding the following entries (that do not contain selected domains:
{'5t8f', '4a55', '6g6w', '5dxu', '5m6u', '6pyu', '6pyr', '5ubt', '2y3a'}
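The STEP 4 rule ("containing all domains with any of the related uniprot accessions") is a per-entry set test. The STEP 3 warnings list the accessions that were assigned to no domain. Below is a minimal sketch of both. It assumes per-entry accession sets built from the STEP 2 metadata, and the helpers are hypothetical, not the PrepStatic implementation. The sample entries are copied from the tables above.

```python
def select_pdbs(pdb_accessions, domain_accessions):
    """Keep PDB codes that contain every domain, each via at least one of its accessions.

    pdb_accessions: {pdb_code: set of UniProt accessions found in that entry}
    domain_accessions: {domain_name: set of UniProt accessions mapped to that domain}
    Returns (kept, discarded) as sorted lists.
    """
    kept, discarded = [], []
    for pdb, accs in pdb_accessions.items():
        # A non-empty intersection means the domain is present in this entry.
        if all(accs & allowed for allowed in domain_accessions.values()):
            kept.append(pdb)
        else:
            discarded.append(pdb)
    return sorted(kept), sorted(discarded)


def unmapped(all_accessions, domain_accessions):
    """Accessions not assigned to any domain (cf. the STEP 3 warnings)."""
    mapped = set().union(*domain_accessions.values())
    return sorted(set(all_accessions) - mapped)


pdb_accessions = {
    "2rd0": {"P42336", "P27986"},
    "5t8f": {"O00329", "P23727"},
    "5ubt": {"O00329", "P27986"},
    "4a55": {"P42337", "P27986"},
}
domains = {"DOMAIN0": {"P42336"}, "DOMAIN1": {"P27986"}}

print(select_pdbs(pdb_accessions, domains))
# (['2rd0'], ['4a55', '5t8f', '5ubt'])
print(unmapped(["P42336", "P27986", "O00329", "P23727", "P42337"], domains))
# ['O00329', 'P23727', 'P42337']
```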
======================================================
STEP 5- Extracting coordinates associated with given domains (per-accession)
======================================================
test_PI3K_new/structure_output/DOMAIN0/P42336/2rd0_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/2rd0_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/3hhm_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/3hhm_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/3hiz_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/3hiz_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/4jps_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/4jps_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swg_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swg_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swo_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swo_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swp_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swp_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swr_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swr_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5swt_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5swt_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sx8_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sx8_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sx9_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sx9_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxa_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxa_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxb_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxb_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxc_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxc_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxd_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxd_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxe_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxe_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxf_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxf_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxi_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxi_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxj_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxj_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5sxk_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5sxk_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5uk8_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5uk8_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5ukj_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5ukj_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5ul1_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5ul1_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5xgh_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5xgh_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5xgi_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5xgi_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/5xgj_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/5xgj_renum_fixed_B.pdb
test_PI3K_new/structure_output/DOMAIN0/P42336/6nct_renum_fixed_A.pdb
test_PI3K_new/structure_output/DOMAIN1/P27986/6nct_renum_fixed_B.pdb
Extracted per-accession domain coordinates
DOMAIN metadata:
domain_name uniprot_id full_name pdb_files_dir
-- ------------- ------------ ------------------------------------------------------------------------------ ---------------------------------------------
0 DOMAIN0 P42336 Phosphatidylinositol 4,5-bisphosphate 3-kinase catalytic subunit alpha isoform test_PI3K_new/structure_output/DOMAIN0/P42336
1 DOMAIN1 P27986 Phosphatidylinositol 3-kinase regulatory subunit alpha test_PI3K_new/structure_output/DOMAIN1/P27986
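STEP 5 writes one file per chain, for example 2rd0_renum_fixed_A.pdb for the catalytic subunit and 2rd0_renum_fixed_B.pdb for the regulatory subunit. The later *_mcs_bb.pdb files keep only backbone atoms. Both operations are fixed-column filters over PDB ATOM records. The stdlib sketch below relies on the PDB format columns (atom name in columns 13-16, chain ID in column 22). `extract_chain` and `atom_line` are hypothetical helpers, not EnGens functions.

```python
def extract_chain(pdb_lines, chain_id, backbone_only=False):
    """Return the ATOM records of one chain, optionally backbone atoms only."""
    backbone = {"N", "CA", "C", "O"}
    selected = []
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skips HETATM (ligands, waters), TER, headers, ...
        if line[21] != chain_id:  # column 22: chain identifier
            continue
        atom_name = line[12:16].strip()  # columns 13-16: atom name
        if backbone_only and atom_name not in backbone:
            continue
        selected.append(line)
    return selected


def atom_line(serial, name, resname, chain, resseq, record="ATOM  "):
    """Hypothetical helper: format a minimal fixed-column record (zero coordinates)."""
    return (f"{record}{serial:5d} {' ' + name:<4} {resname:>3} {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}")


lines = [
    atom_line(1, "N", "HIS", "A", 1047),
    atom_line(2, "CA", "HIS", "A", 1047),
    atom_line(3, "CB", "HIS", "A", 1047),
    atom_line(4, "CA", "ALA", "B", 400),
    atom_line(5, "C1", "LIG", "A", 2000, record="HETATM"),
]
print(len(extract_chain(lines, "A")))                      # 3
print(len(extract_chain(lines, "A", backbone_only=True)))  # 2
print(len(extract_chain(lines, "B")))                      # 1
```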
======================================================
STEP 6- Creating domain alignments (accoarding to uniprot accession numbering)
======================================================
Creating MSA for domain DOMAIN0 based on P42336 uniprot accession
Extracted MSA for domain DOMAIN0 based on P42336 uniprot accession to test_PI3K_new/sequence_output/DOMAIN0/P42336/msa.fasta
Residue range 1-5028
Check out the MSA at test_PI3K_new/sequence_output/DOMAIN0/P42336/msa.fasta
Check out the MSA visualized at test_PI3K_new/sequence_output/DOMAIN0/P42336/msa.html
Creating MSA for domain DOMAIN1 based on P27986 uniprot accession
Extracted MSA for domain DOMAIN1 based on P27986 uniprot accession to test_PI3K_new/sequence_output/DOMAIN1/P27986/msa.fasta
Residue range 322-600
Check out the MSA at test_PI3K_new/sequence_output/DOMAIN1/P27986/msa.fasta
Check out the MSA visualized at test_PI3K_new/sequence_output/DOMAIN1/P27986/msa.html
Extracted per-accession domain alignments
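STEP 6 aligns the chains of each domain against the UniProt accession so that residues are numbered consistently; the step header says "according to uniprot accession numbering". The core bookkeeping is turning two gapped rows of an alignment into a residue-number map. The sketch below assumes a pairwise view of the MSA with '-' as the gap character. `map_to_reference` is a hypothetical helper, not EnGens' own code.

```python
def map_to_reference(ref_aln, query_aln, ref_start=1):
    """Map each query residue (0-based index) to a reference residue number.

    ref_aln / query_aln: two rows of the same alignment, '-' marks a gap.
    Query residues aligned to a reference gap (insertions) map to None.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("alignment rows must have equal length")
    mapping = {}
    ref_num, query_idx = ref_start, 0
    for r, q in zip(ref_aln, query_aln):
        if q != "-":
            mapping[query_idx] = ref_num if r != "-" else None
            query_idx += 1
        if r != "-":
            ref_num += 1  # advance reference numbering only on reference residues
    return mapping


print(map_to_reference("MAK-L", "-AKGL"))  # {0: 2, 1: 3, 2: None, 3: 4}
```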
======================================================
STEP 7- Extracting per-accession domain MCS
======================================================
Successfully extracted per-accession domain MCS!
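The log does not define MCS precisely. Assume it denotes the residue positions (in UniProt numbering) that are resolved in every structure of a domain, so that all structures end up with identical atom sets. Under that assumption, the core computation is a set intersection. `common_residues` is a hypothetical helper, and the residue ranges are made up.

```python
def common_residues(resolved):
    """Residue numbers present in every structure.

    resolved: {structure_name: iterable of resolved residue numbers}
    """
    sets = [set(nums) for nums in resolved.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Hypothetical resolved ranges: termini and one loop residue missing in some structures.
resolved = {
    "s1": list(range(10, 21)),
    "s2": list(range(12, 25)),
    "s3": [n for n in range(8, 20) if n != 15],
}
print(common_residues(resolved))  # [12, 13, 14, 16, 17, 18, 19]
```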
======================================================
STEP 8 - Processing multi-accession domains (requires mTM-align)
======================================================
Multi-accession domains (to be processed): []
Successfully processed multi-accession domain MCS! (if present)
======================================================
STEP 9 - Combining the final MCS data
======================================================
Not the case of multi-accession - MCS extracted based on accession residue numbers
Step 9.1 - copy all MCS to test_PI3K_new/structure_output/final_mcs for processing and extract backbones
Successfully copied all cleaned MCS domains to test_PI3K_new/structure_output/final_mcs!
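The notes at the top ask for align=True. After Step 9.1, every structure is reduced to the MCS backbone, so all structures share the same atoms in the same order. That shared atom set is the precondition for a least-squares rigid-body superposition. The minimal NumPy sketch of the Kabsch algorithm below only illustrates what such an alignment computes. It makes no claim about how EnGens aligns structures internally, and `kabsch_superpose` is a hypothetical name.

```python
import numpy as np


def kabsch_superpose(mobile, reference):
    """Rigidly superpose mobile (N x 3) onto reference (N x 3); return moved coords and RMSD.

    Both arrays must list the same atoms in the same order.
    """
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    P = mobile - mob_center
    Q = reference - ref_center
    H = P.T @ Q                                # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))     # correct for a possible reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation
    moved = P @ R.T + ref_center
    rmsd = np.sqrt(((moved - reference) ** 2).sum(axis=1).mean())
    return moved, rmsd


# Usage: rotate and translate random points, then recover the original frame.
rng = np.random.default_rng(0)
ref = rng.normal(size=(20, 3))
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
mobile = ref @ rot.T + np.array([5.0, -2.0, 1.0])
moved, rmsd = kabsch_superpose(mobile, ref)
print(round(rmsd, 6))  # 0.0
```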
Step 9.2 - combine multiple domains from same pdb id to single files
2rd0
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_2rd0_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_2rd0_renum_fixed_B_mcs_bb.pdb']
3hhm
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_3hhm_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_3hhm_renum_fixed_B_mcs_bb.pdb']
3hiz
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_3hiz_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_3hiz_renum_fixed_B_mcs_bb.pdb']
4jps
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_4jps_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_4jps_renum_fixed_B_mcs_bb.pdb']
5swg
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swg_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swg_renum_fixed_B_mcs_bb.pdb']
5swo
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swo_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swo_renum_fixed_B_mcs_bb.pdb']
5swp
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swp_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swp_renum_fixed_B_mcs_bb.pdb']
5swr
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swr_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swr_renum_fixed_B_mcs_bb.pdb']
5swt
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5swt_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5swt_renum_fixed_B_mcs_bb.pdb']
5sx8
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sx8_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sx8_renum_fixed_B_mcs_bb.pdb']
5sx9
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sx9_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sx9_renum_fixed_B_mcs_bb.pdb']
5sxa
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxa_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxa_renum_fixed_B_mcs_bb.pdb']
5sxb
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxb_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxb_renum_fixed_B_mcs_bb.pdb']
5sxc
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxc_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxc_renum_fixed_B_mcs_bb.pdb']
5sxd
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxd_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxd_renum_fixed_B_mcs_bb.pdb']
5sxe
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxe_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxe_renum_fixed_B_mcs_bb.pdb']
5sxf
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxf_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxf_renum_fixed_B_mcs_bb.pdb']
5sxi
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxi_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxi_renum_fixed_B_mcs_bb.pdb']
5sxj
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxj_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxj_renum_fixed_B_mcs_bb.pdb']
5sxk
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5sxk_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5sxk_renum_fixed_B_mcs_bb.pdb']
5uk8
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5uk8_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5uk8_renum_fixed_B_mcs_bb.pdb']
5ukj
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5ukj_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5ukj_renum_fixed_B_mcs_bb.pdb']
5ul1
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5ul1_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5ul1_renum_fixed_B_mcs_bb.pdb']
5xgh
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5xgh_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5xgh_renum_fixed_B_mcs_bb.pdb']
5xgi
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5xgi_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5xgi_renum_fixed_B_mcs_bb.pdb']
5xgj
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_5xgj_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_5xgj_renum_fixed_B_mcs_bb.pdb']
6nct
['DOMAIN0-P42336', 'DOMAIN1-P27986']
['test_PI3K_new/structure_output/final_mcs/DOMAIN0_P42336_6nct_renum_fixed_A_mcs_bb.pdb', 'test_PI3K_new/structure_output/final_mcs/DOMAIN1_P27986_6nct_renum_fixed_B_mcs_bb.pdb']
Successfully combined all cleaned MCS domains to complex PDBs in test_PI3K_new/structure_output/final_mcs!
Step 9.3 - combine multiple files to single EnGens input
Successfully combined all cleaned MCS domains to complex PDBs in test_PI3K_new/structure_output/final_mcs!
======================================================
Successfully completed PDB pre-processing
======================================================
[4]:
from engens.core.EnGens import *
[6]:
engen = EnGen(prep_class.final_traj_file, prep_class.final_pdb_files[0], align=True)
Aligning trajectory: 100%|██████████| 1/1 [00:00<00:00, 6.37it/s]
Cleaning files...: 100%|██████████| 1/1 [00:00<00:00, 3452.10it/s]
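Setting align=True superposes all structures before featurization. The sketch below shows one standard way to do this: the Kabsch algorithm, which finds the optimal rotation via SVD, written in plain NumPy. It assumes that "alignment" means a least-squares superposition of each frame onto a reference. EnGens' own implementation may differ; for example, mdtraj's `Trajectory.superpose` performs a comparable fit. The helper names `kabsch_align` and `rmsd` are hypothetical.

```python
import numpy as np


def kabsch_align(mobile: np.ndarray, reference: np.ndarray) -> np.ndarray:
    """Superpose `mobile` (N x 3) onto `reference` (N x 3) by least squares.

    Hypothetical helper for illustration; both arrays must list the same
    atoms in the same order (e.g. backbone atoms shared by all structures).
    """
    mob_center = mobile.mean(axis=0)
    ref_center = reference.mean(axis=0)
    p = mobile - mob_center
    q = reference - ref_center
    # Cross-covariance matrix between the centered coordinate sets
    h = p.T @ q
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that the result is a proper rotation
    d = np.sign(np.linalg.det(vt.T @ u.T))
    correction = np.diag([1.0, 1.0, d])
    rotation = vt.T @ correction @ u.T
    # Rotate the centered mobile coordinates, then move them onto the reference
    return p @ rotation.T + ref_center


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between two (N x 3) coordinate sets."""
    return float(np.sqrt(((a - b) ** 2).sum(axis=1).mean()))


# Usage: a rotated and shifted copy of a toy structure aligns back onto it
rng = np.random.default_rng(0)
reference = rng.normal(size=(10, 3))
theta = np.deg2rad(30.0)
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
mobile = reference @ rot_z.T + np.array([5.0, -2.0, 1.0])
aligned = kabsch_align(mobile, reference)
print(rmsd(mobile, reference), rmsd(aligned, reference))
```

Alignment matters here because distance-based features are rotation-invariant, but any coordinate-based comparison between structures is not.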
Extract features from the PDB files
Input: reference PDB and trajectory
Output: featurized trajectory
Steps:
Load reference PDB and trajectory in the EnGen object
Provide set of featurizations of interest (or use default)
Evaluate the different featurizations (optional; not available for crystal structure input)
Choose the best featurization
Extract those features
[7]:
# required imports
import engens.core.FeatureSelector as fs
import pickle
import mdshare
import mdtraj
import numpy as np
import nglview
from IPython.display import Javascript, display
import json
Step 1 - load the structure and trajectory
Provide the paths to the reference topology and trajectory files (any format accepted by mdtraj.load works). In this static workflow, this happens when the EnGen object is constructed from the files produced by PrepStatic.
Optionally, provide the subset of the structure to use for featurization (e.g. the binding site) as an atom selection string or a list of atom indices.
[7]:
#visualize the trajectory (optional - if trajectory too large, skip this step)
nglwidget = engen.show_animated_traj()
nglwidget.clear_representations()
nglwidget.add_ball_and_stick()
nglwidget.center()
nglwidget
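Following the advice at the top of this notebook, the featurization subset should resolve to the same atoms in every structure, even where residues are mutated. The pure-Python sketch below illustrates why. An all-atom selection over a binding site changes size when a side chain changes, while a backbone-only selection does not. The residue and atom records are toy, hypothetical data. With mdtraj, a comparable selection string would be something like "backbone and resid 10 to 20", where the residue range is a placeholder.

```python
from typing import Dict, List, Sequence, Tuple

# Toy atom record: (residue index, residue name, atom name). Hypothetical data.
Atom = Tuple[int, str, str]

BACKBONE = {"N", "CA", "C", "O"}

# Heavy side-chain atoms for a few residue types
SIDECHAINS: Dict[str, List[str]] = {
    "GLY": [],
    "ALA": ["CB"],
    "HIS": ["CB", "CG", "ND1", "CD2", "CE1", "NE2"],
    "ARG": ["CB", "CG", "CD", "NE", "CZ", "NH1", "NH2"],
}


def build_structure(sequence: Sequence[str]) -> List[Atom]:
    """Expand a residue-name sequence into heavy-atom records."""
    atoms: List[Atom] = []
    for resid, resname in enumerate(sequence):
        for name in ["N", "CA", "C", "O"] + SIDECHAINS[resname]:
            atoms.append((resid, resname, name))
    return atoms


def select_indices(atoms: List[Atom], site: range, backbone_only: bool) -> List[int]:
    """Return the atom indices of the binding-site residues listed in `site`."""
    return [
        i
        for i, (resid, _, name) in enumerate(atoms)
        if resid in site and (not backbone_only or name in BACKBONE)
    ]


wild_type = build_structure(["GLY", "ALA", "HIS", "ALA"])
mutant = build_structure(["GLY", "ALA", "ARG", "ALA"])  # H -> R at residue 2
site = range(1, 4)  # hypothetical binding-site residues 1..3

for backbone_only in (False, True):
    n_wt = len(select_indices(wild_type, site, backbone_only))
    n_mut = len(select_indices(mutant, site, backbone_only))
    print(f"backbone_only={backbone_only}: wild type {n_wt} atoms, mutant {n_mut} atoms")
```

Only the backbone-only selection yields feature vectors of equal length for both structures, which is what a shared featurization of the ensemble requires.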
Step 2 - select different featurizations
Here we select how to featurize the trajectory. Any PyEMMA trajectory featurization can be used in this step, together with any of its parameters. Alternatively, use the default initialization, which includes three sets of features: (1) pairwise amino-acid distances; (2) torsion angles; and (3) pairwise amino-acid distances combined with torsion angles.
[ ]:
# remove any existing featurizers
engen.reset_featurizers()
# initialize default features
engen.init_featurizers_default()
description = engen.describe_featurizers()
print(description)
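To make the default feature sets concrete, the NumPy sketch below computes (1) pairwise distances between residues and (2) a torsion angle from four atom positions. It then concatenates them, as in (3). This is an illustration under assumptions, not PyEMMA's or EnGens' code. Residues are represented by their C-alpha atoms here, though minimum inter-residue distances are another common choice. Angles are encoded as cosine and sine because they are periodic; PyEMMA's backbone-torsion featurizer offers a `cossin` option for the same reason. All function names are hypothetical.

```python
import numpy as np


def pairwise_distances(ca_coords: np.ndarray) -> np.ndarray:
    """Distances between all residue pairs i < j from (n_res, 3) coordinates."""
    i, j = np.triu_indices(len(ca_coords), k=1)
    return np.linalg.norm(ca_coords[i] - ca_coords[j], axis=1)


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle (radians, in [-pi, pi]) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 /= np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.arctan2(y, x))


def cossin(angles: np.ndarray) -> np.ndarray:
    """Encode periodic angles as [cos, sin] so that -pi and pi map to the same point."""
    return np.concatenate([np.cos(angles), np.sin(angles)])


# Usage with toy C-alpha coordinates (hypothetical, in nm)
ca = np.array([[0.0, 0.0, 0.0], [0.38, 0.0, 0.0], [0.38, 0.38, 0.0], [0.0, 0.38, 0.2]])
distances = pairwise_distances(ca)            # 4 residues -> 6 pairs
angle = dihedral(ca[0], ca[1], ca[2], ca[3])  # pseudo-torsion over four C-alpha atoms
combined = np.concatenate([distances, cossin(np.array([angle]))])
print(distances.shape, np.degrees(angle), combined.shape)
```

Both feature types depend only on positions of atoms present in every structure, which keeps the feature vectors comparable across mutants.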
Step 3 - evaluate the featurizations
This step is optional: for time-series (MD) input, we recommend evaluating the featurizations and picking the best one using PyEMMA's implementation of the VAMP approach.
This helps you choose the set of features with which to proceed to the next workflow.
It is not an option for crystal structure input, so skip this step in the static workflow.
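Step 3 does not apply to crystal structures because VAMP scores a featurization by how well it captures correlations between time-lagged pairs of frames (x_t, x_{t+tau}). A set of unordered crystal structures has no such pairs. The NumPy sketch below computes a mean-free VAMP-2 score from lagged covariance matrices to make this dependence on time ordering explicit. It is a simplified illustration, not PyEMMA's implementation: it skips dimension truncation, and implementations such as PyEMMA may add 1 to account for the constant singular function. The function names and the regularization constant are assumptions.

```python
import numpy as np


def inv_sqrt(cov: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Inverse square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(cov)
    vals = np.maximum(vals, eps)  # guard against tiny or negative eigenvalues
    return vecs @ np.diag(vals ** -0.5) @ vecs.T


def vamp2_score(features: np.ndarray, lag: int) -> float:
    """Mean-free VAMP-2 score of a (n_frames, n_features) time series."""
    # Pair frame t with frame t + lag: this is what requires time ordering
    x0 = features[:-lag]
    xt = features[lag:]
    x0 = x0 - x0.mean(axis=0)
    xt = xt - xt.mean(axis=0)
    n = len(x0)
    c00 = x0.T @ x0 / n
    ctt = xt.T @ xt / n
    c0t = x0.T @ xt / n
    # Sum of squared singular values of the whitened cross-covariance
    k = inv_sqrt(c00) @ c0t @ inv_sqrt(ctt)
    return float(np.sum(np.linalg.svd(k, compute_uv=False) ** 2))


rng = np.random.default_rng(1)
n_frames = 5000
# Slowly varying toy signal: an AR(1) process with strong memory
slow = np.zeros(n_frames)
for t in range(1, n_frames):
    slow[t] = 0.99 * slow[t - 1] + rng.normal()
noise = rng.normal(size=n_frames)
# Feature set A tracks the slow signal; feature set B is uncorrelated noise
score_a = vamp2_score(slow[:, None], lag=10)
score_b = vamp2_score(noise[:, None], lag=10)
print(score_a, score_b)
```

Applied to crystal structures, `features[:-lag]` and `features[lag:]` would pair structures by file order, which carries no kinetic meaning, so the resulting score would not reflect the quality of a featurization.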
Step 4 - pick the featurization
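For time-series input, we suggest using the featurization with the highest VAMP2 score from Step 3. For crystal structure input, where Step 3 is not available, choose the featurization manually by setting feat_num to its index in the printed description. Then run the cell below to apply the featurizations and select the chosen one.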
[ ]:
#apply features
engen.apply_featurizations()
#print possible features
print(engen.describe_featurizers())
#select the number of the desired feature
feat_num = 0
# initialize selector
featsel = fs.UserFeatureSelection(feat_num, engen)
#select the feature
featsel.select_feature()
Step 5 - save the results as input for Workflow2 - dimensionality reduction
[ ]:
# save the results for next workflow
with open("wf1_resulting_EnGen.pickle", "wb") as file:
    # protocol -1 selects the highest available pickle protocol
    pickle.dump(engen, file, -1)
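In the next workflow, the saved object can be restored with the standard pickle module. The sketch below shows the save/load round trip with a plain dictionary standing in for the EnGen object. This is an assumption made so the example runs without EnGens; unpickling the real object also requires engens to be importable. Only load pickle files you trust, since unpickling can execute arbitrary code.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the EnGen object saved above
engen_stand_in = {"selected_feature": 0, "aligned": True}

path = os.path.join(tempfile.mkdtemp(), "wf1_resulting_EnGen.pickle")

# Save, as in the cell above (protocol -1 = highest available protocol)
with open(path, "wb") as file:
    pickle.dump(engen_stand_in, file, -1)

# Load at the start of Workflow 2
with open(path, "rb") as file:
    restored = pickle.load(file)

print(restored)
```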