Workflow 2 - reduce the dimensionality of the featurized trajectory
Input: EnGen object with the featurized trajectory (generated by Workflow1)
Output: Trajectory with rediced dimensionality
Steps:
Import the featurized trajectory from Workflow1
Choose dimensionality reduction technique: HDE, TICA (PCA)
Choose appropriate lag time (when using a VAC based approach (HDA or TICA) it is neccessary to choose a lag time appropriately)
Visualize the results
Save results for Workflow3
[16]:
#required imports
from engens.core.EnGens import EnGen
from engens.core.DimReduction import *
import pickle as pk
import mdtraj
import numpy as np
import matplotlib.pyplot as plt
plt.ioff()
Step 0 - Load the results from Workflow1
[17]:
engen = None
with open("wf1_resulting_EnGen.pickle", "rb") as file:
engen = pk.load(file)
Inspect the results from Workflow1 - if any value here is None you might have to run the Workflow1 again.
[ ]:
traj = engen.traj
ref = engen.ref
print("Using the trajectory {} and reference pdb file {}".format(traj, ref))
topology = engen.mdtrajref
print("The topology is:")
print(topology)
frame_num = engen.data[engen.chosen_feat_index][1].shape[0]
print("Number of frames is {}".format(frame_num))
feat_dims = engen.data[engen.chosen_feat_index][1].shape[1]
print("The dimensionality of your featurization is {}".format(feat_dims))
feat = engen.featurizers[engen.chosen_feat_index]
print("You chose to featurize with: "+engen.featurizer_names[engen.chosen_feat_index])
print(feat.describe())
Step 1 - Choose the method for dimensionality reduction and apply it
[19]:
# Here you will perform dimensionality reduction
# ------------------there are three options--------------------- #
# -------------------PCA, TICA or HDE----------------------------#
# --------- --uncomment the one you want to try------------------#
#Option 1 - PCA (not recommended)
#chose PCA
reduction_type = "PCA"
reducer = dimreds[reduction_type](engen)
#choose the variance theshold you want to explain and pick number of components
var_thr = 80
reducer.plot_variance(var_thr)
plt.show()
n_components = reducer.get_variance(var_thr)
#select the given number of components
reducer.choose_n(n_components)
#Option 2 - TICA
'''
#chose TICA
reduction_type = "TICA"
#to make an informed decision on lag time range see the timescale of your simulation
traj_times = mdtraj.load(engen.traj, top=engen.ref).time
traj_nframes = len(traj_times)
lags = [int(np.log(i)) for i in np.logspace(1, int(traj_nframes/50), num=15, base=np.e)]
#initialize TICA reducer
reducer = dimreds[reduction_type](engen, lags)
#estimate number of resolved processes and choose the lag time automatically
lag = reducer.choose_lag_auto()
if lag is None:
reducer.plot_lag_analysis(chosen_lag=0)
plt.show()
lag = int(input("Please enter custom lag or run the cell again with different lags"))
print(lag)
reducer.choose_lag(lag)
#visualize TICA resolved process scales with different lags
reducer.plot_lag_analysis(chosen_lag=lag)
plt.show()
#choose variance threshold and number of components
var_thr = 96
n_components = reducer.get_variance(var_thr=var_thr)
reducer.plot_variance(var_thr=var_thr)
plt.show()
#select the corresponding number of components
reducer.choose_n(n_components)
'''
#Option 3 - HDE
'''
#choose HDE
reduction_type = "HDE"
#find best lag range for your data
traj_times = mdtraj.load(engen.traj, top=engen.ref).time
traj_nframes = len(traj_times)
lags = [int(np.log(i)) for i in np.logspace(1, int(traj_nframes/50), num=15, base=np.e)]
#find best output dimensionality for your data
feat_dims = engen.featurizers[engen.chosen_feat_index].dimension()
# min(50, 10% of the feature dimensions)
output_dims = min(50, int(feat_dims*0.5))
#initialize HDE reducer
reducer = dimreds[reduction_type](engen, lags, output_dims)
#estimate number of resolved processes and choose the lag time automatically
lag = reducer.choose_lag_auto()
#visualize HDE resolved process scales with different lags
reducer.plot_lag_analysis(chosen_lag=lag)
plt.show()
'''
'''
# Option 4 - UMAP
reduction_type = "UMAP"
reducer = dimreds[reduction_type](engen)
#select the given number of components - best 3
n_components = 3
reducer.update_component_number(n_components)'''
Step 2 - inspect the results visually
[20]:
reducer.plot_2d()
plt.show()
[21]:
reducer.plot_3d()
Step 3 - save results for Workflow3 and clustering
[22]:
# to apply the dimensionality reduction to the engen structure run:
reducer.apply()
# to save the results from this analysis for workflow3 run:
with open("wf2_resulting_EnGen.pickle", "wb") as file:
pk.dump(engen, file, -1)