Workflow 2 - reduce the dimensionality of the featurized trajectory

Input: EnGen object with the featurized trajectory (generated by Workflow1)

Output: Trajectory with rediced dimensionality


Steps:

  1. Import the featurized trajectory from Workflow1

  2. Choose dimensionality reduction technique: HDE, TICA (PCA)

  3. Choose appropriate lag time (when using a VAC based approach (HDA or TICA) it is neccessary to choose a lag time appropriately)

  4. Visualize the results

  5. Save results for Workflow3

[16]:
#required imports

from engens.core.EnGens import EnGen
from engens.core.DimReduction import *
import pickle as pk
import mdtraj
import numpy as np
import matplotlib.pyplot as plt

plt.ioff()

Step 0 - Load the results from Workflow1

[17]:
engen = None
with open("wf1_resulting_EnGen.pickle", "rb") as file:
    engen = pk.load(file)

Inspect the results from Workflow1 - if any value here is None you might have to run the Workflow1 again.

[ ]:
traj = engen.traj
ref = engen.ref
print("Using the trajectory {} and reference pdb file {}".format(traj, ref))

topology = engen.mdtrajref
print("The topology is:")
print(topology)

frame_num = engen.data[engen.chosen_feat_index][1].shape[0]
print("Number of frames is {}".format(frame_num))

feat_dims = engen.data[engen.chosen_feat_index][1].shape[1]
print("The dimensionality of your featurization is {}".format(feat_dims))

feat = engen.featurizers[engen.chosen_feat_index]
print("You chose to featurize with: "+engen.featurizer_names[engen.chosen_feat_index])
print(feat.describe())

Step 1 - Choose the method for dimensionality reduction and apply it

[19]:
# Here you will perform dimensionality reduction

# ------------------there are three options--------------------- #
# -------------------PCA, TICA or HDE----------------------------#
# --------- --uncomment the one you want to try------------------#

#Option 1 - PCA (not recommended)
#chose PCA
reduction_type = "PCA"
reducer = dimreds[reduction_type](engen)
#choose the variance theshold you want to explain and pick number of components
var_thr = 80
reducer.plot_variance(var_thr)
plt.show()
n_components = reducer.get_variance(var_thr)
#select the given number of components
reducer.choose_n(n_components)

#Option 2 - TICA
'''
#chose TICA
reduction_type = "TICA"
#to make an informed decision on lag time range see the timescale of your simulation
traj_times = mdtraj.load(engen.traj, top=engen.ref).time
traj_nframes = len(traj_times)
lags = [int(np.log(i)) for i in np.logspace(1,  int(traj_nframes/50), num=15, base=np.e)]
#initialize TICA reducer
reducer = dimreds[reduction_type](engen, lags)
#estimate number of resolved processes and choose the lag time automatically
lag = reducer.choose_lag_auto()
if lag is None:
    reducer.plot_lag_analysis(chosen_lag=0)
    plt.show()
    lag = int(input("Please enter custom lag or run the cell again with different lags"))
print(lag)
reducer.choose_lag(lag)
#visualize TICA resolved process scales with different lags
reducer.plot_lag_analysis(chosen_lag=lag)
plt.show()
#choose variance threshold and number of components
var_thr = 96
n_components = reducer.get_variance(var_thr=var_thr)
reducer.plot_variance(var_thr=var_thr)
plt.show()
#select the corresponding number of components
reducer.choose_n(n_components)
'''

#Option 3 - HDE

'''
#choose HDE
reduction_type = "HDE"
#find best lag range for your data
traj_times = mdtraj.load(engen.traj, top=engen.ref).time
traj_nframes = len(traj_times)
lags = [int(np.log(i)) for i in np.logspace(1, int(traj_nframes/50), num=15, base=np.e)]
#find best output dimensionality for your data
feat_dims = engen.featurizers[engen.chosen_feat_index].dimension()
# min(50, 10% of the feature dimensions)
output_dims = min(50, int(feat_dims*0.5))
#initialize HDE reducer
reducer = dimreds[reduction_type](engen, lags, output_dims)
#estimate number of resolved processes and choose the lag time automatically
lag = reducer.choose_lag_auto()
#visualize HDE resolved process scales with different lags
reducer.plot_lag_analysis(chosen_lag=lag)
plt.show()
'''

'''
# Option 4 - UMAP

reduction_type = "UMAP"

reducer = dimreds[reduction_type](engen)

#select the given number of components - best 3
n_components = 3
reducer.update_component_number(n_components)'''

Step 2 - inspect the results visually

[20]:
reducer.plot_2d()
plt.show()
../_images/_notebooks_Workflow2-DimensionalityReduction_9_0.png
[21]:
reducer.plot_3d()

Step 3 - save results for Workflow3 and clustering

[22]:
# to apply the dimensionality reduction to the engen structure run:
reducer.apply()

# to save the results from this analysis for workflow3 run:
with open("wf2_resulting_EnGen.pickle", "wb") as file:
    pk.dump(engen, file, -1)