Welcome to the EnGens documentation!
EnGens is a Python library for anyone interested in studying protein structures and their flexibility. EnGens provides a computational framework for generation and analysis of representative protein conformational ensembles.
With EnGens you can process both kinds of protein structure datasets:
- static (gathered from the PDB databank or modelled)
- dynamic (generated by MD simulation)
EnGens will analyze your dataset and extract a representative ensemble with the following steps:
- Data loading and preprocessing
- Data featurization
- Dimensionality reduction
- Clustering and ensemble extraction
- Visualization and further analysis
You can use the resulting ensemble for downstream tasks such as docking, structural drug design, or other analyses!
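To make the steps listed above concrete, here is a minimal, self-contained sketch of the same kind of pipeline on synthetic data. It does not use the EnGens API: every function name below (`make_frames`, `featurize`, `first_pc_projection`, `kmeans_1d`, `extract_ensemble`) is hypothetical, the "trajectory" is a toy four-atom chain that switches between two states, and the dimensionality reduction and clustering are simple stand-ins (a first principal component computed by power iteration and a one-dimensional k-means) chosen only so that the example runs with the Python standard library. Visualization is omitted.

```python
import math
import random


def make_frames(n_frames=40, seed=0):
    """Toy 'trajectory' (an assumption, not real data): a 4-atom chain whose
    last atom sits in a 'closed' position for the first half of the frames
    and an 'open' position for the second half, plus small Gaussian noise."""
    rng = random.Random(seed)
    frames = []
    for i in range(n_frames):
        end_x = 1.0 if i < n_frames // 2 else 4.0
        base = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (end_x, 1.5, 1.5)]
        frames.append([tuple(c + rng.gauss(0.0, 0.05) for c in atom) for atom in base])
    return frames


def featurize(frame):
    """Featurization: all pairwise distances between the atoms of one frame."""
    return [math.dist(a, b) for i, a in enumerate(frame) for b in frame[i + 1:]]


def first_pc_projection(features, n_iter=100):
    """Dimensionality reduction: project mean-centered features onto the
    first principal component, estimated by power iteration."""
    n, d = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in features]
    v = [1.0] * d
    for _ in range(n_iter):
        # One power-iteration step: v <- X^T (X v), then normalize.
        scores = [sum(x * vj for x, vj in zip(row, v)) for row in centered]
        w = [sum(s * row[j] for s, row in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [sum(x * vj for x, vj in zip(row, v)) for row in centered]


def kmeans_1d(values, k=2, n_iter=20):
    """Clustering: plain k-means on scalar values (assumes k >= 2),
    with centroids initialized evenly between the minimum and maximum."""
    lo, hi = min(values), max(values)
    centroids = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: abs(x - centroids[c])) for x in values]
        for c in range(k):
            members = [x for x, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


def extract_ensemble(values, labels, centroids):
    """Ensemble extraction: per cluster, the index of the frame whose
    projection lies closest to the cluster centroid."""
    ensemble = []
    for c, center in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            ensemble.append(min(members, key=lambda i: abs(values[i] - center)))
    return ensemble


frames = make_frames()                              # data loading (synthetic)
features = [featurize(f) for f in frames]           # featurization
projection = first_pc_projection(features)          # dimensionality reduction
labels, centroids = kmeans_1d(projection)           # clustering
ensemble = extract_ensemble(projection, labels, centroids)  # ensemble extraction
print("representative frame indices:", ensemble)
```

In this toy setting the two clusters correspond to the "closed" and "open" halves of the synthetic trajectory, and the extracted ensemble contains one representative frame from each. The actual EnGens workflows, described in the sections listed under Contents, operate on real structures and trajectories.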
Contents
See the Installation instructions section for how to install the project and the Workflow Description section for a detailed description of the workflows. Finally, see EnGens Usage for instructions on how to use EnGens.
- Installation instructions
- Workflow Description
- Flow Chart
- Workflows detailed description
- Workflow 1S: Extracting featurized representations from raw data (static use-case)
- Workflow 1D: Extracting featurized representations from raw data (dynamic use-case)
- Workflow 2(S&D): Projecting the featurized representation into an embedding in low-dimensional space (static and dynamic use-case)
- Workflow 3(S&D): Clustering embeddings and extracting the ensemble (static and dynamic use-case)
- Workflow 4(S&D): Visualizing the data and analyzing the ensemble
- EnGens Usage
- Usage
- Static use-case
- Dynamic use-case
- Workflow 1 - extract features from the trajectory
- Workflow 2 - reduce the dimensionality of the featurized trajectory
- Workflow 3 - cluster the trajectory
- Workflow 4 - analyze the results
- Cluster plots
- View multiple structures per cluster
- Trajectory movie
- Additional info per cluster - distance between two residues
- Additional info per cluster - RMSD to a frame
- Additional info per cluster - RMSD to another structure (given a PDB file)
- Additional info per cluster - component from dimensionality reduction
- Usage
Acknowledgments
This work builds on a large amount of community effort, including the following tools:
Structural bioinformatics (and MD) software
Visualization
General ML tools
Others
We would like to thank the authors and maintainers of these tools!