running_alphafold_at_scale

Differences

This shows you the differences between two versions of the page.

running_alphafold_at_scale [2026/03/11 11:04] – [Searching the protein sequence databases] 172.20.54.200running_alphafold_at_scale [2026/03/11 11:51] (current) 172.20.80.5
Line 60: Line 60:
 === Preparing AlphaFold3 input query files === === Preparing AlphaFold3 input query files ===
  
-AlphaFold3 is a little awkward with how it wants it input protein sequences formatted. Instead of a simple FASTA file, it requires a JSON file. Thankfully I found a relatively simple way of converting the FASTA into a JSON file using and editing slightly a python script I found online. It is called ''fasta2alphafoldjson.py'' and it is available on the [[https://github.com/RogerLab/gospel_of_andrew|Gospel of Andrew]] GitHub page. Run it simply as follows:+AlphaFold3 is a little awkward with how it wants its input protein sequences formatted. Instead of a simple FASTA file, it requires a JSON file. Thankfully I found a relatively simple way of converting the FASTA into a JSON file using and editing slightly a python script I found online. It is called ''fasta2alphafoldjson.py'' and it is available on the [[https://github.com/RogerLab/gospel_of_andrew|Gospel of Andrew]] GitHub page. Run it simply as follows:
  
 <code> <code>
Line 79: Line 79:
 === Submitting AlphaFold as an array job === === Submitting AlphaFold as an array job ===
  
-If you want to run AlphaFold on many proteins, it may be pragmatic to submit these sequence databases searches as an **array job** on Perun. An array-job is a single job, that in turn schedules the submission of many other jobs. Here the idea is to submit a single general "AlphaFold sequence search job" which then schedules and submits all the individual search jobs for each query protein separately.+If you want to run AlphaFold on many proteins, it may be pragmatic to submit these sequence database searches as an **array job** on Perun. An array-job is a single job, that in turn schedules the submission of many other jobs. Here the idea is to submit a single general "AlphaFold sequence search job" which then schedules and submits all the individual search jobs for each query protein separately.
  
 <code> <code>
Line 260: Line 260:
 Simply submit the script with ''qsub run_alphafold_gpu.sh''. It will generate under ''$STRUC_OUTPUT_DIR'' an output directory for each query protein. That directory will contain many files and subdirectories, but the main file we're after is called ''pyv62_000500_model.cif''. This is your final predicted structure! Simply submit the script with ''qsub run_alphafold_gpu.sh''. It will generate under ''$STRUC_OUTPUT_DIR'' an output directory for each query protein. That directory will contain many files and subdirectories, but the main file we're after is called ''pyv62_000500_model.cif''. This is your final predicted structure!
  
 +=== Evaluating the final output ===
 +
 +== [ID]_model.cif ==
 +CIF stands for Crystallographic Information File.
 +
 +You can view the structures using software such as PyMOL or ChimeraX.
 +
 +== [ID]_summary_confidences.json ==
 +
 +Contains information regarding the expected overall accuracy of the predicted structure:
 +
 +The **ptm** or predicted Template Modeling score
 +  * Between 0 and 1, with 1 being the perfect score.
 +  * This is a measure of accuracy of the entire structure
 +
 +The **iptm** or interface pTM score.
 +  * Also between 0 and 1. Null for a monomer.
 +  * This is a measure of confidence in all predicted interfaces between subunits in the multimer, i.e. how accurately the subunits are positioned relative to one another.
 +  
 +**fraction_disordered**
 +  * Also between 0 and 1.
 +  * The fraction of the predicted structure that is disordered.
 + 
 +**has_clash**
 +  * True or False
 +  * True if >50% of atoms of a chain "clash"
 +
 +**ranking_score**
 +  * Ranges from -100 to 1.5.
 +  * Calculated as follows: 0.8 * iptm + 0.2 * ptm + 0.5 * fraction_disordered - 100 * has_clash
 +  * This score is then used to rank the multiple structure predictions.
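
To make the ranking formula concrete, here is a minimal Python sketch that recomputes it from the four values described above. The function names ''ranking_score'' and ''load_summary'' are my own (hypothetical, not part of AlphaFold), and the weights assume the 0.8 / 0.2 / 0.5 / 100 form, which is what gives the -100 to 1.5 range. I have not checked how AlphaFold itself treats the null ''iptm'' of a monomer, so the function takes plain numbers and leaves that decision to you.

```python
import json


def ranking_score(iptm: float, ptm: float,
                  fraction_disordered: float, has_clash: bool) -> float:
    """Recompute the ranking score from summary-confidence values.

    Assumed weights: 0.8 * iptm + 0.2 * ptm
                     + 0.5 * fraction_disordered - 100 * has_clash
    """
    # bool -> 1.0 or 0.0, so a clash subtracts a flat 100 from the score
    clash_penalty = 100.0 * float(has_clash)
    return 0.8 * iptm + 0.2 * ptm + 0.5 * fraction_disordered - clash_penalty


def load_summary(path: str) -> dict:
    """Read a [ID]_summary_confidences.json file into a dict."""
    with open(path) as handle:
        return json.load(handle)


if __name__ == "__main__":
    # Illustrative values only, not taken from a real prediction.
    print(ranking_score(iptm=0.7, ptm=0.8,
                        fraction_disordered=0.1, has_clash=False))
```

For a real prediction you would pass in the values returned by ''load_summary'' and compare the result with the ''ranking_score'' stored in the same file.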
 +
 +There are more metrics to discuss, but I don't have time to cover them right now.
running_alphafold_at_scale.1773237879.txt.gz · Last modified: by 172.20.54.200