Differences

This shows you the differences between two versions of the page.

--- running_alphafold_at_scale [2026/03/11 11:04] – [Searching the protein sequence databases] 172.20.54.200
+++ running_alphafold_at_scale [2026/03/11 11:51] (current) – 172.20.80.5
@@ Line 60: / Line 60: @@
 === Preparing AlphaFold3 input query files ===
-AlphaFold3 is a little awkward with how it wants it input protein sequences formatted. Instead of a simple FASTA file, it requires a JSON file. Thankfully I found a relatively simple way of converting the FASTA into a JSON file using and editing slightly a python script I found online. It is called ''fasta2alphafoldjson.py'' and it is available on the [[https://github.com/RogerLab/gospel_of_andrew|Gospel of Andrew]] GitHub page. Run it simply as follows:
+AlphaFold3 is a little awkward with how it wants its input protein sequences formatted. Instead of a simple FASTA file, it requires a JSON file. Thankfully, there is a relatively simple way to convert the FASTA into a JSON file, using a slightly edited version of a Python script I found online. It is called ''fasta2alphafoldjson.py'' and it is available on the [[https://github.com/RogerLab/gospel_of_andrew|Gospel of Andrew]] GitHub page. Run it as follows:
 <code>
@@ Line 79: / Line 79: @@
 === Submitting AlphaFold as an array job ===
-If you want to run AlphaFold on many proteins, it may be pragmatic to submit these sequence databases searches as an **array job** on Perun. An array-job is a single job, that in turn schedules the submission of many other jobs. Here the idea is to submit a single general "AlphaFold sequence search job" which then schedules and submits all the individual search jobs for each query protein separately.
+If you want to run AlphaFold on many proteins, it may be pragmatic to submit these sequence database searches as an **array job** on Perun. An array job is a single job that in turn schedules the submission of many other jobs. Here the idea is to submit a single general "AlphaFold sequence search job", which then schedules and submits the individual search jobs for each query protein separately.
 <code>
@@ Line 260: / Line 260: @@
 Simply submit the script with ''qsub run_alphafold_gpu.sh''. It will generate under ''$STRUC_OUTPUT_DIR'' an output directory for each query protein. That directory will contain many files and subdirectories, but the main file we're after is called ''pyv62_000500_model.cif''. This is your final predicted structure!
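Once the job has finished, it can be handy to check that every query directory actually contains its final structure. The following is only a minimal sketch: the helper name ''find_final_models'' is made up for illustration, and it assumes each query directory directly under ''$STRUC_OUTPUT_DIR'' holds a single top-level file ending in ''_model.cif'', as described above.

```python
import os
from pathlib import Path


def find_final_models(struc_output_dir):
    """Map each query directory name to its top-level *_model.cif file.

    Hypothetical helper: assumes one output directory per query protein,
    with the final structure stored directly inside it. Directories
    without such a file are mapped to None.
    """
    results = {}
    for query_dir in sorted(Path(struc_output_dir).iterdir()):
        if not query_dir.is_dir():
            continue  # skip stray files such as logs
        models = sorted(query_dir.glob("*_model.cif"))
        results[query_dir.name] = models[0] if models else None
    return results


if __name__ == "__main__":
    # Falls back to the current directory if STRUC_OUTPUT_DIR is not set.
    out_dir = os.environ.get("STRUC_OUTPUT_DIR", ".")
    for query, model in find_final_models(out_dir).items():
        print(query, model if model else "MISSING")
```

Any query reported as MISSING is worth checking against the job's log files before resubmitting.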
+=== Evaluating the final output ===
+== [ID]_model.cif ==
+CIF stands for Crystallographic Information File.
+You can view the structures using software such as PyMOL and ChimeraX.
+== [ID]_summary_confidences.json ==
+Contains information regarding the expected overall accuracy of the predicted structure:
+The **ptm** or predicted Template Modeling (pTM) score.
+  * Between 0 and 1, with 1 being the perfect score.
+  * This is a measure of the accuracy of the entire structure.
+The **iptm** or interface pTM (ipTM) score.
+  * Also between 0 and 1; null for a monomer.
+  * This is a measure of confidence in all predicted interfaces between subunits in the multimer, i.e. how accurately the subunits are positioned relative to one another.
+**fraction_disordered**
+  * Also between 0 and 1.
+  * The fraction of the predicted structure that is disordered.
+**has_clash**
+  * True or False
+  * True if >50% of atoms of a chain "clash"
+**ranking_score**
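+  * Ranges from -100 to 1.5. The maximum is reached when ipTM, pTM and disorder are all 1 and there is no clash (0.8 + 0.2 + 0.5 = 1.5); the minimum when they are all 0 and there is a clash.
+  * Calculated as follows: 0.8 * ipTM + 0.2 * pTM + 0.5 * disorder - 100 * has_clash
+  * This score is then used to rank the multiple structure predictions.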
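To make the ranking formula concrete, here is a small sketch that recomputes the score from a ''[ID]_summary_confidences.json'' file and returns it next to the value stored in the file. It uses the weights 0.8, 0.2, 0.5 and 100 from the AlphaFold3 output documentation. The function names are made up for illustration, and it assumes the JSON keys are ''ptm'', ''iptm'', ''fraction_disordered'', ''has_clash'' and ''ranking_score''. How AlphaFold3 itself handles a null ''iptm'' for monomers is not covered here, so the sketch simply returns ''None'' in that case rather than guessing.

```python
import json


def ranking_score(ptm, iptm, fraction_disordered, has_clash):
    """Recompute 0.8*ipTM + 0.2*pTM + 0.5*disorder - 100*has_clash.

    Returns None when iptm is None (monomer), because the treatment of
    that case is an open assumption in this sketch.
    """
    if iptm is None:
        return None
    # has_clash may be stored as a boolean or as 0.0/1.0; float() covers both.
    return (0.8 * iptm + 0.2 * ptm + 0.5 * fraction_disordered
            - 100.0 * float(has_clash))


def check_summary(path):
    """Return (recomputed score, score stored in the file) for one summary JSON."""
    with open(path) as handle:
        summary = json.load(handle)
    recomputed = ranking_score(
        summary["ptm"],
        summary.get("iptm"),
        summary["fraction_disordered"],
        summary["has_clash"],
    )
    return recomputed, summary.get("ranking_score")
```

If the two numbers disagree noticeably for a multimer, that is a hint that one of the assumptions above does not hold for your AlphaFold3 version.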
+There are more metrics to discuss, but I don't have the time right now to cover them.