BUSCO: Benchmarking Universal Single-Copy Orthologs
Documentation by Sarah Shah and Dayana Salas-Leiva
BUSCO assesses the completeness of a genome by searching it for single-copy orthologs. For a genomic search, BUSCO runs tBLASTn, AUGUSTUS, and HMMER 3 against single-copy ortholog sets from the OrthoDB database.
The following are example shell scripts for a genomic and a proteomic search:
Genomic:
#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -pe threaded 1
#$ -cwd
cd $PWD
source activate busco  # BUSCO 4.0.5
export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
busco -i <input_scaffolds_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/protists_ensembl -m geno --cpu 1
Proteomic:
#!/bin/bash
#$ -S /bin/bash
. /etc/profile
#$ -pe threaded 1
#$ -cwd
cd $PWD
source activate busco  # BUSCO 4.0.5
export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
busco -i <predicted_protein_fasta> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/protists_ensembl -m prot --cpu 1
Note: Remove the mitochondrial genome from the input before running this analysis.
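One way to do this is to filter the mitochondrial sequence out of the FASTA file by its header ID. The following is a minimal sketch, assuming the mitochondrial genome is a single record whose ID you already know; the function name drop_fasta_record, the ID contig_mt, and the file names are hypothetical and not part of BUSCO.

```shell
# Print every FASTA record except the one whose ID matches $1.
# The ID is the first word of the header line, without the leading '>'.
# Usage: drop_fasta_record <record_id> <input.fasta>
drop_fasta_record() {
    awk -v id="$1" '
        /^>/ {
            # Header line: decide whether to keep this record.
            name = substr($1, 2)
            keep = (name != id)
        }
        keep { print }
    ' "$2"
}

# Example call (hypothetical contig and file names):
# drop_fasta_record contig_mt assembly.fasta > assembly.nomito.fasta
```

The filtered file can then be passed to busco with -i. If the mitochondrial genome is split over several contigs, the function would need to be run once per ID.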
In the output folder:
- Three folders holding the outputs of tBLASTn, AUGUSTUS, and HMMER, plus a folder containing the single-copy BUSCO sequences. To inspect the amino acid or nucleotide sequences predicted by AUGUSTUS and align them manually (for example, against a transcript contig), use the .faa or .fna files in the directory augustus_output/extracted_proteins
- “short_summary” file showing how many BUSCO genes were found and how many of them were complete, fragmented, etc.
- “full_table.tsv” listing your contigs matched to BUSCO IDs.
- “missing_busco_list” listing the BUSCO genes that were not found in your genome.
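To tally the results without reading the files by hand, the status column of the full table can be counted. This is a rough sketch, assuming the table is tab-separated, its header lines start with #, the BUSCO ID is in column 1, and the status (Complete, Duplicated, Fragmented, or Missing) is in column 2. It also assumes that a duplicated BUSCO appears on one row per copy, so each ID is counted only once. The function names are hypothetical.

```shell
# Count unique BUSCO IDs per status in a full table TSV.
# Usage: busco_status_counts <full_table.tsv>
busco_status_counts() {
    awk -F'\t' '
        /^#/ || NF < 2 { next }        # skip header comments and blank lines
        !seen[$1]++ { count[$2]++ }    # count each BUSCO ID only once
        END { for (s in count) print s "\t" count[s] }
    ' "$1" | sort
}

# Print the IDs of BUSCOs whose status is Missing.
# Usage: busco_missing_ids <full_table.tsv>
busco_missing_ids() {
    awk -F'\t' '!/^#/ && $2 == "Missing" { print $1 }' "$1"
}
```

The counts should agree with the numbers reported in the summary file; a mismatch would suggest the column layout assumed here differs in your BUSCO version.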
