BUSCO: Benchmarking Universal Single-Copy Orthologs

Documentation by Sarah Shah and Dayana Salas-Leiva

Web: http://busco.ezlab.org/

BUSCO estimates the completeness of a genome assembly by searching it for single-copy orthologs. It runs tBLASTn, AUGUSTUS, and HMMER 3 using single-copy ortholog sets from the OrthoDB database.

The following are example shell scripts for a genomic and a proteomic search:

Genomic:

 #!/bin/bash
 #$ -S /bin/bash
 . /etc/profile
 #$ -pe threaded 1
 #$ -cwd
 cd $PWD
 source activate busco
 #BUSCO 4.0.5
 export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
 busco -i <input_scaffolds_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/protists_ensembl -m geno --cpu 1

Proteomic:

 #!/bin/bash
 #$ -S /bin/bash
 . /etc/profile
 #$ -pe threaded 1
 #$ -cwd
 cd $PWD
 source activate busco
 #BUSCO 4.0.5
 export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
 busco -i <predicted_protein_fasta> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/protists_ensembl -m prot --cpu 1

Note: Remove the mitochondrial genome from the input before running this analysis.
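One way to remove the mitochondrial sequences is to filter the FASTA file by contig ID. The following is a minimal sketch, assuming you already know which contig IDs are mitochondrial and have listed them one per line in a text file. The function name remove_contigs and the file names in the usage comment are hypothetical, not part of BUSCO.

```shell
#!/bin/bash
# Hypothetical helper: print a FASTA file without the records whose ID
# (the first word after ">") appears in a list of IDs, one per line.
# Usage: remove_contigs mito_ids.txt scaffolds.fasta > scaffolds.nomito.fasta
remove_contigs() {
    awk '
        # First file: remember every ID that should be dropped.
        FILENAME == ARGV[1] { if ($1 != "") drop[$1] = 1; next }
        # Header line: strip the ">" and decide whether to keep this record.
        /^>/ { id = substr($1, 2); keep = !(id in drop) }
        # Print header and sequence lines of kept records only.
        keep
    ' "$1" "$2"
}
```

The filtered file can then be passed to busco with -i in place of the original scaffolds file.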

In the output folder:

- Three folders for the outputs of tBLASTn, AUGUSTUS, and HMMER, plus a folder containing the single-copy BUSCO sequences. To inspect the amino acid or nucleotide sequences predicted by AUGUSTUS and align them manually (for example, against a transcript contig), take the .faa or .fna files from the directory augustus_output/extracted_proteins.

- “short_summary” file showing how many BUSCO genes were found and how many were complete, fragmented, etc.

- “full_table.tsv” listing your contigs matched to BUSCO IDs.

- “missing_busco_list” listing the BUSCO genes that were not found in your genome.
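To cross-check the summary against the full table, the status of each BUSCO can be tallied directly. This is a rough sketch that assumes the table is tab-separated, that comment lines start with "#", and that the first two columns are the BUSCO ID and its status (Complete, Duplicated, Fragmented, or Missing). Check the header of your own file first, since the layout may differ between BUSCO versions. The function name busco_status_counts is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count BUSCOs per status in a full table file.
# Assumes column 1 = BUSCO ID and column 2 = status, tab-separated.
# Usage: busco_status_counts full_table.tsv
busco_status_counts() {
    awk -F'\t' '
        # Skip header comments and lines without a status column.
        /^#/ || NF < 2 { next }
        # A duplicated BUSCO has one row per copy, so count each ID once per status.
        !seen[$1 SUBSEP $2]++ { n[$2]++ }
        END { for (s in n) print s "\t" n[s] }
    ' "$1" | sort
}
```

The output is one line per status with its count, sorted alphabetically by status name.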