Documentation by Sarah Shah

Web: http://busco.ezlab.org/

You can check the completeness of your genome assembly by searching it for single-copy orthologs. BUSCO runs tBLASTn, AUGUSTUS, and HMMER 3 against sets of single-copy orthologs from the OrthoDB database.

The following is an example of a shell script:

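#!/bin/sh
# Grid Engine directives: run the job under /bin/sh
#$ -S /bin/sh
# Request 10 slots in the "threaded" parallel environment
#$ -pe threaded 10
# Start the job in the directory it was submitted from
#$ -cwd

# Put BLAST+, HMMER, and AUGUSTUS (binaries and scripts) on the PATH
export PATH="/scratch2/software/ncbi-blast-2.6.0+/bin:$PATH"
export PATH="/opt/perun/hmmer-3.1b2-threads/bin:$PATH"
export PATH="/opt/perun/augustus-3.2.3/bin:$PATH"
export PATH="/home/dsalas/augustus-3.2.3/scripts:$PATH"
export AUGUSTUS_CONFIG_PATH="/home/dsalas/augustus-3.2.3/config/"

cd /pathtoyourcurrentdirectory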

python /opt/perun/busco/BUSCO.py -i <yourgenomeinput> -o <outputfoldername> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu <number of CPUs> -sp <species>

Here --cpu is the number of CPUs you'd like to assign, and -sp is the name of the species whose genome model you'd like to use. If -sp is left out, it defaults to “fly”, which has so far proven appropriate for protists.
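As a small sketch of how the CPU count could be kept in step with the slots requested by "#$ -pe threaded 10", the function below prints the BUSCO command using $NSLOTS, the variable Grid Engine sets to the number of granted slots. The function name busco_cmd and the example file names are hypothetical, the paths are the ones used on this page, and input names are assumed to contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print the BUSCO command line for a genome.
# $1 = input genome FASTA, $2 = output folder name,
# $3 = AUGUSTUS species model (optional; "fly" is the default named on this page).
busco_cmd() {
    bc_genome=$1
    bc_out=$2
    bc_species=${3:-fly}
    # NSLOTS is set by Grid Engine inside a job; fall back to 1 outside a job.
    printf 'python /opt/perun/busco/BUSCO.py -i %s -o %s -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu %s -sp %s\n' \
        "$bc_genome" "$bc_out" "${NSLOTS:-1}" "$bc_species"
}

# Example (file names are placeholders):
#   busco_cmd my_genome.fasta my_busco_run
```

Printing the command rather than running it makes it easy to check the flags before adding the line to the job script.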

Note: Take out the mitochondrial genome before running this analysis.
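One possible way to take out the mitochondrial sequences, assuming you already know which contig IDs are mitochondrial and have listed them one per line in a text file, is to filter the FASTA with awk. The function name remove_contigs and the file names in the example are hypothetical; this is a sketch, not part of BUSCO.

```shell
#!/bin/sh
# Hypothetical helper: drop FASTA records whose ID is listed in a file.
# Usage: remove_contigs IDS_FILE < genome.fasta > genome_no_mito.fasta
remove_contigs() {
    awk -v ids="$1" '
        BEGIN {
            # Load the IDs to drop (first word of each non-empty line).
            while ((getline line < ids) > 0) {
                n = split(line, f, " ")
                if (n > 0) drop[f[1]] = 1
            }
        }
        # Header line: the ID is the first word after ">".
        /^>/ { id = substr($1, 2); keep = !(id in drop) }
        # Print header and sequence lines only for records being kept.
        keep
    '
}

# Example (file names are placeholders):
#   remove_contigs mito_ids.txt < my_genome.fasta > my_genome_no_mito.fasta
```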

The output folder contains:

- Three folders for the outputs of tBLASTn, AUGUSTUS, and HMMER, plus a folder containing the single-copy BUSCO sequences. If you want to look into the details of the amino acid or nucleotide sequences predicted by AUGUSTUS and align them manually (for example, against a transcript contig), pick out the .faa or .fna files from the directory augustus_output/extracted_proteins.

- A “short summary” file showing how many BUSCO genes were found and how many were complete, fragmented, etc.

- A “full table” .tsv file listing your contigs matched to BUSCO IDs.

- A “missing busco list” file listing the BUSCO genes that were not found in your genome.
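To get a quick per-status tally from the full table, something like the following awk sketch may help. It assumes the table is tab-separated, that comment lines start with "#", and that the first two columns are the BUSCO ID and its status (Complete, Duplicated, Fragmented, or Missing); check these assumptions against your own file. The function name busco_status_counts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count BUSCO IDs per status in a full table file.
# Usage: busco_status_counts FULL_TABLE_FILE
busco_status_counts() {
    awk -F '\t' '
        # Skip comment lines and lines without a status column.
        /^#/ || NF < 2 { next }
        # A duplicated BUSCO appears on several rows; count each ID once.
        !seen[$1]++ { count[$2]++ }
        END {
            n = split("Complete Duplicated Fragmented Missing", order, " ")
            for (i = 1; i <= n; i++)
                printf "%s\t%d\n", order[i], count[order[i]]
        }
    ' "$1"
}

# Example (path is a placeholder):
#   busco_status_counts path/to/full_table_file.tsv
```

The counts should agree with the numbers in the short summary file, which makes this a convenient sanity check.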