
BUSCO: Benchmarking Universal Single-Copy Orthologs

Documentation by Dayana Salas-Leiva (last update by Dandan Zhao 12-10-2024)

Web: http://busco.ezlab.org/

User Guide: https://busco.ezlab.org/busco_userguide.html

BUSCO assesses the completeness of a genome assembly (or a transcriptome or proteome) by searching it for near-universal single-copy orthologs from the OrthoDB database. Newer versions of BUSCO (v5 and later) use MetaEuk as the default gene predictor. Depending on the mode, BUSCO can also run other tools such as tBLASTn, AUGUSTUS, Prodigal, and HMMER3:

tBLASTn for eukaryotic genome and prokaryotic transcriptome modes

AUGUSTUS for eukaryotic genome mode (in BUSCO 5, only when --augustus is specified)

MetaEuk for eukaryotic genome and eukaryotic transcriptome modes

Prodigal for prokaryotic genome mode

HMMER3 for all modes

There are two main versions of the BUSCO lineage datasets: odb9 (eukaryota_odb9 contains 303 BUSCOs; compatible only with BUSCO 3) and odb10 (eukaryota_odb10 contains 255 BUSCOs; required by BUSCO 4 and BUSCO 5).
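Because each BUSCO version accepts only one dataset generation, a quick check before submitting a job can catch a mismatch early. The sketch below is a hypothetical helper (check_lineage is not part of BUSCO). It assumes the lineage directory name ends in _odb9 or _odb10, as in the paths on this page, and it encodes only the version pairings stated above.

```bash
#!/bin/bash
# Hypothetical pre-flight check (not part of BUSCO): does the lineage dataset
# generation match the BUSCO major version? Encodes the pairings above:
# BUSCO 3 <-> odb9, BUSCO 4 and 5 <-> odb10.
check_lineage() (
    version=$1                      # BUSCO major version: 3, 4 or 5
    name=$(basename "${2%/}")       # e.g. eukaryota_odb10 (trailing slash removed)
    case "$version:$name" in
        3:*_odb9 | 4:*_odb10 | 5:*_odb10) exit 0 ;;
    esac
    echo "BUSCO $version cannot use lineage dataset $name" >&2
    exit 1
)

# Examples:
# check_lineage 5 /scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/   # ok
# check_lineage 4 /home/dsalas/Shared/BUSCO/eukaryota_odb9                                 # fails
```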

The following are example shell scripts for genomic and proteomic searches with each BUSCO version (the #$ lines are Sun Grid Engine job directives):

* BUSCO 3 *

Genomic:

 #!/bin/bash
 #$ -S /bin/bash
 . /etc/profile
 #$ -pe threaded 1
 #$ -cwd
 cd $PWD
 #busco3.0.0
 source activate busco-3
 export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
 run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu 1
 conda deactivate

Proteomic:

 #!/bin/bash
 #$ -S /bin/bash
 . /etc/profile
 #$ -pe threaded 1
 #$ -cwd
 cd $PWD
 #busco3.0.0
 source activate busco-3
 export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
 run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m prot --cpu 1
 conda deactivate
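The scripts above request one slot (-pe threaded 1) and pass --cpu 1 separately, so the two numbers can drift apart when only one of them is edited. A minimal sketch, assuming an SGE job in which the scheduler sets NSLOTS to the number of granted slots, takes the thread count from the scheduler instead. busco_cpus is a hypothetical helper, and the BUSCO command is left commented out because it still contains placeholders.

```bash
#!/bin/bash
#$ -S /bin/bash
#$ -pe threaded 4
#$ -cwd
# Sketch: derive --cpu from the slots SGE granted (NSLOTS) so it always
# matches the -pe request; outside an SGE job this falls back to 1.
busco_cpus() {
    echo "${NSLOTS:-1}"
}

CPUS=$(busco_cpus)
echo "BUSCO will use $CPUS thread(s)"
# run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu "$CPUS"
```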

* BUSCO 4.0.5 *

Genomic:

 #!/bin/bash
 #$ -S /bin/bash
 . /etc/profile
 #$ -pe threaded 1
 #$ -cwd
 cd $PWD
 source activate busco
 #BUSCO 4.0.5
 export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
 # BUSCO 4 requires an odb10 lineage dataset; odb9 will not work
 busco -i <input_scaffolds_file> -o <output_dir_name> -l /scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/ -m geno --cpu 1

Proteomic:

 #!/bin/bash
 #$ -S /bin/bash
 . /etc/profile
 #$ -pe threaded 1
 #$ -cwd
 cd $PWD
 source activate busco
 #BUSCO 4.0.5
 export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
 # BUSCO 4 requires an odb10 lineage dataset; odb9 will not work
 busco -i <predicted_protein_fasta> -o <output_dir_name> -l /scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/ -m prot --cpu 1

* BUSCO 5.2.2 *

Genomic (default gene predictor: MetaEuk):

  source activate busco-5
  INPUT='contigs_clean.fasta'
  OUTDIR='busco5_out'
  MODE='genome'
  # setting the lineage db
  ## as of writing, the latest busco db for eukaryota is odb10
  LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/'
  ## busco v5 only works with odb10
  ## it will not work with odb9
  # run busco
  ## do not specify output dir with a trailing slash, it will lead to a fatal error
  ## modes are genome, proteins, transcriptome
  ## the below command will use Metaeuk as gene predictor
  busco \
      --in "$INPUT" \
      --out "$OUTDIR" \
      --mode "$MODE" \
      --lineage_dataset "$LINEAGEDB" \
      --cpu 8
  conda deactivate
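The same command can be wrapped in a small function that also covers the other modes, for example proteins mode for a predicted-protein FASTA, run inside the busco-5 environment. This is a hypothetical sketch: run_busco5 and the BUSCO_BIN override are made up for illustration. It strips one trailing slash from the output name, because that causes a fatal error, and it accepts only the mode names listed in the comments above.

```bash
#!/bin/bash
# Hypothetical wrapper around the BUSCO 5 command above (not part of BUSCO).
# BUSCO_BIN is a made-up override: it defaults to "busco"; set BUSCO_BIN=echo
# to print the arguments instead of running BUSCO.
run_busco5() (
    input=$1
    outdir=${2%/}        # strip a trailing slash: BUSCO 5 fails fatally on one
    mode=$3
    lineage=$4
    cpus=${5:-1}
    # only the mode names listed in the comments above are accepted here
    case "$mode" in
        genome|proteins|transcriptome) ;;
        *) echo "unsupported mode: $mode" >&2; exit 2 ;;
    esac
    "${BUSCO_BIN:-busco}" \
        --in "$input" \
        --out "$outdir" \
        --mode "$mode" \
        --lineage_dataset "$lineage" \
        --cpu "$cpus"
)

# Example (proteins mode; the input file name is a placeholder):
# run_busco5 predicted_proteins.faa busco5_prot_out proteins \
#     /scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/ 8
```

Running it once with BUSCO_BIN=echo is a cheap way to check the assembled command line before submitting the real job.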

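Genomic, with AUGUSTUS as the gene predictor instead of MetaEuk (this is still genome mode; for predicted proteins use --mode proteins):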

  source activate busco-5
  # in the busco-5 environment, AUGUSTUS_CONFIG_PATH is set to
  # /scratch2/software/anaconda/envs/busco-5/config/
  # but we don't have write permissions there, and BUSCO needs to
  # write into the AUGUSTUS config directory when running AUGUSTUS,
  # so the run fails.
  # We copied that dir to a place where we do have write permissions;
  # you may want to copy it to your own home
  export AUGUSTUS_CONFIG_PATH="$HOME/busco/config/"
  INPUT='contigs_clean.fasta'
  OUTDIR='busco5_contigs_clean_out'
  MODE='genome'
  AUGUSTUS_SPECIES='leishmania_tarentolae'
  # setting the lineage db
  ## the latest version (as of writing) is odb10
  LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/'
  ## busco v5 only works with odb10
  ## it will not work with odb9
  busco \
      --in "$INPUT" \
      --out "$OUTDIR" \
      --mode "$MODE" \
      --lineage_dataset "$LINEAGEDB" \
      --cpu 8 \
      --augustus \
      --augustus_species "$AUGUSTUS_SPECIES"
  conda deactivate
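Copying the AUGUSTUS config directory only needs to happen once. A minimal sketch, assuming the paths used above (prepare_augustus_config is a hypothetical helper), makes the copy if it is missing, makes it user-writable, and prints the path so it can be exported:

```bash
#!/bin/bash
# Hypothetical helper (not part of BUSCO): make a private, writable copy of the
# AUGUSTUS config directory if it does not exist yet, then print its path.
prepare_augustus_config() (
    src=${1%/}
    dest=${2%/}
    if [ ! -d "$dest" ]; then
        mkdir -p "$(dirname "$dest")" || exit 1
        cp -r "$src" "$dest" || exit 1
        chmod -R u+w "$dest" || exit 1   # copies of read-only files stay read-only otherwise
    fi
    if [ ! -w "$dest" ]; then
        echo "not writable: $dest" >&2
        exit 1
    fi
    echo "$dest/"
)

# Usage inside the busco-5 job script, before calling busco:
# AUGUSTUS_CONFIG_PATH=$(prepare_augustus_config \
#     /scratch2/software/anaconda/envs/busco-5/config "$HOME/busco/config") &&
#     export AUGUSTUS_CONFIG_PATH
```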

Note: Remove the mitochondrial genome (mitochondrial contigs) from the assembly before running this analysis.
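One way to do this assumes you have already identified the mitochondrial contigs and listed their FASTA IDs one per line in a text file (the first word of each header, without the >). The hypothetical remove_contigs helper below then writes every other record to standard output. The file names in the usage line are placeholders.

```bash
#!/bin/bash
# Hypothetical helper: drop FASTA records whose ID (first word of the header,
# without ">") is listed in an ID file, one ID per line. Output goes to stdout.
remove_contigs() {
    awk -v ids="$1" '
        BEGIN {
            while ((getline line < ids) > 0)
                if (split(line, f) > 0) drop[f[1]] = 1
            close(ids)
        }
        /^>/ { keep = !(substr($1, 2) in drop) }
        keep
    ' "$2"
}

# remove_contigs mito_ids.txt contigs.fasta > contigs_nomito.fasta
```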

In the output folder (this layout is from BUSCO 3; BUSCO 4 and 5 place these files inside a run_<lineage> subfolder, e.g. run_eukaryota_odb10, and with the default MetaEuk predictor there is a metaeuk_output folder instead of the tBLASTn and AUGUSTUS folders):

- Three folders for the outputs of tBLASTn, AUGUSTUS, and HMMER, and a folder containing the single-copy BUSCO sequences. To inspect the amino acid or nucleotide sequences predicted by AUGUSTUS and align them manually (for example, against a transcript contig), take the .faa or .fna files from augustus_output/extracted_proteins.

- A "short_summary" file showing how many BUSCO genes were found and how many were complete (single-copy or duplicated), fragmented, or missing.

- A "full_table" .tsv file listing the BUSCO IDs and the contigs they matched.

- A "missing_busco_list" file listing the BUSCO genes that were not found in your genome.
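Per-status counts can also be taken straight from the full table, for example to compare several assemblies side by side. The minimal awk sketch below assumes that column 2 holds the status (Complete, Duplicated, Fragmented or Missing) and that header lines start with #, which matches the BUSCO 3 to 5 full tables. count_busco_status is a hypothetical helper. Duplicated BUSCOs appear once per copy, so each ID is counted once per status.

```bash
#!/bin/bash
# Hypothetical helper: count BUSCO IDs per status in a full_table .tsv file.
# Each (ID, status) pair is counted once, so duplicated BUSCOs are not
# counted once per copy.
count_busco_status() {
    awk -F '\t' '
        /^#/ { next }
        !seen[$1 FS $2]++ { n[$2]++ }
        END {
            printf "Complete\t%d\nDuplicated\t%d\nFragmented\t%d\nMissing\t%d\n",
                   n["Complete"], n["Duplicated"], n["Fragmented"], n["Missing"]
        }' "$1"
}

# count_busco_status run_eukaryota_odb10/full_table.tsv
```

In the short summary, S corresponds to Complete and D to Duplicated here, so C (complete) is their sum.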