benchmarking_universal_single-copy_orthologs_busco

Differences

This shows you the differences between two versions of the page.

benchmarking_universal_single-copy_orthologs_busco [2020/04/06 11:52] 24.138.68.92
benchmarking_universal_single-copy_orthologs_busco [2024/12/10 14:49] (current) 129.173.94.151
Line 1: Line 1:
 **BUSCO: Benchmarking Universal Single-Copy Orthologs**
  
-Documentation by Sarah Shah and Dayana Salas-Leiva
+Documentation by Dayana Salas-Leiva (last update by Dandan Zhao 12-10-2024)
  
-Web:http://busco.ezlab.org/
+Web: http://busco.ezlab.org/ User Guide: https://busco.ezlab.org/busco_userguide.html
  
-You can check the completeness of your genome by picking out single-copy orthologs. **BUSCO** runs **tBLASTn**, **AUGUSTUS**, and **HMMER 3** based on single-copy orthologs from the **OrthoDB** database.
+You can check the completeness of your genome by identifying single-copy orthologs from the OrthoDB database. Newer versions of BUSCO use Metaeuk as the default gene predictor and can also be run with other tools such as tBLASTn, AUGUSTUS, Prodigal, and HMMER3.
 + 
 +tBLASTn for eukaryotic genome and prokaryotic transcriptome modes 
 + 
 +Augustus for eukaryotic genome mode 
 + 
 +Metaeuk for eukaryotic genome and eukaryotic transcriptome modes 
 + 
 +Prodigal for prokaryotic genome mode 
 + 
 +HMMER3 for all modes 
 + 
 +There are two main dataset versions for BUSCO: odb9 (the eukaryota set contains 303 orthologs; only compatible with BUSCO 3) and odb10 (the eukaryota set contains 255 orthologs; compatible with BUSCO 4 and 5).
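
The pairing between dataset release and BUSCO version can be checked before submitting a job. The following is a minimal sketch, assuming the lineage directory name ends in `_odb9` or `_odb10` as in the paths used on this page. The function name `lineage_matches_busco` is hypothetical and not part of BUSCO, and the pairing it encodes (odb9 with BUSCO 3, odb10 with BUSCO 4 and 5) is only what this page states.

```shell
#!/bin/bash
# Hypothetical helper (not part of BUSCO): returns 0 when the lineage
# directory name carries the OrthoDB release that this page pairs with
# the given BUSCO major version, and 1 otherwise.
# usage: lineage_matches_busco <lineage_dir> <busco_major_version>
lineage_matches_busco() {
    local lineage="${1%/}"   # drop one trailing slash, if present
    local major="$2"
    # ${lineage##*_} keeps the text after the last underscore, e.g. "odb10"
    case "${major}:${lineage##*_}" in
        3:odb9|4:odb10|5:odb10) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `lineage_matches_busco /home/dsalas/Shared/BUSCO/eukaryota_odb9 3 || echo "lineage and BUSCO version do not match"` prints nothing, because the pairing is the one listed above.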
  
 The following are example shell scripts for a genomic and a proteomic search:
 +
 +***  BUSCO 3 ***
 +
 +** Genomic: **
 +
 +   #!/bin/bash
 +   #$ -S /bin/bash
 +   . /etc/profile
 +   #$ -pe threaded 1
 +   #$ -cwd
 +   cd $PWD
 +   #busco3.0.0
 +   source activate busco-3
 +   export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
 +   run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu 1
 +   conda deactivate
 +
 +** Proteomic: **
 +
 +   #!/bin/bash
 +   #$ -S /bin/bash
 +   . /etc/profile
 +   #$ -pe threaded 1
 +   #$ -cwd
 +   cd $PWD
 +   #busco3.0.0
 +   source activate busco-3
 +   export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
 +   run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m prot --cpu 1
 +   conda deactivate
 +
 +
 +** * BUSCO 4.0.5 * **
  
 ** Genomic: **
Line 19: Line 64:
    #BUSCO 4.0.5
    export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
-   busco -i <input_scaffolds_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/protists_ensembl -m geno --cpu 1
+   busco -i <input_scaffolds_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu 1
  
  
Line 33: Line 78:
    #BUSCO 4.0.5
    export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
-   busco -i <predicted protein fasta> -o <output_dir-name> -l /home/dsalas/Shared/BUSCO/protists_ensembl -m prot --cpu 1
+   busco -i <predicted protein fasta> -o <output_dir-name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m prot --cpu 1
 + 
 + 
 +** * BUSCO 5.2.2 * **  
 + 
 +** Genomic: default metaeuk**  
 + 
 +    source activate busco-5 
 +    INPUT='contigs_clean.fasta' 
 +    OUTDIR='busco5_out' 
 +    MODE='genome' 
 +    # setting the lineage db 
 +    ## the latest busco db for eukaryota is odb10 
 +    LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/' 
 +    ## busco v5 only works with odb10 
 +    ## it will not work with odb9 
 +    # run busco 
 +    ## do not specify output dir with a trailing slash, it will lead to a fatal error 
 +    ## modes are genome, proteins, transcriptome 
 +    ## the below command will use Metaeuk as gene predictor 
 +    busco \ 
 +        --in $INPUT \ 
 +        --out $OUTDIR \ 
 +        --mode $MODE \ 
 +        --lineage_dataset $LINEAGEDB \ 
 +        --cpu 8 
 +    conda deactivate
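
The script above warns that an output name with a trailing slash causes a fatal error. One way to guard against that is to normalize the name before passing it to `--out`. This is a minimal sketch; `normalize_outdir` is a hypothetical helper name, not a BUSCO feature.

```shell
#!/bin/bash
# Hypothetical helper: print the given output name with every trailing
# slash removed, so that "busco5_out/" becomes "busco5_out".
normalize_outdir() {
    local out="$1"
    # ${out%/} removes a single trailing slash; loop until none is left
    while [ "${out%/}" != "$out" ]; do
        out="${out%/}"
    done
    printf '%s\n' "$out"
}
```

In the job script this could be used as `OUTDIR="$(normalize_outdir "$OUTDIR")"` just before the `busco` call.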
  
 +** Genomic: Augustus**
  
 +    source activate busco-5
 +    # in the busco-5 environment, AUGUSTUS_CONFIG_PATH is set to
 +    # /scratch2/software/anaconda/envs/busco-5/config/
 +    # but we don't have write permission there, and the run fails without it
 +    # (Augustus retraining writes species parameters into this directory),
 +    # so we copied that dir to a place where we do have write permission;
 +    # you may want to copy it to your own home
 +    export AUGUSTUS_CONFIG_PATH="$HOME/busco/config/"
 +    INPUT='contigs_clean.fasta'
 +    OUTDIR='busco5_contigs_clean_out'
 +    MODE='genome'
 +    AUGUSTUS_SPECIES='leishmania_tarentolae'
 +    # setting the lineage db
 +    ## the latest version (as of writing) is odb10
 +    LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/'
 +    ## busco v5 only works with odb10
 +    ## it will not work with odb9
 +    busco \
 +        --in $INPUT \
 +        --out $OUTDIR \
 +        --mode $MODE \
 +        --lineage_dataset $LINEAGEDB \
 +        --cpu 8 \
 +        --augustus \
 +        --augustus_species $AUGUSTUS_SPECIES
 +    conda deactivate
 +    
  
 **Note**: Take out the mitochondrial genome before running this analysis.
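
Removing the mitochondrial sequences can be done with a small filter, assuming you have already identified the mitochondrial contigs and listed their IDs one per line in a text file. The sketch below is one possible approach using `awk`; the function name `drop_fasta_records` and the file names in the usage line are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print every FASTA record whose ID (the header text
# up to the first whitespace, without the leading '>') is NOT listed in
# the ID file.
# usage: drop_fasta_records <ids_to_drop.txt> <input.fasta> > <output.fasta>
drop_fasta_records() {
    awk '
        # first file: collect the IDs to remove (blank lines are ignored)
        FILENAME == ARGV[1] { if ($1 != "") drop[$1] = 1; next }
        # header line: decide whether this record is kept
        /^>/ { id = substr($1, 2); keep = !(id in drop) }
        # print header and sequence lines of kept records
        keep
    ' "$1" "$2"
}
```

A call such as `drop_fasta_records mito_ids.txt contigs.fasta > contigs_clean.fasta` would then produce the input for the BUSCO run. The ID match is exact, so the IDs in the list must be spelled as they appear in the FASTA headers.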