benchmarking_universal_single-copy_orthologs_busco

Differences

This shows you the differences between two versions of the page.

benchmarking_universal_single-copy_orthologs_busco [2024/12/10 13:59] 129.173.94.151benchmarking_universal_single-copy_orthologs_busco [2024/12/10 14:49] (current) 129.173.94.151
Line 3: Line 3:
 Documentation by Dayana Salas-Leiva (last update by Dandan Zhao 12-10-2024) Documentation by Dayana Salas-Leiva (last update by Dandan Zhao 12-10-2024)
  
-Web: http://busco.ezlab.org/ +Web: http://busco.ezlab.org/ User Guide: https://busco.ezlab.org/busco_userguide.html
-User Guide: https://busco.ezlab.org/busco_userguide.html+
  
-You can check the completeness of your genome by identifying single-copy orthologs from the OrthoDB database. Newer versions of BUSCO utilize **Metaeuk** as a default gene predictor, can also be run with other tools like **tBLASTn****AUGUSTUS****Prodigal**, and **HMMER3**.+You can check the completeness of your genome by identifying single-copy orthologs from the OrthoDB database. Newer versions of BUSCO use Metaeuk as the default gene predictor and can also be run with other tools such as tBLASTn, AUGUSTUS, Prodigal, and HMMER3.
  
-**tBLASTn** for eukaryotic genome and prokaryotic transcriptome modes +tBLASTn for eukaryotic genome and prokaryotic transcriptome modes
-**Augustus** for eukaryotic genome mode +
-**Metaeuk** for eukaryotic genome and eukaryotic transcriptome modes +
-**Prodigal** for prokaryotic genome mode +
-**HMMER3** for all modes+
  
-There are two main dataset versions of busco: **odb9** (contains 303 orthologs only compatible with Busco3) and **odb10** (contains 255 orthologous only compatible with Busco4)+Augustus for eukaryotic genome mode
  
-The following are examples of a shell script for genomic and proteomic search:+Metaeuk for eukaryotic genome and eukaryotic transcriptome modes 
 + 
 +Prodigal for prokaryotic genome mode 
 + 
 +HMMER3 for all modes 
 + 
 +There are two main BUSCO dataset versions: odb9 (the eukaryota set contains 303 orthologs; compatible only with BUSCO 3) and odb10 (the eukaryota set contains 255 orthologs; required by BUSCO 4 and 5).
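The pairing between BUSCO version and dataset version can be checked before a job is submitted. The following is a minimal sketch, assuming the lineage directory name ends in `_odb<N>` as in the paths used on this page. `lineage_odb_version` and `check_busco_lineage` are hypothetical helper names, not part of BUSCO.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of BUSCO).

# Print the dataset version ("odb9", "odb10") taken from a lineage path.
lineage_odb_version() {
    local path="${1%/}"        # drop one trailing slash, if any
    local name="${path##*/}"   # keep the last path component
    case "$name" in
        *_odb[0-9]*) printf 'odb%s\n' "${name##*_odb}" ;;
        *) return 1 ;;
    esac
}

# Succeed only if the lineage matches the BUSCO major version,
# using the pairing given on this page: 3 -> odb9, 4 and 5 -> odb10.
check_busco_lineage() {
    local major="$1" odb
    odb="$(lineage_odb_version "$2")" || return 1
    case "$major:$odb" in
        3:odb9|4:odb10|5:odb10) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (assumed): stop the job early when the pair does not match.
# check_busco_lineage 5 "$LINEAGEDB" || { echo "wrong lineage dataset" >&2; exit 1; }
```

Failing early this way avoids spending queue time on a run that BUSCO would reject.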
 + 
 +The following are examples of a shell script for genomic and proteomic search:
  
 ***  BUSCO 3 *** ***  BUSCO 3 ***
Line 29: Line 32:
    cd $PWD    cd $PWD
    #busco3.0.0    #busco3.0.0
-   LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/odb9/lineages/eukaryota_odb9/' 
    source activate busco-3    source activate busco-3
-   export AUGUSTUS_CONFIG_PATH="$HOME/Shared/BUSCO/config" +   export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config" 
-   run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l $LINEAGEDB -m geno --cpu 1+   run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu 1
    conda deactivate    conda deactivate
  
Line 44: Line 46:
    cd $PWD    cd $PWD
    #busco3.0.0    #busco3.0.0
-   LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/odb9/lineages/eukaryota_odb9/' 
    source activate busco-3    source activate busco-3
    export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"    export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config"
-   run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l $LINEAGEDB -m prot --cpu 1+   run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m prot --cpu 1
    conda deactivate    conda deactivate
  
Line 60: Line 61:
    #$ -cwd    #$ -cwd
    cd $PWD    cd $PWD
-   LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/odb9/lineages/eukaryota_odb9/' +   source activate busco
-   source activate busco-4+
    #BUSCO 4.0.5    #BUSCO 4.0.5
-   export AUGUSTUS_CONFIG_PATH="$HOME/Shared/BUSCO/config" +   export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config" 
-   busco -i <input_scaffolds_file> -o <output_dir_name> -l $HOME/BUSCO/eukaryota_odb9 -m geno --cpu 1+   busco -i <input_scaffolds_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu 1
  
  
Line 75: Line 75:
    #$ -cwd    #$ -cwd
    cd $PWD    cd $PWD
-   LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/odb9/lineages/eukaryota_odb9/' +   source activate busco
-   source activate busco-4+
    #BUSCO 4.0.5    #BUSCO 4.0.5
-   export AUGUSTUS_CONFIG_PATH="$HOME/BUSCO/config" +   export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config" 
-   busco -i <predicted protein fasta> -o <output_dir-name> -l $HOME/BUSCO/eukaryota_odb9 -m prot --cpu 1+   busco -i <predicted protein fasta> -o <output_dir-name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m prot --cpu 1
  
  
-** * BUSCO 5.2.2 * **+** * BUSCO 5.2.2 * ** 
  
-** Genomic: default metaeuk**+** Genomic: default metaeuk** 
  
-   source activate busco-5+    source activate busco-5 
 +    INPUT='contigs_clean.fasta' 
 +    OUTDIR='busco5_out' 
 +    MODE='genome' 
 +    # setting the lineage db 
 +    ## the latest busco db for eukaryota is odb10 
 +    LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/' 
 +    ## busco v5 only works with odb10 
 +    ## it will not work with odb9 
 +    # run busco 
 +    ## do not specify output dir with a trailing slash, it will lead to a fatal error 
 +    ## modes are genome, proteins, transcriptome 
 +    ## the below command will use Metaeuk as gene predictor 
 +    busco \ 
 +        --in $INPUT \ 
 +        --out $OUTDIR \ 
 +        --mode $MODE \ 
 +        --lineage_dataset $LINEAGEDB \ 
 +        --cpu 8 
 +    conda deactivate
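The warning about a trailing slash in the output directory name can be enforced in the script itself. A minimal sketch, assuming plain shell parameter expansion is acceptable; `strip_trailing_slash` is a hypothetical helper, not a BUSCO command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BUSCO): normalise an output name
# so that it carries no trailing slash before it is passed to --out.
strip_trailing_slash() {
    local name="$1"
    # Remove every trailing '/' (handles 'out/' as well as 'out//').
    while [ "${name%/}" != "$name" ]; do
        name="${name%/}"
    done
    printf '%s\n' "$name"
}

# Example: tab completion often leaves a trailing slash behind.
OUTDIR="$(strip_trailing_slash 'busco5_out/')"
echo "$OUTDIR"
```

The cleaned `$OUTDIR` can then be passed to `--out` as in the script above.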
  
-   INPUT='contigs_clean.fasta' +** Genomic: Augustus **
-   OUTDIR='busco5_out' +
-   MODE='genome' +
- +
-   # setting the lineage db +
-   ## the latest busco db for eukaryota is odb10 +
-   LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/' +
-   ## busco v5 only works with odb10 +
-   ## it will not work with odb9 +
- +
- +
-   # run busco +
-   ## do not specify output dir with a trailing slash, it will lead to a fatal error +
-   ## modes are genome, proteins, transcriptome +
-   ## the below command will use Metaeuk as gene predictor +
-   busco \ +
-       --in $INPUT \ +
-       --out $OUTDIR \ +
-       --mode $MODE \ +
-       --lineage_dataset $LINEAGEDB \ +
-       --cpu 8 +
- +
-   conda deactivate +
- +
- +
-**Proteomic:** +
- +
-   source activate busco-5 +
- +
-   # in the busco-5 environment, AUGUSTUS_CONFIG_PATH is set to +
-   # /scratch2/software/anaconda/envs/busco-5/config/ +
-   # but we don't have writing permissions there +
-   # not sure why we need writing permissions but it doesnt work anyway +
-   # but we copied that dir to a place where we do have writing permissions: +
-   # you may want to copy it to your own home +
-   export AUGUSTUS_CONFIG_PATH="$HOME/busco/config/" +
- +
-   INPUT='contigs_clean.fasta' +
-   OUTDIR='busco5_contigs_clean_out' +
-   MODE='genome' +
-   AUGUSTUS_SPECIES='leishmania_tarentolae' +
- +
-   # setting the lineage db +
-   ## the latest version (as of writing) is odb10 +
-   LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/' +
-   ## busco v5 only works with odb10 +
-   ## it will not work with odb9 +
- +
-   busco \ +
-       --in $INPUT \ +
-       --out $OUTDIR \ +
-       --mode $MODE \ +
-       --lineage_dataset $LINEAGEDB \ +
-       --cpu 8 \ +
-       --augustus \ +
-       --augustus_species $AUGUSTUS_SPECIES \ +
-   #    --long +
- +
-   conda deactivate+
  
 +    source activate busco-5
 +    # in the busco-5 environment, AUGUSTUS_CONFIG_PATH is set to
 +    # /scratch2/software/anaconda/envs/busco-5/config/
 +    # but we don't have write permissions there
 +    # it is not clear why write permissions are needed, but the run fails without them
 +    # so we copied that directory to a place where we do have write permissions;
 +    # you may want to copy it to your own home
 +    export AUGUSTUS_CONFIG_PATH="$HOME/busco/config/"
 +    INPUT='contigs_clean.fasta'
 +    OUTDIR='busco5_contigs_clean_out'
 +    MODE='genome'
 +    AUGUSTUS_SPECIES='leishmania_tarentolae'
 +    # setting the lineage db
 +    ## the latest version (as of writing) is odb10
 +    LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/'
 +    ## busco v5 only works with odb10
 +    ## it will not work with odb9
 +    busco \
 +        --in $INPUT \
 +        --out $OUTDIR \
 +        --mode $MODE \
 +        --lineage_dataset $LINEAGEDB \
 +        --cpu 8 \
 +        --augustus \
 +        --augustus_species $AUGUSTUS_SPECIES
 +    conda deactivate
 +    
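Copying the Augustus config directory to a writable location, as the script comments suggest, can be sketched as follows. This is only an illustration: `copy_augustus_config` is a hypothetical helper, and the destination is whatever writable directory you choose.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BUSCO or Augustus):
# copy a read-only Augustus config directory to a writable place
# and print the destination so it can be exported.
copy_augustus_config() {
    local src="${1%/}"
    local dest="${2%/}"
    if [ ! -d "$src" ]; then
        echo "source config directory not found: $src" >&2
        return 1
    fi
    # Copy only once; an existing destination is reused as-is.
    if [ ! -d "$dest" ]; then
        mkdir -p "$(dirname "$dest")" || return 1
        cp -r "$src" "$dest" || return 1
    fi
    printf '%s\n' "$dest"
}

# Usage (paths taken from the script comments above):
# export AUGUSTUS_CONFIG_PATH="$(copy_augustus_config \
#     /scratch2/software/anaconda/envs/busco-5/config "$HOME/busco/config")"
```

Because an existing destination is left untouched, any species parameters already written there are preserved between runs.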
  
 **Note**: Take out the mitochondrial genome before running this analysis. **Note**: Take out the mitochondrial genome before running this analysis.
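One way to take a mitochondrial contig out of an assembly is to filter the FASTA file by header ID. A minimal sketch, assuming the mitochondrial contig has already been identified and that its ID is the first word of its header line; `remove_contig`, the ID `mito_contig`, and the input name `contigs.fasta` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a FASTA file without the record whose
# header ID (first word after '>') equals the given ID.
remove_contig() {
    local id="$1"
    local fasta="$2"
    awk -v id="$id" '
        # On each header line, decide whether to keep the record.
        /^>/ { name = substr($1, 2); keep = (name != id) }
        # Print header and sequence lines of kept records only.
        keep { print }
    ' "$fasta"
}

# Usage (assumed file names):
# remove_contig mito_contig contigs.fasta > contigs_clean.fasta
```

The filtered file can then be used as `INPUT` in the BUSCO scripts above.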