benchmarking_universal_single-copy_orthologs_busco

Differences

This shows you the differences between two versions of the page.

benchmarking_universal_single-copy_orthologs_busco [2017/07/18 12:16] cgeb2001
benchmarking_universal_single-copy_orthologs_busco [2024/12/10 14:49] (current) 129.173.94.151
Line 1: Line 1:
 **BUSCO: Benchmarking Universal Single-Copy Orthologs**
  
-Documentation by Sarah Shah
+Documentation by Dayana Salas-Leiva (last update by Dandan Zhao 12-10-2024)
  
-Web:http://busco.ezlab.org/
+Web: http://busco.ezlab.org/ User Guide: https://busco.ezlab.org/busco_userguide.html
  
-You can check the completeness of your genome by picking out single-copy orthologs. **BUSCO** runs **tBLASTn**, **AUGUSTUS**, and **HMMER 3** based on single-copy orthologs from the **OrthoDB** database.
+You can check the completeness of your genome by identifying single-copy orthologs from the OrthoDB database. Newer versions of BUSCO use Metaeuk as the default gene predictor, but BUSCO can also be run with other tools such as tBLASTn, AUGUSTUS, Prodigal, and HMMER3.
  
-The following is an example of a shell script:
+tBLASTn for eukaryotic genome and prokaryotic transcriptome modes
  
-#!/bin/sh
+Augustus for eukaryotic genome mode
  
-#$ -S /bin/sh
+Metaeuk for eukaryotic genome and eukaryotic transcriptome modes
  
-#$ -pe threaded 10
+Prodigal for prokaryotic genome mode
  
-#$ -cwd
+HMMER3 for all modes
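
The tool list above can be read as a lookup from input type to the programs BUSCO will call. The function below is only a sketch of that lookup: `busco_tools_for` is a hypothetical helper, not part of BUSCO, and its domain and mode names are assumptions chosen to mirror the list.

```shell
#!/bin/bash
# Hypothetical helper (not part of BUSCO): print the third-party tools
# involved for a given domain and mode, following the list above.
busco_tools_for() {
    local domain="$1" mode="$2"
    case "${domain}:${mode}" in
        eukaryote:genome)         echo "tBLASTn Augustus Metaeuk HMMER3" ;;
        eukaryote:transcriptome)  echo "Metaeuk HMMER3" ;;
        prokaryote:genome)        echo "Prodigal HMMER3" ;;
        prokaryote:transcriptome) echo "tBLASTn HMMER3" ;;
        *:proteins)               echo "HMMER3" ;;
        *) echo "unknown combination: ${domain} ${mode}" >&2; return 1 ;;
    esac
}

busco_tools_for eukaryote genome
busco_tools_for prokaryote genome
```

For eukaryotic genome mode the list names both predictors; which one actually runs depends on whether the default (Metaeuk) or the Augustus option is selected.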
  
-export PATH="/scratch2/software/ncbi-blast-2.6.0+/bin:$PATH"
+There are two main dataset versions for BUSCO: odb9 (eukaryota_odb9 contains 303 orthologs; compatible only with BUSCO 3) and odb10 (eukaryota_odb10 contains 255 orthologs; compatible with BUSCO 4 and 5)
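
A minimal sketch of the version-to-dataset pairing described on this page, useful as a guard at the top of a job script. `lineage_version_for` is a hypothetical helper name, and the mapping covers only the BUSCO versions mentioned here (3, 4 and 5).

```shell
#!/bin/bash
# Hypothetical helper: map a BUSCO version string to the dataset
# generation this page pairs it with (odb9 for v3, odb10 for v4 and v5).
lineage_version_for() {
    local major="${1%%.*}"   # keep only the part before the first dot
    case "$major" in
        3)   echo "odb9" ;;
        4|5) echo "odb10" ;;
        *)   echo "no dataset mapping for BUSCO version: $1" >&2; return 1 ;;
    esac
}

lineage_version_for 3.0.0
lineage_version_for 5.2.2
```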
  
-export PATH="/opt/perun/hmmer-3.1b2-threads/bin:$PATH"
+The following are example shell scripts for genomic and proteomic searches:
  
-export PATH="/opt/perun/augustus-3.2.3/bin:$PATH"
+** * BUSCO 3.0.0 * **
  
-export PATH="/home/dsalas/augustus-3.2.3/scripts:$PATH"
+** Genomic: **
  
-export AUGUSTUS_CONFIG_PATH="/home/dsalas/augustus-3.2.3/config/"
 +   #!/bin/bash
 +   #$ -S /bin/bash 
 +   . /etc/profile
 +   #$ -pe threaded 1 
 +   #$ -cwd 
 +   cd $PWD 
 +   #busco3.0.0 
 +   source activate busco-3 
 +   export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config" 
 +   run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu 1 
 +   conda deactivate
  
-cd /pathtoyourcurrentdirectory
+** Proteomic: **
  
-python /opt/perun/busco/BUSCO.py -i <yourgenomeinput> -o <outputfoldername> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu <number of cpus you'd like to assign> -sp <name of species of the genome model you'd like to use. If this flag is left out, it defaults to "fly", which has so far shown to be appropriate for protists>
 +   #!/bin/bash
 +   #$ -S /bin/bash 
 +   . /etc/profile
 +   #$ -pe threaded 1 
 +   #$ -cwd 
 +   cd $PWD 
 +   #busco3.0.0 
 +   source activate busco-3 
 +   export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config" 
 +   run_BUSCO.py -i <fasta_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m prot --cpu 1 
 +   conda deactivate 
 + 
 + 
 +** * BUSCO 4.0.5 * ** 
 + 
 +** Genomic: ** 
 +   #!/bin/bash 
 +   #$ -S /bin/bash 
 +   . /etc/profile
 +   #$ -pe threaded 1 
 +   #$ -cwd 
 +   cd $PWD 
 +   source activate busco 
 +   #BUSCO 4.0.5 
 +   export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config" 
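 +   # per the dataset note above, odb9 works only with BUSCO 3; BUSCO 4 likely needs an odb10 lineage directory here
 +   busco -i <input_scaffolds_file> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m geno --cpu 1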
 + 
 + 
 +**Proteomic:** 
 + 
 +   #!/bin/bash 
 +   #$ -S /bin/bash 
 +   . /etc/profile 
 +   #$ -pe threaded 1 
 +   #$ -cwd 
 +   cd $PWD 
 +   source activate busco 
 +   #BUSCO 4.0.5 
 +   export AUGUSTUS_CONFIG_PATH="/home/dsalas/Shared/BUSCO/config" 
 +   busco -i <predicted protein fasta> -o <output_dir_name> -l /home/dsalas/Shared/BUSCO/eukaryota_odb9 -m prot --cpu 1
 + 
 + 
 +** * BUSCO 5.2.2 * **  
 + 
 +** Genomic: default metaeuk**  
 + 
 +    source activate busco-5 
 +    INPUT='contigs_clean.fasta' 
 +    OUTDIR='busco5_out' 
 +    MODE='genome' 
 +    # setting the lineage db 
 +    ## the latest busco db for eukaryota is odb10 
 +    LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/' 
 +    ## busco v5 only works with odb10 
 +    ## it will not work with odb9 
 +    # run busco 
 +    ## do not specify output dir with a trailing slash, it will lead to a fatal error 
 +    ## modes are genome, proteins, transcriptome 
 +    ## the below command will use Metaeuk as gene predictor 
 +    busco \ 
 +        --in $INPUT \ 
 +        --out $OUTDIR \ 
 +        --mode $MODE \ 
 +        --lineage_dataset $LINEAGEDB \ 
 +        --cpu 8 
 +    conda deactivate 
 + 
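
The comment in the script above warns that an output directory name with a trailing slash leads to a fatal error. One way to guard against this is to normalize the name before calling busco. The sketch below assumes a hypothetical helper, `normalize_outdir`, which is not part of BUSCO.

```shell
#!/bin/bash
# Hypothetical helper: strip every trailing slash from an output name so it
# can be passed safely to --out. Note that an input of "/" becomes empty.
normalize_outdir() {
    local dir="$1"
    while [ "${dir%/}" != "$dir" ]; do
        dir="${dir%/}"
    done
    printf '%s\n' "$dir"
}

OUTDIR="$(normalize_outdir 'busco5_out/')"
echo "$OUTDIR"
```

Tab completion in the shell is the usual source of the trailing slash, so this is mostly useful when the output name comes from a command-line argument.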
 +** Genomic: with Augustus **
 + 
 +    source activate busco-5 
 +    # in the busco-5 environment, AUGUSTUS_CONFIG_PATH is set to
 +    # /scratch2/software/anaconda/envs/busco-5/config/
 +    # but we don't have write permission there, and BUSCO fails without it
 +    # (its Augustus retraining step writes species parameters into this directory),
 +    # so we copied that directory to a place where we do have write permission;
 +    # you may want to copy it to your own home
 +    export AUGUSTUS_CONFIG_PATH="$HOME/busco/config/" 
 +    INPUT='contigs_clean.fasta' 
 +    OUTDIR='busco5_contigs_clean_out' 
 +    MODE='genome' 
 +    AUGUSTUS_SPECIES='leishmania_tarentolae' 
 +    # setting the lineage db 
 +    ## the latest version (as of writing) is odb10 
 +    LINEAGEDB='/scratch5/db/Eukfinder/BUSCO/busco_downloads/lineages/eukaryota_odb10/' 
 +    ## busco v5 only works with odb10 
 +    ## it will not work with odb9 
 +    busco \ 
 +        --in $INPUT \ 
 +        --out $OUTDIR \ 
 +        --mode $MODE \ 
 +        --lineage_dataset $LINEAGEDB \ 
 +        --cpu 8 \ 
 +        --augustus \ 
 +        --augustus_species $AUGUSTUS_SPECIES
 +    conda deactivate 
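
After a run finishes, BUSCO reports completeness on a one-line summary of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`. The sketch below pulls the complete (C) percentage out of such a line. `busco_complete_pct` is a hypothetical helper, and the sample line is made up for illustration, not taken from a real run.

```shell
#!/bin/bash
# Hypothetical helper: read BUSCO summary text on stdin and print the
# percentage that follows "C:" on the first line matching the summary pattern.
busco_complete_pct() {
    sed -n 's/.*C:\([0-9.]*\)%\[.*/\1/p' | head -n 1
}

# Made-up example line in the BUSCO summary notation
printf '\tC:89.0%%[S:85.8%%,D:3.2%%],F:6.9%%,M:4.1%%,n:255\n' | busco_complete_pct
```

In practice the input would be the short summary text file inside the run's output directory. Its exact name depends on the BUSCO version and lineage, so it is left out here.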
 +    
  
 **Note**: Take out the mitochondrial genome before running this analysis.
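
One way to take out the mitochondrial genome, assuming you already know which contig IDs are mitochondrial, is to filter them out of the FASTA file with awk. `drop_contigs` is a hypothetical helper, and the contig names in the demonstration are invented.

```shell
#!/bin/bash
# Hypothetical helper: print a FASTA file without the records whose ID
# (the first word after '>') is listed after the file name.
drop_contigs() {
    local fasta="$1"; shift
    awk -v ids="$*" '
        BEGIN { n = split(ids, a, " "); for (i = 1; i <= n; i++) drop[a[i]] = 1 }
        /^>/  { id = substr($1, 2); skip = (id in drop) }
        !skip { print }
    ' "$fasta"
}

# Small demonstration on an invented three-record file
tmp="$(mktemp)"
printf '>contig_1\nACGT\n>mito_contig some description\nTTTT\nGGGG\n>contig_2\nCCCC\n' > "$tmp"
drop_contigs "$tmp" mito_contig
rm -f "$tmp"
```

Redirect the output to a new file and use that file as the BUSCO input.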
benchmarking_universal_single-copy_orthologs_busco.1500390980.txt.gz · Last modified: by cgeb2001