benchmarking_universal_single-copy_orthologs_busco
Differences
This shows you the differences between two versions of the page.
| benchmarking_universal_single-copy_orthologs_busco [2017/07/18 12:16] – cgeb2001 | benchmarking_universal_single-copy_orthologs_busco [2024/12/10 14:49] (current) – 129.173.94.151 | ||
|---|---|---|---|
| Line 1: | Line 1: | ||
| **BUSCO: Benchmarking Universal Single-Copy Orthologs** | **BUSCO: Benchmarking Universal Single-Copy Orthologs** | ||
| - | Documentation by Sarah Shah | + | Documentation by Dayana Salas-Leiva (last update by Dandan Zhao 12-10-2024) |
| - | Web: | + | Web: http:// |
| - | You can check the completeness of your genome by picking out single-copy orthologs. | + | You can check the completeness of your genome by identifying |
| - | The following is an example of a shell script: | + | tBLASTn for eukaryotic genome and prokaryotic transcriptome modes |
| - | #!/bin/sh | + | Augustus for eukaryotic genome mode |
| - | #$ -S /bin/sh | + | Metaeuk for eukaryotic genome and eukaryotic transcriptome modes |
| - | #$ -pe threaded 10 | + | Prodigal for prokaryotic genome mode |
| - | #$ -cwd | + | HMMER3 for all modes |
| - | export PATH="/ | + | There are two main versions of the BUSCO datasets: odb9 (the eukaryota set contains 303 orthologs; compatible only with BUSCO 3) and odb10 (the eukaryota set contains 255 orthologs; compatible with BUSCO 4 and 5). |
| - | export PATH="/ | + | The following are example shell scripts for genomic and proteomic searches: |
| - | export PATH="/ | + | *** BUSCO 3 *** |
| - | export PATH="/ | + | ** Genomic: ** |
| - | export AUGUSTUS_CONFIG_PATH="/ | + | # |
| + | #$ -S /bin/bash | ||
| + | | ||
| + | #$ -pe threaded 1 | ||
| + | #$ -cwd | ||
| + | cd $PWD | ||
| + | # | ||
| + | | ||
| + | export AUGUSTUS_CONFIG_PATH="/ | ||
| + | | ||
| + | conda deactivate | ||
| - | cd / | + | ** Proteomic: ** |
| - | python | + | #!/bin/bash |
| + | #$ -S /bin/bash | ||
| + | | ||
| + | #$ -pe threaded 1 | ||
| + | #$ -cwd | ||
| + | cd $PWD | ||
| + | # | ||
| + | | ||
| + | | ||
| + | | ||
| + | conda deactivate | ||
| + | |||
| + | |||
| + | ** * BUSCO 4.0.5 * ** | ||
| + | |||
| + | ** Genomic: ** | ||
| + | # | ||
| + | #$ -S /bin/bash | ||
| + | | ||
| + | #$ -pe threaded 1 | ||
| + | #$ -cwd | ||
| + | cd $PWD | ||
| + | | ||
| + | # | ||
| + | | ||
| + | busco -i < | ||
| + | |||
| + | |||
| + | **Proteomic: | ||
| + | |||
| + | # | ||
| + | #$ -S /bin/bash | ||
| + | . / | ||
| + | #$ -pe threaded 1 | ||
| + | #$ -cwd | ||
| + | cd $PWD | ||
| + | | ||
| + | # | ||
| + | | ||
| + | busco -i <predicted protein fasta> -o <output_dir-name> -l / | ||
| + | |||
| + | |||
| + | ** * BUSCO 5.2.2 * ** | ||
| + | |||
| + | ** Genomic: default metaeuk** | ||
| + | |||
| + | source activate busco-5 | ||
| + | INPUT=' | ||
| + | OUTDIR=' | ||
| + | MODE='genome' | ||
| + | # setting the lineage db | ||
| + | ## the latest busco db for eukaryota is odb10 | ||
| + | LINEAGEDB='/ | ||
| + | ## busco v5 only works with odb10 | ||
| + | ## it will not work with odb9 | ||
| + | # run busco | ||
| + | ## do not specify output dir with a trailing slash, it will lead to a fatal error | ||
| + | ## modes are genome, proteins, transcriptome | ||
| + | ## the below command will use Metaeuk as gene predictor | ||
| + | busco \ | ||
| + | --in $INPUT \ | ||
| + | --out $OUTDIR \ | ||
| + | --mode $MODE \ | ||
| + | --lineage_dataset $LINEAGEDB \ | ||
| + | --cpu 8 | ||
| + | conda deactivate | ||
| + | |||
| + | ** Proteomic: | ||
| + | |||
| + | source activate busco-5 | ||
| + | # in the busco-5 environment, AUGUSTUS_CONFIG_PATH is set to | ||
| + | # / | ||
| + | # but we don't have write permissions there | ||
| + | # not sure why write permissions are needed, but it doesn't work without them | ||
| + | # so we copied that dir to a place where we do have write permissions: | ||
| + | # you may want to copy it to your own home | ||
| + | export AUGUSTUS_CONFIG_PATH="$HOME/ | ||
| + | INPUT=' | ||
| + | OUTDIR=' | ||
| + | MODE=' | ||
| + | AUGUSTUS_SPECIES=' | ||
| + | # setting the lineage db | ||
| + | ## the latest version (as of writing) is odb10 | ||
| + | LINEAGEDB='/ | ||
| + | ## busco v5 only works with odb10 | ||
| + | ## it will not work with odb9 | ||
| + | busco \ | ||
| + | --in $INPUT \ | ||
| + | --out $OUTDIR \ | ||
| + | --mode $MODE \ | ||
| + | --lineage_dataset $LINEAGEDB \ | ||
| + | --cpu 8 \ | ||
| + | --augustus \ | ||
| + | --augustus_species $AUGUSTUS_SPECIES | ||
| + | conda deactivate | ||
| + | | ||
| **Note**: Take out the mitochondrial genome before running this analysis. | **Note**: Take out the mitochondrial genome before running this analysis. | ||
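The note above, and the script comment warning against a trailing slash on the output directory, can be sketched as two small preparation helpers. This is a minimal sketch, not part of BUSCO: the function names `remove_contigs` and `strip_trailing_slash` and the file names in the usage comment are hypothetical, and it assumes you already know the IDs of the mitochondrial contigs and have listed them one per line in a text file.

```shell
#!/bin/bash
# Hypothetical helpers (not part of BUSCO) for preparing a BUSCO run.

# Print a FASTA file without the records whose ID is listed in ids_file.
# The ID is the first word of the header line, without the leading '>'.
remove_contigs() {
    local ids_file="$1" fasta="$2"
    awk 'FILENAME == ARGV[1] { if (NF) drop[$1] = 1; next }   # load IDs to drop
         /^>/ { id = substr($1, 2); keep = !(id in drop) }    # decide per record
         keep' "$ids_file" "$fasta"                           # print kept lines
}

# Print a directory name with one trailing slash removed, if present.
strip_trailing_slash() {
    printf '%s\n' "${1%/}"
}

# Example usage (file names are assumptions):
#   remove_contigs mito_ids.txt assembly.fasta > assembly.nomito.fasta
#   OUTDIR=$(strip_trailing_slash 'busco_out/')
```

The filtered FASTA can then be passed as the BUSCO input, and the cleaned directory name as the output option, in any of the scripts above.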
benchmarking_universal_single-copy_orthologs_busco.1500390980.txt.gz · Last modified: by cgeb2001
