

taxonomy_recovery



Quick taxonomy recovery using the accession numbers from a BLAST or PLAST output:

Use the acc2tax program, which is available in the environment PATH.

Here is an example of how to run it for protein accessions (-p):

 acc2tax -i /db1/extra-data-sets/Acc2tax/acc2taxIN_example -p -d /db1/extra-data-sets/Acc2tax/Acc2Tax_071119 -o taxonomy.out
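If the hits come from a tabular BLAST output (-outfmt 6) or a PLAST output in the same tab-separated layout, a minimal sketch for building the acc2tax input list could look like this. It assumes the subject ID sits in column 2 and is a plain accession such as MBI4782295.1 (not a gi|...| style ID); the function name `extract_accessions` and the file names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an acc2tax input list from a tabular
# BLAST (-outfmt 6) or PLAST hit table.
# Assumption: column 2 holds the subject accession (e.g. MBI4782295.1).
extract_accessions() {
    # Take column 2 (tab-separated), drop empty values,
    # and keep each accession only once (byte-order sort for stable output).
    cut -f2 "$1" | grep -v '^$' | LC_ALL=C sort -u
}
```

For example, `extract_accessions hits.tsv > acc_list.txt` gives one accession per line; the version suffixes still have to be removed as described below before passing the list to `-i`.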

Make sure that your input file contains only the accession numbers, without their version suffix (see the example input file, acc2taxIN_example, used in the command above).

-_- For example, MBI4782295.1 must be given as MBI4782295; otherwise acc2tax cannot find it and reports:

Couldn't find: [MBI4782295.1]

'^_^' To fix this, trim the version suffix (“.1” in MBI4782295.1) from every accession:

> sed 's/\(.*\)\..*/\1/g' file >out_file
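A slightly stricter variant, as a sketch: it removes only a trailing numeric version such as “.1” (IDs without a version are left untouched) and then reports any line that still contains a dot before you run acc2tax. The function names are hypothetical, and the `tr -d '\r'` step only matters if the list was saved with Windows line endings.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for cleaning an acc2tax input list.

# Remove Windows line endings, then strip a trailing ".<digits>" version
# (MBI4782295.1 -> MBI4782295). IDs without a version are unchanged.
strip_versions() {
    tr -d '\r' < "$1" | sed 's/\.[0-9][0-9]*$//'
}

# Print (with line numbers) any line that still contains a dot.
# Returns 0 when the list is clean, 1 otherwise.
check_clean() {
    if grep -n '\.' "$1"; then
        echo "Some lines still contain a version suffix" >&2
        return 1
    fi
    return 0
}
```

For example, `strip_versions file > out_file && check_clean out_file` produces the cleaned list and warns if anything was missed.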

Note: many accessions may still come back as unknown, even when the NCBI taxonomy database used is the latest one. Any ideas? For example:

Couldn't find: [MBR3349819]
Couldn't find: [HBS54143]
Couldn't find: [MYJ28876]
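To follow up on the accessions that were not found, one option is to capture the acc2tax messages in a log (for example by appending `> acc2tax.log 2>&1` to the command above, since it is not stated here whether the messages go to stdout or stderr) and pull the IDs out of the `Couldn't find: [...]` lines. A minimal sketch, assuming the message format shown above; `list_not_found` and `acc2tax.log` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the accessions acc2tax reported as missing.
# Assumes log lines look like "Couldn't find: [MBR3349819]", possibly
# several per line, as in the example above.
list_not_found() {
    # grep -o prints each match on its own line; sed keeps only the ID
    # between the square brackets; sort -u removes duplicates.
    grep -o "Couldn't find: \[[^]]*]" "$1" \
        | sed 's/.*\[\([^]]*\)].*/\1/' \
        | LC_ALL=C sort -u
}
```

For example, `list_not_found acc2tax.log > not_found.txt` gives one missing accession per line, which can then be checked manually.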

Available acc2tax databases:

/scratch3/rogerlab_databases/other_dbs/Acc2Tax_Feb122021 (up to date as of Feb 23, 2021)

/misc/db1/extra-data-sets/Acc2tax/Acc2tax_092021 (up to date as of Sep 20, 2021)

taxonomy_recovery.1632242028.txt.gz · Last modified: by 134.190.232.139