Quick taxonomy recovery using the accession numbers from either a BLAST or PLAST output:
Use the acc2tax program, which is available in the environment path.
Here is an example of how to run it for protein IDs (-p):
acc2tax -i /db1/extra-data-sets/Acc2tax/acc2taxIN_example -p -d /db1/extra-data-sets/Acc2tax/Acc2Tax_071119 -o taxonomy.out
Make sure that your input file contains only the accession numbers without their version suffix; see the example file given above.
E.g., MBI4782295.1 should be MBI4782295, otherwise errors like the following will occur:
Couldn't find: [MBI4782295.1]
Trim the version “.1” from the end of the accession MBI4782295.1:
> sed 's/\(.*\)\..*/\1/g' file >out_file
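The trimming step can also be combined with pulling the accessions out of a tabular hit file. The following is only a sketch under assumptions not stated on this page: it assumes BLAST tabular output (-outfmt 6) with the subject accession in column 2, and the names strip_version, blast_hits.tsv and accessions.txt are placeholders. Unlike the sed command above, it removes only a trailing numeric version, so accessions that have no version are left unchanged.

```shell
# Hypothetical helper: remove a trailing ".<digits>" version from each line.
# Lines without a version suffix pass through unchanged.
strip_version() {
    sed -E 's/\.[0-9]+$//'
}

# Assumption: blast_hits.tsv is BLAST tabular output (-outfmt 6),
# with the subject accession in column 2. Both file names are placeholders.
if [ -f blast_hits.tsv ]; then
    cut -f2 blast_hits.tsv | strip_version | sort -u > accessions.txt
fi
```

The resulting accessions.txt could then be passed to acc2tax with -i. If your hit file keeps the accession in a different column, adjust the cut field accordingly.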
Note: Many accessions can still come back as not found, as shown below, even when the NCBI taxonomy database is the latest one. Any ideas?
Couldn't find: [MBR3349819]
Couldn't find: [HBS54143]
Couldn't find: [MYJ28876]
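To follow up on the missing entries, it may help to collect them into a list that can be checked against NCBI or against a newer database. This is a minimal sketch that relies only on the "Couldn't find: [ACC]" message format shown above; it assumes the messages were saved to a file, and extract_missing, acc2tax_messages.log and missing_accessions.txt are hypothetical names.

```shell
# Hypothetical helper: read text on stdin and print the unique accessions
# that appear inside "Couldn't find: [ACC]" messages, one per line.
extract_missing() {
    grep -o "Couldn't find: \[[^]]*]" | sed -E 's/.*\[([^]]*)].*/\1/' | sort -u
}

# Assumption: the messages were saved to acc2tax_messages.log (placeholder name).
if [ -f acc2tax_messages.log ]; then
    extract_missing < acc2tax_messages.log > missing_accessions.txt
fi
```

It works whether the messages are on separate lines or run together on one line.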
acc2tax databases:
/scratch3/rogerlab_databases/other_dbs/Acc2Tax_Feb122021 (Up to date Feb 23, 2021)
/misc/db1/extra-data-sets/Acc2tax/Acc2tax_092021 (Up to date Sep 20, 2021)
