Quick taxonomy recovery from the accession numbers in BLAST or PLAST output:
Use the acc2tax program, which is available on the system PATH.
Here is an example of how to run it for protein IDs (-p):
acc2tax -i /db1/extra-data-sets/Acc2tax/acc2taxIN_example -p -d /db1/extra-data-sets/Acc2tax/Acc2Tax_071119 -o taxonomy.out
Make sure that your input file contains only the accession numbers, without their version suffix; see the example input file used above.
E.g., MBI4782295.1 must be given as MBI4782295, otherwise the lookup fails with an error like:
Couldn't find: [MBI4782295.1]
To trim the version suffix (e.g. “.1”) from accessions such as MBI4782295.1:
sed 's/\(.*\)\..*/\1/g' file > out_file
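The trimming step can also be wrapped in a small shell function. This is only a sketch: the function name `strip_versions` and the file names in the usage line are hypothetical, and it assumes that the version is always a trailing dot followed by digits. Unlike the sed command above, it leaves lines without such a suffix untouched and also drops repeated accessions while keeping their original order.

```shell
# Hypothetical helper (not part of acc2tax): read accessions on stdin,
# strip a trailing ".<digits>" version, and print each accession once.
strip_versions() {
    # Remove the version suffix only when it is a dot plus digits at line end.
    sed -E 's/\.[0-9]+$//' |
    # Keep the first occurrence of each accession, preserving input order.
    awk '!seen[$0]++'
}

# Example usage (placeholder file names):
# strip_versions < accessions_with_versions.txt > accessions.txt
```

The resulting file can then be passed to acc2tax with -i as in the example command above.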
Note: Even when the NCBI taxonomy database is the latest one, some accessions may still be reported as not found, as in the output below. This is probably because the source species of these protein IDs (e.g. MBR3349819, HBS54143) cannot be placed in the taxonomy, unlike NP_051083, which resolves to a full lineage.
Couldn't find: [MBR3349819]
Couldn't find: [HBS54143]
Couldn't find: [MYJ28876]
NP_051083 cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliopsida,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Brassicales,Brassicaceae,Camelineae,Arabidopsis,Arabidopsis thaliana
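To follow up on the unresolved IDs, it can help to collect them into a list. The sketch below assumes that the "Couldn't find" messages have been captured in a text file, one message per line; whether acc2tax prints them to the terminal or writes them to the -o file is not stated here, so adjust the input accordingly. The function name `missing_accessions` and the file names are hypothetical.

```shell
# Hypothetical helper (not part of acc2tax): read acc2tax messages on stdin
# and print only the accessions that were reported as not found.
missing_accessions() {
    # Print the text inside the square brackets of each "find: [ID]" line;
    # lines that carry a resolved lineage do not match and are skipped.
    sed -n 's/.*find: \[\([^]]*\)\].*/\1/p'
}

# Example usage (placeholder file names):
# missing_accessions < acc2tax_messages.txt > not_found.txt
```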
acc2tax databases:
/scratch3/rogerlab_databases/other_dbs/Acc2Tax_Feb122021 (up to date as of Feb 23, 2021)
/misc/db1/extra-data-sets/Acc2tax/Acc2tax_092021 (up to date as of Sep 20, 2021)
