taxonomy_recovery

Differences

This shows you the differences between two versions of the page.

taxonomy_recovery [2019/07/12 07:51] 24.138.71.142taxonomy_recovery [2024/06/11 10:31] (current) 129.173.94.151
Line 3: Line 3:
 Use the acc2tax program, which is available on the environment path.\\
  
-here is an example of how to run it:\\
+here is an example of how to run it for protein IDs (-p):\\
  
  
    acc2tax -i /db1/extra-data-sets/Acc2tax/acc2taxIN_example -p -d /db1/extra-data-sets/Acc2tax/Acc2Tax_071119 -o taxonomy.out
  
-don't forget to make sure that your input file contains only the accession numbers without their version, see the example file given above.
+don't forget to make sure that your input file contains only the accession numbers without their version; see the example file given above. 
  
 +:-( E.g., MBI4782295.1 must be given as MBI4782295; otherwise the lookup fails with an error like:
  
 +<code>
 +Couldn't find: [MBI4782295.1]
 +
 +</code>
 +
 +:-D Trim the version suffix (".1") from the end of the accession MBI4782295.1:
 +
 +<code>
 +> cat file | cut -d '.' -f1 > out_file
 +
 +</code>
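The `cut -d '.' -f1` command above drops everything after the first dot. As a slightly stricter alternative, the sketch below removes only a trailing numeric version such as ".1". The function name `strip_version` and the file names are placeholders chosen for illustration, not part of acc2tax.

```shell
# strip_version: remove a trailing ".<digits>" version suffix from each accession.
# Reads one accession per line on stdin and writes the trimmed accessions to stdout.
# (Hypothetical helper name; not part of acc2tax.)
strip_version() {
    sed -E 's/\.[0-9]+$//'
}

# Example usage (file names are placeholders):
#   strip_version < file > out_file
```

Accessions that carry no version suffix pass through unchanged, so it should be safe to run on a mixed list.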
 +
 +Notes: 
 +1. You can get the accession list directly from a BLAST/PLAST result (output.txt) by taking its second tab-separated column with the command below:
 +
 +<code>
 +> cat output.txt | cut  -f2 | cut -d '.' -f1 > out_file
 +</code>
 +
 +2. If the accession numbers contain "|" (e.g., gb|KAA8922376.1|), take the second "|"-delimited field instead:
 +
 +<code>
 +> cat output.txt | cut -d "|" -f2 | cut -d '.' -f1 > out_file
 +</code>
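Notes 1 and 2 can be combined into one step. The sketch below assumes tab-separated BLAST/PLAST output with the subject ID in column 2, written either bare (KAA8922376.1) or wrapped in pipes (gb|KAA8922376.1|). The function name `extract_accessions` is a placeholder, and dropping repeated accessions is an added convenience rather than something the commands above do.

```shell
# extract_accessions: print version-less subject accessions from tab-separated
# BLAST/PLAST output read on stdin, keeping only the first occurrence of each.
# Assumption: the subject ID is in column 2, either bare (KAA8922376.1)
# or pipe-wrapped (gb|KAA8922376.1|). Hypothetical helper, not part of acc2tax.
extract_accessions() {
    awk -F '\t' '
        {
            id = $2
            n = split(id, parts, "[|]")
            if (n > 1) id = parts[2]      # gb|ACC.V| -> ACC.V
            sub(/\.[0-9]+$/, "", id)      # ACC.V -> ACC
            if (id != "" && !(id in seen)) { seen[id] = 1; print id }
        }'
}

# Example usage (file names are placeholders):
#   extract_accessions < output.txt > out_file
```

As with `cut -d "|" -f2`, an ID with more than one pipe-delimited prefix would need a different field index.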
 +
 +3. acc2tax may still report a list of unknown accessions like the one below, even when the NCBI taxonomy database has been updated to the latest version.
 +
 +<code>
 +Couldn't find: [MBR3349819]
 +Couldn't find: [HBS54143]
 +Couldn't find: [MYJ28876]
 +</code>
 +
 +This may be because these protein IDs (e.g., MBR3349819, HBS54143) come from species that cannot be placed in the taxonomy the way NP_051083 can, i.e., their lineage is not in (full) status.
 +
 +
 +<code>
 +NP_051083 cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliopsida,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Brassicales,Brassicaceae,Camelineae,Arabidopsis,Arabidopsis thaliana
 +</code>
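To follow up on the unresolved IDs, it can help to collect them into a plain list. The sketch below assumes the "Couldn't find: [...]" messages have been captured into a text file; whether acc2tax writes them to standard output or standard error is not stated here, so the capture step is left to the reader. The function name `list_missing` and the file names are placeholders.

```shell
# list_missing: extract accessions from "Couldn't find: [ACC]" messages on stdin.
# Lines that do not match the message format are ignored.
# (Hypothetical helper name; not part of acc2tax.)
list_missing() {
    sed -n "s/^Couldn't find: \[\(.*\)\]\$/\1/p"
}

# Example usage (acc2tax.log is a placeholder for wherever the messages were saved):
#   list_missing < acc2tax.log > missing_accessions.txt
```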
 +
 +==========================
 +acc2tax database Location:\\
 +
 +/db1/extra-data-sets/Acc2tax/
 +
 +/db1/extra-data-sets/Acc2tax/Acc2Tax_04_01_2024 (up to date as of Jan 04, 2024)
 +
 +\\
 +
 +<Last updated by Dandan Zhao on Jun 11, 2024>
taxonomy_recovery.1562928662.txt.gz · Last modified: by 24.138.71.142