taxonomy_recovery

taxonomy_recovery [2021/09/20 18:28] 134.190.232.139 / taxonomy_recovery [2024/06/11 10:31] (current) 129.173.94.151
Don't forget to make sure that your input file contains only the accession numbers, without their version numbers; see the example file given above.
  
E.g., MBI4782295.1 should be MBI4782295; otherwise, errors like the following will occur:
  
<code>
…
</code>
  
Trim the version suffix ".1" from the accession MBI4782295.1:
  
<code>
# keep everything before the first "." on each line
cut -d '.' -f1 file > out_file
  
</code>
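The cut command above keeps everything before the first "." on each line. A minimal alternative sketch, assuming one accession per line, removes only a trailing numeric version (".1", ".2", ...) with sed, so lines without a version pass through unchanged. It also adds an optional check that reports any line still containing a "."; the function names strip_version and check_no_versions are hypothetical helpers, not part of acc2tax.

```shell
#!/usr/bin/env bash

# Hypothetical helper: strip a trailing ".<digits>" version suffix,
# e.g. MBI4782295.1 -> MBI4782295.
# Reads the files given as arguments, or standard input if there are none.
strip_version() {
    sed -E 's/\.[0-9]+$//' "$@"
}

# Hypothetical helper: succeed only if no line contains a "." any more;
# offending lines are printed with their line numbers.
check_no_versions() {
    ! grep -n '\.' "$@"
}
```

Usage, with the same file names as above: strip_version file > out_file && check_no_versions out_file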
  
Note:
1. You can get the accession list directly from a tabular BLAST/PLAST result (output.txt), where the second column holds the subject IDs, with the command below:

<code>
cut -f2 output.txt | cut -d '.' -f1 > out_file
</code>

2. If the accession numbers are wrapped in "|" (e.g., gb|KAA8922376.1|), use:

<code>
cut -f2 output.txt | cut -d '|' -f2 | cut -d '.' -f1 > out_file
</code>

3. Even when the NCBI taxonomy database is updated to the latest version, you may still get a list of accessions that cannot be found, like the one below:

<code>
Couldn't find: [MBR3349819]
Couldn't find: [HBS54143]
Couldn't find: [MYJ28876]
</code>

This may be because these protein IDs (MBR3349819, HBS54143) come from species that cannot be placed into the taxonomy in the way NP_051083 can, i.e., their lineage is not in (full) status. For comparison, the full lineage of NP_051083 is:


<code>
NP_051083 cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliopsida,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Brassicales,Brassicaceae,Camelineae,Arabidopsis,Arabidopsis thaliana
</code>
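The notes above can be combined into two small shell helpers. This is a sketch under stated assumptions: the BLAST/PLAST output is tab-separated with the subject ID in column 2, and the "Couldn't find: [ID]" messages have been saved to a text file. The function names subject_accessions and unresolved_ids are hypothetical and not part of acc2tax. Unlike the one-liners, subject_accessions only unwraps the "|" format when a "|" is actually present, and it removes duplicate accessions with sort -u (so the output is sorted rather than in hit order).

```shell
#!/usr/bin/env bash

# Hypothetical helper for notes 1 and 2: take column 2 (subject ID) of
# tab-separated BLAST/PLAST output, unwrap IDs like "gb|KAA8922376.1|"
# when a "|" is present, strip the version suffix, and drop duplicates.
subject_accessions() {
    cut -f2 "$@" |
        awk -F'|' '{ print (NF > 1 ? $2 : $1) }' |
        sed -E 's/\.[0-9]+$//' |
        sort -u
}

# Hypothetical helper for note 3: print the ID inside each
# "Couldn't find: [ID]" line so the unresolved accessions can be re-checked.
unresolved_ids() {
    sed -n 's/^Couldn.t find: \[\([^]]*\)].*/\1/p' "$@"
}
```

For example: subject_accessions output.txt > out_file, and later unresolved_ids acc2tax.log > missing_ids.txt, where acc2tax.log and missing_ids.txt are hypothetical file names.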
  
==========================
acc2tax database location:\\

/db1/extra-data-sets/Acc2tax/

/db1/extra-data-sets/Acc2tax/Acc2Tax_04_01_2024 (up to date as of Jan 04, 2024)

\\
<Last updated by Dandan Zhao on Jun 11, 2024>