taxonomy_recovery

Differences

This shows you the differences between two versions of the page.

taxonomy_recovery [2021/09/21 13:45] 134.190.232.139
taxonomy_recovery [2024/06/11 10:31] (current) 129.173.94.151
Line 20: Line 20:
  
 <code>
-sed 's/\(.*\)\..*/\1/g' file >out_file
+cat file | cut -d '.' -f1 > out_file
  
 </code>
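The old and new commands agree on typical accessions such as NP_051083.1, but they differ when an ID contains more than one dot: the sed pattern is greedy and removes only the last dot-delimited suffix, while cut keeps only the text before the first dot. A minimal sketch of the difference, where the function names and the ID `sample.v2.1` are made up for illustration:

```shell
# Remove only the suffix after the LAST dot (behaviour of the old sed command).
strip_last_suffix() {
    sed 's/\(.*\)\..*/\1/'
}

# Keep only the text before the FIRST dot (behaviour of the new cut command).
strip_after_first_dot() {
    cut -d '.' -f1
}

# Both helpers read IDs on standard input, one per line.
printf 'NP_051083.1\nsample.v2.1\n' | strip_last_suffix      # NP_051083, sample.v2
printf 'NP_051083.1\nsample.v2.1\n' | strip_after_first_dot  # NP_051083, sample
```

Lines without a dot pass through both helpers unchanged.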
  
-Note: You may still get a list of unknown accessions like the one below even when the NCBI taxonomy database is up to date. This may be because the species these protein IDs (MBR3349819, HBS54143) come from cannot be placed in the taxonomy the way NP_051083 is.
+Note:
 +1. You can extract the accession list directly from the BLAST/PLAST result (output.txt) using the command below:
 + 
 +<code> 
 +cat output.txt | cut -f2 | cut -d '.' -f1 > out_file
 +</code> 
 + 
 +2. If the accession numbers contain "|" (e.g., gb|KAA8922376.1|), split on "|" first:
 + 
 +<code> 
 +cat output.txt | cut -d "|" -f2 | cut -d '.' -f1 > out_file
 +</code> 
 + 
 +3. You may still get a list of unknown accessions like the one below, even when the NCBI taxonomy database is up to date.
  
 <code>
Line 30: Line 43:
 Couldn't find: [HBS54143]
 Couldn't find: [MYJ28876]
 +</code>
  
 +This may be because the species these protein IDs (MBR3349819, HBS54143) come from cannot be placed in the taxonomy the way NP_051083 is, i.e., their lineage is not in (full) status.
 +
 +
 +<code>
 NP_051083 cellular organisms,Eukaryota,Viridiplantae,Streptophyta,Streptophytina,Embryophyta,Tracheophyta,Euphyllophyta,Spermatophyta,Magnoliopsida,Mesangiospermae,eudicotyledons,Gunneridae,Pentapetalae,rosids,malvids,Brassicales,Brassicaceae,Camelineae,Arabidopsis,Arabidopsis thaliana
 </code>
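Items 1 and 2 of the note can be folded into a single helper. The sketch below assumes tab-separated BLAST/PLAST output with the subject ID in column 2, as in the commands above. The function name `extract_accessions` and the sample rows are hypothetical. Unlike the command in item 2, it splits only column 2 on "|" rather than the whole line.

```shell
# Print one versionless accession per input line.
# Assumes tab-separated hits with the subject ID in column 2.
extract_accessions() {
    awk -F '\t' '{
        id = $2
        # For IDs such as gb|KAA8922376.1|, keep the part between the bars.
        n = split(id, parts, "|")
        if (n > 1) id = parts[2]
        # Drop the version suffix (everything from the first dot onward).
        sub(/\..*/, "", id)
        print id
    }' "$@"
}

# Hypothetical sample rows: query, subject ID, percent identity.
printf 'q1\tNP_051083.1\t98.5\nq2\tgb|KAA8922376.1|\t87.0\n' | extract_accessions
```

With a real result file, the call would be `extract_accessions output.txt > out_file`.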
  
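To follow up on unresolved accessions, it can help to pull the IDs out of the "Couldn't find" messages. A minimal sketch, assuming each message has exactly the form shown above and that the messages have been captured to a file or stream; the function name `list_unknown_ids` is hypothetical.

```shell
# Print the unique accession IDs from lines of the form
#   Couldn't find: [ACCESSION]
# Lines in any other format are ignored.
list_unknown_ids() {
    sed -n "s/^Couldn't find: \[\(.*\)\]\$/\1/p" "$@" | sort -u
}

# Sample messages taken from the listing above, with one repeated on purpose.
printf "Couldn't find: [HBS54143]\nCouldn't find: [MYJ28876]\nCouldn't find: [HBS54143]\n" | list_unknown_ids
```

The resulting list can then be checked by hand against the NCBI records to see whether the lineage is incomplete.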
 ==========================
-acc2tax database:\\
+acc2tax database Location:\\
  
-/scratch3/rogerlab_databases/other_dbs/Acc2Tax_Feb122021 (Up to date Feb 23, 2021)
+/db1/extra-data-sets/Acc2tax/
 + 
 +/db1/extra-data-sets/Acc2tax/Acc2Tax_04_01_2024 (Up to date Jan 04, 2024)
  
-/misc/db1/extra-data-sets/Acc2tax/Acc2tax_092021 (Up to date Sep 20, 2021) 
 \\
 +
 +<Last updated by Dandan Zhao on Jun 11, 2024>
taxonomy_recovery.1632242733.txt.gz · Last modified: by 134.190.232.139