bioinformatics_tools3

Differences

This shows you the differences between two versions of the page.

bioinformatics_tools3 [2021/10/09 13:19] 134.190.232.9
bioinformatics_tools3 [2022/03/15 10:28] (current) 134.190.232.106
Line 1: Line 1:
 **Parsing the InterProScan results**
 +
 InterProScan is a powerful and useful tool for annotating proteins. However, its output is not easy to handle: the result file has many columns and repeated rows for the same gene ID. Here, we use a script, interproscan_to_one_line.py, to merge all rows that share a gene ID into one line. Compare the results before and after running the script on the TAIR 10 data. The Homo_sapiens data is at the bottom of the page.
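
The merging idea described above can be sketched as follows. This is a minimal illustration, not the actual interproscan_to_one_line.py: it assumes the standard InterProScan TSV layout (column 1 = protein ID, column 4 = analysis such as Pfam, column 5 = signature accession, column 6 = signature description). The function names `merge_by_protein` and `format_lines` and the output layout are hypothetical.

```python
import csv


def merge_by_protein(tsv_lines, analysis="Pfam"):
    """Collect unique (accession, description) hits per protein ID.

    tsv_lines: any iterable of tab-separated lines (an open file works).
    analysis:  keep only rows whose 4th column equals this value.
    """
    merged = {}
    reader = csv.reader(tsv_lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip short/malformed rows and rows from other analyses.
        if len(row) < 6 or row[3] != analysis:
            continue
        protein_id, accession, description = row[0], row[4], row[5]
        hits = merged.setdefault(protein_id, [])
        # The same domain can be reported several times per protein;
        # keep it once, in order of first appearance.
        if (accession, description) not in hits:
            hits.append((accession, description))
    return merged


def format_lines(merged):
    """Render one line per protein: ID, a tab, then '; '-joined hits."""
    return [
        pid + "\t" + "; ".join(f"{acc} ({desc})" for acc, desc in hits)
        for pid, hits in merged.items()
    ]
```

With a real file, `merge_by_protein(open("test_data.tsv"))` followed by `format_lines(...)` would yield one line per protein ID; the real script's output columns may differ.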
  
Line 30: Line 31:
 If you are new to InterProScan, detailed usage is given in Step 6 of Zhang et al. 2021 (https://doi.org/10.1016/j.xpro.2021.100619).
  
-   * Step Two: Run the interproscan_to_one_line.py script
+   * Step Two: Run the interproscan_to_one_line.py script, which can be found at https://github.com/zx0223winner/InterProScan_Parser
  
 <code>
-python3 interproscan_to_one_line.py clps.tsv out.txt Pfam
+python3 interproscan_to_one_line.py test_data.tsv out.txt Pfam
 </code>
  
Line 42: Line 43:
 GCF_000001405.39_GRCh38.p13_protein.faa
  
-The directory is upcoming soon,
+1. This protein set includes proteins translated from alternatively spliced transcripts (e.g., XX.t1, XX.t2, XX.t3).
+Use the isoform2one script (https://github.com/zx0223winner/isoform2one) to select the primary transcript protein sequence (the longest transcript).
+
+2. Then run the InterProScan analysis. Results are documented here: /misc/scratch2/xizhang/HSDatabase/Results/Interproscan_
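
The longest-isoform selection in step 1 can be sketched as below. This is an illustrative sketch only, not the isoform2one script: it assumes isoform IDs end in `.t<number>` as in the XX.t1 example, and that the gene ID is everything before that suffix. The function names `read_fasta` and `longest_isoform` are hypothetical.

```python
import re


def read_fasta(lines):
    """Parse FASTA lines into {record ID: sequence}.

    The record ID is the first whitespace-delimited token after '>'.
    """
    parts = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {n: "".join(chunks) for n, chunks in parts.items()}


def longest_isoform(records):
    """Return {gene ID: (isoform ID, sequence)} keeping the longest isoform.

    Assumes isoform IDs look like 'gene.t1'; on a length tie the
    isoform seen first is kept.
    """
    best = {}
    for name, seq in records.items():
        gene = re.sub(r"\.t\d+$", "", name)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (name, seq)
    return best
```

NCBI protein files such as the GRCh38 .faa above use a different ID scheme, so the gene-to-isoform grouping rule would need to be adapted for them.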
  
 <Last updated by Xi Zhang on Oct 8th, 2021>