--- ortologs_searches_using_panther_hmmrs [2023/04/27 15:07] – 134.190.232.186
+++ ortologs_searches_using_panther_hmmrs [2023/05/04 10:50] (current) – [Step 6: Parsing HMM outpus and creating inputs for a second HMM search] 134.190.232.186
@@ Line 93: / Line 93: @@
For more information on the species included in PANTHER, see the file ''PTHR_103classification.tsv'' or visit http://www.pantherdb.org/.
<code>
# activate the python36-generic conda environment
source activate python36-generic
# parse the BLAST output; writes RNA.reduced.blastout_pparsed.tab
python BlastParser.by.Pident.py RNA.reduced.blastout
source deactivate
</code>
The resulting file ''RNA.reduced.blastout_pparsed.tab'' should contain one result per query. To check this, use the following commands:
To count the sequences you started with:
   grep -c '^>' RNAreduced.seqs
   # output: 97
To count the BLAST hits in the raw and parsed outputs:
   wc -l RNA.reduced.blastout
   # output: 99 RNA.reduced.blastout
   wc -l RNA.reduced.blastout_pparsed.tab
   # output: 98 RNA.reduced.blastout_pparsed.tab
NOTE: The parsed file has a header line, so it holds 97 BLAST results (one per query). Everything is fine so far.
If the number of queries and the number of parsed BLAST results differ: 1) double-check your input files for formatting errors; 2) check on the PANTHER website whether a PANTHER family exists (and has been curated) for the query of interest.
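Below is a minimal bash sketch of the same sanity check, wrapped in a function so it can be rerun on other datasets. It assumes, as the note above states, that the parsed file has exactly one header line. The function name ''check_counts'' is hypothetical.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: compare the number of FASTA queries with the number
# of parsed BLAST results. Assumes the parsed file has exactly one header line.
check_counts() {
  local fasta="$1" parsed="$2"
  local n_queries n_results
  n_queries=$(grep -c '^>' "$fasta")           # FASTA header lines = queries
  n_results=$(( $(wc -l < "$parsed") - 1 ))    # data lines, minus the header
  echo "queries=${n_queries} results=${n_results}"
  if [ "$n_queries" -eq "$n_results" ]; then
    echo "OK"
  else
    echo "MISMATCH"
    return 1
  fi
}
</code>

For example, ''check_counts RNAreduced.seqs RNA.reduced.blastout_pparsed.tab'' should report 97 queries and 97 results for this dataset, followed by ''OK''.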
==== Step 2: Getting the PANTHER codes ====
Obtain the PANTHER superfamily and subfamily codes for each query by using the commands below and ''PTHR_103classification.tsv'' to create a customized file for your queries.
Post-processing the BLAST output to get PANTHER information:
) Extract the first column, which contains the query accession numbers:
    cut -d $'\t' RNA.reduced.blastout_pparsed.tab -f1 > queries_acc
) Extract the UniProt accession numbers of the hits (these will be used to grep ''PTHR_103classification.tsv''):
    cut -d $'\t' RNA.reduced.blastout_pparsed.tab -f2|cut -d '|' -f2 > hits_acc
) Create a file containing both columns for later cross-checking:
    paste -d $'\t' queries_acc hits_acc > query_hits_columns
) Remove the header line from ''query_hits_columns'':
     sed -i '/query ID\tsubject ID/d' query_hits_columns
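The four commands above can also be collapsed into a single ''awk'' pass. This sketch makes two assumptions: subject IDs have the form ''db|ACCESSION|NAME'' (the field that ''cut -d '|' -f2'' extracts), and the parsed file has exactly one header line. ''extract_query_hits'' is a hypothetical helper name.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: print "query<TAB>accession" for every data line,
# skipping the single header line. Assumes subject IDs like "db|ACCESSION|NAME".
extract_query_hits() {
  local parsed="$1"
  awk -F '\t' 'NR > 1 {
      split($2, parts, "|")     # parts[2] holds the UniProt accession
      print $1 "\t" parts[2]
  }' "$parsed"
}
</code>

''extract_query_hits RNA.reduced.blastout_pparsed.tab > query_hits_columns'' would produce the headerless two-column file directly. ''cut -f2 query_hits_columns > hits_acc'' would then recover the accession list.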
Getting the information from the PANTHER classification:
Now create a file containing the PANTHER information for the 97 queries only, by grepping the accessions in ''hits_acc'' from ''PTHR_103classification.tsv'':
) Create a file containing the PANTHER information for the 97 queries:
    grep -w -F -f hits_acc /scratch3/rogerlab_databases/other_dbs/PTHR_103classification.tsv > Panther97queries_hit_info.tsv
) Create a TSV file that identifies the PANTHER family of each hit accession (from ''hits_acc''):
    cut -d $'\t' -f1,2,3 Panther97queries_hit_info.tsv |cut -d '|' -f3,4|cut -d '=' -f2 > Pantherby97Uniprotaccession.tsv
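) Remove the extra tabs from the file and replace colons with underscores (the first ''sed'' also keeps a backup, ''Pantherby97Uniprotaccession.tsv.bak''):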
    sed -i.bak 's/\t\t/\t/g' Pantherby97Uniprotaccession.tsv
    sed -i 's/:/_/g' Pantherby97Uniprotaccession.tsv
) Create a cheat sheet: sort and merge the headerless file ''query_hits_columns'' with ''Pantherby97Uniprotaccession.tsv'':
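The merge command itself is not shown in this excerpt. The sketch below is one possible approach, not necessarily the command used here: it joins the two files with ''awk''. It assumes that column 1 of ''Pantherby97Uniprotaccession.tsv'' and column 2 of ''query_hits_columns'' both hold the UniProt accession. Queries whose hit has no PANTHER record are flagged ''NOT_FOUND'' rather than silently dropped. ''merge_cheat_sheet'' is a hypothetical name.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: attach the PANTHER line of each hit to its query.
# Assumes the accession is column 1 of the PANTHER table and column 2 of the
# query/hit table. If an accession appears more than once, the last line wins.
merge_cheat_sheet() {
  local pairs="$1" panther="$2"
  awk -F '\t' -v OFS='\t' '
    NR == FNR { ann[$1] = $0; next }          # first file: index PANTHER lines
    ($2 in ann) { print $1, ann[$2]; next }   # query + full PANTHER record
    { print $1, $2, "NOT_FOUND" }             # hit missing from PANTHER table
  ' "$panther" "$pairs" | sort -t $'\t' -k1,1
}
</code>

For example: ''merge_cheat_sheet query_hits_columns Pantherby97Uniprotaccession.tsv > cheat_sheet.tsv'' (the output name is hypothetical).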
@@ Line 153: / Line 168: @@
==== Step 3: Creating shells and running an HMMER search by PANTHER SUBFAMILY ====
) You are ready to create the master shell for the HMMER search:
    source activate python36-generic
@@ Line 171: / Line 187: @@
==== Step 4: Parsing HMM outputs and creating inputs for a second HMM search ====
 Parsing the HMM outputs
    source activate python36-generic
@@ Line 184: / Line 201: @@
==== Step 5: Creating shells and running an HMMER search ====
) You are ready to create the master shell for the HMMER search by **PANTHER SUPERFAMILY**:
@@ Line 195: / Line 212: @@
    qsub Panther.HmmrSearch.sh2
==== Step 6: Parsing HMM outputs and creating inputs for a second HMM search ====
 Parsing the HMM outputs
    source activate python36-generic
@@ Line 210: / Line 228: @@
) Create an input file (''Input4ETE'') to be used later with the ''ETE_standAlone1.4.py'' script:
==== Step 7: Start the tree search by submitting your jobs ====
   ls -1 *Reconstruction.sh > list_of_shells
   while read -r shell; do qsub "$shell"; done < list_of_shells
==== Step 8: Map protein domain architecture to each tree and build a PDF file per PANTHER superfamily ====
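) Create a tab-separated input file (''Input4ETE'') with one record per line in the format ''fastafile<TAB>treefile'', then run ''ETE_standAlone1.4.py'' on it (''xvfb-run'' supplies a virtual X display so the trees can be rendered on a headless node):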
     source activate python27-generic
     xvfb-run -a python ETE_standAlone1.4.py Input4ETE
     source deactivate
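If ''Input4ETE'' still has to be assembled, the minimal bash sketch below can do it. It pairs each alignment with the tree file of the same base name and writes one ''fastafile<TAB>treefile'' record per line. The file suffixes are hypothetical placeholders, because the naming of the reconstruction outputs is not shown here. ''make_ete_input'' is likewise a hypothetical helper name.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: write "fastafile<TAB>treefile" records for every FASTA
# file in the current directory that has a tree with the same base name.
# The suffixes passed in are placeholders; adapt them to your file names.
make_ete_input() {
  local fasta_suffix="$1" tree_suffix="$2" out="$3"
  local fasta base tree
  : > "$out"                                  # start from an empty file
  for fasta in *"$fasta_suffix"; do
    [ -e "$fasta" ] || continue               # glob matched nothing
    base="${fasta%"$fasta_suffix"}"
    tree="${base}${tree_suffix}"
    if [ -e "$tree" ]; then
      printf '%s\t%s\n' "$fasta" "$tree" >> "$out"
    else
      echo "missing tree for $fasta" >&2      # report, but keep going
    fi
  done
}
</code>

For example, run ''make_ete_input .fasta .tree Input4ETE'' (suffixes hypothetical) in the directory holding the reconstruction outputs.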
==== Step 9: Creating a tabulated file to keep track of the findings ====
NOTE: You will need a metadata file. It may consist of accession numbers and a FASTA header; see the format of the metadata file provided for this example, ''RNAreduced.METADATA''.
    source activate python36-generic
@@ Line 230: / Line 252: @@
   Error: File existence/permissions problem in trying to open HMM file /db1/extra-data-sets/panther/PANTHER13.1/books/PTHR44316/hmmer.hmm.
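This HMMER error means that the ''hmmer.hmm'' file for that family could not be opened, for example because the family is absent from the installed PANTHER release or the file is not readable. The minimal sketch below finds such families before jobs are submitted. It assumes the ''books/<FAMILY>/hmmer.hmm'' layout visible in the path above, plus a plain-text list of family IDs with one per line. Both that list file and the name ''check_hmm_files'' are hypothetical.

<code bash>
#!/usr/bin/env bash
# Hypothetical helper: report PANTHER families whose hmmer.hmm is missing or
# unreadable under books_dir. Returns non-zero if any family is missing.
check_hmm_files() {
  local books_dir="$1" families="$2"
  local fam missing=0
  while read -r fam; do
    [ -n "$fam" ] || continue                 # skip blank lines
    if [ ! -r "$books_dir/$fam/hmmer.hmm" ]; then
      echo "missing: $fam"
      missing=$((missing + 1))
    fi
  done < "$families"
  [ "$missing" -eq 0 ]
}
</code>

For example, ''check_hmm_files /db1/extra-data-sets/panther/PANTHER13.1/books families.txt'' prints one ''missing:'' line per problematic family and returns a non-zero status if there are any.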
==== Step 10: Move the PDF files and 'MAIN_TABLE.txt' to your desktop for manual tree inspection and orthology assignment ====