Differences

This shows you the differences between two versions of the page.

--- ortologs_searches_using_panther_hmmrs [2023/04/27 15:10] – [Step 1: Blast vs Uniprot103] 134.190.232.186
+++ ortologs_searches_using_panther_hmmrs [2023/05/04 10:50] (current) – [Step 6: Parsing HMM outpus and creating inputs for a second HMM search] 134.190.232.186
@@ Line 93: / Line 93: @@
 For more information on species contained in Panther see the file ''PTHR_103classification.tsv'' or visit http://www.pantherdb.org/.
-   source activate python36-generic
+<code>
-   python BlastParser.by.Pident.py RNA.reduced.blastout
+source activate python36-generic
-   source deactivate
+python BlastParser.by.Pident.py RNA.reduced.blastout
+source deactivate
+</code>
-The resulting file 'RNA.reduced.blastout_pparsed.tab' should contain one result by query. To check this, use the following commands:
+The resulting file ''RNA.reduced.blastout_pparsed.tab'' should contain one result per query. To check this, use the following commands:
 To figure out how many sequences you started with:
    grep '>' RNAreduced.seqs | wc -l
-   outputs is -> 97 RNAreduced.seqs
+   # outputs is -> 97 RNAreduced.seqs
 To figure out how many blast hits output you have:
    wc -l RNA.reduced.blastout
-   outputs is -> 99 RNA.reduced.blastout
+   # outputs is -> 99 RNA.reduced.blastout
    wc -l RNA.reduced.blastout_pparsed.tab
-   outputs is -> 98 RNA.reduced.blastout_pparsed.tab
+   # outputs is -> 98 RNA.reduced.blastout_pparsed.tab
-Note: Remember that this file has a header, so the total of blast results is 97, so everything is fine so far.
+NOTE: This file has a header line, so its 98 lines correspond to 97 blast results, which matches the 97 queries. Everything is fine so far.
 If your number of queries and blast output results differ: 1) double-check your input files for formatting errors; 2) go online to PANTHER to check whether the family for your query of interest has been curated and/or exists.
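The count comparison above can be scripted. This is a minimal sketch, not part of the original pipeline: the function name ''check_blast_counts'' is hypothetical, and it assumes the parsed table has exactly one header line and that every query sequence starts with a ''>'' FASTA header.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original pipeline): compare the number
# of FASTA queries with the number of parsed BLAST results.
check_blast_counts() {
    local fasta="$1" parsed="$2"
    local n_queries n_hits
    # Count FASTA header lines (lines starting with '>').
    n_queries=$(grep -c '^>' "$fasta")
    # Count data rows, skipping the single header line (assumed) of the parsed table.
    n_hits=$(tail -n +2 "$parsed" | wc -l)
    n_hits=$((n_hits))  # normalize whitespace that some wc implementations emit
    echo "queries=${n_queries} hits=${n_hits}"
    # Succeed only when both counts agree.
    [ "$n_queries" -eq "$n_hits" ]
}
```

With the files from this step, it could be called as ''check_blast_counts RNAreduced.seqs RNA.reduced.blastout_pparsed.tab''; a non-zero exit status signals a mismatch worth investigating.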
-==== Step 2: Getting the panther codes ====
+==== Step 2: Getting the PANTHER codes ====
-Obtain the codes for each panther super-family and subfamily for each query by using the command commands below and the PTHR_103classification.tsv to create a customized file for your queries.
+Obtain the codes for each PANTHER superfamily and subfamily for each query by using the commands below and ''PTHR_103classification.tsv'' to create a customized file for your queries.
 Post-processing the blast output to get panther information:
 ) Separate the first column, which corresponds to the query accession numbers:
    cut -d $'\t' RNA.reduced.blastout_pparsed.tab -f1 > queries_acc
 ) Extract the UniProt accession numbers (these will be used to grep PTHR_103classification.tsv):
    cut -d $'\t' RNA.reduced.blastout_pparsed.tab -f2|cut -d '|' -f2 > hits_acc
 ) Create a file containing both columns for future crosschecking:
    paste -d $'\t' queries_acc hits_acc > query_hits_columns
-)  remove the header of query_hits_columns:
+) Remove the header of query_hits_columns:
     sed -i '/query ID\tsubject ID/d' query_hits_columns
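The cut, paste, and sed commands above can also be collapsed into one pass. This is a sketch under assumptions: the function name ''make_query_hits'' is hypothetical, the header is taken to be the first line of the table, and the second column is assumed to look like ''db|ACCESSION|name'', as the ''cut -d '|' -f2'' command implies.

```shell
#!/usr/bin/env bash
# Hypothetical one-pass alternative to the cut/paste/sed commands above.
make_query_hits() {
    # $1: tab-separated parsed BLAST table whose first line is a header.
    # Prints "query<TAB>accession" for every data row, where the accession is
    # the second '|'-delimited field of column 2 (assumed db|ACCESSION|name).
    awk -F '\t' 'NR > 1 { split($2, parts, "|"); print $1 "\t" parts[2] }' "$1"
}
```

For example, ''make_query_hits RNA.reduced.blastout_pparsed.tab > query_hits_columns'' would write the headerless two-column file directly, without the intermediate ''queries_acc'' and ''hits_acc'' files (which later commands still need, so keep them if you follow the remaining steps as written).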
-Getting the information from panther classification:
-Now, create a file containing the panther information only for 97 queries, by grepping the hits_acc information from the PTHR_103classification.tsv:
+Getting the information from PANTHER classification:
+Now, create a file containing the PANTHER information for only the 97 queries by grepping the hits_acc entries from ''PTHR_103classification.tsv'':
 ) Create a file containing the PANTHER information for the 97 queries:
    grep -w -F -f hits_acc /scratch3/rogerlab_databases/other_dbs/PTHR_103classification.tsv > Panther97queries_hit_info.tsv
 ) Create a TSV file that identifies the PANTHER families by hits_acc:
    cut -d $'\t' -f1,2,3 Panther97queries_hit_info.tsv |cut -d '|' -f3,4|cut -d '=' -f2 > Pantherby97Uniprotaccession.tsv
 ) Eliminate extra tabs in the file:
    sed -i.bak 's/\t\t/\t/g' Pantherby97Uniprotaccession.tsv
    sed -i 's/:/_/g' Pantherby97Uniprotaccession.tsv
 ) Create a cheat sheet: sort and merge the headerless file ‘query_hits_columns’ with ‘Pantherby97Uniprotaccession.tsv’:
@@ Line 173: / Line 187: @@
-==== Step 4:** Parsing HMM outputs and creating inputs for a second HMM search ====
+==== Step 4: Parsing HMM outputs and creating inputs for a second HMM search ====
 Parsing the HMM outputs
@@ Line 198: / Line 212: @@
    qsub Panther.HmmrSearch.sh2
-==== Step 6: Parsing HMM outpus and creating inputs for a second HMM search ====
+==== Step 6: Parsing HMM outputs and creating inputs for a second HMM search ====
 Parsing the HMM outputs
@@ Line 214: / Line 228: @@
 ) Create an input file (Input4ETE) to be later applied with the ETE_standAlone1.4.py script
-==== Step 6: Start the tree search by submitting your jobs ====
+==== Step 7: Start the tree search by submitting your jobs ====
    ls -1 *Reconstruction.sh > list_of_shells