ortologs_searches_using_panther_hmmrs
Current revision: 2023/05/04 10:50 by 134.190.232.186 (edit summary: Step 6: Parsing HMM outputs and creating inputs for a second HMM search). Previous revision: 2023/04/27 15:07 by 134.190.232.186.
(...)
For more information on the species contained in PANTHER, see the file ''
<code>
source activate python36-generic
python BlastParser.by.Pident.py RNA.reduced.blastout
source deactivate
</code>
The resulting file ''
To check how many sequences you started with:
grep '>' RNAreduced.seqs | wc -l
# output: 97

To check how many BLAST hits you have:
wc -l RNA.reduced.blastout
# output: 99 RNA.reduced.blastout

To check how many results the parser kept:
wc -l RNA.reduced.blastout_pparsed.tab
# output: 98 RNA.reduced.blastout_pparsed.tab

NOTE: Remember that the parsed file has a header line, so it holds 97 BLAST results, matching the 97 query sequences; everything is fine so far.

If the number of queries and the number of parsed BLAST results differ: 1) double-check your input files for formatting errors; 2) check on the PANTHER website whether a PANTHER family exists and/or has been curated for the query of interest.
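The count checks above can be wrapped in a small bash sketch. It assumes, as the NOTE says, that the parsed table has exactly one header line; the function names are illustrative and not part of the original pipeline.

```shell
#!/usr/bin/env bash
# Sketch: compare the number of query sequences with the number of parsed
# BLAST results. Assumes the parsed table has exactly one header line.

# Count FASTA records: lines starting with '>' (anchoring the pattern avoids
# counting '>' characters that appear elsewhere in a line).
count_fasta_records() {
    grep -c '^>' "$1"
}

# Count data rows in a table whose first line is a header.
count_table_rows() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Print a short report; return non-zero when the two counts differ.
check_query_counts() {
    local fasta=$1 parsed=$2 n_queries n_results
    n_queries=$(count_fasta_records "$fasta")
    n_results=$(count_table_rows "$parsed")
    echo "queries: $n_queries, parsed results: $n_results"
    [ "$n_queries" -eq "$n_results" ]
}
```

Usage: ''check_query_counts RNAreduced.seqs RNA.reduced.blastout_pparsed.tab''. A non-zero exit status is the cue to go through the two checks listed above.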
==== Step 2: Getting the PANTHER … ====

Obtain the codes for each PANTHER superfamily
Post-process the BLAST output to get the PANTHER information:

1) Extract the first column, which contains the query accession numbers:
cut -d $'

2) Extract the UniProt accession numbers of the hits (these will be used to grep ''PTHR_103classification.tsv''):
cut -d $'

3) Create a file containing both columns for later cross-checking:
paste -d $'

4) Remove
sed -i '/
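As a hypothetical bash sketch of items 1-3 (not a reconstruction of the original commands), assume the parsed BLAST table is tab-separated, has one header line, and holds the query accession in column 1 and the hit's UniProt accession in column 2. The output name ''queries_acc'' is made up for the example. ''cut'' and ''paste'' use tab as their default delimiter, and the header is skipped with ''tail'' while reading rather than deleted afterwards.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Step 2, items 1-3. Assumed layout of the parsed
# table: tab-separated, one header line, col 1 = query, col 2 = UniProt hit.
split_query_hit_columns() {
    local parsed=$1 outdir=${2:-.}
    # 1) query accessions (header line skipped)
    tail -n +2 "$parsed" | cut -f1 > "$outdir/queries_acc"
    # 2) UniProt accessions of the hits, later used to grep the PANTHER table
    tail -n +2 "$parsed" | cut -f2 > "$outdir/hits_acc"
    # 3) both columns side by side (tab-separated) for cross-checking
    paste "$outdir/queries_acc" "$outdir/hits_acc" > "$outdir/query_hits_columns"
}
```

Because the header never reaches the output files here, ''query_hits_columns'' is already headerless.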

Getting the information from PANTHER
Now create a file containing the PANTHER information for the 97 queries only, by grepping the ''hits_acc'' accessions from ''PTHR_103classification.tsv'':

5) Create a file containing the PANTHER information for the 97 queries:
grep -w -F -f hits_acc /

6) Create a TSV file identifying the PANTHER families by ''hits_acc'':
cut -d $'

7) Remove the extra tabs from the file:
sed -i.bak '
sed -i '
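Items 5-7 can be sketched as below. ''grep -F -w -f'' reads fixed-string patterns from a file and only accepts whole-word matches, so an accession such as P11111 does not match inside P111112. The function names, the column numbers and the toy row layout used to try it out are assumptions, and ''tr -s'' is swapped in for the ''sed'' calls above.

```shell
#!/usr/bin/env bash
# Sketch of Step 2, items 5-7 (function names are hypothetical).

# 5) Keep only the classification rows that mention one of the accessions.
#    -F: fixed strings, -w: whole-word match, -f: read patterns from a file.
select_panther_rows() {
    local acc_list=$1 classification=$2
    grep -w -F -f "$acc_list" "$classification"
}

# 6) Keep two tab-separated columns from stdin, e.g. an ID column and the
#    PANTHER family column (column numbers are assumptions; pass your own).
keep_columns() {
    cut -f "$1,$2"
}

# 7) Squeeze runs of tabs into a single tab (alternative to the sed calls).
#    Caution: this also merges empty fields, shifting later columns left.
squeeze_tabs() {
    tr -s '\t'
}
```

Because squeezing tabs removes empty fields, it is worth checking which column holds the PANTHER ID before and after this step.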

8) Create a cheat sheet: sort and merge the headerless file ''query_hits_columns'' with ''Pantherby97Uniprotaccession.tsv'':
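One common way to sort and merge two tables on a shared key is ''join''. The sketch below assumes (hypothetically) that the key is the hit accession, found in column 2 of ''query_hits_columns'' and in column 1 of the PANTHER table; it illustrates the technique and is not necessarily the command the page uses.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of Step 2, item 8: merge two headerless tab-separated
# tables on the hit accession (column 2 of the first, column 1 of the second).
merge_on_accession() {
    local left=$1 right=$2
    local tab=$'\t'
    # join needs both inputs sorted on the join field with the same
    # collation, so sort and join both run under LC_ALL=C.
    # Output: key, remaining fields of left, remaining fields of right.
    LC_ALL=C join -t "$tab" -1 2 -2 1 \
        <(LC_ALL=C sort -t "$tab" -k2,2 "$left") \
        <(LC_ALL=C sort -t "$tab" -k1,1 "$right")
}
```

''join'' drops unpaired lines by default; adding ''-a 1'' keeps the unmatched lines of the first file, which helps spot queries without a PANTHER family.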
(...)
==== Step 3: Creating shells and running an HMMER search by PANTHER SUBFAMILY ====
1) You are now ready to create the master shell script for the HMMER search:
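As a hypothetical sketch, a master script can be generated with one ''hmmsearch'' line per subfamily HMM. ''--tblout'' and ''-E'' are standard ''hmmsearch'' options; the directory layout, the file names and the default E-value cutoff are assumptions. The function only writes the script; submitting it (for example with ''qsub'') depends on your cluster setup.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write a "master" script holding one hmmsearch command
# per subfamily HMM. Assumptions: one <name>.hmm file per subfamily in
# $hmm_dir, one protein FASTA database, and a default E-value cutoff of
# 1e-10 chosen only for illustration. Nothing is executed here.
write_hmmsearch_master() {
    local hmm_dir=$1 db=$2 out_script=$3 evalue=${4:-1e-10}
    local hmm name
    {
        echo '#!/usr/bin/env bash'
        for hmm in "$hmm_dir"/*.hmm; do
            [ -e "$hmm" ] || continue   # glob matched nothing
            name=$(basename "$hmm" .hmm)
            # --tblout writes a parseable per-target table; %q shell-quotes paths
            printf 'hmmsearch --tblout %q -E %s %q %q > %q\n' \
                "$name.tblout" "$evalue" "$hmm" "$db" "$name.out"
        done
    } > "$out_script"
    chmod +x "$out_script"
}
```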
(...)
==== Step 4: Parsing HMM outputs and creating inputs for a second HMM search ====
Parsing the HMM outputs
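Assuming the searches wrote ''--tblout'' tables (an assumption, since the parsing commands are not shown here), the hits can be pulled out with ''awk''. In that format, lines starting with ''#'' are comments, column 1 is the target name and column 5 is the full-sequence E-value; the cutoff is a placeholder to adapt to your data.

```shell
#!/usr/bin/env bash
# Sketch: list unique target names from an hmmsearch --tblout table whose
# full-sequence E-value (column 5) is at or below a cutoff (placeholder
# default 1e-10). Comment lines starting with '#' are skipped.
hits_below_evalue() {
    local tblout=$1 max_evalue=${2:-1e-10}
    awk -v max="$max_evalue" '!/^#/ && ($5 + 0) <= (max + 0) { print $1 }' "$tblout" \
        | sort -u
}
```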
(...)
==== Step 5: Creating shells and running an HMMER search ====
1) You are now ready to create the master shell script for the HMMER search by **PANTHER SUPERFAMILY**:
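Moving from subfamily to superfamily searches means collapsing subfamily IDs to their parent. PANTHER subfamily IDs have the form ''PTHR10000:SF5'', where the part before the colon is the family that this page calls the superfamily. A sketch, assuming a hypothetical input file with one subfamily ID per line:

```shell
#!/usr/bin/env bash
# Sketch: list the unique PANTHER (super)family IDs behind a set of subfamily
# IDs (one per line). The text before ':' is the family ID; blank lines
# (NF == 0) are skipped.
superfamilies_from_subfamilies() {
    awk -F ':' 'NF { print $1 }' "$1" | sort -u
}
```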
(...)
qsub Panther.HmmrSearch.sh2
==== Step 6: Parsing HMM outputs and creating inputs for a second HMM search ====
Parsing the HMM outputs
(...)
7) Create an input file (''Input4ETE'') to be used later with the ''ETE_standAlone1.4.py'' script.
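A hypothetical sketch of building ''Input4ETE'' in the tab-separated ''fastafile'' / ''treefile'' format that Step 8 expects. It assumes every ''NAME.fasta'' has a tree called ''NAME.treefile'' in the same directory; both extensions are placeholders for whatever your tree-building step writes.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: write Input4ETE as "fastafile<TAB>treefile" lines.
# Assumption: every NAME.fasta in $dir has a tree called NAME.treefile next
# to it; both extensions are placeholders.
make_ete_input() {
    local dir=$1 out=$2 fasta tree
    : > "$out"                          # create or truncate the output file
    for fasta in "$dir"/*.fasta; do
        [ -e "$fasta" ] || continue     # glob matched nothing
        tree="${fasta%.fasta}.treefile"
        if [ -e "$tree" ]; then
            printf '%s\t%s\n' "$fasta" "$tree" >> "$out"
        else
            echo "no tree found for $fasta" >&2   # report unpaired files
        fi
    done
}
```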
==== Step 7: Start the tree search by submitting your jobs ====
ls -1 *Reconstruction.sh > list_of_shells
for i in `cat list_of_shells`;
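A sketch of an equivalent submission loop that reads ''list_of_shells'' line by line, which, unlike iterating over ''`cat list_of_shells`'', keeps file names containing spaces intact. The submit command is a parameter (default ''qsub'') so that passing ''echo'' gives a dry run; the function name and that option are illustrative additions.

```shell
#!/usr/bin/env bash
# Sketch: submit every script listed (one per line) in a list file.
# The submit command defaults to qsub; pass e.g. echo for a dry run.
submit_all() {
    local list=$1 submit=${2:-qsub} script
    while IFS= read -r script; do
        [ -n "$script" ] || continue    # skip blank lines
        "$submit" "$script"
    done < "$list"
}
```

Usage: ''submit_all list_of_shells echo'' to preview, then ''submit_all list_of_shells'' to submit.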
==== Step 8: Map the protein domain architecture onto each tree and build a PDF file per PANTHER superfamily ====
1) Create a tab-separated input file with one record per line, in the format: fastafile<TAB>treefile
source activate python27-generic
xvfb-run -a python ETE_standAlone1.4.py Input4ETE
source deactivate
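Before launching the ETE script it can help to validate ''Input4ETE'' against the format described in item 1. This sketch is a hypothetical helper, not part of the original scripts: it checks that each non-empty line has exactly two tab-separated fields and that both files exist.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a "fastafile<TAB>treefile" input file.
# Reports offending lines on stderr and returns non-zero if any are found.
check_ete_input() {
    local input=$1 status=0 n=0 line fasta tree extra
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n + 1))
        [ -n "$line" ] || continue
        IFS=$'\t' read -r fasta tree extra <<< "$line"
        if [ -z "$tree" ] || [ -n "$extra" ]; then
            echo "line $n: expected 2 tab-separated fields" >&2; status=1
        elif [ ! -e "$fasta" ] || [ ! -e "$tree" ]; then
            echo "line $n: missing file" >&2; status=1
        fi
    done < "$input"
    return $status
}
```

Usage: ''check_ete_input Input4ETE && xvfb-run -a python ETE_standAlone1.4.py Input4ETE''.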

==== Step 9: Creating a tab-separated file to keep track of the findings ====

NOTE: You will need a metadata file. It may consist of accession numbers and a FASTA header; see the format of the metadata provided for this example '
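As a hypothetical sketch, if both the metadata and the findings are headerless tab-separated files keyed by accession in column 1 (an assumed layout), ''awk'' can append the metadata description to each finding and write ''NA'' when an accession has no metadata.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: append metadata (column 2 of the metadata file) to
# each row of a findings table, matching on the accession in column 1.
# Both files are assumed tab-separated and headerless.
annotate_with_metadata() {
    local metadata=$1 findings=$2
    awk -v FS='\t' -v OFS='\t' '
        NR == FNR { desc[$1] = $2; next }                  # first file: load metadata
        { print $0, (($1 in desc) ? desc[$1] : "NA") }     # second file: annotate
    ' "$metadata" "$findings"
}
```

The ''NR == FNR'' idiom assumes the metadata file is not empty; with an empty metadata file the findings would be read as metadata instead.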
(...)
Error: File existence/
==== Step 10: Move the PDF files and '… ====
ortologs_searches_using_panther_hmmrs.1682618868.txt.gz · Last modified: by 134.190.232.186
