
ProtocolMaker: how to tidy up your bioinformatics work into a protocol

The difference between a good student and a great one is that a good student is concerned more about the outcome while a great one is fascinated by the process of learning. - Prof. Feynman

Bioinformatics projects present many challenges, and overcoming them means progress. There is also a bonus to be gained along the way, because the real exploration is not limited to the data analysis itself. The intermediate work, such as which method you choose, how you proceed with the analysis, and what you do when the outcome is unexpected, is worth sharing, and it can become a bioinformatics publication in its own right. Here, as an early-career researcher working on different bioinformatics projects, I will share my experience with real protocol cases: using a pipeline for minimizing redundancy and complexity in large phylogenetic datasets, executing a tool to merge and minimize “bad words” from BLAST hits against multiple eukaryotic gene annotation databases, and running a web server for identifying, annotating, categorizing, and visualizing duplicated genes in eukaryotic genomes.

Because protocols are a different format from research articles or reviews, they require a different approach to writing. The primary criteria for publication in Protocols are usability and reproducibility: the aim is to make the day-to-day life of researchers easier by publishing robust, clear, step-by-step computational and experimental protocols. Authors should therefore focus on the clarity of the procedures rather than on the novelty of the protocol.

There are several guidelines to consider:

Clarity – Are the steps clear and easy to understand?
Timing – Do the authors include information about the time that each step or section of the protocol takes?
Critical Steps – Do the authors note the critical steps that must be followed exactly?
Troubleshooting – Do the authors include tips on what went wrong in their own lab, which steps they found tricky, and what the solutions were? Does the Troubleshooting section refer to the corresponding protocol step?
Alternatives – Are there any reagents or pieces of equipment that cannot be substituted, or are there reagents with similar versions from other companies?
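
For a computational protocol, the Timing point is the easiest one to answer with data rather than guesswork. The snippet below is a minimal sketch of one way to do that, not the method used in any of the protocols listed in the references: the names timed_step and format_timing_table are hypothetical helpers invented for this illustration, and the "work" inside the step is a placeholder for a real command such as a BLAST search or a tree-building run.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_step(name, log):
    """Record the wall-clock duration of a protocol step in `log`.

    Hypothetical helper for illustration only; `log` is a plain list
    that receives (step name, seconds) tuples.
    """
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the step raises, so failed steps are timed too.
        log.append((name, time.perf_counter() - start))


def format_timing_table(log):
    """Return a plain-text table that can be pasted into a Timing section."""
    lines = [f"{'Step':<40}{'Seconds':>10}"]
    for name, seconds in log:
        lines.append(f"{name:<40}{seconds:>10.2f}")
    return "\n".join(lines)


if __name__ == "__main__":
    timings = []
    # Placeholder work; a real protocol step would go here instead.
    with timed_step("Step 1: placeholder computation", timings):
        sum(range(100_000))
    print(format_timing_table(timings))
```

Wrapping each numbered step this way while you run the analysis gives you measured durations on your own hardware, which you can then report alongside the machine specification so that readers know what to expect.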

References:

Xi Zhang*, Yining Hu, David Roy Smith*. (2021). Protocol for HSDFinder to help identify, categorize and annotate duplicate genes in eukaryotic nuclear genomes. STAR Protocols. DOI: 10.1016/j.xpro.2021.100619
Xi Zhang*, Yining Hu, David Roy Smith*. (2021). Protocol for using NoBadWordsCombiner to merge and minimize ‘bad words’ from BLAST hits against multiple eukaryotic gene annotation databases. STAR Protocols. DOI: 10.1016/j.xpro.2021.100888
Xi Zhang*, Yining Hu, Laura Eme, Shinichiro Maruyama, Robert JM Eveleigh, Bruce A. Curtis, Shannon J. Sibbald, Julia F. Hopkins, Gina V. Filloramo, Klaas J. van Wijk, John M. Archibald*. (2022). TreeTuner: A pipeline for minimizing redundancy and complexity in large phylogenetic datasets. STAR Protocols. DOI: 10.1016/j.xpro.2022.101175
Xi Zhang*, Yining Hu, Zhenyu Cheng, John M. Archibald*. (2022). HSDecipher: pipeline for comparative genomics analysis of highly similar duplicates data in eukaryotic genomes. STAR Protocols.