When I was in college, I did an interdisciplinary research project with my immunology professor and another professor in the computer science department. I was supposed to make a computer model of a specific cell-signaling pathway, where many proteins interact in a particular order. I was using a program called NetLogo, which is a graphical user interface and programming language meant for scientists to model various phenomena, as well as for educational purposes. Although the coding in NetLogo was relatively simple, I would become so frustrated and stuck trying to figure out every word and line that I was nearly in tears each time I worked on it.
Flash forward to now, the third year of my Ph.D., and I am finding that learning various types of code is almost a necessity for me. My research focuses primarily on the gut microbiome, and studying this involves sequencing the 16S rRNA gene, which all bacteria carry. By determining the sequence of nucleotides in that gene, one can use curated databases of bacterial genomes to determine what bacteria are present in the sample. Normally, I would prep my DNA samples and then send them away to a sequencing core. Then, the sequencing results would be sent to a bioinformaticist who would analyze them and send back some figures and statistics. This is great, but I quickly learned that this was not giving me the education I needed to be an expert on this topic. How could I graduate with a Ph.D. if I didn’t even know the steps that went into the analysis of my own data?
Well, this problem led me down what has now been a seven-month rabbit hole of learning how to do it myself! Along the way, I’ve run into countless roadblocks, but instead of getting frustrated, I’m excited for the next step. I’m even going to go as far as to say I’m having fun coding, and there is definitely a weird, nerdy endorphin rush when you get something to work without errors. Here are a few tips from someone who is just getting into coding and wants to encourage others to learn, too!
- Phone a friend. When you get stuck, what do you do? Well, chances are someone has already run into the same problem before you. And that person probably posted about it on a forum. If you’re trying to learn, you will quickly find that forums are your saviors. Reading threads that relate to an issue you’re having is amazingly helpful and informative. Often, even experts chime in. In addition to reading forum posts, don’t be too shy to make your own! For example, for my microbiome analysis, I have been using QIIME2. It has an incredibly active community of users and developers, and the QIIME2 forum is a treasure trove of information. I have made extremely specific posts on the forum and received great and timely feedback on how to solve various issues. In general, it is always good practice to include as much information in your posts as possible (e.g., the version of software you are running, the exact commands you used, the exact error message you came across). Tip: Use the --verbose flag in your command to get a more detailed explanation of what’s going on. With that extra detail in your post, you’re more likely to receive helpful input to solve your problem!
- Take a course. Johns Hopkins is a huge institution with many courses. If you are wondering about a piece of software or a coding language, chances are a course about it is offered, or at least someone knows about it! For example, this fall I took a class on R through the Center for Computational Genomics. There are lots of different, useful programming languages for biologists, believe it or not. R can run almost any statistical test you can imagine, in addition to making beautiful figures and graphs. Also, countless add-on packages such as ggplot2 are available that can get you even fancier graphs. Other languages like Bash and Python can be useful for handling big data sets.
- Use a cluster. Your little MacBook Air might not be able to handle all the data you generate. It might try running a command for five days before pooping out on you and failing because you only have 8GB of RAM, even though you carried around your laptop, open, for those five days so it wouldn’t quit (this may have happened to me). Well, do I have a solution for you: Use a computing cluster! A cluster is a group of computers that work together so they can essentially function as one mega computer system. For example, Johns Hopkins has access to the cluster Bluecrab, which is available through the Maryland Advanced Research Computing Center (MARCC). The great thing about using Bluecrab is that you can log into it remotely on your computer through Terminal, and then have access to all of its computing power! This way, the commands are running on the cluster and not your computer. Another helpful aspect is that MARCC uses Slurm (originally short for Simple Linux Utility for Resource Management) to queue jobs. Basically, you can write a snippet of code you’d like to run, submit it to Slurm as a job, then close your computer and do whatever you want. Take a nap. Go to the gym. Maybe do your lab work. No need to constantly keep your computer awake! You can even specify in your job script that you want to receive an email when your job is completed.
- Keep. Extensive. Notes. Keeping a detailed log of the code you use (specifically, code that works) will help others understand what you did. But most of all, it will help you. There are many ways to do this; the simplest is using a Word document or a text editor. However, there are some fancier ways, too. Jupyter notebooks are popular: they are web-based, and allow the user to create and share documents that contain code, equations, visualizations and text. These notebooks can be shared for collaboration with other users, and they support over 40 programming languages. An app called Quiver has similar features. Whatever method you use to keep track of code, you can always include #comments in line with your code. In languages like R, Python and Bash, anything following the hashtag on that line won’t run, so it is a great way to include an explanation of what your commands mean.
- The internet is your best friend. Bottom line of all these tips: Basically, you can find out anything you want to know on the internet. There are so many amazing and comprehensive tutorials, and even free courses, that people have spent a lot of time making, so go ahead and use them! Here are a few great resources, and this only scratches the surface:
- Codecademy: Good for learning the basics of widely used languages (for example, Python)
- R Tutorial: An introduction to common uses of R
- LinuxCommand.org: A comprehensive website for basic use of the command line
- Coppola Lab Bioinformatics Training Resources: A very extensive guide for learning bioinformatics
- edX: A high-quality online course platform founded by Harvard University and the Massachusetts Institute of Technology, with more than 100 computer science courses
- explainshell.com: A useful website for learning shell commands
- Code.org: Mainly geared toward teaching children to code, but it’s fun
- Introduction to Applied Bioinformatics: A continuously evolving resource introducing core concepts of bioinformatics in the context of their application
- Derek Banas’ YouTube channel: In-depth videos on coding languages
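To make the cluster tip a little more concrete, here is a minimal sketch of what a Slurm job with an email notification might look like. It is written in Python only so the pieces are easy to read and tweak; the function name `build_slurm_script`, the job name, the email address and the placeholder command are all made up for illustration, and your own cluster may require extra settings (such as a partition or account), so check its documentation before using anything like this.

```python
def build_slurm_script(job_name, command, email, hours=2, mem_gb=16):
    """Return the text of a Slurm batch script (hypothetical helper).

    The #SBATCH lines are standard Slurm options; the values used here
    are illustrative assumptions, not recommendations for any cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",      # name shown in the queue
        f"#SBATCH --time={hours:02d}:00:00",   # wall-clock limit (hh:mm:ss)
        f"#SBATCH --mem={mem_gb}G",            # memory requested for the job
        "#SBATCH --mail-type=END,FAIL",        # email when the job ends or fails
        f"#SBATCH --mail-user={email}",        # where that email is sent
        "",
        "# Everything below the #SBATCH lines runs on the cluster",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder values: swap in your own job name, command and address.
    script = build_slurm_script(
        job_name="my_analysis",
        command='echo "replace this line with your own command"',
        email="you@example.edu",
    )
    print(script)
```

If you save the printed text to a file (say, `my_job.sh`), you would typically submit it from the cluster's login node with `sbatch my_job.sh`, and Slurm takes it from there while you take that nap.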
Happy coding, everyone!
Want to read more from the Johns Hopkins School of Medicine? Subscribe to the Biomedical Odyssey blog and receive new posts directly in your inbox.