Drowning in Data: New Challenges in Modern Biology
Biology is a vast and rapidly evolving field. The science of life ranges from studying the physical and chemical interactions of molecules to mapping a network of billions of brain cells. How can we make sense of an entity as complicated as an organism? A prevailing attitude in biomedical research has been that if we just develop the tools to observe biological systems at every level, we will be able to understand how an organism functions.
Recently developed tools have brought us closer than ever to our goal of collecting enough biological data. In fact, the emphasis is beginning to shift away from creating new data-collecting tools toward developing new approaches to analyze the large amount of data already collected. Two areas that have had particularly rapid expansion are the study of genes and the study of the brain. It is becoming clear that creative analysis of datasets will be just as important as collecting new information to understand each of these aspects of biology.
DNA: Just the Beginning
Less than 20 years ago, the human genome was sequenced for the first time. Essentially a code made up of over three billion units (base pairs) of DNA, our genome contains the instructions to create every cell in the body. So if scientists have already cracked the code, what else is there left to study? A lot, it turns out. One of the biggest challenges in genetics is determining how the same DNA sequence, which is inside every cell in your body, can be used to create a myriad of different cell types that make up the body’s distinct organs.
One approach is to look at which RNAs are present in each cell. For DNA to be used to synthesize proteins, the building blocks of cells, it is first transcribed into messenger RNA (mRNA), which contains the instructions for how to assemble the proteins. By measuring which mRNAs are present in each cell, researchers can obtain a snapshot of which proteins are important for creating that cell.
What’s New: Sequencing
Our ability to measure RNA levels has skyrocketed thanks to advances in a technology called RNA sequencing (RNA-Seq). Essentially, RNA-Seq allows scientists to compare how much of each mRNA is present in each cell, which makes it an incredibly powerful tool for determining how the cell functions and what makes that cell unique. But as good as that sounds, such analyses quickly become unwieldy because there are tens of thousands of genes encoded in the genome and trillions of cells in the body. Comparing cells in a healthy individual versus an individual with disease adds another layer to the puzzle.
Sorting through this vast genetic dataset is one challenge tackled by researchers in the field of computational genomics. Scientists rely on computer programs that find patterns in the levels of mRNA across hundreds of cells to “cluster” them into groups of cells with common mRNA expression. For instance, a recent study from researchers at Johns Hopkins identified two distinct populations of cells in the cortex with different locations and long-range connections based solely on levels of mRNA in each cell. With efforts like these, future breakthroughs in understanding how genetics contributes to the function of the body in health and disease will likely occur on the computer rather than at the lab bench.
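To make the idea of clustering concrete, here is a minimal sketch. It is not the method used in the study mentioned above: the expression values, the three genes, and the choice of a basic k-means procedure are all assumptions made for illustration.

```python
import math

def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_profile(profiles):
    """Average expression level of each gene across a group of cells."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]

def kmeans(cells, initial_centers, iterations=10):
    """Group cells by repeatedly assigning each one to its nearest center."""
    centers = [list(c) for c in initial_centers]
    labels = [0] * len(cells)
    for _ in range(iterations):
        # Assign every cell to the closest cluster center.
        labels = [
            min(range(len(centers)), key=lambda k: distance(cell, centers[k]))
            for cell in cells
        ]
        # Move each center to the average profile of its members.
        for k in range(len(centers)):
            members = [c for c, lab in zip(cells, labels) if lab == k]
            if members:
                centers[k] = mean_profile(members)
    return labels

# Hypothetical mRNA levels (arbitrary units) for genes A, B, C in six cells.
cells = [
    [9.0, 1.0, 0.5],
    [8.5, 1.2, 0.7],
    [9.2, 0.8, 0.4],
    [1.0, 7.5, 6.0],
    [0.8, 8.0, 6.5],
    [1.3, 7.8, 5.9],
]

# Seed the two clusters with the first and last cell (an arbitrary choice).
labels = kmeans(cells, initial_centers=[cells[0], cells[-1]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

The first three cells, which share high levels of gene A, end up in one group, and the last three, which share high levels of genes B and C, end up in the other. Real datasets involve thousands of genes and cells and typically rely on dedicated analysis software rather than a hand-written loop like this one.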
The Brain: The Most Complicated Network of All
Some 80 years ago, our understanding of the nervous system took a giant leap with the characterization of the electrical signals within one special nerve fiber: the squid giant axon. Because nerve cells use electrical currents to transmit information along their processes, we have a unique opportunity to monitor the activity of the nervous system by recording changes in electrical activity, an approach known as electrophysiology.
Unfortunately, it is no easy task to collect the electrical activity from even a subset of the billions of neurons in our brain. Decades of work have made inferences about the functions of various parts of the brain from studies that recorded the activity of at most a few dozen cells. How can we ever understand the basis of emotion, decision-making or consciousness by sampling a tiny fraction of the brain’s cells?
What’s New: Mapping
New technologies have allowed us to start collecting data from much larger populations of cells. One exciting advance is using GCaMP, a fluorescent calcium indicator, to visualize when cells are active using calcium imaging. GCaMP produces fluorescent light when it comes into contact with calcium, which is closely linked to the electrical activity of a cell. So instead of directly measuring electrical activity, calcium imaging uses a microscope to detect changes in the amount of fluorescent light related to a cell’s activity. This allows detection of the activity of hundreds of cells at once.
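As a rough illustration of how a change in fluorescent light can be turned into a measure of activity, the sketch below computes the relative change in brightness against a resting baseline, a quantity often written as ΔF/F. The fluorescence readings, the four-frame baseline, and the 20% threshold are all assumptions chosen for the example.

```python
def delta_f_over_f(trace, baseline_frames):
    """Relative fluorescence change of each frame versus a resting baseline."""
    baseline = sum(trace[:baseline_frames]) / baseline_frames
    return [(f - baseline) / baseline for f in trace]

# Hypothetical fluorescence readings for one cell, one value per frame.
trace = [100.0, 100.0, 100.0, 100.0, 150.0, 180.0, 130.0, 100.0]

# Treat the first four frames as the resting baseline (an assumption).
dff = delta_f_over_f(trace, baseline_frames=4)

# Flag frames where fluorescence rose by more than 20% (arbitrary threshold).
active_frames = [i for i, value in enumerate(dff) if value > 0.2]
print(active_frames)  # [4, 5, 6]
```

Repeating this calculation for every cell in the microscope's field of view is what yields activity traces for hundreds of cells at once.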
Just as with RNA-Seq, however, having the ability to collect the data is only half the battle. True understanding of how the brain processes information and produces behavior will require novel approaches to interpreting the brain’s activity. One increasingly popular tactic is to use machine learning. Researchers feed a portion of the neural activity into a machine learning program, which builds a model that can then predict what the animal is experiencing during the remaining portion of the data. How well the model predicts the animal’s actions or experiences is a clue about what kind of information is contained within the activity of those cells. One study using this approach shows that the activity of cells in the orbitofrontal cortex can be used by a model to predict how quickly a monkey will make a choice between two reward options.
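The train-then-predict idea can be sketched as follows. This is a toy example, not the model from the study above: the activity values for three cells, the "fast" and "slow" labels, and the nearest-average classifier are all assumptions made for illustration.

```python
import math

def centroid(rows):
    """Average activity pattern across a set of trials."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(trials, labels):
    """Build the model: one average activity pattern per label."""
    model = {}
    for label in set(labels):
        rows = [t for t, lab in zip(trials, labels) if lab == label]
        model[label] = centroid(rows)
    return model

def predict(model, trial):
    """Pick the label whose average pattern is closest to this trial."""
    return min(model, key=lambda label: math.dist(trial, model[label]))

# Hypothetical activity of three cells on eight trials, with the
# (made-up) speed of the animal's choice on each trial.
trials = [
    [5.0, 1.0, 2.0], [1.0, 4.0, 2.5], [4.5, 1.5, 2.2], [0.8, 4.2, 1.9],
    [4.8, 0.9, 2.1], [1.2, 3.8, 2.4], [5.2, 1.1, 1.8], [0.9, 4.5, 2.0],
]
labels = ["fast", "slow", "fast", "slow", "fast", "slow", "fast", "slow"]

# Train on the first half of the data, then test on the held-out half.
model = train(trials[:4], labels[:4])
predictions = [predict(model, t) for t in trials[4:]]
correct = sum(p == actual for p, actual in zip(predictions, labels[4:]))
accuracy = correct / len(predictions)
print(predictions, accuracy)
```

Because the model never sees the held-out trials during training, a high accuracy on them suggests that the recorded activity really does carry information about the behavior, rather than the model simply memorizing the examples it was given.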
These two examples of large, complicated biological datasets are just a sampling of the opportunities modern computing offers for gaining new insights into the function of biological systems. Biology has already started moving from the lab to the computer, and that shift will only continue in the coming years.