“I need to go check on my babies,” is a common statement among scientists. Unsuspecting onlookers may not realize that the scientists are, in fact, referring to their cells. While gently swishing their pink cell culture media during one of their washing steps, they will use smooth circles to avoid killing the fragile cells that they worked hard to rear.
But, as in any relationship, even the personified relationship between scientists and their cells, an underlying falsehood might be lurking undetected. The scientist may not know who exactly the cell babies are. Many times, their loved ones are contaminated or simply misidentified.
The misidentification and/or contamination of cells is thought to be rather common, with estimates indicating that about one-fifth to one-third of cells studied in research are affected. Biospecimen authentication seeks to address the problem of irreproducibility that can arise when cell lines are misidentified or contaminated. One proposed method is the use of short tandem repeats.
Short tandem repeats (STRs) are DNA sequences consisting of one to six base pairs that are repeated many times. About 3% of the human genome is thought to consist of these short tandem repeats, around 92% of which are non-protein coding regions typically regarded as “junk DNA.” Individuals have highly polymorphic, or variable, STRs; for example, individual A may have X number of repeats at a certain genomic position, while individual B may have Y number of repeats in that same region.
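To make the idea of "X repeats versus Y repeats" concrete, here is a minimal Python sketch that counts back-to-back copies of a motif in a DNA string. The sequences, the GATA motif, and the function name `longest_repeat_run` are all invented for illustration; in practice, repeat counts are typically measured in the lab rather than by searching text.

```python
import re

def longest_repeat_run(sequence, motif):
    """Return the largest number of back-to-back copies of motif in sequence."""
    # Find every stretch made only of consecutive copies of the motif.
    runs = re.findall(f"(?:{re.escape(motif)})+", sequence)
    # Convert each stretch length into a repeat count and keep the largest.
    return max((len(run) // len(motif) for run in runs), default=0)

# Hypothetical sequences at the same genomic position in two individuals.
individual_a = "CCT" + "GATA" * 7 + "TTC"
individual_b = "CCT" + "GATA" * 10 + "TTC"

print(longest_repeat_run(individual_a, "GATA"),
      longest_repeat_run(individual_b, "GATA"))  # prints: 7 10
```

Here individual A carries 7 repeats and individual B carries 10 at the same position, which is the kind of difference STR profiling relies on.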
With humans sharing about 99.9% of their DNA with one another, it can be hard to find genomic positions that can be used as markers for individual identification, particularly in cases in which a high degree of precision is required, such as forensic criminal investigations. In their quest to identify genomic regions that can be used for criminal suspect identification, DNA analysts in the FBI developed a system called the Combined DNA Index System (CODIS) in 1997. CODIS examines 13 polymorphic STR regions, quantifies the number of repeats at each of these genetic positions, and compares the counts to a population-representative sample containing the distribution of repeats at these same locations. CODIS provides a match based on all 13 regions, with the likelihood of a false positive match estimated at about 1 in 1 billion.
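The arithmetic behind such small false-positive estimates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the allele frequencies below are invented, only three loci are shown instead of 13, and the calculation assumes the loci are independent and that genotype frequencies follow the textbook p² (two identical alleles) and 2pq (two different alleles) rule. The function names are hypothetical and do not describe how CODIS itself is implemented.

```python
from math import prod

# Hypothetical allele frequencies: locus -> {repeat count: frequency}.
# These numbers are made up for illustration, not taken from a real database.
ALLELE_FREQS = {
    "TH01": {6: 0.20, 7: 0.10},
    "TPOX": {8: 0.50, 11: 0.25},
    "CSF1PO": {10: 0.25, 12: 0.30},
}

def genotype_frequency(locus, alleles):
    """Expected population frequency of a pair of alleles at one locus."""
    a, b = alleles
    p, q = ALLELE_FREQS[locus][a], ALLELE_FREQS[locus][b]
    # p^2 for two identical alleles, 2pq for two different alleles.
    return p * p if a == b else 2 * p * q

def random_match_probability(profile):
    """Multiply per-locus genotype frequencies, assuming independent loci."""
    return prod(genotype_frequency(locus, alleles)
                for locus, alleles in profile.items())

def profiles_match(profile_1, profile_2):
    """Two profiles match only if every locus carries the same allele pair."""
    return (profile_1.keys() == profile_2.keys()
            and all(sorted(profile_1[locus]) == sorted(profile_2[locus])
                    for locus in profile_1))

# Hypothetical profile: a pair of repeat counts at each locus.
sample = {"TH01": (6, 7), "TPOX": (8, 8), "CSF1PO": (10, 12)}
print(random_match_probability(sample))
```

Each additional locus multiplies in another fraction, which is why a match across many regions becomes so unlikely to occur by chance.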
Forensic scientists compare DNA collected at a crime scene to the CODIS registry containing the DNA profiles of convicts and arrestees; the system may search for either a direct hit or a familial match. Since its inception, CODIS has profiled about 20 million individuals and aided over 545,000 investigations. With the advent of genomic technologies and direct-to-consumer genetic tests provided by companies such as 23andMe and Ancestry.com, there has been renewed fear of how this genetic data may be used by law enforcement. Though 23andMe and Ancestry.com have denied granting law enforcement access to DNA information, other less regulated, third-party websites to which genetic results might be uploaded for more health information may grant this access.
Of notable importance is how the data is formatted. For example, 23andMe uses genotyping data based on single nucleotide variants (SNVs) in the genome, which is not the typical format used by forensic scientists. Keeping genetic information in different formats could also help maintain silos between research and law enforcement, which helps to allay ethical, legal and social concerns.
Debra Matthews from the Johns Hopkins Berman Institute of Bioethics and Department of Genetic Medicine and Natalie Ram in the Carey School of Law at the University of Maryland address the importance of maintaining silos between law enforcement and biomedical research in a policy forum published in Science, titled “Get law enforcement out of biospecimen authentication.” Bringing to light the common practice of using STR kits for biospecimen authentication, Matthews and Ram highlight how the tools used by biomedical researchers, which largely resemble those of law enforcement, have the potential to alienate research participants who belong to marginalized communities. The use of STR kits may additionally invite the involvement of law enforcement in biomedical research, particularly in population studies in which biological samples are collected, threatening principles of confidentiality and deidentification.
Using DNA data obtained from direct-to-consumer testing, police have been able to make 200 arrests. In one case, the San Francisco police repurposed the DNA of a victim of sexual assault to identify her as a suspect in a property crime. Maintaining a different system of biospecimen authentication, such as one based on SNVs, and steering away from STRs, which erode the silos between research and law enforcement, may assist scientists in their goal of including marginalized communities in their research. The importance of this cannot be overstated, particularly in the context of collecting biological samples in biobanks for widespread use. While scientists may want to quickly confirm through biospecimen authentication that their cell babies are who they think they are, the landscape in which these authentication procedures are performed should not be neglected as we collectively think about the implications of the tools we use in our research.