Most would agree that a principal goal of scientific research is to enhance society’s understanding of the world around us. In biomedical research, we are particularly interested in discovering the workings of the body to find better treatments for disease and enact better guidelines and policies for healthy living.
A crucial element of this endeavor is sharing the results of our research with a wide audience so that our findings can be applied now and drawn upon for future research. Despite the importance of this goal, two major barriers stand in the way of freely sharing our findings: paywalled scientific journals and opaque data analysis methods. Fortunately, there has been a trend in recent years to address these issues and make scientific data more accessible.
If you have ever tried to read a research article from home (or anywhere outside an academic institution), you have probably encountered a message informing you that to read the full article, you need to pay something like $10–$30, or purchase a subscription to the journal for around $200–$300. This cost is prohibitive for most people interested in reading the article. If you are trying to get a grasp on the latest research on, say, diabetes, you would need to read dozens of articles, which could easily cost hundreds, if not thousands, of dollars.
So how do journals get away with this? How can anyone access enough articles to get an understanding of the field or conduct their own research? To spare students and employees these costs, institutions such as Johns Hopkins pay subscription fees that give anyone on their campus access to a journal’s articles. It is a seamless process for those on campus: the website automatically recognizes that you are accessing the article from the institution and grants you access. In effect, however, this creates a system where only people affiliated with an institution that can pay for journal access can actually read the latest scientific findings.
There are considerable moral issues with this approach. After all, billions of dollars of taxpayer money go into scientific research every year. Shouldn’t everyone be able to read the results that their money is paying for? And shouldn’t any person who is curious about a scientific topic be able to read about the latest findings? How many treatments, new policy ideas, and inventions are delayed because certain discoveries are hidden behind a paywall?
Recently, there have been several efforts to change the scientific publishing system. One is the rising use of preprint servers such as arXiv and, specifically for biological and biomedical research, bioRxiv. These are open databases where researchers can post their manuscripts before they have been published by a scientific journal. They are a good opportunity to get early feedback on your work before it gets reviewed by a journal, and they allow you to share your results much more quickly than waiting for publication, since the journal review process can take months (or, in the worst case, years).
This brings up the question: why do we submit scientific work to journals at all if we can just put papers up on a server? The main purpose of scientific journals is to facilitate peer review: the process whereby new research articles are sent to other scientists in the field to evaluate the experimental results and suggest improvements to the article. Preprints have not been peer-reviewed, so anything you read on these servers may have issues with the methodology or data interpretation that have yet to be worked out. Fortunately, many journals allow you to upload the final (peer-reviewed) version of your article to the preprint server, so bioRxiv may yet be the best place to find freely available biomedical research.
There has been a shift toward accessibility within the journal community as well. Many new “open access” journals have formed over the last few years, with the fundamental goal of making the peer-review process more transparent and the final articles available to all. The trend was popularized by PLOS ONE and continued by the Frontiers journals and eLife. These journals still require authors to pay to publish papers, but they allow unrestricted access to the final versions, which carry a Creative Commons Attribution license (CC BY), permitting free use of the material contained within as long as the authors are properly acknowledged. Open access journals are well regarded in the scientific community, and their popularity has encouraged the traditional journals to start open access initiatives as well, such as Nature’s fully open access journal, Nature Communications. Thankfully, with these new publishing options, the scientific literature is more accessible than ever and is getting easier to access every year.
Journal articles are a great way for scientists to summarize their findings, but, especially with the increase in size and complexity of scientific datasets (as I wrote about previously), not all the data can be presented in the paper. Modern biomedical studies can involve collecting gigabytes of microscope images, genetic information from individual cells, or recordings of brain activity, for example. Scientific journals are not equipped to host all this data or include it in published articles, but providing access to it is a critical part of disseminating information to a broader audience. For instance, many studies rely on complex, custom-written code to analyze the data properly, and sharing that code with other researchers helps scientists build upon each other’s findings rather than starting from scratch every time. Moreover, there may be discoveries hidden within a dataset that the original researchers did not think to look for; making the data accessible gives other people the opportunity to derive new findings.
For these reasons, there has been a large push, by both journals and researchers, for scientists to upload the original data and code associated with a scientific article to an online database. Examples include the Image Data Resource for microscope images, ArrayExpress for genomic information, and GIN for neuroscience data. Hopefully, an increase in the amount of publicly accessible data will lead both to improvements in the way we analyze data, as individuals are able to check and critique each other’s work, and to more discoveries, as individuals have the opportunity to mine already collected data.