‘Pushing Cancer Research Forward:’ Powerful ISB Cancer Genomics Cloud Tool in Spotlight
As technological improvements and an influx of raw data rapidly accelerate how cancer research is conducted, organizations are rushing to develop computing tools to keep up.
Institute for Systems Biology (ISB), Google and CSRA have jointly created a cloud-based platform that allows researchers to quickly, reliably and securely access massive amounts of data in ways that, until now, haven’t been possible.
The ISB Cancer Genomics Cloud (ISB-CGC) began as a pilot program three years ago, one of three funded by the National Cancer Institute (NCI), and was described this month in a special issue of Cancer Research.
“Cancer data sets have become too large for individual researchers to be able to work with them on their own machines – moving the analysis to the cloud is the only way to be able to continue pushing cancer research forward,” said Dr. Sheila Reynolds, senior research scientist in ISB’s Shmulevich Lab and lead author of “The ISB Cancer Genomics Cloud: A Flexible Cloud-Based Platform for Cancer Genomics Research.”
ISB-CGC provides cancer researchers several ways to access and analyze petabytes of genomic and molecular information produced by The Cancer Genome Atlas (TCGA). The ability to optimize different workflows makes the platform useful to a variety of researchers, from algorithm developers to computational research scientists to biologists and clinicians. Some of the use cases include file-based analysis, query-based analysis and web-based interactive analysis.
Improving treatment and outcomes
In the past year, more than 21,000 terabytes of ISB-CGC cloud-hosted data were accessed, over 99 percent of which were further analyzed in the cloud, allowing researchers to avoid the time-consuming and expensive download step. In that same time period, the ISB-CGC database resources were accessed more than 155,000 times.
The ISB-CGC data repository includes more than 2 petabytes of accessible data. (To put that in perspective, it would take about seven years to download that amount of information over a typical “fast” home internet connection.) Processing this amount of data requires more computational power than what is available to most scientists. In fact, most research institutions don’t have the capacity to store that amount of data, much less analyze it.
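The seven-year figure can be checked with simple arithmetic. The sketch below is illustrative only: it assumes decimal petabytes (10^15 bytes), a 365.25-day year and a connection that sustains its full speed the whole time, none of which the article specifies. The function names are hypothetical. Under those assumptions, downloading 2 petabytes in seven years works out to a sustained speed of roughly 72 megabits per second.

```python
# Back-of-the-envelope check of the "seven years to download 2 petabytes" figure.
# Assumptions (not stated in the article): decimal units, 365.25-day years,
# and a connection that runs at full speed without interruption.

SECONDS_PER_YEAR = 365.25 * 24 * 3600
BITS_PER_PETABYTE = 8 * 10**15  # 1 PB taken as 10^15 bytes


def download_years(petabytes, megabits_per_second):
    """Years needed to download `petabytes` at a sustained speed in Mbps."""
    bits = petabytes * BITS_PER_PETABYTE
    seconds = bits / (megabits_per_second * 10**6)
    return seconds / SECONDS_PER_YEAR


def implied_mbps(petabytes, years):
    """Sustained speed in Mbps implied by downloading `petabytes` in `years`."""
    bits = petabytes * BITS_PER_PETABYTE
    return bits / (years * SECONDS_PER_YEAR) / 10**6


if __name__ == "__main__":
    print(f"Implied speed: {implied_mbps(2, 7):.1f} Mbps")
    print(f"Years at 75 Mbps: {download_years(2, 75):.2f}")
```

A slower connection lengthens the estimate proportionally: at half the speed, the same download would take about twice as long.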
Understanding the challenges presented by its rapidly growing data sets, NCI launched a “Cancer Genomics Cloud” pilot project in 2014 to explore various approaches to bring cloud computing to bear. ISB was awarded one of three contracts, and partnered with Google and SRA International (now CSRA) to develop a cloud-based research platform.
“NCI launched these pilot efforts in order to democratize access to these massive data sets. We look forward to continuing to work with NCI to develop a national Cancer Research Data Commons to make these large cancer data sets more accessible to a broad range of researchers,” Reynolds said. “The ultimate goal, of course, is to improve the treatment of cancer patients, and improve outcomes.”