Let’s look back at how much genome sequencing has cost over the last 15 years. In 2001, for the price of sequencing one human genome, a person could buy 1000 Porsche 911 cars.
New rate-determining steps
No one predicted this fast-paced evolution of sequencing technologies, and the world of computational biology now faces major challenges in data storage, management, security and analysis. These are the new rate-determining steps in introducing genomics into our everyday lives.
Slow evolution of computers
The lower cost, and hence greater scale, of genomic sequencing is producing enormous amounts of data. Since computers are not evolving as fast as sequencing technologies, we find ourselves facing major CPU and storage problems. Bioinformatics was one of the first fields to adopt the cloud. Cloud infrastructures are flexible and dynamic, letting users scale their allocated resources up and down according to their needs. Compared with using a computer cluster to increase CPU and storage capacity, bioinformatics in the cloud can be performed by an individual user, for example a PhD student working in a lab without a strong bioinformatics base.
The formatting pain
Anyone doing bioinformatics at any scale will come across a universal problem: a countless number of file formats. Researchers estimate that they spend about 80% of their time on data grooming and only 20% on actual data analysis. With no standardised file types and inconsistent data formatting, every new program results in a new data format.
The solution to this problem is automating these tedious data-grooming tasks, giving researchers more time to focus on data analysis. For instance, the Genestack platform is “format-free”, meaning that when data is uploaded onto the platform in any of the possible formats, it ‘loses’ the format and becomes a meaningful biological object, with all objects of the same kind acting identically regardless of underlying formatting differences.
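To make the "format-free" idea concrete, here is a minimal sketch in Python. It is not Genestack's implementation: the `SequenceRecord` class and the `parse_fasta`, `parse_fastq` and `gc_content` helpers are hypothetical names invented for this illustration, and the parsers handle only simple, well-formed input. The point is that once two differently formatted files are read into the same kind of object, downstream analysis no longer cares which format the data arrived in.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class SequenceRecord:
    """Hypothetical format-independent object: an identifier plus its bases."""
    identifier: str
    sequence: str


def parse_fasta(text: str) -> Iterator[SequenceRecord]:
    """Read simple FASTA text ('>' header, then one or more sequence lines)."""
    identifier = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if identifier is not None:
                yield SequenceRecord(identifier, "".join(chunks))
            parts = line[1:].split()
            identifier = parts[0] if parts else ""
            chunks = []
        else:
            chunks.append(line)
    if identifier is not None:
        yield SequenceRecord(identifier, "".join(chunks))


def parse_fastq(text: str) -> Iterator[SequenceRecord]:
    """Read simple four-line FASTQ records; quality scores are ignored here."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, sequence = lines[i], lines[i + 1]
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got: {header!r}")
        parts = header[1:].split()
        yield SequenceRecord(parts[0] if parts else "", sequence)


def gc_content(record: SequenceRecord) -> float:
    """Fraction of G and C bases; works on any record, whatever its source."""
    if not record.sequence:
        return 0.0
    bases = record.sequence.upper()
    return (bases.count("G") + bases.count("C")) / len(bases)


fasta_text = ">read1 example\nACGT\nGGCC\n"
fastq_text = "@read1\nACGTGGCC\n+\nIIIIIIII\n"

from_fasta = list(parse_fasta(fasta_text))
from_fastq = list(parse_fastq(fastq_text))
print(from_fasta == from_fastq)   # same object regardless of the input format
print(gc_content(from_fasta[0]))
```

A real platform would also need to carry quality scores, annotations and metadata, which this sketch deliberately leaves out.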
The reproducibility struggle
Other common complaints concern reproducibility and metadata organization, such as incorrectly annotated genes or data with no annotation at all. Keeping track of data provenance is essential: details such as the scripts and the specific versions of the tools used must be carefully recorded so that someone can reproduce the analysis in the future. This is crucial, since reproducibility is an absolute necessity for cumulative science. Noting down all scripts and parameters by hand is incredibly time-consuming, so automating this process saves researchers significant time and effort.
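As a rough illustration of what automated provenance tracking could record, the Python sketch below builds a small record for one analysis step. Everything in it is an assumption made for the example: the `provenance_record` function, its field names, and the tool name `example-aligner` with version `1.2.3` are hypothetical and do not describe any particular platform's format.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def provenance_record(tool: str, version: str, parameters: dict, input_bytes: bytes) -> dict:
    """Build a hypothetical provenance record for a single analysis step."""
    return {
        "tool": tool,
        "tool_version": version,
        # Sort parameters so the same settings always serialise the same way.
        "parameters": dict(sorted(parameters.items())),
        # A checksum identifies the exact input without storing it twice.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        "python_version": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


# Hypothetical tool name, version and parameters, for illustration only.
record = provenance_record(
    tool="example-aligner",
    version="1.2.3",
    parameters={"threads": 4, "min_quality": 20},
    input_bytes=b">seq1\nACGT\n",
)
print(json.dumps(record, indent=2))
```

Storing a record like this next to every result file means the tool version, parameters and exact input can be looked up later instead of being reconstructed from memory.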
Unrealistic expectations
Future of genomics in the clinic
The world of genomics is rapidly changing the landscape of healthcare as we know it. With the price of genome sequencing dropping below $1000, personalised medicine and treatment plans based on your genetic makeup will become our everyday reality. What are the challenges of using NGS tools in the clinic? The most important ones include data security, storage, analysis and interpretation. Raw sequencing runs generate hundreds of gigabytes of data from a single measurement, which means current clinical data management infrastructure cannot handle such enormous amounts of data. With the development of cloud computing, it seems realistic that this way of storing and managing data will become more and more common in the clinical setting. However, many remain uncertain whether cloud computing will meet the standards of data security and archiving, and how it will comply with regulatory requirements. As a result, new and better integrated systems and methods are required so that we can unleash the full potential of genomics. In my next article I’ll describe the project that our team at Genestack, together with our partners, has been working on to bring the benefits of a cutting-edge bioinformatics platform to the clinic.
By Kalina Cetnar from Genestack, an innovative bioinformatics platform. You can contact the author at kalina(at)genestack.com