Working with high-throughput sequencing data using a genomic ordered relational database architecture

Dr. Hákon Guðbjartsson – 21 August 2017

Overview

High-throughput genomic sequencing has created formidable challenges in the management and analysis of genomic variation data. Conventional database management systems have proven inadequate for this data, which has led bioinformaticians to develop a myriad of specialized tools and file formats. These are often incompatible with one another, so much computation and I/O is spent on data conversion, and developers spend considerable time mastering a large set of disparate tools.

Our aim was to create a general-purpose relational data format and analysis tools that provide an efficient and coherent framework for working with large volumes of genomic data. For this purpose, we developed the GORdb system. It is based on a genomic ordered architecture and uses a declarative query language that combines features from SQL and shell pipe syntax in a novel manner. The system is highly efficient for genomic data analysis use cases, provides efficient parallelism in two dimensions, and can leverage elastic computational resources in cloud settings.
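The abstract does not show GOR query syntax, so the following is not GORdb code. It is a minimal Python sketch of the kind of operation a genomic ordered layout makes cheap: when two inputs are both sorted by (chromosome, position), they can be joined in a single streaming pass, buffering only the rows at the current position. The row layout `(chromosome, position, payload)`, the function names `genomic_key` and `ordered_join`, and the plain lexicographic chromosome ordering are all hypothetical assumptions for illustration.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical row layout: (chromosome, position, payload).
Row = Tuple[str, int, str]
Key = Tuple[str, int]


def genomic_key(row: Row) -> Key:
    """Sort key for a row.

    Assumption: both inputs use the same chromosome ordering; here it is
    plain string order, so "chr10" sorts before "chr2".
    """
    return (row[0], row[1])


def ordered_join(left: Iterable[Row], right: Iterable[Row]) -> Iterator[Tuple[Row, Row]]:
    """Streaming equi-join on (chromosome, position) of two sorted inputs.

    Each input is read once, front to back. Only the right-hand rows that
    share the current key are held in memory, so cost grows with the input
    size rather than requiring either side to be loaded or indexed.
    """
    right_iter = iter(right)
    pending: Optional[Row] = next(right_iter, None)  # next unread right row
    group: List[Row] = []                            # right rows at group_key
    group_key: Optional[Key] = None
    for lrow in left:
        key = genomic_key(lrow)
        if key != group_key:
            # Left input is sorted, so key only increases: skip right rows
            # that lie before it, then collect the rows at exactly this key.
            while pending is not None and genomic_key(pending) < key:
                pending = next(right_iter, None)
            group = []
            group_key = key
            while pending is not None and genomic_key(pending) == key:
                group.append(pending)
                pending = next(right_iter, None)
        # Consecutive left rows at the same key reuse the buffered group.
        for rrow in group:
            yield lrow, rrow
```

For example, joining `[("chr1", 100, "A>G"), ("chr1", 250, "C>T"), ("chr2", 50, "G>A")]` with `[("chr1", 100, "missense"), ("chr1", 100, "exon2"), ("chr2", 50, "intron")]` pairs the first variant with both `chr1:100` annotations and the third with the `chr2:50` annotation, while the `chr1:250` variant has no match. Real genomic joins often match overlapping ranges rather than exact positions; the single-position version is used here only to keep the sketch short.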

In this talk, we will describe our GORdb platform, its command-line query tool, and a few other end-user applications that use it as a back-end query server: a rich GUI application with a syntax-aware ad hoc query tool, a report builder for canned reports, and a powerful genome browser that provides a visual representation of query results.

Short Biography

Dr. Hákon Guðbjartsson, VP of Informatics at WuXi NextCODE Genomics, has over 35 years of experience in software development. At deCODE genetics he directed the launch of the first web-based personal genomics service, deCODEme, and was the lead architect and developer of many software systems, including a patient identity protection system, a system for patient cohort definitions and reporting based on a novel Set Definition Language (SDL), and a highly scalable database system (GORdb) based on a genomic ordered architecture and a novel query syntax. Dr. Guðbjartsson received his B.Sc. in electrical engineering from the University of Iceland in 1990 and his M.Sc. in electrical engineering and computer science from MIT in 1992. He received his PhD from MIT in 1996 and then carried out postdoctoral research on magnetic resonance imaging at Brigham and Women's Hospital in Boston before joining deCODE genetics.