Data Science and Engineering @ DCEG


Portable Data Science Applications for Cancer Precision Prevention. For open positions, see also pdf. Prospective internship candidates are typically given a test project, which is then discussed in the selection interview.


Cancer Precision Prevention places an increasing focus on data-intensive platforms that can reach, and engage, participants as consumer-facing digital applications. Ultimately, the emergence of a Learning Health Care System depends on computational systems that orchestrate both medical records and consumer-facing services, from wearable sensors to genomics. A new generation of cohort studies, such as NCI/DCEG Connect, is being designed accordingly.


BigData designates the computational aggregation of large volumes of diverse data and diverse analytical environments in order to enable comprehensive integrative analysis. Even more than the logistical challenges, BigData typically has to navigate complex governance and compliance landscapes, which can only be accomplished in Cloud Computing environments. Confluence is an international initiative aggregating data on 300k controls and 300k breast cancer cases.

FAIR Data Platform

The data platform developed for Confluence is being abstracted into a distributed FAIR data platform for cohort studies.


Identifying novel algorithms and designing Web Applications backed by Cloud-hosted APIs make up the ubiquitous technology stack. EpiSphere seeks to integrate a multitude of health data streams, generated and consumed in real time, with the goal of contextualizing individual observations by reference to BigData. This process defines the API ecosystems of Epidemiology Data Commons.

Digital Pathology (patterns)

epiPath, imageBox, Active Learning (in press)

Time series

Mortality tracker - J. Bioinformatics PMID:33135727


MutationSignature (bioinformatics)

under development


Open-source code repositories at

Who, Where

EpiSphere is a software engineering research project of the Data Science Group at the Division of Cancer Epidemiology and Genetics (DCEG) of the National Cancer Institute (NIH.NCI).