An Integrative Analysis Framework for Collaborative Large-Scale Micro-population Studies in Cancer Immunotherapy

Active Sites

Duke University


Cancer immunotherapy is revolutionizing treatment by offering long-term remission to patients with advanced disease. Unfortunately, only a minority of patients achieve such remission. Because immunotherapy is expensive and has toxic side effects, we urgently need to identify biomarkers that predict which patients are likely to benefit, and to gain mechanistic insight into why only some respond. Single-cell or micro-population analysis is invaluable for studying cancer and immune-cell populations, but machine learning is needed to interpret the complex data sets that such studies produce. Current approaches, which combine flow/mass cytometry and single-cell genomics with in-person or video meetings between experts, leave months between data generation and interpretation. We have identified an untapped opportunity to accelerate the study of cancer-immune system interactions at the micro-population level.


The proposed work leverages our recent success in advancing microfluidic biochip technologies, as well as our strengths in biostatistics and machine learning. We propose to investigate a “Big Data” framework that will enable researchers to remotely set up and collaboratively conduct single-cell/micro-population studies in real time using advanced microfluidic biochips. We envision that this framework will allow collaborative analysis to be carried out on a distributed network supported by gamification, machine learning, and data visualization tools.


Krishnendu Chakrabarty and Robert Calderbank (ECE, Duke) will be responsible for the computing algorithm development and coordination with biological context. Yiran Chen and Hai Li (ECE, Duke) will be responsible for the deployment of algorithms on hardware computing platforms.

Experimental Plan and Industrial Relevance

(i) Unified Network-Based Modeling of Immunotherapy Applications

We will map the molecular mechanisms of cancer onto a biological network. While such mapping has been attempted before at the genomic or transcriptomic level, we will leverage longitudinal monitoring to construct multi-omic networks. The computational framework will be developed using a scalable machine learning environment such as Apache Spark.
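The proposal does not fix a network-construction method. As one minimal sketch, assuming that each molecular feature (transcript, protein, and so on) is measured at the same sequence of longitudinal time points, features from different omic layers can be linked when their time courses are strongly correlated. The feature names, values, and the `threshold` parameter below are hypothetical, and Pearson correlation is a stand-in for whatever association measure the project eventually adopts.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant time course carries no co-variation signal
    return sxy / sqrt(sxx * syy)


def build_network(profiles, threshold=0.8):
    """Return edges (u, v, r) for feature pairs whose time courses satisfy |r| >= threshold.

    profiles maps a hypothetical "layer:feature" name to its measurements,
    all taken at the same longitudinal time points.
    """
    edges = []
    for u, v in combinations(sorted(profiles), 2):
        r = pearson(profiles[u], profiles[v])
        if abs(r) >= threshold:
            edges.append((u, v, round(r, 3)))
    return edges


# Hypothetical measurements at four time points.
example = {
    "rna:CD8A": [1, 2, 3, 4],
    "protein:PD-1": [2, 4, 6, 8],
    "rna:GAPDH": [5, 5, 5, 5],
}
print(build_network(example))  # -> [('protein:PD-1', 'rna:CD8A', 1.0)]
```

At scale, the same pairwise computation could be distributed in Spark, for example with `pyspark.ml.stat.Correlation.corr` over an assembled feature-vector column; the plain-Python version only illustrates the logic.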

(ii) Development of Visual Analytics for Decision Support

A decision-support framework will be developed based on a knowledge-based reasoning system. A knowledge base will be trained on the biological networks obtained from clinical trials, and an inference engine will be designed to mimic expert reasoning. Visual analytics using Seaborn and Tableau will facilitate diagnostic analysis by an expert, as well as additional training. The overall framework will be deployed on Apache Spark.
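Knowledge-based reasoning systems are often built around a forward-chaining inference engine, which repeatedly applies if-then rules to known facts until nothing new can be derived. The sketch below assumes that representation; the rule names and the facts in the example are hypothetical placeholders, not validated clinical rules. The returned trace records which rules fired, one possible way to let an expert inspect how a conclusion was reached.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    premises: frozenset  # all of these facts must hold
    conclusion: str      # fact added when the rule fires


def forward_chain(facts, rules):
    """Fire rules whose premises are all known until no new fact is derived.

    Returns the final fact set and the ordered names of the rules that fired,
    which serves as a simple explanation trace.
    """
    known = set(facts)
    trace = []
    changed = True
    while changed:
        changed = False
        for rule in rules:
            if rule.premises <= known and rule.conclusion not in known:
                known.add(rule.conclusion)
                trace.append(rule.name)
                changed = True
    return known, trace


# Hypothetical, illustrative rules only.
rules = [
    Rule("R1", frozenset({"high_CD8_infiltration", "PD-L1_positive"}), "inflamed_tumor"),
    Rule("R2", frozenset({"inflamed_tumor"}), "likely_responder"),
]

facts, trace = forward_chain({"high_CD8_infiltration", "PD-L1_positive"}, rules)
print(trace)  # -> ['R1', 'R2']
```

In the proposed framework the rules would come from the trained knowledge base rather than being written by hand as here.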

(iii) Gamification

The “reward” for the game will motivate researchers to gather clinical data and participate in collaborative decision making. Participants will be rewarded based on their contributions; the reward can be intellectual, such as peer recognition and awards based on the demonstrated relevance/utility of the data, or monetary, based on commercialization. We will study mechanisms for setting up the game rules, e.g., tournaments.
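The game rules are still to be designed. As a hypothetical illustration of contribution-based rewards, the sketch below assigns points per contribution type, scales them by a peer rating of relevance/utility in [0, 1], and ranks participants for a tournament-style leaderboard. The contribution kinds, point weights, and ratings are all assumptions.

```python
from collections import defaultdict

# Hypothetical points per contribution kind.
DEFAULT_WEIGHTS = {"dataset": 10, "annotation": 3, "review": 1}


def leaderboard(contributions, weights=None):
    """Rank participants by weighted contribution points, highest first.

    contributions: iterable of (participant, kind, peer_rating) tuples, where
    peer_rating in [0, 1] is a hypothetical peer assessment of relevance/utility.
    Unknown contribution kinds earn 0 points. Ties are broken by name.
    """
    if weights is None:
        weights = DEFAULT_WEIGHTS
    scores = defaultdict(float)
    for participant, kind, rating in contributions:
        if not 0.0 <= rating <= 1.0:
            raise ValueError(f"peer_rating out of range: {rating}")
        scores[participant] += weights.get(kind, 0) * rating
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


board = leaderboard([
    ("lab_a", "dataset", 0.5),
    ("lab_b", "annotation", 1.0),
    ("lab_b", "review", 0.5),
    ("lab_a", "review", 1.0),
])
print(board)  # -> [('lab_a', 6.0), ('lab_b', 3.5)]
```

Weighting by peer rating rather than raw volume is one way to tie rewards to demonstrated relevance/utility rather than to the sheer amount of data submitted.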

(iv) Prototype

We will deploy the Apache Spark implementation on top of Apache Cassandra. The Cassandra data model will support real-time sharing of experimental results among collaborators.

In addition to the medical and healthcare sectors, the proposed research will be of interest to multiple groups of industrial partners, such as the defense industry.


The first-year deliverables include the development of molecular mapping mechanisms and the algorithms of the decision-support framework. The end-of-project deliverables include (i) the complete knowledge set of the decision-support framework including the design of gamification, and (ii) the prototype of the Apache Spark implementation on top of Apache Cassandra.

Milestones and Time-to-Completion

The estimated duration of this project is 3 years. The milestones are listed in the following table.

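| Year   | Milestone                                                            |
|--------|----------------------------------------------------------------------|
| Year 1 | Develop mapping scheme and framework algorithm                       |
| Year 2 | Develop decision-support framework with gamification                 |
| Year 3 | Prototype the Apache Spark implementation on top of Apache Cassandra |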

Number of Graduate Students Supported




Total Cost to Completion