
AI2Health

AI2Health is a research cluster within the Ken Kennedy Institute at Rice University geared towards fundamental and interdisciplinary research in AI and human health. Numerous scientific advances have emerged in recent years that are specific to the application of AI to human health. The goal of AI2Health is to leverage this momentum and develop AI methods and tools to make advances in essential problems in human health through three research areas in Computational Biology: (i) Systems and Integrative Biology, (ii) Structural and Functional Biology, and (iii) Metagenomics and Microbiome Biology.

Core members

Affiliated members

News

Selected Publications

(AI2Health faculty highlighted in bold)

AI2Health in Detail

We will now provide more details about some of the research questions our cluster will pursue.

1) AI for Systems and Integrative Biology

Systems biology was one of the early adopters of computational advances and machine learning, with a heavy focus on Bayesian methods. However, as increasingly specific, high-throughput biological assays continue to grow (both in number and in measurement types), new methodological challenges arise:

Incorporating biologically inspired intuition into AI model formulation is key to building generalizable methods. One example of such an approach is building biologically meaningful embedding spaces. Embedding is an AI/ML technique that represents high-dimensional data in a lower-dimensional space while capturing the complex nonlinear relationships and intrinsic structure of the original data. Instead of using problem-agnostic embeddings, AI2Health core members Yao and Segarra have developed biologically motivated embedding methods to enable joint modeling of protein interaction networks from different organisms.

Moreover, we observed that in biology, gene set comparisons are routine (e.g., doing gene set enrichment comparing annotated genes with a collection of new genes), yet even in research areas where embeddings are used routinely, such as natural language processing, most efforts to compare sets rely on simple averages. Noticing this gap led to the development of a new, effective general-purpose set comparison method for embeddings that shows promise for broader non-biological applications. These examples highlight our vision for leveraging biological insights to innovate AI methods, which can synergistically enhance fundamental research in AI.
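To make the gap concrete, the following is a minimal sketch, not the cluster's published method, contrasting mean-pooled comparison of two embedding sets with a simple set-aware alternative (a symmetric best-match score). The function names and the toy two-dimensional vectors are illustrative assumptions; real gene embeddings would have many more dimensions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_vector(vectors):
    """Element-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def mean_pooled_similarity(set_a, set_b):
    """Baseline: collapse each set to its average, then compare the averages."""
    return cosine(mean_vector(set_a), mean_vector(set_b))

def best_match_similarity(set_a, set_b):
    """Illustrative set-aware score (an assumption, not the published method):
    match every member to its most similar counterpart in the other set,
    average those best matches, and symmetrize."""
    a_to_b = sum(max(cosine(a, b) for b in set_b) for a in set_a) / len(set_a)
    b_to_a = sum(max(cosine(b, a) for a in set_a) for b in set_b) / len(set_b)
    return 0.5 * (a_to_b + b_to_a)

# Hypothetical toy "gene set" embeddings: two orthogonal members vs. one blended member.
annotated_genes = [[1.0, 0.0], [0.0, 1.0]]
new_genes = [[1.0, 1.0]]

pooled = mean_pooled_similarity(annotated_genes, new_genes)
matched = best_match_similarity(annotated_genes, new_genes)
print(f"mean-pooled: {pooled:.3f}, best-match: {matched:.3f}")
```

In this toy case the averages of the two sets point in the same direction, so the mean-pooled score is essentially 1, even though no member of one set closely resembles a member of the other. The best-match score stays near 0.71 and therefore retains that distinction.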

2) AI for Structural and Functional Biology

AlphaFold has unveiled the enormous potential of AI applied to the problem of protein folding and structure prediction, making paradigm-shifting progress on a 50-year-old grand challenge in computational biology. Despite this major success, there are two main limitations.

Building on the success of AlphaFold in protein structure prediction, we are expanding our focus to develop novel machine learning techniques in Functional Biology, particularly for assigning functions to protein-coding genes. Highlighting this area, Segarra and Treangen have developed an ensemble machine learning method to predict microbial pathogenic functions, which we plan to enhance by incorporating protein structure data and improving the handling of poorly annotated genes.

Challenges in computational pathogen screening include complex host interactions, virulence factor dynamics, and community-level dynamics. Cluster members Kavraki and Treangen are now exploring an LLM-inspired model that leverages protein sequences and structures to predict functions, specifically targeting virulence factors. This model integrates evolutionary features derived from the DistilProtBert language model with protein structures in a graph convolutional network, with the aim of improving the understanding and prediction of protein functions that contribute to disease.
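As a rough illustration of how sequence-derived features and structure can meet in a graph, the sketch below builds a residue contact graph from 3D coordinates, averages per-residue feature vectors over contacts (an untrained stand-in for a graph convolution), and pools the result into one protein-level vector. Everything here is an assumption for illustration: the 8 angstrom cutoff, the function names, and the tiny hand-written feature vectors that stand in for language-model embeddings. It is not the model described above, which would use learned weights.

```python
import math

def contact_graph(coords, cutoff=8.0):
    """Link residues i and j when their coordinates lie within `cutoff`
    (a hypothetical threshold, in angstroms). Returns adjacency lists."""
    n = len(coords)
    adj = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def graph_conv(adj, feats):
    """Untrained convolution step: replace each residue's features with the
    average over itself and its contacts. A trained layer would also apply
    learned weight matrices and a nonlinearity."""
    out = []
    for i, own in enumerate(feats):
        group = [own] + [feats[j] for j in adj[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out

def mean_pool(feats):
    """Collapse per-residue features into a single protein-level vector."""
    return [sum(col) / len(feats) for col in zip(*feats)]

# Hypothetical three-residue "protein": residues 0 and 1 are in contact, 2 is isolated.
coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
# Placeholder per-residue features standing in for language-model embeddings.
residue_feats = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]

adj = contact_graph(coords)
smoothed = graph_conv(adj, residue_feats)
protein_vector = mean_pool(smoothed)
print(adj, smoothed, protein_vector)
```

A downstream classifier (for example, one predicting whether a protein is a virulence factor) would take the pooled vector as input. That step is omitted here because it depends on labeled training data.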

3) AI for Metagenomics and Microbiome Biology

The microbiome refers to the collection of microbes (bacteria, viruses, fungi) that occupy a specific ecological niche (human gut, skin, air filters, etc.). Given the established relevance of the human microbiome to human health, there has been a recent push towards applying ML to the human host microbiome (in particular, the gut microbiome) to learn signatures of microbiome health and disease states.

Our motivating example on repeat detection highlights the untapped potential of learning discriminative graph features through graph neural networks (GNNs). Unlike predefined features, GNNs generate these characteristics through trainable iterative computations, making them adaptive to specific data samples. This novel approach has shown success in fields such as wireless networks, material discovery, and molecular design, yet its application in metagenomics is still emerging. A primary challenge in genomic data analysis is that most of the data is unlabeled, particularly in distinguishing between repeat and non-repeat sequences.
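The phrase "trainable iterative computations" can be sketched as repeated rounds of message passing, where each node updates its features from its own state and the average of its neighbors' states. The code below is a minimal, hedged illustration with hand-set weights in place of trained ones. The graph, the feature values, and all function names are hypothetical, and this is not the cluster's repeat-detection model.

```python
def matvec(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def relu(x):
    """Element-wise rectified linear unit."""
    return [max(0.0, xi) for xi in x]

def message_passing_layer(adj, feats, w_self, w_neigh):
    """One round: new_h[v] = relu(w_self @ h[v] + w_neigh @ mean(h[u] for u in N(v)))."""
    new_feats = {}
    for node, neighbors in adj.items():
        dim = len(feats[node])
        if neighbors:
            agg = [sum(feats[u][i] for u in neighbors) / len(neighbors) for i in range(dim)]
        else:
            agg = [0.0] * dim  # isolated node: no incoming messages
        combined = [s + m for s, m in zip(matvec(w_self, feats[node]), matvec(w_neigh, agg))]
        new_feats[node] = relu(combined)
    return new_feats

def run_gnn(adj, feats, layers):
    """Apply the layers in sequence; each layer is a (w_self, w_neigh) pair.
    In a real GNN these weights would be learned from data."""
    for w_self, w_neigh in layers:
        feats = message_passing_layer(adj, feats, w_self, w_neigh)
    return feats

# Hypothetical path graph a - b - c with a one-dimensional feature per node.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
feats = {"a": [1.0], "b": [0.0], "c": [0.0]}
identity = [[1.0]]  # hand-set weights, for illustration only
layers = [(identity, identity), (identity, identity)]

after_one = run_gnn(adj, feats, layers[:1])
after_two = run_gnn(adj, feats, layers)
print(after_one, after_two)
```

After one round, node `c` still has a zero feature because it is two hops away from `a`. After the second round it becomes nonzero, which shows how stacking rounds lets each node's representation reflect a progressively larger neighborhood of the graph.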

Modern machine learning techniques seek to embrace this unlabeled data through self-supervised learning, which starts from initially noisy labels that are then refined through subsequent training iterations and fine-tuning. Recently, we presented the first use of graph-based self-supervised learning for repeat detection in metagenomics, which illustrates the potential benefits of exploring this avenue further.
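The idea of refining noisy starting labels can be illustrated with a deliberately simple stand-in: a nearest-centroid classifier that is repeatedly refit on its own predictions. This is only a sketch of iterative label refinement under stated assumptions. It uses a single made-up numeric feature per sequence and hypothetical heuristic labels, and it is not the graph-based method referred to above.

```python
def centroid(values):
    """Mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def refine_labels(features, noisy_labels, max_rounds=10):
    """Alternate between fitting class centroids to the current labels and
    relabeling every sample by its nearest centroid, until labels stop changing."""
    labels = list(noisy_labels)
    for _ in range(max_rounds):
        class0 = [x for x, y in zip(features, labels) if y == 0]
        class1 = [x for x, y in zip(features, labels) if y == 1]
        if not class0 or not class1:
            break  # one class vanished; keep the last usable labeling
        c0, c1 = centroid(class0), centroid(class1)
        new_labels = [0 if abs(x - c0) <= abs(x - c1) else 1 for x in features]
        if new_labels == labels:
            break  # converged
        labels = new_labels
    return labels

# Hypothetical per-sequence feature (for example, a normalized coverage value).
features = [1.0, 1.2, 0.9, 1.1, 5.0, 5.2, 4.8, 5.1]
# Hypothetical heuristic labels (1 = repeat, 0 = non-repeat) with two mistakes.
noisy_labels = [0, 0, 1, 0, 1, 1, 0, 1]

refined = refine_labels(features, noisy_labels)
print(refined)
```

In this toy data the two mislabeled samples are corrected after the first round, because the class centroids are still dominated by correctly labeled samples. With heavier label noise or overlapping classes, this kind of refinement can also reinforce early mistakes, which is one reason richer models and features matter in practice.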

4) AI2Health long-term goal

As its long-term goal, the AI2Health research cluster aims to tackle the pressing health issues of our time. Toward this goal, AI2Health will leverage the research expertise of its core and affiliated members, along with collaborations with clinicians and scientists at the Texas Medical Center. Our research cluster will focus on transformative, health-outcome-inspired AI research in predicting, diagnosing, and treating health issues. Examples of specific goals include (i) improved cancer screening for early cancer detection and treatment, (ii) early warning systems for pathogen outbreak tracking and mitigation, and (iii) improved vaccine and drug design.

Acknowledgements