Finding All Significant Closed Connected Subgraphs

Hector Roux de Bézieux, Fanny Perraudeau, Arnaud Mary, Sandrine Dudoit, Laurent Jacob

November 2020

PDF Project Poster

Abstract

In many applications, covariates describing the data are structured according to a known graph $\mathcal{G}$. Subgraphs of $\mathcal{G}$ can then serve to design new covariates, yielding an enriched representation of the data. Testing the association of these new covariates to an outcome of interest can provide more insight on critical biological processes. However, the number of subgraphs is often exponential in the number of original covariates. Therefore, a method testing all possible subgraphs would have very low power, due to multiple testing corrections and could quickly become computationally intractable. The concept of testable hypothesis has been used to simultaneously address both issues in similar contexts. Here, we introduce a method leveraging this concept to test all closed connected subgraphs, i.e., those which are not included in a larger one leading to the exact same covariate. We propose a novel enumeration scheme for these objects which fully exploits the pruning opportunity offered by testability, leading to drastic improvements in speed. We illustrate this improvement on both real and simulated datasets. This paves the way for numerous applications in biomedicine, especially for genome-wide association studies in bacterial genomes.

Type

Conference paper

Publication

Machine Learning in Computational Biology (MLCB)

Main Work

Finding All Significant Closed Connected Subgraphs

Abstract

Hector Roux de Bézieux

Ph.D Student in Biostatistics

Related