Global Searching for Genetic Associations by Pattern Discovery: Methods and Examples

Andrea Califano, Ph.D.

Co-founder, Chief Technology Officer

Computational Genetics Group

First Genetic Trust, Inc

Linkage-based parametric and non-parametric methods have proven successful in localizing the genetic factors underlying Mendelian traits. However, dissecting the complex inheritance of common phenotypic traits requires new analytical approaches that can reveal the small-to-medium effects of multiple susceptibility loci.

A common limitation of current methods is the local nature of the analysis. Tests are usually performed either on individual markers or on small sets of markers within a short contiguous region of the genome. Although single-locus analysis is straightforward and the statistics for evaluating significance have been adequately formulated, it increasingly lacks the power to dissect the genetic complexity of common heterogeneous diseases, which involve small individual effects and gene-gene interactions.

This talk presents a promising new non-parametric technique designed to address precisely this issue. It is based on the global, exhaustive discovery of genotype/haplotype patterns that cosegregate with complex traits and may therefore be associated with their genetic factors. Such patterns can include arbitrarily distant markers, possibly spanning several different chromosomes, and are therefore ideally suited to a whole-genome analysis approach. The underlying deterministic pattern discovery algorithm can efficiently comb through very large data sets involving hundreds of patients and thousands of markers. The significance of the discovered patterns is assessed using a variety of statistical tests against both theoretical and simulated distributions.
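To make the idea of a non-local pattern concrete, the following is a minimal brute-force sketch. It is not the deterministic discovery algorithm referred to above, whose details are not given here. It enumerates every combination of `size` markers, wherever they lie, and scores each observed genotype pattern with a one-sided hypergeometric (Fisher-type) test against the theoretical distribution. The function names, the toy genotype coding (0, 1, 2) and the `min_support` threshold are all assumptions made for illustration, and no correction for multiple testing is attempted.

```python
from itertools import combinations
from math import comb


def find_patterns(genotypes, is_case, size=2, min_support=3):
    """Brute-force search (illustrative only): for every set of `size` markers,
    count how many cases and controls carry each observed genotype combination,
    and keep combinations carried by at least `min_support` cases."""
    n_markers = len(genotypes[0])
    results = []
    for markers in combinations(range(n_markers), size):
        counts = {}  # genotype values -> [case carriers, control carriers]
        for row, case in zip(genotypes, is_case):
            key = tuple(row[m] for m in markers)
            counts.setdefault(key, [0, 0])[0 if case else 1] += 1
        for values, (n_case, n_ctrl) in counts.items():
            if n_case >= min_support:
                results.append((markers, values, n_case, n_ctrl))
    return results


def enrichment_p(n_case, n_ctrl, total_case, total_ctrl):
    """One-sided hypergeometric tail: probability that at least `n_case`
    of the pattern's carriers are cases if carriers were drawn at random."""
    carriers = n_case + n_ctrl
    total = total_case + total_ctrl
    upper = min(carriers, total_case)
    tail = sum(
        comb(total_case, k) * comb(total_ctrl, carriers - k)
        for k in range(n_case, upper + 1)
    )
    return tail / comb(total, carriers)


# Hypothetical toy data: 6 individuals x 4 markers, genotypes coded 0/1/2.
genotypes = [
    [1, 0, 0, 2],
    [1, 1, 0, 2],
    [1, 2, 1, 2],
    [0, 0, 0, 2],
    [1, 1, 1, 0],
    [0, 2, 0, 1],
]
is_case = [True, True, True, False, False, False]

patterns = find_patterns(genotypes, is_case, size=2, min_support=3)
markers, values, n_case, n_ctrl = patterns[0]
p_value = enrichment_p(n_case, n_ctrl, total_case=3, total_ctrl=3)
```

In this toy data the only pattern that reaches the support threshold pairs the first and last markers, which are not adjacent. A search restricted to short contiguous windows would miss it. Exhaustive enumeration like this grows combinatorially with pattern size, which is why an efficient discovery algorithm is needed at the scale of thousands of markers.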

We will discuss the application of this approach to a whole-genome analysis of Hirschsprung disease and schizophrenia data, and to the analysis of gene expression microarray data on lymphoma and brain tumours.