
Efforts in Genetic Mapping for Complex Human Disease

I continue to contribute actively to mapping genetic loci associated with complex disease, with a specific focus on type 2 diabetes (T2D), cholesterol levels, and cardiovascular disease. This effort currently spans several projects:

Mapping new T2D loci by meta-analysis. In collaboration with Danish Saleheen (Univ. of Penn, CVI, CCEB) and other groups representing Southeast Asian populations, we are combining these cohorts' data with publicly available data from the DIAGRAM Consortium (v3, European populations). Newly discovered loci will be interpreted in the context of existing associations using a battery of informatic and phenotypic integration analyses.
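One common way to combine per-study GWAS summary statistics is a fixed-effect, inverse-variance-weighted meta-analysis. The sketch below is illustrative only and is not the consortium's pipeline: it assumes each study reports an effect size (e.g. a log odds ratio) and standard error for the same SNP, already aligned to the same effect allele, and the function name `ivw_fixed_effect` is hypothetical.

```python
import math

def ivw_fixed_effect(betas, ses):
    """Fixed-effect inverse-variance meta-analysis for a single SNP.

    Assumes (hypothetically) that all studies report effects for the same
    effect allele; allele alignment must be done beforehand.
    """
    weights = [1.0 / se**2 for se in ses]          # more precise studies weigh more
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)                     # combined standard error
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))            # two-sided normal p-value
    return beta, se, z, p
```

For two studies with identical estimates, the combined effect is unchanged while its standard error shrinks by a factor of the square root of two, which is where the gain in power from pooling comes from.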

Genome-wide studies of MI/CAD stratified by T2D. Individuals with type 2 diabetes are at a two- to four-fold increased risk of cardiovascular events, yet little is known about the genetic and biological causes of this heightened risk. To close this gap in knowledge, and in collaboration with Colin Palmer (Univ. of Dundee), Natalie van Zuydam (Oxford), and members of the CARDIoGRAM consortium, we are performing genome-wide association studies for CAD separately in individuals with type 2 diabetes and in non-diabetics.

Targeted resequencing via Molecular Inversion Probe (MIP) capture. In the search for rare variants influencing risk of complex disease, the community will eventually move from exome to whole-genome sequencing. However, the statistical and analytical challenges of testing hypotheses about rare, non-coding variants are not yet well understood. To better understand and overcome these technical and scientific challenges, we are planning to resequence ~500 kb of non-coding genomic territory in ~25,000 individuals, focusing on T2D. To accomplish this, we are taking advantage of technologies that enable DNA capture and sequencing without library construction, which keeps per-sample costs to a minimum and allows us to design well-powered experiments. We are engaged in a pilot project with the Rader lab, targeting ~40 kb in ~1,500 samples from the extremes of the HDL distribution. The larger project is a collaboration with the Saleheen lab and will use the PROMIS cohort.

Mendelian Randomization: Methodology and Application

Traditional observational methods in epidemiology are limited in their ability to infer causal relationships between intermediate traits or biomarkers and disease liability. While randomized controlled trials are one standard by which causal relationships can be inferred, naturally segregating genetic variation in the population can also support such inference. This approach, dubbed Mendelian Randomization (MR), capitalizes on the fact that alleles assort randomly at meiosis, enabling causal inference tests that minimize problems of confounding and reverse causality. Research in the lab seeks to extend the statistical methodology, to provide resources that make the approach more accessible and sophisticated for the community, and ultimately to identify causal factors implicated in human disease.
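The logic of MR can be illustrated with a small simulation. In this sketch, which is hypothetical and not the lab's simulation engine, an unmeasured confounder `u` affects both an exposure `x` and an outcome `y`, so a naive regression of `y` on `x` is biased. A genotype `g` that affects `y` only through `x` acts as an instrumental variable, and the ratio (Wald) estimator cov(g, y) / cov(g, x) recovers the true causal effect. All effect sizes, the allele frequency, and the sample size are assumed values chosen for illustration.

```python
import numpy as np

def simulate_mr(n=100_000, true_effect=0.3, seed=1):
    """Compare naive regression with the MR ratio estimator under confounding."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n)               # allele count at a hypothetical SNP
    u = rng.normal(size=n)                         # unmeasured confounder
    x = 0.5 * g + u + rng.normal(size=n)           # exposure (e.g. a biomarker)
    y = true_effect * x + u + rng.normal(size=n)   # outcome liability
    naive = np.cov(x, y)[0, 1] / np.var(x, ddof=1) # OLS slope, biased by u
    wald = np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1] # instrumental-variable ratio
    return naive, wald

naive, wald = simulate_mr()
print(f"naive OLS: {naive:.3f}  MR ratio: {wald:.3f}  (true effect 0.3)")
```

Under these assumed parameters the naive slope converges to roughly 0.78, while the ratio estimate centres on 0.3. The guarantee holds only if the instrument is independent of the confounder and has no pathway to the outcome except through the exposure; these are exactly the assumptions that pleiotropy can violate.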

Analysis and inference of causal relationships. The lab is actively testing whether epidemiologically correlated factors are causally related to a range of important clinical outcomes. Our recently published work examined serum urate levels in relation to T2D, CAD, stroke, and heart failure; we are now investigating blood pressure and serum calcium levels for other disease endpoints.

New statistical methods for MR analysis. Existing MR approaches based on summary data rely on assumptions that are often violated in realistic biological situations. We have developed a simulation engine that generates examples of arbitrarily complex causal graphs, on which new methodology can be built and evaluated. We are also exploring new summary-data statistics that consider heterogeneity alongside causal estimates.
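As background for the summary-data setting, the sketch below computes two standard quantities: the inverse-variance-weighted (IVW) causal estimate across several SNPs, and Cochran's Q with I² as a measure of heterogeneity among the per-SNP ratio estimates. Large heterogeneity can signal that some instruments violate MR assumptions (for example through pleiotropy). This is a generic illustration, not the new statistics described above. It assumes per-SNP exposure effects, outcome effects, and outcome standard errors, and it uses the first-order standard error of each ratio, which ignores uncertainty in the exposure effects. The function name is hypothetical.

```python
import math

def ivw_with_heterogeneity(beta_exposure, beta_outcome, se_outcome):
    """Summary-data MR: IVW causal estimate plus Cochran's Q and I^2 across SNPs."""
    ratios, weights = [], []
    for bx, by, se_y in zip(beta_exposure, beta_outcome, se_outcome):
        ratios.append(by / bx)                 # per-SNP Wald ratio
        se_ratio = se_y / abs(bx)              # first-order (delta-method) SE
        weights.append(1.0 / se_ratio**2)
    w_sum = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / w_sum
    se = math.sqrt(1.0 / w_sum)                # fixed-effect standard error
    q = sum(w * (r - estimate) ** 2 for w, r in zip(weights, ratios))
    df = len(ratios) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0  # share of variation beyond chance
    return estimate, se, q, df, i2
```

When every SNP implies the same causal effect, Q is essentially zero. A single outlying instrument inflates Q well above its degrees of freedom and pulls the IVW estimate toward itself, which is one reason methods that model heterogeneity jointly with the causal estimate are attractive.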

Population Genetics

Modeling variation in the population genetic mutation rate. The rate of mutation varies substantially by position in human and mammalian genomes and fundamentally influences evolution and the incidence of genetic disease. Using a novel statistical framework that we developed and applied to large-scale human population genomics data, we showed that the three nucleotides of sequence context on each side of a polymorphic site (a seven-nucleotide window in total) explain >81% of the variability in substitution probabilities, and we highlighted new mutation-promoting motifs. Many open projects are available to build on this methodology, apply it to new data, and develop statistics for the analysis of complex disease.

New statistics to identify balancing selection in genomes.

Analysis of shared, selective sweeps in the human genome.

Sequencing in non-human species

Next-generation sequencing technologies have delivered transformative science in human genetics but are only now spilling over into applications in ecology and evolutionary biology. One goal of the lab is to contribute actively to these efforts, drawing on our experience handling large-scale data sets and developing population genetic methods in humans:

Genomics in two species of the order Araneae. Despite their rich mythology, the historical interest they attract, their unusual sexually dimorphic trait distributions, and their production of unusual macromolecules (venom and silk), little is known about spiders at high molecular resolution. To close this gap in knowledge, and in collaboration with Linden Higgins and Ingi Agnarsson (Univ. of Vermont), we are generating high-coverage (100x) de novo assemblies from diverse fragment libraries for two species of the order Araneae: Nephila clavipes (a golden orb-web spider) and Caerostris darwini (Darwin's bark spider). These assemblies will be further augmented and annotated with transcriptomic profiles obtained by RNA-seq of whole bodies as well as specific tissues of interest.
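The sequencing effort implied by a target depth follows from the Lander-Waterman relation for mean coverage, C = N × L / G, where N is the number of reads, L the read length, and G the genome size. The sketch below solves for N. The genome size and read length in the usage line are hypothetical placeholders; the text does not state the genome sizes of these species.

```python
def reads_for_coverage(genome_size_bp: int, read_length_bp: int, coverage: int) -> int:
    """Reads needed so that mean depth N * L / G reaches `coverage`."""
    total_bases = coverage * genome_size_bp          # bases to sequence in total
    return -(-total_bases // read_length_bp)         # integer ceiling division

# Hypothetical example: a 1 Gb genome sequenced with 150 bp reads at 100x.
print(reads_for_coverage(1_000_000_000, 150, 100))  # 666666667 reads
```

This gives mean depth only. Actual coverage is uneven, and with several fragment libraries the total is divided among them, so real designs typically budget above this minimum.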