Statistical methods for the
analysis of functional genomic data
Many of my substantive collaborations at Michigan have been with
biologists who generate high-dimensional datasets using modern
high-throughput molecular assays and who must then make sense of
these data. Our group has experience with
the various steps of analysis that functional genomic data require,
including preprocessing, normalization and differential expression
analysis.
While statistical methods
have been proposed for issues such as differential expression with
these data, less work has been done on higher-level issues,
such as the development of correlative models and classification
methods that relate genomic markers to clinical outcomes. In addition,
relatively little work has been done on incorporating
biological knowledge into the statistical
analysis of high-throughput biological data in
human disease settings. I recently obtained a five-year R01 grant from
the National Institutes of Health for developing new methods for the
analysis of functional genomic data. I am particularly interested in
methods that attempt to integrate several genomic data sources.
Statistical methods for cancer biomarkers
While the array of technologies that generate high-dimensional data is
staggering, it is important not to lose sight of a central goal:
the development of biomarkers for prognosis and/or early
detection of disease. My experience at Michigan has focused mostly on
cancer research, where I have been able to provide statistical guidance
in the design, conduct and analysis of biomarker studies.
Two problems have interested me recently. The first is incorporation of
monotonicity into the evaluation of biomarkers.
I am developing isotonic modeling procedures for modeling the effect of
biomarkers in both nonparametric and semiparametric models. These
methods have been applied to case-control studies that
Arul Chinnaiyan's lab has conducted
as part of the Early Detection Research Network, funded by the National
Cancer Institute. I have explored theoretical aspects of these
approaches in conjunction with Moulinath Banerjee
in the Department of Statistics at Michigan.
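As a rough illustration of what a monotonicity constraint does, the sketch below implements the standard pool-adjacent-violators algorithm for a non-decreasing least-squares fit. It is not the nonparametric or semiparametric procedures described above; the function name `pava` and the toy case-control labels are assumptions made purely for illustration.

```python
def pava(values, weights=None):
    """Pool-adjacent-violators: weighted least-squares non-decreasing fit.

    `values` are responses already sorted by the biomarker (for example,
    0/1 case-control status ordered by increasing marker level).
    """
    n = len(values)
    if weights is None:
        weights = [1.0] * n
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for v, w in zip(values, weights):
        blocks.append([float(v), float(w), 1])
        # Merge backwards while adjacent blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand block means back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Hypothetical toy data: disease status ordered by increasing marker value.
status = [0, 1, 0, 1, 1]
risk = pava(status)  # monotone estimate of P(case | marker rank)
```

Under these assumptions, the fitted values can be read as a monotone estimate of disease risk as the marker increases; the out-of-order pair in the middle of the toy data is pooled to a common value.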
The second is in the area of combining biomarkers.
In many medical settings, it is becoming increasingly clear that
one biomarker will not be sufficient to serve as a screening
device for early detection of many diseases. As an example, we
consider prostate cancer. Typically, prostate-specific antigen
(PSA) has been used for detection of prostate cancer. If a man
has a PSA measurement between 4 and 10 ng/mL, then this leads to a
prostate needle biopsy. While PSA is a relatively
sensitive biomarker, it is not very specific.
As a result, many biopsies are negative
for tumor, even when the PSA is between 4 and 10 ng/mL. Many
investigators now believe that a combination of biomarkers will
potentially lead to more sensitive screening rules. How best to
combine these measurements remains an open question. I am currently
working on adapting machine learning techniques from computer science
to this problem. In particular, Zheng Yuan, a
current Ph.D. student, is developing model-combining methods for
biomarkers as part of his dissertation.
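To make the idea of combining markers concrete, here is a minimal sketch of one simple approach: searching over linear combinations of two markers for the one with the largest empirical area under the ROC curve (AUC). This is a generic textbook-style illustration, not the model-combining methods under development; the function names and the toy measurements are hypothetical.

```python
import math


def auc(scores, labels):
    """Empirical AUC: fraction of case-control pairs in which the case
    scores higher than the control (ties count one half)."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            if c > d:
                wins += 1.0
            elif c == d:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def best_linear_combination(x1, x2, labels, n_angles=360):
    """Grid-search directions (cos t, sin t) over the full circle and
    return (best AUC, angle t) for the score cos(t)*x1 + sin(t)*x2."""
    best = (-1.0, 0.0)
    for k in range(n_angles):
        t = 2.0 * math.pi * k / n_angles
        scores = [math.cos(t) * a + math.sin(t) * b for a, b in zip(x1, x2)]
        best = max(best, (auc(scores, labels), t))
    return best


# Hypothetical toy measurements: three cases followed by three controls.
marker1 = [2, 3, 1, 1, 2, 0]
marker2 = [2, 1, 3, 1, 0, 2]
labels = [1, 1, 1, 0, 0, 0]

single_auc = auc(marker1, labels)
combined_auc, angle = best_linear_combination(marker1, marker2, labels)
```

In this toy example neither marker separates cases from controls on its own, while a suitable linear combination does. With real data the combination would need to be evaluated on independent samples, since the AUC of a rule tuned on the same data is optimistic.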
New multiple testing procedures
The genomic data analysis work has also spurred methodological research
in multiple testing. In particular, I have been involved with
the development of empirical Bayes multiple testing procedures for
high-dimensional data. This has led to a methodology I term shrunken
p-values for assessment of differential expression (SPADE). I am
currently working on a unified testing and estimation framework for such
problems.
Other areas of interest can be gleaned from my CV.