Debashis Ghosh, Ph.D.

Current Research

Statistical methods for the analysis of functional genomic data
Many of my substantive collaborations at Michigan have been with biologists who generate high-dimensional datasets using modern high-throughput molecular assays and who must then make sense of these data. Our group has experience with the various steps of analysis that functional genomic data require, including preprocessing, normalization and differential expression analysis. While statistical methods have been proposed for issues such as differential expression, less work has been done on higher-level issues, such as the development of correlative models and classification methods that relate genomic markers to clinical outcomes. In addition, relatively little work has been done on incorporating biological knowledge into the statistical analysis of high-throughput biological data in human disease settings. I recently obtained a five-year R01 grant from the National Institutes of Health to develop new methods for the analysis of functional genomic data. I am particularly interested in methods that attempt to integrate several genomic data sources.

Statistical methods for cancer biomarkers
While the array of technologies that generate high-dimensional data is staggering, it is important not to lose sight of a central goal: the development of biomarkers for prognosis and/or early detection of disease. My experience at Michigan has focused mostly on cancer research, where I have provided statistical guidance in the design, conduct and analysis of biomarker studies.

Two problems have interested me recently. The first is the incorporation of monotonicity into the evaluation of biomarkers. I am developing isotonic procedures for modeling the effect of biomarkers in both nonparametric and semiparametric models. These methods have been applied to case-control studies that Arul Chinnaiyan's lab has conducted as part of the Early Detection Research Network, funded by the National Cancer Institute. I have explored theoretical aspects of these approaches in conjunction with Moulinath Banerjee in the Department of Statistics at Michigan.
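As a rough illustration of what a monotonicity constraint means in practice, the sketch below fits a nondecreasing step function to outcomes ordered by biomarker value using the classical pool-adjacent-violators algorithm. It is a generic textbook procedure, not the nonparametric or semiparametric methodology described above, and the function name `pava` and the toy data are assumptions made for illustration only.

```python
def pava(y, w=None):
    """Least-squares nondecreasing fit via pool-adjacent-violators.

    y: outcomes ordered by increasing biomarker value (illustrative input).
    w: optional positive weights; defaults to equal weights.
    """
    if w is None:
        w = [1.0] * len(y)
    # Each block holds [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while they violate the monotone ordering.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the pooled blocks back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


# Made-up disease indicators (1 = case) for five subjects sorted by marker level.
status = [0, 1, 0, 1, 1]
risk = pava(status)  # monotone estimate of disease probability along the marker
```

Applied to binary case-control indicators, the fitted values can be read as an estimated disease probability that never decreases as the marker increases.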
The second is the problem of combining biomarkers. In many medical settings, it is becoming increasingly clear that a single biomarker will not be sufficient to serve as a screening device for early detection of many diseases. Prostate cancer is one example. Typically, prostate-specific antigen (PSA) has been used for detection of prostate cancer, and a PSA measurement between 4 and 10 ng/mL leads to a prostate needle biopsy. While PSA is known to be a relatively sensitive biomarker, it is not known to be very specific. As a result, many biopsies yield negative results for tumor, even when the PSA is between 4 and 10 ng/mL. Many investigators now believe that a combination of biomarkers will potentially lead to more sensitive screening rules. How best to combine these measurements remains an open question. I am currently working on adapting algorithms from computer science, termed machine learning techniques, to this problem. In particular, Zheng Yuan, a current Ph.D. student, is developing model-combining methods for biomarkers as part of his dissertation.
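To make the sensitivity and specificity trade-off concrete, the sketch below evaluates a single-marker threshold rule and a simple two-marker rule on made-up data. The second marker, its cutoff of 0.5, all of the values and the "both markers positive" rule are hypothetical and chosen only for illustration; this is not the machine learning or model-combining methodology under development.

```python
def sensitivity_specificity(predicted, truth):
    """Return (sensitivity, specificity) for binary predictions against truth."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    tn = sum(1 for p, t in zip(predicted, truth) if not p and not t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data for eight men: PSA (ng/mL), a second marker, biopsy result.
psa = [2.0, 4.5, 6.0, 8.0, 5.0, 9.5, 3.0, 7.0]
marker_b = [0.1, 0.2, 0.9, 0.8, 0.6, 0.7, 0.2, 0.1]
cancer = [0, 0, 1, 1, 0, 1, 0, 0]

# Rule 1: flag on PSA alone, using the 4 ng/mL cutoff mentioned in the text.
psa_rule = [x >= 4.0 for x in psa]

# Rule 2 (assumed for illustration): flag only if both markers exceed their cutoffs.
combined_rule = [x >= 4.0 and b >= 0.5 for x, b in zip(psa, marker_b)]

psa_sens, psa_spec = sensitivity_specificity(psa_rule, cancer)
comb_sens, comb_spec = sensitivity_specificity(combined_rule, cancer)
```

In this toy dataset the PSA-only rule flags every case but also many non-cases, while requiring a second positive marker removes several false positives. Real combination rules would need to be learned and validated on study data rather than fixed by hand.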

New multiple testing procedures
The genomic data analysis work has also spurred methodological research in multiple testing. In particular, I have been involved with the development of empirical Bayes multiple testing procedures for high-dimensional data. This has led to a methodology I term shrunken p-values for assessment of differential expression (SPADE). I am currently working on a unified testing and estimation framework for such problems.

Other areas of interest can be gleaned from my CV.