Senior PhD Student Research Showcase

April 3, 2023

9:00am - 4:00pm
SPH I, Room 1680

The 2023 Senior PhD Student Research Showcase will feature the current research of our PhD students who are graduating this year.

RSVP to attend this event

2022 Senior PhD student Research Showcase Symposium


Each of the four sessions scheduled for the Senior PhD Student Research Showcase will feature four PhD candidates delivering 10-minute presentations about their research. Each session will be chaired by a departmental postdoctoral researcher, who will use the remaining time (approximately 15 minutes) to facilitate discussion among the presenters and audience members.

EVENT SCHEDULE

9:00am - 9:30am

Breakfast Social

SESSION ONE

9:30am - 10:30am
Chair: Dr. Kalins Banerjee

Irena Chen
Individual Variances as a Predictor of Health Outcomes: A Hierarchical Bayesian Approach
Tsung-Hung Yao
Bayesian Learning of Structured Covariances with Applications to Cancer Data
Rupam Bhattacharyya
Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data
Yibo Wang
A Latent Variable Model for Individual Degree Estimation in Respondent-Driven Sampling

10:30am - 10:45am

Coffee/Refreshment Break

SESSION TWO

10:45am - 11:45am
Chair: Dr. Jiyeon Song

Margaret Banker
Regularized Simultaneous Estimation of Changepoint and Functional Parameter in Functional Accelerometer Data Analysis
Yuhua Zhang
Statistical Modeling of Large-Scale Network Data
Stephen Salerno
Novel Deep Learning Approaches for Semi-Competing Risk Prediction
Jieru Shi
A Meta-Learning Method for Estimation of Causal Excursion Effects to Assess Time-Varying Moderation

12:00pm - 1:30pm

Lunch

SESSION THREE

1:30pm - 2:30pm
Chair: Dr. Xiaoqing Wang

Ying Ma
Statistical and Computational Methods for High-Dimensional Genomics Data
Lulu Shang
Statistical Methods for Genetic and Genomic Studies
Lam Tran
Approaches for Constrained Variable Selection in Large Datasets
Lap Sum Chan
Censoring-based Differential Abundance Analysis for Microbiome Data

2:30pm - 2:45pm

Coffee/Refreshment Break

SESSION FOUR

2:45pm - 3:45pm
Chair: Dr. Satwik Acharyya

Yuqi Zhai
Improving Estimation Efficiency by Integrating External Summary Information from Heterogeneous Populations
Elizabeth Chase
Modeling Basal Body Temperature Data Using Horseshoe Process Regression
Xuemei Ding
Models and Methods for Analyzing Clustered Recurrent Hospitalizations in the Presence of COVID-19 Effects
Fatema Shafie Khorassani
Data Fusion for Time-to-Event Outcomes

3:45pm - 4:00pm

Closing Remarks

Margaret Banker

Regularized Simultaneous Estimation of Changepoint and Functional Parameter in Functional Accelerometer Data Analysis

Abstract coming soon.


Rupam Bhattacharyya

Functional Integrative Bayesian Analysis of High-dimensional Multiplatform Genomic Data

Large-scale multi-omics datasets offer complementary, partly independent, high-resolution views of the human genome. Modeling and inference using such data pose challenges such as high dimensionality and structured dependencies but offer potential for understanding the complex biological processes characterizing a disease. We propose fiBAG, an integrative hierarchical Bayesian framework for modeling the fundamental biological relationships underlying such cross-platform molecular features. Using Gaussian processes, fiBAG identifies mechanistic evidence for covariates from corresponding upstream information. Such evidence, mapped to prior inclusion probabilities, informs a calibrated Bayesian variable selection (cBVS) model identifying genes/proteins associated with the outcome. Simulation studies illustrate that cBVS has higher power to detect disease-related markers than non-integrative approaches. A pan-cancer analysis of 14 TCGA cancer datasets is performed to identify markers associated with cancer stemness and patient survival. Our findings include both known associations, such as the role of RPS6KA1/p90RSK in gynecological cancers, and interesting novelties, such as EGFR in gastrointestinal cancers.


Lap Sum Chan

Censoring-based Differential Abundance Analysis for Microbiome Data

Abstract coming soon.


Elizabeth Chase

Modeling Basal Body Temperature Data Using Horseshoe Process Regression

Biomedical data often exhibit jumps or abrupt changes. For example, women’s basal body temperature may jump at the time of ovulation and menstruation. These sudden changes make these data challenging to model: many methods will oversmooth the sharp changes or overfit in response to measurement error. We develop horseshoe process regression (HPR) to address this problem. We define a horseshoe process as a stochastic process in which each increment is horseshoe-distributed. We use the horseshoe process as a nonparametric Bayesian prior for modeling a potentially nonlinear association between an outcome and its continuous predictor. We find that HPR performs well when fitting functions that have sharp changes, such as women’s basal body temperature trajectory. We apply HPR to model women’s basal body temperatures over the course of the menstrual cycle and propose modifications to more fully incorporate prior information about basal body temperature patterns.


Irena Chen

Individual Variances as a Predictor of Health Outcomes: A Hierarchical Bayesian Approach

Modeling variability as a predictor of health outcomes may provide critical information about disease risk and health outcomes. Existing methods for longitudinal data limit scientists’ ability to leverage subject-level biomarker variability for predicting health outcomes. In this talk, I will describe a joint modeling framework that estimates subject-level means and variances of multiple longitudinal predictors in order to predict an outcome of interest. This framework enables systematic investigation of the role of multi-marker variability in health outcomes. I will also present a simulation study in which the model demonstrates excellent recovery of true parameters. Finally, I will present a concrete application of this model to women’s health, where we investigate the effects of individual estradiol and follicle-stimulating hormone variabilities and co-variability on women’s fat distribution over the course of menopause. I will also outline ongoing and future research directions for modeling subject-level variances.


Xuemei Ding

Models and Methods for Analyzing Clustered Recurrent Hospitalizations in the Presence of COVID-19 Effects

Current methods are inadequate to analyze data from many dialysis facilities with multiple hospitalizations, especially when adjustments are needed for multiple time scales. We propose a method that has a flexible baseline rate function and is computationally efficient. The proposed method demonstrates substantially improved computational efficiency over the existing R package survival in simulations. Finally, we illustrate the method with an important application to monitoring dialysis facilities in the U.S., while making time-dependent adjustments for COVID-19’s effects.


Ying Ma

Statistical and Computational Methods for High-Dimensional Genomics Data

The recent explosion of transcriptomic technologies such as single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) has provided comprehensive cell atlases and enabled the thorough characterization of transcriptomic landscapes on tissues for mechanistic understanding of many biological processes. In the meantime, improvements in transcriptomic technologies have raised both the volume and complexity of data, introducing new computational and statistical challenges for data analysis. In this talk, I will present several methods to address these challenges for capturing and dissecting the heterogeneity within cells and tissues with high statistical power and accuracy while providing new insight into the biological systems. Specifically, we develop effective and efficient statistical methods for integrative differential expression and gene set enrichment analysis in scRNA-seq studies, for spatially informed cell type deconvolution, and for integrative reference-informed tissue segmentation analysis in SRT studies. I will illustrate our methods by showing results from applications to human embryonic stem cell data, human pancreatic ductal adenocarcinoma (PDAC) data, and human dorsolateral prefrontal cortex (DLPFC) data.


Stephen Salerno

Novel Deep Learning Approaches for Semi-Competing Risk Prediction

In the era of precision medicine, time-to-event outcomes such as time to death or disease progression are routinely collected, along with risk factors that often have complex relationships. Recent emphasis has been placed on developing novel machine learning approaches for survival estimation and prognostication in settings with one outcome of interest; however, many survival processes in real applications involve multiple competing events. Semi-competing risk problems, a variant of competing risk problems, are commonly encountered in clinical studies. By semi-competing, we mean that the occurrence of one event, i.e., a non-terminal event, is subject to the occurrence of another, terminal event, but not vice versa. In this dissertation, we propose a series of deep learning approaches for survival prediction and causal inference in this setting of semi-competing risks. Our motivation comes from the Boston Lung Cancer Survival Cohort study, one of the largest cancer epidemiology cohorts investigating the complex mechanisms of lung cancer.


Fatema Shafie Khorassani

Data Fusion for Time-to-Event Outcomes

Despite significant reductions in cancer mortality over the past three decades, racial disparities in cancer-specific mortality persist. Studying factors associated with these observed disparities requires data on many variables, including demographics, healthcare access, socioeconomic status, and comorbidities. There are existing national cancer surveillance databases that each collect parts of the information needed for studying racial disparities in cancer. Integrating data from multiple sources allows us to study associations between race and cancer-specific mortality over time adjusted for important confounders. We propose a method for data fusion of time-to-event outcomes motivated by confounder adjustment when studying racial disparities in cancer-specific mortality. Data fusion is a particularly challenging problem in data integration, in which no subject has complete data on all the covariates and outcome. Some existing missing data methods have been extended to the setting of data fusion; however, they do not account for time-to-event outcomes. We present a method for regressing a time-to-event outcome on a set of covariates from two integrated datasets that include some overlapping variables. We will present a class of doubly robust estimators which are unbiased if either the data source model or the model of the unobserved covariates is specified correctly. Through simulation studies we will present the bias and coverage of our estimators under correctly specified and misspecified models and will apply the method to fuse cancer-specific mortality information from the Surveillance, Epidemiology, and End Results (SEER) Program with confounders collected in the National Cancer Database (NCDB) that are not available in SEER. 


Lulu Shang

Statistical Methods for Genetic and Genomic Studies

Recent advances in array-based and sequencing-based technologies have enabled genome-wide profiling of gene expression and various epigenetic markers. Extracting valuable biological information from these various omics data types requires the development of new computational and statistical methods. My dissertation centers around developing statistical methods and analyzing various omics data. In this dissertation, we propose several effective and efficient statistical and computational methods to address critical biological problems encountered in various genomics fields including spatial transcriptomics, single cell, and bulk RNA-seq studies. In addition, we have conducted two large-scale comprehensive quantitative trait loci (QTL) mapping studies in underrepresented African Americans in the GENOA cohort, to carefully examine how inherited genetic variation affects local gene expression and DNA methylation in underrepresented populations.

In Chapter II, I focus on data collected from various spatial transcriptomic technologies and develop a method called SpatialPCA for spatially aware dimension reduction in spatial transcriptomics. We demonstrate the advantages of SpatialPCA through spatial transcriptomics visualization, spatial domain detection, spatial trajectory inference on the tissue, and high-resolution spatial map reconstruction. In Chapter III, I continue to focus on spatial transcriptomics data and develop a method, Stella (SpaTially variable cELL type specific gene identificAtion), that enables spatially variable cell type specific gene identification for spatial transcriptomics studies. We demonstrate the ability of Stella to detect genes that display spatial expression patterns in a cell type specific fashion, providing calibrated type I error control with enhanced detection power across a variety of technical platforms. In Chapter IV, I connect genome-wide association studies (GWAS) with single cell and bulk RNA-seq data and develop a method, CoCoNet (COmposite likelihood-based COvariance regression NETwork model). CoCoNet utilizes tissue-specific gene co-expression networks to infer trait-relevant tissues by integrating GWAS and gene expression studies. We demonstrate how CoCoNet can be used to identify specific glial cell types associated with neurological disorders and disease-targeted colon tissues associated with autoimmune disorders. In Chapter V, I conduct two large-scale cis-QTL mapping studies to link genetic variants with gene expression and various epigenetic markers. We perform expression and methylation cis-QTL mapping studies on African Americans in the GENOA cohort to identify genetic variants that influence either gene expression or DNA methylation. Our results promote diversity, equity, and inclusion in genetic research and enhance the current understanding of the genetic architecture underlying gene expression and DNA methylation in the underrepresented African American population.


Jieru Shi

A Meta-Learning Method for Estimation of Causal Excursion Effects to Assess Time-Varying Moderation

Twin revolutions in wearable technologies and smartphone-delivered digital health interventions have significantly expanded the accessibility and uptake of mobile health (mHealth) interventions in multiple health science domains. In this talk, the estimation of causal excursion effects is revisited from a meta-learner perspective, where the analyst is agnostic to the choices of supervised learning algorithms used to estimate nuisance parameters.


Lam Tran

Approaches for Constrained Variable Selection in Large Datasets

Abstract coming soon.


Yibo Wang

A Latent Variable Model for Individual Degree Estimation in Respondent-Driven Sampling

Individual network size (degree) is a crucial factor in respondent-driven sampling analysis, as it is often used as a proxy for sampling probability. However, the self-reported degree from the interview, which is a commonly used estimate, typically suffers from substantial measurement error. To address this issue, we propose a latent variable model that blends the analysis of reporting behaviors and responses to questions about the number of acquaintances in a particular subpopulation. We demonstrate via simulation studies that our approach provides accurate degree estimation and improves statistical inference when the estimated degree is used as a proxy for the sampling probability.


Tsung-Hung Yao

Bayesian Learning of Structured Covariances with Applications to Cancer Data

The identification of scientifically driven dependence structures is of interest across many biomedical domains. Examples include tree- and graph-based structures that manifest themselves in precision medicine and genomic contexts. Such dependence structures can be compactly represented as covariance or precision matrices, which is useful for both characterization and interpretation of the complex dependencies. This presentation focuses on tree-structured dependence with an application to cancer treatments. Specifically, we propose a novel Bayesian probabilistic tree-based framework for patient-derived xenograft (PDX) data to investigate the hierarchical relationships between treatments by inferring treatment cluster trees, referred to as treatment trees (Rx-tree). The framework motivates a new metric of mechanistic similarity between two or more treatments accounting for inherent uncertainty in tree estimation; treatments with a high estimated similarity have potentially high mechanistic synergy. Building upon Dirichlet Diffusion Trees, we derive a closed-form marginal likelihood encoding the tree structure, which facilitates computationally efficient posterior inference via a new two-stage algorithm. Simulation studies demonstrate superior performance of the proposed method in recovering the tree structure and treatment similarities. The analyses of a recently collated PDX dataset produce treatment similarity estimates that show a high degree of concordance with known biological mechanisms across treatments in five different cancers. More importantly, our analysis uncovers new and potentially effective combination therapies that confer synergistic regulation of specific downstream biological pathways for future clinical investigations.


Yuqi Zhai

Improving Estimation Efficiency by Integrating External Summary Information from Heterogeneous Populations

Abstract coming soon.


Yuhua Zhang

Statistical Modeling of Large-Scale Network Data

Scientists are increasingly interested in discovering community structure from modern relational data arising on large-scale social networks. While many methods have been proposed, few account for the fact that modern networks arise from processes of interactions in the population and that interactions may exhibit different categories. In this presentation, we first introduce a novel statistical model for the study of interaction networks with latent node-level community structure. In particular, this model allows network properties such as sparsity and power-law degree distributions. These properties are frequently observed in real-world networks. We then discuss a joint model that allows integration of interaction-wise prior knowledge into node-level community detection. We demonstrate the proposed models using post-comment interaction data from Talklife, a large-scale online peer-to-peer support network, through identifying its underlying online user groups.