Department of Epidemiology

Balancing Validity, Precision, Generalizability and Importance in the Design of Epidemiological Investigations

notes of Jim Koopman for Epid655 Lecture of 1-22-99

The art of study design:

Designing an epidemiological study is an art. To design effective studies, you must have well-developed criteria and procedures to help you choose between design alternatives. But the development of such criteria is only the first step in developing skill in the art of study design. There is not a limited set of design alternatives from which you can choose. There is an unlimited variety of design possibilities and study objectives. The artful study designer will develop a unique approach for every new study. The development of that unique approach requires making judgments about how well the objectives can be achieved under different design possibilities and constraints. These judgments will influence your choice of objectives. The universe of design possibilities extends beyond the conceptual capacities of any one person. This is the case for your 655 projects even though your possibilities are confined by the arrangements Professor Schottenfeld and I have made for you. You must be creative in finding good directions. You must work as a team to develop creative new approaches. And you must follow your inspirations wisely in continuing to develop those directions with promise.

This creative art is as much concerned with conceptualizing scientific tasks as it is with being familiar with technological alternatives to pursue those tasks. Scientific tasks include filling gaps in data, filling gaps in theory, and generating new paradigms that define new gaps. Epid 655 students are likely to see their task in terms of filling gaps in data. But such a narrow conceptualization will not set you on the path to becoming a great epidemiologist. The more comprehensive your view of the role of science in human endeavors, the more creative potential you will have to design studies that generate new knowledge.

Your creativity in conceptualizing a focus for your research must interact with your creativity in finding ways to pursue that focus. Creativity is required to find opportunities to gather data in sufficient quantity and with sufficient validity to accomplish your objectives. As you explore opportunities to gather data you must again be creative enough to view that data not only from your original perspective. In each set of data you consider collecting you should search for potential to 1) resolve hypotheses which impact public health actions, 2) build a knowledge base for constructing scientific theories, and/or 3) explore the unknown.

There is no formula which can take the investigative team from a statement of a research problem to a statement of a study design. Indeed, stating the operational objectives for a study should be influenced by the process of finding feasible ways to meet study objectives. The process of stating research objectives may take several iterations. Initial objectives are stated. Then resource opportunities and limitations are discovered which require reformulating objectives. Then a rethinking of objectives leads the team to consider new data that might be gathered. Each study situation has unique potential and limitations. Creative vision is needed to meld a variety of potential study goals and resource opportunities into a design that overcomes resource limitations and operational impediments.

Creativity and scientific vision must be focused into practical study designs by making a set of judgments which maximize study virtues. These judgments are the focus of this presentation. We limit our discussion to study objectives which seek to relate the exposure experience of individuals to the disease experience of those individuals. As I discussed in my objectives lecture, this class of objectives may not be as important as objectives that have to do with conceptualizing and understanding epidemiological systems. But this narrow focus on the relationships within individuals of exposures and disease is where epidemiology had its origins and should be mastered by epidemiology students before progressing to the broader goals to which epidemiology is increasingly turning. Those broader goals include developing and validating theory regarding the systems and processes by which patterns of exposures get generated in populations and then get translated into patterns of disease in those populations.

Virtues of studies and parameter estimates

Making good study design judgments in our restricted context of relating exposure to disease in individuals requires the investigative team to have a clear vision of study virtues. Accordingly this presentation focuses on three desired qualities or virtues: 1) Usefulness, 2) Validity, and 3) Precision. These are virtues of both studies and parameters estimated in studies. The virtues of a study are the virtues of the parameters it can estimate.

As discussed in the previous lecture, the objectives of an epidemiological study that proposes to collect and analyze data should be stated in terms of the parameters that the study seeks to estimate. This is true whether a study is designed to impact decisions affecting health, serve as a basis for constructing scientific theory, or chart out unknown territory. The more explicitly that the parameters to be estimated can be stated in the objectives, the better the study team will be able to balance the virtues of alternative study designs. The most virtuous study will estimate the most useful parameters with the greatest validity and precision.

The usefulness of study results relates to the models in which the parameters estimated are embedded. Two types of usefulness are usefulness for making decisions that will directly affect the health of individuals and usefulness for distinguishing alternative theories about what generates patterns of disease in populations. Regarding the first type, the most useful parameter estimates are those that can be applied to the greatest number of individuals to make the most important health decisions. Regarding the second, the most useful parameter estimates are those that maximize our ability to distinguish between competing theories about what generates patterns of disease in populations.

The validity and precision of parameter estimates relate to how the study is carried out. Validity and precision are the study and parameter virtues that we most commonly trade off as we make decisions about what statistical target population to study, how to select study subjects from that population, and how to collect data from that population. Of course valid and precise estimates are more useful than invalid and imprecise estimates. They will help us make better decisions regarding things affecting the health of populations and regarding competing theories.

All parameter estimates, and thereby all study designs producing the estimates, may have different degrees of the three virtues. In the context of relating exposure to disease in individuals we will define these virtues as they apply to epidemiological parameters that relate exposure to disease in individuals, such as odds ratios, risk ratios, or risk differences. These parameters are not necessarily parameters in the most scientifically useful and generalizable models. They are not the parameters of ecological models, infection transmission models, or sociological models. These more inclusive models may ultimately be of greater use to Public Health than models of exposure effects on disease in individuals. But this is an introductory course. It can only consider simple models of disease causation in populations.

Tradeoffs in the Art of Study Design

A thorough review of the literature regarding a problem combined with an effort to conceptualize that problem in creative new ways is the first step in defining what parameters in what models will be most useful. Any creative endeavor is enhanced if there is an initial stage where ideas flow without constraints by the standard prejudices regarding what is useful, possible, or good. After this free-flow, the socially possible should be winnowed from the wildly fantastic. From this process should come several study objectives for more in-depth exploration. In this course these study objectives will most likely entail estimating measures of association or attributable risk between exposure and disease. These may be stratified by or otherwise controlled for third variable effects. They might also describe joint effects of multiple exposures on disease outcomes.

The study design art might not be driven only by a stated problem. It might just as productively be driven by a consideration of resources available and how those resources might be put to maximal use. In Epid 655, Dr. Schottenfeld and I have made an effort to provide you with some resources. You should consider all of the different study design possibilities made possible by these resources. You should consider the possibilities for using different types of information. You should consider using demographic characteristics of the census tracts where individuals live. You should consider information collected in the past and available on questionnaires or on medical records. And you should consider administering your own questionnaire or collecting your own lab specimens. You should consider the possibilities for defining study populations in different ways. You should consider defining study populations at the present point of time and at past points in time. You should consider selecting study populations by different characteristics including disease or exposure status.

Then you should consider what study objectives are most appropriate given your study design possibilities. This is where your judgment capacities in making tradeoffs between usefulness, validity, and precision come most into play. Once you have the possibilities in front of you, you must evaluate the validity and precision with which the various parameters intrinsic to your objectives can be estimated. Then you must evaluate the usefulness of the various parameters given the possible levels of validity and precision. You must consider how changing your sample selection and data collection procedures will impact the validity and precision of the parameter estimates you can make. This will affect which parameter estimates will be most useful. You must consider how adding to the number of different parameters that you can estimate might degrade the validity or precision of each parameter estimate that you will make. To make all of these tradeoffs appropriately, you should have clear concepts regarding how to assess the usefulness, validity, and precision of parameter estimates.

Usefulness:

Two things determine the usefulness of a parameter estimate.

  1. The importance of the decisions that the parameter-estimate helps one make.
  2. The number of individuals to whom the parameter-estimate applies.

We may call these two categories importance and generalizability.

Generalizability and importance are linked virtues that relate to the models in which parameters are elements. If a parameter is an element in a statistical model that only applies to the statistical target population of a study, then that parameter estimate is less generalizable than a parameter estimate that is an element of a causal model that has scientific generalizability to many other populations. The more generalizable a parameter estimate, the more important it is likely to be. For example, an estimate of myocardial infarction risk due to changes in high-density lipoprotein levels will be more important if the parameter capturing that risk relationship can be accurately applied to many different populations. If it can be applied only to the totality of the population in which the parameter was estimated, it will be less important. If it can be applied to all subgroups in the population from which it was estimated it would be more valuable than if it can be applied only to the totality of the population from which it was estimated.

The estimate of a parameter in a statistical model that is not generalizable beyond the statistical target population or to subgroups in the population may, however, have importance beyond the statistical target population if it distinguishes which of various competing theories makes the best predictions in a population under study. In general, theory has potential utility in many different situations while facts may have utility only for dealing with the populations about which the facts are known.

You will be making choices regarding whether you want to estimate attributable risk parameters or association parameters. Attributable risk parameters will in general be more important for making Public Health resource allocation decisions because they express the amount of disease caused by exposure and thus facilitate cost-benefit calculations involved in making resource allocation decisions. Association measures do not do this. They do, however, better reflect the actions of confounding variables and biases and are therefore more appropriate when the decision to be made is merely whether or not an exposure is causally associated with a disease.

The choice between risk ratios and risk differences as study parameters to estimate is not only relevant to the importance of the decision that can be made on the basis of the parameter estimate. It is also important with regard to generalizability. There are some situations where a relative risk may be generalizable from one population to another and there are other situations where it will not be. Likewise, there are some situations where a risk difference may be generalizable and other situations where it will not be. Epigenesis theory can be used to help identify these situations (Koopman JS and Weed DL. Epigenesis Theory: A mathematical model relating causal concepts of pathogenesis in individuals to disease patterns in populations. Am J Epidemiol, 1990; 132:366-390.). Epigenesis theory is based on analyses that go beyond the level of this course so we will not explain it here. Relevant to the generalizability of risk ratios or risk differences, however, epigenesis theory predicts that risk ratios will be generalizable if a risk factor affects something that is on every single possible pathway to disease. On the other hand, risk differences will be generalizable if a risk factor affects a subset of the pathways to disease for which there is little variance in the frequency of other factors on that pathway. Without going into much more detail, the conditions where risk ratios are generalizable are more likely to be found for diseases like hypertension and coronary artery disease. On the other hand, the conditions where risk differences are more generalizable are more likely to be found for infectious diseases where the risk factor for disease under consideration is the infection itself. For example, the risk difference for otitis media due to Moraxella catarrhalis should be generalizable to populations with either high or low levels of non-typable Haemophilus influenzae infections. The risk ratio would not be generalizable. On the other hand, the risk ratio of a high cholesterol diet would be more generalizable to populations with high or low levels of hypertension than would a risk difference.
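The contrast between the two situations can be sketched numerically. The short Python example below is only a simplified illustration under stated assumptions; it is not the epigenesis model from the cited paper. It assumes (1) a "shared step" structure in which exposure changes the probability of a step that every pathway must pass through, and (2) a "separate pathway" structure in which exposure opens its own pathway that acts independently of the background causes. All function names and probabilities are hypothetical.

```python
# Simplified illustration only; not the model from the cited paper.
# All probabilities below are hypothetical.

def shared_step_risks(p_step_unexposed, p_step_exposed, p_any_pathway):
    """Exposure acts on a step that every pathway to disease must pass through."""
    r0 = p_step_unexposed * p_any_pathway
    r1 = p_step_exposed * p_any_pathway
    return r0, r1

def separate_pathway_risks(p_exposure_pathway, p_background):
    """Exposure opens its own pathway; background causes act independently of it."""
    r0 = p_background
    r1 = 1 - (1 - p_background) * (1 - p_exposure_pathway)
    return r0, r1

def ratio_and_difference(r0, r1):
    """Return (risk ratio, risk difference) for unexposed risk r0 and exposed risk r1."""
    return r1 / r0, r1 - r0

# Two hypothetical populations that differ only in how common the other causes are.
for label, background in [("low background", 0.01), ("high background", 0.05)]:
    rr_a, rd_a = ratio_and_difference(*shared_step_risks(0.2, 0.4, background))
    rr_b, rd_b = ratio_and_difference(*separate_pathway_risks(0.10, background))
    print(f"{label}: shared step RR={rr_a:.2f} RD={rd_a:.4f}; "
          f"separate pathway RR={rr_b:.2f} RD={rd_b:.4f}")
```

Under these assumptions the risk ratio stays the same across the two populations in the shared step structure while the risk difference changes, and the risk difference stays nearly the same in the separate pathway structure while the risk ratio changes a great deal.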

I include here additional comments on importance and generalizability which may be redundant because my time for completely redacting this document is running out.

Generalizability:

Scientific generalizability refers to the validity of applying parameter estimates made in a particular statistical target population to other populations. This must be distinguished from statistical generalizability which refers to the validity with which estimates made in a sample population can be applied to the statistical target population. Statistical generalizability so defined relates to what epidemiologists most often call validity. When epidemiologists talk with statisticians, miscommunication is likely because when epidemiologists speak of generalizability they are usually thinking about scientific generalizability while when statisticians speak of generalizability, they are much more likely to be referring to statistical generalizability.

Generalizing parameter estimates from one population to another depends upon having parameterized the exposure disease relationships in the statistical target population using a model which corresponds to the forces causing disease and then having that model also apply to the population to which one is generalizing the estimate. Quite often this is done by assuming that the same forces which generated an OR or an RR in the statistical target population are also acting in the population to which one is generalizing results. But if very simple models which do not take background causes of disease and cofactors into account are used, then there is a good probability that the model will not apply in both populations.

If one generalizes an odds ratio between endometrial cancer and exercise estimated in a Michigan population to a Florida population, then one is assuming that all the background causes of disease and cofactors are the same in those populations. Suppose that there are two different pathways to the development of clinically evident endometrial cancer. Suppose estrogen promotes tumor development and growth in one pathway but not the other and that diet and genetic factors affect the frequency of disease progression down the pathway to the non-estrogen dependent cancers while exercise affects the pathway to estrogen dependent cancers. In that case generalizing from the Michigan population to the Florida population would require one to assume that the dietary and genetic factors are acting the same in both populations, that the levels of estrogen are the same in the two populations, and that exercise affects estrogen levels identically in the two populations. The more one knows about the pathways leading to disease, the more validly one will be able to generalize results from one population to another.

In summary, the scientific generalizability of study results from the statistical target population to a different population depends upon at least one of the following being true:

1) there are no unmeasured modifiers of an exposure disease relationship,

2) the distribution of unmeasured modifiers is the same in the population studied and in the population to which the results are being generalized, or

3) all modifiers can be assessed in both the population studied and in the population to which the results are being generalized so that effects at the separate levels of the modifying variables are what are generalized from the statistical target population to the scientific target population.

You should be aware that a cause of disease which does not modify the risk ratio, rate ratio or odds ratio will modify the risk difference and vice versa. We will go over this later when we come to doing stratified analyses. Thus how generalizable study results are may depend upon which parameter we are talking about.
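A small numerical check makes this point concrete. The sketch below uses hypothetical stratum-specific risks (not data from any study) for a third factor that also causes disease. In the first set the risk ratio is the same in both strata; in the second set the risk difference is the same in both strata.

```python
# Hypothetical stratum-specific risks: (risk in unexposed, risk in exposed).
strata_constant_ratio = {"third factor absent": (0.01, 0.02),
                         "third factor present": (0.05, 0.10)}
strata_constant_difference = {"third factor absent": (0.01, 0.03),
                              "third factor present": (0.05, 0.07)}

def effect_measures(strata):
    """Return {stratum: (risk ratio, risk difference)}."""
    return {name: (r1 / r0, r1 - r0) for name, (r0, r1) in strata.items()}

for title, strata in [("constant risk ratio", strata_constant_ratio),
                      ("constant risk difference", strata_constant_difference)]:
    print(title)
    for name, (rr, rd) in effect_measures(strata).items():
        print(f"  {name}: RR={rr:.2f}  RD={rd:.3f}")
```

Because the third factor raises the risk in the unexposed, holding one measure constant across its strata forces the other measure to differ.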

With regard to point 1 above, there are almost always factors which act as modifiers of the risk ratio or the risk difference. There will, however, be no factors which modify the risk ratio when there is only one pathogenic process leading to a disease outcome and there is no other variable that acts via the same mechanism to cause disease as the exposure of interest. There will be no modifiers of the risk difference when there is no variation in any population of factors needed to cause disease other than the exposure of interest. This essentially means that there will always be modifiers of the risk difference.

For dichotomous variables, if there are no risk ratio modifiers, risk ratios are perfectly generalizable. If one has studied a number of variables that might be different between the target population and the population to which the results are being generalized and has found that they do not modify the RR, then one has more confidence in making the generalization. Consider the relative risk of an STD between condom users and non-users. If in your study population you demonstrate that frequency of an infection in different segments of your population or different contact patterns in different segments of your population do not modify this relative risk, then you will have much more confidence generalizing this relative risk from your statistical target population to other scientific target populations.

If the exposures of interest and their modifiers have the same joint distribution in the study population and in the population to which the results are being generalized, there is no problem in making the generalization. (If age and sex are each observed to affect relative risk of condom use for STDs but the scientific target population has the same age and sex distribution as your statistical target population, then you can still generalize from the statistical to the scientific target population. Note, however, that it may not be possible to generalize to populations which have the same age distribution and the same sex distribution but which have different joint age-sex distributions.)
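The caution about joint distributions can be illustrated with a small calculation. The sketch below assumes hypothetical risks in four age-sex strata and two hypothetical populations that have identical age distributions and identical sex distributions but different joint age-sex distributions. It also assumes that the population-level risk ratio is the ratio of the stratum risks averaged over the population's joint distribution, that is, everyone exposed versus everyone unexposed.

```python
# Hypothetical stratum risks: (age, sex) -> (risk if unexposed, risk if exposed).
risks = {("young", "M"): (0.01, 0.02),
         ("young", "F"): (0.01, 0.02),
         ("old", "M"): (0.01, 0.02),
         ("old", "F"): (0.01, 0.08)}  # age and sex jointly modify the effect

# Two hypothetical populations: shares of people in each age-sex cell.
pop_a = {("young", "M"): 0.25, ("young", "F"): 0.25,
         ("old", "M"): 0.25, ("old", "F"): 0.25}
pop_b = {("young", "M"): 0.40, ("young", "F"): 0.10,
         ("old", "M"): 0.10, ("old", "F"): 0.40}

def marginal(joint, position):
    """Marginal distribution of age (position 0) or sex (position 1)."""
    totals = {}
    for key, share in joint.items():
        totals[key[position]] = totals.get(key[position], 0.0) + share
    return totals

def population_risk_ratio(joint, stratum_risks):
    """Risk ratio comparing everyone exposed with everyone unexposed."""
    r0 = sum(share * stratum_risks[key][0] for key, share in joint.items())
    r1 = sum(share * stratum_risks[key][1] for key, share in joint.items())
    return r1 / r0

for name, pop in [("population A", pop_a), ("population B", pop_b)]:
    print(name, "age:", marginal(pop, 0), "sex:", marginal(pop, 1),
          "RR:", round(population_risk_ratio(pop, risks), 2))
```

Both populations are half young and half male, yet the population-level risk ratio differs because older women, the stratum with the largest effect in this made-up example, make up a different share of each population.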

In the face of a demonstrated effect modification by a factor that differs in frequency between the study population and the population to which one wishes to generalize the results, one must specify the effects to be expected in each study population stratum determined by the effect modifier. Then one can generalize to the corresponding strata of the population where one wishes to make inferences. Of course within those strata the above conditions must still apply.

If parameter estimates have been about the same in a range of populations, one has some confidence in generalizing to a population that falls within that range of characteristics even if the modifiers have not been studied or are not understood. (If the condom use-STD relationship has been observed to be the same in old and young engineering students and in old and young LS&A students, then one has more confidence in generalizing to all students, even those at other universities.)

Importance:

There are two different aspects to importance. These relate to the two types of introductory sentences which might be part of a statement of "specific aims" in your study proposal and two different topics that might be dealt with in the significance section of your proposal:

1) The degree to which the problem area addressed is one that affects human well being.

2) The degree to which an estimate of the parameter studied will affect either scientific theory or public health action.

Importance is ultimately a matter of values. Values are subject to change upon evaluation. Thus it is worthwhile to examine each of the following types of importance before making a study design decision.

1) The impact of general classes of exposures and agents on Public Health:

We choose to work in different areas not just because of the importance of a narrow question addressed by one of our studies, but also because we want to build a career in which we will make lifetime contributions. There may be many different criteria for making such decisions. Ideally we should have epidemiologists filling as many of the niches as possible. Even when the health impact of some niches is small, it would not be good to have no one working in those areas. But one's ranking of what types of exposures represent threats to health probably influences one's choice of problems to address. My personal ranking of very general classes of factors as to the impact they have on public health influences the general type of epidemiological research I do. The "Public" whose health lies behind this ordering is the world population. My personal ranking of the importance of general factors on Public Health is:

a) Microbial agents (classic pathogens and agents that alter microbial ecological balances affecting health)

b) Diet.

c) Social environment

d) Knowledge and attitudes.

e) Genes.

f) Cognitive factors and stress.

g) Man made chemicals and environmental pollutants.

2) The degree to which the factors being measured can be altered with Public Health programs.

Condom use may be a more alterable cause of sexually transmitted infections than social settings where partners are met and therefore you may decide to study the former more intensely than the latter. On the other hand, you might come up with an innovative program of creating safe meeting places that can protect a population by ensuring that infected and uninfected individuals do not mix. Given the potential for such a program, the study of contact patterns becomes more important because you know that contact patterns determine the level of infection in a population and you might be able to alter contact patterns.

The social environment determining child care arrangements may be a stronger determinant of acute suppurative otitis media than knowledge and attitudes of the parents regarding pacifier use. But perhaps the susceptibility of child care arrangements to mass public health action raises its importance. On the other hand, it may be easier to vaccinate than to address either parental knowledge and attitudes or the social environment determining day care arrangements. We must be careful not to abandon important control paths, however, because some other path shows more promise. Vaccines will have serious limitations for controlling otitis and that restores a lot of the relative importance of pacifiers and day care.

3) The degree of uncertainty about a measurement influences its importance.

A fair number of studies give us some indication about condom use patterns and effects. Thus it is not too likely that an inexpensive study will increase the precision and validity of our knowledge with regard to condom use. There is very little known, however, about the associations between study subject characteristics like new partnership formation rate and partnership characteristics like courtship time. Thus almost anything you do, no matter how ill conceived and poorly executed, is likely to add to our knowledge in this area.

Likewise, the dearth of information about how pacifier use affects the risks of otitis media in different conditions makes this a very important issue to address.

4) Helpfulness of an observation in choosing between alternate theories or clarifying scientific concepts.

There may be no way to intervene on variables such as the determinants of contact patterns. These, however, are so essential in determining transmission patterns and understanding in which population subgroups we can expect the change in things like condom use to broadly reduce infection risks, that they might have a greater priority for study than even a factor like condom use which we can directly control.

5) The breadth of theory to which an observation could relate.

To my mind, this is the most important factor determining the importance of an investigation. Science, like public health, is a social process. If the knowledge you generate cannot be integrated into a broader body of knowledge, society and other scientists will not pay attention to it. Isolated facts applicable only in the population studied are relatively unimportant facts even when the populations in which these facts are determined have important numbers of individuals who could be helped by these facts. On the other hand, facts generated in populations with less numerous individuals who can be benefited by those facts may be quite important if they relate to a body of theory that increases understanding of a wide variety of situations. This is why scientists often decide to study what sounds like something very esoteric and uncommon. The ability to describe one esoteric system may affect theory about more common systems. We certainly have learned a lot about genetics, for example, in studying drosophila, E. coli, and lambda phage. Similarly, although students may not represent the population that most suffers from STDs, they represent a population in which it is most possible to study contact patterns and make some evaluation as to how the determinants of contact patterns act to alter infection risks.

Validity:

Validity refers to the ability of parameter estimates to on average reflect what they are intended to reflect. The most common reasons for invalid estimates for parameters relating exposure to disease are sample selection bias, information bias, and confounding.
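The phrase "on average" can be made concrete with a small simulation. The Python sketch below is a hypothetical illustration under stated assumptions: the parameter is a simple disease risk rather than an exposure-disease parameter, the true risk is 0.20, and selection bias is represented by diseased people being twice as likely as non-diseased people to enter the sample. Precision is read here as the spread of estimates across repeated studies. The function name and all numbers are invented for the example.

```python
import random
import statistics

def simulate_estimates(true_risk, sample_size, n_studies, rng, case_oversampling=1.0):
    """Risk estimates from repeated hypothetical studies.

    case_oversampling > 1 means diseased people are that many times more
    likely to enter the sample than non-diseased people (selection bias).
    """
    # Probability that a sampled person is diseased, given the selection pattern.
    weighted_cases = true_risk * case_oversampling
    p = weighted_cases / (weighted_cases + (1 - true_risk))
    estimates = []
    for _ in range(n_studies):
        cases = sum(rng.random() < p for _ in range(sample_size))
        estimates.append(cases / sample_size)
    return estimates

rng = random.Random(655)  # fixed seed so the run is repeatable
true_risk = 0.20
designs = {
    "small, unbiased": simulate_estimates(true_risk, 50, 300, rng),
    "large, unbiased": simulate_estimates(true_risk, 1000, 300, rng),
    "large, biased selection": simulate_estimates(true_risk, 1000, 300, rng,
                                                  case_oversampling=2.0),
}
for name, estimates in designs.items():
    print(f"{name}: mean={statistics.mean(estimates):.3f} "
          f"sd={statistics.pstdev(estimates):.3f}")
```

In this sketch the small unbiased design is valid but imprecise, while the large design with biased selection is precise but centered on the wrong value, so no increase in sample size would rescue it.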

The term "validity" can have different meanings in different contexts. Do not assume that the way I use the term is universal. I present here the logic that I find useful for helping to choose study designs that provide "valid" estimates. We go into more detail about validity here than you did in Epid 601 but we do not comprehensively cover the many different ways that the concept of validity has been used.

Different authors have used the concept of validity quite differently. Distinctions have been made between "face" validity, "internal" validity, and "external" validity. In general, internal validity corresponds to making correct inferences about the study population or sometimes about the statistical target population. External validity refers to making correct inferences about other populations. Face validity corresponds to estimating a parameter that can be appropriately applied in the way the parameter estimate will be used.

Biostatisticians use the concept of validity in reference to estimation procedures. This use is distinct from the use of validity with regard to parameter estimates themselves. Estimation procedures are biased (invalid) if the estimates are not centered around the true parameter value even when there are no information biases or sample selection biases.

Inferences may be invalidated at any of three steps in the inferential process. These are

  1. choice of model and parameter for the particular purpose that the study seeks to accomplish,
  2. choice of parameter estimation procedure for the particular model and parameter chosen,
  3. choice of study sample and data collection methods to provide data for the estimation procedure.

Since this course deals with the choice and use of field methods, it mainly deals with the third source of invalid estimates. We deal with the second issue mainly in terms of using stratified or crude analyses with some discussion of the choice of multivariate analytic methods. In your Biostatistics 560 course next year you will have much more discussion on this topic. Since your professor is a theoretician, he has probably included too much in this document about the first two issues. We begin with the more theoretical first issue.

1) The parameter must be an element in a model that validly abstracts reality for the purpose intended.

The validity of a model used depends upon the uses to which an estimate is put. I feel that the most important use of parameter estimates is to assess which of two competing theories is more applicable in the real world. That use of parameters, however, requires knowledge and understanding which most Epid 655 students have not yet pursued. In this course we consider two simple uses of parameter estimates.

  1. Parameters may be used to make inferences about patterns of disease occurrence and/or the risk levels to individuals in the statistical target population.
  2. Parameters may be used to make inferences about causal connections between exposure and disease in individuals.

Consider how validity is dependent upon parameter use for different parameters relating exposure to disease. A parameter that provides a valid estimate of the relative frequency of disease in exposed and unexposed individuals in the statistical target population may not provide a valid estimate of how much disease can be prevented in the population by eliminating exposure. To describe relative frequency, the model need not take third variables into account. It must merely abstract patterns of occurrence using models which reflect population patterns. The simplest model would be one in which all exposed individuals would be treated as homogeneous with respect to disease risks and all unexposed individuals would similarly be considered to be homogeneous. No role for third variables needs to be specified when the purpose of the model is just to describe relative frequencies. It is sufficient that possible relationships between variables in the model correspond to the observed relationships in the data. But if the parameter is intended to reflect causal relationships between exposure and disease, as is usually the case in epidemiology, then models must correctly take into account how third variables affect the relationship between exposure and disease. The models we will most commonly use in this course to take third variables into account are models where we stratify on the third variable and in so doing only assume homogeneity of the exposed groups with regard to disease risk within strata of the third variable.
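The difference between the two purposes can be seen in a small worked example. The Python sketch below uses hypothetical counts (not data from any study) in which a third variable is both more common among the exposed and a strong cause of disease. The crude risk ratio correctly describes the relative frequency of disease in exposed and unexposed individuals, while the stratum-specific risk ratios are what a model that stratifies on the third variable would report.

```python
# Hypothetical counts per stratum of a third variable:
# (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {"third variable absent": (4, 200, 8, 800),
          "third variable present": (160, 800, 20, 200)}

def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed divided by risk in the unexposed."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)

# Risk ratio within each stratum of the third variable.
stratum_rr = {name: risk_ratio(*counts) for name, counts in strata.items()}

# Crude risk ratio: add the strata together and ignore the third variable.
crude_counts = tuple(sum(column) for column in zip(*strata.values()))
crude_rr = risk_ratio(*crude_counts)

for name, rr in stratum_rr.items():
    print(f"{name}: RR={rr:.2f}")
print(f"crude (third variable ignored): RR={crude_rr:.2f}")
```

In these made-up data the risk ratio is 2 within each stratum, but the crude risk ratio is close to 6. The crude value is a valid description of how much more frequent disease is among the exposed, yet it would overstate the causal effect of exposure if the within-stratum values reflect that effect.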

Models which meet the purpose of describing causal relationships will always meet the first purpose of describing patterns of occurrence. But the reverse is not true. Disease patterns may be described and individual risks assigned using any procedure that puts individuals into relatively homogeneous risk categories. But causal relationships between exposure and disease can be inferred only if the models employed reflect the causal actions that actually occurred in the population. That means that confounding variables must be part of the model and that they must be included in the model in ways that correspond to the causal actions of those variables.

The art of conceptualizing models and variables which maximize the validity of epidemiological inferences should be nurtured throughout one's career. No one can be expected to master all aspects of this art. Consultation and collaboration with mathematical modelers and statisticians is desirable in this regard. Consultation with statisticians is often more relevant to ensuring that the shape of models corresponds to the shape of observed data. Most statisticians have spent less time considering what makes models causally relevant. Consultations with epidemiological theoreticians and mathematical modelers will often be needed in this regard. In this course we will deal only with models of discrete exposure categories relating to a binary outcome such as "well" vs. "diseased".

For making causal inferences, the key aspect of model validity in our restricted framework of discrete exposures related to discrete outcomes in individuals is the inclusion of confounding variables in the model. If a confounding variable is not included in the model, then valid associations measured between exposure and disease in the population will not provide a valid base for inferring causal effects in that population. If effect modifying variables are not included, we will not have a valid basis for generalizing the effects observed in our study population to other populations. To maximize validity, we should thus include all the confounding and effect modifying variables we can. As we add variables to our study, however, we must make tradeoffs.
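A small numerical sketch may help show why leaving a confounder out of the model undermines a causal reading of an association that was itself measured correctly. The counts below are invented for illustration and do not come from any study, and the name `risk_ratio` is just a label chosen here. Within each stratum of the third variable the exposed have twice the risk of the unexposed. Because exposure is more common in the high-risk stratum, the crude comparison gives a much larger ratio.

```python
# Hypothetical counts, invented for illustration only.
# Each stratum: (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "third variable absent": (10, 100, 20, 400),
    "third variable present": (160, 400, 20, 100),
}


def risk_ratio(exp_cases, exp_total, unexp_cases, unexp_total):
    """Risk in the exposed divided by risk in the unexposed."""
    return (exp_cases / exp_total) / (unexp_cases / unexp_total)


# Risk ratio within each stratum of the third variable.
stratum_rr = {name: risk_ratio(*cells) for name, cells in strata.items()}

# Crude risk ratio: add the strata together and ignore the third variable.
crude_cells = [sum(cells[i] for cells in strata.values()) for i in range(4)]
crude_rr = risk_ratio(*crude_cells)

for name, rr in stratum_rr.items():
    print(f"{name}: RR = {rr:.2f}")
print(f"crude (third variable ignored): RR = {crude_rr:.2f}")
```

The crude ratio is a perfectly good description of relative frequency in this made-up population. It is only as a statement about what the exposure does that it misleads.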

The more strata of confounding and/or effect modifying variables we divide our population into, the more valid our causal inferences will be. But seeking maximum validity by including all potential confounding and effect modifying variables has important tradeoffs in that this can markedly decrease some of the other virtues. Complete validity in the inference about any relationships between exposure and disease requires that all causes of disease be included in the model. But the more your study gets cluttered up with additional variables, the more chance for sample selection biases and loss of precision in estimates when people react to a long interview or questionnaire. Likewise the more confounders and effect modifiers you specify in your model, the less precise will be the estimate of relationship between exposure and disease which is of primary interest.

2) The biostatistical estimation procedure must be unbiased.

An unbiased estimation procedure is one that gives estimates centered around the true population parameter. There are often alternative statistics that can be used for estimating the same parameter. These may or may not use the same data. Some estimation procedures may provide biased estimates of the population parameter even when there are no sample selection biases or information biases. We sometimes tolerate slightly biased estimation procedures either because they use more readily available data or because they give estimates with narrower distributions or confidence intervals. Epidemiologists generally depend upon biostatisticians for advice on these issues.

3) The sample must be selected in an unbiased fashion and the data collected must be accurate.

Most of the tradeoffs considered in designing studies for this course involve choices that increase or decrease the chances of sample selection biases and/or information biases. We will have much more to say on this later.

In summary, valid inferences about relationships between exposure and disease require models which incorporate confounding and effect modifying variables in a theoretically appropriate manner, unbiased estimation procedures, unbiased samples, and unbiased data.

One way to think about the validity of a parameter estimate for a valid model relates to how well we are doing in hitting a target. Assuming that the model in which a parameter is embedded is valid, valid parameter estimates will be centered around the true parameter value. They may have different degrees of dispersion around that target. The degree of dispersion, however, does not relate to the concept of validity. The degree of dispersion relates to precision.

Precision:

Precision reflects the degree to which parameter estimates cluster upon repeated execution of a study under identical conditions. Of course we can't actually repeat studies under identical conditions. So we use statistical models to help us predict the degree of dispersion that would be expected on the basis of chance. Within the frequentist traditions of statistical inference, confidence intervals are used to estimate expected degrees of dispersion. The narrower the confidence interval of a parameter estimate, the greater its precision. One way to measure how much a scientific study contributes to knowledge is to consider the range in which you think a particular parameter might lie before you begin a study and see how much the study narrows that range. If a parameter can only be measured with a precision larger than one's current degree of uncertainty, the study estimating that parameter cannot contribute much new knowledge.

The relationship between precision and validity can be illustrated by figure 4.2 of DCR. Figure 4.2 actually refers to the measurement of a variable in different individuals rather than the estimate of an exposure-disease relationship parameter in a population. The concepts are relatively similar between the precision and accuracy of a measurement in individuals and the precision and validity of a parameter estimation in a population. Individual accuracy corresponds to valid population inferences. Think of the five dots in each of the targets as the results of five repeated studies in the same statistical target population. The left hand target of fig 4.2 would represent a precise but invalid estimate. The second would represent an imprecise but valid estimate. The third would represent a precise and valid estimate, and the fourth would represent an imprecise and invalid estimate.

The concept of precision is related to the concept of power to detect an association between exposure and disease. Power concentrates on just one end of the confidence interval, the one closest to the null value of the parameter of interest. Precision is a more inclusive concept than power. The precision with which exposure-disease relationships are estimated depends upon several factors. You need to fully understand the way that the first four factors listed here act. The last two factors are also important to understand. They are, however, rather subtle for a course at this level.

1) the model in which the parameter is being estimated:

If an exposure-disease relationship is measured in a model having many parameters relating confounding variables to disease, then the parameter expressing the relationship of the exposure of interest to disease will have wider confidence intervals. As one includes more confounding variables in one's model, one gains validity but loses precision.

2) sample size:

Larger sample sizes result in smaller confidence intervals. The main way one gains precision is to increase sample size.
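As a rough sketch of how sample size drives precision, the following computes an odds ratio and its confidence interval for two invented 2x2 tables that have the same proportions but differ fourfold in size. The interval uses the common log-scale (Woolf) approximation, which is an assumption made here for illustration and not a method prescribed by these notes. The counts and the function name `odds_ratio_ci` are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-scale (Woolf) confidence interval.

    a, b = exposed cases, exposed non-cases
    c, d = unexposed cases, unexposed non-cases
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical tables: same proportions, four times as many subjects.
small = odds_ratio_ci(20, 80, 10, 90)
large = odds_ratio_ci(80, 320, 40, 360)

for label, (or_, lo, hi) in (("n = 200", small), ("n = 800", large)):
    print(f"{label}: OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

Under this approximation, quadrupling the sample halves the width of the interval on the log scale, which is why gains in precision become expensive quickly.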

3) frequency of exposure and disease:

Exposure and/or disease frequencies distant from 0.5 will result in less precise estimates of ORs or RDs than exposure or disease frequencies closer to 0.5. That is the main reason we might want to choose a case-control or an exposure cohort sample selection procedure instead of taking a cross sectional sample of a population. Those sample selection procedures can help us get either the disease frequency in the sample population or the exposure frequency closer to 0.5.
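The effect of sampling design on precision can be sketched with expected cell counts. The population values below (30% exposed, risks of 4% and 2%) are invented for illustration, and the standard error is the log-scale (Woolf) approximation for the odds ratio, assumed here for convenience. Both designs enroll 1000 subjects and target the same odds ratio. The case-control design simply brings the disease frequency in the sample to 0.5.

```python
import math

# Hypothetical population values, invented for illustration.
P_EXPOSED = 0.30
RISK_EXPOSED = 0.04
RISK_UNEXPOSED = 0.02


def se_log_or(a, b, c, d):
    """Approximate standard error of the log odds ratio (Woolf)."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


def cross_sectional_table(n):
    """Expected cells when n people are sampled regardless of exposure or disease.

    Returns (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).
    """
    a = n * P_EXPOSED * RISK_EXPOSED
    b = n * P_EXPOSED * (1 - RISK_EXPOSED)
    c = n * (1 - P_EXPOSED) * RISK_UNEXPOSED
    d = n * (1 - P_EXPOSED) * (1 - RISK_UNEXPOSED)
    return a, b, c, d


def case_control_table(n_cases, n_controls):
    """Expected cells when cases and non-cases are sampled separately."""
    a, b, c, d = cross_sectional_table(1.0)  # population cell proportions
    exposed_among_cases = a / (a + c)
    exposed_among_controls = b / (b + d)
    return (
        n_cases * exposed_among_cases,
        n_controls * exposed_among_controls,
        n_cases * (1 - exposed_among_cases),
        n_controls * (1 - exposed_among_controls),
    )


cross_sectional = cross_sectional_table(1000)
case_control = case_control_table(500, 500)

print(f"cross-sectional SE(log OR): {se_log_or(*cross_sectional):.3f}")
print(f"case-control SE(log OR):    {se_log_or(*case_control):.3f}")
```

With a rare disease, the cross-sectional sample contains only a handful of cases, and those small cells dominate the standard error. The sketch ignores the practical costs and selection problems of finding 500 cases, which the tradeoff discussion elsewhere in these notes takes up.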

4) measurement and classification errors:

If there is a high chance of random error in classification by exposure or disease, not only will ORs or RDs be biased toward the null, the confidence intervals are likely to be enlarged as well. Likewise, if there is random mismeasurement of exposure levels or disease levels on continuous scales, estimates of regression slopes between exposure and disease will have wider confidence intervals.

5) frequency of cofactors in the study population:

Cofactors in the study population which act in conjunction with (complementary to) the exposure of interest will increase the true OR or RD by increasing the rate at which the exposure of interest causes disease. The increase in frequency of disease not only increases the true value of the OR or RD. In most study designs, it also increases the precision with which these parameters can be estimated. Thus we can increase the power of a study to illuminate aspects of an exposure disease association if we focus on a high risk population where the high-risk is due to co-factors that act in conjunction with the variables of interest to cause disease. For example, if we wanted to study different aspects of pacifier use that might be related to otitis media rates, we would do well to study a 9-18 month old population that is in day care. This population will have greater susceptibility than a younger population because it will have lost its maternal immunity. It will have a greater susceptibility than an older population because it will not have acquired as much immunity as the older population. It will have more potential for the pacifier to act as a vehicle of transmission because there will be more otitis causing agents in the environment.

6) frequency of unrelated causes of disease in the study population:

Choosing a high-risk population for study will not always increase the precision with which parameters relating exposure to disease can be estimated. If the factors making a population a high-risk population act independently of the risk factors under study rather than in conjunction with them, then greater precision can be obtained by studying low-risk populations.

Background causes of disease in the study population that do not act in concert with the exposure will increase disease rates in both the exposed and unexposed. This will move ORs and RDs toward the null value and decrease the power to detect an association between exposure and disease. The ORs and RDs estimated in these conditions will have wider confidence intervals because the variation in risks or rates of disease in different exposure categories will be unrelated to the exposure effect of interest. Thus if high risk populations are high risk because of the actions of causes which do not act at all in concert with the exposure of interest, then we want to avoid these high risk populations.

Consider the situation where we were studying Moraxella catarrhalis as a cause of otitis media and wanted to elucidate the role of immunity to this agent in protecting against otitis. Since specific diagnosis of the otitis in terms of the agent causing it is so difficult, we might just measure antibody levels in kids to the Moraxella and then compare the otitis rates in populations with high and low levels of antibodies. If we chose a population to study that had a high rate of Streptococcus pneumoniae infection, our study results would be so diluted by these infections that we might not be able to perceive the effect of Moraxella antibodies.
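The dilution by unrelated causes can be made concrete with a simple sketch. It assumes an independent-action model in which a person escapes disease only by escaping both the background causes and the exposure. That model, the risk values, and the function name `effect_measures` are assumptions chosen for illustration, not results from these notes. The exposure by itself causes disease in 10% of exposed people in both scenarios. Only the background risk changes.

```python
def effect_measures(background_risk, exposure_effect):
    """Risk difference and odds ratio under an assumed independent-action model.

    background_risk: risk from causes unrelated to the exposure
    exposure_effect: probability that the exposure alone causes disease
    """
    risk_unexposed = background_risk
    risk_exposed = 1 - (1 - background_risk) * (1 - exposure_effect)
    risk_difference = risk_exposed - risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return risk_difference, odds_exposed / odds_unexposed


# Hypothetical populations with low and high background risk.
low_background = effect_measures(background_risk=0.02, exposure_effect=0.10)
high_background = effect_measures(background_risk=0.40, exposure_effect=0.10)

print(f"low background:  RD = {low_background[0]:.3f}, OR = {low_background[1]:.2f}")
print(f"high background: RD = {high_background[0]:.3f}, OR = {high_background[1]:.2f}")
```

Both measures sit closer to their null values in the high-background population even though the exposure's own action is unchanged.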

Choosing study designs that optimize virtues

Epidemiologists must understand how different aspects of study design affect each one of these virtuous characteristics of studies because when we make a decision that increases one virtue, it inevitably decreases some other virtue. Thus one of the key aspects of judgments about study designs is that they achieve a proper balance of these desired qualities. With regard to generalizability, precision, and validity, it may be useful in balancing them to consider how they relate to errors in parameter use. Accordingly we deal with this in the next section. But the key thing that will help you weigh study design aspects is to understand how different aspects of study design relate to these virtues. This will be the subsequent topic.

Errors in the use of parameter estimates:

Errors in the use of parameter estimates may arise for six reasons. The first relates to precision, the next four relate to validity, and the final one relates to generalizability.

1) Chance:

Errors may arise as a result of chance. Whether one decides to use the point estimate, the upper confidence limit, or the lower confidence limit to make a decision about an exposure disease relationship, chance will affect the value of the estimate. The narrower the confidence intervals, the less likely it is that chance could lead you to make an important error.

2) Information Bias:

We have information bias when the observations do not represent the true state of affairs in the sample because errors are made in data collection. Two types of errors can be made which cause odds ratio, risk ratio, or risk difference estimates to be invalid.

  1. There may be information bias such that the sensitivity or specificity of exposure classification is different in diseased and well individuals or the sensitivity or specificity of disease classification may be different in exposed and unexposed individuals, or
  2. There may be unbiased misclassification errors of exposure or disease. When dealing with dichotomous variables, unbiased errors cause OR, RR, or RD estimates to be biased towards the no effect values of those parameters.
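The second type of error can be illustrated with expected counts. In the sketch below, exposure is misclassified with the same sensitivity and specificity in cases and non-cases. The true table, the sensitivity of 0.8, the specificity of 0.9, and the function name `misclassify_exposure` are all hypothetical values chosen for illustration.

```python
def misclassify_exposure(a, b, c, d, sensitivity, specificity):
    """Expected 2x2 table after unbiased (nondifferential) exposure misclassification.

    a, b = truly exposed cases, truly exposed non-cases
    c, d = truly unexposed cases, truly unexposed non-cases
    The same sensitivity and specificity apply to cases and non-cases.
    """
    a_obs = a * sensitivity + c * (1 - specificity)
    c_obs = a * (1 - sensitivity) + c * specificity
    b_obs = b * sensitivity + d * (1 - specificity)
    d_obs = b * (1 - sensitivity) + d * specificity
    return a_obs, b_obs, c_obs, d_obs


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


# Hypothetical true table with an odds ratio of 3.
true_table = (100, 100, 50, 150)
observed_table = misclassify_exposure(*true_table, sensitivity=0.8, specificity=0.9)

true_or = odds_ratio(*true_table)
observed_or = odds_ratio(*observed_table)
print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

The observed odds ratio falls between the true value and the no-effect value of 1, even though the errors do not depend on disease status.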

3) Sample selection bias:

We have sample selection bias if the sample selected does not represent the target population. Statistical theory addresses how this can arise by chance. Usually, however, samples fail to represent the statistical target population not on the basis of chance but rather because of some systematic bias as to who comes into the study. Statistical theory will not help with this problem. An understanding of what led to the bias and what is needed to avoid it is needed.

4) Unmeasured confounders:

The exposure-disease relationships reflect the effects of unmeasured confounding variables causing disease rather than exposure causing disease.

5) Inappropriate analytical model:

The model used to control for confounding may not correspond to the true nature of the confounding being controlled and therefore the "controlled" estimates may be wrong. Another way of saying this is that incorrect model assumptions in the control of measured confounding variables lead to invalid parameter estimates. For example, a logistic regression model may be used to control for confounding. This model assumes independence of outcomes between individuals in the study and multiplicative relationships between multiple predictor variables and the outcome. If these assumptions are wrong, the parameter estimate could be wrong. In that case a more appropriate model should be sought.

This type of error can arise even when the only method you use to control for confounding is stratification. If your analytic model generates a risk difference and your theoretical needs are for a ratio measure, you have used an inappropriate analytical model.

6) Undescribed effect modification:

The relationships in the target population might truly be those observed in the sample data. The pertinent confounding variables might have been measured and correctly controlled in the analysis. But if modifying variables are found in different frequencies in the statistical target population and the scientific target population, then the use of the parameter estimate in the scientific target population will be erroneous. Note that in this case we would have valid parameter estimates if we applied them only to our statistical target population, but we would have invalid estimates if we applied them to our scientific target population.

Also note that all third variables which cause disease are effect modifying variables in one sense or another. If causal third variables do not modify the Risk Ratio, they will modify the Risk Difference, and vice versa. Confounding variables are a subset of third variables which cause disease. They are those causal third variables which are associated with the exposure variable in the statistical target population. Later when we talk about stratification we will try to make this distinction between confounding and effect modifying variables clearer. It is a very key distinction for making valid inferences from epidemiological data.
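The arithmetic behind this point is simple enough to sketch. Suppose a causal third variable raises the baseline risk of disease from 1% to 10%. These baseline risks, the constant ratio of 2, and the constant difference of 0.05 are invented for illustration. If the exposure multiplies risk by the same factor in both strata, the risk difference must vary across strata. If instead it adds the same amount in both strata, the risk ratio must vary.

```python
# Hypothetical baseline (unexposed) risks in two strata of a causal third variable.
baseline_risks = {"third variable absent": 0.01, "third variable present": 0.10}

# Case 1: the exposure doubles risk in every stratum (risk ratio not modified).
CONSTANT_RR = 2.0
rd_when_rr_constant = {
    stratum: r0 * CONSTANT_RR - r0 for stratum, r0 in baseline_risks.items()
}

# Case 2: the exposure adds 0.05 to risk in every stratum (risk difference not modified).
CONSTANT_RD = 0.05
rr_when_rd_constant = {
    stratum: (r0 + CONSTANT_RD) / r0 for stratum, r0 in baseline_risks.items()
}

for stratum in baseline_risks:
    print(
        f"{stratum}: RD under constant RR = {rd_when_rr_constant[stratum]:.2f}, "
        f"RR under constant RD = {rr_when_rd_constant[stratum]:.1f}"
    )
```

The only way both measures could be constant across strata is for the baseline risks to be equal, in which case the third variable would not be a cause of disease.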

What should you do to increase the validity of your study findings:

First you need to identify all potential sources of sample selection bias, measurement or classification errors, information bias, and confounding. Then you need to identify statistical target populations where these causes of invalid estimations are minimized. For example it is common to choose a study population that is relatively homogeneous so that there is unlikely to be much variation in the levels of unmeasured confounding variables. When there is no variation in unmeasured confounding variables, they cannot confound your relationships of interest. For example variability in immunity to the agents causing otitis might be greater in older children. Those older children who are most exposed to a risk factor like day care may have had the most chance to acquire immunity. If you are focusing on the risks generated by day care, you might want to restrict your statistical target population to the ages in which children usually experience their first infections with the agents causing otitis in order to reduce unmeasured variation due to immunity. Of course you have to watch out that you do not restrict your study population to a group that has no variation in your exposure variables of interest. For example, if you restrict a study of partnership patterns on STD risks to members of a swinger's club, it may be hard to study the effects of factors affecting the choice of partners because everyone has the same partners.

Next you need to identify all sampling procedures that might reduce the chances of sample selection bias. For example, in a case-control sample design, if you have a list of incident cases from a population and a list of non-cases from that same population, you can use those lists as the basis for a formalized randomization procedure that will not be subject to the high likelihood of sample biases that accompanies more haphazard selection of cases and controls. Of course, quite often procedures such as two staged sampling designs that decrease the chance of selection bias greatly increase the cost of a study so that you would only have enough money to study a small number of subjects and thus be left with quite imprecise estimates.

Next you need to identify all the methods of observation that will reduce error. Once again, however, you have important tradeoffs to consider. Not only will the use of an invasive procedure like a nasopharyngeal culture increase the cost of a study, it will also increase the rate of refusal to participate in your study. Perhaps that refusal will be biased by exposure and outcome frequencies in ways that could greatly increase the chance of selection bias. Thus trying to reduce information biases could increase sample selection biases.

Once you have identified confounding variables, you can reduce uncontrolled confounding by measuring as many of them as you can. Again, however, there are tradeoffs. If you include too many variables on your questionnaire, some study subjects will just throw it in the wastebasket, and if this wastebasket behavior is associated with your exposure or outcome variables, then you will once again be creating important selection biases. Keeping questionnaires to one page is an important strategy for reducing sample selection bias on mailed questionnaires.

What should you do to increase the precision of your study findings:

First identify statistical target populations that have exposure and disease frequencies close to 0.5. If you are studying continuous variables, choose statistical target populations that have a wide variance of values on that variable.

When you are unable to find populations with appropriate rates of exposure or disease, choose a sample design that will bring overall frequencies closer to 0.5. For example, given a rare disease, sample by outcome strata (that is, select a case-control sample). Given a rare exposure, sample by exposure strata.

In choosing a statistical target population, choose one that has a high rate of other exposures that act in the same pathogenic processes as your variable of interest. For example, in studying the effect of casual partners on STD risks, it might be advantageous to study a population that does not have a lot of immunity and that does not use a lot of condoms. Then each encounter with a casual partner will have greater risk. The Finnish study of pacifier use may have been particularly advantageous in detecting a pacifier effect because it was conducted in a population of children in day care and exposure to the agent may be an essential step in the pathway to acute otitis related to pacifier use.

Conversely, choose a statistical target population that does not have a lot of risk that acts completely independently of your exposure of interest. For example, in studying the effect of condom use with casual partners, do not choose a population where there are many regular partners who have a high rate of infection and with whom condoms are not used. Another example would be if you were studying the effects of different types of partners on non-specific vaginitis, it would not be good to study a population that has a high rate of vaginitis due to the use of commercial douches. If you wanted to study the effect of pneumococcal vaccines on otitis, you would want to reduce the background of cases due to other agents. This might be done by defining your outcome as bacteriologically proven pneumococcal otitis. But most likely this would be impractical so it would be ideal to find a population with the highest relative frequency of pneumococcal otitis to the other agents. Quite likely specific age ranges with specific exposures could be defined in this regard. Note that just the population with the highest frequency of pneumococcal otitis may not be ideal if that population has even higher frequencies of otitis due to the other agents.

Finally, do all you can to reduce your cost per study subject to a minimum. That way given your fixed budget you will be able to study the maximum number of subjects. Be careful, however, in reducing costs that you do not let in sample selection biases or information biases that are going to invalidate your results.

What should you do to increase the generalizability of your study findings:

First you should consider alternatives in the choice of a statistical target population that might enhance your ability to generalize. For example in a study of condom use in students, one may decide to study both students who are U.S. residents and students from foreign lands in order to increase the breadth of your study population. The trouble with expanding the breadth of your statistical target population, however, is that when your study population is more diverse, there may be more chance for uncontrolled confounding and it may be more difficult to develop questionnaires to which the various different population segments will provide valid responses.

Generalizability of studies to minority populations has been an issue of great importance. NIH now has requirements for the inclusion of minority populations in studies. That means that in the tradeoff discussed above between generalizability and validity or precision, political considerations now dictate that when your decision involves whether or not to include minority populations, you must make this tradeoff in the direction of generalizability. By including minority populations in the study, statistical generalizability to a population that includes minorities is more justified.

But more important than statistical generalizability to a population that includes minorities might be the ability to make scientific generalizations from the study population to minority populations. Many times generalizability will be more dependent upon the quality of scientific inferences than upon the nature of the statistical target population. By including minority populations it may be possible to detect or at least suggest modification of effects by minority status. This may lead to the identification of the specific causes that modify the effect and thereby to greater scientific understanding and greater generalizability.

Another step in maximizing generalizability is to identify all potentially effect modifying variables so that you can measure as many of them as possible. If you can identify and describe all effect modifiers, then you can generalize your results to any population where the frequency of these effect modifiers is known. Of course if you include too many variables in your study your subjects are just going to throw your questionnaire in the wastebasket rather than taking the time to fill it out. That will hurt both validity and precision.
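A minimal sketch of this kind of generalization follows. It assumes a single dichotomous effect modifier, stratum-specific risk differences that are valid and carry over unchanged to the new population, and no other modifiers. These are strong assumptions made only for illustration. The risk differences, the modifier frequencies, and the function name `population_risk_difference` are hypothetical.

```python
def population_risk_difference(stratum_rd, modifier_frequency):
    """Average the stratum-specific risk differences, weighted by modifier frequency.

    stratum_rd: risk differences where the modifier is "present" and "absent"
    modifier_frequency: proportion of the population in which the modifier is present
    """
    return (
        modifier_frequency * stratum_rd["present"]
        + (1 - modifier_frequency) * stratum_rd["absent"]
    )


# Hypothetical stratum-specific effects estimated in the study.
stratum_rd = {"present": 0.10, "absent": 0.02}

# The modifier is present in 20% of the statistical target population
# but in 60% of the scientific target population (invented frequencies).
rd_statistical_target = population_risk_difference(stratum_rd, 0.20)
rd_scientific_target = population_risk_difference(stratum_rd, 0.60)

print(f"statistical target population RD = {rd_statistical_target:.3f}")
print(f"scientific target population RD  = {rd_scientific_target:.3f}")
```

Applying the overall estimate from the statistical target population directly to the scientific target population would understate the effect here, which is the error described earlier under undescribed effect modification.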

What should you do to increase the importance of your study:

The principal thing you must do to increase the importance of your study is attain a thorough understanding of all relevant theory and facts with regard to the issue you will address. Then you must assess how your study can contribute to that body of facts and theory. Finally, you must decide how that body of theory and facts relates to issues of public health importance.

You don't do this as a linear process starting from the contribution your study could make to a body of theory and then the contribution of this theory to public health. You generally bounce back and forth.

As you increase your understanding of relevant theory, you get new ideas for your study. These new ideas then force you to explore new aspects about what is known or hypothesized about your subject matter.

Study design decisions affecting tradeoffs

Now that we understand different aspects of the desired qualities of studies and what factors affect them, we need to consider different types of decisions made in designing studies and examine how these might affect the desired qualities of studies. Each of the following classes of study design decision can affect tradeoffs between the different desired qualities of studies. The types of study design decisions we will consider here are:

a) Variables to be studied.

b) Measurement procedures to be used.

c) Relationships to be studied. (Epidemiological parameters to be estimated.)

d) Statistical target population to be sampled.

e) Sampling procedures to be used.

Examples can be organized by aspects of the five study design decisions or by the six tradeoffs in study outcome aspects. There are in total 30 decision-tradeoff combinations for which we might discuss examples. We will just pick a few of the prominent ones here. Note that the a), b), c), d), and e) subheadings below correspond to the above list of study design decisions. We have not considered all of the elements of this list in each case, so that we have a, c, and d for 1 rather than a, b, and c.

1) Importance-Validity:

a) Variables:

The most important variables in the Public Health sense may be the most difficult to measure or may be strongly related to confounders that are difficult or impossible to measure. Thus the parameter estimates relative to them might have little validity. This problem seems to be especially frequent in social epidemiology. The choice of whether to study peer influences or knowledge as behavioral influences is a case in point. Peer influences might theoretically be more important determinants. They can have a big effect on behavior. Peers, however, may have been chosen under the influence of factors affecting behavior which are hard to measure. Moreover peer influences might be much more difficult to measure than knowledge, and they might be very highly correlated with family influences which also affect behavior. Thus even though knowledge might have a less direct impact on behaviors and might not be as important to intervene upon, knowledge might be more easily related to behaviors than peer influences. Thus you might choose to study knowledge instead of the more important peer influence.

Another example from social epidemiology is the choice to study intentions to engage in behavior given specific circumstances versus studying behavior itself. Behavior itself may be the more important outcome. But it may be difficult to validly assess the behavioral outcomes of specific influences. Behavior may be so conditioned by the immediate circumstances in which it occurs that it is hard to assess the influence of determinants of interest. For example if one wants to study the influence of knowledge on the use of condoms, it may be that knowledge is so correlated with the immediate conditions in which people find themselves and that these may be so difficult to assess that it will be difficult to get a valid estimate of the influence of knowledge on behavior. While the influence of knowledge on behavioral intentions may be a less important relationship to study, it might be studied in a more valid fashion.

c) Relationships to be Studied:

Frequently it may be more important to measure quantitative effects in terms of risk differences rather than just associations in terms of risk ratios. The use of risk differences for example allows one to balance risks versus benefits in personal decisions such as whether or not to give your child a pacifier but the use of risk ratios does not allow one to calculate an appropriate balance. It may be more possible, however, to validly estimate ratios. The valid estimate of ratios might be more feasible because one could use a case control sample to estimate the ratio measure. While a non-population based case-control study will have a greater likelihood of sample selection biases which could affect validity, it might be possible in such a study to adequately control for confounders. Besides importance and validity, precision plays a role in the tradeoff. The case control study may allow so much more precise stratified estimates of effect that no stratified estimates will be at all feasible using a cross sectional sampling procedure. Since the stratified estimate controls for confounding it will be more valid than the crude estimate.

Another issue relates to the validity of scientific generalizations and not just the validity of the estimates themselves. It may be more possible to scientifically generalize a ratio estimate than a difference estimate because factors that modify the difference measure might not modify the ratio measure.

d) Target populations:

The choice of population to study may not only involve the type of tradeoffs between generalizability and validity we discussed with regard to including minority populations, it might involve an importance-validity issue as well. An assessment of the effect of pacifier use in a random sample of the U.S. population might be quite important for deciding on the recommendations to be made on a national level regarding pacifier use. On the other hand, the diversity of the sample may increase the chance that uncontrolled confounding variables could generate observed associations between pacifier use and otitis.

Another choice in this regard involves ease of measurement as well as the above issue of confounding. Exposure-disease relationships might be most validly estimated in populations which for some reason have no important confounders and in which very accurate measurements are available or obtainable. These may not be the populations which are key in a public health sense, however. For example, finding the determinants of knowledge about HIV transmission might be more validly done in U of M students than in teenage members of drug-using gangs in inner city Detroit. But it may be more important to determine them in the latter.

2) Precision-Validity:

This is the tradeoff which has received the most attention in the epidemiological literature. Part of the reason for this attention is that statisticians can readily relate to this tradeoff. It has clear quantitative formulations. It is discussed in some of your readings by both Schlesselman and Rothman. Indeed this is a very important tradeoff, but as a determinant of the success of epidemiological studies, its consideration holds no primacy over the other tradeoffs.

b) Variable measurement procedures

Variable measurement procedures that have the least error (and therefore will give the most valid results) may require so much effort that very few observations can be made. That means imprecise parameter estimates. For example, choosing a history of angina as the outcome variable in a study of the determinants of heart disease, as opposed to choosing the degree of coronary artery narrowing, may mean that many other factors which influence pain perception or response may be affecting your relationships of interest. But you will be able to get many more histories of angina than you will be able to get cardiac catheterizations to measure coronary artery narrowing. Similarly, measuring otitis outcome by culturing middle ear fluid would add great specificity to the outcome measurement and in so doing increase both the precision and validity of exposure-disease relationships. The small number of people in whom this could be done, however, might quite considerably decrease precision, and if that reduced number is a biased fraction of the whole, it could also reduce validity.
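The effect of outcome specificity on validity can be sketched numerically. The sketch assumes nondifferential misclassification (the same sensitivity and specificity in exposed and unexposed children); the true risks and the sensitivity and specificity values are hypothetical, and the labels "symptoms" and "culture" are only illustrative stand-ins for a less specific and a fully specific outcome measure.

```python
def observed_risk(true_risk, sensitivity, specificity):
    """Expected proportion classified as diseased under imperfect measurement."""
    true_positives = sensitivity * true_risk
    false_positives = (1 - specificity) * (1 - true_risk)
    return true_positives + false_positives


def observed_risk_ratio(risk_exposed, risk_unexposed, sensitivity, specificity):
    """Risk ratio one would expect to observe given the measurement error."""
    return (observed_risk(risk_exposed, sensitivity, specificity)
            / observed_risk(risk_unexposed, sensitivity, specificity))


# Hypothetical true risks: 0.20 in exposed, 0.10 in unexposed (true RR = 2.0).
TRUE_R1, TRUE_R0 = 0.20, 0.10

rr_symptoms = observed_risk_ratio(TRUE_R1, TRUE_R0, sensitivity=0.90, specificity=0.90)
rr_culture = observed_risk_ratio(TRUE_R1, TRUE_R0, sensitivity=0.90, specificity=1.00)

print(f"observed RR, specificity 0.90: {rr_symptoms:.2f}")
print(f"observed RR, specificity 1.00: {rr_culture:.2f}")
```

With imperfect specificity the false positives dilute both risks and pull the ratio toward 1; with perfect specificity the ratio is recovered even though sensitivity is still below 1. The sketch says nothing about the precision lost when the more specific measure can be obtained on only a few children.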

d) Statistical target population

One target population might have much higher levels of an exposure than another, allowing for more precise RRs from a cross sectional sample design than in a population with a rarer exposure. The factor that led to the increased exposure, however, might also have led to a confounding factor that cannot be well controlled. For example, it might be that one ethnic group almost always uses condoms while another uses them with only 50% of partners. If you want to study the effects of condoms on STD risk, you will have more power in the second ethnic group. The second ethnic group, however, may have a lot of previously acquired immunity which is difficult to measure and could be confounding any of the relationships that you want to examine. Likewise it might be worthwhile to study a population with a very high incidence of otitis media. But a high incidence of otitis might also mean a high frequency of immunity and this confounding factor may cause a loss of validity which is greater than the gain in precision.
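The precision side of the condom example can be sketched with the usual large-sample standard error of log(RR), evaluated at expected cell counts. All inputs are hypothetical, and the sketch simplifies the example by treating condom use as a yes/no characteristic of each person (95% users in the first group, 50% in the second) rather than a proportion of partners.

```python
import math


def se_log_rr(n, exposure_prevalence, risk_exposed, risk_unexposed):
    """Approximate SE of log(RR) at expected counts in a cross sectional sample."""
    n_exposed = n * exposure_prevalence
    n_unexposed = n - n_exposed
    cases_exposed = n_exposed * risk_exposed
    cases_unexposed = n_unexposed * risk_unexposed
    return math.sqrt(1 / cases_exposed - 1 / n_exposed
                     + 1 / cases_unexposed - 1 / n_unexposed)


# Hypothetical: 1,000 people sampled, STD risk 0.05 in condom users
# and 0.10 in non-users, in both groups.
se_group_1 = se_log_rr(1000, exposure_prevalence=0.95,
                       risk_exposed=0.05, risk_unexposed=0.10)
se_group_2 = se_log_rr(1000, exposure_prevalence=0.50,
                       risk_exposed=0.05, risk_unexposed=0.10)

print(f"SE of log(RR), 95% condom users: {se_group_1:.3f}")
print(f"SE of log(RR), 50% condom users: {se_group_2:.3f}")
```

The smaller standard error in the second group is the gain in precision; the calculation is silent about the confounding by acquired immunity that might outweigh it.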

In the pediatric practices, one has the choice of restricting the statistical target population to the patients enrolled in HMOs. In the general practice populations there might be a set of children with a tendency to seek care in emergency rooms and also with a tendency toward more pacifier use. Thus using the restricted population in a case-control sampling mode would yield more valid results because the cases who might seek care in emergency rooms could be detected. But the HMO population is only a fraction of the population, and the smaller numbers may mean that the precision of the estimates possible in this fraction of the population is not great.

e) Sampling procedures

Sample stratification procedures that even up the numbers in the diseased and non-diseased groups, such as case-control study designs, often result in more precision, but they might greatly increase the chances of bias. Selection biases are a considerable risk in case-control sampling procedures, especially those that do not proceed from a pre-defined population base where all cases have been ascertained. A population based case-control sampling procedure will almost always suffer from less sample selection bias than a non-population based case-control sampling procedure such as a hospital or clinic based case-control sampling procedure. The population based case-control sample, however, might require many times the effort of a hospital based case-control sample. That means that, for a given effort, a population based case-control sample will have fewer observations and less precise estimates of exposure-disease associations than a clinic based case-control sample. The greater precision of the clinic based sample, however, is bought at the cost of potential loss of validity from the many selection biases that are inherent to clinic based case-control samples.

One might debate using a cross sectional sampling frame or an outcome stratified sampling frame for studying otitis in a clinical practice. In the case-control sampling procedure one would get a list of patients with a recent diagnosis of otitis and another list without such a diagnosis. In the cross sectional sampling design the study subjects would be selected from the practice lists without regard to whether or not they had a diagnosis. The cross sectional sampling frame might capture a set of otitis cases that occur in families who seek care for their kids in emergency rooms rather than the practice and who also have a higher tendency to use pacifiers. Thus this sampling procedure will have fewer biases than a case-control sampling procedure. But if the same number of individuals are interviewed, far fewer cases of otitis will occur in the cross sectional sampling frame. Thus the validity gained by picking up the cases where parents sought care in emergency rooms for their children's ear infections would be gained at a great expense in precision.
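The precision half of this comparison can be sketched with the Woolf approximation to the standard error of log(OR), evaluated at expected cell counts. The exposure prevalence, the risks and the 600 interviews are hypothetical, and the sketch assumes the case-control sample draws cases and non-cases without selection bias, so it shows only what is gained in precision, not what may be lost in validity.

```python
import math


def se_log_or(a, b, c, d):
    """Woolf approximation to the SE of log(OR) for a 2x2 table."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# Hypothetical practice population and interview budget.
EXPOSURE_PREVALENCE = 0.40               # proportion using pacifiers
RISK_EXPOSED, RISK_UNEXPOSED = 0.15, 0.10
N_INTERVIEWS = 600

# Cross sectional sample: expected cell counts among 600 children.
n_exposed = N_INTERVIEWS * EXPOSURE_PREVALENCE
n_unexposed = N_INTERVIEWS - n_exposed
a = n_exposed * RISK_EXPOSED             # exposed cases
b = n_exposed - a                        # exposed non-cases
c = n_unexposed * RISK_UNEXPOSED         # unexposed cases
d = n_unexposed - c                      # unexposed non-cases
cases_cross_sectional = a + c
se_cross_sectional = se_log_or(a, b, c, d)

# Case-control sample: 300 cases and 300 non-cases with the same exposure
# distributions as cases and non-cases in the practice population.
n_cases = n_controls = N_INTERVIEWS / 2
p_exposed_cases = a / (a + c)
p_exposed_controls = b / (b + d)
se_case_control = se_log_or(
    n_cases * p_exposed_cases, n_controls * p_exposed_controls,
    n_cases * (1 - p_exposed_cases), n_controls * (1 - p_exposed_controls),
)

print(f"cross sectional: {cases_cross_sectional:.0f} cases, "
      f"SE of log(OR) = {se_cross_sectional:.3f}")
print(f"case-control:    {n_cases:.0f} cases, "
      f"SE of log(OR) = {se_case_control:.3f}")
```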

3) Precision-Generalizability:

d) Statistical target population

A target population that is atypical of other populations may have some characteristic, such as preexisting records, that facilitates data collection. The greater number of observations possible in such a population will increase the precision of parameter estimates. The differences between this population and other populations, however, might make the results less generalizable. For example it may be possible to identify subsets of the pediatric population whose insurance arrangements include documentation of day care arrangements. A subset of patients who are University employees might have this characteristic. Because the data is already available, at a given cost it might be possible to ascertain day care arrangements for many more individuals in this population than for the overall practice population where day care histories would have to be obtained by questioning the parents. The overall practice population might be more representative of the populations to which you would like to generalize results. But for the same amount of money you could get more precise estimates if you used the population with established records.

e) Sampling procedures

Stratifying a sample on levels of a third variable may facilitate generalizations in two ways. First, Risk Differences or Risk Ratios may be observed to be constant across strata. In that case one has evidence that there are few modifying factors for the parameter concerned. This provides support for generalizing results to other populations. Second, stratification may result in the quantification of how much modification results from key variables through the estimation of strata specific effects. If one has strata specific estimates and data on the strata levels in the population to which one is generalizing results, then generalizations can be made within strata levels. But the effort and cost of stratification will reduce the total number of observations and therefore the precision of the parameter estimates.
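One way strata specific estimates might be carried to another population is to reweight them by that population's distribution across the strata. This goes a step beyond generalizing within strata and is offered only as an illustration; the stratum labels, risk differences and population mixes below are all hypothetical.

```python
# Hypothetical strata specific risk differences for pacifier use and otitis.
stratum_rd = {"in day care": 0.12, "not in day care": 0.04}

# Hypothetical distribution of children across strata.
study_mix = {"in day care": 0.70, "not in day care": 0.30}
target_mix = {"in day care": 0.30, "not in day care": 0.70}


def weighted_effect(effects, weights):
    """Average the strata specific effects using the given stratum weights."""
    return sum(effects[stratum] * weights[stratum] for stratum in effects)


rd_study = weighted_effect(stratum_rd, study_mix)
rd_target = weighted_effect(stratum_rd, target_mix)

print(f"risk difference with the study's stratum mix:  {rd_study:.3f}")
print(f"risk difference with the target's stratum mix: {rd_target:.3f}")
```

Because day care modifies the risk difference in this made-up example, the overall effect expected in the target population differs from the one seen in the study, and only the strata specific estimates make that adjustment possible.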

4) Precision-Importance:

a) Variables to be studied

The variable on which it is easiest to get a large number of observations may be the variable with the least importance. Choosing the variable where you can get large numbers will then increase precision at the cost of importance. Some of the same variable choice issues discussed above with regard to the validity-importance tradeoff also involve a precision-importance tradeoff. For example, in some cases it may be more difficult to gather data on actual behavior as compared to intent to undertake behaviors. It may be more important, however, to document the actual behavior. For otitis media, many more important inferences might be made if the outcome variable were documented infection with a specific agent rather than symptoms of infection. But few outcomes could be observed using this outcome variable. Thus this same decision which involves a validity-importance tradeoff also involves a precision-importance tradeoff.

d) Statistical target population

Some target populations may offer greater ease of data collection, but they may not be the populations which are most important to study. The greater ease of data collection would permit the study of a greater number of individuals and would thereby lead to more precise parameter estimates. An example from my research involves estimating transmission probabilities. It is easier to assess HIV transmission probabilities in couples having stable relationships. It is more important, however, to know the transmission probabilities in those having numerous partners. Another example is that it is easier to assess STD risk factor effects in students than in dropouts. It is more important, however, to assess these risk factor effects in the dropouts. Both of the issues just discussed also involve to some extent the issue of generalizability. If it is possible to generalize findings from the more easily studied population to the more important population, then the findings in either population are equally important and the tradeoff should go in the direction of studying the population where the most precise (and at the same time probably the most valid) estimates can be obtained. To the extent that generalizations cannot be made, the weight of this tradeoff moves to studying the more important population.

e) Sampling procedures

By choosing an outcome stratified sample (a case-control sample), one can increase the precision of an OR estimate relating exposure and disease. But being able to estimate attributable risks might be a more important task. Thus, because of the nature of the parameters that one can estimate from different sampling designs, the precision gained in estimating the exposure-disease relationship by using case-control sampling is counteracted by a loss in the importance of the type of relationship that one can estimate.
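A short sketch shows why the case-control sample alone does not yield a risk difference. The counts are hypothetical, and the baseline risks fed to the second function are assumed values that would have to come from outside the case-control sample.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Exposure odds ratio from case-control counts."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def risk_difference_from_or(or_estimate, baseline_risk):
    """Risk difference implied by an OR and an externally supplied risk in the unexposed."""
    baseline_odds = baseline_risk / (1 - baseline_risk)
    exposed_odds = or_estimate * baseline_odds
    exposed_risk = exposed_odds / (1 + exposed_odds)
    return exposed_risk - baseline_risk


# Hypothetical sample: 300 cases (150 exposed) and 300 controls (120 exposed).
or_hat = odds_ratio(150, 150, 120, 180)

# The same OR implies very different risk differences depending on the
# baseline risk, which the case-control counts by themselves cannot supply.
rd_low = risk_difference_from_or(or_hat, baseline_risk=0.02)
rd_high = risk_difference_from_or(or_hat, baseline_risk=0.20)

print(f"OR = {or_hat:.2f}")
print(f"implied risk difference if baseline risk is 0.02: {rd_low:.3f}")
print(f"implied risk difference if baseline risk is 0.20: {rd_high:.3f}")
```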

5) Validity-Generalizability:

d) Statistical target population

Special populations may offer more accurate information and less chance of confounding, but they might be quite different from the usual population to which results might be applied. For example, one can estimate the duration of asymptomatic stages of HIV infection better in groups infected from blood transfusions because the time of infection is known in these groups. Individuals in this risk group, however, are very different from the majority of HIV infected individuals.

Relationships between pacifier use and otitis might be more validly estimated in an HMO population as compared to a fee for service population. But that could create problems generalizing results to non-HMO populations.

e) Sampling procedures

Stratification of a sample on some third variable, such as age, usually increases both Validity and Generalizability of estimates of exposure-disease association because confounders are controlled and effect modification is described. The Precision of the estimates will be decreased, however, because the expense involved in stratifying on the third variable will decrease the number of subjects that can be studied. In addition, estimates that summarize across strata have wider confidence intervals than crude estimates which presume that the strata do not make any difference.
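The validity gain from stratification can be sketched with a Mantel-Haenszel summary odds ratio, one standard way of summarizing across strata (the notes do not prescribe a particular estimator). The two strata below are hypothetical counts constructed so that the third variable is related to both exposure and disease.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed cases/non-cases; c, d = unexposed cases/non-cases."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary OR over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical strata of a third variable (for example, two age groups).
strata = [
    (20, 80, 10, 80),    # stratum 1: exposure and disease both less common
    (120, 80, 15, 20),   # stratum 2: exposure and disease both more common
]

stratum_ors = [odds_ratio(*table) for table in strata]
crude_or = odds_ratio(*[sum(cells) for cells in zip(*strata)])
summary_or = mantel_haenszel_or(strata)

print("stratum-specific ORs:", [round(x, 2) for x in stratum_ors])
print(f"crude OR (strata ignored): {crude_or:.2f}")
print(f"Mantel-Haenszel OR:        {summary_or:.2f}")
```

The crude estimate overstates the association because the strata are pooled; the summary estimate recovers the common stratum-specific value. The sketch does not compute confidence intervals, so it does not show the accompanying loss of precision.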

6) Importance-Generalizability:

Generally we might feel that study findings which are more generalizable are more important. Therefore there are not a lot of tradeoffs here.

a) Variables to be studied

Mental health may be one of the most important outcome variables in terms of the amount of human misery caused. The effects of different factors on mental health may be highly dependent upon cultural contexts, however, so that one could not expect the finding of an important determinant of mental health, such as the organized provision of father figures for fatherless children, to have very generalizable effects from one population to another.

The process of considering tradeoffs

Having a logical structure for considering tradeoffs might lead to a more formal and thorough consideration of them in the process of study design. While I have never formally set up a structure of each tradeoff for each study decision, I do believe that constantly keeping the nature of these tradeoffs in mind results in better designed studies.

I would greatly appreciate it if each student would e-mail me with tradeoffs that they considered so that I might include them as examples in this document. I would especially appreciate it if Dr. Schottenfeld's students would do this.

Cancer Group Class Preparation:

For Friday's class, the endometrial and breast cancer groups should be ready to discuss issues of validity, precision, and importance and the tradeoffs between these in choosing between the following two protocols:

  1. Cases are selected from patients treated at University of Michigan Hospital and newly diagnosed between 1995 and 97. Controls are selected from random digit dialing of Michigan residents. The controls are frequency or individually matched to cases by age and race. A mailed questionnaire is used for cases and telephone interviews are used for controls.
  2. Cases are selected from a population cancer incidence registry. Controls are randomly selected from the same source population as the cases, matching for age and race. Questionnaire administration procedures could be similar to those used for protocol 1.

Be sure you can state exactly what validity, precision, and importance would refer to in these two protocols.

Otitis Group Class Preparation

For Friday's class, the otitis groups should be ready to discuss issues of validity, precision, and importance and the tradeoffs between these in choosing between the following two protocols:

  1. A cross sectional sample of 1,000 infants born during Nov. or Dec. of 1996 has a questionnaire mailed to the infant's parents which deals with timing and characteristics of the first otitis the child ever experienced, day care attendance, pacifier use, and feeding. For those who do not send back either a questionnaire or a refusal to participate card, follow up phone calls encourage return of the questionnaire by those who still have the questionnaire. For those who have lost or discarded their questionnaire, a phone version of the questionnaire is administered.
  2. A list of all children attending pediatric clinics from May through December of 1997 is used to select otitis cases and controls. The sample drawn consists of 300 infants below one year of age who were diagnosed with otitis and a sample of 300 infants of comparable ages who did not experience otitis. The parents are phoned and interviews are conducted dealing with timing and characteristics of the first otitis the child ever experienced, day care attendance, pacifier use, and feeding.

Be sure you can state exactly what validity, precision, and importance would refer to in these two protocols.