JVDI


     


Journal of Veterinary Diagnostic Investigation Vol. 21 Issue 1, 3-14
Copyright © 2009 by the American Association of Veterinary Laboratory Diagnosticians

Review/Special Articles

Practical sample size calculations for surveillance and diagnostic investigations

Geoffrey T. Fosgate¹

¹Corresponding Author: Geoffrey T. Fosgate, Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX 77843. gfosgate{at}cvm.tamu.edu


    Abstract
 
The likelihood that a study will yield statistically significant results depends on the chosen sample size. Surveillance and diagnostic situations that require sample size calculations include certification of disease freedom, estimation of diagnostic accuracy, comparison of diagnostic accuracy, and determining equivalency of test accuracy. Reasons for inadequately sized studies that do not achieve statistical significance include failure to perform sample size calculations, selecting sample size based on convenience, insufficient funding for the study, and inefficient utilization of available funding. Sample sizes are directly dependent on the assumptions used for their calculation. Investigators must first specify the likely values of the parameters that they wish to estimate as their best guess prior to study initiation. They further need to define the desired precision of the estimate and allowable error levels. Type I (alpha) and type II (beta) errors are the errors associated with rejection of the null hypothesis when it is true and the nonrejection of the null hypothesis when it is false (a specific alternative hypothesis is true), respectively. Calculated sample sizes should be increased by the number of animals that are expected to be lost over the course of the study. Free software routines are available to calculate the necessary sample sizes for many surveillance and diagnostic situations. The objectives of the present article are to briefly discuss the statistical theory behind sample size calculations and provide practical tools and instruction for their calculation.

Key Words: Diagnostic testing • epidemiology • sample size • study design • surveillance


    Introduction
 
Calculation of sample size is important for the design of epidemiologic studies,18,62 and specifically for surveillance9 and diagnostic test evaluations.6,22,32 The probability that a completed study will yield statistically significant results depends on the choice of sample size assumptions and the statistical model used to make calculations. The statistical methodology of employed sample size calculations should parallel the proposed data analysis to the extent possible.18 The most frequently chosen sample size routines are based on frequentist statistics, and these have been reviewed previously for other fields.1,11,20,33,35,36,50,54,61,62 Issues specifically related to diagnostic test validation also have been discussed.2,28,42,48 Sample size routines related to issues of surveillance, as well as diagnostic test validation using Bayesian methodology, also have been developed.7,56

Surveillance and diagnostic situations that require sample size calculations include the detection of disease in a population to certify disease freedom, estimation of diagnostic accuracy, comparison of diagnostic accuracy among competing assays, and equivalency testing of assays. The appropriate sample size depends on the study purpose, and no calculations can be made until study objectives have been defined clearly. Sample size calculations are important because they require investigators to clearly define the expected outcome of investigations, encourage development of recruitment goals and a budget, and discourage the implementation of small, inconclusive studies. Common sample size mistakes include not performing any calculations, making unrealistic assumptions, failing to account for potential losses during the study, and failing to investigate sample sizes over a range of assumptions. Reasons for inadequately sized studies that do not achieve statistical significance include failing to perform sample size calculations, selecting sample size based on convenience, failing to secure sufficient funding for the project, and not using available funding efficiently.

There is no single correct sample size "answer" for any given epidemiologic study objective or biologic question. Calculated sizes depend on assumptions made during their calculation, and such assumptions cannot be known with certainty. If assumptions were known to be true with certainty, then the study that is being designed will likely not add to the scientific understanding of the problem. There are concepts that are important to consider when performing sample size calculations, despite the inability to classify certain results as correct or incorrect. A few simple formulas are generally sufficient for most sample size situations that would be encountered in the design of studies to determine disease freedom and evaluate diagnostic tests. The objectives of the present article are to briefly discuss the statistical theory behind sample size calculations and provide practical tools and instruction for their calculation. This review will only discuss issues related to frequentist approaches to sample size calculation and will emphasize conservative methods that result in larger sample sizes.


    Epidemiologic Errors
 
The current presentation of statistical results in the medical literature tends to be a blending of significance testing attributed to the work of Fisher,21 subsequently discussed by others,29,55 and hypothesis testing as attributed to Neyman and Pearson.46,47 The P value in the Fisher significance testing approach is considered a quantitative value documenting the level of evidence for or against the null hypothesis. The P value is formally defined as the probability of observing the current data, or data more extreme, when the null hypothesis is true. The hypothesis testing approach as introduced by Neyman and Pearson was based on rejection or acceptance of null hypotheses using specified P value cutoffs. The hypothesis testing interpretation of statistical results allows for the definition of type I and type II errors as the errors associated with rejection of the null hypothesis when it is indeed true and the acceptance of the null hypothesis when it is false (and a particular alternative hypothesis is true), respectively.47 The probabilities of making these errors are frequently referred to as alpha (α) and beta (β) for type I and type II errors, respectively.36,54 Current sample size procedures are derived from the hypothesis testing approach as put forth by Neyman and Pearson; however, current convention is to use the terminology of "failure to reject" rather than acceptance of a null hypothesis.

A requirement for sample size calculation is the specification of alpha and beta when considering the testing of a statistical hypothesis (Table 1). Precision-based sample size methods must specify alpha, but beta is not included in the equations and, based on the typical large-sample approximation methods, is consequently assumed to be 50% for the alternative hypothesis that the true value falls outside the limits of the calculated confidence interval.17 The P value obtained after statistical analysis will equal the prespecified alpha if the assumptions of the sample size calculations are observed exactly in the collected data due to their similar probabilistic definition. However, the meaning of beta is often misunderstood as simply "the probability of accepting the null hypothesis when a true difference exists" based on presentations in tables and figures.11,33,36,54 The issue is that there are an infinite number of specific alternative hypotheses that could be true if the null hypothesis is false, and many will be less probable than the null itself. Beta can be calculated only after an explicit alternative hypothesis has been specified. The hypothesis that is chosen during sample size calculation is the expected difference between the population values. Alpha and beta correspond to areas under sampling distributions for population means (including proportions) under the null and alternative hypotheses, respectively (Fig. 1). The statistical power of a test is defined as 1 – β or the probability of rejecting the null hypothesis when the alternative hypothesis is true.



 
Table 1 Definition of type I and type II errors.*

 


 
Figure 1 Sampling distributions presented under the null (H0, black line) and alternative (HA, gray line) hypotheses. Black shaded area corresponds to alpha (type I error), and gray shaded area corresponds to beta (type II error). Sample size calculations solve for the sample size so that the critical value (cv) corresponds to the location where Pr(Z ≤ z) = 1 – α/2 and Pr(Z ≤ z) = β (or Pr(Z ≤ z) = 1 – α for a 1-sided test).

 

    Sample Size Adjustment Factors
 
Sample size calculations are often based on large-sample approximation methods.24,30,51 The quality of the approximate results depends on the specific sample size situation, and adjustment factors have been developed to improve their approximation to exact distributions. Some of the typical adjustment factors include the finite population, continuity correction, and variance inflation factors.

The finite population correction factor4,19 is typically considered when the study objective is to estimate a population proportion. Typically, sampling without replacement is performed, and if the sample size is relatively large compared with the total population, then this correction factor should be considered. A typical recommendation is to employ this factor when the sample includes 10% or more of the population.19 The need for this correction factor is derived from the fact that sampling is hypergeometric (sampling without replacement, as from a deck of cards), and sample size formulas are based on binomial (sampling with replacement) theory. The formula19 for the correction is


$$n' = \frac{1}{\dfrac{1}{n} + \dfrac{1}{N}} \qquad (1)$$

where n′ is the corrected sample size, and n and N are the uncorrected sample size and the population size, respectively. Applying this correction factor makes the sample size smaller than the uncorrected value. Confidence interval algorithms have been developed based on hypergeometric sampling,52 but the author is not aware of their availability in statistical packages. Usual confidence interval calculation methods are based on either normal approximation or exact binomial methodology. Application of the finite population correction factor is only recommended by the author when the analysis also incorporates adjustment for hypergeometric sampling.
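
The correction can be sketched in a few lines of Python. This is a minimal illustration, assuming the correction takes the form n′ = 1/(1/n + 1/N), which is algebraically the same as nN/(n + N); the function name and the population of 2,000 animals are hypothetical and are not taken from the article.

```python
import math


def finite_population_correction(n: float, population: int) -> int:
    """Return the corrected sample size n' = 1 / (1/n + 1/N), rounded up.

    Assumed form of the correction; n is the uncorrected sample size and
    population is N, the total number of units that could be sampled.
    """
    if n <= 0 or population <= 0:
        raise ValueError("n and population must be positive")
    # n * N / (n + N) is the same quantity as 1 / (1/n + 1/N).
    return math.ceil(n * population / (n + population))


# Hypothetical illustration: an uncorrected size of 657 drawn from 2,000 animals.
print(finite_population_correction(657, 2000))
```

The corrected value is always smaller than the uncorrected one, and the two converge as the population grows large relative to the sample.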

The continuity correction factor51 is employed when the study objective is to compare 2 population proportions (including diagnostic sensitivity or specificity). The difference in proportions is approximated by a normal distribution in typical sample size formulas, even though binomial distributions are discrete and normal distributions are continuous. The normal approximation might not always be adequate, and continuity correction should be applied to better approximate the exact distribution (Fig. 2). The formula25 for continuity correction is


$$n' = \frac{n}{4}\left[1 + \sqrt{1 + \frac{4}{n\,\lvert P_1 - P_2\rvert}}\right]^2 \qquad (2)$$

where n′ and n are the corrected and uncorrected sample sizes, respectively, and P1 and P2 are the hypothesized proportions. Applying the continuity correction increases the sample size over the uncorrected value, and the correction should typically be applied. Frequently employed methods for the comparison of proportions use continuity correction when calculating chi-square test statistics.8,58
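
A short Python sketch of the continuity correction follows. It assumes the correction has the form n′ = (n/4)[1 + sqrt(1 + 4/(n|P1 – P2|))]², which reproduces the corrected sample size reported later in the article for proportions of 0.90 and 0.80; the function name is hypothetical.

```python
import math


def continuity_corrected_n(n: float, p1: float, p2: float) -> float:
    """Apply the (assumed) continuity correction to an uncorrected per-group n."""
    difference = abs(p1 - p2)
    if n <= 0 or difference == 0:
        raise ValueError("n must be positive and the proportions must differ")
    return (n / 4.0) * (1.0 + math.sqrt(1.0 + 4.0 / (n * difference))) ** 2


# An uncorrected size of 199 per group for proportions 0.90 and 0.80.
corrected = continuity_corrected_n(199, 0.90, 0.80)
print(math.ceil(corrected))  # 219
```

The smaller the difference between the 2 proportions, the larger the relative increase produced by the correction.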



 
Figure 2 Cumulative probability function for a binomial distribution (n = 12, p = 0.5) (gray shading) overlaid with the corresponding cumulative normal distribution (µ = 6, σ = 1.732) denoting the uncorrected (A; probability = 0.500) and continuity-corrected (B; probability = 0.614) probability for observing 6 or fewer successes. Continuity correction improves the approximation of the true binomial cumulative probability for observing 6 or fewer successes, which is 0.613.
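
The three probabilities quoted in the caption of Figure 2 can be checked with the Python standard library. This is only an illustrative sketch; the function name is hypothetical, and the continuity correction is implemented in the usual way by evaluating the normal curve at x + 0.5.

```python
from math import comb, sqrt
from statistics import NormalDist


def cumulative_probabilities(n: int, p: float, x: int):
    """Return (exact binomial, uncorrected normal, corrected normal) for Pr(X <= x)."""
    exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))
    normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    uncorrected = normal.cdf(x)        # normal curve evaluated at x itself
    corrected = normal.cdf(x + 0.5)    # continuity correction: evaluate at x + 0.5
    return exact, uncorrected, corrected


exact, uncorrected, corrected = cumulative_probabilities(12, 0.5, 6)
print(round(exact, 3), round(uncorrected, 3), round(corrected, 3))  # 0.613 0.5 0.614
```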

 
Sample size calculations for estimating proportions typically involve making the assumption of independence among sampling units. Lack of independence that is introduced when a clustered sampling design is employed can be adjusted by inflating the variance estimate. The design effect (DE),10,38 or variance inflation factor, is defined as the variance of the sampling design compared with simple random sampling. The formula10,45,59 for its calculation is


$$DE = 1 + \rho\,(m - 1) \qquad (3)$$

where ρ is the intraclass correlation, and m is the sample size within each cluster. When clustered sampling is employed, the sample size estimated by the usual methods assuming independence is multiplied by the DE to account for the expected dependence.
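
The adjustment can be sketched in Python as follows, assuming the standard form DE = 1 + ρ(m – 1). The function names, the intraclass correlation of 0.1, and the cluster size of 20 are hypothetical values chosen only to illustrate the arithmetic.

```python
import math


def design_effect(rho: float, cluster_size: float) -> float:
    """Variance inflation factor, assumed to be DE = 1 + rho * (m - 1)."""
    return 1.0 + rho * (cluster_size - 1.0)


def clustered_sample_size(n_independent: float, rho: float, cluster_size: float) -> int:
    """Inflate a sample size computed under independence by the design effect."""
    return math.ceil(n_independent * design_effect(rho, cluster_size))


# Hypothetical inputs: 657 animals under independence, rho = 0.1, 20 animals per cluster.
print(clustered_sample_size(657, 0.1, 20))
```

With no within-cluster correlation (ρ = 0) the design effect is 1 and the sample size is unchanged.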

The intraclass correlation is a relative measure of the homogeneity of sampling units within each cluster compared with a randomly selected sampling unit. This correlation is formally defined as the proportion of total variation within sampling units that can be accounted for by variation among clusters.38,45 A high correlation indicates more dependence within the data, resulting in a larger DE. The intraclass correlation is generally estimated from pilot data or based on estimates available from the literature. If the number of clusters is fixed by design and the cluster sample size is unknown, then it is not possible to simply use the previously mentioned formula for the DE. The sample size per cluster (m) must first be estimated, and it is based on the effective sample size (ESS), which is the sample size estimated assuming independence. It is also necessary to know the number of clusters (k) and the intraclass correlation (ρ). The formula27 for calculation of the cluster sample size is


$$m = \frac{ESS\,(1 - \rho)}{k - \rho \cdot ESS} \qquad (4)$$
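
A minimal Python sketch of the cluster sample size calculation follows. It assumes the relationship ESS = km / [1 + ρ(m – 1)], which rearranges to m = ESS(1 – ρ) / (k – ρ·ESS); this form is inferred from the definitions in the text rather than copied from the cited source. The function name and the example values (ESS = 100, k = 20, ρ = 0.1) are hypothetical.

```python
def cluster_sample_size(ess: float, clusters: int, rho: float) -> float:
    """Per-cluster sample size m for a fixed number of clusters.

    Assumes ESS = k*m / (1 + rho*(m - 1)), solved for m.
    """
    denominator = clusters - rho * ess
    if denominator <= 0:
        # With this few clusters no per-cluster size can reach the target ESS.
        raise ValueError("too few clusters for the requested effective sample size")
    return ess * (1.0 - rho) / denominator


# Hypothetical inputs; round the result up to a whole number of animals in practice.
m = cluster_sample_size(ess=100, clusters=20, rho=0.1)
print(m)
```

Under this assumed form, a solution exists only when the number of clusters exceeds ρ·ESS; otherwise more clusters are needed regardless of how many units are sampled within each.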


    Sample Size Situations
 
Surveillance or Detection of Disease
The detection of disease in a population is important for herd certification programs and for documenting freedom from disease after an outbreak. It has implications in regional and international trade of animals and animal products. The first step is to determine the prevalence of disease that is important to detect. A prevalence of disease at this level or greater is considered biologically important. Documenting a zero prevalence of disease is not typically possible because it would require testing the entire population with a perfect assay. The next step is to define the level of confidence for which it is desired to find the disease should it be present in the population at the hypothesized prevalence or higher. Again, 100% confidence is not feasible because it would require sampling all animals and testing with a perfect assay. Alpha is calculated as 1 – confidence. The final step is to determine the statistical model to use for calculations. In small populations, sample size calculations should be based on a hypergeometric distribution (sampling without replacement). In larger populations, it is often assumed that the true hypergeometric distribution can be well approximated by the binomial (sampling with replacement). The sample size formula assuming a binomial model is based on the following relationship: (1 – p)^n = (1 – confidence). The formula19 after solving for the sample size is


$$n = \frac{\ln(\alpha)}{\ln(1 - p)} \qquad (5)$$

where α is 1 – confidence, and p is the prevalence worth detecting. The corresponding formula19 based on hypergeometric sampling is


$$n = \left(1 - \alpha^{1/D}\right)\left(N - \frac{D - 1}{2}\right) \qquad (6)$$

where α is 1 – confidence, N is the population size, and D is the expected number of diseased animals in the population.

The necessary sample size for various combinations of prevalence and confidence can be tabulated (Table 2), and software is available that will perform the necessary calculations. Survey Toolboxᵃ can perform these calculations and is available free for download. The software performs calculations based on both binomial and hypergeometric sampling and can also adjust for imperfect sensitivity and specificity of employed tests.



 
Table 2 Number needed for study to be confident that the disease will be detected if present at or above a specified prevalence based on hypergeometric sampling and assuming a perfect test.

 
An example of this type of sample size problem is illustrated by the regulatory agency in Texas when it decided to perform active surveillance for bovine tuberculosis (Table 3). There are approximately 7,650 registered pure-bred beef seed stock producers in Texas, and it was decided that a herd-level prevalence of 0.001 (1 in 1,000 herds infected) or greater was important to detect with 95% confidence. Survey Toolbox can be used to solve this sample size problem. From the menu, choose Freedom From Disease -> Sample Size. Click on the Sample Size tab and input 100 for the sensitivity and specificity of the test. Change the population size to 7,650 and set the minimum expected prevalence to 0.1%. Click on the Options tab and be sure that the type I error is set at 0.05. Click to have the program calculate the sample size based on the simple binomial model. No other changes are necessary. Go back to the Sample Size tab and click on the Calculate button. The sample size based on the hypergeometric model can be calculated by changing to the Modified Hypergeometric Exact on the Options tab before clicking on the Calculate button.



 
Table 3 Sample size situation for the detection of bovine tuberculosis (TB) in beef cattle herds.

 
A binomial model suggested that the necessary sample size would be 2,994 of the 7,650 beef operations (39%). The interpretation is that assuming that the true prevalence is at least 0.001, then a sample consisting only of noninfected herds would occur 5% of the time or less when the sample size is 2,994 (assuming a perfect test at the herd level). The hypergeometric model might be more appropriate, because sampling would be from a finite population without replacement; using the hypergeometric formula, the sample size is 2,388 herds (31%) of the 7,650 total.
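
The two reported sample sizes can be approximated directly in Python. This sketch assumes the binomial formula n = ln(α)/ln(1 – p) and the hypergeometric approximation n = (1 – α^(1/D))(N – (D – 1)/2), with the expected number of infected herds (7,650 × 0.001 = 7.65) rounded up to D = 8; the rounding of D and the function names are assumptions made here, not statements from the article.

```python
import math


def detection_n_binomial(prevalence: float, confidence: float) -> float:
    """Sample size to detect disease, binomial model: ln(alpha) / ln(1 - p)."""
    alpha = 1.0 - confidence
    return math.log(alpha) / math.log(1.0 - prevalence)


def detection_n_hypergeometric(population: int, diseased: int, confidence: float) -> float:
    """Sample size to detect disease, hypergeometric approximation."""
    alpha = 1.0 - confidence
    return (1.0 - alpha ** (1.0 / diseased)) * (population - (diseased - 1) / 2.0)


herds = 7650
diseased_herds = math.ceil(herds * 0.001)  # 7.65 rounded up to 8 (assumption)

print(round(detection_n_binomial(0.001, 0.95), 1))                         # about 2994.2
print(round(detection_n_hypergeometric(herds, diseased_herds, 0.95), 1))   # about 2388.3
```

The unrounded results agree with the 2,994 and 2,388 herds quoted above; rounding up to the next whole herd, as recommended elsewhere in the article, would give one more in each case.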

Estimation of a Population Proportion
Calculating the sample size necessary to estimate a population proportion is important when an estimate of disease prevalence or diagnostic test validation is desired. The sensitivity and specificity of an assay should be considered population estimates in the same manner as other proportions. The sample size formulas employed for these calculations are typically considered to be precision based because they involve finding confidence intervals of a specified width rather than testing hypotheses. The typical sample size formula37,58 based on the normal approximation to the binomial is


$$n = \frac{Z_{1-\alpha/2}^{2}\,P\,(1 - P)}{e^{2}} \qquad (7)$$

where P is the expected proportion (e.g., diagnostic sensitivity), e is one half the desired width of the confidence interval, and Z_{1–α/2} is the standard normal Z value corresponding to a cumulative probability of 1 – α/2. The investigator must specify a best guess for the proportion that is expected to be found after performing the study. The investigator also needs to specify the desired width of the interval around this proportion and the level of confidence. In essence, this procedure will find the sample size that, upon statistical analysis, would result in a confidence interval with the specified probability and limits if the assumed proportion were in fact observed by the study (Fig. 3). The resulting sample size could be adjusted using the finite population correction factor, and if this is performed then the statistical analysis should be similarly adjusted at the end of the study. Sample sizes calculated using formulas should always be rounded up to the nearest whole number.



 
Figure 3 The sample size is determined so that the sampling distribution of the hypothesized proportion (P) has an area under the curve between the specified upper (PU) and lower (PL) bounds of the confidence interval equal to the specified probability (gray shaded area); Pr(PL ≤ P ≤ PU) = confidence level.

 
The sample size methods based on the normal approximation to the binomial might not be adequate when the expected proportion is close to the boundary values of 0 or 1. Exact binomial methods are preferred when the proportion is expected to fall outside the range of 0.2–0.8.26 The binomial probability function is the basis of exact sample size methods, and it is


$$\Pr(X = x) = \binom{n}{x}\,P^{x}\,(1 - P)^{\,n - x} \qquad (8)$$

where P is the hypothesized proportion, n is the sample size, and x is the number of observed "successes." Derivation of a sample size algorithm based on the binomial probability function has been described previously.26 It is based on the mid-P adjustment5,41 for the Clopper-Pearson method of exact confidence interval estimation.14 The investigator specifies PU and PL as the desired limits of the confidence interval around the hypothesized proportion (P) and the desired level of confidence. The calculated sample size could be adjusted using the finite population correction factor if deemed appropriate.

Software is available to calculate the necessary sample size for estimating population proportions. Epi Infoᵇ includes software that can perform these calculations33 and is available free for download. The software performs calculations based on normal approximation methods and will apply the finite population correction factor if the population size is specified. Software to perform calculations based on binomial exact methods (Mid-P Sample Size routine) can be obtained by contacting the author.

An example of this type of sample size problem is the design of a study to estimate the diagnostic specificity of a new assay to screen healthy cattle for Foot-and-mouth disease virus (FMDV; Table 4). The number of cattle necessary to sample could be calculated for an expected specificity of 0.99 and the desire to estimate this specificity ±0.01 with 99% confidence. For this example, it will be assumed that sampling is from a large population, and a simple random sampling design will be employed. Epi Info 6 can be used to make the calculation based on normal approximation methods (newer versions of Epi Info have not retained presented sample size routines). From the menu, choose Programs -> EPITABLE. From the menus in EPITABLE, choose Sample -> Sample size -> Single proportion. The size of the population does not need to be changed unless the application of the finite population correction is desired. The design effect should be 1.0 unless the variance is to be adjusted for clustered sampling techniques. Enter 1% for the desired precision, 99% for the expected prevalence, and check 1% for alpha. Alternatively, the modified exact sample size routine could be used. Open the Mid-P Sample Size routine and enter 0.99 for the proportion, 0.01 for the error limit, and 0.99 for the confidence level. The normal approximation method suggested that 657 cattle would need to be sampled, whereas the method based on the exact binomial distribution suggested that 974 should be sampled. Neither of these numbers incorporates finite population correction. The sample size based on the exact distribution is preferred, and it is substantially larger than the sample size based on the normal approximation because the expected proportion is very close to 1.
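
The normal approximation result for this example can be reproduced with a short Python sketch, assuming the precision-based formula n = Z²P(1 – P)/e² with a 2-sided Z value. The function name is hypothetical, and the exact mid-P calculation that yields 974 is not reproduced here.

```python
import math
from statistics import NormalDist


def n_for_proportion(expected: float, half_width: float, confidence: float) -> int:
    """Normal-approximation sample size for estimating a single proportion."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    n = z**2 * expected * (1.0 - expected) / half_width**2
    return math.ceil(n)  # always round up to a whole animal


# Expected specificity 0.99, estimated to within +/- 0.01 with 99% confidence.
print(n_for_proportion(0.99, 0.01, 0.99))  # 657
```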



 
Table 4 Sample size situation for estimating the specificity of a test to screen cattle for Foot-and-mouth disease virus (FMDV).

 
Typically, sample size calculations for studies that will perform clustered sampling first calculate the necessary sample size assuming independence or lack of clustering. Calculated sample sizes are then multiplied by the DE to account for the lack of independence. Expert opinion can be used to account for expected correlation of sampling units when prior information concerning the intraclass correlation is not available. A sample size routine incorporating a method to estimate the DE based on expert opinion for a fixed number of clusters has been developed27 and is available from the author.

Comparison of 2 Proportions
Independent Proportions
Calculating the sample size necessary to compare 2 population proportions is important when a comparison of the accuracy of diagnostic tests is desired. Sensitivity and specificity are population estimates, and comparison between 2 assays should be based on this sample size situation. The usual sample size formula13,25,53 based on the normal approximation to the binomial with equal group sizes is


$$n = \frac{\left[Z_{1-\alpha/2}\sqrt{2\bar{P}\,(1 - \bar{P})} \;-\; Z_{\beta}\sqrt{P_1(1 - P_1) + P_2(1 - P_2)}\right]^{2}}{(P_1 - P_2)^{2}} \qquad (9)$$

where n is the sample size in each group, P1 and P2 are the expected proportions in each group, and P̄ is the simple average of the expected proportions. Variables Z_{1–α/2} and Z_β are the standard normal Z values corresponding to the selected alpha (2-sided test) and beta, respectively. Typical presentation of the formula11,12 above uses Z_{α/2} instead and an addition of the 2 components within the numerator. Solving these 2 formulations gives the same sample size because the numerator is squared. The specific formulation has been included here because alternative hypotheses have been presented in figures as being on the positive side of the null hypothesis, and therefore Z_β should be negative. This is also consistent with the algebraic manipulation to solve for Z_β, as presented in the section related to power calculation. The resulting sample size should be adjusted using the continuity correction factor, and all sample sizes should be rounded up to the nearest whole number. The magnitude of the difference between the 2 proportions has a greater effect on calculated sample sizes than typical values for alpha and beta (Fig. 4). The absolute magnitude of the proportions affects the calculations, with proportions closer to 0.5 resulting in larger sample sizes11 because the variance of a proportion is greatest at this value. The formula for the standardized difference (SDiff) in proportions3,36,61 is


$$SDiff = \frac{\lvert P_1 - P_2\rvert}{\sqrt{\bar{P}\,(1 - \bar{P})}} \qquad (10)$$

Software is available to calculate the necessary sample size to compare 2 independent population proportions. Epi Infoᵇ can be used to perform these calculations. The calculations are based on normal approximation methods and will apply a continuity correction factor.

An example of this type of sample size problem is the design of a study to compare the diagnostic sensitivity of magnetic resonance imaging (MRI) for detection of intervertebral disk disease between chondrodystrophoid and nonchondrodystrophoid breeds of dogs (Table 5). The number of dogs necessary to sample could be calculated for expected sensitivities of 90% and 80% in chondrodystrophoid and nonchondrodystrophoid dogs, respectively. The statistical test could be desired to have an alpha of 5% and beta of 20% to detect this difference in proportions. The ratio of chondrodystrophoid to nonchondrodystrophoid dogs also needs to be specified, and the assumption could be made to have equal group sizes. Epi Info 6 can be used to make the calculation. From the menu, choose Programs -> EPITABLE. From the menus in EPITABLE, choose Sample -> Sample size -> Two proportions. The ratio of group 1 to group 2 should be 1, the percentage in group 1 would be 90%, the percentage in group 2 would be 80%, alpha should be 5%, and the power should be set at 80%. Calculations suggest that 219 dogs are necessary in each group (chondrodystrophoid and nonchondrodystrophoid) for a total of 438. The reported sample size includes continuity correction.
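
The uncorrected calculation for the MRI example can be sketched in Python. The sketch assumes the equal-group formula n = [Z_{1–α/2}·sqrt(2P̄(1 – P̄)) – Z_β·sqrt(P1(1 – P1) + P2(1 – P2))]² / (P1 – P2)² and a standardized difference of |P1 – P2| / sqrt(P̄(1 – P̄)); both forms and the function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def n_two_proportions(p1: float, p2: float, alpha: float, beta: float) -> float:
    """Uncorrected per-group sample size for a 2-sided comparison of proportions."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha / 2.0)
    z_beta = z.inv_cdf(beta)  # negative whenever beta < 0.5
    p_bar = (p1 + p2) / 2.0
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        - z_beta * math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    ) ** 2
    return numerator / (p1 - p2) ** 2


def standardized_difference(p1: float, p2: float) -> float:
    """Assumed form: |P1 - P2| / sqrt(Pbar * (1 - Pbar))."""
    p_bar = (p1 + p2) / 2.0
    return abs(p1 - p2) / math.sqrt(p_bar * (1.0 - p_bar))


n = n_two_proportions(0.90, 0.80, alpha=0.05, beta=0.20)
print(math.ceil(n))                                   # 199 before continuity correction
print(round(standardized_difference(0.90, 0.80), 2))  # 0.28
```

The uncorrected value of 199 per group is smaller than the 219 reported by Epi Info because the reported figure includes continuity correction.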



 
Figure 4 Sample size estimates are affected by the standardized difference and the specified alpha (type I error) and beta (type II error).

 


 
Table 5 Sample size situation for comparing the sensitivity of magnetic resonance imaging (MRI) for detection of intervertebral disk disease (IVDD) between chondrodystrophoid and nonchondrodystrophoid breeds of dogs.

 
Sample size calculations for the comparison of proportions when the group sizes are not equal are a simple modification of the presented formula.60 The formula also can be modified to allow for the estimation of odds ratios and risk ratios.20,53 All presented formulas correspond to the necessary sample sizes for 2-sided statistical tests. Variable Z_{1–α/2} is replaced with Z_{1–α} to modify the formula for a 1-sided test.

Dependent Proportions
When multiple tests are performed based on specimens collected from the same animal, then the proportions (i.e., sensitivity and specificity) should be considered dependent. There are multiple conditional and unconditional approaches to solving this sample size problem,15,16,23,39,40,43,44,49,57 and a formula is not presented in this section due to increased complexity and lack of consensus among competing methods. An example of this type of sample size problem is the design of a study to compare the diagnostic specificity of 2 tests for FMDV screening in healthy cattle (Table 6). Serum samples from each animal selected for study will have both tests performed in parallel. The number of cattle necessary to sample could be calculated based on expected specificities of 99% and 95% in test 1 and test 2, respectively. The statistical test could be desired to have alpha be 1% and beta 10% to detect this difference in proportions. Software is available to calculate the necessary sample size to compare 2 dependent population proportions. WinPepiᶜ includes software that can perform these calculations and is available free for download. From the main menu, the program PAIRSetc should be selected. Sample size should be chosen from the top menu of PAIRSetc. The correct type of sample size procedure corresponds to the McNemar test, and S1 should therefore be selected. The significance level should be set as 1% and the power as 90%. The expected percentage of "Yes" in the first set of observations should be set as 99%, and the other percentage of "Yes" should be set as 95%. Numbers without "%" should be entered, and the remainder of the input boxes should be left blank. Calculations suggest that 544 pairs of observations are required (544 cattle total). This sample size is smaller than the corresponding sample size if the proportions were considered to be independent.


Table 6 Sample size situation for comparing the specificity of 2 tests for Foot-and-mouth disease virus (FMDV) screening in healthy cattle.

 
Epi Info 6 could be used to make the calculation if the paired design were ignored. From the menu, choose Programs -> EPITABLE. From the menus in EPITABLE, choose Sample -> Sample size -> Two proportions. The ratio of group 1 to group 2 should be 1, the percentage in group 1 would be 99%, the percentage in group 2 would be 95%, alpha should be 1%, and the power should be set at 90%. Calculations suggest that 588 cattle are necessary for each group, and this would be the total number of necessary cattle because of the paired design. This sample size is not much different (8% greater) from the calculation based on the paired design, and because it is larger, it would not necessarily be incorrect to use the usual unmatched method for sample size determination.
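The unmatched calculation can be sketched in a few lines. The code below assumes the usual normal-approximation formula for 2 independent proportions with equal group sizes, followed by a continuity correction of the type described by Fleiss and colleagues. That the software applies exactly this correction is an assumption made here, and the function name `two_proportion_sample_size` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def two_proportion_sample_size(p1: float, p2: float, alpha: float = 0.05,
                               power: float = 0.80,
                               corrected: bool = True) -> int:
    """Per-group sample size for a 2-sided comparison of independent proportions."""
    z = NormalDist().inv_cdf
    p_bar = (p1 + p2) / 2        # simple average of the expected proportions
    delta = abs(p1 - p2)         # difference to detect
    n = (z(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar))
         + z(power) * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / delta ** 2
    if corrected:
        # ASSUMPTION: Fleiss-type continuity correction applied to the uncorrected n
        n = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n)


print(two_proportion_sample_size(0.99, 0.95, alpha=0.01, power=0.90))                   # 588
print(two_proportion_sample_size(0.99, 0.95, alpha=0.01, power=0.90, corrected=False))  # 539
```

With the correction, the sketch gives the 588 animals per group quoted above. Without it the requirement drops to 539, which shows how much of the gap between the paired and unpaired results comes from the continuity correction rather than from the pairing itself.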

Equivalency Testing
A study that aims to determine whether or not a certain test has the equivalent (or noninferior) accuracy44 of another, typically well-established test is based on separately comparing sensitivity and specificity between tests. The first step is to consider the sensitivity and specificity of the well-established test and then quantify the level of difference in the accuracy that would be allowable while still considering the 2 tests equivalent or the new test not inferior. It is not possible to calculate a sample size to determine zero difference, for the same reason that it is not possible to calculate a sample size to be 100% sure that a given population has no disease (zero prevalence). An example would be to determine equivalency of a new test to a well-established test that has been reported to be 90% sensitive and 95% specific. Further assumptions could be that as long as the new test is at least 85% sensitive and 90% specific, then it would be considered equivalent. The allowable alpha and beta values could be assumed to be 5% (2-sided) and 20%, respectively. However, power values greater than 80% and larger alpha values are sometimes assumed for equivalency studies.54 Epi Info could be used to calculate the necessary sample size as described previously for 2 independent proportions. If equal group sizes are assumed (for each test), then the necessary sample size is 726 infected animals within each group tested by the 2 tests for sensitivity comparison and 474 uninfected animals within each group for specificity comparison. If a paired design were planned, then these numbers would be a reasonably good estimate for the total number of animals necessary for the evaluation. Often for noninferiority testing a 1-sided statistical test will be employed, and therefore the sample size calculation should be adjusted accordingly. Equivalency testing in general requires large sample sizes, and the discussed example is a simplified situation.
Literature related to these studies documents several methods of calculation and varies based on the determination of regions associated with rejection of the null hypothesis of no difference between tests. The simplified example has been presented to give a general idea of how studies should be designed, and interested readers should review the paper by Lu et al.44
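The simplified equivalency example can be reproduced with a short sketch. It assumes the normal-approximation formula for 2 independent proportions with equal group sizes and a Fleiss-type continuity correction; the correction and the function name `equivalency_sample_size` are assumptions made for illustration and are not taken from the article. The `two_sided` switch shows the effect of the 1-sided test often used for noninferiority.

```python
from math import ceil, sqrt
from statistics import NormalDist


def equivalency_sample_size(p_ref: float, p_min: float, alpha: float = 0.05,
                            power: float = 0.80, two_sided: bool = True) -> int:
    """Per-group sample size to distinguish p_ref from the lowest acceptable p_min."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    p_bar = (p_ref + p_min) / 2
    delta = abs(p_ref - p_min)   # allowable difference in accuracy
    n = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z(power) * sqrt(p_ref * (1 - p_ref) + p_min * (1 - p_min))) ** 2 / delta ** 2
    # ASSUMPTION: Fleiss-type continuity correction
    n = n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2
    return ceil(n)


print(equivalency_sample_size(0.90, 0.85))  # sensitivity: 726 infected animals per group
print(equivalency_sample_size(0.95, 0.90))  # specificity: 474 uninfected animals per group
# A 1-sided test needs fewer animals for the same alpha and power
print(equivalency_sample_size(0.90, 0.85, two_sided=False))
```

The 2-sided results match the 726 and 474 animals quoted in the example. The methods reviewed by Lu et al. define the rejection region differently and may give other values.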

Calculation of Power When Sample Size is Fixed
When the sample size is fixed by design, then it is good planning to determine the power of a statistical test to identify a biologically important difference. Estimating the power to compare 2 population proportions is important when it is desired to compare the accuracy of diagnostic tests. The usual formula for calculating the power for this comparison is an algebraic manipulation of the previously presented sample size formula and assuming equal group sizes is


\[ Z_\beta = \frac{Z_{1-\alpha/2}\sqrt{2P(1-P)} - |P_1 - P_2|\sqrt{n}}{\sqrt{P_1(1-P_1) + P_2(1-P_2)}} \tag{21} \]
A modification of the above formula,24 including continuity correction, is


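\[ Z_\beta = \frac{Z_{1-\alpha/2}\sqrt{2P(1-P)} - \sqrt{n(P_1 - P_2)^2 - 2|P_1 - P_2|}}{\sqrt{P_1(1-P_1) + P_2(1-P_2)}} \tag{21} \]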
where n is the sample size, P1 and P2 are the expected proportions in each group, and P is the simple average of the expected proportions. Variables $Z_{1-\alpha/2}$ and $Z_\beta$ are standard normal Z values. Power is determined as 1 – cumulative probability associated with $Z_\beta$ as calculated from the formula (Table 7). Typical presentations of these formulas24 incorporate $Z_{\alpha/2}$ and addition of the numerator components.


Table 7 Common standard normal Z scores for use in sample size formulas and power estimation.*

 
An example would be to compare diagnostic sensitivity between 2 tests when both tests were independently performed on 100 infected animals. Assume that the tests are believed to have sensitivities of 85% and 90%, and a test with an alpha of 5% is desired. Epi Info 6 can be used to calculate the power of the test to compare these 2 proportions. From the menu, choose Programs -> EPITABLE. From the menus in EPITABLE, choose Sample -> Power calculation -> Cohort study. The number of exposed should be set to 100, and the ratio of exposed to nonexposed as 1 (exposed and nonexposed is simply a way to distinguish the 2 groups). The relative risk worth detecting should be set to 1.06 (90%/85%; larger proportion over the smaller), the attack rate in the unexposed should be set as 85% as the lower of the 2 proportions, and alpha should be 5%. The power calculation includes continuity correction and is reported as 13.3% by Epi Info. Using the presented formulas, the powers are calculated as 18.5% and 12.8% for the uncorrected and continuity-corrected formulas, respectively.
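The power calculation for this example can be sketched as follows. The code assumes the standard normal-approximation expression for $Z_\beta$ with equal group sizes, written with $Z_{1-\alpha/2}$ and subtraction in the numerator as described in the text, together with a Fleiss-type continuity correction. This is a reconstruction under those assumptions rather than a transcription of the article's formulas, and the function name `power_two_proportions` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(n: int, p1: float, p2: float, alpha: float = 0.05,
                          corrected: bool = False) -> float:
    """Approximate power of a 2-sided comparison of 2 independent proportions."""
    nd = NormalDist()
    p_bar = (p1 + p2) / 2                 # simple average of the proportions
    delta = abs(p1 - p2)
    if corrected:
        if n * delta <= 2:
            raise ValueError("n is too small for the continuity-corrected formula")
        shift = sqrt(n * delta ** 2 - 2 * delta)   # continuity-corrected term
    else:
        shift = delta * sqrt(n)
    z_beta = ((nd.inv_cdf(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar)) - shift)
              / sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return 1 - nd.cdf(z_beta)             # power = 1 - cumulative probability of Z_beta


# 100 infected animals per test, sensitivities of 85% and 90%, alpha 5%
print(round(power_two_proportions(100, 0.85, 0.90), 3))                  # about 0.186
print(round(power_two_proportions(100, 0.85, 0.90, corrected=True), 3))  # about 0.128
```

The continuity-corrected value agrees with the 12.8% quoted above. The uncorrected value comes out near 18.6%, which is close to the quoted 18.5%, and the small gap may reflect rounding of tabulated Z values.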

The calculation of power is dependent on the specification of an alternative hypothesis. The sampling distribution of the proportion under the null hypothesis is determined, and the critical value (the value z for which Pr(Z ≤ z) = 1 – α/2) is located on this distribution. The alternative hypothesis is set as the expected difference in the 2 population proportions, and the sampling distribution of this difference is plotted with the critical value under the null hypothesis. The area under the sampling distribution of the alternative hypothesis to the right of the critical value is the power of the statistical test (Fig. 5). The shapes of these curves depend on the hypothesized proportions and the sample size. There is only a single power value related to each possible alpha and alternative hypothesis (expected difference in proportions).


Figure 5 The sampling distribution under the null (black line) and alternative (gray line) hypotheses for the situation when P1 = 0.2 and P2 = 0.4 with equal group sizes. HO is the null hypothesis that the true proportion is 0.3 (simple average of P1 and P2), and HA is the alternative hypothesis that P1 = 0.2 and P2 = 0.4 and is centered at P2. Alternatively, HA could have been centered at P1. The gray shaded area corresponds to the power for the statistical test with alpha of 5% when the sample size per group is 20 (A), 40 (B), 80 (C), and 160 (D). The power is 50% in panel B because the observed P value of the comparison is equal to the specified alpha (5%).
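The four panels of Figure 5 can be approximated numerically. The sketch below assumes the uncorrected normal-approximation power expression for 2 independent proportions with equal group sizes; the function name `approx_power` is hypothetical, and the printed values are approximations rather than the exact shaded areas in the figure.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n: int, p1: float, p2: float, alpha: float = 0.05) -> float:
    """Uncorrected normal-approximation power for 2 proportions, n per group."""
    nd = NormalDist()
    p_bar = (p1 + p2) / 2
    z_beta = ((nd.inv_cdf(1 - alpha / 2) * sqrt(2 * p_bar * (1 - p_bar))
               - abs(p1 - p2) * sqrt(n))
              / sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return 1 - nd.cdf(z_beta)


# Panels A-D: P1 = 0.2, P2 = 0.4, alpha = 5%
for panel, n in zip("ABCD", (20, 40, 80, 160)):
    print(panel, n, round(approx_power(n, 0.2, 0.4), 2))
```

With 40 animals per group the computed power is very close to 50%, consistent with panel B, and power rises as the group size doubles in panels C and D.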

 

    Conclusions
The calculation of the sample size is very important during the design stage of all epidemiologic studies and should match the proposed statistical analysis to the extent possible. It is important to recognize that there is no single correct sample size, and all calculations are only as good as the employed assumptions. The sample size ensures statistical significance if the subsequent data collection is perfectly consistent with the assumptions made for the sample size calculation (assuming power was set as 50% or greater). If the null hypothesis is false and the assumed alternate hypothesis is true, then the probability of observing statistical significance will be equal to the assumed power of the test. The choice of assumptions for calculations is very important because their validity determines the likelihood of observing statistical significance. The traditional choices of 5% alpha and 20% beta can simply be used unless the investigator has specific reasons for other values. The choices of the best guesses or hypothesized values for the proportions that will be estimated by the study are more difficult. Values for these assumptions should be based on available literature or expert opinion. When there is doubt concerning their value, then proportions could be assumed to be close to 0.5. A proportion of 0.5 has the maximum variance, and therefore would result in the largest sample size.
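The statement that a proportion of 0.5 has the maximum variance can be checked directly, since the binomial variance term is p(1 – p). The short sketch below evaluates this term over a grid of proportions; the helper name `variance_term` is hypothetical.

```python
def variance_term(p: float) -> float:
    """Binomial variance term p(1 - p) that drives sample size formulas."""
    return p * (1 - p)


grid = [i / 100 for i in range(1, 100)]     # proportions 0.01 .. 0.99
worst_case = max(grid, key=variance_term)   # proportion with the largest variance
print(worst_case, variance_term(worst_case))  # 0.5 0.25
```

Because the variance term appears in the numerator of the sample size formulas, assuming a proportion near 0.5 is the conservative choice when little prior information is available.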

Sample size calculations correspond to the number of animals that are required to complete the study and be available for statistical analysis. They are the minimum sample sizes required to achieve the desired statistical properties. Sample size calculations should be increased by the number of animals that are anticipated to be lost during the study. The study design influences the number of animals expected to be lost during implementation. Cross-sectional studies should have minimal losses, but there is always the possibility of mislabeled samples, lost records, and laboratory errors. Sample sizes for cross-sectional studies should be increased 1–5% to account for these potential losses. Prospective studies that cover long time periods could have substantial losses, but these types of study designs are unusual for diagnostic investigations.
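The adjustment for expected losses is simple arithmetic. The sketch below inflates a calculated sample size by a chosen percentage, following the 1–5% range suggested for cross-sectional studies. The function name `inflate_for_losses` is hypothetical, and a simple multiplicative increase is assumed.

```python
from math import ceil


def inflate_for_losses(n_required: int, loss_fraction: float) -> int:
    """Increase a calculated sample size by the expected fraction of losses."""
    if not 0 <= loss_fraction < 1:
        raise ValueError("loss_fraction must be in [0, 1)")
    return ceil(n_required * (1 + loss_fraction))


# 544 animals required for analysis, allowing for 1% and 5% losses
print(inflate_for_losses(544, 0.01))  # 550
print(inflate_for_losses(544, 0.05))  # 572
```

Rounding up keeps the number of animals available for analysis at or above the calculated minimum.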

Some published recommendations include the post-hoc calculation of power when study results fail to achieve statistical significance.34 However, there is no statistical basis for this calculation.31 The power of a 2-sided test with an alpha set to be equal to the observed P value is typically 50%,34 as presented in Figure 5. Therefore, post-hoc power calculations will typically be less than 50% for observed nonsignificant results. This fact, in conjunction with the one-to-one relationship between P value and power, suggests that little information can be garnered from their calculation. Post-hoc calculations of power could be useful if performed for magnitudes of differences other than what was observed by the study. In general, however, the post-hoc calculation of power is akin to determining the probability that an event will be observed after the event has already occurred (or not).
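The one-to-one relationship between the observed P value and post-hoc power can be illustrated with a short sketch. It assumes a 2-sided Z test and takes the observed effect as the true effect, which is the usual "observed power" calculation; the function name `observed_power` is hypothetical.

```python
from statistics import NormalDist


def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Post-hoc power of a 2-sided Z test if the true effect equals the observed one."""
    nd = NormalDist()
    z_obs = nd.inv_cdf(1 - p_value / 2)   # |z| implied by the observed 2-sided P value
    z_crit = nd.inv_cdf(1 - alpha / 2)    # critical value for the chosen alpha
    # Probability of falling beyond either critical bound under the observed effect
    return (1 - nd.cdf(z_crit - z_obs)) + nd.cdf(-z_crit - z_obs)


for p in (0.05, 0.10, 0.20, 0.50):
    print(p, round(observed_power(p), 3))
```

When the observed P value equals alpha, the computed power is essentially 50%, and every larger (nonsignificant) P value maps to a power below 50%. The calculation therefore only restates the P value.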

A primary purpose of sample size calculations is to ensure that the proposed study will be of an appropriate size to find an important difference statistically significant. Therefore, calculations should be performed prior to the determination of the study size. In practice, however, sample size calculations are sometimes performed after the number of animals for study has been set, for reasons that might include cost or availability. Often, the assumptions are simply modified by trial and error until calculations lead to the predetermined sample size, and these calculations are presented in grant applications or other proposed research plans. Also, studies are sometimes performed without any sample size calculations. Many journals require discussion of sample size calculations, and therefore such calculations are sometimes performed after the fact, with assumptions modified until the appropriate size is found. These are obviously not appropriate uses of sample size calculations. A better approach often would be the calculation of power based on the sample size expected to be used for the study. Though such post-hoc determinations are inappropriate or misleading, many epidemiologists and statisticians likely have been asked to perform these calculations. Unfortunately, the realities of research do not always coexist peacefully within the service of science itself. It is hoped that the material presented in the present article will demystify sample size calculations and encourage their use during the initial design phase of surveillance and diagnostic evaluations.


    Acknowledgments
 
This manuscript was prepared in part through financial support by the U.S. Department of Agriculture, Cooperative State Research, Education, and Extension Service, National Research Initiative Award 2005-35204-16087. The author would like to thank the anonymous reviewers for helpful suggestions, which resulted in a better overall paper.


    Sources and manufacturers
From the Department of Veterinary Integrative Biosciences, College of Veterinary Medicine and Biomedical Sciences, Texas A&M University, College Station, TX.

a Survey Toolbox version 1.04 by Angus Cameron et al. Available at http://www.ausvet.com.au/content.php?page=res_software.

b Epi Info™ version 6.04d for Windows, Centers for Disease Control and Prevention, Atlanta, GA. Available at http://www.cdc.gov/epiinfo/Epi6/ei6.htm.

c WINPEPI (PEPI-for-Windows) version 6.8 by J. H. Abramson. Available at http://www.brixtonhealth.com/pepi4windows.html.


    References

  1. Aitken C.G. 1999. Sampling—how big a sample? J Forensic Sci 44:750–760.
  2. Alonzo T.A., Pepe M.S., Moskowitz C.S. 2002. Sample size calculations for comparative studies of medical tests for detecting presence of disease. Stat Med 21:835–852.
  3. Altman D.G. 1980. Statistics and ethics in medical research: III. How large a sample? Br Med J 281:1336–1338.
  4. Berry D.A., Lindgren B.W. 1996. Statistics: theory and methods, 2nd ed. Duxbury Press, Belmont, CA.
  5. Berry G., Armitage P. 1995. Mid-P confidence intervals: a brief review. Statistician 44:417–423.
  6. Bochmann F., Johnson Z., Azuara-Blanco A. 2007. Sample size in studies on diagnostic accuracy in ophthalmology: a literature survey. Br J Ophthalmol 91:898–900.
  7. Branscum A.J., Johnson W.O., Gardner I.A. 2006. Sample size calculations for disease freedom and prevalence estimation surveys. Stat Med 25:2658–2674.
  8. Breslow N.E., Day N.E. 1980. Statistical methods in cancer research. In: The analysis of case-control studies, vol. 1. IARC Scientific Publications no. 32, pp. 131–133. International Agency for Research on Cancer, Lyon, France.
  9. Cameron A.R., Baldock F.C. 1998. A new probability formula for surveys to substantiate freedom from disease. Prev Vet Med 34:1–17.
  10. Campbell M.K., Mollison J., Grimshaw J.M. 2001. Cluster trials in implementation research: estimation of intracluster correlation coefficients and sample size. Stat Med 20:391–399.
  11. Carlin J.B., Doyle L.W. 2002. Sample size. J Paediatr Child Health 38:300–304.
  12. Carpenter T.E. 2001. Use of sample size for estimating efficacy of a vaccine against an infectious disease. Am J Vet Res 62:1582–1584.
  13. Casagrande J.T., Pike M.C. 1978. An improved approximate formula for calculating sample sizes for comparing two binomial distributions. Biometrics 34:483–486.
  14. Clopper C.J., Pearson E.S. 1934. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26:404–413.
  15. Connett J.E., Smith J.A., McHugh R.B. 1987. Sample size and power for pair-matched case-control studies. Stat Med 6:53–59.
  16. Connor R.J. 1987. Sample size for testing differences in proportions for the paired-sample design. Biometrics 43:207–211.
  17. Daly L.E. 1991. Confidence intervals and sample sizes: don't throw out all your old sample size tables. BMJ 302:333–336.
  18. Delucchi K.L. 2004. Sample size estimation in research with dependent measures and dichotomous outcomes. Am J Public Health 94:372–377.
  19. Dohoo I.R., Martin W., Stryhn H. 2003. Veterinary epidemiologic research. AVC Inc., Charlottetown, PE, Canada.
  20. Donner A. 1984. Approaches to sample size estimation in the design of clinical trials—a review. Stat Med 3:199–214.
  21. Fisher R.A. 1929. Tests of significance in harmonic analysis. Proc R Soc Lond A Math Phys Sci 125:54–59.
  22. Flahault A., Cadilhac M., Thomas G. 2005. Sample size calculation should be performed for design accuracy in diagnostic test studies. J Clin Epidemiol 58:859–862.
  23. Fleiss J.L., Levin B. 1988. Sample size determination in studies with matched pairs. J Clin Epidemiol 41:727–730.
  24. Fleiss J.L., Levin B.A., Paik M.C. 2003. Statistical methods for rates and proportions, 3rd ed. Wiley, Hoboken, NJ.
  25. Fleiss J.L., Tytun A., Ury H.K. 1980. A simple approximation for calculating sample sizes for comparing independent proportions. Biometrics 36:343–346.
  26. Fosgate G.T. 2005. Modified exact sample size for a binomial proportion with special emphasis on diagnostic test parameter estimation. Stat Med 24:2857–2866.
  27. Fosgate G.T. 2007. A cluster-adjusted sample size algorithm for proportions was developed using a beta-binomial model. J Clin Epidemiol 60:250–255.
  28. Georgiadis M.P., Johnson W.O., Gardner I.A. 2005. Sample size determination for estimation of the accuracy of two conditionally independent tests in the absence of a gold standard. Prev Vet Med 71:1–10.
  29. Goodman S.N. 1993. p values, hypothesis tests, and likelihood: implications for epidemiology of a neglected historical debate. Am J Epidemiol 137:485–496.
  30. Greenland S. 1985. Power, sample size and smallest detectable effect determination for multivariate studies. Stat Med 4:117–127.
  31. Greenland S. 1988. On sample-size and power calculations for studies using confidence intervals. Am J Epidemiol 128:231–237.
  32. Greiner M., Gardner I.A. 2000. Epidemiologic issues in the validation of veterinary diagnostic tests. Prev Vet Med 45:3–22.
  33. Grimes D.A., Schulz K.F. 1996. Determining sample size and power in clinical trials: the forgotten essential. Semin Reprod Endocrinol 14:125–131.
  34. Hoenig J.M., Heisey D.M. 2001. The abuse of power: the pervasive fallacy of power calculations for data analysis. Am Stat 55:19–24.
  35. Houle T.T., Penzien D.B., Houle C.K. 2005. Statistical power and sample size estimation for headache research: an overview and power calculation tools. Headache 45:414–418.
  36. Jones S.R., Carley S., Harrison M. 2003. An introduction to power and sample size estimation. Emerg Med J 20:453–458.
  37. Kelsey J.L. 1996. Methods in observational epidemiology, 2nd ed. Oxford University Press, New York, NY.
  38. Killip S., Mahfoud Z., Pearce K. 2004. What is an intracluster correlation coefficient? Crucial concepts for primary care researchers. Ann Fam Med 2:204–208.
  39. Lachenbruch P.A. 1992. On the sample size for studies based upon McNemar's test. Stat Med 11:1521–1525.
  40. Lachin J.M. 1992. Power and sample size evaluation for the McNemar test with application to matched case-control studies. Stat Med 11:1239–1251.
  41. Lancaster H.O. 1961. Significance tests in discrete distributions. J Am Stat Assoc 56:223–234.
  42. Li J., Fine J. 2004. On sample size for sensitivity and specificity in prospective diagnostic accuracy studies. Stat Med 23:2537–2550.
  43. Lu Y., Bean J.A. 1995. On the sample size for one-sided equivalence of sensitivities based upon McNemar's test. Stat Med 14:1831–1839.
  44. Lu Y., Jin H., Genant H.K. 2003. On the non-inferiority of a diagnostic test based on paired observations. Stat Med 22:3029–3044.
  45. McDermott J.J., Schukken Y.H. 1994. A review of methods used to adjust for cluster effects in explanatory epidemiological studies of animal populations. Prev Vet Med 18:155–173.
  46. Neyman J., Pearson E.S. 1928. On the use and interpretation of certain test criteria for purposes of statistical inference: part I. Biometrika 20A:175–240.
  47. Neyman J., Pearson E.S. 1933. On the problem of the most efficient tests of statistical hypotheses. Proc R Soc Lond A Math Phys 231:289–337.
  48. Obuchowski N.A. 1998. Sample size calculations in studies of test accuracy. Stat Methods Med Res 7:371–392.
  49. Parker R.A., Bregman D.J. 1986. Sample size for individually matched case-control studies. Biometrics 42:919–926.
  50. Rigby A.S., Vail A. 1998. Statistical methods in epidemiology. II: A commonsense approach to sample size estimation. Disabil Rehabil 20:405–410.
  51. Rothman K.J., Greenland S. 1998. Modern epidemiology, 2nd ed. Lippincott-Raven, Philadelphia, PA.
  52. Sahai H., Khurshid A. 1995. A note on confidence intervals for the hypergeometric parameter in analyzing biomedical data. Comput Biol Med 25:35–38.
  53. Schlesselman J.J. 1974. Sample size requirements in cohort and case-control studies of disease. Am J Epidemiol 99:381–384.
  54. Sheps S. 1993. Sample size and power. J Invest Surg 6:469–475.
  55. Sterne J.A., Davey Smith G. 2001. Sifting the evidence—what's wrong with significance tests? BMJ 322:226–231.
  56. Suess E.A., Gardner I.A., Johnson W.O. 2002. Hierarchical Bayesian model for prevalence inferences and determination of a country's status for an animal pathogen. Prev Vet Med 55:155–171.
  57. Suissa S., Shuster J.J. 1991. The 2 x 2 matched-pairs trial: exact unconditional design and analysis. Biometrics 47:361–372.
  58. Thrusfield M.V. 2005. Veterinary epidemiology, 3rd ed. Blackwell Science, Ames, IA.
  59. Ukoumunne O.C. 2002. A comparison of confidence interval methods for the intraclass correlation coefficient in cluster randomized trials. Stat Med 21:3757–3774.
  60. Ury H.K., Fleiss J.L. 1980. On approximate sample sizes for comparing two independent proportions with the use of Yates' correction. Biometrics 36:347–351.
  61. Whitley E., Ball J. 2002. Statistics review 4: sample size calculations. Crit Care 6:335–341.
  62. Wickramaratne P.J. 1995. Sample size determination in epidemiologic studies. Stat Methods Med Res 4:311–337.



