U.S. Department of Health and Human Services (HHS)
Health Resources and Services Administration
Primary Care: The Health Center Program


Calculating Sample Size

Types of Samples  

Subjective or Convenience Sample 
- Has some possibility of bias
- Cannot usually say it is representative
- Selection made by ease of collection

Simple Random Sample 
- No subjective bias
- Equal chance of selection; e.g., charts chosen from a computer-generated list of random numbers
- Can usually be defended as representative

Systematic Sample 
- Is a random sample
- Equal chance of selection due to methodology; e.g., every fifth name on a generated list
- Can usually be defended as representative

Stratified Sample 
- Break down the population into subgroups, then take a random sample from each subgroup
- Can usually be defended as representative
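
The three probability-based sample types above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `charts` list, and the odd/even grouping are hypothetical and do not come from the Health Center Program guidance.

```python
import random

def simple_random_sample(items, k, seed=None):
    """Draw k items so that every item has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(items, k)

def systematic_sample(items, step, seed=None):
    """Take every `step`-th item, beginning at a randomly chosen start."""
    rng = random.Random(seed)
    start = rng.randrange(step)
    return items[start::step]

def stratified_sample(items, key, fraction, seed=None):
    """Group items by key(item), then draw a random sample from each group."""
    rng = random.Random(seed)
    groups = {}
    for item in items:
        groups.setdefault(key(item), []).append(item)
    sample = []
    for members in groups.values():
        k = max(1, round(len(members) * fraction))  # at least one per subgroup
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: chart numbers 1 through 100.
charts = list(range(1, 101))

print(simple_random_sample(charts, 10, seed=1))
print(systematic_sample(charts, 5, seed=1))
# Hypothetical subgroups: odd-numbered versus even-numbered charts.
print(stratified_sample(charts, key=lambda c: c % 2, fraction=0.1, seed=1))
```

In practice the subgroups would be meaningful categories such as age band or clinic site; the odd/even split is used here only to keep the sketch short.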

 

Sample Size Calculation  

The Automated Method

If you know your population size and desired confidence level, you may use this Web-based calculator to calculate the sample size automatically.


The Manual Calculation Method

To perform the sample size calculation manually, you need the following values:

Population Value
Size of the population from which the sample will be selected (number of users or number of encounters).

Expected Frequency of the Factor under Study
Your best estimate of the true rate in the population. When in doubt, always err toward 50%, which yields the largest sample size.

Worst Acceptable Frequency
If 50% is the true rate in the population, what is the result farthest from that rate that you would accept in your sample? If your margin of error were 4 percentage points, then your worst acceptable frequency would be 54% or 46%.
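
The advice to err toward 50% can be checked numerically. In the formula that follows, the sample size grows with the product P(1-P), and that product is largest when P is 50%. The short sketch below simply tabulates it; the function name `variability` is a hypothetical label chosen for illustration.

```python
def variability(p):
    """Return P * (1 - P), the term that drives the required sample size."""
    return p * (1 - p)

# Tabulate the term for a range of expected frequencies.
for p in (0.05, 0.10, 0.30, 0.50, 0.70, 0.90, 0.95):
    print(f"P = {p:.0%}  P(1-P) = {variability(p):.4f}")
```

Because the term peaks at 50%, an expected frequency of 50% is the cautious choice when the true rate is unknown: it never understates the sample needed.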

Formula: Sample Size = n / [1 + (n/population)]
In which n = Z * Z [P (1-P)/(D*D)]

P = True proportion of the factor in the population, or the Expected Frequency value
D = Maximum acceptable difference between the sample proportion and the population proportion, or Expected Frequency value minus (-) Worst Acceptable value
Z = Standard normal value (z-score) corresponding to the desired confidence level

Confidence Level / Value for Z
90% / 1.645
95% / 1.960
99% / 2.576
99.9% / 3.291
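
For a confidence level that is not in the table, the Z value can be derived from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) and assumes a two-sided interval, which is how the tabulated values appear to have been obtained. The helper name `z_value` is hypothetical.

```python
from statistics import NormalDist

def z_value(confidence_level):
    """Two-sided Z value: leaves (1 - level) / 2 of the area in each tail."""
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%} / {z_value(level):.3f}")
```

The results should agree with the table above to within rounding.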

Population Survey Characteristics
1. The sample to be taken must be a simple random or otherwise representative sample. A systematic sample, such as every fifth person on a list, is acceptable if the sample is representative. Choosing every other person from a list of couples would not give a representative sample, since it might select only males or only females.
2. The question being asked must have a "yes/no" or other two-choice answer, leading to a proportion of the population (the "yes's") as the final result.
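
The couples example in item 1 can be made concrete. The sketch below builds a hypothetical roster in which each couple is listed as a male followed by a female (an assumption made purely for illustration), then compares an every-other-person sample with a simple random sample of the same size.

```python
import random

# Hypothetical roster: 50 couples, each listed as (couple id, sex), male first.
roster = []
for couple in range(1, 51):
    roster.append((couple, "M"))
    roster.append((couple, "F"))

# Systematic sample with an interval of 2 lines up with the list's pattern.
every_other = roster[::2]

# Simple random sample of the same size, seeded so the run is repeatable.
rng = random.Random(0)
random_half = rng.sample(roster, len(every_other))

def count_by_sex(sample):
    """Count how many males and females a sample contains."""
    counts = {"M": 0, "F": 0}
    for _, sex in sample:
        counts[sex] += 1
    return counts

print("every other person:", count_by_sex(every_other))
print("simple random:     ", count_by_sex(random_half))
```

The every-other-person sample contains only one sex because the sampling interval matches the repeating pattern in the list, which is exactly the situation item 1 warns against.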

 

Examples of Sample Size Calculation  


Trait or Factor Prevalence 
Suppose that you wish to investigate whether or not the true prevalence of HIV antibody in a population is 10%. You plan to take a random or systematic sample of the population to estimate the prevalence. You would like 95% confidence that the true proportion in the entire population will fall within the confidence interval calculated from your sample.

Let's say that the population size is 5000, the estimate of the prevalence is 10%, and either 6% or 14% is the "worst acceptable" value, which is an end point of your confidence interval. (Please note: the high and low values are calculated by adding your margin of error, in this case 4 percentage points, to your estimate of the prevalence and by subtracting it from that estimate.)

Population Value = 5000
Expected Frequency of the Factor under Study = 10%
Worst Acceptable Frequency = 14% or 6%

P = Expected Frequency Value = 10%
D = (Expected Frequency - Worst Acceptable) = 14% - 10% = 4%, OR 10% - 6% = 4%
Z = 1.960 with a Confidence Level of 95% (see the Confidence Level values above)

Formula: Sample Size = n / [1 + (n/population)]
In which n = Z * Z [P (1-P)/(D*D)]

First, calculate the value for "n".

n = Z * Z [P (1-P)/(D*D)]
n = 1.960 * 1.960 [0.10 (1 - 0.10) / (0.04 * 0.04)]
n = 1.960 * 1.960 [0.10 (0.90) / 0.0016]
n = 1.960 * 1.960 [0.09 / 0.0016]
n = 1.960 * 1.960 [56.25]
n = 1.960 * 110.25
n = 216.09

Next, calculate the Sample Size. (S = Sample Size)

S = n / [1 + (n / population)]
S = 216.09 / [1 + (216.09 / 5000)]
S = 216.09 / [1 + 0.043218]
S = 216.09 / 1.043218
S = 207
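
The two-step calculation above can be reproduced in Python as a check on the arithmetic. This is a minimal sketch of the formula as stated in this section; the function names `unadjusted_n` and `sample_size` are hypothetical.

```python
def unadjusted_n(p, d, z=1.960):
    """n = Z * Z [P (1-P) / (D*D)], before adjusting for population size."""
    return z * z * (p * (1 - p) / (d * d))

def sample_size(population, p, d, z=1.960):
    """Sample Size = n / [1 + (n / population)]."""
    n = unadjusted_n(p, d, z)
    return n / (1 + n / population)

n = unadjusted_n(p=0.10, d=0.04)
s = sample_size(population=5000, p=0.10, d=0.04)
print(f"n = {n:.2f}")  # n = 216.09
print(f"S = {s:.1f}")  # S = 207.1
```

The example reports the result rounded to the nearest whole number (207). Rounding up instead would be a slightly more conservative choice.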

Clinical Performance Rates 
Suppose you want to evaluate the compliance of your center with standard Quality Assurance procedures or with the Clinical Measures. You plan a random or systematic sample of the center's charts, and seek 95% confidence that the sample is representative of all the center's charts and that the compliance rate will fall within the confidence interval you desire. Because this is a measure of how personnel perform a required task, you would expect a high rate of compliance. It is therefore strongly suggested that you use 95% (no lower than 90%) as your Expected Frequency, since 99.9% perfection is not a reasonable expectation. Performance is expected of all trained personnel and should not fall below a reasonable level; 85% (no lower than 80%) is suggested for the "Worst Acceptable" value. The population size will equal the population of the life cycle or subset; in this example we will use 800. It is strongly suggested that you use the 95% Confidence Level for the Z value.

Population Value = 800
Expected Frequency of the Factor under Study = 95%
Worst Acceptable Frequency = 85%

P = Expected Frequency Value = 95%
D = (Expected Frequency - Worst Acceptable) = 95% - 85% = 10%
Z = 1.960 with a Confidence Level of 95% (see the Confidence Level values above)

Formula: Sample Size = n / [1 + (n/population)]
In which n = Z * Z [P (1-P)/(D*D)]

First, calculate the value for "n".
n = Z * Z [P (1-P)/(D*D)]
n = 1.960 * 1.960 [0.95 (1 - 0.95) / (0.10 * 0.10)]
n = 1.960 * 1.960 [0.95 (0.05) / 0.01]
n = 1.960 * 1.960 [0.0475 / 0.01]
n = 1.960 * 1.960 [4.75]
n = 1.960 * 9.31
n = 18.25

Next, calculate the Sample Size. (S = Sample Size)

S = n / [1 + (n / population)]
S = 18.25 / [1 + (18.25 / 800)]
S = 18.25 / [1 + 0.0228]
S = 18.25 / 1.0228
S = 17.8, or 18

NOTE: If the calculated sample size is lower than 25 at a 95% confidence level, the Clinical Measures require you to use a minimum of 25 charts annually.
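
The minimum stated in the note can be combined with the formula in a single helper. The sketch below is an illustration only: the name `charts_to_review` is hypothetical, and rounding the calculated size up to a whole chart is an assumption made here, not a rule taken from the Clinical Measures.

```python
import math

MINIMUM_CHARTS = 25  # annual minimum stated in the note above

def charts_to_review(population, p, d, z=1.960, minimum=MINIMUM_CHARTS):
    """Calculated sample size, rounded up, but never below the minimum."""
    n = z * z * (p * (1 - p) / (d * d))
    s = n / (1 + n / population)
    # Assumption: round up so the sample is never smaller than calculated.
    return max(minimum, math.ceil(s))

# Clinical performance example: the calculated size of about 18 is raised to 25.
print(charts_to_review(population=800, p=0.95, d=0.10))  # 25
```

When the calculated size already exceeds 25, the minimum has no effect and the calculated value is returned.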

The minimum of 25 can be explained by the concept of Margin of Error, estimated here as 1 divided by the square root of the sample size, multiplied by 100%. A graph would show that a sample size of 25 gives a Margin of Error of 20%. By this method, a practical sample size is 40, giving a Margin of Error of about 16%. Above 40, each additional chart improves the Margin of Error only slightly.
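
The rule of thumb in the preceding paragraph is easy to tabulate in place of the graph it mentions. The function name `rough_margin_of_error` is hypothetical. Note that 1 divided by the square root of the sample size is close to the 95% margin for a proportion near 50%, so it should be read as a rough upper bound and not as an exact figure.

```python
import math

def rough_margin_of_error(sample_size):
    """Rule-of-thumb margin of error, in percent: 100 / sqrt(sample size)."""
    return 100 / math.sqrt(sample_size)

for size in (10, 25, 40, 100, 400):
    print(f"sample size {size:>3}: about {rough_margin_of_error(size):.1f}%")
```

Because the margin shrinks with the square root of the sample size, halving it requires roughly four times as many charts.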

Definitions and Terms

Population
The entire group of objects or people about which information is wanted is called the population.

Sample
A sample is a part of the population that is actually examined in order to gather information.

Representative Sample 
A sample is representative of the population from which it is taken if the characteristics of the sample mimic those of the population.

Confidence Interval 
A range of values, calculated from a sample statistic, that is likely (at a given level of probability, called the confidence level) to contain a population parameter. Put another way, it is the interval that will include the population parameter a certain percentage (the confidence level) of the time. For a given sample, the wider the confidence interval, the higher the confidence level.

Confidence Level 
The percentage of the time (usually 95% or 99%) that confidence intervals calculated in this way would be expected to contain the population parameter.

Confidence Limits
The upper and lower values of a confidence interval, that is, the values defining the range of a confidence interval.

EXAMPLE: A poll of 1,000 people from the general population, covering the voting age range, asked about the senatorial race. The poll predicted that, if the election were held today, the Republican candidate for Senator would win 60% of the vote. This prediction could be qualified by saying that the pollster was 95% certain (confidence level) that the prediction was accurate to within plus or minus 3 percentage points. This means the pollster is 95% confident that the Republican candidate's true share of the vote lies between 57% and 63% (the confidence limits, which together define the confidence interval).
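
The plus-or-minus figure in the poll example can be checked with the usual normal-approximation margin for a proportion. The sketch below assumes a simple random sample from a population much larger than 1,000 people, so no population-size adjustment is applied; the function name `proportion_margin` is hypothetical.

```python
import math

def proportion_margin(p, n, z=1.960):
    """Margin of error for a sample proportion: Z * sqrt(P (1-P) / n)."""
    return z * math.sqrt(p * (1 - p) / n)

estimate = 0.60
margin = proportion_margin(p=estimate, n=1000)
lower, upper = estimate - margin, estimate + margin

print(f"margin of error: {margin:.1%}")                 # margin of error: 3.0%
print(f"confidence limits: {lower:.0%} to {upper:.0%}")  # confidence limits: 57% to 63%
```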