Figure 6-4: Population percentages for responses to ACT math-help question.
Now take all possible samples of size 1,000 from this population and find the proportion in each who said they needed math help. The distribution of these sample proportions is shown in Figure 6-5. It has an approximate normal distribution with mean $p = 0.38$ and standard error equal to $\sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.38(0.62)}{1{,}000}} \approx 0.015$ (or about 1.5%). This approximation is valid because the two conditions for the CLT are met: 1) $np = 1{,}000(0.38) = 380$ (which is at least 10); and 2) $n(1 - p) = 1{,}000(0.62) = 620$ (also at least 10).
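If you'd like to check these numbers yourself, here is a minimal Python sketch of the calculation. The function name `sampling_distribution_of_proportion` is made up for this illustration; only the values p = 0.38 and n = 1,000 come from the example.

```python
import math

def sampling_distribution_of_proportion(p, n):
    """Return (mean, standard_error, conditions_met) for the sample proportion.

    Hypothetical helper for illustration: p is the population proportion
    and n is the sample size.
    """
    mean = p                                     # center of the sampling distribution
    standard_error = math.sqrt(p * (1 - p) / n)  # spread of the sample proportions
    # The two conditions for the normal approximation: np >= 10 and n(1 - p) >= 10
    conditions_met = n * p >= 10 and n * (1 - p) >= 10
    return mean, standard_error, conditions_met

mean, se, ok = sampling_distribution_of_proportion(p=0.38, n=1000)
print(f"mean = {mean}, standard error = {se:.4f}, conditions met: {ok}")
```

The standard error comes out a little above 0.015, which matches the "about 1.5%" figure.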
Figure 6-5: Proportion of students responding yes to ACT math-help question for samples of size 1,000.
Finding Probabilities for $\hat{p}$
For the ACT test example, suppose it's reported that 0.38 or 38% of all the students taking the ACT test would like math help. Suppose you took a random sample of 1,000 students. What is the chance that more than 40 percent of them say they need help?
What the question wants is the probability that the sample proportion, $\hat{p}$, is greater than 0.40; that is, $P(\hat{p} > 0.40)$. This question is answered using the normal approximation for $\hat{p}$ described in the previous section, provided the stated conditions are met.
We first check the conditions: 1) Is $np$ at least 10? Yes, because 1,000 * 0.38 = 380. 2) Is $n(1 - p)$ at least 10? Again yes, because 1,000 * (1 - 0.38) = 620. So you can use the normal approximation to answer the question.
We convert the $\hat{p}$-value to a $z$-value using $z = \frac{\hat{p} - p}{\sqrt{\frac{p(1 - p)}{n}}}$ to get $z = \frac{0.40 - 0.38}{\sqrt{\frac{0.38(0.62)}{1{,}000}}} \approx 1.30$. Now we find $P(Z > 1.30) = 1 - 0.9032 = 0.0968$. So if 38 percent of students wanted help, the chance of taking a sample of 1,000 students and getting more than 40 percent needing help is approximately 0.0968 (by the CLT).
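The same conversion can be sketched in Python with the standard library's `statistics.NormalDist` standing in for the Z-table. The variable names are chosen for this illustration. Rounding z to two decimals mimics a table lookup; using the unrounded z gives a slightly different answer.

```python
import math
from statistics import NormalDist

p, n, p_hat = 0.38, 1000, 0.40   # values from the ACT example

standard_error = math.sqrt(p * (1 - p) / n)
z = (p_hat - p) / standard_error          # convert the sample proportion to a z-value
z_rounded = round(z, 2)                   # a Z-table only goes to two decimal places

standard_normal = NormalDist()            # mean 0, standard deviation 1
prob_table = 1 - standard_normal.cdf(z_rounded)  # P(Z > 1.30), as in the text
prob_exact = 1 - standard_normal.cdf(z)          # same tail area without rounding z

print(f"z = {z:.3f}, table-style probability = {prob_table:.4f}, "
      f"unrounded probability = {prob_exact:.4f}")
```

The table-style result agrees with 0.0968. The unrounded version lands near 0.096, a reminder that the last digit or two of a table-based answer depends on rounding.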
Comparing sample results to a claim about the population is called hypothesis testing. Because the chance of getting more than 40% of the students in our sample requesting help is 0.0968, we wouldn't reject the claim that 38% of the population of all ACT takers request help. To reject this claim, most statisticians would want this probability to be less than 0.05 (see Chapter 8 for more on hypothesis testing).
Chapter 7: Confidence Intervals
In This Chapter
Confidence interval components
Interpreting confidence intervals
Details of confidence intervals for one or two means/proportions
In this chapter, you find out how to build, calculate, and interpret confidence intervals, and you work through the formulas involving one or two population means or proportions. You also get the lowdown on some of the finer points of confidence intervals: what makes them narrow or wide, what makes you more or less confident in their results, and what they do and don't measure.
Making Your Best Guesstimate
A confidence interval (abbreviated CI) is used to estimate a population parameter (a single number that describes a population) by using statistics (numbers that describe a sample of data). For example, you might estimate the average household income (parameter) based on the average household income from a random sample of 1,000 homes (statistic). However, because sample results will vary (see Chapter 6), you need to add a measure of that variability to your estimate. This measure of variability is called the margin of error, the heart of a confidence interval. Your sample statistic, plus or minus your margin of error, gives you a range of likely values for the parameter; in other words, a confidence interval.
The margin of error is the amount of "plus or minus" that is attached to your sample result when you move from discussing the sample itself to discussing the whole population that it represents; that's why the general formula for the margin of error contains a "±" in front of it.
For example, say the percentage of kids who like baseball is 40 percent, plus or minus 3.5 percent. That means the percentage of kids who like baseball is somewhere between 40% - 3.5% = 36.5% and 40% + 3.5% = 43.5%. The lower end of the interval is your statistic minus the margin of error, and the upper end is your statistic plus the margin of error.
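The arithmetic is simple enough to sketch in a couple of lines of Python. The helper name `interval_from_margin` is made up for this illustration.

```python
def interval_from_margin(statistic, margin_of_error):
    """Return the (lower, upper) ends of statistic plus or minus margin of error."""
    return statistic - margin_of_error, statistic + margin_of_error

# Baseball example: 40 percent, plus or minus 3.5 percent
lower, upper = interval_from_margin(40.0, 3.5)
print(f"between {lower}% and {upper}%")
```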
The margin of error is not the chance a mistake was made; it measures variation in the random samples due to chance. Because you didn't get to sample everybody in the population, you expect your sample results to be "off" by a certain amount, just by chance. You acknowledge that your results could change with subsequent samples, and that they're only accurate to within a certain range, which is the margin of error.
To estimate a parameter with a confidence interval:
1. Choose your confidence level and your sample size (see details later in this chapter).
2. Select a random sample of individuals from the population.
3. Collect reliable and relevant data from the individuals in the sample.
See Chapter 12 for survey data and Chapter 13 for data from experiments.
4. Summarize the data into a statistic (for example, a sample mean or proportion).
5. Calculate the margin of error.
(Details later in this chapter.)
6. Take the statistic plus or minus the margin of error to get your final estimate of the parameter.
This is called a confidence interval for that parameter.
For example, the formula for a confidence interval for the mean of a population is $\bar{x} \pm z^* \frac{\sigma}{\sqrt{n}}$; the statistic here is $\bar{x}$ (the sample mean), and the margin of error is the piece following the plus/minus sign: $z^* \frac{\sigma}{\sqrt{n}}$. (This formula is fully broken down in the section "Confidence Interval for One Population Mean.")
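To see the statistic and the margin of error as separate pieces, here is a rough Python sketch. It assumes the usual z-based form of the interval (sample mean plus or minus a z*-value times the standard deviation divided by the square root of n) and a known standard deviation. The function name and the household-income numbers are hypothetical, invented only for this illustration.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(x_bar, std_dev, n, confidence=0.95):
    """Return (lower, upper, margin_of_error) for a z-based CI for a mean.

    Assumes std_dev is the (known) population standard deviation and that
    the normal distribution is appropriate for the sample mean.
    """
    # z*-value that leaves (1 - confidence) / 2 in each tail of the Z-distribution
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * std_dev / math.sqrt(n)
    return x_bar - margin_of_error, x_bar + margin_of_error, margin_of_error

# Hypothetical numbers: 1,000 homes, sample mean $52,000, standard deviation $18,000
lower, upper, margin = mean_confidence_interval(x_bar=52_000, std_dev=18_000, n=1000)
print(f"${lower:,.0f} to ${upper:,.0f} (margin of error ${margin:,.0f})")
```

Raising `confidence` or shrinking `n` in this sketch makes the margin of error larger, and the interval widens to match.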
The Goal: Small Margin of Error
The ultimate goal when making an estimate using a confidence interval is to have a small margin of error. The narrower the interval, the more precise the results are.
For example, suppose you're trying to estimate the percentage of semi trucks on the interstate between the hours of 12 a.m. and 6 a.m., and you come up with a 95% confidence interval that claims the percentage of semis is 50%, plus or minus 40%. Wow, that narrows it down! (Not.) You've defeated the purpose of trying to come up with a good estimate — the confidence interval is much too wide. You'd rather say something like: A 95% confidence interval for the percentage of semis on the interstate between 12 a.m. and 6 a.m. is 50%, plus or minus 3% (thus between 47% and 53%).
How do you go about ensuring that your confidence interval will be narrow enough? You certainly want to think about this issue before collecting your data; after the data are collected, the width of the confidence interval is set.
Three factors affect the size of the margin of error:
The confidence level