So how do you interpret the 95%? It goes back to the definition of a confidence level. A confidence level is the percentage of all possible samples of size n whose confidence intervals contain the population parameter. When taking many random samples from a population, you know that some samples (in this case 95% of them) will represent the population, and some won't (in this case 5% of them), just by random chance. Random samples that represent the population will result in confidence intervals that contain the population parameter (that is, they are correct); and those that do not represent the population will result in confidence intervals that are not correct.
For example, if you randomly sample 100 exam scores from a large population, you might get more low scores than you should just by chance, and your confidence interval will be too low; or you might get more high scores than you should, and your confidence interval will be too high. These two confidence intervals won't contain the population parameter, but with a 95% confidence level this type of error (called sampling error) should only happen 5% of the time.
A confidence level (such as 95%) represents the percentage of all possible random samples of size n that typify the population and hence result in correct confidence intervals. It isn't the probability that a single confidence interval is correct.
Another way of thinking about the confidence level is to say that if the organization took a sample of 1,000 people over and over again and made a confidence interval from its results each time, 95 percent of those confidence intervals would be right. (You just have to hope that yours is one of those right results.)
To correctly interpret your particular confidence interval you can say "A range of likely values for the population mean is XXX to XXX, with a confidence level of 95%." Or you could say it like the Gallup Organization does:
"For these results, one can say with 95% confidence that the maximum amount of sampling (margin of) error is plus or minus XXX."
It's all about the sampling process, not a single sample.
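The "many samples" idea can be checked with a small simulation. The sketch below is illustrative only: the population of exam scores, the seed, and the sample size are all made-up assumptions. It draws many random samples, builds a 95% confidence interval from each, and counts how often the interval contains the true population mean.

```python
import random
import statistics
from math import sqrt

# Hypothetical population of exam scores (made up for illustration).
random.seed(42)
population = [random.gauss(75, 10) for _ in range(100_000)]
true_mean = statistics.mean(population)

n = 100          # sample size
z = 1.96         # critical value for 95% confidence
trials = 1_000
hits = 0
for _ in range(trials):
    sample = random.sample(population, n)
    xbar = statistics.mean(sample)
    moe = z * statistics.stdev(sample) / sqrt(n)  # margin of error
    if xbar - moe <= true_mean <= xbar + moe:     # interval is "correct"
        hits += 1

print(f"{hits / trials:.1%} of the intervals contained the true mean")
```

Run it and the proportion of correct intervals lands close to 95%; any one interval either contains the true mean or it doesn't, which is exactly why the 95% describes the process, not a single result.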
Spotting Misleading Confidence Intervals
There are two possible reasons that a confidence interval is incorrect (does not contain the population parameter). First, it can be incorrect by random chance because the random sample it came from didn't represent the population; or second, it can be incorrect because the data that went into it weren't any good. I discuss the first situation in the previous section, and it can't be prevented. The second situation can be prevented (or at least minimized) through good data-collection practices.
A good slogan to remember when examining statistical results is "garbage in = garbage out."
No matter how nice and scientific someone's confidence interval may look, the formula that was used to calculate it doesn't have any idea of the quality of the data that went into it. It's up to you to check it out. For example, if the data for the confidence interval was based on a biased sample (one that favored certain people over others), a bad design, bad data-collection procedures, or misleading questions, the margin of error is suspect; if the bias is bad enough, the results will be bogus.
For example, suppose a total of 50,000 people were surveyed on a certain issue. This incredibly high sample size sounds great — until you realize they were all visitors to a certain Web site. The tiny reported margin of error is a result of the huge n, yet it means nothing because it is based on biased data that didn't come from a random sample. Of course, some people will go ahead and report it anyway, so you're left to determine whether the results are based on good information or garbage. If garbage, you know what to do about the margin of error: Ignore it.
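To see why a huge n drives the reported margin of error down no matter how the data were collected, here's a quick sketch using the usual formula for the margin of error of a sample proportion; the sample proportion of 0.5 and the sample sizes are illustrative assumptions.

```python
from math import sqrt

def margin_of_error(p_hat, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * sqrt(p_hat * (1 - p_hat) / n)

# The formula only sees n and p-hat; it can't tell a random sample
# from 50,000 self-selected Web site visitors.
print(f"n = 1,000:  +/- {margin_of_error(0.5, 1_000):.1%}")   # about 3.1%
print(f"n = 50,000: +/- {margin_of_error(0.5, 50_000):.1%}")  # about 0.4%
```

The tiny 0.4% figure looks impressive, but nothing in the calculation checks whether the sample was random, which is the whole point of the garbage-in = garbage-out slogan.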
Before I get on too high of a horse here, it's important to note that even the best of surveys can still contain a little bias. The Gallup Organization addresses the issue of what margin of error does and does not measure in the following disclaimer added to its reports:
"In addition to sampling error, question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of public opinion polls."
What Gallup is saying is that besides the error that happens in random samples just by chance, surveys can have additional errors or bias due to things like missing data from people who don't respond, or phone numbers no longer in service. Margin of error cannot measure the extent of those types of nonsampling errors. However, a good survey design like Gallup's can go a long way toward helping minimize bias and get credible results. (See Chapter 12 for full details on doing good surveys.)
Chapter 8: Hypothesis Tests
In This Chapter
General ideas for a hypothesis test
Type I and Type II errors in testing
Specific hypothesis tests for one or two population means or proportions
Hypothesis testing is a statistician's way of trying to confirm or deny a claim about a population using data from a sample. For example, you might read on the Internet that the average price of a home in your city is $150,000 and wonder if that number is true for the whole city. Or you hear that 65% of all Americans are in favor of a smoking ban in public places — is this a credible result? In this chapter I give you the big picture of hypothesis testing as well as the details of hypothesis tests for one or two means or proportions. And I examine possible errors that can occur in the process.
Doing a Hypothesis Test
A hypothesis test is a statistical procedure that's designed to test a claim. Typically, the claim is being made about a population parameter (one number that characterizes the entire population). Because parameters tend to be unknown quantities, everyone wants to make claims about what their values may be. For example, the claim that 25% (or 0.25) of all women have varicose veins is a claim about the proportion (that's the parameter) of all women (that's the population) who have varicose veins.
Identifying what you're testing
To get more specific, the varicose vein claim is that the parameter, the population proportion (p), is equal to 0.25. (This claim is called the null hypothesis.) If you're out to test this claim, you're questioning the claim and have a hypothesis of your own (called the research hypothesis, or alternative hypothesis). You may hypothesize, for example, that the actual proportion of women who have varicose veins is lower than 0.25, based on your observations. Or, you may hypothesize that due to the popularity of high-heeled shoes, the proportion may be higher than 0.25. Or, if you're simply questioning whether the actual proportion is 0.25, your alternative hypothesis is, "No, it isn't 0.25."
In addition to testing hypotheses about categorical variables (having or not having varicose veins is a categorical variable), you can also test hypotheses about numerical variables, such as the average commuting time for people working in Los Angeles or their average household income. In these cases, the parameter of interest is the population average or mean (denoted μ). Again, the claim is that this parameter is equal to a certain value, versus some alternative.
Setting up the hypotheses
Every hypothesis test contains two hypotheses. The first hypothesis is called the null hypothesis, denoted Ho. The null hypothesis always states that the population parameter is equal to the claimed value. For example, if the claim is that the average time to make a name-brand ready-mix pie is five minutes, the statistical shorthand notation for the null hypothesis in this case would be as follows: Ho: μ = 5.
What's the alternative?
Before actually conducting a hypothesis test, you have to put two possible hypotheses on the table; the null hypothesis is one of them. But, if the null hypothesis is found not to be true, what's your alternative going to be? Actually, three possibilities exist for the second (or alternative) hypothesis, denoted Ha. Here they are, along with their shorthand notations in the context of the example:
The population parameter is not equal to the claimed value (Ha: μ ≠ 5).
The population parameter is greater than the claimed value (Ha: μ > 5).
The population parameter is less than the claimed value (Ha: μ < 5).
Which alternative hypothesis you choose in setting up your hypothesis test depends on what you're interested in concluding, should you have enough evidence to refute the null hypothesis (the claim). For example, if you want to test whether or not a company is correct in claiming its pie takes 5 minutes to make, you use the not-equal-to alternative. Your hypotheses for that test would be Ho: μ = 5 versus Ha: μ ≠ 5.
If you only want to see whether the time turns out to be greater than what the company claims (that is, the company is falsely advertising its prep time), you use the greater-than alternative, and your two hypotheses are Ho: μ = 5 versus Ha: μ > 5.
Suppose you work for the company marketing the pie, and you think the pie can be made in less than 5 minutes (and could be marketed by the company as such). The less-than alternative is the one you want, and your two hypotheses would be Ho: μ = 5 versus Ha: μ < 5.
Knowing which hypothesis is which
How do you know which hypothesis to put in Ho and which one to put in Ha? Typically, the null hypothesis says that nothing new is happening; the previous result is the same now as it was before, or the groups have the same average (their difference is equal to zero). In general, you assume that people's claims are true until proven otherwise.
Hypothesis tests are similar to jury trials, in a sense. In a jury trial, Ho is similar to the not-guilty verdict, and Ha is the guilty verdict. You assume in a jury trial that the defendant isn't guilty unless the prosecution can show beyond a reasonable doubt that he or she is guilty. If the jury says the evidence is beyond a reasonable doubt, they reject Ho, not guilty, in favor of Ha, guilty.