Quiz

Background

The Shargorodsky, Curhan, Curhan, and Eavey (2010) study from Investigation 1.14 actually focused on comparing the current hearing loss rate among teens (12-19 years) to previous years to see whether teen hearing loss is increasing, possibly due to heavier use of ear buds. In addition to the 1771 participants in the NHANES 2005-6 study (333 with some level of hearing loss), they also had hearing loss data on 2928 teens from NHANES III (1988-1994), with 480 showing some level of hearing loss.

Research question: Was the proportion of U.S. teens with hearing loss larger in 2005-6 than in 1988-1994?

Goals: In this investigation, you will

Investigate the distribution of the difference in sample proportions
Learn how to test a hypothesis comparing population proportions
Interpret a 95% confidence interval for the difference in population proportions.

Descriptive Statistics
When we have two samples for a binary categorical variable, it is often useful to organize the data using a two-way table.

(a) Complete the following two-way table of counts.

Hearing loss	1988-1994	2005-6	Total
Some hearing loss	____	____	____
No hearing loss	____	____	____
Total	____	____	____

(b) Explain why it does not make sense to conclude that hearing loss was more prevalent in 1988-1994 than in 2005-2006 based only on the comparison that 480 > 333.

Definition: The simplest statistic for comparing a binary variable between two groups is the difference in the proportion of “successes” for each group. These proportions, calculated separately for each group rather than looking at the overall proportion, are called conditional proportions.
In this case, we compute the difference in the proportion of teens with some level of hearing loss between the two years (p̂₉₄ − p̂₀₆ = 480/2928 − 333/1771).
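As a quick check of the arithmetic, the conditional proportions and their difference can be computed directly from the counts given in the Background. This is only an illustrative sketch; the variable names are made up for this example and are not part of the applet.

```python
# Counts reported in the Background section.
loss_94, n_94 = 480, 2928   # NHANES III (1988-1994)
loss_06, n_06 = 333, 1771   # NHANES 2005-6

# Conditional proportions: proportion with hearing loss within each study.
p_hat_94 = loss_94 / n_94
p_hat_06 = loss_06 / n_06

# Statistic: difference in conditional proportions (1988-1994 minus 2005-6).
diff = p_hat_94 - p_hat_06

print(f"p-hat (1988-1994) = {p_hat_94:.3f}")
print(f"p-hat (2005-6)    = {p_hat_06:.3f}")
print(f"difference        = {diff:.3f}")
```

Note that the sign of the difference depends on the order of subtraction, so the same order should be used throughout the analysis.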

The next step is to examine an effective graphical summary for comparing the two groups. Here is a segmented bar graph from an applet.

[Figure: segmented bar graph for hearing loss data]

(c) Write a sentence or two comparing the distributions of hearing loss between these two studies. Be sure to report an appropriate statistic.
(d) Do these data convince you that there is a difference in the population proportions? If not, what could be another explanation for the difference you see in these numerical and graphical summaries for these two samples?

Inferential Statistics

As we’ve said before, it certainly is possible to obtain sample proportions this far apart, just by random chance, even if the population proportions (of teens with some hearing loss) were the same. The question now is how likely such a difference would be if the population proportions were the same. We can answer this question by modeling the sampling variability, arising from taking random samples from these populations, for the difference in two sample proportions. Investigating this sampling variability will help us to assess whether this particular difference in sample proportions is strong evidence that the population proportions actually differ.

Let 𝜋₉₄ represent the proportion of all American teenagers in 1988-1994 with at least some hearing loss, and similarly for 𝜋₀₆. Define the parameter of interest to be 𝜋₉₄ – 𝜋₀₆, the difference in the population proportions between these two years.

(e) State appropriate null and alternative hypotheses about this parameter to reflect the researchers’ conjecture that hearing loss by teens is becoming more prevalent.
H₀: ____ = ____
Hₐ: ____ ____ ____
(f) Prediction: As before, we will begin our inferential analysis by assuming the null hypothesis is true. What “random process” are we simulating? What is the source of the randomness in this study? What do we need to assume to perform this simulation? Describe how you could carry out a simulation analysis to investigate whether the observed difference in sample proportions provides strong evidence that the population proportions with hearing loss differed between these two time periods. How will you estimate the p-value?

Simulation

We will begin our simulation analysis by assuming the population proportions are the same. As our best estimate of this common value, let's use the overall proportion of successes across the two groups (p̂ = (480 + 333)/(2928 + 1771) ≈ 0.173). We simulate the drawing of two different random samples from this population, one to represent the 1988-1994 study and the other for the 2005-6 study. Because the population size is very large compared to the sample sizes, we will model this by treating the population as infinite and sampling from a binomial process. Then we examine the distribution of the difference in the conditional proportions with some hearing loss between these two years. Finally, we repeat this random sampling process for many trials.
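The simulation just described can also be sketched outside the applet. The following is a minimal illustration using only the Python standard library; the function name `simulate_differences` and the seed are assumptions made for this example, and the applet's own implementation may differ.

```python
import random
import statistics

def simulate_differences(p, n1, n2, reps, seed=None):
    """Simulate `reps` differences in sample proportions when both
    populations share the same success probability `p`."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        # Each sample is a binomial count: n independent success/failure draws.
        x1 = sum(rng.random() < p for _ in range(n1))
        x2 = sum(rng.random() < p for _ in range(n2))
        diffs.append(x1 / n1 - x2 / n2)
    return diffs

# Common probability 0.173 and the two study sample sizes from the text.
diffs = simulate_differences(p=0.173, n1=2928, n2=1771, reps=1000, seed=1)
print(f"mean of simulated differences: {statistics.mean(diffs):.4f}")
print(f"SD of simulated differences:   {statistics.stdev(diffs):.4f}")
```

Because both samples are drawn with the same probability of success, the simulated differences should center near zero.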

Open the Two Population Proportions applet and specify our model:
- Specify .173 as the probability of success for both populations.
- Specify 2928 and 1771 as the two sample sizes.
- Leave the Number of samples set to 1.
- Press Draw Samples.

Did your neighbors get the same values for the sample proportions? For the difference?

Change the Number of samples from 1 to 999 and press Draw Samples for a total of 1,000 independent random samples.

(g) Does the distribution of p̂₁ behave as you expected? (Be sure it's clear what you expected and why.) Does the distribution of the difference in sample proportions behave as you expected? In particular, does the mean of this distribution make sense? (How does it compare to the means of the individual p̂ distributions?) Explain.
Now determine the simulation-based p-value by counting how often the simulated difference in conditional proportions is at least as extreme as the actual value observed in the study by entering the observed value in the Count Samples box and pressing the Count button. (Make sure the direction matches your alternative hypothesis.)
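The counting step can be sketched as follows. This example assumes the differences are computed as 1988-1994 minus 2005-6 and that the alternative hypothesis points to the lower tail (a negative difference); if you set up the subtraction or the hypotheses differently, the direction of the comparison must change accordingly. The helper name `empirical_p_value` is hypothetical.

```python
import random

def empirical_p_value(sim_diffs, observed):
    """Proportion of simulated differences at or below the observed one
    (a lower-tail count; assumes the alternative is 'difference < 0')."""
    return sum(d <= observed for d in sim_diffs) / len(sim_diffs)

# Rebuild a null distribution: same success probability in both populations.
rng = random.Random(2)
p, n1, n2 = 0.173, 2928, 1771
sim_diffs = []
for _ in range(1000):
    x1 = sum(rng.random() < p for _ in range(n1))
    x2 = sum(rng.random() < p for _ in range(n2))
    sim_diffs.append(x1 / n1 - x2 / n2)

# Observed statistic from the two studies.
observed = 480 / 2928 - 333 / 1771
p_value = empirical_p_value(sim_diffs, observed)
print(f"observed difference = {observed:.4f}")
print(f"empirical p-value   = {p_value:.3f}")
```

The result will vary from run to run (and from the applet's) because it is based on random samples.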
(h) Include a screen capture of your results.

(i) Report your empirical p-value and indicate what conclusion you would draw from it.
Check the Normal Approximation box to overlay a normal curve on your null distribution to evaluate whether the simulated differences appear to match up with observations from a normal distribution.
(j) Does the normal model appear to be a reasonable approximation to the null distribution? Is the "theory-based" p-value similar to the "simulation-based" p-value? (Cite the values you are comparing.)

Mathematical Model
It turns out there is no simple "exact" probability distribution for the difference between two binomial sample proportions. However, a Central Limit Theorem result applies when the sample sizes are large: the difference in sample proportions is then approximately normally distributed. The primary advantage of using this model is a straightforward confidence interval formula.
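For reference, the standard large-sample result for two independent samples is usually written as below. The subscripts 1 and 2 are generic group labels introduced for this sketch (in this investigation they would correspond to the 1988-1994 and 2005-6 samples), and the second line shows the pooled form commonly used when the null hypothesis assumes equal population proportions.

```latex
% Approximate sampling distribution of the difference in sample proportions
\hat{p}_1 - \hat{p}_2 \;\approx\; \text{Normal}\!\left(
  \text{mean} = \pi_1 - \pi_2,\;
  \text{SD} = \sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}}
\right)

% Test statistic under H_0: \pi_1 - \pi_2 = 0, with pooled \hat{p} = (x_1 + x_2)/(n_1 + n_2)
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
```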

We will often simplify the technical condition on sample size to "at least 5 successes and at least 5 failures in each sample."

We will consider the populations large if each is more than 20 times the sample size.

We will consider the samples independent if they are selected independently of each other (e.g., no overlap or relationships among the participants in the two samples).
(j) Do you consider the "large sample size" condition met for our samples? What are the 4 values you are comparing to 5?
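The "at least 5 successes and at least 5 failures" check can be expressed in a few lines. This is an illustrative sketch; the dictionary layout and names are assumptions made for this example.

```python
# (successes, sample size) for each study, from the Background section.
samples = {"1988-1994": (480, 2928), "2005-6": (333, 1771)}

checks = {}
for label, (successes, n) in samples.items():
    failures = n - successes
    # Condition: at least 5 successes and at least 5 failures in the sample.
    checks[label] = successes >= 5 and failures >= 5
    print(f"{label}: successes={successes}, failures={failures}, ok={checks[label]}")
```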

So let's use the Theory-Based Inference applet (or R or JMP) to carry out a two-sample z-test, also called a two-proportion z-test.

In the Theory-Based Inference applet:
- Change the Scenario to Two proportions
- Specify the sample sizes
- Specify either the sample counts or the sample proportions
- Press Calculate
Check the Test of Significance box, set the hypotheses, and find the p-value.
Check the Confidence interval box and press Calculate CI.
Include a readable screen capture of your output.
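If you want to verify the applet's output by hand, the calculation can be sketched as follows. This is not the applet's code; it is a minimal standard-library illustration that assumes a pooled standard error for the test, an unpooled standard error for the interval, a lower-tail alternative (difference computed as 1988-1994 minus 2005-6), and the multiplier 1.96 for 95% confidence. The function name is hypothetical.

```python
import math

def two_proportion_z(x1, n1, x2, n2, z_star=1.96):
    """Two-proportion z-test (lower tail) and confidence interval
    for pi_1 - pi_2, using the normal approximation."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p1 - p2

    # Test: pooled proportion, because H0 assumes a common population proportion.
    pooled = (x1 + x2) / (n1 + n2)
    se_null = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = diff / se_null
    # Lower-tail normal probability: Phi(z) = 0.5 * erfc(-z / sqrt(2)).
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))

    # Interval: unpooled standard error, because H0 is not assumed here.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    ci = (diff - z_star * se, diff + z_star * se)
    return z, p_value, ci

z, p_value, (ci_low, ci_high) = two_proportion_z(480, 2928, 333, 1771)
print(f"z = {z:.2f}, one-sided p-value = {p_value:.4f}")
print(f"95% CI for pi_94 - pi_06: ({ci_low:.4f}, {ci_high:.4f})")
```

A negative interval for 𝜋₉₄ − 𝜋₀₆ corresponds to a positive interval for the increase from 1988-1994 to 2005-6.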

(k) Write a one-sentence interpretation of your confidence interval. (Hint: It is technically incorrect to say there has been a 0.15% to 4.67% increase in hearing loss from 1994 to 2006, because “percentage change” implies a multiplication of values, not an addition or subtraction as we are considering here. It would be acceptable to say that the increase is between 0.15 and 4.67 percentage points.)

Conclusions
(l) Summarize your conclusions from this study as if to a concerned parent (i.e., do not use a lot of technical jargon). Be sure to address statistical significance, statistical confidence, and the populations you are willing to generalize the results to. Also, are you willing to conclude that the change in the prevalence of hearing loss is due to the increased use of ear buds among teenagers between 1994 and 2006? Explain why or why not.