  • Background

  • The Shargorodsky, Curhan, Curhan, and Eavey (2010) study from Investigation 1.14 actually focused on comparing the current hearing loss rate among teens (12-19 years) to previous years to see whether teen hearing loss is increasing, possibly due to heavier use of ear buds. In addition to the 1771 participants in the NHANES 2005-6 study (333 with some level of hearing loss), they also had hearing loss data on 2928 teens from NHANES III (1988-1994), with 480 showing some level of hearing loss.   
     

    Research question: Was the proportion of U.S. teens with hearing loss larger in 2005-6 than in 1988-1994?

    Goals: In this investigation, you will

    • Investigate the distribution of the difference in sample proportions
    • Learn how to test a hypothesis comparing population proportions
    • Interpret a 95% confidence interval for the difference in population proportions.

     

  • Descriptive Statistics

  • When we have two samples for a binary categorical variable, it is often useful to organize the data using a two-way table. 

  • Definition: The simplest statistic for comparing a binary variable between two groups is the difference in the proportion of “successes” for each group. These proportions, calculated separately for each group rather than looking at the overall proportion, are called conditional proportions.
    In this case, we compute the difference in the proportion of teens with some level of hearing loss between the two years (p̂94 − p̂06 = 480/2928 − 333/1771).
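    As a minimal sketch in Python (the variable names are illustrative and not part of the investigation), the two-way table and the two conditional proportions can be computed directly from the counts given in the Background:

```python
# Counts reported in the Background section: (teens with some hearing loss, sample size).
samples = {
    "1988-1994": (480, 2928),
    "2005-6": (333, 1771),
}

# Two-way table: one row per study period.
print(f"{'Period':<12}{'Some loss':>10}{'No loss':>10}{'Total':>8}")
for period, (loss, n) in samples.items():
    print(f"{period:<12}{loss:>10}{n - loss:>10}{n:>8}")

# Conditional proportions: proportion with hearing loss within each period.
p_hat_94 = samples["1988-1994"][0] / samples["1988-1994"][1]
p_hat_06 = samples["2005-6"][0] / samples["2005-6"][1]

# Difference in conditional proportions, in the order used in the text.
diff = p_hat_94 - p_hat_06
print(f"p_hat_94 = {p_hat_94:.4f}, p_hat_06 = {p_hat_06:.4f}, difference = {diff:.4f}")
```

    A negative difference here means the sample proportion with hearing loss was larger in 2005-6 than in 1988-1994.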

    The next step is to examine an effective graphical summary for comparing the two groups.  Here is a segmented bar graph from an applet.

    [Figure: segmented bar graph for hearing loss data]

  • Inferential Statistics

  • As we’ve said before, it certainly is possible to obtain sample proportions this far apart, just by random chance, even if the population proportions (of teens with some hearing loss) were the same. The question now is how likely such a difference would be if the population proportions were the same. We can answer this question by modeling the sampling variability, arising from taking independent random samples from these populations, for the difference in two sample proportions. Investigating this sampling variability will help us to assess whether this particular difference in sample proportions is strong evidence that the population proportions actually differ.

     Let 𝜋94 represent the proportion of all American teenagers in 1988-1994 with at least some hearing loss, and similarly for 𝜋06. Define the parameter of interest to be 𝜋94 – 𝜋06, the difference in the population proportions between these two years. 

  • (e) State appropriate null and alternative hypotheses about this parameter to reflect the researchers’ conjecture that hearing loss by teens is becoming more prevalent.
    Ho: =
    Ha:

  • Simulation

  • We will begin our simulation analysis by assuming the population proportions are the same. As the common value, let's use the overall (pooled) proportion of successes across the two groups (p̂ = (480 + 333)/(2928 + 1771) ≈ 0.173). We simulate drawing two different random samples from this population, one to represent the 1988-1994 study and the other for the 2005-6 study. Because the population size is very large compared to the sample sizes, we will model this by treating the population as infinite and sampling from a binomial process. Then we compute the difference in the conditional proportions with some hearing loss between these two years. Finally, we repeat this random sampling process for many trials and examine the distribution of the simulated differences.

    • Open the Two Population Proportions applet and specify our model:
      • Specify .173 as the probability of success for both populations.
      • Specify 2928 and 1771 as the two sample sizes.
      • Leave the Number of samples set to 1.
      • Press Draw Samples.

     

    Did your neighbors get the same values for the sample proportions? For the difference?

     

    • Change the Number of samples from 1 to 999 and press Draw Samples for a total of 1,000 independent random samples.
  • Now determine the simulation-based p-value by counting how often the simulated difference in conditional proportions is at least as extreme as the actual value observed in the study by entering the observed value in the Count Samples box and pressing the Count button. (Make sure the direction matches your alternative hypothesis.)
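    The applet steps above can be imitated with a short Python simulation. This is only a sketch under stated assumptions: it is not the applet's own code, the seed and names are arbitrary, and it assumes the alternative hypothesis is that the 1988-1994 proportion is smaller than the 2005-6 proportion, so "at least as extreme" means at or below the observed difference.

```python
import random

rng = random.Random(2010)  # arbitrary seed (an assumption) so the run is repeatable

n_94, n_06 = 2928, 1771
pooled = (480 + 333) / (n_94 + n_06)        # common success probability, about 0.173
observed_diff = 480 / n_94 - 333 / n_06     # observed p_hat_94 - p_hat_06, about -0.024

def sample_proportion(n, p):
    """Proportion of successes in n independent trials with success probability p."""
    return sum(rng.random() < p for _ in range(n)) / n

# Draw 1,000 pairs of independent samples under the null model
# and record the difference in sample proportions for each pair.
num_samples = 1000
sim_diffs = [
    sample_proportion(n_94, pooled) - sample_proportion(n_06, pooled)
    for _ in range(num_samples)
]

# Simulation-based p-value: fraction of simulated differences at or below
# the observed difference (lower tail, matching the assumed alternative).
count = sum(d <= observed_diff for d in sim_diffs)
p_value = count / num_samples
print(f"{count} of {num_samples} simulated differences <= {observed_diff:.4f}")
print(f"simulation-based p-value = {p_value:.3f}")
```

    Because the samples are random, a different seed gives a slightly different p-value, just as neighbors running the applet will see different results.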

     

  • Check the Normal Approximation box to overlay a normal curve on your null distribution to evaluate whether the simulated differences appear to match up with observations from a normal distribution. 

  • Mathematical Model

  • It turns out there is no simple "exact" probability distribution for the difference between two binomial random variables. However, a Central Limit Theorem result applies when the sample sizes are large. The primary advantage of using this model is a straightforward confidence interval formula.
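    For reference, the standard large-sample result can be sketched as follows, writing n94 and n06 for the two sample sizes (notation introduced here for illustration). For independent random samples, the difference in sample proportions is approximately normal, with the mean and standard deviation shown:

```latex
\hat{p}_{94} - \hat{p}_{06} \;\approx\; \text{Normal}\!\left(
  \text{mean} = \pi_{94} - \pi_{06},\;
  \text{SD} = \sqrt{\frac{\pi_{94}(1-\pi_{94})}{n_{94}} + \frac{\pi_{06}(1-\pi_{06})}{n_{06}}}
\right)
```

    The approximation is generally considered reasonable when the expected numbers of successes and failures in each sample are large enough, when each population is large relative to its sample, and when the two samples are independent.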

    We will often simplify these technical conditions to "at least 5 successes and at least 5 failures in each sample."

    We will consider the populations large if each is more than 20 times the sample size.

    We will consider the samples independent if they are selected independently of each other (e.g., no overlap or relationships among the participants in the two samples).
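    The count condition can be checked mechanically, as in the short Python sketch below (names are illustrative). The population-size and independence conditions depend on how the data were collected, so they are not something this code can verify.

```python
# (successes, sample size) for each study period, from the Background section.
samples = {"1988-1994": (480, 2928), "2005-6": (333, 1771)}
MIN_COUNT = 5  # "at least 5 successes and at least 5 failures in each sample"

checks = {}
for period, (successes, n) in samples.items():
    failures = n - successes
    checks[period] = successes >= MIN_COUNT and failures >= MIN_COUNT
    print(f"{period}: {successes} successes, {failures} failures, ok = {checks[period]}")

conditions_met = all(checks.values())
print("Count condition met in both samples:", conditions_met)
```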

  • So let's use the Theory-Based Inference applet (or R or JMP) to carry out a two-sample z-test or two proportion z-test.

     

    • In the Theory-Based Inference applet: 
      • Change the Scenario to Two proportions
      • Specify the sample sizes
      • Specify either the sample counts or the sample proportions 
      • Press Calculate
    • Check the Test of Significance box, set the hypotheses, and find the p-value.
    • Check the Confidence interval box and press Calculate CI.
    • Include a readable screen capture of your output.
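    For readers working outside the applet, the same calculation can be sketched in plain Python. This is an illustration under stated assumptions, not the applet's implementation: it uses the pooled proportion in the test's standard error, the separate sample proportions in the interval's standard error, a one-sided (lower-tail) alternative, and 1.96 as the 95% multiplier.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

x_94, n_94 = 480, 2928   # 1988-1994: teens with some hearing loss, sample size
x_06, n_06 = 333, 1771   # 2005-6:    teens with some hearing loss, sample size

p_94, p_06 = x_94 / n_94, x_06 / n_06
diff = p_94 - p_06

# Two-proportion z-test: standard error uses the pooled proportion,
# because the null hypothesis says the two population proportions are equal.
pooled = (x_94 + x_06) / (n_94 + n_06)
se_null = sqrt(pooled * (1 - pooled) * (1 / n_94 + 1 / n_06))
z = diff / se_null
p_value = normal_cdf(z)  # lower tail; assumes the alternative is pi_94 - pi_06 < 0

# 95% confidence interval: standard error uses each sample proportion separately.
se_ci = sqrt(p_94 * (1 - p_94) / n_94 + p_06 * (1 - p_06) / n_06)
z_star = 1.96
ci = (diff - z_star * se_ci, diff + z_star * se_ci)

print(f"difference = {diff:.4f}, z = {z:.2f}, one-sided p-value = {p_value:.4f}")
print(f"95% CI for pi_94 - pi_06: ({ci[0]:.4f}, {ci[1]:.4f})")
```

    Software output may differ slightly in the last digits depending on rounding and on the exact multiplier used.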
  • Conclusions
