Quiz
  • Goals

    • To investigate a research question and draw conclusions through simple numerical and graphical summaries and through exploration of the concept of "statistical significance." In other words, apply the Six Steps of a Statistical Investigation.
  • Background

  • Recall the study (Sonoda et al., 2011, Gut—An International Journal of Gastroenterology and Hepatology) in Japan which tested an eight-year-old black Labrador (Marine) to see whether she could detect colorectal cancer.

    Step 1: Ask a research question. Does Marine have a genuine tendency to pick the cancer breath? How accurate is she in the long run?

    Step 2: Design a study and collect data. Marine first smelled a bag that had been breathed into by a patient with colorectal cancer. This was the standard that the dog would use to judge the other bags. Marine then smelled the breath in five different bags from five different patients, only one of which contained breath from a different colorectal cancer patient; the other four bags contained breath from noncancer patients. Marine was trained to sit next to the bag that she thought contained breath from a cancer patient (i.e., had the cancer scent). If she sat down next to the correct bag, she was rewarded with a tennis ball.

    In this lab, you will examine numerical and graphical summaries of Marine's results and apply the 3S Process to help decide whether there is convincing evidence that Marine tends to pick the cancer breath more often than we would expect by chance alone.

     

  • Step 3: Explore the data

  • Recall that Marine was correct in 30 of the 33 trials. Taking the observational unit to be one trial, we can define the variable as whether Marine picked the correct bag, which is categorical and binary. The sample size is the number of observational units we have data for, so in this case it is n = 33.

    With a binary categorical variable, we arbitrarily define one outcome to be "success" and the other to be "failure." We can then use either the number of successes or the proportion of successes as the statistic.

    We will often use the symbol "p-hat" to refer to a sample proportion.

    To graph the data, we use a simple bar graph. We will use a different applet than in Investigation 1 to explore these categorical data.

    • Enter the value of n and either the count of how many Marine correctly picked (and the applet computes the proportion) or enter Marine's proportion of correct picks (and the applet computes the count).
    • Press the Calculate button.
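
    As a rough companion to the applet, the count-to-proportion calculation and a text-only bar graph can be sketched in Python. This is an illustrative sketch, not the applet's own code; the function names sample_proportion and text_bar_graph are hypothetical, and only the values 30 and 33 come from the study.

```python
# Observed data from the study: 30 correct picks in 33 trials.
n_trials = 33
n_correct = 30


def sample_proportion(successes, n):
    """Return the sample proportion (p-hat) of successes (hypothetical helper)."""
    return successes / n


def text_bar_graph(counts, width=40):
    """Build a simple horizontal bar graph as text, one line per category."""
    total = sum(counts.values())
    lines = []
    for label, count in counts.items():
        bar = "#" * round(width * count / total)
        lines.append(f"{label:<10} {bar} {count} ({count / total:.3f})")
    return "\n".join(lines)


p_hat = sample_proportion(n_correct, n_trials)
counts = {"Correct": n_correct, "Incorrect": n_trials - n_correct}

print(f"p-hat = {p_hat:.3f}")
print(text_bar_graph(counts))
```

    Entering either the count or the proportion in the applet corresponds to the same relationship: the proportion is the count divided by n.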
  • Step 4: Draw inferences beyond the data

  • In contrast to the dolphins and the infant studies, we cannot use a coin to represent the "by chance alone" outcome. If Marine were simply guessing among the 5 bags, then we would expect her to pick correctly in only 1/5, or 20%, of the trials in the long run.

    Null hypothesis: Marine would be correct 20% of the time in the long run

    Alternative hypothesis: Marine would be correct more than 20% of the time in the long run 

    Our observed statistic is that Marine was correct in 90.9% of these 33 trials (30 of 33), which is in the direction of the alternative hypothesis. But could this have happened by random chance alone?

    Below is the simulation model to represent a guessing dog (Bud):

    One trial: Choice among 5 breath samples
         Success: Choosing the correct bag
         Failure: Choosing one of the four incorrect bags
    Chance of success: 0.20 if Marine is guessing (null hypothesis)
    One repetition: One set of 33 attempts

    We can use a spinner to represent this random process. (Use the applet below or click here to open the applet in its own window.)

    Simulate a "could have been" outcome (to see what could have happened if Marine blindly guessed among the five bags each time):

    • Set the Probability of heads to 0.20 (the label will change to the more general "probability of success").
    • Set the Sample size to 33.
    • Press the Draw Samples button. Note the number of successes in this set of spins.
    • Now uncheck the Show animation box and click the Draw Samples button again. If the number of successes differs from the first set, the difference is due to random chance alone.
    • Click Draw Samples 8 more times, watching how the number of successes changes from repetition to repetition.
    • Now run at least 1,000 repetitions (e.g., enter 990 in the Number of samples box and press Draw Samples).
    • Check the Summary Statistics box. TBIQ: Is the mean about what you expected?
    • Enter Marine's observed number of successes in the As extreme as box and press Count.
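
    The applet steps above can be mirrored in a short Python sketch, which may help make the logic of the simulation concrete. This is not the applet's implementation; the function names simulate_null and proportion_as_extreme are hypothetical, and the seed is fixed only so that a run is repeatable. The values 33, 0.20, 1,000, and 30 come from the lab.

```python
import random


def simulate_null(n_trials, p_success, n_reps, seed=None):
    """Simulate n_reps repetitions of n_trials guesses.

    Each guess succeeds with probability p_success (the null hypothesis).
    Returns a list holding the number of successes in each repetition.
    """
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_success for _ in range(n_trials))
        for _ in range(n_reps)
    ]


def proportion_as_extreme(results, observed):
    """Proportion of repetitions with at least `observed` successes."""
    return sum(r >= observed for r in results) / len(results)


# One repetition = one set of 33 attempts by a dog that is only guessing.
results = simulate_null(n_trials=33, p_success=0.20, n_reps=1000, seed=1)

mean_successes = sum(results) / len(results)
approx_p_value = proportion_as_extreme(results, observed=30)

print(f"Mean number of successes: {mean_successes:.2f}")
print(f"Largest simulated count:  {max(results)}")
print(f"Proportion of repetitions with 30 or more successes: {approx_p_value}")
```

    The mean of the simulated counts should land near 33 × 0.20 = 6.6, and the last line plays the role of the applet's Count button: it reports how often a guessing dog did at least as well as Marine.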
  • Step 5: Formulate conclusions

  • Step 6: Look back and ahead

  • Now we want to critique the study a bit: Step back and reflect on what you learned in this study and what could be improved upon.

  • Exploring further

  • It seems clear to me that Marine's probability of picking the correct sample is larger than 0.50. But how much larger? Is Marine's probability larger than 0.70? In other words, is 0.90 statistically significantly larger than 0.70?

    (k) Carry out a simulation analysis using the One Proportion applet to decide whether 0.70 is a plausible value for Marine's probability of detecting the cancer breath. 
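
    For comparison with the applet output, the same kind of analysis can be sketched in Python with 0.70 as the hypothesized probability. This is a sketch under stated assumptions, not the applet's method: the function names are hypothetical, the seed is arbitrary, and the exact binomial tail calculation is an added cross-check that the lab itself does not ask for.

```python
import math
import random


def binomial_upper_tail(n, p, k):
    """Exact P(X >= k) when X counts successes in n independent trials,
    each with success probability p (added cross-check, not part of the lab)."""
    return sum(
        math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1)
    )


def simulated_upper_tail(n, p, k, n_reps=1000, seed=None):
    """Simulated proportion of repetitions with at least k successes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        successes = sum(rng.random() < p for _ in range(n))
        if successes >= k:
            hits += 1
    return hits / n_reps


candidate = 0.70  # hypothesized long-run probability being tested
sim_tail = simulated_upper_tail(33, candidate, 30, n_reps=1000, seed=2)
exact_tail = binomial_upper_tail(33, candidate, 30)

print(f"Simulated proportion with 30 or more successes: {sim_tail:.4f}")
print(f"Exact binomial tail probability:                {exact_tail:.4f}")
```

    A small tail proportion would suggest that 0.70 is not a plausible value for Marine's long-run probability; a larger one would leave 0.70 plausible. With only 1,000 repetitions the simulated value will vary from run to run, which is why the exact calculation is shown alongside it.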
  • Summary of technology skills learned/practiced in this lab

    • Using technology to tally and produce a simple bar graph for a binary (two outcome) variable.
    • Using a computer simulation (One Proportion applet) to generate a "null distribution" of "could have been results" for a research study involving a binary variable with a non-0.50 probability.
  • Summary of statistical skills learned in this lab:

    • Interpreting probability as a long-run proportion.
    • Simulating outcomes of a random process in order to investigate the underlying properties of that process.