Quiz

Goals
- To investigate a research question and draw conclusions through simple numerical and graphical summaries and through exploration of the concept of "statistical significance." In other words, apply the Six Steps of a Statistical Investigation.

Background

Recall the study (Sonoda et al., 2011, Gut—An International Journal of Gastroenterology and Hepatology) in Japan which tested an eight-year-old black Labrador (Marine) to see whether she could detect colorectal cancer.

Step 1: Ask a research question
Does Marine have a genuine tendency to pick the cancer breath? How accurate is she in the long run?

Step 2: Design a study and collect data
Marine first smelled a bag that had been breathed into by a patient with colorectal cancer. This was the standard that the dog would use to judge the other bags. Marine then smelled the breath in five different bags from five different patients, only one of which contained breath from a different colorectal cancer patient; the other four bags contained breath from noncancer patients. Marine was trained to sit next to the bag which she thought contained breath from a cancer patient (i.e., had the cancer scent). If she sat down next to the correct bag, she was rewarded with a tennis ball.

In this lab, you will examine numerical and graphical summaries of Marine's results and apply the 3S Process to help decide whether there is convincing evidence that Marine tends to pick the cancer breath more often than we would expect by chance alone.

Step 3: Explore the data

Recall that Marine was correct in 30 of the 33 trials. Taking the observational unit to be one trial, we can define the variable as whether Marine picked the correct bag, which is categorical and binary. The sample size is the number of observational units we have data for, so in this case, it is just n = 33.

With a binary categorical variable, we arbitrarily define one outcome to be "success" and the other to be "failure." We can then use either the number of successes or the proportion of successes as the statistic.

We will often use the symbol "p-hat" to refer to a sample proportion.
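
As a quick check of the arithmetic, the sample proportion can be computed directly from the counts given above. This is only an illustrative sketch in Python; the variable names are our own and are not part of the applet.

```python
# Marine's results as reported in this lab.
n = 33           # sample size: number of trials (observational units)
successes = 30   # trials in which Marine picked the correct bag

# The statistic: sample proportion of successes ("p-hat").
p_hat = successes / n
print(f"p-hat = {successes}/{n} = {p_hat:.3f}")  # prints p-hat = 30/33 = 0.909
```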

To graph the data, we use a simple bar graph. We will use a different applet than in Investigation 1 to explore these categorical data.

Enter the value of n and either the count of how many Marine correctly picked (and the applet computes the proportion) or enter Marine's proportion of correct picks (and the applet computes the count).
Press the Calculate button.

(a) Include a screen capture of your output, being sure to display both the summary data (sample size n, count of successes, proportion of successes) and the graph.

(b) Based on the numerical and graphical summaries you have just produced, write a sentence or two summarizing the results for this sample. Imagine that your reader cannot see your graph and/or summary statistics. For example: What did you learn about the tendency for Marine to pick the cancer bag? Was there a majority or a pretty equal split? Did the size of the majority surprise you? For example: In the 33 trials, Marine ....

Step 4: Draw inferences beyond the data

In contrast to the dolphins and the infant studies, we cannot use a coin to represent the "by chance alone" outcome. If Marine was simply guessing across the 5 bags, then we would expect Marine to pick correctly in only 1/5 or 20% of the trials in the long run.

Null hypothesis: Marine would be correct 20% of the time in the long run

Alternative hypothesis: Marine would be correct more than 20% of the time in the long run

Our observed statistic is that Marine was correct in 90.9% of these 33 trials, which is in the direction of the alternative hypothesis. But could this have been random chance?

Below is the simulation model to represent a guessing dog (Bud):

One trial	Choice among 5 breath samples
Success	Choosing the correct bag
Failure	Choosing one of the four incorrect bags
Chance of success	0.20 if Marine is guessing (null hypothesis)
One repetition	One set of 33 attempts
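
The model in the table can also be sketched in code. The following is a minimal illustration, not the applet's implementation: the function name `one_repetition` and the seed are our own assumptions, and each trial is treated as an independent guess with a 0.20 chance of success.

```python
import random

def one_repetition(n_trials=33, p_success=0.20, rng=random):
    """Simulate one set of guesses and return the number of successes.

    Each trial is a success with probability p_success, which mimics
    a dog choosing blindly among five bags when p_success is 0.20.
    """
    return sum(rng.random() < p_success for _ in range(n_trials))

# A fixed seed (chosen arbitrarily) makes the run reproducible.
rng = random.Random(1)
print("Successes in one repetition of 33 guesses:", one_repetition(rng=rng))
```

Calling `one_repetition` again gives another "could have been" count, which will usually differ from the first by chance alone.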

We can use a spinner to represent this random process. (Use the applet below or click here to open the applet in its own window)

Simulate a "could have been" outcome (to see what could have happened if Marine blindly guessed among the five bags each time):

Set the Probability of heads to 0.20 (which will change to the more general "probability of success")
Set the Sample size to 33.
Press the Draw Samples button. Note the number of successes in this set of spins.
Now uncheck the Show animation box and click the Draw Samples button again. If the number of successes differs from the first set, this is just by random chance alone.
Click Draw Samples 8 more times, watching how the number of successes changes from repetition to repetition.
Now run at least 1,000 repetitions (e.g., enter 990 in the Number of samples box and press Draw Samples).
Check the Summary Statistics box. TBIQ: Is the mean about what you expected?
Enter Marine's observed number of successes in the As extreme as box and press Count
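
The applet steps above can be mirrored in a short script. This is a hedged sketch under the same guessing model (independent trials, 0.20 chance of success); the helper names `simulate_null` and `approx_p_value` and the seed are hypothetical, and the applet's output will not match it digit for digit.

```python
import random
from statistics import mean

def simulate_null(n_reps=1000, n_trials=33, p_success=0.20, seed=None):
    """Return a list of success counts, one per repetition."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_success for _ in range(n_trials))
        for _ in range(n_reps)
    ]

def approx_p_value(counts, observed):
    """Proportion of simulated counts at least as large as the observed count."""
    return sum(c >= observed for c in counts) / len(counts)

counts = simulate_null(seed=2011)  # seed chosen arbitrarily for reproducibility
print("Mean number of successes:", mean(counts))
print("Largest simulated count:", max(counts))
print("Approximate p-value for 30 or more:", approx_p_value(counts, 30))
```

The mean should land near 33 × 0.20 = 6.6, and the approximate p-value is the proportion of repetitions in which the guessing dog did at least as well as Marine.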

(c) Include a screen capture of your results for the guessing dog (your inputs and the resulting dotplot showing the p-value). [Option: On the far right in the applet (use the scroll bar) you can check the "Hide spinners" box to save some space before taking your screen shot.]

Write a one-sentence interpretation of your p-value in context.

Step 5: Formulate conclusions
(d) Based on this dotplot, when we set the probability of success to be 0.20, what are reasonably likely (typical) outcomes for the "number of successes" in 33 attempts?
(e) Based on this dotplot, would you say it is plausible that 30 successes would occur in 33 spins by chance alone? Also compute and report how many standard deviations 30 is from the mean of your null distribution.
(f) Do you consider Marine's performance to be statistically significant? Briefly justify.
(g) Do Marine's results in these 33 trials convince you that she has a genuine ability to detect colorectal cancer from a breath sample? (This is kinda the same question as (e) but now about Marine's attempts rather than spinner outcomes.) Explain your reasoning as if to a friend who hasn't taken a statistics course (feel free to reference your graph and/or Bud!)
(h) Do Marine's results PROVE that she has cancer detection ability? Explain briefly
(i) Are you willing to generalize this conclusion to all dogs? (Hint: This question is a bit of a judgement call.)
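
For question (e), the number of standard deviations should be computed from the mean and SD of your own simulated null distribution. As a rough cross-check, the sketch below uses the theoretical mean and standard deviation of a binomial count instead. That substitution is our assumption; your simulated values should be close to these but will not be identical.

```python
import math

n = 33          # trials per repetition
p0 = 0.20       # chance of success under the null hypothesis
observed = 30   # Marine's observed number of successes

null_mean = n * p0                       # theoretical mean of the count
null_sd = math.sqrt(n * p0 * (1 - p0))   # theoretical SD of the count

# Standardized statistic: how many SDs the observed count lies above the mean.
z = (observed - null_mean) / null_sd
print(f"mean = {null_mean:.1f}, SD = {null_sd:.2f}, standardized statistic = {z:.1f}")
```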

Step 6: Look back and ahead
Now we want to critique the study a bit: Step back and reflect on what you learned in this study and what could be improved upon.
(j) If you were to repeat a study similar to this one, what changes would you recommend in how the data were collected or what new aspects to the question would you want to investigate?

Exploring further

It seems clear to me that Marine's probability of picking the correct sample is larger than 0.50. But how much larger? Is Marine's probability larger than 0.70? In other words, is 0.90 statistically significantly larger than 0.70?

(k) Carry out a simulation analysis using the One Proportion applet to decide whether 0.70 is a plausible value for Marine's probability of detecting the cancer breath.

Include your screen capture

Explain your process and your conclusion (in context).
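
The same kind of analysis can be sketched in code with the probability of success changed to 0.70. This is an illustration only and not a substitute for the applet output requested above: the function names and seed are hypothetical, and the exact binomial calculation is included purely as a comparison for the simulated value (`math.comb` requires Python 3.8 or later).

```python
import math
import random

def simulated_tail(observed, n_trials, p_success, n_reps=1000, seed=None):
    """Proportion of simulated repetitions with at least `observed` successes."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_reps):
        count = sum(rng.random() < p_success for _ in range(n_trials))
        if count >= observed:
            hits += 1
    return hits / n_reps

def exact_tail(observed, n_trials, p_success):
    """Exact binomial probability of at least `observed` successes."""
    return sum(
        math.comb(n_trials, k) * p_success**k * (1 - p_success) ** (n_trials - k)
        for k in range(observed, n_trials + 1)
    )

# Is 30 of 33 surprising if the long-run probability of success were 0.70?
print("Simulated:", simulated_tail(30, 33, 0.70, seed=1))
print("Exact:    ", exact_tail(30, 33, 0.70))
```

A small value suggests that 0.70 is not a plausible value for Marine's long-run probability; a large value would mean 0.70 cannot be ruled out.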

Summary of technology skills learned/practiced in this lab
- Using technology to tally and produce a simple bar graph for a binary (two outcome) variable.
- Using a computer simulation (One Proportion applet) to generate a "null distribution" of "could have been results" for a research study involving a binary variable with a non-0.50 probability.
Summary of statistical skills learned in this lab:
- Interpreting probability as a long-run proportion.
- Simulating outcomes of a random process in order to investigate the underlying properties of that process.