Quiz

Name*

First Name / Last Name
Name 2

First Name / Last Name
E-mail*
example@example.com
What section are you in?

Section 1 (9-10am) / Section 2 (10-11am)
Background

What is a healthy body temperature? German physician Carl Wunderlich analyzed a million temperatures from 25,000 patients and in 1869 published that the normal human-body temperature is 98.6 °F. But several more recent studies have found that number to be too high, leading to speculation that Dr. Wunderlich was wrong, or that human body temperature has changed over time. In a study published by Mackowiak, Wasserman, & Levine (Journal of the American Medical Association, 1992), body temperatures (oral temperatures using a digital thermometer) were recorded for healthy men and women, aged 18-40 years, who were volunteers in Shigella vaccine trials at the University of Maryland Center for Vaccine Development, Baltimore. For these adults, the sample mean body temperature was found to be 98.249 °F with a standard deviation of 0.733 °F.

Research question: Is 98.6 °F still an appropriate standard for the average healthy body temperature?

Goals: In this investigation, you will

Investigate the distribution of sample means and the associated standardized statistic
Learn how to test a hypothesis about a population mean
Interpret a 95% confidence interval for a population mean.

Note: The applets you are using today are rather wide. You may want to follow the links to open them in a separate window to better fit your screen, make screen captures, etc.

Testing the claim
(a) Explain (in words, in context) what is meant by the following symbols as applied to this study: n, 𝑥̅, s, μ, σ. If you know a value, report it. Otherwise, define the symbol in words/context.
n: ____
𝑥̅: ____
s: ____
μ: ____
σ: ____
(b) Pick a null hypothesis for testing Wunderlich's axiom, using appropriate symbols.
Pick an alternative hypothesis for testing Wunderlich's axiom.

Key Result: If all possible samples of size n are selected from a large population or an infinite random process with mean 𝜇 and standard deviation 𝜎, then the sampling distribution of these sample means will have the following characteristics:
• The mean will be equal to 𝜇.
• The standard deviation will be equal to 𝜎/√𝑛 (we can call this SD(𝑥̅) or 𝜎_𝑥̅).
• Central Limit Theorem: The shape will be normal if the population distribution is normal, or approximately normal if the sample size is large regardless of the shape of the population distribution.

The convention is to consider the sample size large enough if n > 30. However, this rule really depends on the amount of skewness in the population: the more skewed the population distribution, the larger the sample size needed before the distribution of sample means is reasonably modeled by a normal distribution.
The population is considered large if it is more than 20 times the size of the sample.

(c) Suppose that Wunderlich's axiom is correct and many different random samples of 13 adults are taken from a large normally distributed population with mean 98.6 degrees F and standard deviation 0.733 degrees F. What does the Central Limit Theorem tell you about the theoretical distribution of sample means? (Indicate any necessary information that is missing.)
Shape: ____
Mean: ____ (report a number)
Standard deviation: ____ (report a number)
(d) Determine how many standard deviations the sample mean (98.249) falls from the hypothesized value of 98.6 (that is, calculate the standardized statistic). Show your work.
(e) Based on this calculation, would you consider the observed value of the sample mean (98.249) to be surprising, if the population mean were really equal to 98.6? Explain how you are deciding.
(f) Roughly approximate a 95% confidence interval by taking 𝑥̅ ± 2 × SD(𝑥̅). Show your work and practice a one-sentence interpretation of the interval in context (ask the instructor/TA if you aren't sure!).
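The arithmetic in parts (c), (d), and (f) follows directly from the Key Result. Below is a minimal Python sketch that assumes nothing beyond the formulas SD(𝑥̅) = 𝜎/√𝑛, standardized statistic = (𝑥̅ − 𝜇)/SD(𝑥̅), and the rough interval 𝑥̅ ± 2 × SD(𝑥̅); the function names are hypothetical, chosen only for illustration.

```python
import math


def sd_of_sample_mean(sigma, n):
    """SD(x̄) = σ/√n: the sample-to-sample SD of means of size-n samples."""
    return sigma / math.sqrt(n)


def standardized_statistic(xbar, mu0, sd_xbar):
    """Number of SD(x̄)'s that the observed x̄ lies from the hypothesized μ0."""
    return (xbar - mu0) / sd_xbar


def rough_95_ci(xbar, sd_xbar):
    """Rule-of-thumb 95% interval: x̄ ± 2 × SD(x̄)."""
    margin = 2 * sd_xbar
    return (xbar - margin, xbar + margin)
```

To check your hand calculations, call sd_of_sample_mean(0.733, 13) and pass the result, together with 98.249 and 98.6, to the other two functions.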

Confession #1: We don't know 𝜎

The previous analysis is fine if our model of the population is correct. The components of our model included a large, normally distributed population with σ = 0.733. This model allowed us to use the sample mean to make claims about the population mean, but let's loosen some of those assumptions and see what difference they make.

In particular, 0.733 wasn't the population standard deviation; it was only the sample standard deviation. If I don't know the population mean, then I probably don't know the population standard deviation either. When we estimate SD(𝑥̅) from the sample data, we call the estimate the "standard error"...

Definition: The standard error of the sample mean, denoted by SE(𝑥̅), is an estimate of the standard deviation of 𝑥̅ (the sample to sample variability in sample means from repeated random samples) calculated by substituting the sample standard deviation s for the population standard deviation 𝜎: SE(𝑥̅) = 𝑠/√𝑛
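When only raw data are available, SE(𝑥̅) can be computed straight from this definition. A minimal sketch using Python's standard statistics module (statistics.stdev uses the n − 1 divisor, matching the usual sample standard deviation s); the function name is hypothetical:

```python
import math
import statistics


def standard_error(data):
    """SE(x̄) = s/√n, with the sample SD s standing in for the unknown σ."""
    n = len(data)
    s = statistics.stdev(data)  # sample SD (divisor n - 1); needs n >= 2
    return s / math.sqrt(n)
```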

We can still use this value to standardize our statistic, but is the distribution of this standardized statistic still well modeled by a normal distribution? Is 2 still considered a large value?

Let's again simulate many random samples to explore the behavior of the (standardized) statistic. Instead of creating a hypothetical population, this time we will sample from a theoretical probability distribution (i.e., an infinitely large population).

Open the Sampling from a Population Model applet and specify our model:
- Population mean = 98.6
- Population SD = 0.70
- Population shape: Normal
- Also check the Show skewness box?
Press Set Population.
Check the Show Sampling Options box and specify a large Number of samples and a sample size of 13.
Press Draw Samples and scroll to the right to display the distribution of the sampled statistics (means).
Confirm that the behavior of the distribution of sample means is consistent with the Central Limit Theorem from part (c).
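The same check can be sketched outside the applet. The code below is an illustrative Python simulation, not the applet's own implementation: it draws many samples of size 13 from a normal model with mean 98.6 and SD 0.70 (the settings above) and compares the mean and SD of the resulting sample means with the Central Limit Theorem's predictions, 𝜇 = 98.6 and 𝜎/√𝑛 = 0.70/√13 ≈ 0.194. The function name and the choice of 10,000 samples are assumptions for illustration.

```python
import math
import random
import statistics


def simulate_sample_means(mu, sigma, n, reps, seed=0):
    """Return the means of `reps` random samples of size n from Normal(mu, sigma)."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
            for _ in range(reps)]


means = simulate_sample_means(98.6, 0.70, 13, 10_000)
print("mean of sample means: ", round(statistics.fmean(means), 3))
print("SD of sample means:   ", round(statistics.stdev(means), 3))
print("CLT prediction for SD:", round(0.70 / math.sqrt(13), 3))
```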

But what about when we use the standard error and calculate the standardized statistic (using each sample's mean and sd values) instead?

From the Choose Statistic list, select t-statistic (that's what we will call this new standardized statistic).

(g) Use the applet to approximate the p-value by counting how many of the simulated t-statistics are at least as extreme as what you calculated for the Maryland sample in part (d). (Hint: Remember to be consistent with the alternative hypothesis.) Report your "simulation-based" p-value.
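For intuition about what the applet is counting, here is a hedged Python sketch of a simulation-based p-value: simulate many samples of size n from a normal population whose mean equals the hypothesized value, compute each sample's t-statistic (𝑥̅ − 𝜇)/(s/√𝑛), and report the fraction at least as extreme as the observed value. The function names and the alternative labels "less", "greater", and "two-sided" are illustrative assumptions, not applet settings.

```python
import math
import random
import statistics


def t_statistic(sample, mu0):
    """(x̄ − μ0) / (s/√n), standardizing with the sample's own SD s."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.fmean(sample) - mu0) / se


def simulated_p_value(t_obs, mu, sigma, n, reps=10_000,
                      alternative="two-sided", seed=0):
    """Fraction of simulated t-statistics at least as extreme as t_obs,
    sampling from Normal(mu, sigma) so that H0: μ = mu is true."""
    rng = random.Random(seed)
    t_stats = [t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)
               for _ in range(reps)]
    if alternative == "less":
        hits = sum(t <= t_obs for t in t_stats)
    elif alternative == "greater":
        hits = sum(t >= t_obs for t in t_stats)
    else:  # "two-sided": compare distances from 0
        hits = sum(abs(t) >= abs(t_obs) for t in t_stats)
    return hits / reps
```

Pass your value from part (d) as t_obs and the direction that matches your alternative hypothesis from part (b). For a normal population, the distribution of this t-statistic does not depend on μ or σ, so the particular values of mu and sigma mainly keep the simulation realistic.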
But can we fit a model to the distribution of these t-statistics?
(h) The distribution of these t-statistics is ____ (shape), with mean ____ and standard deviation ____.
(i) Vote: In the applet, check the box to Overlay Normal Distribution and then check the box to Overlay t distribution. Visually, which model do you think is a better fit to our distribution of t-statistics? (Hint: Which gives you an estimated probability closest to the simulation results?)

Normal distribution / t-distribution / Neither / Really can't see a difference between them

Detour: Student's t-distribution
If we zoom in on the tails of the distribution, we see that more of the simulated t-statistics lie in those tails than the normal distribution would predict.

To model the sampling distribution of the t-statistic, t = (𝑥̅ − 𝜇)/(s/√𝑛), we need a probability model with heavier tails than the standard normal distribution. William S. Gosset, a chemist turned statistician, showed in 1908, while working for the Guinness Brewery in Dublin, Ireland, that when the population of observations follows a normal distribution, a "t probability curve" provides a better model for the sampling distribution of this standardized statistic.

All this means for us is that we will compare the t-statistic to a t-distribution rather than to the normal distribution when converting our standardized statistic to a p-value.
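To see the heavier tails concretely, one can compare the two density curves. The Python sketch below uses the standard formula for the t density with df degrees of freedom, f(x) = Γ((df+1)/2) / (√(df·π) Γ(df/2)) · (1 + x²/df)^(−(df+1)/2), and the standard library's statistics.NormalDist for the standard normal. Using df = 12 assumes samples of size 13 and the n − 1 degrees of freedom stated in the one-sample t-test section; the function name t_pdf is hypothetical.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # log of the normalizing constant, computed with lgamma for numerical stability
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


std_normal = NormalDist()  # mean 0, SD 1
for x in (0.0, 2.0, 3.0):
    print(f"x = {x}: t(12) density {t_pdf(x, 12):.4f}, "
          f"normal density {std_normal.pdf(x):.4f}")
```

The t curve comes out slightly lower at the center and higher out in the tails, which is why a value like ±2 is less surprising under the t model than under the normal model.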

Confession #2: We probably don't have a normally distributed population

We also assumed that our population distribution was normally distributed. What if the population distribution is not normally distributed?

Return to the Sampling from a Population Model applet and specify a new model:
- Population mean = 1 (not a typo, trust me)
- Population SD = 0.70
- Population shape: Skewed right
- Also check the Show skewness box?
Press Set Population.
Check the Show Sampling Options box and specify a large Number of samples and a sample size of 13.
Press Draw Samples and scroll to the right.
From the Choose Statistic list, select t-statistic

(j) Focusing on the histogram of simulated t-statistics, does the sampling distribution you have created appear to be well modeled by a t-distribution?
(k) Here's another confession: there were actually n = 130 subjects in the Maryland study. Change the sample size to 130 and press Draw Samples. Does this distribution of t-statistics appear well modeled by the t-distribution?
Upload a screen capture of your simulation results for n = 130 (the distribution of t-statistics)
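Here is a rough Python analogue of this experiment, under a loudly stated assumption: the applet's exact "Skewed right" shape is not specified here, so the sketch substitutes a gamma distribution with mean 1 and SD 0.70 (shape k = (1/0.70)² ≈ 2.04, scale θ = 0.70²/1 = 0.49). It simulates t-statistics, using the population mean 1 as 𝜇, and reports the proportions falling below −2 and above +2. If the t model fit well, those two proportions would be roughly equal. Function names are hypothetical.

```python
import math
import random


def t_stat(sample, mu):
    """(x̄ − μ) / (s/√n) for one sample."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))
    return (xbar - mu) / (s / math.sqrt(n))


def tail_proportions(n, reps=10_000, mean=1.0, sd=0.70, cutoff=2.0, seed=0):
    """Simulate t-statistics from an ASSUMED right-skewed gamma population with
    the given mean and SD; return the proportions below -cutoff and above +cutoff."""
    shape = (mean / sd) ** 2  # gamma shape k, since mean = k*θ and SD² = k*θ²
    scale = sd ** 2 / mean    # gamma scale θ
    rng = random.Random(seed)
    below = above = 0
    for _ in range(reps):
        t = t_stat([rng.gammavariate(shape, scale) for _ in range(n)], mean)
        below += t < -cutoff
        above += t > cutoff
    return below / reps, above / reps


results = {n: tail_proportions(n) for n in (13, 130)}
for n, (below, above) in results.items():
    print(f"n = {n}: below -2: {below:.3f}, above +2: {above:.3f}")
```

With a right-skewed population, small samples typically produce more t-statistics below −2 than above +2; the imbalance should shrink noticeably at n = 130.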

One-sample t-test
One-sample t-test for : To test a null hypothesis about a population mean H₀:  = _ , when we don’t know the population standard deviation (pretty much always), we will use the sample standard deviation to calculate the standard error SE(𝑥̅) and then compare the standardized statistic

to a t-distribution with n – 1 degrees of freedom (more of what those are Monday). Theoretically, this approximation requires the population to follow a normal distribution. However, statisticians have found this approximation to also be reasonable for other population distributions whenever the sample size is large. How large the sample size needs to be depends on how skewed the population distribution is. Consequently, we will consider the t procedures valid when either the population distribution is symmetric or the sample size is large (more on that Monday).
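The applet carries out these calculations for you. For readers who want to see the pieces, here is a standard-library-only Python sketch (no SciPy assumed). It computes t = (𝑥̅ − μ₀)/(s/√𝑛) with n − 1 degrees of freedom, obtains a two-sided p-value by numerically integrating the t density with Simpson's rule, and builds a t-interval 𝑥̅ ± t*·SE(𝑥̅), where the multiplier t* is found by bisection. All function names are hypothetical; in practice a library routine such as SciPy's scipy.stats.ttest_1samp would normally be used instead.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) for T ~ t(df), using p = 1 - 2 * (integral of the density
    from 0 to |t|), with the integral approximated by Simpson's rule."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def t_critical(df, conf=0.95):
    """Multiplier t* with P(|T| >= t*) = 1 - conf, found by bisection.
    Assumes t* < 100, which holds for conf = 0.95 at every df >= 1."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_two_sided_p(mid, df) > 1 - conf:
            lo = mid  # tail area still too big, so t* is larger
        else:
            hi = mid
    return (lo + hi) / 2


def one_sample_t(data, mu0, conf=0.95):
    """Return (t, df, two-sided p-value, confidence interval) for H0: μ = mu0."""
    n = len(data)
    xbar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)  # SE(x̄) = s/√n
    t = (xbar - mu0) / se
    df = n - 1
    t_star = t_critical(df, conf)
    return t, df, t_two_sided_p(t, df), (xbar - t_star * se, xbar + t_star * se)
```

A one-sided p-value, if your alternative hypothesis calls for one, is half the two-sided value when the statistic falls on the side the alternative predicts.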

Application

Yesterday, I collected temperature data for both sections. The data can be found here. Open this data file and copy and paste the data into the Theory Based Inference applet.

Use the Scenario pull-down menu to select One mean.
Check the Paste data box.
Check the Includes header box.
Press Clear to empty the data window and then paste in the data from your clipboard. Press Use Data.
I see an outlier I'm comfortable removing as a data entry error. Manually delete that value from the data window and press Use Data again.
You should have 54 observations. (There are two missing values coded as * that the applet knows to ignore.)
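If you would like to double-check the observation count outside the applet, here is a minimal Python sketch of the same cleaning steps. It assumes (hypothetically) that the pasted data are a single column with one header line and that missing values are coded as *, as described above. It does not remove outliers, which the steps above leave as a manual judgment. The function name is hypothetical.

```python
def parse_single_column(text, missing="*"):
    """Return the numeric values in a pasted one-column data block.

    The first non-blank line is treated as the header; blank lines and
    entries equal to `missing` are skipped."""
    lines = [line.strip() for line in text.strip().splitlines()]
    values = []
    for line in lines[1:]:  # lines[0] is the header
        if line and line != missing:
            values.append(float(line))
    return values
```

For the same pasted text, the length of the returned list should match the number of observations the applet reports.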

First, consider how you will summarize the behavior of the sample, as if to someone who can't see the output.

Then (scroll right and) check the Test of significance box and use the applet to test whether there is convincing evidence that the population mean body temperature differs from 98.6.

Then (scroll down?) check the Confidence interval box. Use the CI results. (We will talk about the PI results on Monday.)

Create a screen capture of your results.

Upload your screen capture.

Conclusions
Write a paragraph recapping your analysis of the class data as if to a scientist (so you can use technical terms but should also use complete sentences and clear communication). Make sure you include:
1. Describe the sample: The results obtained in the sample
2. Significance: Your conclusion (with justification) regarding whether 98.6 is a plausible value for the population mean based on our sample
3. Estimation: Include a one-sentence interpretation of your confidence interval
4. Evaluation of the validity conditions: Whether you believe it is valid to use the t-procedures for these data
5. Generalizability: If you haven't yet, clarify (with justification) to what population you are willing to generalize these results.
6. Cautions: Any cautions or concerns you have about this analysis (e.g., nonsampling errors).
(l) Paragraph here: