  • Background

  • What is a healthy body temperature? German physician Carl Wunderlich analyzed a million temperatures from 25,000 patients and in 1869 published that the normal human body temperature is 98.6°F. But several more recent studies have found that number to be too high, leading to speculation that Dr. Wunderlich was wrong, or that human body temperature has changed over time. In a study published by Mackowiak, Wasserman, & Levine (Journal of the American Medical Association, 1992), body temperatures (oral temperatures using a digital thermometer) were recorded for healthy men and women, aged 18-40 years, who were volunteers in Shigella vaccine trials at the University of Maryland Center for Vaccine Development, Baltimore. For these adults, the sample mean body temperature was found to be 98.249°F with a standard deviation of 0.733°F.

    Research question: Is 98.6°F still an appropriate standard for the average healthy body temperature?

    Goals: In this investigation, you will

    • Investigate the distribution of sample means and the associated standardized statistic
    • Learn how to test a hypothesis about a population mean
    • Interpret a 95% confidence interval for a population mean.

    Note: The applets you are using today are rather wide. You may want to follow the links to open them in a separate window to better fit on your screen, make screen captures etc.

  • Testing the claim

  • (a) Explain (in words, in context) what is meant by the following symbols as applied to this study: n, 𝑥̅, s, μ, σ. If you know a value, report it. Otherwise, define the symbol in words/context.
    n
    𝑥̅
    s
    μ
    σ

  • Key Result: If all possible samples of size n are selected from a large population or an infinite random process with mean 𝜇 and standard deviation 𝜎, then the sampling distribution of these sample means will have the following characteristics:
    • The mean will be equal to 𝜇.
    • The standard deviation will be equal to 𝜎/√𝑛 (we can call this SD(𝑥̅) or 𝜎𝑥̅).
    • Central Limit Theorem: The shape will be normal if the population distribution is normal, or approximately normal if the sample size is large regardless of the shape of the population distribution.

    • The convention is to consider the sample size large enough if n > 30. However, this rule really depends on the amount of skewness in the population. The more non-symmetric the population distribution, the larger the sample size necessary before the distribution of sample means is reasonably modeled by a normal distribution.
    • The population is considered large if it is more than 20 times the size of the sample.
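
The Key Result can be checked with a quick simulation. The sketch below is only an illustration: the exponential population, its mean of 2, the sample size of 40, and the helper name simulate_sample_means are all assumptions chosen for this example, not part of the study or the applet.

```python
import math
import random
import statistics


def simulate_sample_means(draw, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return the sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n))
            for _ in range(num_samples)]


# Hypothetical right-skewed population: exponential with mean 2.
# For an exponential distribution the SD equals the mean, so sigma = 2 as well.
mu, sigma = 2.0, 2.0
n = 40

means = simulate_sample_means(lambda rng: rng.expovariate(1 / mu), n, 20_000)

print("mean of sample means:", round(statistics.fmean(means), 3), "vs mu =", mu)
print("SD of sample means:  ", round(statistics.stdev(means), 3),
      "vs sigma/sqrt(n) =", round(sigma / math.sqrt(n), 3))
```

With a sample size this large, a histogram of `means` should also look roughly normal even though the population itself is strongly skewed.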

     

  • (c) Suppose that Wunderlich's axiom is correct and many different random samples of 13 adults are taken from a large normally distributed population with mean 98.6 degrees F and standard deviation 0.733 degrees F. What does the Central Limit Theorem tell you about the theoretical distribution of sample means? (Indicate any necessary information that is missing.)
    Shape:
    Mean: (report a number)
    Standard deviation: (report a number)

  • Confession #1: We don't know 𝜎

  • The previous analysis is fine if our model of the population is correct. The components of our model included a large normally distributed population with 𝜎 = 0.733. This model allowed us to use the sample mean to make claims about the population mean, but let's loosen some of those assumptions and see what differences they do or do not make.

    In particular, 0.733 wasn't the population standard deviation, it was only the sample standard deviation. If I don't know the population mean, then I probably don't know the population standard deviation either.  When we estimate SD(𝑥̅) from the sample data, we call it the "standard error"...

    Definition: The standard error of the sample mean, denoted by SE(𝑥̅), is an estimate of the standard deviation of 𝑥̅ (the sample to sample variability in sample means from repeated random samples) calculated by substituting the sample standard deviation s for the population standard deviation 𝜎: SE(𝑥̅) = 𝑠/√𝑛

    We can still use this value to standardize our statistic, but is the distribution of this standardized statistic still well modeled by a normal distribution? Is 2 still considered a large value?
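
As a small worked example of the definition, the sketch below computes SE(𝑥̅) and the resulting standardized statistic. The values of 𝑥̅, s, and the hypothesized mean come from the Background; n = 13 is an assumption borrowed from part (c), and the function names are made up for this illustration.

```python
import math


def standard_error(s, n):
    """Estimate SD(x-bar) by substituting the sample SD s for sigma."""
    return s / math.sqrt(n)


def standardized_statistic(xbar, mu0, s, n):
    """Number of standard errors the sample mean lies from the hypothesized mean."""
    return (xbar - mu0) / standard_error(s, n)


# Sample mean and SD from the Background; n = 13 is an assumed sample size.
xbar, s, mu0, n = 98.249, 0.733, 98.6, 13

se = standard_error(s, n)
t = standardized_statistic(xbar, mu0, s, n)
print(f"SE = {se:.4f}, standardized statistic = {t:.3f}")
```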

  • Let's again simulate many random samples to explore the behavior of the (standardized) statistic. Instead of creating a hypothetical population, this time we will sample from a theoretical probability distribution (i.e., an infinitely large population).

    • Open the Sampling from a Population Model applet and specify our model:
      • Population mean = 98.6
      • Population SD = 0.70
      • Population shape: Normal
      • Also check the Show skewness box?
    • Press Set Population.
    • Check the Show Sampling Options box and specify a large Number of samples and a sample size of 13.
    • Press Draw Samples and scroll to the right to display the distribution of the sampled statistics (means).
    • Confirm that the behavior of the distribution of sample means is consistent with the Central Limit Theorem from part (c).

    But what about when we use the standard error and calculate the standardized statistic (using each sample's mean and sd values) instead?

    • From the Choose Statistic list, select t-statistic (that's what we will call this new equation)
  • But can we fit a model to the distribution of these t-statistics?

  • (h) The distribution of these t-statistics is (shape), with mean and standard deviation

  • Detour: Student's t-distribution

  • If we zoom in on the tails of the distribution, we see that more of the simulated t-statistics lie in those tails than the normal distribution would predict.

    [Figure: normal distribution overlay]
    To model the sampling distribution of the t-statistic = (𝑥̅ − 𝜇)/(s/√𝑛), we need a probability model with heavier tails than the standard normal distribution. William S. Gosset, a chemist turned statistician, showed in 1908, while working for the Guinness Breweries in Dublin, Ireland, that when the population of observations follows a normal distribution, a “t probability curve” provides a better model for the sampling distribution of this standardized statistic.

    [Figure: t distribution vs. normal distribution]

    All that means for us is we will compare the t-statistic to the t-distribution rather than to the normal distribution to convert our standardized statistic to a p-value.
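
The heavier tails can also be seen outside the applet. The following sketch mimics the applet settings used earlier (normal population, mean 98.6, SD 0.70, samples of size 13); the number of samples, the random seed, and the cutoff of 2 are assumptions chosen for illustration.

```python
import math
import random
import statistics


def t_statistic(sample, mu):
    """Standardize the sample mean using the sample's own SD (the standard error)."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))


rng = random.Random(1)
mu, sigma, n, num_samples = 98.6, 0.70, 13, 20_000

# One t-statistic per simulated sample from the normal population.
t_stats = [t_statistic([rng.gauss(mu, sigma) for _ in range(n)], mu)
           for _ in range(num_samples)]

# Proportion of t-statistics more than 2 away from 0, in either direction.
simulated_tail = sum(abs(t) > 2 for t in t_stats) / num_samples

# What a standard normal model would predict for the same two tails.
normal_tail = 2 * (1 - statistics.NormalDist().cdf(2))

print(f"simulated P(|t| > 2): {simulated_tail:.4f}")
print(f"normal    P(|z| > 2): {normal_tail:.4f}")
```

The simulated proportion should come out noticeably larger than the normal prediction, which is the extra tail area the t-distribution is designed to capture.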

  • Confession #2: We probably don't have a normally distributed population

  • We also assumed that our population distribution was normally distributed. What if the population distribution is not normally distributed?

     

    • Return to the Sampling from a Population Model applet and specify a new model:
      • Population mean = 1 (not a typo, trust me)
      • Population SD = 0.70
      • Population shape: Skewed right
      • Also check the Show skewness box?
    • Press Set Population.
    • Check the Show Sampling Options box and specify a large Number of samples and a sample size of 13.
    • Press Draw Samples and scroll to the right.
    • From the Choose Statistic list, select t-statistic 
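
The same experiment can be sketched in code. The applet's exact "Skewed right" shape is not specified here, so the gamma distribution below is an assumption; its shape and scale are chosen only so that the population has mean 1 and SD 0.70.

```python
import math
import random
import statistics


def t_statistic(sample, mu):
    """Standardize the sample mean using the sample's own SD (the standard error)."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (statistics.stdev(sample) / math.sqrt(n))


mu, sigma, n, num_samples = 1.0, 0.70, 13, 20_000

# Assumed right-skewed population: gamma with mean = shape * scale = mu
# and variance = shape * scale**2 = sigma**2.
shape = (mu / sigma) ** 2
scale = sigma ** 2 / mu

rng = random.Random(2)
t_stats = [t_statistic([rng.gammavariate(shape, scale) for _ in range(n)], mu)
           for _ in range(num_samples)]

# Compare the two tails separately; a symmetric model would make them about equal.
lower_tail = sum(t < -2 for t in t_stats) / num_samples
upper_tail = sum(t > 2 for t in t_stats) / num_samples

print(f"P(t < -2): {lower_tail:.4f}")
print(f"P(t >  2): {upper_tail:.4f}")
```

If the two tail proportions differ a lot, the distribution of t-statistics is not symmetric, so a t-distribution would be a questionable model for a sample this small from a population this skewed.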
  • One-sample t-test

  • One-sample t-test for 𝜇: To test a null hypothesis about a population mean H0: 𝜇 = 𝜇0, when we don’t know the population standard deviation (pretty much always), we will use the sample standard deviation to calculate the standard error SE(𝑥̅) and then compare the standardized statistic

    t = (𝑥̅ − 𝜇0)/(s/√𝑛)

    to a t-distribution with n – 1 degrees of freedom (more on what those are Monday). Theoretically, this approximation requires the population to follow a normal distribution. However, statisticians have found this approximation to also be reasonable for other population distributions whenever the sample size is large. How large the sample size needs to be depends on how skewed the population distribution is. Consequently, we will consider the t procedures valid when either the population distribution is symmetric or the sample size is large (more on that Monday).
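
For anyone curious what the applet does behind the scenes, here is a minimal sketch of a one-sample t-test. It is not the applet's code: the two-sided p-value is found by numerically integrating the t density with Simpson's rule (a statistics library would normally provide this), and the temperature values are made-up numbers for illustration, not the class data.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p_value(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # P(-a < T < a), by symmetry
    return max(0.0, 1 - central)


def one_sample_t_test(data, mu0):
    """Return (t-statistic, degrees of freedom, two-sided p-value)."""
    n = len(data)
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)
    t = (xbar - mu0) / (s / math.sqrt(n))
    return t, n - 1, two_sided_p_value(t, n - 1)


# Hypothetical temperatures (degrees F), invented for this example.
temps = [98.2, 97.9, 98.6, 98.4, 98.0, 98.8, 97.6,
         98.3, 98.1, 98.5, 98.7, 97.8, 98.2]

t, df, p = one_sample_t_test(temps, mu0=98.6)
print(f"t = {t:.3f}, df = {df}, two-sided p-value = {p:.4f}")
```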

     

  • Application

  • Yesterday, I collected temperature data for both sections. The data can be found here.  Open this data file and copy and paste the data into the Theory Based Inference applet.

    • Use the Scenario pull-down menu to now select One mean
    • Check the Paste data box.
    • Check the Includes header box
    • Press Clear to empty the data window and then paste in the data from your clipboard. Press Use Data.
    • I see an outlier I'm comfortable removing as a data entry error. Manually delete that value from the data window and press Use Data again. 
    • You should have 54 observations. (There are two missing values coded as * that the applet knows to ignore.)

    First, consider how you will summarize the behavior of the sample, as if to someone who can't see the output.  

    Then (scroll right and) check the Test of significance box and use the applet to test whether there is convincing evidence that the population mean body temperature differs from 98.6.

    Then (scroll down?) check the Confidence interval box.  Use the CI results. (We will talk about the PI results on Monday.)

    Create a screen capture of your results.
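
The confidence interval reported by the applet has the form 𝑥̅ ± t* × SE(𝑥̅), where t* is a critical value from the t-distribution with n – 1 degrees of freedom. The sketch below shows that calculation under stated assumptions: t* is found by bisection on a numerically integrated t density rather than from a library or table, and the data are made-up temperatures, not the class data.

```python
import math
import statistics


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_central_prob(a, df, steps=2000):
    """P(-a < T < a), using Simpson's rule on [0, a] (steps must be even)."""
    if a <= 0:
        return 0.0
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3


def t_critical(confidence, df):
    """Find t* with P(-t* < T < t*) = confidence by bisection on [0, 100]."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_central_prob(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def confidence_interval(data, confidence=0.95):
    """Return (lower, upper) for the population mean: x-bar +/- t* * s/sqrt(n)."""
    n = len(data)
    xbar = statistics.fmean(data)
    se = statistics.stdev(data) / math.sqrt(n)
    margin = t_critical(confidence, n - 1) * se
    return xbar - margin, xbar + margin


# Hypothetical temperatures (degrees F), invented for this example.
temps = [98.2, 97.9, 98.6, 98.4, 98.0, 98.8, 97.6,
         98.3, 98.1, 98.5, 98.7, 97.8, 98.2]

lower, upper = confidence_interval(temps)
print(f"95% CI for the population mean: ({lower:.3f}, {upper:.3f})")
```

A hypothesized value such as 98.6 that falls outside a 95% interval would also be rejected by the two-sided test at the 5% level, so the test and the interval tell a consistent story.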

  • Conclusions

  • Write a paragraph recapping your analysis of the class data as if to a scientist (so you can use technical terms but should also use complete sentences and clear communication). Make sure you include:

    1. Describe the sample: The results obtained in the sample 
    2. Significance: Your conclusion (with justification) regarding whether 98.6 is a plausible value for the population mean based on our sample
    3. Estimation: Include a one-sentence interpretation of your confidence interval
    4. Evaluation of the validity conditions: Whether you believe it is valid to use the t-procedures for these data
    5. Generalizability: If you haven't yet, clarify (with justification) to what population you are willing to generalize these results.
    6. Any cautions or concerns you have about this analysis (e.g., nonsampling errors)?