This lesson introduces the logic and mechanics of statistical hypothesis testing, then applies that framework to three t-tests that will appear throughout the rest of the course.
We begin with the foundations of hypothesis testing: what a null and alternative hypothesis are, why we assume the null is true until the evidence says otherwise, and how the p-value approach works. A p-value is the probability of observing a result at least as extreme as the one calculated, assuming the null hypothesis is true. A small p-value means the observed data would be very unlikely if the null were true — giving us grounds to reject it. We set a decision rule in advance: reject \(H_0\) if the p-value is less than alpha (typically 0.05). We also cover the three tail configurations — two-tailed (\(\neq\)), right-tailed (\(>\)), and left-tailed (\(<\)) — and how the direction of the alternative hypothesis determines which tail of the distribution we look at.
The lesson then works through three t-tests, each using R’s t.test() function with different arguments. All three follow the same three-step procedure: specify hypotheses, compute the test statistic and p-value, interpret and conclude. The t-distribution is used instead of the z-distribution because we are estimating population parameters from sample data — specifically, we rarely know the population standard deviation. Where a z-score measures how many standard deviations a value is from the mean, a t-statistic measures how many standard errors the sample mean is from the hypothesized value. As sample size grows, the t-distribution approaches the normal distribution.
The one-sample t-test compares a single sample mean to a known or hypothesized value — for example, testing whether average study hours differ from 12. The independent samples t-test compares the means of two unrelated groups — for example, whether systolic blood pressure differs between males and females. The dependent (paired) samples t-test compares two related measurements, such as weight before and after an intervention, where each observation in one group is matched to a specific observation in the other. The key to choosing the right test is understanding how the data are structured.
By the end of this lesson, you should be able to set up and interpret any of the three t-tests in R, choose the correct test for a given research question, and write a formal conclusion from the output. Work through every code example in your own R script alongside the reading.
AI and Hypothesis Testing: Where Misinterpretation Happens
AI can run any of these t-tests in seconds and produce a plausible-sounding interpretation of the result. That interpretation may be wrong. The most common AI errors in hypothesis testing are:
Conflating statistical significance with practical significance. A p-value below 0.05 means the result is unlikely under the null — it does not mean the effect is large enough to matter in a business context. AI will rarely volunteer this distinction.
Misstating what a p-value means. The correct interpretation is “the probability of observing a result at least this extreme, assuming the null is true” — not “the probability that the null hypothesis is true.” AI frequently produces the incorrect version.
Choosing the wrong test. AI infers the test structure from your prompt. If you describe your data ambiguously, it may set up an independent-samples test when you need a paired one, or vice versa.
Understanding the logic in this chapter is what gives you the ability to catch these errors. Statistical significance, practical significance, and correct test selection are all human judgment calls — not outputs you can read off a p-value alone.
At a Glance
In order to succeed in this section, you will need to understand how to set up and test a hypothesis, knowing that we can reuse this logic for many other types of models throughout the course. Pay close attention to why we use the t-distribution — and how it differs from the z-distribution — for one-sample, independent samples, and dependent samples tests. Additionally, note when the test is two-tailed, right-tailed, or left-tailed, and what that means for the interpretation of the results.
Lesson Objectives
Explain the logic of hypothesis testing: null and alternative hypotheses, p-value interpretation, and the decision rule.
Distinguish between two-tailed, right-tailed, and left-tailed tests and set up each correctly.
Compare a sample mean to a hypothesized population value with a one-sample t-test.
Compare two unrelated sample means with an independent-samples t-test.
Compare two related sample means with a dependent-samples (paired) t-test.
Select the correct t-test for a given research question based on data structure.
Consider While Reading
In this lesson, we introduce hypothesis testing — the formal statistical process for deciding whether sample evidence supports a claim about a population. This framework carries forward for the remainder of the course: ANOVA, regression, and correlation all follow the same fundamental logic of comparing a test statistic to a null distribution and interpreting the resulting p-value.
We cover three t-tests that use the t.test() command with different arguments. The differences between them come down to one question: what are you comparing? A sample mean to a benchmark, two independent groups, or two related measurements on the same subjects?
As you read, pay particular attention to how the alternative hypothesis determines the direction of the test — and therefore which side of the distribution you examine — and how paired = TRUE versus the tilde (~) syntax in t.test() changes the structure of the test.
Hypothesis Testing
A hypothesis test resolves conflicts between two competing opinions (hypotheses).
In a hypothesis test, we define the null hypothesis as \(H_0\), which is the presumed default state of nature or status quo. We define the alternative hypothesis as \(H_A\), a contradiction of the default state of nature or status quo.
In statistics we use sample information to make inferences regarding the unknown population parameters of interest. We conduct hypothesis tests to determine if sample evidence contradicts \(H_0\).
On the basis of sample information, we either “Reject the null hypothesis,” meaning the sample evidence is inconsistent with \(H_0\). This shows support for our alternative hypothesis, \(H_A\).
Or we “Do not reject the null hypothesis,” meaning the sample evidence is not inconsistent with \(H_0\). Note that we never “accept” \(H_0\); we simply lack the evidence to reject it. This means there is no support for our alternative hypothesis, \(H_A\).
General guidelines:
Null hypothesis, \(H_0\), states the status quo.
Alternative hypothesis, \(H_A\), states whatever we wish to establish (i.e., contests the status quo).
Use the following signs in hypothesis tests:
\(H_0\): \(=\), \(\geq\), or \(\leq\) to specify the status quo.
\(H_A\): \(\neq\), \(<\), or \(>\) to contradict \(H_0\).
Note that \(H_0\) always contains the equality sign.
Hypothesis Testing Covers a Variety of Tests
We will cover the following:
One-Sample t-test;
Independent samples t-test;
Dependent (or Paired) samples t-test.
Using a t-Distribution
All of the hypothesis tests listed above use a t-distribution, which is like a z-distribution (i.e., z-score) but specifically accounts for the population standard deviation being unknown.
The z-distribution shows how many standard deviations (SD) some value is away from the mean.
The t-distribution shows how many standard errors (SE) some value is away from the mean.
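To see how the two distributions relate, compare their quantiles in base R; as the degrees of freedom grow, the t critical values shrink toward the z value:

```r
# Critical values for a 95% two-sided test: the t-distribution has
# heavier tails at small sample sizes, then converges to the normal.
qt(0.975, df = 5)     # wider cutoff at small n
qt(0.975, df = 30)
qt(0.975, df = 1000)  # nearly identical to the z cutoff below
qnorm(0.975)
```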
All of these hypothesis tests in the assigned readings are executed in R with the t.test() command, with different arguments set.
Approaches to Hypothesis Testing
All approaches to scientific hypothesis testing enable us to determine whether the sample evidence is inconsistent with what is hypothesized under the null hypothesis, \(H_0\).
Basic principle: First assume that \(H_0\) is true and then determine if sample evidence contradicts this assumption.
This means we can never claim that our alternative (the claim against the status quo) is supported without statistically rejecting the null through this process.
There are two general approaches to hypothesis testing:
The Critical Value Approach
The p-value Approach
The Critical Value Approach: Not used here.
In this approach, we calculate the test statistic, whose form depends on the type of test conducted. We then check whether the test statistic falls beyond a critical value, either looked up in a table or computed directly from a function in R.
The p-Value Approach: What we are using here.
The p-value is simply the probability of observing a result at least as extreme as the one we calculated, assuming the null hypothesis is true. A very small p-value means that what we observed would be very unlikely if the null were true — giving us reason to reject it. We compare the p-value to a pre-determined significance level (alpha, usually set at 0.05). If the p-value is less than alpha, we reject the null hypothesis. If it is greater than or equal to alpha, we fail to reject it. Note: failing to reject the null is not the same as proving it is true.
In this approach, we still calculate the test statistic, which is still based on the type of test conducted. However, in the p-value approach, this test statistic corresponds directly to a p-value, or probability value, that we use to make a decision on whether the hypothesis is supported or not.
The p-value is the likelihood, or probability, of observing a sample mean that is at least as extreme as the one derived from the given sample, under the assumption that the null hypothesis is true.
The calculation of the p-value depends on the specification of the alternative hypothesis.
We set a decision rule to reject \(H_0\) if p-value \(< \alpha\), where \(\alpha\) is set a priori. Specifically, the p-value approach sets an \(\alpha\) value (e.g., .05, .01, .001) and then determines whether the calculated p-value is less than it. The smaller the \(\alpha\), the harder it is to reject \(H_0\) and find support for your alternative hypothesis.
We typically set \(\alpha = .05\) for a variety of tests.
We can go lower (e.g., .01, .001), but we typically do not go higher than .05.
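The decision rule can be sketched as a small helper function (purely illustrative; `decide` is not a base R function):

```r
# Minimal sketch of the p-value decision rule: reject H0 when p < alpha.
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "Reject H0" else "Fail to reject H0"
}
decide(0.001)               # well below .05: reject
decide(0.25)                # above .05: fail to reject
decide(0.03, alpha = 0.01)  # significant at .05 but not at the stricter .01
```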
Why Use The p-Value Approach
The critical value approach varies in its formulas and the values looked up (based on the distribution and sometimes the degrees of freedom), while the p-value approach is applied consistently: we reject the null if our p-value is less than an \(\alpha\) set a priori.
Three Step Procedure Using The p-value Approach
Specify null and alternative hypothesis.
Compute the test statistic and calculate the p-value.
Interpret the probability and state the conclusion.
Setting up Hypotheses Tests
There are three types of hypotheses tests that you can analyze:
Two-tailed test: \(H_0: =\) vs \(H_A: \neq\)
Right-tailed test: \(H_0: \leq\) vs \(H_A: >\)
Left-tailed test: \(H_0: \geq\) vs \(H_A: <\)
In a two-tailed test, the null is set to the equality (\(H_0: =\)) versus an alternative set to the inequality (\(H_A: \neq\)). In a right-tailed test, the null is set to less than or equal to (\(H_0: \leq\)), while the alternative is set to greater than (\(H_A: >\)). Finally, in a left-tailed test, the null is set to greater than or equal to (\(H_0: \geq\)), while the alternative is set to less than (\(H_A: <\)).
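The three tail configurations map directly onto areas of the t-distribution, which base R's pt() can compute. A sketch with an illustrative t statistic of 2.1 on 34 degrees of freedom (made-up numbers):

```r
# The same t statistic yields a different p-value depending on which
# tail(s) the alternative hypothesis points at.
t_stat <- 2.1
dfree  <- 34
2 * pt(-abs(t_stat), dfree)            # two-tailed: area in both tails
pt(t_stat, dfree, lower.tail = FALSE)  # right-tailed: upper tail only
pt(t_stat, dfree)                      # left-tailed: lower tail only
```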
In the figure below, the green area is the Rejection region where we reject \(H_0\) and show support for our alternative hypothesis \(H_A\).
The rest of the area under the curve is the Failure to Reject \(H_0\) region, in which we cannot show support for our alternative hypothesis \(H_A\).
Hypothesis Test
In a hypothesis test, you have to decide if a claim is true or not. Before you can figure out if you have a left-tailed test or right-tailed test, you have to make sure you have a single tail to begin with. A tail in hypothesis testing refers to the tail at either end of a distribution curve.
Step 1: Write your null hypothesis statement and your alternate hypothesis statement.
Step 2: Draw a normal distribution curve.
Step 3: Shade in the related area under the normal distribution curve. The total area under the curve represents 100%, and percentiles run from left to right, so low percentiles (e.g., the 25% mark) sit toward the left tail and high percentiles (e.g., the 75% mark) sit toward the right tail.
Hypothesis Testing: Single Population: A Right-Tailed Test
A right-tailed test is where your hypothesis statement contains a greater than (>) symbol. In other words, the inequality points to the right.
For example, consider comparing the life of batteries before and after a manufacturing change. If you want to know whether the battery life is greater than the original 90 hours, your hypothesis statements might be:
Null hypothesis: No change or less than (\(H_0\): \(\mu \leq 90\))
Alternate hypothesis: Battery life is greater than 90 hours (\(H_A\): \(\mu > 90\))
Hypothesis Testing: Single Population: A Left-Tailed Test
A left-tailed test is where your hypothesis statement contains a less than (<) symbol. In other words, the inequality points to the left.
For example, consider comparing the life of batteries before and after a manufacturing change. If you want to know whether the battery life is less than 70 hours, your hypothesis statements might be:
Null hypothesis: No change or greater than (\(H_0\): \(\mu \geq 70\))
Alternate hypothesis: Battery life is less than 70 hours (\(H_A\): \(\mu < 70\))
Hypothesis Testing: Single Population: A Two-Tailed Test
The two-tailed test is called a two-tailed test because the rejection region can be in either tail.
For example, consider comparing the life of batteries before and after a manufacturing change. If you want to know whether the battery life is different from the mean value, your hypothesis statements might be:
Null hypothesis: Equals mean (\(H_0 = \mu\)).
Alternate hypothesis: Not equal to mean (\(H_A \neq \mu\)).
Two-Tailed Hypothesis Test
Forming a Hypothesis
When forming a hypothesis, we need to do the following first.
Identify the relevant population parameter of interest (e.g., \(\mu\)).
Determine whether it is a one- or a two-tailed test.
Include some form of the equality sign in \(H_0\) and use \(H_A\) to establish a claim.
As an example of how to set up a hypothesis, consider a trade group that predicts back-to-school spending will average $606.40 per family this year. The group uses this information to set an economic model for making predictions. A different economic model will be needed if the prediction is wrong.
Parameter of interest is \(\mu\) since we are interested in the average back-to-school spending.
Since we want to determine whether the population mean differs from 606.4 (i.e., \(\neq\)), it is a two-tailed test.
The hypothesis test is as follows: \(H_0: \mu = 606.4\) versus \(H_A: \mu \neq 606.4\)
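As a sketch of what this test would look like in R, on simulated spending figures (the trade group's actual data is not part of the course files, so the numbers here are made up):

```r
# Illustrative only: 50 simulated family spending amounts, then the
# two-tailed test of H0: mu = 606.4 set up above.
set.seed(42)
spending <- rnorm(50, mean = 620, sd = 80)
t.test(spending, mu = 606.4, alternative = "two.sided")
```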
t.test() Command
The t.test() command can be used in a variety of ways to test many different kinds of hypotheses. It is important to look up this command in R's help (?t.test) and review all the arguments you can change. The following screenshot shows a snippet of this.
t.test Command
We use the default S3 method in conducting the t.test:
We change arguments as needed to satisfy the type of test that we are conducting.
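For reference, the default method's argument list as documented on R's own help page (?t.test) is reproduced below, with a quick way to confirm the defaults in your session:

```r
# Signature of the default S3 method, from ?t.test:
# t.test(x, y = NULL,
#        alternative = c("two.sided", "less", "greater"),
#        mu = 0, paired = FALSE, var.equal = FALSE,
#        conf.level = 0.95, ...)
# Confirm the defaults programmatically:
formals(stats:::t.test.default)$mu          # 0
formals(stats:::t.test.default)$conf.level  # 0.95
```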
With the framework for hypothesis testing established, the next three sections apply it to three specific scenarios using R’s t.test() function. The choice of test depends entirely on what you are comparing and how the data are structured: one group versus a known value, two unrelated groups, or two related measurements on the same subjects.
One-Sample t-Test
A one-sample t-test compares a mean to a population or hypothesized value.
A one-sample t-test assumes a normal distribution; however, if the underlying distribution is not normal, then by the central limit theorem (CLT) the sampling distribution of the mean is approximately normal provided \(n \geq 30\).
t.test() Command for One-Sample t-Test
In base R, the t.test() command discussed above is useful for getting the t for a one-sample t-test. The command arguments include the name of the variable and the hypothesized or population value (\(\mu\)) to compare it to. We can also include the alternative statement to signify a right-tailed, left-tailed, or two-tailed test.
One-Sample Step 1: Specify Null and Alternative Hypothesis
In order to set up the appropriate null and alternative hypothesis, we need to determine the type of test: two-tailed, right-tailed, or left-tailed.
Two-Tailed Test
Reject \(H_0\) on either side of the hypothesized value of the population parameter.
\(H_0: \mu =\mu_0\) versus \(H_A\): \(\mu \neq \mu_0\).
The \(\neq\) symbol in \(H_A\) indicates that both tail areas in the distribution will be used to make the decision regarding the rejection of \(H_0\).
A not equal to sign in the alternative signifies a two-tailed test.
An example of a two-tailed problem: A Dean is interested if the hours of study time per week is different than 12 hours.
\(H_0\): Study hours are not different from 12 hours (\(=\)).
\(H_A\): Study hours are different from 12 hours (\(\neq\)).
Example of command in R:
alternative = "two.sided" for a two-tailed test. However, this is the default value, so it could be left out of the statement with the same results.
Hours
Min. : 4.00
1st Qu.:11.00
Median :16.00
Mean :16.37
3rd Qu.:20.50
Max. :35.00
OneSample <- t.test(StudyHours$Hours, alternative = "two.sided", mu = 12)
OneSample
One Sample t-test
data: StudyHours$Hours
t = 3.5842, df = 34, p-value = 0.001047
alternative hypothesis: true mean is not equal to 12
95 percent confidence interval:
13.89281 18.85005
sample estimates:
mean of x
16.37143
# This test is significant at the .01 level as indicated by the
# p-value of .001047. We are 95% confident that our true mean lies
# between 13.89281 and 18.85005.
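The printed output above can also be pulled apart programmatically: t.test() returns a list of class "htest" whose components are accessible by name. A sketch on simulated stand-in data (the StudyHours file itself is assumed to be loaded separately):

```r
# t.test() returns an "htest" list; extract pieces by name instead of
# reading them off the printout.
set.seed(1)
hours <- rnorm(35, mean = 16, sd = 7)  # stand-in for StudyHours$Hours
res <- t.test(hours, alternative = "two.sided", mu = 12)
res$statistic  # t value
res$p.value    # p-value
res$conf.int   # 95% confidence interval
res$estimate   # sample mean
```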
One-Tailed Test
Reject \(H_0\) only on one side of the hypothesized value of the population parameter.
\(H_0\): \(\mu \leq \mu_0\) versus \(H_A\): \(\mu > \mu_0\) (right-tailed test).
\(H_0\): \(\mu \geq \mu_0\) versus \(H_A\): \(\mu < \mu_0\) (left-tailed test).
Note that the inequality in the alternative \(H_A\) determines which tail area will be used to make the decision regarding the rejection of \(H_0\), right (>) or left (<).
A greater than sign in the alternative signifies a right-tailed test.
An example of a right-tailed problem: A Dean is interested if the hours of study time per week is greater than 10 hours.
\(H_0\): Study hours are less than or equal to 10 hours a week (\(\leq\)).
\(H_A\): Study hours are greater than 10 hours a week (\(>\)).
Example of right-tailed command in R:
alternative = "greater" for a right-tailed test.
\(\mu\) of interest is 10 for 10 hours a week.
OneSampleRight <- t.test(StudyHours$Hours, alternative = "greater", mu = 10)
OneSampleRight
One Sample t-test
data: StudyHours$Hours
t = 5.224, df = 34, p-value = 4.398e-06
alternative hypothesis: true mean is greater than 10
95 percent confidence interval:
14.3091 Inf
sample estimates:
mean of x
16.37143
# The test statistic is 5.224 and the p-value is very small, close to 0.
# This test is significant at the .001 level as indicated by the p-value
# of 4.398e-06. Reject the null and find that the hours of study are
# greater than 10 hours. We are 95% confident that the true mean is
# greater than 14.3091.
A less than sign in the alternative signifies a left-tailed test.
Example of a left-tailed problem: A Dean is interested if the hours of study time per week is less than 24 hours.
\(H_0\): Study hours are greater than or equal to 24 hours a week (\(\geq\)).
\(H_A\): Study hours are less than 24 hours a week (\(<\)).
Example of left-tailed command in R:
alternative = "less" for a left-tailed test.
\(\mu\) of interest is 24 for 24 hours a week.
OneSampleLeft <- t.test(StudyHours$Hours, alternative = "less", mu = 24)
OneSampleLeft
One Sample t-test
data: StudyHours$Hours
t = -6.2547, df = 34, p-value = 2.015e-07
alternative hypothesis: true mean is less than 24
95 percent confidence interval:
-Inf 18.43376
sample estimates:
mean of x
16.37143
# This test is significant at the .001 level as indicated by the
# p-value of 2.015e-07. Test statistic: -6.2547, p-value < .001, which
# means we reject the null and find that the true mean is less than 24.
# We are 95% confident that the true mean is less than 18.43376.
One-Sample Step 2: Compute the Test Statistic and p-value
When the population standard deviation \(\sigma\) is unknown, the test statistic for testing the population mean \(\mu\) follows the \(t\) distribution. For the one-sample test, the degrees of freedom are \(df = n - 1\).
Formula for the Test Statistic: \(t = \frac{m_x - \mu_0}{s_x/\sqrt{n_x}}\)
In the one-sample t-test, \(m_x\) represents the mean of the variable \(x\) (the variable to be tested), \(\mu_0\) is the population mean or hypothesized value, \(s_x\) is the sample standard deviation of \(x\), and \(n_x\) is the sample size.
R will compute the \(df\) based on the parameters set in the t.test() command along with the test statistic, so we can rely on it to calculate this for us.
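A quick sanity check of the formula against t.test(), on simulated data:

```r
# Compute t = (m_x - mu0) / (s_x / sqrt(n_x)) by hand and compare it
# to what t.test() reports; the two should match exactly.
set.seed(7)
x <- rnorm(35, mean = 16, sd = 7)
mu0 <- 12
t_manual <- (mean(x) - mu0) / (sd(x) / sqrt(length(x)))
t_r <- unname(t.test(x, mu = mu0)$statistic)
all.equal(t_manual, t_r)  # TRUE
```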
One-Sample Step 3: Interpret and Conclude
Set alpha (\(\alpha\)) to a common level, like .05, and compare it to the calculated p-value from your output in R.
Reject \(H_0\) if p-value is less than \(\alpha\) value. This means we show support for our alternative hypothesis \(H_A\), which is against status quo.
Interpret the results in plain English.
Example of a One-Sample t-Test in R
To conduct a one-sample t-Test in R, first, let’s read in a dataset nhanes2016.csv from your book files and go ahead and load the tidyverse. Tidyverse is loaded so that we can subset the dataset in the next example.
Second, let's select a variable to test. We are going to compare the mean of the variable BPXSY1 to 120, which makes it a two-tailed test (\(=\) vs \(\neq\)) with \(\mu\) set to 120.
Once we have our problem idea, we can go through the steps listed above.
Step 1: Write null and alternative.
H0: There is no difference between mean systolic BP and the cutoff for normal BP, 120 mmHg.
HA: There is a difference between mean systolic BP and the cutoff for normal BP, 120 mmHg.
Step 2: Calculating test-statistic and p-value.
Again, the t.test() command does both; the statement and output are listed below.
t.test(x = nhanes.2016$BPXSY1, mu = 120, alternative = "two.sided")
One Sample t-test
data: nhanes.2016$BPXSY1
t = 2.4491, df = 7144, p-value = 0.01435
alternative hypothesis: true mean is not equal to 120
95 percent confidence interval:
120.1077 120.9711
sample estimates:
mean of x
120.5394
### p-value is 0.01435. At an alpha of .05, we reject the null. At
### an alpha of .01 or .001, we fail to reject the null.
## We are 95% confident that our true mean is between 120.1077 and
## 120.9711; sample mean of 120.5394.
Step 3: Interpret probability and results.
We found a test statistic of 2.4491 and a corresponding p-value of .01435. If our alpha was set to .05, .01435 is smaller than that (p-value < alpha). This indicates that we should reject the null (\(H_0\)) and support the alternative (\(H_A\)) that the true mean systolic blood pressure is not equal to 120.
Therefore, the sample mean systolic BP is not equal to 120. More formally: the mean systolic blood pressure in a sample of 7,145 people was 120.54 (sd = 18.62), and our one-sample t-test found this mean to be statistically significantly different from the hypothesized mean of 120 [t(7144) = 2.449; p = 0.01435].
This indicates that the sample likely came from a population with a mean systolic blood pressure not equal to 120, signifying that the true population is likely not equal to 120.
Take note that if we used a smaller alpha (like .01 or .001), we would not pass the test and instead fail to reject the null.
Example of a One-Sample Using a Subset
Let’s do another example using a subset of the data. In this example, let’s create a subset of people 65+ years old to run the same two-tailed test. This allows us to see if the people 65 years and older had a different blood pressure than 120.
In the command below, we use tidyverse to set up a new data frame object named nhanes.2016.65plus to save the filtered data.
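The filtering step might look like the sketch below, shown on a toy data frame. The NHANES age column name used here (RIDAGEYR) is an assumption about your file; the tidyverse version is shown as a comment, with a base R subset() line used so the sketch runs without packages:

```r
# Toy stand-in for the NHANES data; RIDAGEYR as the age column is an
# assumption about your file.
toy <- data.frame(RIDAGEYR = c(40, 70, 65, 22, 81),
                  BPXSY1   = c(118, 142, 131, 110, 150))
# tidyverse version (as in the text):
#   nhanes.2016.65plus <- filter(nhanes.2016, RIDAGEYR >= 65)
# Base R equivalent so this sketch runs standalone:
toy.65plus <- subset(toy, RIDAGEYR >= 65)
nrow(toy.65plus)  # 3 rows remain
```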
H0: There is no difference between mean systolic BP and the cutoff for normal BP, 120 mmHg, for people 65 or over.
HA: There is a difference between mean systolic BP and the cutoff for normal BP, 120 mmHg, for people 65 or over.
Step 2: Calculating test-statistic and p-value.
Again, we can use one t.test() command to handle steps 2 and 3.
t.test(x = nhanes.2016.65plus$BPXSY1, mu = 120)
One Sample t-test
data: nhanes.2016.65plus$BPXSY1
t = 29.238, df = 1232, p-value < 2.2e-16
alternative hypothesis: true mean is not equal to 120
95 percent confidence interval:
135.4832 137.7106
sample estimates:
mean of x
136.5969
### Reject H0 and find that systolic BP is different for people 65
### and older.
Step 3: Interpret probability and results.
Based on our output, we found a test statistic of 29.238 and a corresponding p-value smaller than .001 (reported as < 2.2e-16). If our alpha was set to .05, the p-value is smaller than that (p-value < alpha). This indicates that we should reject the null and support the alternative that the true mean is not equal to 120 for people 65 and older. This again refers to systolic blood pressure.
The mean systolic blood pressure in a sample of 1233 NHANES participants who were age 65 and above was 136.60 (sd = 19.93). The mean systolic blood pressure was found to be statistically significantly different from the hypothesized mean of 120 via a one-sample t-test (t(1232) = 29.238, p < 0.001). The true mean systolic blood pressure in the population of adults 65 and older is likely not equal to 120.
Independent Samples t-Test
Independent samples t-test compares the means of two unrelated groups, so instead of just one population, we have 2 mutually exclusive groups.
More formally, two (or more) random samples are considered independent if the process that generates one sample is completely separate from the process that generates the other sample.
The samples are clearly delineated.
\(\mu_1\) is the mean of the first population.
\(\mu_2\) is the mean of the second population.
The sampling distribution is assumed to be normal, and a linear combination of normally distributed random variables is also normally distributed.
If the underlying distribution is not normal, then by the CLT, the sampling distribution is approximately normal provided both \(n_1 \geq 30\) and \(n_2 \geq 30\).
t.test() Command for Independent Samples t-Test
In base R, the t.test() command is useful for getting the t for the Independent samples t-test. This is the same t.test() command used above.
The command needs to be altered to handle the second group. The arguments include either a formula, formatted as t.test(continuous_variable ~ grouping_variable), or two vectors, formatted as t.test(population1, population2). Choosing the correct format depends on the shape of the data, discussed later on.
Tilde or Comma
Using the ~ symbol between variables: This method allows you to run the t-test without having to split the data manually if your data is not already split. You specify the numerical variable (your continuous measurement) on the left and the categorical variable (which identifies the two groups) on the right of the ~ symbol.
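A minimal sketch of the formula interface on toy long-format data (made-up numbers):

```r
# One row per subject: a continuous measurement plus a grouping column.
spend_long <- data.frame(
  amount = c(100, 95, 120, 88, 70, 85, 90, 60),
  sex    = c("M", "M", "M", "M", "F", "F", "F", "F")
)
# The formula splits amount by sex; no manual subsetting needed.
t.test(amount ~ sex, data = spend_long)
```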
Independent Samples Step 1: Specify Null and Alternative Hypothesis
When conducting hypothesis tests concerning \(\mu_1 - \mu_2\), the competing hypotheses will take one of the following forms, where \(d_0\) is the hypothesized difference between \(\mu_1\) and \(\mu_2\):
Two-tailed test: \(H_0: \mu_1 - \mu_2 = d_0\) versus \(H_A: \mu_1 - \mu_2 \neq d_0\)
Right-tailed test: \(H_0: \mu_1 - \mu_2 \leq d_0\) versus \(H_A: \mu_1 - \mu_2 > d_0\)
Left-tailed test: \(H_0: \mu_1 - \mu_2 \geq d_0\) versus \(H_A: \mu_1 - \mu_2 < d_0\)
A separate \(d_0\) value can be set under the mu argument if the hypothesized difference is not 0; however, 0 is the most common, which tests for a difference between populations (i.e., is the mean of population 1 different from the mean of population 2). The alternative argument can be set to “two.sided”, “greater”, or “less”.
Example of a two-tailed test:
Does spending differ between men and women?
\(H_0\): No difference in spending between men and women. (\(=\)).
\(H_A\): There is a difference in spending between men and women (\(\neq\)).
Example of a two-tailed command in R:
The statement assumes that men’s spending and women’s spending are in 2 different columns in the data set. Therefore, the format t.test(var1, var2) should be used.
Spend <- read.csv("data/Spend.csv")
summary(Spend)
MenSpend WomenSpend
Min. : 49.00 Min. : 13.00
1st Qu.: 89.25 1st Qu.: 62.50
Median : 99.00 Median : 84.00
Mean :100.90 Mean : 87.23
3rd Qu.:117.75 3rd Qu.:100.50
Max. :140.00 Max. :220.00
t.test(Spend$MenSpend, Spend$WomenSpend, alternative = "two.sided", mu = 0)
Welch Two Sample t-test
data: Spend$MenSpend and Spend$WomenSpend
t = 1.559, df = 45.469, p-value = 0.1259
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.984055 31.317388
sample estimates:
mean of x mean of y
100.90000 87.23333
# This test is not significant as indicated by a p-value = 0.1259,
# which is greater than .05. We are 95% confident that the true mean
# difference is between -3.984055 and 31.317388.
The alternative \(H_A\) and \(\mu\) are not changed off their default values, so the statement above could be simplified from what is provided to the statement below.
t.test(Spend$MenSpend, Spend$WomenSpend)
Welch Two Sample t-test
data: Spend$MenSpend and Spend$WomenSpend
t = 1.559, df = 45.469, p-value = 0.1259
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-3.984055 31.317388
sample estimates:
mean of x mean of y
100.90000 87.23333
# Take note of the order of the means present in the t-test function.
mean(Spend$MenSpend)  # 100.9
[1] 100.9
mean(Spend$WomenSpend) #87.23333
[1] 87.23333
Example of a right-tailed test: Is men’s spending (\(\bar{x}_1\)) greater than that of women’s spending (\(\bar{x}_2\))?
\(H_0\): Men’s spending is less than or equal to women’s spending (\(<=\)).
\(H_A\): Men’s spending is greater than women’s spending (\(>\)).
Example of a right-tailed command in R:
Again, the statement assumes that men’s spending and women’s spending are in 2 different columns in the data set. The alternative is changed off the default value, so the statement parameter is required.
t.test(Spend$MenSpend, Spend$WomenSpend, alternative = "greater")
Welch Two Sample t-test
data: Spend$MenSpend and Spend$WomenSpend
t = 1.559, df = 45.469, p-value = 0.06296
alternative hypothesis: true difference in means is greater than 0
95 percent confidence interval:
-1.0521 Inf
sample estimates:
mean of x mean of y
100.90000 87.23333
# This test is not significant as indicated by a p-value = 0.06296,
# which is greater than .05.
Example of a left-tailed test: Is men’s spending (\(\bar{x}_1\)) less than that of women’s spending (\(\bar{x}_2\))?
\(H_0\): Men’s spending is greater than or equal to women’s spending (\(>=\)).
\(H_A\): Men’s spending is less than women’s spending (\(<\)).
Example of a left-tailed command in R:
Again, the statement assumes that men’s spending and women’s spending are in 2 different columns in the data set. The alternative is changed off the default value, so the statement parameter is required.
t.test(Spend$MenSpend, Spend$WomenSpend, alternative = "less")
Welch Two Sample t-test
data: Spend$MenSpend and Spend$WomenSpend
t = 1.559, df = 45.469, p-value = 0.937
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf 28.38543
sample estimates:
mean of x mean of y
100.90000 87.23333
# This test is not significant as indicated by a p-value = 0.937,
# which is greater than .05.
Independent Samples Step 2: Compute the Test Statistic and p-value
Formula for the Test Statistic: \(t = \frac{m_1 - m_2}{\sqrt{s^2_1/n_1 + s^2_2/n_2}}\)
In the independent samples t-test formula, \(m_1\) is the mean of one group and \(m_2\) is the mean of the other group; the difference between the means makes up the numerator. The larger the difference between the group means, the larger the numerator will be and the larger the t-statistic will be!
The denominator includes the variances for the first group, \(s^2_1\), and the second group, \(s^2_2\), along with the sample sizes for each group, \(n_1\) and \(n_2\).
For the independent samples t-test, degrees of freedom are approximated using the Welch-Satterthwaite equation (computed automatically by R).
There is a 95% confidence interval around the difference between the groups.
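The formula above can be checked against t.test() directly. The sketch below uses made-up group values, so the numbers are purely illustrative:

```r
# Welch t-statistic by hand vs. t.test() (group values are made up).
g1 <- c(10, 12, 9, 11, 13, 10)
g2 <- c(8, 9, 7, 10, 8, 9)

t_manual <- (mean(g1) - mean(g2)) /
  sqrt(var(g1) / length(g1) + var(g2) / length(g2))
t_r <- unname(t.test(g1, g2)$statistic)

all.equal(t_manual, t_r)  # TRUE: the formula matches R's computation
```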
Calculate the p-value and Compare it to a Predetermined Alpha level
Independent Samples Step 3: Interpret and Conclude
Example of an Independent Samples t-Test in R
Using the nhanes data set, let’s bring over some code from the last section so that we can work this example. This includes reading in the data set and creating the subset for the ages 65 and over.
When conducting independent samples, grouping variables are common, so you can see if there is a difference between groups. In this example, we can test for a difference in blood pressure by sex (Male vs Female). This is coded as 1 and 2 in the data set, so we need to recode it before we begin the official test.
We can do a quick check of the mean values before we begin by using either tapply() (base R) or the tidyverse. The means seem close here, but that does not mean that we won’t find significance. We need to run the t-test to formally confirm whether a mean difference is present.
The tapply() approach uses base R, while the group_by() approach uses the tidyverse, to calculate these group means.
tapply() is a base R function used to apply a function, such as mean or sum, to subsets of a vector that are defined by one or more grouping factors. It takes three main arguments: the vector to be analyzed, the grouping factor(s), and the function to apply. This function is useful for performing quick, grouped summaries in base R. However, while tapply() is concise, it is less flexible than tidyverse tools like dplyr when handling more complex data manipulations or when working with data frames.
In the tidyverse, the group_by() and summarise() functions from the dplyr package are commonly used together to calculate summary statistics like the mean for different groups within a dataset. The group_by() function is used to define the grouping variable(s), essentially splitting the data into subsets based on the unique values of that variable. Once the data is grouped, summarise() is used to apply a summary function, such as mean(), to each group.
# A tibble: 2 × 2
sex mean_systolic
<fct> <dbl>
1 Male 122.
2 Female 119.
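Both grouped-mean approaches described above can be sketched on a small hypothetical data frame (column names and values are made up for illustration):

```r
# Grouped means two ways on a small made-up data frame.
df <- data.frame(sex = c("Male", "Male", "Female", "Female"),
                 systolic = c(120, 124, 118, 120))

# Base R: tapply(vector, grouping factor, function)
tapply(df$systolic, df$sex, mean)

# tidyverse equivalent (requires dplyr):
# library(dplyr)
# df %>% group_by(sex) %>% summarise(mean_systolic = mean(systolic))
```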
Step 1: Write null and alternative.
\(H_0\): There is no difference between systolic blood pressure for males and females (\(=\)).
\(H_A\): There is a difference between systolic blood pressure for males and females (\(\neq\)).
This is a two tailed test.
Step 2: Calculating test-statistic and p-value.
The statement below assumes that the grouping variable (sex) is in one column and the continuous variable (systolic) is in another column. In this case, instead of passing two columns separated by a comma (as above), we write the statement as t.test(continuousVar ~ groupingVar) like below.
# All other parameters can be left at default values
t.test(nhanes.2016.cleaned$systolic ~ nhanes.2016.cleaned$sex)
Welch Two Sample t-test
data: nhanes.2016.cleaned$systolic by nhanes.2016.cleaned$sex
t = 7.3135, df = 7143, p-value = 2.886e-13
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
2.347882 4.067432
sample estimates:
mean in group Male mean in group Female
122.1767 118.9690
Step 3: Interpret probability and results.
We found a test statistic of 7.3135 and a corresponding p-value of 2.886e-13. With alpha set to .05, the p-value is smaller (p-value < alpha). This indicates that we should reject the null and support the alternative that men and women had different systolic blood pressure. Thus, there is a statistically significant difference in the mean blood pressure of males and females in the 2016 data set (t(7143) = 7.31, p < .001).
Example Independent t-test Using a Subset
Create a subset of the data frame of people 65+ years old to run the same test on blood pressure (two-tailed test).
Welch Two Sample t-test
data: nhanes.2016.65plus.clean$systolic by nhanes.2016.65plus.clean$sex
t = -2.0141, df = 1231, p-value = 0.04422
alternative hypothesis: true difference in means between group Male and group Female is not equal to 0
95 percent confidence interval:
-4.5080955 -0.0591939
sample estimates:
mean in group Male mean in group Female
135.4486 137.7323
Step 3: Interpret probability and results.
We found a test statistic of -2.0141 and a corresponding p-value of 0.04422. With alpha set to .05, 0.04422 is smaller (p-value < alpha). This indicates that we should reject the null and support the alternative that men and women 65 or older had different systolic blood pressure. Thus, there is a statistically significant difference in the mean blood pressure of males and females over the age of 65 participating in the 2016 NHANES (t(1231) = -2.01, p = 0.044).
Dependent Samples t-Test
A dependent samples (or paired samples) t-test compares the means of two related groups.
“Dependent” means there is a deliberate pairing, more specifically a matched pairing, which generally comes from one of two designs:
“Before” and “after” studies characterized by a measurement, some type of intervention, and another measurement, all on the same subject.
Example: Measuring the weight of clients before and after a diet plan.
A pairing of matched observations, where the two measurements come from different but deliberately matched subjects rather than from the same subject sampled twice.
Example: Matching 20 adjacent plots of land using a nonorganic fertilizer on one half of the plot and an organic fertilizer on the other.
The parameter of interest is the mean difference \(D\), where \(D = x_1 - x_2\) and the random variables \(x_1\) and \(x_2\) form a matched pair.
This works under the usual CLT-related assumption that both \(x_1\) and \(x_2\) are normally distributed or that \(n \geq 30\).
t.test() Command for Dependent Samples t-Test
Option 1: Use the paired = TRUE parameter. The default for the command is paired = FALSE, which performs an independent samples t-test.
Option 2: Manually calculate the difference between the paired values and pass that as the single variable to t.test(). This is equivalent to the paired approach.
Summary: In a paired t-test, you’re comparing two related sets of observations, such as measurements taken from the same subjects under different conditions. The key is that each observation in one group has a corresponding observation in the other group. In R, you can perform a paired t-test by either specifying paired = TRUE in the t.test function or by directly calculating the differences between the paired observations and running the t-test on those differences. If you choose the second approach, you subtract one set of values from the other and then use that result as the first argument in the t.test function (essentially computing the differences whose population mean is \(\mu_d\)). This directly tests whether the mean of the differences is significantly different from zero, giving the same result as a paired t-test but with the differences calculated explicitly beforehand. Either method is fine.
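The equivalence of the two options can be sketched with made-up before/after vectors:

```r
# Option 1 vs. Option 2 on hypothetical weight data (values made up).
before <- c(200, 195, 210, 188, 205)
after  <- c(192, 190, 205, 185, 198)

p1 <- t.test(after, before, paired = TRUE)$p.value  # Option 1: paired = TRUE
p2 <- t.test(after - before, mu = 0)$p.value        # Option 2: one-sample on differences

all.equal(p1, p2)  # TRUE: both approaches give the same result
```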
Paired Step 1: Specify Null and Alternative Hypothesis
When conducting hypothesis tests concerning \(\mu_d\), the competing hypotheses will take one of the following forms:
Two-tailed Test: \(H_0: \mu_d = d_0\) versus \(H_A: \mu_d \neq d_0\)
Right-tailed Test: \(H_0: \mu_d \leq d_0\) versus \(H_A: \mu_d > d_0\)
Left-tailed Test: \(H_0: \mu_d \geq d_0\) versus \(H_A: \mu_d < d_0\)
Where \(d_0\) typically is equal to 0, but not always. If \(d_0\) is something other than 0, then you change the mu parameter in R in the t.test() command.
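As a sketch of the nonzero \(d_0\) case, the call below tests whether the mean paired difference exceeds 5 rather than 0 (the vectors are made up for illustration):

```r
# Hypothetical paired test with a nonzero d0: is the mean difference
# greater than 5? mu = 5 sets d0 in t.test().
after  <- c(70, 68, 75, 72, 71)
before <- c(62, 60, 66, 65, 63)
t.test(after, before, paired = TRUE, mu = 5, alternative = "greater")
```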
Dependent Samples Table
Example of a two-tailed test:
Is the weight before and after quitting smoking different from each other?
The before and after signifies a paired test.
\(H_0\): There is no difference in weight before and after quitting smoking (\(=\)).
\(H_A\): There is a difference in weight before and after quitting smoking (\(\neq\)).
Example of a two-tailed command in R for a paired test:
Smoke <- read.csv("data/Smoke.csv")
# Other default values are left off this statement. Assumes mu = 0.
t.test(Smoke$After, Smoke$Before, paired = TRUE)
Paired t-test
data: Smoke$After and Smoke$Before
t = 6.6072, df = 49, p-value = 2.695e-08
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
4.870942 9.129058
sample estimates:
mean difference
7
# This test is significant at the .001 level as indicated by the
# p-value of 2.695e-08.
Example of a right-tailed test:
Is the weight after quitting smoking (\(\mu_1\)) greater than the weight before (\(\mu_2\))?
\(H_0\): Weight after quitting smoking is less than or equal to the weight before (\(\leq\)).
\(H_A\): Weight after quitting smoking is greater than the weight before (\(>\)).
Example of a right-tailed command in R for a paired test:
# paired = TRUE because of the before/after design. Rearranging the
# hypothesis gives WeightAfterQuitting > WeightBeforeQuitting, which is
# what our hypothesis wants. Therefore, in the statement below, After
# comes first as m1, followed by a comma, followed by Before as m2. The
# alternative parameter is then set to 'greater'. Other default values
# are left off this statement.
t.test(Smoke$After, Smoke$Before, paired = TRUE, alternative = "greater")
Paired t-test
data: Smoke$After and Smoke$Before
t = 6.6072, df = 49, p-value = 1.347e-08
alternative hypothesis: true mean difference is greater than 0
95 percent confidence interval:
5.223767 Inf
sample estimates:
mean difference
7
# This test is significant at the .001 level as indicated by the
# p-value of 1.347e-08.
Example of a left-tailed test:
Is the weight after quitting smoking (\(\mu_1\)) less than the weight before (\(\mu_2\))?
\(H_0\): Weight after quitting smoking is greater than or equal to the weight before (\(\geq\)).
\(H_A\): Weight after quitting smoking is less than the weight before (\(<\)).
Example of a left-tailed command in R for a paired test:
# The alternative parameter switches to 'less', leaving all else the
# same as above. Again, other default values are left off this statement.
t.test(Smoke$After, Smoke$Before, paired = TRUE, alternative = "less")
Paired t-test
data: Smoke$After and Smoke$Before
t = 6.6072, df = 49, p-value = 1
alternative hypothesis: true mean difference is less than 0
95 percent confidence interval:
-Inf 8.776233
sample estimates:
mean difference
7
# This test is not significant as indicated by a p-value of 1, which
# is greater than .05.
Paired Step 2: Compute the Test Statistic and p-value
Formula for the Test Statistic: \(t = (m_d - d_0)/\sqrt{s^2_d/n_d}\)
Dependent samples t-test formula: \(m_d\) is the mean of the differences between the related measures, \(s^2_d\) is the variance of those differences, and \(n_d\) is the number of pairs. The dependent samples t-test works a little differently from the independent samples t-test: in this case, the formula uses the mean of the differences between the two related measures.
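The paired formula can be verified by hand against t.test(); the differences below are made up for illustration:

```r
# Verify the paired t formula by hand on made-up differences d.
d <- c(4, -1, 6, 3, 5, 2)
t_manual <- (mean(d) - 0) / sqrt(var(d) / length(d))  # (m_d - d_0)/sqrt(s_d^2/n_d)
t_r <- unname(t.test(d, mu = 0)$statistic)

all.equal(t_manual, t_r)  # TRUE
```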
Calculate the p-value and Compare it to a Predetermined Alpha level
Paired Step 3: Interpret and Conclude
The steps involved with setting up the null and alternative hypotheses and calculating the test statistic are based on different rules and formulas for all three t-tests, but the rest of the steps are the same.
Example of a Dependent Samples t-Test in R
Given the nhanes.2016 dataset, let’s ask the following research question: Is there a difference between measure 1 and 2 for systolic BP?
First, bring back in the data set from the last section.
Paired t-test
data: nhanes.2016.cleaned$systolic and nhanes.2016.cleaned$systolic2
t = 9.3762, df = 7100, p-value < 2.2e-16
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
0.4310514 0.6589360
sample estimates:
mean difference
0.5449937
Step 3: Interpret probability and results
We found a test statistic of 9.3762 and a corresponding p-value below 2.2e-16. With alpha set to .05, the p-value is smaller (p-value < alpha). This indicates that there is a difference between systolic blood pressure readings 1 and 2. Thus, there is a statistically significant difference in the mean blood pressure of readings 1 and 2 in the 2016 NHANES (t(7100) = 9.38, p < .001).
reshape function
The reshape function allows you to restructure a data frame by specifying a timevar (the variable that defines the different time points or conditions), an idvar (the variable that uniquely identifies each subject or observational unit), and a direction ("wide" or "long").
The syntax is as follows: reshape(data, timevar, idvar, direction, …)
The direction of reshaping, either “wide” to spread the data across columns or “long” to gather the data into rows.
data("sleep")
summary(sleep)
extra group ID
Min. :-1.600 1:10 1 :2
1st Qu.:-0.025 2:10 2 :2
Median : 0.950 3 :2
Mean : 1.540 4 :2
3rd Qu.: 3.400 5 :2
Max. : 5.500 6 :2
(Other):8
sleep2 <- reshape(sleep, timevar = "group", idvar = "ID", direction = "wide")
summary(sleep2)
ID extra.1 extra.2
1 :1 Min. :-1.600 Min. :-0.100
2 :1 1st Qu.:-0.175 1st Qu.: 0.875
3 :1 Median : 0.350 Median : 1.750
4 :1 Mean : 0.750 Mean : 2.330
5 :1 3rd Qu.: 1.700 3rd Qu.: 4.150
6 :1 Max. : 3.700 Max. : 5.500
(Other):4
Paired t-test
data: sleep2$extra.1 and sleep2$extra.2
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-2.4598858 -0.7001142
sample estimates:
mean difference
-1.58
Using tidyverse, we could get the same effect using the filter, select, and rename functions which are a part of dplyr. This method involves dividing the dataset into two separate groups based on a categorical variable. You then run the t-test on these two groups. In the code below, we use filter to divide by group and then after selecting and renaming the columns we need, we join back together using a full_join.
Using full_join ensures that all IDs are included in the resulting dataset, even if they only appear in one of the groups. This can be useful if your data might have missing group entries for some IDs.
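The tidyverse approach described above can be sketched as follows (requires dplyr; the renamed column names extra.1 and extra.2 are chosen to mirror the reshape() output):

```r
# dplyr version: split by group, rename, and rejoin on ID.
library(dplyr)

data("sleep")
g1 <- sleep %>% filter(group == 1) %>% select(ID, extra.1 = extra)
g2 <- sleep %>% filter(group == 2) %>% select(ID, extra.2 = extra)
sleep2 <- full_join(g1, g2, by = "ID")

t.test(sleep2$extra.1, sleep2$extra.2, paired = TRUE)
```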
data("sleep")
summary(sleep)
extra group ID
Min. :-1.600 1:10 1 :2
1st Qu.:-0.025 2:10 2 :2
Median : 0.950 3 :2
Mean : 1.540 4 :2
3rd Qu.: 3.400 5 :2
Max. : 5.500 6 :2
(Other):8
Paired t-test
data: sleep2$extra.1 and sleep2$extra.2
t = -4.0621, df = 9, p-value = 0.002833
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-2.4598858 -0.7001142
sample estimates:
mean difference
-1.58
In either case, we reject the null hypothesis and find that the type of drug does affect sleep in patients.
Review and Practice
Using AI
Using AI
Use the following prompts with our chatbot (bottom right of this page) to explore these concepts further.
Can you explain the difference between the null hypothesis and the alternative hypothesis, and why is it important to set up both correctly before conducting a t-test?
What factors determine whether you should use a one-sample, independent samples, or dependent samples t-test, and how do the results differ based on the test chosen?
When should I use a two-tailed t-test compared to a right- or left-tailed test, and how does the alternative hypothesis guide this decision?
What are the key arguments in the t.test() function in R, and how do I modify these arguments to perform different types of t-tests, such as two-tailed or paired t-tests?
How do I interpret the p-value in the context of hypothesis testing, and how does it relate to rejecting or failing to reject the null hypothesis?
How does the confidence interval calculated during a t-test help in understanding the range of possible population parameters, and why is it important in hypothesis testing?
What is the difference between an independent samples t-test and a dependent (paired) samples t-test, and in what situations should each be used?
Prompting AI effectively for hypothesis testing tasks:
Vague prompt
Specific prompt
“Run a t-test”
“Run a two-sample t-test comparing home_value between Summer and Winter observations in zillow.csv. State the null and alternative hypotheses, run t.test(), and interpret the p-value at α = 0.05. Note whether the result is statistically significant and whether the effect size appears practically meaningful.”
“Is this significant?”
“I ran t.test(score ~ group, data=mydata) and got t = 2.14, df = 48, p = 0.037. Explain: (1) what the p-value means precisely, (2) whether I should reject the null at α = 0.05, and (3) what the 95% confidence interval tells me about the size of the difference.”
“Compare these groups”
“I have pre- and post-training scores for the same 30 employees. Is a paired or independent samples t-test correct here? Write the t.test() code with paired = TRUE or paired = FALSE as appropriate, and explain why.”
Verification checklist for AI-generated t-test output:
After AI generates a t-test result and interpretation, check three things: (1) does the p-value actually fall below your stated alpha? — do not rely on AI’s “significant” label; read the p-value yourself; (2) did AI correctly identify whether this is one-tailed or two-tailed? — wrong tail direction is a common AI error; (3) does AI’s interpretation correctly distinguish between statistical significance (p < α) and practical significance (is the effect large enough to act on)? — AI will almost never raise the practical significance question unless you ask explicitly.
t-Test Lab
A campus dining service claims that the average wait time at peak lunch hours is 8 minutes. A student researcher times 40 randomly selected visits and finds a mean wait of 9.4 minutes (sd = 3.1 minutes). Without running the code, set up the null and alternative hypotheses for a two-tailed test, and identify which t-test is appropriate and why.
Show Answer
This is a one-sample t-test because we are comparing a single sample mean (9.4 minutes) to a known or claimed population value (8 minutes).
\(H_0\): The mean wait time is equal to 8 minutes. (\(\mu = 8\))
\(H_A\): The mean wait time is not equal to 8 minutes. (\(\mu \neq 8\))
Two-tailed because the researcher is testing for any difference, not a specific direction.
t.test(wait_times, mu = 8, alternative = "two.sided")
Identify whether each scenario below calls for a one-sample, independent samples, or dependent samples t-test. Briefly justify each choice.
A hospital compares the recovery times of patients assigned to two different physical therapy programs.
A nutritionist records a client’s sodium intake before and after a four-week dietary intervention.
A manufacturer wants to know if the average output voltage of a new circuit board differs from the specification of 5.0 volts.
A university compares the GPA of students who attended orientation versus those who did not.
Show Answer
a. Independent samples t-test. Two separate, unrelated groups (different patients in different programs). Each patient appears in only one group.
b. Dependent (paired) samples t-test. The same clients are measured twice — before and after. Each “before” observation is directly linked to the same person’s “after” observation.
c. One-sample t-test. One sample of circuit boards is being compared to a single known specification value (5.0 volts), not to another group.
d. Independent samples t-test. Two distinct, unrelated groups of students — those who attended orientation and those who did not. No natural pairing between individuals across groups.
A two-tailed one-sample t-test returns a t-statistic of 2.31 with a p-value of 0.024. Alpha is set at 0.05. State the conclusion in plain English, including both the formal decision and a sentence that communicates the result to a non-statistician.
Show Answer
Formal decision: Since the p-value (0.024) is less than alpha (0.05), we reject the null hypothesis.
Plain English conclusion: The sample provides statistically significant evidence that the true population mean is different from the hypothesized value. In other words, the difference we observed is unlikely to be due to random chance alone — there is real evidence that the actual average differs from what was claimed.
Note: If alpha had been set to 0.01, the p-value of 0.024 would be greater than alpha, and we would fail to reject the null. The conclusion depends on the alpha level chosen before running the test.
A researcher wants to test whether employees who participate in a wellness program report fewer sick days than those who do not. She collects data on 45 participants and 45 non-participants. Write out the null and alternative hypotheses, identify the test, specify the alternative argument in R’s t.test() function, and explain how the direction of the hypothesis affects which tail is examined.
Show Answer
Test: Independent samples t-test — two separate, unrelated groups (participants vs. non-participants).
Let \(\mu_1\) = mean sick days for wellness program participants and \(\mu_2\) = mean sick days for non-participants.
\(H_0: \mu_1 - \mu_2 \geq 0\) (participants have the same or more sick days)
\(H_A: \mu_1 - \mu_2 < 0\) (participants have fewer sick days — left-tailed)
t.test(sick_days ~ group, alternative = "less")
Why left-tailed? The hypothesis is directional — the researcher specifically predicts that participants will have fewer sick days, meaning \(\mu_1 < \mu_2\), which places the rejection region in the left tail of the distribution. If the data showed participants had more sick days, the test would not reject the null regardless of the magnitude of the difference.
A before-and-after study measures employee productivity scores before and after a training program for 12 employees. The differences (after minus before) are:
4, -1, 6, 3, 5, 2, 0, 7, 3, -2, 4, 5
By hand: (a) calculate the mean difference \(\bar{d}\), and (b) state what value of mu you would use in the t.test() call and why.
Show Answer
(a) Mean difference: the 12 differences sum to 36, so \(\bar{d} = 36/12 = 3.0\). The average productivity score increased by 3.0 points after training.
(b) Value of mu:
t.test(after, before, paired = TRUE, alternative = "two.sided", mu = 0)
We use mu = 0 because the null hypothesis for a paired test is that the mean difference is zero — meaning training had no effect on productivity. We are testing whether the observed mean difference of 3.0 is significantly different from zero.
Interpret the following R output from a dependent samples t-test. State the conclusion, identify the direction of the difference from the mean estimate, and explain what the confidence interval tells you.
Paired t-test
t = 4.1832, df = 11, p-value = 0.001501
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
1.463852 4.536148
sample estimates:
mean of the differences
3
Show Answer
Conclusion: Since the p-value (0.0015) is less than alpha (0.05), we reject the null hypothesis. There is statistically significant evidence that the mean difference is not equal to zero — the training program did change productivity scores (t(11) = 4.18, p = 0.0015).
Direction: The mean difference is positive (3.0), meaning scores were higher after the training program than before. The program appears to have increased productivity on average.
Confidence interval: We are 95% confident that the true mean increase in productivity falls between 1.46 and 4.54 points. Because the entire interval is above zero (it does not include zero), this corroborates the significant p-value — we are confident the true effect is positive and meaningful, not just a chance fluctuation.
Interactive R Lesson: t-Tests
Note
This lesson covers the one-sample, independent samples, and dependent (paired) samples t-test using built-in R datasets. All calculations use base R — no additional packages needed.
You can run real R code directly in the browser — no installation needed. Work through each section in order, then test yourself with the scored quiz at the bottom.
First-time load: The interactive R environment may take 10–20 seconds to initialize on your first visit. Once the Run Code buttons become active, you are ready to go.
Part 1: One-Sample t-Test
A one-sample t-test compares a single sample mean to a known or hypothesized population value. The null hypothesis always contains the equality sign; the alternative states the contradiction.
Load and summarize mtcars:
Test whether the true mean mpg is not equal to 25 (two-tailed):
Note
The p-value is well below 0.05 — we reject H₀ and conclude the true mean mpg is significantly different from 25. The sample mean is approximately 20, which is the direction driving the result.
Test whether the true mean mpg is less than 25 (left-tailed):
Test whether the true mean mpg is greater than 25 (right-tailed):
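The three one-sample tests described above can be run as follows on the built-in mtcars data:

```r
# One-sample t-tests of mean mpg against a hypothesized value of 25.
t.test(mtcars$mpg, mu = 25)                           # two-tailed
t.test(mtcars$mpg, mu = 25, alternative = "less")     # left-tailed
t.test(mtcars$mpg, mu = 25, alternative = "greater")  # right-tailed
```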
Tip
When the true mean is about 20 and we hypothesize 25, the left-tailed test is highly significant and the right-tailed test returns a p-value near 1 — meaning it is almost certain the mean is not greater than 25. Flipping the tail flips the probability: p-value (greater) ≈ 1 − p-value (less).
Part 2: Independent Samples t-Test
An independent samples t-test compares the means of two unrelated groups. When the grouping variable is in its own column, use the formula syntax Y ~ X.
Does the type of engine (vs) affect mpg?
Does transmission type (am: 0 = automatic, 1 = manual) affect mpg?
Do automatics get significantly less mpg than manuals (left-tailed)?
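The three mtcars comparisons described above can be run as follows:

```r
# Independent samples t-tests with formula syntax (Y ~ X).
t.test(mtcars$mpg ~ mtcars$vs)                        # engine shape, two-tailed
t.test(mtcars$mpg ~ mtcars$am)                        # transmission, two-tailed
t.test(mtcars$mpg ~ mtcars$am, alternative = "less")  # automatics (0) < manuals (1)?
```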
Note
Both the two-tailed and left-tailed tests are significant — manual transmissions get significantly more miles per gallon than automatics. Notice that alternative = "less" tests whether the first group (automatics, coded 0) has a smaller mean than the second (manuals, coded 1). Which group is “first” depends on the coding, so always check the group means in the output.
Part 3: Independent Samples t-Test — CO2 Dataset
Load and summarize CO2:
Does cold treatment (chilled vs. nonchilled) affect CO2 uptake?
Do nonchilled plants have significantly greater uptake than chilled plants?
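The CO2 comparisons described above can be run as follows (nonchilled is the first factor level, so it is group 1 in the directional test):

```r
# Independent samples t-tests on the built-in CO2 data.
data("CO2")
t.test(CO2$uptake ~ CO2$Treatment)                           # two-tailed
t.test(CO2$uptake ~ CO2$Treatment, alternative = "greater")  # nonchilled > chilled?
```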
Tip
Both tests are significant. Being chilled does cause plants to take up less CO₂. The group labeled first in the output is group 1 for the direction of the test — always check which group R lists as group 1 before specifying a directional alternative.
Part 4: Dependent (Paired) Samples t-Test
A paired samples t-test compares two related measurements — typically before/after or matched pairs. Each observation in group 1 is matched to a specific observation in group 2.
Load and summarize sleep:
Reshape the data from long to wide so each patient has one row:
Note
The reshape() function converts the data so that each of the 10 patients occupies one row, with extra.1 (Drug 1) and extra.2 (Drug 2) as separate columns. This structure is required for paired = TRUE in t.test().
Test whether the drug affected sleep duration:
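The reshape and paired test described above can be run as follows:

```r
# Reshape sleep to wide (one row per patient), then run the paired test.
data("sleep")
sleep2 <- reshape(sleep, timevar = "group", idvar = "ID", direction = "wide")
t.test(sleep2$extra.1, sleep2$extra.2, paired = TRUE)
```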
Note
The p-value is below 0.05 — we reject H₀ and conclude that the type of drug significantly affects sleep duration. The mean of differences tells you the direction: a negative value means Drug 1 produced less extra sleep than Drug 2 on average.
Scored Quiz: t-Tests
Hypothesis Testing and t-Tests Quiz
Question 1 of 10
1. Which hypothesis always contains the equality sign?
2. A p-value of 0.03 at alpha = 0.05 means:
3. To test whether the mean mpg is less than 25 using the mtcars dataset, you would write:
4. Why does t.test(mtcars$mpg, mu = 25, alternative = "greater") return a p-value near 1?
5. When running an independent samples t-test and the grouping variable is in its own column, the correct syntax is:
6. In t.test(mtcars$mpg ~ mtcars$am, alternative = "less"), what does alternative = "less" test?
7. What distinguishes a dependent (paired) samples t-test from an independent samples t-test?
8. Why does the paired t-test on the sleep dataset require reshape() first?
9. The CO2 test t.test(CO2$uptake ~ CO2$Treatment, alternative = "greater") is significant. What is the correct conclusion?
10. Which research scenario calls for a paired samples t-test rather than an independent samples t-test?
Note
No need to submit this quiz anywhere. This exercise is for your benefit to help you learn R.
Summary
This lesson covered the logic of hypothesis testing and three t-tests in R:
Topic
Key concepts
Hypothesis testing logic
Null (\(H_0\)) vs. alternative (\(H_A\)), p-value interpretation, decision rule (p < α = reject \(H_0\))
Tail types
Two-tailed (\(\neq\)), right-tailed (\(>\)), left-tailed (\(<\)); alternative hypothesis determines the tail
p-value approach
Compare p-value to pre-set alpha (typically 0.05); never “accept” \(H_0\), only fail to reject
t-distribution
Used when population SD is unknown; approaches normal as \(n\) increases; measured in standard errors
One-sample t-test
Compares sample mean to a known/hypothesized value; formula \(t = (m_x - \mu_x)/(s_x/\sqrt{n})\)
Independent samples t-test
Compares means of two unrelated groups; use ~ when groups are in one column, , when in two
Dependent samples t-test
Compares two related measurements; use paired = TRUE; formula uses mean of differences \(m_d\)
reshape()
Converts long-to-wide data for paired tests; tidyverse alternative uses filter() + full_join()
Confidence intervals
Output includes 95% CI around the estimate; interpreted alongside the p-value
What comes next: The ANOVA lesson extends hypothesis testing beyond two groups. Where the independent samples t-test compares the means of two populations, a one-way ANOVA compares three or more. The F-statistic and p-value interpretation follow the same logic introduced here.