3.5 Answer Key Activity 3.5 Applied Statistics

Introduction

Applied statistics is the bridge between raw data and meaningful decisions, and Activity 3.5 is a cornerstone exercise that helps students practice core concepts such as descriptive measures, probability distributions, hypothesis testing, and regression analysis. This article presents a complete answer key for Activity 3.5, explains the reasoning behind each solution, and highlights the statistical principles that the activity reinforces. Whether you are a teacher preparing a grading rubric, a tutor checking student work, or a self‑studying learner, the detailed walkthrough below will clarify every step, reduce grading time, and deepen understanding of applied statistics.

Overview of Activity 3.5

Activity 3.5 typically appears in introductory textbooks on applied statistics (e.g., Applied Statistics for the Social Sciences). It consists of five parts:

Part	Topic	Typical Prompt
A	Descriptive statistics	Compute mean, median, mode, range, variance, and standard deviation for a given data set.
B	Probability & discrete distributions	Determine probabilities for a Binomial experiment and calculate expected value and variance.
C	Confidence interval for a population mean	Construct a 95 % confidence interval using the t‑distribution.
D	Hypothesis testing (two‑sample t‑test)	Test whether two independent samples differ significantly at α = 0.05.
E	Simple linear regression	Estimate the regression line, interpret the slope, and evaluate model fit (R²).

The data supplied in the original textbook are reproduced here for completeness.

Data Set for Parts A & B

Observation	Value
1	12
2	15
3	9
4	14
5	11
6	13
7	10
8	16
9	8
10	12


Binomial Scenario for Part B

A quality‑control inspector examines 20 units, each with a 0.25 probability of being defective.

Sample Statistics for Part C

Sample size (n) = 25
Sample mean ( (\bar{x}) ) = 78.4
Sample standard deviation (s) = 9.2

Two‑Sample Data for Part D

Group	n	(\bar{x})	s
A (Control)	30	82.1	7.5
B (Treatment)	28	86.3	6.8

Regression Data for Part E

X (Hours studied)	Y (Test score)
2	68
4	75
6	80
8	85
10	92

Part A – Descriptive Statistics

1. Mean ( (\bar{x}) )

[ \bar{x} = \frac{\sum_{i=1}^{10} x_i}{10} = \frac{12+15+9+14+11+13+10+16+8+12}{10} = \frac{120}{10} = 12.0 ]

2. Median

Ordered data: 8, 9, 10, 11, 12, 12, 13, 14, 15, 16
With an even number of observations, the median is the average of the 5th and 6th values:

[ \text{Median} = \frac{12 + 12}{2} = 12 ]

3. Mode

The value 12 appears twice, more frequently than any other number → Mode = 12.

4. Range

[ \text{Range} = \max(x) - \min(x) = 16 - 8 = 8 ]

5. Variance ( (s^{2}) )

[ s^{2} = \frac{\sum (x_i-\bar{x})^{2}}{n-1} ]

[ \begin{aligned} \sum (x_i-\bar{x})^{2} &= (12-12)^{2}+(15-12)^{2}+\dots+(12-12)^{2}\\ &= 0+9+9+4+1+1+4+16+16+0 = 60 \end{aligned} ]

[ s^{2} = \frac{60}{9}=6.667 ]

6. Standard Deviation (s)

[ s = \sqrt{s^{2}} = \sqrt{6.667}\approx 2.58 ]

Answer Key – Part A

Mean = 12.0
Median = 12
Mode = 12
Range = 8
Variance = 6.67 (rounded to two decimals)
Standard deviation = 2.58
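
The Part A results can be double-checked in a few lines of Python. The sketch below uses only the standard-library `statistics` module. The variable names are illustrative and are not part of the activity.

```python
import statistics

# Data set for Parts A & B, copied from the table above.
values = [12, 15, 9, 14, 11, 13, 10, 16, 8, 12]

mean = statistics.mean(values)          # arithmetic mean
median = statistics.median(values)      # averages the two middle values when n is even
mode = statistics.mode(values)          # most frequent value
data_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(values)      # sample standard deviation

print(f"mean={mean}, median={median}, mode={mode}, range={data_range}")
print(f"variance={variance:.2f}, sd={std_dev:.2f}")
```

Note that `statistics.variance` and `statistics.stdev` use the n − 1 divisor, matching the sample formulas above. The population versions are `pvariance` and `pstdev`.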

Part B – Binomial Probability

The experiment: n = 20 trials, p = 0.25 (defective). Let X be the number of defective units.

1. Probability of exactly 5 defectives

[ P(X=5)=\binom{20}{5}p^{5}(1-p)^{15} ]

[ \binom{20}{5}=15504,\; p^{5}=0.25^{5}\approx 0.0009766,\; (1-p)^{15}=0.75^{15}\approx 0.01336 ]

[ P(X=5)=15504 \times 0.0009766 \times 0.01336 \approx 0.202 ]
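
As a cross-check, the same probability can be evaluated without rounding the intermediate factors. This is a minimal sketch in which the helper name `binomial_pmf` is an illustrative choice. It relies only on `math.comb` from the standard library.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Part B scenario: 20 inspected units, each defective with probability 0.25.
p_exactly_5 = binomial_pmf(5, n=20, p=0.25)
print(f"P(X = 5) = {p_exactly_5:.4f}")
```

Carrying full precision avoids the small drift that comes from rounding each factor before multiplying.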

2. Probability of at most 3 defectives

[ P(X\le 3)=\sum_{k=0}^{3}\binom{20}{k}p^{k}(1-p)^{20-k} ]

Calculations (rounded):

k	(\binom{20}{k})	(p^{k})	((1-p)^{20-k})	Product
0	1	1	0.75^{20}=0.0031712	0.0032
1	20	0.25	0.75^{19}=0.0042283	0.0211
2	190	0.0625	0.75^{18}=0.0056377	0.0669
3	1140	0.015625	0.75^{17}=0.0075169	0.1339

[ P(X\le 3)=0.0032+0.0211+0.0669+0.1339\approx 0.225 ]
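
The cumulative probability is a sum of four binomial terms, so it is easy to recompute at full precision. The sketch below is an illustration rather than part of the original activity. It lists each term and then adds them.

```python
from math import comb

n, p = 20, 0.25

# One term per k = 0..3, each computed as C(n, k) * p^k * (1 - p)^(n - k).
terms = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(4)]
p_at_most_3 = sum(terms)

for k, term in enumerate(terms):
    print(f"k={k}: {term:.4f}")
print(f"P(X <= 3) = {p_at_most_3:.4f}")
```

Summing the unrounded terms and rounding only at the end keeps the third decimal place reliable.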

3. Expected value (μ) and variance (σ²)

[ \mu = np = 20 \times 0.25 = 5 ]

[ \sigma^{2}=np(1-p)=20 \times 0.25 \times 0.75 = 3.75 ]
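
The closed forms np and np(1 − p) can also be confirmed numerically from the definition of expectation, by weighting each outcome by its probability. The following sketch assumes nothing beyond the Part B scenario. The variable names are illustrative.

```python
from math import comb, isclose

n, p = 20, 0.25

# Full probability mass function for k = 0..20.
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

mu = sum(k * pk for k, pk in enumerate(pmf))                    # E[X]
sigma_sq = sum((k - mu) ** 2 * pk for k, pk in enumerate(pmf))  # Var(X)

# The probability-weighted sums should agree with np and np(1 - p).
assert isclose(sum(pmf), 1.0)
assert isclose(mu, n * p)
assert isclose(sigma_sq, n * p * (1 - p))
print(f"mean = {mu:.2f}, variance = {sigma_sq:.2f}")
```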

Answer Key – Part B

(P(X=5) \approx 0.202) (20.2 %)
(P(X\le 3) \approx 0.225) (22.5 %)
Expected number of defectives = 5
Variance = 3.75

Part C – Confidence Interval for a Population Mean

Given: n = 25, (\bar{x}=78.4), s = 9.2, confidence level = 95 %. Because σ is unknown and must be estimated by s from a small sample (n = 25), use the t‑distribution rather than the normal.

1. Degrees of freedom (df)

(df = n-1 = 24)

2. Critical t‑value

(t_{0.025,24} \approx 2.064) (from t‑table).

3. Standard error (SE)

[ SE = \frac{s}{\sqrt{n}} = \frac{9.2}{\sqrt{25}} = \frac{9.2}{5}=1.84 ]

4. Margin of error (ME)

[ ME = t^{*}\times SE = 2.064 \times 1.84 \approx 3.80 ]

5. Confidence interval

[ \bar{x} \pm ME = 78.4 \pm 3.80 ]

[ \boxed{(74.6,\; 82.2)} ]

Interpretation: We are 95 % confident that the true population mean lies between 74.6 and 82.2.

Answer Key – Part C

95 % CI for μ: (74.6, 82.2)
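
The interval arithmetic can be sketched as a small helper. The Python standard library has no t‑quantile function, so this example assumes the critical value is read from a t‑table and passed in as `t_crit`. The function name is illustrative.

```python
from math import sqrt


def t_confidence_interval(mean: float, sd: float, n: int, t_crit: float):
    """Return (lower, upper) for mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / sqrt(n)
    return mean - margin, mean + margin


# t_crit = 2.064 is the table value for df = 24 quoted above; it is not computed here.
lower, upper = t_confidence_interval(mean=78.4, sd=9.2, n=25, t_crit=2.064)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```

Keeping the critical value as a parameter makes it straightforward to reuse the helper for other confidence levels or sample sizes.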

Part D – Two‑Sample t‑Test (Independent Samples)

Goal: Test (H_{0}: \mu_{A} = \mu_{B}) against (H_{a}: \mu_{A} \neq \mu_{B}) at α = 0.05.

1. Pooled standard deviation (Sp)

[ S_{p} = \sqrt{\frac{(n_{A}-1)s_{A}^{2}+(n_{B}-1)s_{B}^{2}}{n_{A}+n_{B}-2}} ]

[ \begin{aligned} S_{p} &= \sqrt{\frac{(30-1)7.5^{2}+(28-1)6.8^{2}}{30+28-2}}\\ &= \sqrt{\frac{29 \times 56.25 + 27 \times 46.24}{56}}\\ &= \sqrt{\frac{1631.25 + 1248.48}{56}}\\ &= \sqrt{\frac{2879.73}{56}}\\ &= \sqrt{51.42}\approx 7.17 \end{aligned} ]

2. Test statistic (t)

[ t = \frac{\bar{x}_{B}-\bar{x}_{A}}{S_{p}\sqrt{\frac{1}{n_{A}}+\frac{1}{n_{B}}}} = \frac{86.3-82.1}{7.17\sqrt{\frac{1}{30}+\frac{1}{28}}} ]

[ \sqrt{\frac{1}{30}+\frac{1}{28}} = \sqrt{0.03333+0.03571}= \sqrt{0.06905}\approx 0.2628 ]

[ t = \frac{4.2}{7.17 \times 0.2628} = \frac{4.2}{1.884} \approx 2.23 ]

3. Degrees of freedom

(df = n_{A}+n_{B}-2 = 56)

4. Critical t‑value (two‑tailed, α/2 = 0.025)

(t_{0.025,56} \approx 2.003)

5. Decision

t statistic	Critical value	Decision
2.23	2.003	Reject (H_{0})

Since |t| = 2.23 exceeds the critical value, the difference between groups is statistically significant at the 5 % level.
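
The pooled calculation can be reproduced from the summary statistics alone. The sketch below is a minimal illustration. The helper name `pooled_t` is hypothetical, and the Group B standard deviation is taken as 6.8, the value used in the pooled formula above.

```python
from math import sqrt


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df, pooled_sd) for an equal-variance two-sample t-test."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / df
    pooled_sd = sqrt(pooled_var)
    t = (mean_b - mean_a) / (pooled_sd * sqrt(1 / n_a + 1 / n_b))
    return t, df, pooled_sd


# Group A (control) and Group B (treatment) summary statistics from Part D.
t_stat, df, s_p = pooled_t(82.1, 7.5, 30, 86.3, 6.8, 28)
print(f"S_p = {s_p:.2f}, t = {t_stat:.2f}, df = {df}")
```

The resulting t statistic is then compared with the two‑tailed critical value from a t‑table, exactly as in the decision step.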

6. Effect size (Cohen’s d) – optional but often requested

[ d = \frac{\bar{x}_{B}-\bar{x}_{A}}{S_{p}} = \frac{4.2}{7.17} \approx 0.59 ]

Interpretation: a medium effect size.

Answer Key – Part D

t = 2.23, df = 56
Critical t ≈ 2.003 → Reject (H_{0}) (significant difference)
Cohen’s d ≈ 0.59 (medium effect).

Part E – Simple Linear Regression

Data: (X = hours studied, Y = test score)

X	Y
2	68
4	75
6	80
8	85
10	92

1. Compute sums

[ \begin{aligned} \sum X &= 2+4+6+8+10 = 30\\ \sum Y &= 68+75+80+85+92 = 400\\ \sum XY &= 2\cdot68 + 4\cdot75 + 6\cdot80 + 8\cdot85 + 10\cdot92 = 136 + 300 + 480 + 680 + 920 = 2516\\ \sum X^{2} &= 2^{2}+4^{2}+6^{2}+8^{2}+10^{2}=4+16+36+64+100=220 \end{aligned} ]

2. Slope (b)

[ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^{2} - (\sum X)^{2}} = \frac{5(2516) - 30(400)}{5(220) - 30^{2}} ]

[ b = \frac{12580 - 12000}{1100 - 900}= \frac{580}{200}=2.90 ]

3. Intercept (a)

[ a = \bar{Y} - b\bar{X} ]

[ \bar{X}= \frac{30}{5}=6,\qquad \bar{Y}= \frac{400}{5}=80 ]

[ a = 80 - 2.90(6) = 80 - 17.4 = 62.6 ]

Regression equation:

[ \boxed{\hat{Y}=62.6 + 2.9X} ]
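
The slope and intercept formulas translate directly into code. The sketch below mirrors the hand calculation term by term. The variable names are illustrative and only built‑in functions are used.

```python
hours = [2, 4, 6, 8, 10]        # X: hours studied
scores = [68, 75, 80, 85, 92]   # Y: test score
n = len(hours)

# The four sums from step 1.
sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)

# Least-squares slope and intercept (steps 2 and 3).
slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
intercept = sum_y / n - slope * (sum_x / n)
print(f"Y-hat = {intercept:.1f} + {slope:.1f}X")
```

On Python 3.10 or later, `statistics.linear_regression(hours, scores)` should return the same two coefficients and can serve as an independent check.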

4. Interpretation of slope

For each additional hour studied, the predicted test score increases by 2.9 points.

5. Coefficient of determination (R²)

First compute total sum of squares (SST) and regression sum of squares (SSR).

[ SST = \sum (Y_i - \bar{Y})^{2} ]

[ \begin{aligned} (Y_i-\bar{Y})^{2}:\quad &(68-80)^{2}=144\\ &(75-80)^{2}=25\\ &(80-80)^{2}=0\\ &(85-80)^{2}=25\\ &(92-80)^{2}=144 \end{aligned} ]

[ SST = 144+25+0+25+144 = 338 ]

Predicted values (\hat{Y}_i) using the line:

X	(\hat{Y})
2	62.6 + 2.9·2 = 68.4
4	62.6 + 2.9·4 = 74.2
6	62.6 + 2.9·6 = 80.0
8	62.6 + 2.9·8 = 85.8
10	62.6 + 2.9·10 = 91.6

[ SSR = \sum (\hat{Y}_i - \bar{Y})^{2} ]

[ \begin{aligned} (68.4-80)^{2}&=134.56\\ (74.2-80)^{2}&=33.64\\ (80.0-80)^{2}&=0\\ (85.8-80)^{2}&=33.64\\ (91.6-80)^{2}&=134.56 \end{aligned} ]

[ SSR = 134.56+33.64+0+33.64+134.56 = 336.4 ]

[ R^{2}= \frac{SSR}{SST}= \frac{336.4}{338}\approx 0.995 ]

About 99.5 % of the variation in test scores is explained by hours studied, indicating an excellent fit. Because SST = SSR + SSE for a least‑squares line with an intercept, R² can never exceed 1.

6. Residual standard error (optional)

[ SE = \sqrt{\frac{SST-SSR}{n-2}} = \sqrt{\frac{338-336.4}{3}} = \sqrt{\frac{1.6}{3}} \approx 0.73 ]

The residual standard error is about 0.73 points, small relative to the 24‑point spread in the observed scores, confirming that the linear model captures almost all of the variability.
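
As a full‑precision check on the sums of squares, the sketch below recomputes SST, SSR, and the residual sum of squares (SSE) directly from the data. It takes the fitted coefficients from steps 2 and 3 as given. The variable names are illustrative.

```python
from math import sqrt

hours = [2, 4, 6, 8, 10]
scores = [68, 75, 80, 85, 92]
slope, intercept = 2.9, 62.6   # fitted values from steps 2 and 3

mean_y = sum(scores) / len(scores)
fitted = [intercept + slope * x for x in hours]

sst = sum((y - mean_y) ** 2 for y in scores)             # total variation
ssr = sum((f - mean_y) ** 2 for f in fitted)             # explained by the line
sse = sum((y - f) ** 2 for y, f in zip(scores, fitted))  # left in the residuals

r_squared = ssr / sst
residual_se = sqrt(sse / (len(scores) - 2))
print(f"SST={sst:.1f} SSR={ssr:.1f} SSE={sse:.1f}")
print(f"R^2={r_squared:.4f}, residual SE={residual_se:.3f}")
```

For a least‑squares line with an intercept, SST = SSR + SSE, so the ratio SSR/SST cannot exceed 1. Checking that identity is a quick way to catch arithmetic slips in a hand calculation.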

Answer Key – Part E

Slope (b) = 2.90
Intercept (a) = 62.6
Regression equation: (\hat{Y}=62.6+2.9X)
Interpretation: each extra study hour raises the predicted score by 2.9 points.
R² ≈ 0.995 (very strong linear relationship).

Frequently Asked Questions (FAQ)

1. Why do we use the t‑distribution in Part C instead of the normal distribution?

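The population standard deviation (σ) is unknown and must be estimated by s. In that case the standardized sample mean, (\bar{x}-\mu)/(s/\sqrt{n}), follows a t‑distribution with n − 1 = 24 degrees of freedom, assuming an approximately normal population. With a small sample such as n = 25, the t‑distribution has noticeably heavier tails than the normal and yields a slightly wider confidence interval, reflecting the extra uncertainty.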


2. Can the binomial expected value be non‑integer, even though the variable counts successes?

Yes. The expected value (np) is a theoretical average over many repetitions. It need not be an integer; it simply indicates the long‑run mean number of successes.

3. What assumptions underlie the two‑sample t‑test in Part D?

Independence of observations within and between groups.
Normality of each group’s distribution (with roughly 30 observations per group, the t‑test is reasonably robust to moderate departures from normality).
Equal variances (homoscedasticity); we used the pooled variance, so we assume σ₁ ≈ σ₂. If variances differ markedly, a Welch’s t‑test would be more appropriate.
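
To see how much the equal‑variance assumption matters here, Welch’s statistic can be computed from the same summary statistics. This is a sketch, not part of the original activity. The helper name `welch_t` is hypothetical, and the degrees of freedom use the Welch–Satterthwaite approximation.

```python
from math import sqrt


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    var_a, var_b = sd_a**2 / n_a, sd_b**2 / n_b  # squared standard errors
    t = (mean_b - mean_a) / sqrt(var_a + var_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (var_a + var_b) ** 2 / (var_a**2 / (n_a - 1) + var_b**2 / (n_b - 1))
    return t, df


# Same Part D summary statistics (Group B standard deviation taken as 6.8).
t_welch, df_welch = welch_t(82.1, 7.5, 30, 86.3, 6.8, 28)
print(f"Welch t = {t_welch:.2f}, df = {df_welch:.1f}")
```

Because the two sample standard deviations and sample sizes are similar, the Welch result lands very close to the pooled one, so the conclusion does not hinge on the equal‑variance assumption.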

4. Can the regression R² in Part E exceed 1?

No. For a least‑squares line with an intercept, SST = SSR + SSE, so SSR can never exceed SST and R² always lies between 0 and 1. A hand calculation that produces a ratio above 1 signals an arithmetic slip (for example, a mis‑squared deviation in SSR) rather than a rounding effect. For these data, SSR = 336.4 and SST = 338, so R² ≈ 0.995.

5. How can I extend Activity 3.5 for a more advanced class?

Replace the simple linear regression with multiple regression (add a second predictor, e.g., prior GPA).
Introduce non‑parametric alternatives for the two‑sample test (Mann‑Whitney U).
Use bootstrapping to construct confidence intervals for the median instead of the mean.
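
As one illustration of the bootstrapping extension, the sketch below builds a percentile bootstrap interval for the median of the Part A data. The function name, the number of resamples, and the fixed seed are all assumptions made for the example. Ten observations are too few for a dependable interval, so treat this purely as a classroom demonstration.

```python
import random
import statistics


def bootstrap_median_ci(data, n_resamples=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    medians = sorted(
        statistics.median(rng.choices(data, k=len(data)))  # resample with replacement
        for _ in range(n_resamples)
    )
    alpha = 1 - level
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]


values = [12, 15, 9, 14, 11, 13, 10, 16, 8, 12]  # Part A data
low, high = bootstrap_median_ci(values)
print(f"bootstrap 95% CI for the median: ({low}, {high})")
```

The percentile method is the simplest bootstrap interval. More advanced classes might compare it with bias‑corrected variants.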

Conclusion

The answer key for Activity 3.5 – Applied Statistics not only supplies the correct numerical results but also walks through the logical steps that transform raw data into statistical insight. Teachers can use this key as a grading template, while learners can use it to verify their work and deepen conceptual understanding. By mastering these calculations (descriptive measures, binomial probabilities, confidence intervals, hypothesis testing, and regression analysis), students gain a solid foundation for real‑world data analysis. Remember that statistics is as much about interpretation as it is about computation; always relate the numbers back to the substantive context, whether it be quality control, educational performance, or any field where data drive decisions.