ISE 315: Engineering Statistics

Lecture 2 Handout: Review of Estimation

Instructor: Mansur M. Arief, PhD
Course: ISE 315 - Engineering Statistics


Learning Objectives

After completing this reading, you should be able to:

  1. Define and distinguish between population parameters and sample statistics
  2. Explain what a sampling distribution is and why it matters
  3. Evaluate estimators using properties like bias, variance, and MSE
  4. Apply the Central Limit Theorem to find probabilities involving sample means
  5. Solve problems involving sampling distributions of various statistics

1. The Big Picture: From Probability to Statistical Inference

In ISE 205 (Probability), you learned to answer questions like: “If the population has mean μ = 50, what’s the probability of observing a sample mean greater than 55?” You moved from known parameters to observed data.

In ISE 315 (Statistics), we reverse this process. We ask: “Given our observed sample data, what can we infer about the unknown population parameters?” This is statistical inference—moving from observed data to unknown parameters.

Statistical inference has two main branches:

| Branch | Question | Example |
| --- | --- | --- |
| Parameter Estimation | "What is $\mu$?" | Point estimate: $\hat{\mu} = \bar{x}$; Interval estimate: $\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$ |
| Hypothesis Testing | "Is $\mu = 50$?" | $H_0: \mu = 50$ vs $H_1: \mu \neq 50$; Decision: Reject or Fail to Reject |

This lecture focuses on parameter estimation, specifically point estimation and the sampling distributions that make inference possible.


2. Key Terminology

2.1 Random Sample

A random sample of size $n$ consists of random variables $X_1, X_2, \ldots, X_n$ that satisfy two conditions:

  1. Independence: The $X_i$’s are independent random variables
  2. Identical Distribution: Every $X_i$ has the same probability distribution

When we actually collect data, we observe specific numerical values $x_1, x_2, \ldots, x_n$, which are called the realizations or observed values of the random sample.

Example (Autonomous Vehicle Safety): An AV company is evaluating the detection accuracy of their LiDAR sensors across a fleet of test vehicles. They randomly select $n = 50$ test drives and measure the object detection accuracy (% of correctly identified obstacles) for each drive. Each measurement $X_i$ is a random variable before the test is conducted. After testing, they obtain observed values like $x_1 = 97.2\%$, $x_2 = 96.8\%$, etc.

2.2 Statistic

A statistic is any function of the sample observations that does not depend on unknown parameters. Important points:

  • A statistic is itself a random variable: its value changes from sample to sample.
  • Examples include the sample mean $\bar{X}$, the sample variance $S^2$, and the sample maximum $\max(X_i)$.
  • A quantity such as $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is not a statistic when $\mu$ and $\sigma$ are unknown, because it cannot be computed from the data alone.

2.3 Point Estimator

A point estimator is a statistic used to estimate an unknown population parameter. We use the “hat” notation to distinguish estimators from parameters:

| Parameter | Symbol | Point Estimator | Symbol |
| --- | --- | --- | --- |
| Population mean | $\mu$ | Sample mean | $\hat{\mu} = \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ |
| Population variance | $\sigma^2$ | Sample variance | $\hat{\sigma}^2 = S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2$ |
| Population proportion | $p$ | Sample proportion | $\hat{p} = \frac{\text{number of successes}}{n}$ |
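As a quick illustration, all three point estimates can be computed from a handful of observations. The data values and the 97% accuracy target below are made up for demonstration:

```python
# Computing point estimates (mean, variance, proportion) from a small sample.
# The observations are hypothetical detection accuracies (%).
import statistics

sample = [97.2, 96.8, 98.1, 95.9, 97.5]

mu_hat = statistics.mean(sample)           # estimate of the population mean
sigma2_hat = statistics.variance(sample)   # sample variance (n - 1 denominator)

# Sample proportion: fraction of drives exceeding a 97% accuracy target
p_hat = sum(x > 97.0 for x in sample) / len(sample)

print(mu_hat, sigma2_hat, p_hat)
```

Note that `statistics.variance` uses the $n-1$ denominator, matching the definition of $S^2$ in the table.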

2.4 Sampling Distribution

The sampling distribution of a statistic is the probability distribution of that statistic. This is a crucial concept:

  • Because a statistic is a random variable, it has its own distribution, distinct from the population distribution.
  • The sampling distribution describes how the statistic would vary if we repeated the entire sampling process many times.
  • All of statistical inference—confidence intervals, hypothesis tests—rests on knowing, at least approximately, the sampling distribution.

Key Insight: The sampling distribution depends on the statistic formula, the sample size $n$, and the population distribution.


3. Properties of Estimators

Not all estimators are equally good. We evaluate estimators using several criteria:

3.1 Bias and Unbiasedness

An estimator $\hat{\theta}$ is unbiased for parameter $\theta$ if:

\[E[\hat{\theta}] = \theta\]

The bias of an estimator is:

\[\text{Bias}(\hat{\theta}) = E[\hat{\theta}] - \theta\]

If bias equals zero, the estimator is unbiased. If bias is positive, the estimator tends to overestimate; if negative, it tends to underestimate.

Example (LLM Safety): When estimating the true rate $p$ of harmful responses from an LLM chatbot, the sample proportion $\hat{p} = \frac{\text{harmful responses}}{n}$ is an unbiased estimator because $E[\hat{p}] = p$ for any sample size $n$.
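A short simulation makes unbiasedness concrete: averaging $\hat{p}$ over many repeated samples recovers $p$. The harm rate $p = 0.02$, sample size, and repetition count below are hypothetical choices for illustration:

```python
# Monte Carlo check that E[p_hat] is (approximately) p for the sample proportion.
import random

random.seed(0)
p, n, reps = 0.02, 100, 20000          # hypothetical harm rate and sample size

# Average of p_hat over many simulated samples of n Bernoulli(p) responses
avg_p_hat = sum(
    sum(random.random() < p for _ in range(n)) / n
    for _ in range(reps)
) / reps

print(avg_p_hat)   # should land very close to p = 0.02
```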

3.2 Variance

The variance of an estimator measures how much it varies from sample to sample:

\[\text{Var}(\hat{\theta}) = E[(\hat{\theta} - E[\hat{\theta}])^2]\]

Lower variance means more consistent estimates across different samples.

3.3 Standard Error

The standard error is the standard deviation of the sampling distribution:

\[SE(\hat{\theta}) = \sqrt{\text{Var}(\hat{\theta})}\]

For the sample mean: $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}$

3.4 Mean Squared Error (MSE)

The Mean Squared Error combines both bias and variance into a single measure:

\[MSE(\hat{\theta}) = E[(\hat{\theta} - \theta)^2] = \text{Var}(\hat{\theta}) + [\text{Bias}(\hat{\theta})]^2\]

Ideal Estimator: Low MSE (achieved through low variance AND low bias)
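The decomposition $MSE = \text{Var} + \text{Bias}^2$ can be checked numerically. The sketch below uses a shrinkage estimator $0.7\bar{X}$ with an arbitrary Normal population; all numbers are illustrative:

```python
# Numerically verify MSE = Var + Bias^2 for the biased estimator 0.7 * Xbar.
import random

random.seed(1)
mu, sigma, n, reps = 50.0, 10.0, 25, 5000   # illustrative population and sizes

ests = []
for _ in range(reps):
    xbar = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    ests.append(0.7 * xbar)                  # shrinkage estimate for this sample

mean_est = sum(ests) / len(ests)
var_est = sum((e - mean_est) ** 2 for e in ests) / len(ests)
bias = mean_est - mu
mse = sum((e - mu) ** 2 for e in ests) / len(ests)

print(mse, var_est + bias ** 2)   # the two quantities agree (algebraic identity)
```

Here the theoretical values are $\text{Bias} = 0.7\mu - \mu = -15$, $\text{Var} = 0.49\,\sigma^2/n = 1.96$, so $MSE \approx 226.96$.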

3.5 Comparing Estimators: An Example

Consider four different estimators for the population mean $\mu$:

| Estimator | Formula | Unbiased? | Variance Behavior |
| --- | --- | --- | --- |
| Sample mean | $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ | Yes | Decreases as $n$ increases |
| Half-range | $\hat{X} = \frac{\max(X_i) + \min(X_i)}{2}$ | Yes (for symmetric distributions) | Decreases slowly |
| First sample | $X^\dagger = X_1$ | Yes | Constant (doesn't improve with $n$) |
| Shrinkage | $X^* = 0.7\bar{X}$ | No (biased toward 0) | Decreases as $n$ increases |

Analysis:

  • The sample mean is unbiased and its variance $\sigma^2/n$ shrinks at the fastest rate, making it the standard choice.
  • The half-range uses only the two extreme observations, so its variance shrinks slowly, and it is unbiased only for symmetric populations.
  • Using only $X_1$ is unbiased but wastes the other $n - 1$ observations: its variance stays at $\sigma^2$ no matter how large the sample is.
  • The shrinkage estimator trades bias for reduced variance; its MSE can be competitive when $\mu$ is near zero, but it is systematically off otherwise.
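A Monte Carlo experiment shows these behaviors directly. The Normal population with $\mu = 50$, $\sigma = 10$ and the sample size are assumed purely for illustration:

```python
# Empirical bias and variance of the four estimators of mu from the table,
# under an assumed Normal(mu=50, sigma=10) population.
import random

random.seed(2)
mu, sigma, n, reps = 50.0, 10.0, 30, 4000

def summarize(values, target):
    """Return (empirical bias, empirical variance) of a list of estimates."""
    m = sum(values) / len(values)
    v = sum((x - m) ** 2 for x in values) / len(values)
    return m - target, v

samples = [[random.gauss(mu, sigma) for _ in range(n)] for _ in range(reps)]
xbar      = [sum(s) / n for s in samples]
halfrange = [(max(s) + min(s)) / 2 for s in samples]
first     = [s[0] for s in samples]
shrink    = [0.7 * sum(s) / n for s in samples]

for name, est in [("sample mean", xbar), ("half-range", halfrange),
                  ("first sample", first), ("shrinkage", shrink)]:
    print(name, summarize(est, mu))
```

The printout should show near-zero bias for the first three estimators, a large negative bias for the shrinkage estimator, and a much larger variance for the single-observation estimator than for the sample mean.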


4. Sampling Distribution of the Sample Mean

4.1 Key Results

For a random sample $X_1, X_2, \ldots, X_n$ from a population with mean $\mu$ and variance $\sigma^2$, the sample mean $\bar{X}$ has:

\[E[\bar{X}] = \mu \quad \text{(unbiased)}\]

\[\text{Var}(\bar{X}) = \frac{\sigma^2}{n} \quad \text{(shrinks as } n \text{ increases)}\]

\[SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}\]

4.2 Population Distribution vs. Sampling Distribution

These two distributions are fundamentally different:

| Aspect | Population Distribution | Sampling Distribution of $\bar{X}$ |
| --- | --- | --- |
| What it describes | Individual observations $X_i$ | Sample means $\bar{X}$ |
| Shape | Fixed (whatever the population is) | Becomes Normal as $n$ increases (CLT) |
| Mean | $\mu$ | $\mu_{\bar{X}} = \mu$ (same) |
| Variance | $\sigma^2$ | $\sigma_{\bar{X}}^2 = \sigma^2/n$ (smaller) |
| Changes with $n$? | No | Yes (gets narrower) |

Example (Oil & Gas Pipeline): The flow rate through a crude oil pipeline section follows a Uniform distribution between 800 and 1200 barrels per hour, i.e., $X_i \sim \text{Uniform}(800, 1200)$. Thus $\mu = 1000$ barrels/hour and $\sigma^2 = (1200-800)^2/12 = 13,333$ (barrels/hour)².

For $n = 12$ hourly measurements: The sampling distribution of $\bar{X}$ has mean $\mu_{\bar{X}} = 1000$ barrels/hour and variance $\sigma_{\bar{X}}^2 = \frac{13,333}{12} \approx 1,111$, giving $\sigma_{\bar{X}} \approx 33.3$ barrels/hour.
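A simulation of this example confirms the calculation: drawing many samples of 12 Uniform(800, 1200) readings and averaging each reproduces the mean and standard error derived above.

```python
# Simulating the sampling distribution of the mean of n = 12 hourly flow rates.
import random

random.seed(3)
n, reps = 12, 20000
means = [sum(random.uniform(800, 1200) for _ in range(n)) / n
         for _ in range(reps)]

grand_mean = sum(means) / reps
sd = (sum((m - grand_mean) ** 2 for m in means) / reps) ** 0.5

print(grand_mean, sd)   # near 1000 and 33.3 barrels/hour, matching the theory
```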


5. The Central Limit Theorem (CLT)

5.1 Statement

Central Limit Theorem: If $X_1, X_2, \ldots, X_n$ is a random sample from a population with mean $\mu$ and variance $\sigma^2$, then as $n$ becomes large, the sampling distribution of the sample mean $\bar{X}$ approaches a Normal distribution:

\[\bar{X} \;\overset{\text{approx.}}{\sim}\; N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)\]

(Here, as elsewhere in these notes, the second parameter of the Normal distribution is written as the standard deviation, $\sigma/\sqrt{n}$.)

Equivalently, the standardized statistic:

\[Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\]

approaches the Standard Normal distribution $N(0, 1)$.

5.2 Why CLT Matters

The CLT is remarkable because:

  1. It works regardless of the population distribution—whether uniform, exponential, skewed, or anything else
  2. It enables probability calculations using Normal distribution tables
  3. It justifies many statistical procedures that assume normality

Rule of thumb: The CLT approximation is usually reasonable for $n \geq 30$, though it works well for smaller $n$ if the population is not too skewed.
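The following sketch illustrates the rule of thumb: even for a strongly skewed exponential population, the standardized sample mean behaves almost like $N(0, 1)$. The population mean and sample size are arbitrary choices:

```python
# CLT illustration: fraction of standardized sample means within +/- 1.96,
# which the Normal approximation predicts to be about 0.95.
import math
import random

random.seed(4)
mu = sigma = 5.0               # for Exponential(mean = 5), sigma equals mu
n, reps = 40, 10000

inside = 0
for _ in range(reps):
    xbar = sum(random.expovariate(1 / mu) for _ in range(n)) / n
    z = (xbar - mu) / (sigma / math.sqrt(n))
    inside += abs(z) < 1.96

print(inside / reps)           # close to 0.95 despite the skewed population
```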

5.3 Computing Probabilities Using CLT

Problem (Oil & Gas Pipeline Monitoring): A pipeline operator monitors flow rates that follow $X_i \sim \text{Uniform}(800, 1200)$ barrels/hour. Based on $n = 12$ hourly readings, what is the probability that the average flow rate exceeds 1050 barrels/hour?

Solution:

  1. Find population parameters: $\mu = 1000$, $\sigma^2 = 13,333$, so $\sigma \approx 115.5$ barrels/hour

  2. Find sampling distribution parameters:
    • $\mu_{\bar{X}} = \mu = 1000$ barrels/hour
    • $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{115.5}{\sqrt{12}} \approx 33.3$ barrels/hour
  3. Standardize: \(Z = \frac{1050 - 1000}{33.3} = \frac{50}{33.3} \approx 1.50\)

  4. Use Standard Normal table: \(P(\bar{X} > 1050) = P(Z > 1.50) = 1 - \Phi(1.50) \approx 1 - 0.9332 = 0.0668\)

There is approximately a 6.7% chance that the average flow rate exceeds 1050 barrels/hour.
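The same calculation can be reproduced with Python's standard library (`statistics.NormalDist`), avoiding a table lookup:

```python
# Reproducing the worked pipeline calculation: P(Xbar > 1050) under the CLT.
import math
from statistics import NormalDist

mu, sigma2, n = 1000, 400**2 / 12, 12      # Uniform(800, 1200) parameters
se = math.sqrt(sigma2 / n)                 # standard error of Xbar, ~33.3

prob = 1 - NormalDist(mu, se).cdf(1050)
print(round(se, 1), round(prob, 4))        # SE ~ 33.3, probability ~ 0.0668
```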


6. General Recipe for Sampling Distribution Problems

Step-by-Step Approach

  1. Identify population parameters: Find $\mu$ and $\sigma^2$ from the given population distribution
  2. Apply CLT: Determine the sampling distribution of your statistic
    • For $\bar{X}$: Mean is $\mu$, variance is $\sigma^2/n$
  3. Use Normal approximation: $\hat{\theta} \sim N(\mu_{\hat{\theta}}, \sigma_{\hat{\theta}})$
  4. Standardize (optional): Convert to $Z \sim N(0, 1)$ for easier calculation
  5. Look up probability: Use tables or calculator
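The steps above, specialized to the sample mean, can be packaged as a small helper function. The function name and interface are our own, not from any library:

```python
# A helper implementing the recipe for the sample-mean case:
# population parameters -> CLT sampling distribution -> Normal probability.
import math
from statistics import NormalDist

def prob_mean_exceeds(mu, sigma2, n, threshold):
    """P(Xbar > threshold) under the CLT Normal approximation."""
    se = math.sqrt(sigma2 / n)             # step 2: SE of the sample mean
    return 1 - NormalDist(mu, se).cdf(threshold)   # steps 3-5

# Reusing the pipeline numbers from Section 5.3:
print(prob_mean_exceeds(1000, 13333, 12, 1050))
```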

Common Variants

The same approach works for various statistics:

| Statistic | Mean of Sampling Dist | Variance of Sampling Dist |
| --- | --- | --- |
| $\bar{X}$ | $\mu$ | $\sigma^2/n$ |
| $\bar{X} - c$ (shifted) | $\mu - c$ | $\sigma^2/n$ |
| $a(\bar{X} - c)$ (scaled and shifted) | $a(\mu - c)$ | $a^2 \sigma^2/n$ |
| $\hat{p}$ (proportion) | $p$ | $p(1-p)/n$ |
| $\bar{X}_1 - \bar{X}_2$ (difference) | $\mu_1 - \mu_2$ | $\sigma_1^2/n_1 + \sigma_2^2/n_2$ |
| $\hat{p}_1 - \hat{p}_2$ | $p_1 - p_2$ | $\frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}$ |

7. Key Parameters in This Course

Throughout ISE 315, we focus on estimating these parameters:

| Parameter | Description | Point Estimator |
| --- | --- | --- |
| $\mu$ | Population mean (average) | $\bar{X}$ |
| $\sigma^2$ | Population variance (spread) | $S^2$ |
| $p$ | Population proportion | $\hat{p}$ |
| $\mu_1 - \mu_2$ | Difference in means (effect size) | $\bar{X}_1 - \bar{X}_2$ |
| $p_1 - p_2$ | Difference in proportions | $\hat{p}_1 - \hat{p}_2$ |

8. Summary

Key Takeaways

  1. Statistical inference uses sample data to draw conclusions about population parameters

  2. A statistic is a random variable whose distribution (the sampling distribution) describes how it varies across samples

  3. Good estimators have low bias and low variance, leading to low MSE

  4. The sampling distribution of $\bar{X}$ has:
    • Mean: $\mu_{\bar{X}} = \mu$ (unbiased)
    • Variance: $\sigma_{\bar{X}}^2 = \sigma^2/n$ (shrinks with larger samples)
  5. The Central Limit Theorem tells us that $\bar{X}$ is approximately Normal for large $n$, regardless of the population distribution

  6. The general approach for sampling distribution problems: find population parameters → apply CLT → use Normal approximation → calculate probability

Practice Problems

Problem Set 2


Problem 1: Autonomous Vehicle Sensor Reliability

The time-to-failure (in thousands of hours) of LiDAR sensors used in autonomous vehicles follows an exponential distribution with mean $\mu = 8$ thousand hours (8,000 hours). An AV company randomly selects $n = 36$ sensors from their inventory for accelerated life testing.

(a) What is the variance of the population distribution? (Hint: For an exponential distribution, $\sigma^2 = \mu^2$)

(b) What are the mean and standard error of the sampling distribution of the sample mean $\bar{X}$?

(c) Using the Central Limit Theorem, approximate the probability that the sample mean lifetime exceeds 9 thousand hours.

(d) The company wants to claim with high confidence that their sensors last at least 7,000 hours on average. If the true mean is 8,000 hours, what is the probability that a sample of 36 sensors would yield $\bar{X} < 7$? What does this tell you about the risk of underestimating sensor reliability?

(e) What sample size would be needed to reduce the standard error to 1 thousand hours?


Problem 2: Comparing LLM Safety Evaluation Methods

An AI safety team is evaluating different methods to estimate the mean toxicity score $\mu$ of responses from a large language model (LLM). The toxicity score ranges from 0 (harmless) to 100 (highly toxic). Based on a random sample of $n = 25$ responses, they consider three estimators:

  • Estimator A: $\hat{\mu}_A = \bar{X}$, the mean of all 25 responses
  • Estimator B: $\hat{\mu}_B = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$, the mean of only the first 5 responses
  • Estimator C: $\hat{\mu}_C = 1.2\bar{X} - 4$, a rescaled version of the sample mean

Assume the population variance of toxicity scores is $\sigma^2 = 225$.

(a) Show that Estimator A is unbiased. What is its variance?

(b) Is Estimator B unbiased? Calculate its variance and compare to Estimator A. Why might a safety team still consider using fewer samples in practice?

(c) Calculate the bias of Estimator C when the true mean toxicity is $\mu = 20$. What is its variance?

(d) Calculate the MSE for all three estimators when $\mu = 20$. Which estimator would you recommend for safety evaluation and why?

(e) If the safety team’s goal is to flag models where $\mu > 15$ as potentially unsafe, discuss how the choice of estimator might affect their conclusions.


Problem 3: Oil Pipeline Pressure Monitoring

Saudi Aramco operates a crude oil pipeline where the operating pressure is critical for safety. Historical data shows that the pressure at a monitoring station is normally distributed with mean $\mu = 850$ psi and standard deviation $\sigma = 40$ psi. For safety monitoring, pressure readings are taken every 15 minutes, and a sample of $n = 16$ readings is analyzed each shift.

(a) What is the probability that an individual pressure reading falls below the minimum safe threshold of 800 psi?

(b) What is the probability that the sample mean of 16 readings falls below 800 psi?

(c) Explain why your answers to (a) and (b) are different. Why is monitoring the sample mean more reliable for detecting systematic pressure drops?

(d) The operations manager wants to set up a control chart with lower control limit (LCL). Find the value $c$ such that $P(\bar{X} < c) = 0.01$. If the pressure system is operating normally, how often would you expect a false alarm (sample mean below LCL)?

(e) Two pipeline sections are monitored independently, each with $n = 16$ readings per shift. Under normal operations (both sections have $\mu = 850$ psi), what is the probability that the difference in sample means satisfies $|\bar{X}_1 - \bar{X}_2| > 25$ psi? This helps determine when a pressure differential alarm should be triggered.

(f) If a leak causes Section 1's mean pressure to drop to $\mu_1 = 820$ psi while Section 2 remains at $\mu_2 = 850$ psi, what is the probability of detecting this by observing $|\bar{X}_1 - \bar{X}_2| > 25$ psi?

Solutions

Solution to Problem 1

(a) For an exponential distribution with mean $\mu = 8$ (thousand hours): \(\sigma^2 = \mu^2 = 8^2 = 64 \text{ (thousand hours)}^2\)

(b) For the sampling distribution of $\bar{X}$ with $n = 36$:

  • Mean: $\mu_{\bar{X}} = \mu = 8$ thousand hours
  • Standard error: $SE(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{8}{\sqrt{36}} = \frac{8}{6} \approx 1.33$ thousand hours

(c) By CLT, $\bar{X} \approx N(8, 1.33)$

Standardizing: \(Z = \frac{9 - 8}{1.33} = \frac{1}{1.33} \approx 0.75\)

\[P(\bar{X} > 9) = P(Z > 0.75) = 1 - \Phi(0.75) = 1 - 0.7734 \approx 0.227\]

There is approximately a 22.7% chance that the sample mean exceeds 9,000 hours.

(d) Finding $P(\bar{X} < 7)$ when $\mu = 8$: \(Z = \frac{7 - 8}{1.33} = \frac{-1}{1.33} \approx -0.75\)

\[P(\bar{X} < 7) = P(Z < -0.75) = \Phi(-0.75) \approx 0.227\]

There is about a 22.7% chance of underestimating sensor reliability. This is a substantial risk—nearly 1 in 4 samples would suggest sensors don’t meet the 7,000-hour threshold even though the true mean is 8,000 hours. This highlights the importance of adequate sample sizes for safety-critical claims.

(e) We need $SE(\bar{X}) = 1$: \(\frac{\sigma}{\sqrt{n}} = 1 \implies \frac{8}{\sqrt{n}} = 1 \implies \sqrt{n} = 8 \implies n = 64\)
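As a sanity check on part (c), a quick simulation that draws many samples of 36 exponential lifetimes gives a probability in the same ballpark as the CLT answer of about 0.227 (the CLT value is only approximate here because the exponential population is skewed):

```python
# Monte Carlo check of Problem 1(c): P(Xbar > 9) for n = 36 exponential
# lifetimes with mean 8 thousand hours.
import random

random.seed(5)
n, reps = 36, 20000
hits = sum(
    sum(random.expovariate(1 / 8) for _ in range(n)) / n > 9
    for _ in range(reps)
)
print(hits / reps)   # compare with the CLT approximation of ~0.227
```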


Solution to Problem 2

(a) Estimator A: $\hat{\mu}_A = \bar{X}$

  • $E[\hat{\mu}_A] = E[\bar{X}] = \mu$, so Estimator A is unbiased.
  • $\text{Var}(\hat{\mu}_A) = \frac{\sigma^2}{n} = \frac{225}{25} = 9$

(b) Estimator B: $\hat{\mu}_B = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}$

  • $E[\hat{\mu}_B] = \mu$, so Estimator B is also unbiased.
  • $\text{Var}(\hat{\mu}_B) = \frac{225}{5} = 45$

Estimator A has much lower variance (9 vs 45) because it uses all 25 observations.

Why use fewer samples? In practice, evaluating LLM outputs for toxicity requires human review, which is expensive and time-consuming. A safety team might use Estimator B for rapid screening, accepting higher variance in exchange for faster evaluation cycles.

(c) Estimator C: $\hat{\mu}_C = 1.2\bar{X} - 4$

In general: Bias = $E[1.2\bar{X} - 4] - \mu = 1.2\mu - 4 - \mu = 0.2\mu - 4$

At $\mu = 20$: Bias = $0.2(20) - 4 = 0$, so Estimator C happens to be unbiased at this particular value.

Variance: $\text{Var}(\hat{\mu}_C) = (1.2)^2 \cdot \text{Var}(\bar{X}) = 1.44 \times 9 = 12.96$

(d) MSE calculations when $\mu = 20$:

  • $MSE_A = \text{Var} + \text{Bias}^2 = 9 + 0^2 = 9$
  • $MSE_B = 45 + 0^2 = 45$
  • $MSE_C = 12.96 + 0^2 = 12.96$

Recommendation: Estimator A has the lowest MSE and should be preferred for safety evaluation. Estimator C happens to be unbiased at $\mu = 20$, but its bias varies with $\mu$, making it unreliable across different models. For safety-critical applications like LLM evaluation, we want estimators that perform well regardless of the true toxicity level.

(e) If using Estimator B (high variance), the safety team might:

  • fail to flag a genuinely unsafe model ($\mu > 15$) because a small sample happened to yield a low estimate (a false negative), or
  • flag a safe model because a small sample happened to yield a high estimate (a false positive).

For safety decisions, lower variance (Estimator A) reduces both false positives and false negatives.
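The three estimators can also be compared by simulation. Normal toxicity scores with $\mu = 20$ and $\sigma^2 = 225$ are assumed purely for illustration:

```python
# Monte Carlo MSE comparison of the three toxicity estimators when mu = 20.
import random

random.seed(6)
mu, sigma, n, reps = 20.0, 15.0, 25, 10000

def mse(values):
    """Empirical mean squared error around the true mean mu."""
    return sum((v - mu) ** 2 for v in values) / len(values)

A, B, C = [], [], []
for _ in range(reps):
    x = [random.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(x) / n
    A.append(xbar)               # Estimator A: mean of all 25
    B.append(sum(x[:5]) / 5)     # Estimator B: mean of first 5
    C.append(1.2 * xbar - 4)     # Estimator C: rescaled mean

print(mse(A), mse(B), mse(C))    # theory predicts roughly 9, 45, 12.96
```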


Solution to Problem 3

(a) Individual pressure reading: $X \sim N(850, 40)$ \(P(X < 800) = P\left(Z < \frac{800-850}{40}\right) = P(Z < -1.25) = \Phi(-1.25) \approx 0.1056\)

About 10.6% of individual readings fall below 800 psi.

(b) Sample mean of 16 readings: $\bar{X} \sim N\left(850, \frac{40}{\sqrt{16}}\right) = N(850, 10)$ \(P(\bar{X} < 800) = P\left(Z < \frac{800-850}{10}\right) = P(Z < -5) \approx 0.0000003\)

This is essentially zero—less than 1 in 3 million shifts.

(c) The answers differ dramatically because:

  • An individual reading has standard deviation $\sigma = 40$ psi, so falling below 800 psi (only 1.25 SDs below the mean) is fairly common.
  • The sample mean has standard error $40/\sqrt{16} = 10$ psi, so 800 psi is 5 standard errors below the mean—essentially impossible under normal operation.

Why sample means are better for detecting systematic problems: A single low reading could be a random fluctuation, sensor error, or brief transient. But if the average of 16 readings is low, it strongly suggests a real, sustained pressure drop. This is why control charts in process industries use sample statistics rather than individual measurements.

(d) Find $c$ such that $P(\bar{X} < c) = 0.01$: \(\frac{c - 850}{10} = -z_{0.01} = -2.326\) \(c = 850 + 10(-2.326) = 850 - 23.26 = 826.74 \text{ psi}\)

The LCL should be set at approximately 827 psi.

False alarm rate: By design, if the system is operating normally, we expect $P(\bar{X} < LCL) = 0.01$, meaning a false alarm about once every 100 shifts. If operators work 3 shifts per day, this is roughly one false alarm per month.
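The LCL can be computed directly with the inverse CDF of Python's `statistics.NormalDist`:

```python
# Lower control limit from part (d): the 1st percentile of N(850, 10).
from statistics import NormalDist

lcl = NormalDist(850, 10).inv_cdf(0.01)
print(round(lcl, 2))   # ~826.74 psi, matching the table-based calculation
```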

(e) For independent samples under normal operations ($\mu_1 = \mu_2 = 850$):

  • $E[\bar{X}_1 - \bar{X}_2] = 850 - 850 = 0$
  • $\text{Var}(\bar{X}_1 - \bar{X}_2) = \frac{40^2}{16} + \frac{40^2}{16} = 100 + 100 = 200$, so $SD = \sqrt{200} \approx 14.14$ psi

\(P(|\bar{X}_1 - \bar{X}_2| > 25) = P(Z > 25/14.14) + P(Z < -25/14.14)\) \(= 2 \times P(Z > 1.77) = 2 \times (1 - 0.9616) = 2 \times 0.0384 \approx 0.077\)

Under normal operations, there is about a 7.7% chance of observing a pressure differential exceeding 25 psi. This would be the false alarm rate for a differential pressure alarm set at 25 psi.

(f) With the leak ($\mu_1 = 820$, $\mu_2 = 850$):

  • $E[\bar{X}_1 - \bar{X}_2] = 820 - 850 = -30$ psi
  • The standard deviation is unchanged: $SD = \sqrt{200} \approx 14.14$ psi

\[P(|\bar{X}_1 - \bar{X}_2| > 25) = P(\bar{X}_1 - \bar{X}_2 > 25) + P(\bar{X}_1 - \bar{X}_2 < -25)\]

For the right tail: \(P(\bar{X}_1 - \bar{X}_2 > 25) = P\left(Z > \frac{25 - (-30)}{14.14}\right) = P(Z > 3.89) \approx 0.0001\)

For the left tail: \(P(\bar{X}_1 - \bar{X}_2 < -25) = P\left(Z < \frac{-25 - (-30)}{14.14}\right) = P(Z < 0.35) \approx 0.637\)

\[P(|\bar{X}_1 - \bar{X}_2| > 25) \approx 0.0001 + 0.637 \approx 0.637\]

There is about a 63.7% probability of detecting the leak with the 25 psi threshold. This is the statistical power of the test. To improve detection probability, the operations team could: (1) lower the threshold, (2) increase sample size $n$, or (3) implement more sophisticated detection algorithms.
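The detection probability in part (f) can be verified by simulation:

```python
# Simulating the leak scenario: how often does |Xbar1 - Xbar2| > 25 psi
# when mu1 = 820, mu2 = 850, sigma = 40, and n = 16 readings per section?
import random

random.seed(7)
n, reps = 16, 20000
alarms = 0
for _ in range(reps):
    x1 = sum(random.gauss(820, 40) for _ in range(n)) / n
    x2 = sum(random.gauss(850, 40) for _ in range(n)) / n
    alarms += abs(x1 - x2) > 25

print(alarms / reps)   # close to the analytical answer of about 0.637
```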


Additional Resources


Next week: Methods of point estimation (Method of Moments, Maximum Likelihood Estimation)