ECO321: Economic Statistics II
Chapter 2&3: Reviews from ECO221
Hiroshi Morita
XXXXXXXXXX
Department of Economics
Hunter College, The City University of New York
© 2010 by Hiroshi Morita. This document may be reproduced for educational purposes, so long as the
copies contain this notice and are retained for personal use or distributed free.
Population versus Sample
To study statistics, it’s important to distinguish between
"population" and "sample" throughout this course and beyond.
Suppose that we want to estimate the average income in the
U.S. Usually, we do not want to collect income data for the
entire U.S. population (the "population"), because doing so takes time and money.
The typical approach is to pick a relatively small group of people
(a "sample"), such as the respondents to the Current Population Survey, and use the
average income in that sample to estimate the actual
average income in the U.S. population (a "parameter"). Our
goal in this chapter is to describe the sample numerically
("descriptive statistics"). In this example, the average income
in the sample is one such descriptive statistic.
Big Picture: Statistical Inference
Population
  Parameters (unknown): population mean $\mu$, population variance $\sigma^2$
Sample
  $\{X_i : i = 1, 2, \dots, n\}$
  Descriptive statistics: sample mean $\bar{X}$, sample variance $s^2$, ...
"Statistical Inference"
Statistical inference draws conclusions about unknown
population parameters based on the limited information from the
sample.
Sample Mean
\[ \bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i \]
Here $X_i$ is the $i$-th observation in the sample. Say we have only 3 observations in
the sample ($n = 3$): $X_1 = 5$, $X_2 = 3$, and $X_3 = 7$. Then the
sample mean is:
\[ \bar{X} = \frac{1}{3} \sum_{i=1}^{3} X_i = \frac{1}{3}(X_1 + X_2 + X_3) = \frac{1}{3}(5 + 3 + 7) = 5 \]
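As a quick check, the sample mean formula can be evaluated in a few lines of Python. This is only an illustrative sketch using the three observations from the example; the function name `sample_mean` is our own choice and not part of the course material.

```python
# Observations from the example: X1 = 5, X2 = 3, X3 = 7
sample = [5, 3, 7]

def sample_mean(xs):
    """Return (1/n) * (X_1 + ... + X_n)."""
    return sum(xs) / len(xs)

x_bar = sample_mean(sample)
print(x_bar)  # 5.0
```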
Sample Variance
\[ \mathrm{Var}(X) \equiv s_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]
Using the same example ($X_1 = 5$, $X_2 = 3$, and $X_3 = 7$)
with $\bar{X} = 5$, the sample variance is:
\[ s_X^2 = \frac{1}{3-1} \sum_{i=1}^{3} (X_i - 5)^2 = \frac{1}{2}\left[(5-5)^2 + (3-5)^2 + (7-5)^2\right] = \frac{1}{2}(0 + 4 + 4) = 4 \]
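The same calculation can be sketched in Python. The hand-written function below follows the formula with the $n-1$ divisor; the comparison with `statistics.variance` from the standard library (which also divides by $n-1$) is only a sanity check. The function name `sample_variance` is an assumption made for this illustration.

```python
import statistics

sample = [5, 3, 7]

def sample_variance(xs):
    """Return (1/(n-1)) * sum of squared deviations from the sample mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)

s2 = sample_variance(sample)
print(s2)  # 4.0

# The standard library uses the same n-1 divisor, so the two should agree.
print(abs(statistics.variance(sample) - s2) < 1e-12)  # True
```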
Sample Standard Deviation
\[ s_X = \sqrt{\mathrm{Var}(X)} = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2} \]
Using the same example, since we obtained a variance
of 4 on the previous slide, the sample standard deviation is:
\[ s_X = \sqrt{4} = 2 \]
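In Python, the standard deviation is just the square root of the sample variance. The short sketch below uses the standard library and the same three observations; the variable names are chosen for illustration only.

```python
import math
import statistics

sample = [5, 3, 7]

# Square root of the sample variance (n-1 divisor)
s_x = math.sqrt(statistics.variance(sample))
print(s_x)  # 2.0

# statistics.stdev computes the same quantity directly
s_x_direct = statistics.stdev(sample)
```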
Sample Covariance
\[ \mathrm{Cov}(X,Y) \equiv s_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) \]
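Because the covariance depends on the units of $X$ and $Y$, it is often more convenient to look at the "correlation
coefficient":
\[ \rho = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{s_{XY}}{s_X s_Y}, \qquad (-1 \le \rho \le 1) \]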
$X$ and $Y$ are positively correlated if $\rho > 0$, i.e., when $X$
goes up, $Y$ tends to go up. $X$ and $Y$ are uncorrelated if $\rho = 0$,
i.e., there is no linear relationship between $X$ and $Y$. $X$ and $Y$ are
negatively correlated if $\rho < 0$, i.e., when $X$ goes up, $Y$ tends to go
down.
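The covariance and correlation formulas can be sketched in Python as follows. The data below are hypothetical numbers made up for this illustration (they are not the values from the course example), and the helper names `mean`, `sample_cov`, and `sample_corr` are our own.

```python
import math

# Hypothetical data, made up for illustration only
hours = [2, 4, 6]        # X: hours of studying per week
gpa = [2.0, 3.0, 3.5]    # Y: GPA

def mean(v):
    return sum(v) / len(v)

def sample_cov(x, y):
    """s_XY = (1/(n-1)) * sum((X_i - X_bar) * (Y_i - Y_bar))."""
    x_bar, y_bar = mean(x), mean(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (len(x) - 1)

def sample_corr(x, y):
    """rho = s_XY / (s_X * s_Y)."""
    s_x = math.sqrt(sample_cov(x, x))  # Cov(X, X) is Var(X)
    s_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)

cov_xy = sample_cov(hours, gpa)
rho = sample_corr(hours, gpa)
print(cov_xy)  # depends on the units of X and Y
print(rho)     # unit-free, between -1 and 1; positive for these data
```

For these made-up numbers the correlation is close to 1, since GPA rises with hours of study in every observation.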
Numerical Example
Suppose that you collect the sample of 3 students which
contains the hours of studying per week (X) and GPA (Y ).
X XXXXXXXXXX
Y XXXXXXXXXX
Find the correlation coefficient between $X$ and $Y$. First guess
what number you will get, then compute it.
Standard Error of Mean
The sample mean ($\bar{X}$) has a distribution, because
the sample mean varies as we pick different samples.
So the sample mean has variation of its own, measured by a
standard deviation. This may feel strange at first, but you have to
get used to this "classical" idea. The standard deviation of
the sample mean is called the "standard error" ($\equiv s_{\bar{X}}$) of the
mean. It is computed as:
\[ s_{\bar{X}} = \frac{s_X}{\sqrt{n}} \]
where $s_X$ is the sample standard deviation of $X$. So, using the same
example, $s_{\bar{X}} = 2/\sqrt{3} \approx 1.15$.
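A minimal Python sketch of the standard error calculation, using $s_X = 2$ and $n = 3$ from the running example. The function name `standard_error` is an assumption made for this illustration.

```python
import math

def standard_error(s_x, n):
    """Standard error of the sample mean: s_X / sqrt(n)."""
    return s_x / math.sqrt(n)

se = standard_error(2, 3)
print(round(se, 2))  # 1.15
```

Note that the standard error shrinks as $n$ grows: with the same $s_X = 2$ and $n = 100$, it would be $2/10 = 0.2$.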
Central Limit Theorem
The "central limit theorem (CLT)" states that the sample
mean has approximately a normal distribution when the
sample size is large enough (*).
More specifically, the sample mean $\bar{X}$ is approximately normally
distributed with a mean of $\mu$ and a standard deviation of
$\sigma_X/\sqrt{n}$, which is the "standard error". In mathematical
notation:
\[ \bar{X} \sim N\left(\mu, \frac{\sigma_X}{\sqrt{n}}\right) \]
(*) If the sample size is small, say less than 30, most basic statistics textbooks suggest using the
t-distribution, but this is controversial. Our text therefore deals only with the large-sample case, in which
the distribution of the sample mean is approximately normal.
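The CLT can be illustrated with a small simulation. The sketch below is not from the course material: it assumes a skewed population (an exponential distribution with $\mu = 1$ and $\sigma_X = 1$), draws many samples of size $n = 100$, and checks that the sample means behave roughly like $N(\mu, \sigma_X/\sqrt{n})$. The seed, sample size, and number of replications are arbitrary choices.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the run is reproducible
n = 100                  # observations per sample
reps = 2000              # number of samples drawn

# Exponential distribution with rate 1: skewed, with mu = 1 and sigma = 1
sample_means = []
for _ in range(reps):
    draws = [rng.expovariate(1.0) for _ in range(n)]
    sample_means.append(sum(draws) / n)

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

# Share of sample means within 1.96 standard errors of mu
se = 1.0 / n ** 0.5
share_within = sum(abs(m - 1.0) <= 1.96 * se for m in sample_means) / reps

print(mean_of_means)  # should be close to mu = 1
print(sd_of_means)    # should be close to sigma / sqrt(n) = 0.1
print(share_within)   # should be close to 0.95 if the means are roughly normal
```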
Confidence Interval
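\[ 100(1-\alpha)\%\ \text{C.I.} = (\text{Estimate}) \pm (\text{C.V.}) \times (\text{Standard Error}) = \bar{X} \pm Z_\alpha \left(\frac{s}{\sqrt{n}}\right) \]
where C.V. is the two-sided critical value $Z_\alpha$ at level $\alpha$ (1.96 at the 5% level, 2.58 at the 1% level).
$\Rightarrow$ $\mu$ lies in the C.I. with probability $100(1-\alpha)\%$.
[Figure: number line centered at $\bar{X}$, showing the $100(1-\alpha)\%$ confidence interval from $\bar{X} - Z_\alpha (s/\sqrt{n})$ to $\bar{X} + Z_\alpha (s/\sqrt{n})$.]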
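As a rough sketch, the interval can be computed in Python. The inputs $\bar{X} = 2.9$, $s = 0.45$, and $n = 100$ are borrowed from the hypothesis-test example in these slides; the function name `confidence_interval` is our own, and the critical value is taken as the $1 - \alpha/2$ quantile of the standard normal distribution, which gives about 1.96 at the 5% level.

```python
import math
from statistics import NormalDist

def confidence_interval(x_bar, s, n, alpha=0.05):
    """Return the 100(1 - alpha)% interval x_bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

lower, upper = confidence_interval(2.9, 0.45, 100, alpha=0.05)
print(round(lower, 3), round(upper, 3))  # 2.812 2.988
```

With these numbers the 95% interval does not contain 3.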
Null Hypothesis Test
Suppose that you hypothesize that the population mean is
3, but your sample mean is $\bar{X} = 2.9$. You can test whether
the population mean ($\mu$) is different from your hypothesized value
($\mu_0 = 3$). Let's say $s = 0.45$ and $n = 100$. This is a
"two-sided t-test", i.e.,
Null Hypothesis $H_0 : \mu = 3$
Alternative Hypothesis $H_1 : \mu \neq 3$
The t-statistic:
\[ t = \frac{(\text{Estimate}) - (\text{Hypothesis})}{(\text{Standard Error})} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} \]
where $\mu_0$ is your hypothesized value ($\mu_0 = 3$ in this example). So,
\[ t = \frac{2.9 - 3}{0.45/\sqrt{100}} = -2.22 \]
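The t-statistic above can be reproduced with a short Python sketch. The function name `t_statistic` and its parameter names are chosen for this illustration; the numbers are those of the example.

```python
import math

def t_statistic(x_bar, mu_0, s, n):
    """t = (estimate - hypothesized value) / standard error."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

t = t_statistic(x_bar=2.9, mu_0=3, s=0.45, n=100)
print(round(t, 2))  # -2.22
```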
Null Hypothesis Test (cont.)
If $|t| >$ C.V., we would reject $H_0$.
Otherwise, we would fail to reject $H_0$.
[Figure: distribution of $t$ with the critical values $-CV$ and $+CV$ marked. Reject $H_0$ in either tail; fail to reject $H_0$ between $-CV$ and $+CV$.]
Null Hypothesis Test (cont.)
If you test $H_0$ at the 1% level: Since $|t| = 2.22 < 2.58$ (= C.V. at
1%), we fail to reject the null hypothesis at the 1% level,
meaning that the population mean could be 3.
$\Rightarrow$ The "p-value", i.e., the probability of obtaining a t-statistic at least this far
from zero when the null hypothesis is true, is greater than 1%. That could happen.
If you test $H_0$ at the 5% level: Since $|t| = 2.22 > 1.96$ (= C.V. at
5%), we reject the null hypothesis at the 5% level, meaning
that the sample gives evidence that the population mean is different from 3.
$\Rightarrow$ The p-value is less than 5%. Such a sample could occur when $H_0$ is true, but it's
unlikely.
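The decision rule at the two levels can be written as a small Python sketch. The critical values 2.58 and 1.96 are the ones quoted on the slide; the names `CRITICAL_VALUES` and `reject_null` are our own.

```python
# Two-sided critical values quoted in the slides
CRITICAL_VALUES = {0.01: 2.58, 0.05: 1.96}

def reject_null(t, alpha):
    """Reject H0 when |t| exceeds the critical value for this level."""
    return abs(t) > CRITICAL_VALUES[alpha]

t = -2.22
for alpha in (0.01, 0.05):
    decision = "reject H0" if reject_null(t, alpha) else "fail to reject H0"
    print(f"{alpha:.0%} level: {decision}")
```

The same t-statistic leads to different decisions at the two levels, which is why the significance level should be chosen before looking at the data.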
p-value
If p-value $< \alpha$, we would reject $H_0$. Otherwise, we would
fail to reject $H_0$. (The p-value is the probability, computed assuming $H_0$ is true,
of obtaining a test statistic at least as extreme as the one actually observed.) We can apply this rule to any type of test, such as
a $Z$, $t$, $F$, or $\chi^2$ test.
\[ \text{p-value in 2-sided tests} = 2\Phi(-|t^{act}|) \]
where $t^{act}$ is the actual t-value calculated on an earlier
slide ($t^{act} = -2.22$) and $\Phi(\cdot)$ is the cumulative distribution
function of the standard normal distribution.
Note: Unfortunately, there is no closed-form expression for $\Phi$, but Stata can
compute it numerically.
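The slides use Stata for this step; as an alternative sketch, Python's standard library can also evaluate $\Phi$ numerically through `statistics.NormalDist`. The function name `two_sided_p_value` is an assumption made for this illustration.

```python
from statistics import NormalDist

def two_sided_p_value(t_act):
    """p-value = 2 * Phi(-|t_act|), with Phi the standard normal CDF."""
    return 2 * NormalDist().cdf(-abs(t_act))

p = two_sided_p_value(-2.22)
print(round(p, 3))  # 0.026
```

The result lies between 1% and 5%, which matches the earlier decisions: reject $H_0$ at the 5% level, fail to reject at the 1% level.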
ECO321: Economic Statistics II
Chapter 4: Simple Regression
Hiroshi Morita
XXXXXXXXXX
Department of Economics
Hunter College, The City University of New York
Chapter 4: Simple Regression
A regression is a simplified model of reality, but it is still
useful for decision making. For example,
\[ TestScore = \beta_0 + \beta_1 ClassSize \]
\[ \beta_1 = \frac{\Delta TestScore}{\Delta ClassSize} \]
meaning that when class size increases by 1, the test
score changes by $\beta_1$. But test scores depend on many
other factors, not just class size. So,
\[ TestScore = \beta_0 + \beta_1 ClassSize + OtherFactors \]
Linear Regression with One Regressor
General Form:
\[ Y_i = \beta_0 + \beta_1 X_i + u_i \]
$Y$: dependent variable
$X$: independent variable, or "regressor"
$\beta_0$: intercept (or constant)
$\beta_1$: slope
$u$: "error term" (the other factors)
$Y = \beta_0 + \beta_1 X$: "population regression line"
Same Big Picture
Population: $Y = \beta_0 + \beta_1 X$
  Parameters (unknown): $\beta_0$ and $\beta_1$
Sample
  $\{X_i, Y_i : i = 1, \dots, n\}$
  Descriptive statistics: estimates $\hat{\beta}_0$ and $\hat{\beta}_1$, standard errors of the estimates, ...
"Statistical Inference"
We'll study standard errors and statistical inference in Chapter 5.
Scatterplot
How do we draw a regression line?
Minimize the sum of squared error terms $u_i$.
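Estimate of $\beta_0$ and $\beta_1$
The "Ordinary Least Squares (OLS)" Estimation:
In OLS estimation, we choose the estimates of $\beta_0$ and $\beta_1$
by minimizing the sum of squared error terms, i.e.,
\[ \sum_{i=1}^{n} u_i^2 = \sum_{i=1}^{n} (Y_i - \beta_0 - \beta_1 X_i)^2 \]
Then, we find
\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)} = \frac{s_{XY}}{s_X^2} \]
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X} \]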
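The two OLS formulas can be sketched in Python as follows. The data are hypothetical numbers made up for this illustration, and the function names `ols_fit` and `sum_squared_errors` are our own. The second function is included only to show that the fitted line gives a smaller sum of squared errors than a nearby alternative line.

```python
def ols_fit(x, y):
    """Return (beta0_hat, beta1_hat) from the OLS formulas."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    beta1_hat = s_xy / s_xx                # the 1/(n-1) factors cancel
    beta0_hat = y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat

def sum_squared_errors(x, y, b0, b1):
    """Sum of (Y_i - b0 - b1 * X_i)^2 for a candidate line."""
    return sum((b - b0 - b1 * a) ** 2 for a, b in zip(x, y))

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

b0, b1 = ols_fit(x, y)
sse_ols = sum_squared_errors(x, y, b0, b1)
sse_other = sum_squared_errors(x, y, 0.0, 2.0)  # a nearby non-OLS line
print(b0, b1)
print(sse_ols < sse_other)  # True
```

For these made-up numbers the slope estimate is about 1.9 and the intercept estimate is about 0.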
A Little More About Regression
The estimates β̂0 and