
PowerPoint Presentation
ITEC 210 DATA ANALYSIS FOR BUSINESS
Analyzing Quantitative Variables
Prof. Itir KARAESMEN AYDIN
Outline and Learning Outcomes
In this presentation, you will learn
To build simple linear regression models.
To interpret the statistical output of a linear regression model.
To make predictions based on simple linear regression models (trend lines).
To define and interpret summary statistics (descriptive measures) for quantitative variables.
NOTE: This presentation does not show you *how* the work is done in Excel.
Scatter Plots and Trend Lines
Trend Line
Trend line is a straight line
It is displayed on the scatter plot
The trend line equation is
Y = b0 + b1 X
where
Y: variable displayed on the vertical axis of the scatter plot
X: variable displayed on the horizontal axis of the scatter plot
b0: intercept of the line (the value Y takes when X = 0).
b1: slope of the line (i.e., every 1-unit change in X results in b1 units of change in Y); the slope can be positive or negative.
Fitting a Trend Line to the Data
[Figure: scatter plot of the data with three candidate trend lines labeled A, B, and C]
Which of these lines “fits best” to the data?
Simple Linear Regression
Learning Objectives
In this presentation, you will learn
How to use regression analysis to predict the value of a dependent variable based on an independent variable
The meaning of the regression coefficients b0 and b1
How to judge the goodness of fit
How to make inferences about the slope
Exploring the Relationship Between Two Quantitative Variables
A scatter plot shows the relationship between two variables
Correlation measures the strength of the linear relationship between two variables
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent vs. Independent Variables in Regression
Dependent variable (or outcome variable): the variable we intend to predict or explain
Independent variable (or predictor): the variable we use to predict or explain the dependent variable
Simple Linear Regression Model
In a simple linear regression model
There is only one independent variable, X
The dependent variable Y is described by a linear function of X
The changes in Y are assumed to be related to changes in X
Simple Linear Regression Model
Yi = β0 + β1 Xi + εi
where
Yi: dependent variable (observed value of Y for Xi)
Xi: independent variable
β0: population Y intercept
β1: population slope coefficient
εi: random error term
β0 + β1 Xi is the linear component; εi is the random error component.
[Figure: scatter plot of Y against X showing the population regression line with Intercept = β0 and Slope = β1, the observed value Yi for Xi, and the random error for that observation]
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line.
Population regression line: E(Yi) = β0 + β1 Xi (the average value of Y at Xi)
Prediction line (regression equation): Ŷi = b0 + b1 Xi
where
Ŷi: estimated (or predicted) Y value for observation i
Xi: value of X for observation i
b0: estimate of the regression line intercept
b1: estimate of the regression slope
Regression Coefficients
b0 is the estimated average value of Y when the value of X is zero
b1 is the estimated change in the average value of Y as a result of a one-unit increase in X
The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between observed Y and predicted Ŷ:
min Σ (Yi − Ŷi)² = min Σ (Yi − (b0 + b1 Xi))²
[Figure: scatter plot of Y against X showing the fitted line with Intercept = b0 and Slope = b1, the observed value Yi and the predicted value Ŷi for Xi, and the prediction error for this observation]
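The least squares criterion above has a well-known closed-form solution: b1 is the sum of cross-deviations divided by the sum of squared X deviations, and b0 = Ȳ − b1 X̄. The following is a minimal Python sketch of that calculation, not the Excel procedure used in the course. The function names `fit_line` and `sum_squared_errors` and the five data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (b0, b1) that minimize the sum of squared prediction errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared X deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_errors(xs, ys, b0, b1):
    """Sum of squared differences between observed Y and predicted Y."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustration data, not taken from the course exercises.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, b0, b1)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, SSE = {best:.4f}")
```

Any other line through the same points, for example one with a slightly shifted intercept, gives a larger sum of squared errors than the fitted line.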
Goodness of Fit
How well does the estimated regression line fit the data?
We investigate the variation in regression to answer this.
Total Sum of Squares = Regression Sum of Squares + Error Sum of Squares
SST = SSR + SSE
SST = Σ (Yi − Ȳ)²
SSR = Σ (Ŷi − Ȳ)²
SSE = Σ (Yi − Ŷi)²
where:
    Ȳ = Mean value of the dependent variable
    Yi = Observed value of the dependent variable
    Ŷi = Predicted value of Y for the given Xi value
Goodness of Fit (cont’d)
A measure of goodness of fit is the coefficient of determination, a.k.a. the R-squared value (R2).
R2 is the portion of the total variation in the dependent variable that is explained by variation in the independent variable:
R2 = SSR / SST, with 0 ≤ R2 ≤ 1
The higher the R2 value, the better the fit.
R-squared is NOT equal to the correlation between the dependent and independent variables.
R-squared is equal to the square of the correlation in simple linear regression.
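To make the sums of squares concrete, here is a small Python sketch, assuming a least squares fit, that computes SST, SSR, and SSE and then checks two facts stated above: SST = SSR + SSE, and R-squared equals the squared correlation in simple linear regression. The helper names and the data are hypothetical and are not part of the course material.

```python
import math


def sums_of_squares(xs, ys):
    """Fit a least squares line and return (SST, SSR, SSE)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    preds = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
    ssr = sum((p - y_bar) ** 2 for p in preds)             # explained variation
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))     # unexplained variation
    return sst, ssr, sse


def correlation(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up illustration data, not taken from the course exercises.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

sst, ssr, sse = sums_of_squares(xs, ys)
r_squared = ssr / sst
r = correlation(xs, ys)
print(f"SST = {sst:.3f}, SSR = {ssr:.3f}, SSE = {sse:.3f}")
print(f"R-squared = {r_squared:.4f}, correlation squared = {r ** 2:.4f}")
```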
Inferences About the Slope
Questions
Is there a linear relationship between X and Y?
Could the slope of the regression line be 0?
Hypothesis Test
H0: β1 = 0 (the null hypothesis: slope=0)
H1: β1 ≠ 0 (the alternative hypothesis: slope ≠0)
Inferences About the Slope (cont’d)
Conducting the Hypothesis Test:
Obtain the p-value for the slope coefficient from the regression output.
Compare the p-value to a given significance level, α. Typical choices are α = 0.01, 0.05, 0.10.
Conclude:
Reject H0 if p-value < α.
Fail to reject H0 if p-value > α.
Interpret the results and conclusions.
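The decision rule above can be sketched in a few lines of Python. This is only an illustration of the comparison step. The function name `slope_test_decision` and the example p-value of 0.03 are assumptions, and in practice the p-value would be read from the regression output.

```python
def slope_test_decision(p_value, alpha=0.05):
    """Apply the rule: reject H0 (slope = 0) when the p-value is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Hypothetical p-value, checked against the three typical significance levels.
example_p = 0.03
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha:.2f}: {slope_test_decision(example_p, alpha)}")
```

The same p-value can lead to different conclusions at different significance levels, which is why α should be chosen before looking at the output.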
Inferences About the Slope (cont’d)
Interpretation of the hypothesis test results
Reject H0: There is enough statistical evidence to support the claim that the slope is not zero. We can say there is a linear relationship between X and Y. The strength of the linear relationship can be evaluated separately by computing the correlation between X and Y.
Fail to reject H0: There is not enough statistical evidence to conclude that the slope differs from zero. We cannot say there is a linear relationship between X and Y. You can separately compute the correlation between X and Y to check whether the linear relationship is very weak or nonexistent (i.e., the correlation value should be close to 0).
Steps of Regression Analysis
Prepare a scatter plot and add a trendline in Excel.
Obtain the regression output in Excel.
Use the regression output and
write down predicted regression line,
make predictions, i.e., compute predicted value of the dependent variable for a given value of the independent variable,
discuss goodness of fit of the regression line,
discuss the existence (or lack of) a linear relationship between the dependent and independent variables,
discuss how reliable the predictions are based on the regression analysis.
Excel Exercise#11
You are asked to examine the relationship between the size (square feet) of a house and its sales price in a real estate market.
A random sample of 20 houses is selected
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
#11 (cont’d)
[Snapshot of Data: image not reproduced]
Questions:
1. Prepare a scatter plot and fit a trend line to the data.
2. Perform regression analysis using the Data Analysis Toolpak in Excel.
3. What is the predicted regression line equation?
Trend line: Y= XXXXXXXXXX * X
From Regression Output in Excel:
Predicted sales price (in $1000s)
= XXXXXXXXXX x Size of the House in Sq.Ft.
4. What is the practical interpretation of the intercept of the regression equation in this example?
Y = XXXXXXXXXX when X = 0, but house size will never be zero. Therefore, the intercept has no practical interpretation.
5. What is the practical interpretation of the slope of the regression equation in this example?
The predicted sales price increases by 0.1303 x $1000 = $130.30 for every 1 sq.ft increase in the size of the house.
6. What is the predicted price for a house that is 1900 square feet?
House price = XXXXXXXXXX1303x1900
    = $ XXXXXXXXXXin $1000s) = $338,838.70
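The prediction step can be reproduced with a short Python sketch. The slope 0.1303 (in $1000s per sq.ft) and the predicted price of $338,838.70 for 1900 sq.ft are stated in this exercise, but the intercept is masked in this copy, so the sketch back-calculates it from those two stated values. Treat the resulting intercept as an assumption rather than the value from the Excel output. The constant and function names are hypothetical.

```python
SLOPE = 0.1303            # stated slope, in $1000s per square foot
PRED_AT_1900 = 338.8387   # stated prediction for 1900 sq.ft, in $1000s

# ASSUMPTION: intercept back-calculated from the two stated values above,
# because the intercept itself is masked in this copy of the exercise.
INTERCEPT = PRED_AT_1900 - SLOPE * 1900


def predict_price_thousands(sqft):
    """Predicted sales price in $1000s for a house of the given size."""
    return INTERCEPT + SLOPE * sqft


# Slope interpretation: change in price, in dollars, per extra square foot.
dollars_per_sqft = SLOPE * 1000

print(f"Back-calculated intercept: {INTERCEPT:.4f} (in $1000s)")
print(f"Price change per extra sq.ft: ${dollars_per_sqft:.2f}")
print(f"Predicted price at 10000 sq.ft: ${predict_price_thousands(10000) * 1000:,.2f}")
```

Under this assumption the 10000 sq.ft prediction comes out at $1,394,268.70, which agrees with the figure given in the answer to question 9, so the back-calculated intercept is at least consistent with the rest of the exercise.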
7. What is the value of the coefficient of determination?
Coefficient of determination = R-squared = 0.5992.
#11 – R-squared
R-squared
= SSR / SST
= XXXXXXXXXX / XXXXXXXXXX
= 0.5992
Interpretation of the coefficient of determination:

How good is the fit of the regression line to the data?
The house prices in our data set are not constant and vary from a minimum of $209,000 to a maximum of $498,000.
In this example, 59.9% of the variation in house prices is explained by the size of the house.
The size of a house is a good but not a perfect predictor of the price of a house. There must be other factors or variables that affect and determine the price of a house.
8. Is there really a linear relationship between the size of the house and the house price? Conduct a hypothesis test on the slope coefficient of the regression line.
Hypothesis test of the regression line slope:
H0: β1 = 0
H1: β1 ≠ 0
P-value of the slope: p-value = 6.19422E-05 = XXXXXXXXXX.
Significance level: Choices are α = 0.01, 0.05, 0.10. The p-value is smaller than any of these significance levels.
Decision: Reject H0, since p-value < α.
Conclusion: There is sufficient evidence that the size of the house affects the house price, i.e., the regression slope is not zero. We can claim that there is a linear relationship between these two variables. The strength of the linear relationship can be verified by separately calculating the correlation between these two variables.
9. If we were to use the regression output to predict the sales price of a house that is 10000 square feet, how reliable would our prediction be?
Answers:
9. Prediction: Y = XXXXXXXXXX1303x10000 = $ XXXXXXXXXXin $1000s) = $1,394,268.70
Check the following to assess the reliability of the prediction:
Goodness of fit: R-squared value is XXXXXXXXXX. This is a good fit.
Inference on the slope: Reject H0. There is statistical evidence for a linear relationship between the size of a house and its price.
Risk of extrapolation: Size of house given XXXXXXXXXXsq.ft) is beyond the values used in the regression analysis. This means we need to extrapolate.
 The predicted value is not reliable because of extrapolation.
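The extrapolation check in the last step amounts to comparing the new X value with the range of X values used to fit the model. Below is a minimal Python sketch of that comparison. The function name `is_extrapolation` and the list of house sizes are made up for illustration, since the 20 observed sizes are not reproduced in this copy of the exercise.

```python
def is_extrapolation(x_new, observed_xs):
    """Return True when x_new lies outside the range of X used in the fit."""
    if not observed_xs:
        raise ValueError("observed_xs must not be empty")
    return x_new < min(observed_xs) or x_new > max(observed_xs)


# HYPOTHETICAL house sizes in sq.ft, standing in for the sample of 20 houses.
hypothetical_sizes = [1100, 1400, 1550, 1700, 1875, 2350, 2450]

for size in (1900, 10000):
    flag = is_extrapolation(size, hypothetical_sizes)
    label = "extrapolation, prediction not reliable" if flag else "within observed range"
    print(f"{size} sq.ft: {label}")
```

A good R-squared and a significant slope only describe the fit inside the observed range, so a check like this is a separate step from the goodness-of-fit and slope tests.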
CAUTION
Regression does not establish “causation”: a change in the value of the independent variable does not necessarily cause the value of the dependent variable to change.
You should build and interpret the model with the knowledge of the subject matter.
Do not extrapolate beyond the range of values used in the regression analysis.
You must ensure that the assumptions underlying least-squares regression are satisfied (beyond the scope
Answered 2 days After Feb 07, 2021

Solution

Himanshu answered on Feb 09 2021
Problem #12. Regression Analysis
Use the data provided in Problem#7 and explore the relationship between Age and Transaction value by performing regression analysis. Use Age as the independent, and Transaction Value as the dependent variable in a simple linear regression model. Use the Data Analysis Toolpak in Excel to obtain the statistical output of regression.
a) (2 points) Perform regression analysis using the Data Analysis Toolpak in Excel. Save the regression summary output in your worksheet.
b) (2 points) What is the p-value for the slope coefficient of the regression model?
c) (2 points) What is the predicted transaction value for a customer who is 42 years old?
d) (2 points) What percentage of the variability in transaction value can be explained by the regression model?
e) (2 points) Is there really a linear relationship between Age and Transaction value? Conduct a hypothesis test on the slope coefficient of the regression line. Make sure you state the null and alternative hypotheses, the p-value, and the conclusion of the test clearly.
a) SUMMARY OUTPUT

Regression Statistics
    Multiple R    0.3186453578
    R Square    0.1015348641
    Adjusted R Square    0.0954641537
    Standard Error    126.1824987404
    Observations    150

ANOVA
        df    SS    MS    F    Significance F
    Regression    1    266301.691054767    266301.691054767    16.7253678285    0.0000706632
    Residual    148    2356459.40227857    15922.0229883687
    Total    149    2622761.09333333

        Coefficients    Standard Error    t Stat    P-value    Lower 95%    Upper 95%    Lower 95.0%    Upper 95.0%
    Intercept    -31.7414181148    45.3098291065    -0.7005415545    0.4846890147    -121.2791905939    57.7963543644    -121.2791905939    57.7963543644
    Age    3.4366699743    0.840330235    4.0896659801    0.0000706632    1.7760744947    5.0972654538    1.7760744947    5.0972654538

Equation: Y = -31.7414 + 3.4367 * X, that is, Transaction value = -31.7414 + 3.4367 * Age

b) p-value for the slope coefficient: 0.0000706632 (very small value)
c) Predicted transaction value for a 42-year-old customer: -31.7414 + 3.4367 * 42 = $ 112.60
d) R Square = 0.1015348641, so 10.15% of the variability in transaction value is explained by the regression model.
e) Conduct a hypothesis test on the slope
The p-value is smaller than the significance levels that would be used in a statistical test (e.g., alpha = 0.01, 0.05, 0.10).
Therefore, we reject the null hypothesis that the slope of the regression line is zero.
Conclusion: There is sufficient evidence that the age of a customer affects the transaction value, i.e., the regression line’s slope is not zero.
This supports the claim for a linear relationship between the two variables.
However, the linear relationship is weak: the R-squared value is low, 0.1015.

Data #12
    Customer ID    Method of Payment    No. of Months Since Last Purchase    Discount Code was Emailed    Discount Code Used    Transaction value ($ sales value)    Age
    553800    EStore Card    9    No    No    $ 64.00    27
    555700    EStore Card    11    Yes    Yes    $ 54.00    27
    558100    EStore Card    7    Yes    Yes    $ 49.00    27
    561200    EStore Card    11    Yes    Yes    $ 51.00    27
    562000    EStore Card    5    Yes    No    $ 41.00    27
    562700    EStore Card    2    Yes    No    $ 27.00    27
    588500    EStore Card    3    Yes    Yes    $ 64.00    28
    596800    MasterCard    12    No    No    $ 39.00    28
    603900    EStore Card    2    No    Yes    $ 43.00    28
    607000    EStore Card    11    Yes    No    $ 42.00    30
    609200    EStore Card    6    Yes    Yes    $ 50.00    30
    612200    Visa    8    No    No    $ 32.00    30
    620900    EStore Card    9    Yes    Yes    $ 98.00    31
    631400    EStore Card    7    Yes    Yes    $ 44.00    33
    640900    EStore Card    4    Yes    No    $ 115.00    33
    643600    EStore Card    8    Yes    Yes    $ 80.00    33
    651100    EStore Card    2    Yes    Yes    $ 76.00    33
    653900    Visa    10    No    Yes    $ 27.00    33
    677300    Visa    8    No    Yes    $ 108.00    34
    678200    MasterCard    12    No    Yes    $ 74.00    34
    678500    EStore Card    7    Yes    Yes    $ 133.00    34
    679400    EStore Card    6    Yes    Yes    $ 123.00    36
    679700    EStore Card    2    Yes    Yes    $ 154.00    36
    684800    EStore Card    12    Yes    Yes    $ 44.00    36
    687100    EStore Card    2    Yes    Yes    $ 124.00    36
    687400    EStore Card    6    No    Yes    $ 33.00    37
    692700    MasterCard    8    No    Yes    $ 87.00    37
    695800    Visa    2    No    Yes    $ 75.00    37
    695900    EStore Card    18    Yes    Yes    $ 126.00    39
    701100    EStore Card    4    No    Yes    $ 123.00    39
    702000    MasterCard    10    No    Yes    $ 131.00    39
    706400    EStore Card    9    No    Yes    $ 27.00    40
    708600    EStore Card    6    Yes    Yes    $ 127.00    40
    714000    EStore Card    7    Yes    Yes    $ 67.00    42
    720800    EStore Card    3    Yes    Yes    $ 92.00    43
    721600    EStore Card    9    No    Yes    $ 69.00    43
    729400    MasterCard    24    No    Yes    $ 29.00    43
    730000    EStore Card    3    Yes    Yes    $ 104.00    46
    734500    EStore Card    3    Yes    Yes    $ 112.00    46
    738900    EStore Card    6    No    Yes    $ 154.00    46
    747800    Discover    10    No    Yes    $ 133.00    48
    750100    EStore Card    9    Yes    Yes    $ 134.00    48
    752100    EStore Card    3    Yes    Yes    $ 21.00    48
    756700    EStore Card    7    Yes    Yes    $ 175.00    49
    756900    EStore Card    9    Yes    Yes    $ 141.00    49
    764800    EStore Card    2    Yes    Yes    $ 144.00    49
    768400    EStore Card    5    Yes    Yes    $ 128.00    51
    769700    Visa    8    No    Yes    $ 79.00    52
    777000    EStore Card    7    Yes    No    $ 81.00    52
    777300    MasterCard    8    No    No    $ 92.00    52
    778300    Visa    6    No    No    $ 64.00    52
    782600    EStore Card    3    Yes    No    $ 161.00    52
    783800    Discover    7    No    No    $ 108.00    52
    784300    EStore Card    12    Yes    Yes    $ 419.00    52
    784700    American Express    3    No    No    $ 202.00    54
    796900    EStore Card    2    Yes    No    $ 99.00    54
    799400    EStore Card    9    Yes    Yes    $ 348.00    54
    800700    EStore Card    8    No    No    $ 157.00    54
    809300    MasterCard    2    No    No    $ 168.00    54
    810400    MasterCard    3    No    Yes    $ 63.00    54
    816400    EStore Card    12    Yes    Yes    $ 41.00    54
    819400    MasterCard    16    No    No    $ 118.00    54
    827700    EStore Card    12    Yes    Yes    $ 170.00    54
    828200    Visa    5    No    No    $ 137.00    54
    828900    EStore Card    10    Yes    Yes    $ 70.00    55
    832900    EStore Card    8    Yes    Yes    $ 128.00    55
    845700    EStore Card    3    Yes    Yes    $ 676.00    55
    854000    EStore Card    11    No    Yes    $ 82.00    57
    856300    EStore Card    15    Yes    Yes    $ 213.00    58
    857400    Visa    11    No    No    $ 44.00    58
    862600    MasterCard    6    No    No    $ 24.00    58
    865400    Discover    8    No    Yes    $ 58.00    58
    887900    EStore Card    12    Yes    Yes    $ 84.00    58
    894500    EStore Card    11    Yes    Yes    $ 299.00    60
    901500    MasterCard    11    No    No    $ 91.00    60
    902900    EStore Card    18    No    Yes    $ 54.00    60
    910700    Visa    2    No    No    $ 332.00    60
    911200    American Express    12    No    No    $ 81.00    60
    911400    MasterCard    9    No    Yes    $ 193.00    60
    919700    MasterCard    19    No    No    $ 306.00    61
    923200    MasterCard    2    No    Yes    $ 112.00    61
    924000    EStore Card    17    Yes    Yes    $ 324.00    63
    926800    Discover    3    No    No    $ 111.00    63
    929200    EStore Card    2    Yes    No    $ 216.00    63
    930300    EStore Card    5    Yes    Yes    $ 675.00    63
    930700    EStore Card    15    Yes    No    $ 55.00    64
    940400    EStore Card    2    Yes    No    $ 612.00    64
    944400    EStore Card    22    Yes    Yes    $ 27.00    64
    946400    EStore Card    18    Yes    Yes    $ 123.00    64
    946700    EStore Card    17    Yes    No    $ 50.00    64
    947900    EStore Card    20    Yes    Yes    $ 41.00    66
    952900    EStore Card    4    Yes    Yes    $ 160.00    66
    955600    EStore Card    3    Yes    Yes    $ 29.00    66
    955600    EStore Card    10    Yes    No    $ 225.00    66
    956900    EStore Card    3    Yes    No    $ 170.00    67
    959200    EStore Card    2    Yes    Yes    $ 67.00    69
    961200    EStore Card    11    Yes    Yes    $ 30.00    70
    962700    EStore Card    28    Yes    No    $ 132.00    72
    962900    Visa    32    No    No    $ 185.00    72
    985500    EStore Card    2    Yes    No    $ 81.00    75
    985800    EStore Card    6    No    Yes    $ 178.00    52
    985900    Discover    10    No    Yes    $ 190.00    52
    986000    EStore Card    9    Yes    Yes    $ 131.00    52
    986100    EStore Card    3    Yes    Yes    $ 93.00    52
    986200    EStore Card    7    Yes    Yes    $ 791.00    52
    986300    EStore Card    9    Yes    Yes    $ 145.00    54
    986400    EStore Card    2    Yes    Yes    $ 74.00    54
    986500    EStore Card    5    Yes    Yes    $ 288.00    54
    986600    Visa    8    No    Yes    $ 211.00    54
    986700    EStore Card    7    Yes    No    $ 343.00    54
    986800    MasterCard    8    No    No    $ 48.00    54
    986900    Visa    6    No    No    $ 201.00    54
    987000    EStore Card    3    Yes    No    $ 176.00    54
    987100    Discover    7    No    No    $ 100.00    54
    987200    EStore Card    12    Yes    Yes    $ 272.00    54
    987300    American Express    3    No    No    $ 73.00    55
    987400    EStore Card    2    Yes    No    $ 52.00    55
    987500    EStore Card    9    Yes    Yes    $ 51.00    55
    987600    EStore Card    8    No    No    $ 203.00    57
    987700    MasterCard    2    No    No    $ 109.00    58
    987800    MasterCard    3    No    Yes    $ 195.00    58
    987900    EStore Card    12    Yes    Yes    $ 54.00    58
    988000    MasterCard    16    No    No    $ ...
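The figures in the SUMMARY OUTPUT for Problem #12 can be cross-checked against each other with a few lines of Python. The sketch below only re-derives quantities from numbers already printed in the output (sums of squares, degrees of freedom, coefficients). It does not refit the model, because the data listing here is truncated. The variable names are chosen for this illustration.

```python
import math

# Values copied from the ANOVA and coefficient tables of the summary output.
SS_REGRESSION = 266301.691054767
SS_RESIDUAL = 2356459.40227857
SS_TOTAL = 2622761.09333333
DF_REGRESSION = 1
DF_RESIDUAL = 148
INTERCEPT = -31.7414181148
SLOPE = 3.4366699743
SLOPE_STD_ERROR = 0.840330235

r_squared = SS_REGRESSION / SS_TOTAL                 # part (d)
ms_residual = SS_RESIDUAL / DF_RESIDUAL              # mean squared error
f_stat = (SS_REGRESSION / DF_REGRESSION) / ms_residual
t_stat = SLOPE / SLOPE_STD_ERROR                     # t statistic for the slope
regression_std_error = math.sqrt(ms_residual)        # "Standard Error" in the output
predicted_at_42 = INTERCEPT + SLOPE * 42             # part (c)

print(f"R Square          = {r_squared:.10f}")
print(f"F                 = {f_stat:.6f}")
print(f"t Stat (Age)      = {t_stat:.6f}")
print(f"Standard Error    = {regression_std_error:.6f}")
print(f"Prediction, age 42 = ${predicted_at_42:.2f}")
```

In simple linear regression the F statistic equals the square of the slope's t statistic, which is why Significance F and the slope's P-value are the same number in the output.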