ITEC 210 DATA ANALYSIS FOR BUSINESS
Analyzing Quantitative Variables
Prof. Itir KARAESMEN AYDIN
Outline and Learning Outcomes
In this presentation, you will learn
To build simple linear regression models.
To interpret the statistical output of a linear regression model.
To make predictions based on simple linear regression models (trend lines).
To define and interpret summary statistics (descriptive measures) for quantitative variables.
NOTE: This presentation does not show you *how* the work is done on Excel.
Scatter Plots and Trend Lines
Trend Line
A trend line is a straight line displayed on the scatter plot.
The trend line equation is
Y = b0 + b1 X
where
X: variable displayed on the horizontal axis of the scatter plot
b0: intercept of the line (the value Y takes when X=0).
b1: slope of the line (i.e., every 1-unit change in X results in b1 units of change in Y); the slope can be positive or negative.
Y: variable displayed on the vertical axis of the scatter plot
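The trend line equation can be evaluated directly once b0 and b1 are known. A minimal Python sketch (the coefficient values below are made up for illustration, not from any data set):

```python
# Evaluating the trend line Y = b0 + b1 * X for a few X values.
# The coefficient values below are hypothetical, for illustration only.
b0 = 50.0  # intercept: the value Y takes when X = 0
b1 = 2.5   # slope: change in Y per 1-unit increase in X

def trend(x):
    """Return the trend-line value of Y for a given X."""
    return b0 + b1 * x

print(trend(0))   # 50.0 (the intercept)
print(trend(10))  # 50.0 + 2.5 * 10 = 75.0
```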
Fitting a Trend Line to the Data
A
B
C
Which of these lines “fits best” to the data?
Simple Linear Regression
Learning Objectives
In this presentation, you will learn
How to use regression analysis to predict the value of a dependent variable based on an independent variable
The meaning of the regression coefficients b0 and b1
How to judge the goodness of fit
How to make inferences about the slope
Exploring the Relationship Between Two Quantitative Variables
A scatter plot shows the relationship between two variables
Correlation measures the strength of the linear relationship between two variables
Regression analysis is used to:
Predict the value of a dependent variable based on the value of at least one independent variable
Explain the impact of changes in an independent variable on the dependent variable
Dependent vs. Independent Variables in Regression
Dependent variable (or outcome variable): the variable we intend to predict or explain
Independent variable (or predictor): the variable we use to predict or explain the dependent variable
Simple Linear Regression Model
In a simple linear regression model
There is only one independent variable, X
The dependent variable Y is described by a linear function of X
The changes in Y are assumed to be related to changes in X
Linear component:
Yi = β0 + β1 Xi + εi
where
β0: population Y intercept
β1: population slope coefficient
εi: random error term
Yi: dependent variable
Xi: independent variable
[Figure: scatter plot of Y vs. X showing the population regression line with intercept β0 and slope β1; the observed value Yi at Xi deviates from the line by the random error εi.]
Simple Linear Regression Equation (Prediction Line)
The simple linear regression equation provides an estimate of the population regression line.
Population regression line: Yi = β0 + β1 Xi + εi
Prediction line (regression equation): Ŷi = b0 + b1 Xi
where
Ŷi: estimated (or predicted) Y value for observation i
b0: estimate of the regression intercept
b1: estimate of the regression slope
Xi: value of X for observation i
b0 is the estimated average value of Y when the value of X is zero.
b1 is the estimated change in the average value of Y as a result of a one-unit increase in X.
Regression Coefficients
The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between the observed Yi and the predicted Ŷi:
min Σ (Yi − Ŷi)²
The vertical distance Yi − Ŷi is the prediction error for observation i.
[Figure: scatter plot of Y vs. X showing the prediction line with intercept b0 and slope b1, and the prediction error between the observed Yi and the predicted Ŷi at Xi.]
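The least-squares estimates have a closed form: b1 is the sum of cross-deviations divided by the sum of squared X-deviations, and b0 makes the line pass through the point of means. A sketch with made-up data (not the Excel output from the exercise later in this deck):

```python
# Least-squares estimates for simple linear regression, computed from the
# closed-form formulas. The data below are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)

print(b1, b0)  # roughly b1 ≈ 1.96, b0 ≈ 0.14 for this data
```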
Goodness of Fit
How well does the estimated regression line fit the data?
We investigate the variation in regression to answer this.
Total sum of squares: SST = Σ (Yi − Ȳ)²
Regression sum of squares: SSR = Σ (Ŷi − Ȳ)²
Error sum of squares: SSE = Σ (Yi − Ŷi)²
SST = SSR + SSE
where:
Ȳ = mean value of the dependent variable
Yi = observed value of the dependent variable
Ŷi = predicted value of Y for the given Xi value
A measure of goodness of fit is the coefficient of determination, a.k.a., the R-squared value (R2).
R2 is the portion of the total variation in the dependent variable that is explained by variation in the independent variable: R2 = SSR / SST.
The higher the R2 value, the better the fit.
R-squared is NOT equal to the correlation between the dependent and independent variables.
In simple linear regression, R-squared is equal to the square of the correlation.
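The decomposition SST = SSR + SSE and the identity R² = (correlation)² can be checked numerically. A sketch with made-up illustrative data:

```python
import math

# R-squared from the sums of squares, and a check that in simple linear
# regression R^2 equals the squared correlation. Data are made up.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * x for x in xs]                     # predicted values

sst = sum((y - y_bar) ** 2 for y in ys)               # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # regression sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # error sum of squares

r_squared = ssr / sst
corr = sxy / math.sqrt(sxx * sst)  # sample correlation between X and Y

print(round(r_squared, 4))  # close to 1 for this nearly linear data
```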
Goodness of Fit (cont’d)
Chap 13-20
Inferences About the Slope
Questions
Is there a linear relationship between X and Y?
Could the slope of the regression line be 0?
Hypothesis Test
H0: β1 = 0 (the null hypothesis: slope=0)
H1: β1 ≠ 0 (the alternative hypothesis: slope ≠0)
Inferences About the Slope (cont’d)
Conducting the Hypothesis Test:
Obtain the p-value for the slope coefficient from the regression output.
Compare the p-value to a given significance level, α. Typical choices are α = 0.01, 0.05, 0.10.
Conclude:
Reject H0 if p-value < α.
Fail to reject H0 if p-value ≥ α.
Interpret the results and conclusions.
Inferences About the Slope (cont’d)
Interpretation of the hypothesis test results
Reject H0: There is enough statistical evidence to support the claim that the slope is not zero. We can say there is a linear relationship between X and Y. The strength of the linear relationship can be evaluated separately by computing the correlation between X and Y.
Fail to reject H0: Statistical evidence supports the claim that the slope is zero. We cannot say there is a linear relationship between X and Y. You can separately compute the correlation between X and Y to verify that the linear relationship is very weak or nonexistent (i.e., the correlation value should be close to 0).
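The test statistic behind the p-value in the regression output is t = b1 / SE(b1). A minimal sketch on made-up data, comparing the t-statistic to a tabulated critical value instead of a p-value (Excel's output reports the p-value directly):

```python
import math

# Hypothesis test H0: beta1 = 0 vs. H1: beta1 != 0, via the t-statistic
# for the slope. The data below are made up for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))  # standard error of the estimate
se_b1 = s / math.sqrt(sxx)    # standard error of the slope b1
t_stat = b1 / se_b1

# Two-tailed critical value of the t distribution for alpha = 0.05
# with n - 2 = 3 degrees of freedom (from a t table).
t_crit = 3.182
reject_h0 = abs(t_stat) > t_crit
print(round(t_stat, 2), reject_h0)
```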
Steps of Regression Analysis
Prepare a scatter plot and add a trendline in Excel.
Obtain the regression output in Excel.
Use the regression output to:
write down the predicted regression line,
make predictions, i.e., compute the predicted value of the dependent variable for a given value of the independent variable,
discuss goodness of fit of the regression line,
discuss the existence (or lack of) a linear relationship between the dependent and independent variables,
discuss how reliable the predictions are based on the regression analysis.
Excel Exercise#11
You are asked to examine the relationship between the size (square feet) of a house and its sales price in a real estate market.
A random sample of 20 houses is selected
Dependent variable (Y) = house price in $1000s
Independent variable (X) = square feet
#11 (cont’d)
Snapshot of Data
#11 (cont’d)
Questions:
Prepare a scatter plot and fit a trend line to the data.
#11 (cont’d)
Questions:
2. Perform regression analysis using the Data Analysis Toolpak in Excel.
#11 (cont’d)
Questions:
3. What is the predicted regression line equation?
Trend line: Y = XXXXXXXXXX * X
From the Regression Output in Excel:
Predicted sales price (in $1000s)
= XXXXXXXXXX × Size of the House in Sq.Ft.
#11 - Regression Line Equation
Y = XXXXXXXXXXX
Predicted sales price (in $1000s)
= XXXXXXXXXX × Size of the House in Sq.Ft.
#11 (cont’d)
Questions:
4. What is the practical interpretation of the intercept of the regression equation in this example?
Y = XXXXXXXXXX when X=0, but house size will never be zero. Therefore, the intercept has no practical interpretation.
#11 (cont’d)
Questions:
5. What is the practical interpretation of the slope of the regression equation in this example?
The sales price increases by 0.1303x$1000 = $1303 for every 1 sq.ft increase in the size of the house.
#11 (cont’d)
Questions:
6. What is the predicted price for a house that is 1900 square feet?
House price = XXXXXXXXXX + 0.1303 × 1900
= $XXXXXXXXXX (in $1000s) = $338,838.70
#11 (cont’d)
Questions:
7. What is the value of the coefficient of determination?
Coefficient of determination = R-squared = 0.5992.
#11 – R-squared
R-squared
= SSR / SST
= XXXXXXXXXX / XXXXXXXXXX
= 0.5992
#11 (cont’d)
Interpretation of the coefficient of determination:
How good is the fit of the regression line to the data?
The house prices in our data set are not constant and vary from a minimum of $209,000 to a maximum of $498,000.
In this example, 59.9% of the variation in house prices is explained by the size of the house.
The size of a house is a good but not a perfect predictor of the price of a house. There must be other factors or variables that affect and determine the price of a house.
#11 (cont’d)
Questions:
8. Is there really a linear relationship between the size of the house and the house price? Conduct a hypothesis test on the slope coefficient of the regression line.
H0: β1 = 0
vs. H1: β1 ≠ 0
#11 – Inference about the slope
H0: β1 = 0
H1: β1 ≠ 0
P-value of the slope: p-value = 6.19422E-05 = XXXXXXXXXX.
Significance level: typical choices are α = 0.01, 0.05, 0.10. The p-value is smaller than any of these significance levels.
Decision: Reject H0, since p-value < α.
Conclusion: There is sufficient evidence that the size of the house affects the house price, i.e., the regression slope is not zero. We can claim that there is a linear relationship between these two variables. The strength of the linear relationship can be verified by separately calculating the correlation between these two variables.
Hypothesis test of the regression line slope:
#11 (cont’d)
Questions:
9. If we were to use the regression output to predict the sales price of a house that is 10000 square feet, how reliable would our prediction be?
Excel Exercise#11 (cont’d)
Answers:
9. Prediction: Y = XXXXXXXXXX + 0.1303 × 10000 = $XXXXXXXXXX (in $1000s) = $1,394,268.70
Check the following to assess the reliability of the prediction:
Goodness of fit: The R-squared value is XXXXXXXXXX. This is a good fit.
Inference on the slope: Reject H0. There is statistical evidence for a linear relationship between the size of a house and its price.
Risk of extrapolation: The given house size (XXXXXXXXXX sq.ft.) is beyond the range of values used in the regression analysis. This means we need to extrapolate.
The predicted value is not reliable because of extrapolation.
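The extrapolation check above can be automated: flag any prediction whose X value lies outside the range of the data used to fit the regression. A sketch in Python (the house sizes below are hypothetical stand-ins for the sample of 20 houses):

```python
# A simple guard against extrapolation: flag predictions whose X value lies
# outside the range of the fitting data. These sizes are hypothetical.
sizes = [1200, 1450, 1600, 1800, 2100, 2350]

def is_extrapolation(x_new, xs):
    """Return True if x_new lies outside [min(xs), max(xs)]."""
    return not (min(xs) <= x_new <= max(xs))

print(is_extrapolation(1900, sizes))   # within the sample range -> False
print(is_extrapolation(10000, sizes))  # far beyond the largest house -> True
```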
CAUTION
There is no "causation": regression only describes an association; it does not show that changes in the independent variable cause the values of the dependent variable to change.
You should build and interpret the model with the knowledge of the subject matter.
Do not extrapolate beyond the range of values used in the regression analysis.
You must ensure that the assumptions underlying least-squares regression are satisfied (beyond the scope of this presentation).