Car MPG Weight Cylinders Horsepower Country
Buick Skylark 28.4 2670 4 90 U.S.
Dodge Omni 30.9 2230 4 75 U.S.
Mercury Zephyr 20.8 3070 6 85 U.S.
Fiat Strada 37.3 2130 4 69 Italy
Peugeot 694 SL 17.8 3410 6 133 France
VW Rabbit 31.9 1925 4 71 Germany
Plymouth Horizon 34.2 2200 4 70 U.S.
Mazda GLC 34.1 1975 4 65 Japan
Buick Estate Wagon 16.9 4360 8 155 U.S.
Audi 5000 22.5 2830 5 103 Germany
Chevy Malibu Wagon 19.2 3605 8 125 U.S.
Dodge Aspen 18.6 3620 6 110 U.S.
VW Dasher 30.5 2190 4 78 Germany
Ford Mustang 4 26.5 2585 4 88 U.S.
Dodge Colt 35.1 1915 4 80 Japan
Datsun 810 22 2815 6 97 Japan
VW Scirocco 31.5 1990 4 71 Germany
Chevy Citation 28.8 2595 6 115 U.S.
Olds Omega 26.8 2700 6 115 U.S.
Chrysler LeBaron Wagon 18.5 3940 8 150 U.S.
Datsun 510 27.2 2300 4 97 Japan
AMC Concord D/L 18.1 3410 6 120 U.S.
Buick Century Special 20.6 3380 6 105 U.S.
Saab 99 GLE 21.6 2795 4 115 Sweden
Datsun 210 31.8 2020 4 65 Japan
Ford LTD 17.6 3725 8 129 U.S.
Volvo 240 GL 19 3140 6 125 Sweden
Dodge St Regis 18.2 3830 8 135 U.S.
Toyota Corona 27.5 2560 4 95 Japan
Chevette 30 2155 4 68 U.S.
Ford Mustang Ghia 21.9 2910 6 109 U.S.
AMC Spirit 27.4 2670 4 80 U.S.
Ford Country Squire Wagon 28 4054 8 142 U.S.
BMW 320i 23.1 2600 4 110 Germany
Pontiac Phoenix 33.5 2556 4 90 U.S.
Honda Accord LX 29.5 2135 4 68 Japan
Mercury Grand Marquis 16.5 3955 8 138 U.S.
Chevy Caprice Classic 17 3840 8 130 U.S.
Engineering Department, UMass Boston, Spring 2022
ENGIN 322: Probability and Random Processes
Project #2: Sampling Theory and Linear Regression
Project Description
In this project, you will assume the part of an automotive manufacturer. In part I, you will provide
background analysis for the development of a new vehicle. In part II, you will develop a method for
sampling manufactured parts in order to determine the likelihood of failure. (Note: Data for Part I of the
project is slightly modified from: https://www.statcrunch.com/5.0/viewresult.php?resid=1878105).
Part I
The spreadsheet ‘Vehicle_Info.xlsx’ provides detailed information about vehicles on the market. In Part I,
you should load the data into Matlab and compare vehicle properties to average miles per gallon (mpg).
a) Load the data set into Matlab with the readtable() function
b) Determine the data set correlation (i.e., Pearson’s r) for average mpg versus weight, number of
cylinders, and horsepower.
c) Use the scatter() and polyfit() functions to display the data points and linear regression curves
for average mpg versus weight, number of cylinders, and horsepower. For each of the 3 plots,
show average mpg on your Y axis. Label the resulting plots and include them in your report.
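Steps a) through c) can be sketched roughly as follows. This is a minimal sketch, not a reference solution: it assumes the spreadsheet columns are named MPG, Weight, Cylinders, and Horsepower as in the data listing, and, so that it runs without the file, it builds a small table from five rows of that listing. In the project itself the table would come from readtable('Vehicle_Info.xlsx'). The variable names T, predictors, r, and p are illustrative.

```matlab
% In the project: T = readtable('Vehicle_Info.xlsx');
% Here, five rows copied from the data listing keep the sketch self-contained.
MPG        = [28.4; 30.9; 20.8; 16.9; 22.5];
Weight     = [2670; 2230; 3070; 4360; 2830];
Cylinders  = [4; 4; 6; 8; 5];
Horsepower = [90; 75; 85; 155; 103];
T = table(MPG, Weight, Cylinders, Horsepower);

predictors = {'Weight', 'Cylinders', 'Horsepower'};
r = zeros(1, numel(predictors));      % Pearson's r for each predictor
p = zeros(numel(predictors), 2);      % [slope, intercept] for each fit

for k = 1:numel(predictors)
    x = T.(predictors{k});
    y = T.MPG;

    R = corrcoef(x, y);               % 2x2 correlation matrix
    r(k) = R(1, 2);                   % off-diagonal entry is Pearson's r

    p(k, :) = polyfit(x, y, 1);       % first-order (linear) least-squares fit

    figure;
    scatter(x, y, 'filled');
    hold on;
    xs = linspace(min(x), max(x), 100);
    plot(xs, polyval(p(k, :), xs), 'LineWidth', 1.5);
    hold off;
    xlabel(predictors{k});
    ylabel('Average MPG');
    title(sprintf('MPG vs. %s (r = %.2f)', predictors{k}, r(k)));
    legend('Data', 'Linear fit');
end
```

A value of r near -1 or +1 indicates a strong linear relationship, while a value near 0 indicates a weak one; the sign of r matches the sign of the fitted slope.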
Question 1: Would you say that a vehicle’s horsepower impacts the average mpg? What about the impact
of weight and number of cylinders on average mpg? Explain your reasoning.
Question 2: Your company’s newest vehicle model will be a 6-cylinder that weighs 3,250 lbs and has 100
horsepower. If your goal is to be above the regression line for every category, what average mpg should
you strive for?
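One possible way to approach this question is to evaluate each of the three regression lines at the proposed specification; being above the line in every category then means exceeding the largest of the three predicted values. The sketch below illustrates the idea on five rows copied from the data listing (an assumption made only to keep it self-contained; the full data set will give different numbers). The names X, spec, predicted, and targetMPG are illustrative.

```matlab
% Five rows from the data listing; replace with the full table in the project.
MPG        = [28.4; 30.9; 20.8; 16.9; 22.5];
Weight     = [2670; 2230; 3070; 4360; 2830];
Cylinders  = [4; 4; 6; 8; 5];
Horsepower = [90; 75; 85; 155; 103];

X    = [Weight, Cylinders, Horsepower];   % one predictor per column
spec = [3250, 6, 100];                    % new model: weight (lbs), cylinders, horsepower

predicted = zeros(1, 3);
for k = 1:3
    coeffs = polyfit(X(:, k), MPG, 1);        % linear fit for this predictor
    predicted(k) = polyval(coeffs, spec(k));  % regression-line MPG at the spec
end

% To sit above all three lines, the target must exceed the largest prediction.
targetMPG = max(predicted);
fprintf('Predicted MPG by category: %s\n', mat2str(predicted, 4));
fprintf('Target must exceed %.1f mpg\n', targetMPG);
```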
Part II
Assume that you have received reports of faulty headlights. The data values in the file
‘population_data.mat’ represent the population of headlights on the production line (0 represents a good
headlight, 1 represents a faulty headlight). You are unable to test all headlights, but you are able to
randomly sample n of the headlights as they come off the production line. In Part II, you should simulate
the following in Matlab for values of n ranging from 1 to 2000.
a) For each value of n, randomly sample n headlights from the population and determine the sample
mean and sample variance (i.e., the percentage of faulty headlights in your sample and the
associated variance of the sample).
b) Plot the sample mean and sample variance versus the sample size.
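The simulation in a) and b) might be sketched as below. Several things here are assumptions rather than part of the handout: the name of the variable stored in population_data.mat is not given, so the sketch substitutes a synthetic 0/1 population with an arbitrary 10% fault rate and 10,000 entries; sampling is done without replacement using randperm(); and all variable names are placeholders.

```matlab
rng(0);                                      % fixed seed so the run is repeatable

% ASSUMPTION: placeholder population. In the project, load the real data, e.g.
%   S = load('population_data.mat');   % then pick the variable it contains
population = double(rand(10000, 1) < 0.10);  % hypothetical 10% faulty (1 = faulty)

nMax = 2000;
sampleMean = zeros(nMax, 1);
sampleVar  = zeros(nMax, 1);

for n = 1:nMax
    idx = randperm(numel(population), n);    % n distinct headlights, chosen at random
    x = population(idx);
    sampleMean(n) = mean(x);                 % fraction of faulty headlights in the sample
    sampleVar(n)  = var(x);                  % unbiased sample variance (0 when n = 1)
end

figure;
subplot(2, 1, 1);
plot(1:nMax, sampleMean);
xlabel('Sample size n'); ylabel('Sample mean');
title('Sample mean vs. sample size');

subplot(2, 1, 2);
plot(1:nMax, sampleVar);
xlabel('Sample size n'); ylabel('Sample variance');
title('Sample variance vs. sample size');
```

Because the data are 0/1 values, the sample mean is the fraction of faulty headlights in the sample; multiply by 100 to report it as a percentage.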
Question 1: What do you notice about the sample mean and sample variance as you increase sample size?
Question 2: Based on your plots, what can you predict about the percentage of headlights in the
population that are faulty? Explain your reasoning. (Hint: Zoom in on the values from n=1000 to n=2000
for a better visualization. You can compare your prediction to the true mean by determining the mean
value of the full set of values from ‘population_data.mat’)
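The hint can be illustrated with a short sketch that restricts the plot to n = 1000 through 2000 and overlays the population mean as a reference line. As an assumption made to keep the example runnable, it uses a synthetic 0/1 population (10% faulty, 10,000 entries) in place of the contents of ‘population_data.mat’; the variable names are placeholders.

```matlab
rng(1);                                      % fixed seed so the run is repeatable
population = double(rand(10000, 1) < 0.10);  % ASSUMPTION: placeholder, not the real file

nRange = 1000:2000;
sampleMean = zeros(size(nRange));
for k = 1:numel(nRange)
    idx = randperm(numel(population), nRange(k));   % sample without replacement
    sampleMean(k) = mean(population(idx));
end

trueMean = mean(population);                 % mean of the full population

figure;
plot(nRange, sampleMean);
hold on;
plot(nRange([1 end]), [trueMean trueMean], '--');   % reference line at the true mean
hold off;
xlim([1000 2000]);
xlabel('Sample size n'); ylabel('Sample mean');
legend('Sample mean', 'Population mean');
```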
Grading Metrics
• Coding and Results: 30%
• Theoretical Analysis: 40%
• Written Report: 30%
Coding and Results: This portion of the project will be graded based on the implementation of code as
described for Part I and II above. Results should include a description of the observed outcomes and some
depiction of the results from your code.
Theoretical Analysis: This portion of the project will be graded based on your answers to the questions
above. Be sure to clearly indicate the answers AND REASONING for each of the questions within your
written report.
Written Report: The written report should be submitted on Blackboard by midnight on April 8. The report
should be 3-5 pages including an overview of the project, expected outcomes, your analysis method,
results, and observations. You may include any code as an appendix.