Outline
The project is to use multiple regression to analyze a data set that is of interest to you. If you have a strong interest in a two-group comparison or analysis of variance, you can do that instead with my concurrence.
The final report for the project should be a 5-7 page paper that describes the questions of interest, how you used your data set to analyze these questions (with details on the steps in your analysis), your findings, and the limitations of your study. Specifically, your report should contain the following:
1. Abstract: A one paragraph summary of what you set out to learn, and what you ended up finding. It should summarize the entire report.
1. Introduction: A discussion of what questions you are interested in.
1. Data Set: Describe details about how the data set was collected and the variables in the data set.
1. Analysis: Describe how you used multiple regression to analyze the data set. Specifically, you should discuss how you carried out the analysis steps discussed in class, i.e., exploring the data to find a reasonable initial model, checking the model, and revising the model based on those checks.
1. Results: Provide inferences about the questions of interest and discussion.
1. Limitations of study and conclusion: Describe any limitations of your study and how they might be overcome in future research, and provide brief conclusions about the results of your study.
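The fit-then-check part of the Analysis step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are made up, the helper names (`solve`, `fit_ols`, `predict`, `r_squared`) are hypothetical, and in practice you would use the statistical software from class rather than hand-written linear algebra.

```python
# Minimal multiple-regression sketch using only the standard library.
# The data below are made up for illustration; they come from no real study.

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def predict(coefs, row):
    """Fitted value for one observation."""
    return coefs[0] + sum(b * v for b, v in zip(coefs[1:], row))

def r_squared(coefs, rows, y):
    """Share of the variation in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data: two predictors, response built exactly as y = 1 + 2*x1 + 3*x2
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefs = fit_ols(rows, y)
# Residuals are the starting point for model checking (plots against fitted values, etc.)
residuals = [yi - predict(coefs, r) for r, yi in zip(rows, y)]
print([round(c, 6) for c in coefs])
print(round(r_squared(coefs, rows, y), 6))
```

With real data the residuals will not be near zero; plotting them against the fitted values and against each predictor is one common way to decide whether the initial model needs to be changed.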
1. Abstract
2. Introduction
a. What’s the problem?
b. Why is it important?
c. How do you plan to solve it?
d. Who cares?
e. Why do they care?
f. (industry graph)
g. Lit Review (Background) Industry Review
i. Describe the industry
ii. Scholar.google.com
iii. Academic papers
iv. Articles
v. What have other people done in this area of research in the past?
vi. *cited!!!
3. Data
a. What is your data?
b. Where did you get it?
c. Descriptive statistics table
4. Methodology
a. Describe your methodology
b. Include equations
c. Linear regression
d.
5. Results
a. Describe your results
b. Are they significant?
c. 10 step process
d. Alpha (.05)
6. Conclusion
a. The model is good
b. Why is it important?
c. This is important to whom?
d. How will this change the industry?
e. Why should I care??
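For the "Include equations" and "Are they significant?" items, the multiple regression model and the usual coefficient test can be written as follows. The notation is an assumption chosen for illustration ($n$ observations, $p$ predictors); use whatever notation was used in class.

```latex
% Multiple regression model for observation i = 1, ..., n
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_p x_{ip} + \varepsilon_i

% Test of an individual coefficient
H_0 : \beta_j = 0 \quad \text{versus} \quad H_a : \beta_j \neq 0

% Test statistic, compared with a t distribution on n - p - 1 degrees of freedom
t_j = \frac{\hat{\beta}_j}{\operatorname{SE}(\hat{\beta}_j)}

% Decision rule at the significance level in the outline
\text{reject } H_0 \text{ if the two-sided } p\text{-value} < \alpha = 0.05
```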
Data Sets
The project will be most rewarding if you choose questions and a data set that genuinely interest you. Examples of such questions are as follows:
1. What properties of a baseball team best predict its success over the course of a season?
1. What properties of a college are related to its rank in the U.S. News and World Report rankings?
1. Is the unemployment rate related to economic measures such as interest rates, stock returns, and the inflation rate?
1. What properties of a state predict the proportion of the vote that George Bush (John Kerry) received in it?
You will need a data set to explore your question of interest. I will be happy to help you with suggestions. The data set should ideally contain at least 30-50 observations (e.g., companies, people, or countries, as the case may be) and at least 4 variables (pieces of information about the observations, e.g., stock price, revenues, profits, salaries, or gender). If that is not possible, exceptions will be allowed, subject to my approval. Do not be concerned if your data set is large.
One of the variables should be a numerical variable that would be of interest to model or forecast (e.g., for the examples above, team winning percentage, U.S. News and World Report rank, unemployment rate, and proportion of vote received, respectively; other possibilities include stock price change or gas mileage).
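As a rough illustration of the guidelines above and of the descriptive statistics table the report calls for, the sketch below checks a candidate data set against the 30-observation and 4-variable minimums and summarizes each variable. The variable names and values are invented, and `describe` and `meets_guidelines` are hypothetical helpers, not part of any required software.

```python
import statistics

def describe(columns):
    """Summarize numeric columns given as a dict of name -> list of values."""
    table = {}
    for name, values in columns.items():
        table[name] = {
            "n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values),  # sample standard deviation
            "min": min(values),
            "max": max(values),
        }
    return table

def meets_guidelines(columns, min_obs=30, min_vars=4):
    """True if every variable has at least min_obs values and there are at least min_vars variables."""
    n_obs = min(len(v) for v in columns.values())
    return n_obs >= min_obs and len(columns) >= min_vars

# Made-up data for 30 hypothetical teams; win_pct plays the role of the response variable
data = {
    "win_pct": [0.40 + 0.005 * i for i in range(30)],
    "runs_scored": [600 + 10 * i for i in range(30)],
    "runs_allowed": [800 - 8 * i for i in range(30)],
    "payroll_musd": [50 + 3 * i for i in range(30)],
}

summary = describe(data)
print("meets guidelines:", meets_guidelines(data))
for name, stats in summary.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```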
I will be happy to discuss ideas with you.
Here are a few potential sources of ideas and data:
1. http://kaggle.com
1. http://www.hawkeslearning.com/Statistics/dis/datasets.html
1. https://www.springboard.com/blog/free-public-data-sets-data-science-project
1. https://www.dataquest.io/blog/free-datasets-for-projects
1. http://lib.stat.cmu.edu/DASL/
Samples
Good samples of what I'm expecting from the projects and reports are available at http://pages.stern.nyu.edu/~jsimonof/classes/1305/projdoc/ . Note that these reports are for a class taught at New York University by Jeffrey Simonoff, so some of the methods used in the regression analyses may be unfamiliar to you.