Research Project for 3341
The project is to use multiple regression analysis to analyze a data set that is of interest to you. If you have a strong interest in two-group comparisons or analysis of variance, you can do that with my concurrence.
The final report for the project should be a 5-7 page paper that describes the questions of interest, how you used your data set to analyze these questions with details on the steps you used in your analysis, your findings about your question of interest and the limitations of your study. Specifically, your report should contain the following:

1. Abstract: A one-paragraph summary of what you set out to learn and what you ended up finding. It should summarize the entire report.
2. Introduction: A discussion of the questions you are interested in.
3. Data Set: Describe how the data set was collected and the variables it contains.
4. Analysis: Describe how you used multiple regression to analyze the data set. Specifically, you should discuss how you carried out the steps of analysis discussed in class, i.e., exploring the data to find an initial reasonable model, checking the model, and changing the model based on that check.
5. Results: Provide inferences about the questions of interest, with discussion.
6. Limitations of study and conclusion: Describe any limitations of your study and how they might be overcome in future research, and provide brief conclusions about the results of your study.
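The explore, fit, check, and revise cycle described in the Analysis item can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions: `fit_ols` and `flag_outliers` are hypothetical helper names, and the 2.5 cutoff for standardized residuals is a common rule of thumb, not a course requirement.

```python
import numpy as np


def fit_ols(X, y):
    """Fit y = b0 + b1*x1 + ... + bk*xk by ordinary least squares.

    X: (n, k) array of predictors (no intercept column); y: (n,) response.
    Returns the coefficient vector [b0, b1, ..., bk] and the residuals.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return coef, residuals


def flag_outliers(residuals, n_params, cutoff=2.5):
    """Return indices of observations whose standardized residual exceeds cutoff.

    Standardizes by the residual standard error s = sqrt(SSE / (n - p)).
    Assumption for illustration: this ignores leverage, unlike studentized residuals.
    """
    residuals = np.asarray(residuals, dtype=float)
    n = len(residuals)
    s = np.sqrt(np.sum(residuals ** 2) / (n - n_params))
    return np.flatnonzero(np.abs(residuals / s) > cutoff)
```

After flagging observations, you would inspect them, consider transformations or changes to the set of predictors, refit, and check the model again.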

Data Sets
The project will be of most interest to you if you find questions of interest and a data set that are of interest to you. Examples of questions of interest are as follows:
· What properties of a baseball team best predict its success over the course of a season?
· What properties of a college are related to its rank in the U.S. News and World Report rankings?
· Is the unemployment rate related to economic measures such as interest rates, stock returns, and the inflation rate?
· What properties of a state predict the proportion of the vote that George Bush (John Kerry) received in it?
You will need a data set to explore your question of interest. I will be happy to help you with suggestions. The data set should ideally contain at least 30-50 observations (e.g., companies, people, countries, etc., as the case may be) and at least 4 variables (pieces of information about the observations; e.g., stock price, revenues, profits, salaries, gender, etc.), although if that is not possible, exceptions will be allowed (subject to my approval). Do not be concerned if your data set is large.
One of the variables should be a numerical variable that would be of interest to model or forecast (e.g., for the examples above, team winning percentage, stock price change, U.S. News and World Report rank, gas mileage, unemployment rate, and proportion of vote received, respectively).
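As a quick self-check against these guidelines, a small Python function can report whether a candidate data set meets them. The function name `check_project_dataset`, the list-of-dicts input format, and the use of 30 as the minimum row count (the low end of the 30-50 guideline) are all assumptions for illustration.

```python
def _is_number(value):
    """True for int/float values, excluding booleans (bool subclasses int)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def check_project_dataset(rows, response, min_rows=30, min_vars=4):
    """Return a list of guideline problems for a data set (empty list = OK).

    rows: list of dicts, one per observation, all with the same keys.
    response: name of the numerical variable you intend to model or forecast.
    """
    problems = []
    if len(rows) < min_rows:
        problems.append(f"only {len(rows)} observations (guideline: at least {min_rows})")
    n_vars = len(rows[0]) if rows else 0
    if n_vars < min_vars:
        problems.append(f"only {n_vars} variables (guideline: at least {min_vars})")
    # all() is True on an empty list, so treat "no rows" as a failed check too
    if not rows or not all(_is_number(row.get(response)) for row in rows):
        problems.append(f"response '{response}' is not numeric in every observation")
    return problems
```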
I will be happy to discuss ideas with you.
Here are a few potential sources of ideas and data:
· http://kaggle.com
· http://www.hawkeslearning.com/Statistics/dis/datasets.html
· https://www.springboard.com/blog/free-public-data-sets-data-science-project
· https://www.dataquest.io/blog/free-datasets-for-projects
· http://lib.stat.cmu.edu/DASL/
Samples
A good sample of what I’m expecting from the projects and reports is available at http://pages.stern.nyu.edu/~jsimonof/classes/1305/projdoc/. Note that these reports are for a class taught at New York University by Jeffrey Simonoff, so some of the methods used in the regression analyses may be unfamiliar to you.
1. Abstract
2. Introduction
a. What’s the problem?
b. Why is it important?
c. How do you plan to solve it?
d. Who cares?
e. Why do they care?
f. (industry graph)
g. Lit Review (Background) / Industry Review
i. Describe the industry
ii. Scholar.google.com
iii. Academic papers
iv. Articles
v. What have other people done in this research in the past?
vi. *cited!!!
3. Data
a. What is your data?
b. Where did you get it?
c. Descriptive statistics table
4. Methodology
a. Describe your methodology
b. Include equations
c. Linear regression
d.
5. Results
a. Describe your results
b. Are they significant?
c. 10 step process
d. Alpha (.05)
6. Conclusion
a. The model is good
b. Why is it important?
c. This is important to whom?
d. How will this change the industry?
e. Why should I care??
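The Methodology and Results items ask for equations and a significance test at alpha = .05. One common way to state the multiple linear regression model and its tests, assuming the standard normal-errors setup with k predictors and n observations, is sketched below. It is a generic textbook formulation, not necessarily the 10-step process used in class.

```latex
% Multiple linear regression model with k predictors and n observations
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, n

% Least-squares estimate, with design matrix X whose first column is all ones
\hat{\boldsymbol{\beta}} = (X^\top X)^{-1} X^\top \mathbf{y}

% Overall F test of H_0: beta_1 = ... = beta_k = 0
F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n-k-1)} \sim F_{k,\,n-k-1} \text{ under } H_0

% Individual t test of H_0: beta_j = 0
t_j = \frac{\hat{\beta}_j}{\mathrm{SE}(\hat{\beta}_j)} \sim t_{n-k-1} \text{ under } H_0

% Decision rule at alpha = 0.05: reject H_0 when the p-value is below 0.05
```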
Suggested Variable List
SIC
STATE
FYEAR
ACT
AT
BKVL
CH
DLL
INVT
LCT
LT
PPECT
RE
TXP
COGS
DVC
EBIT
DVT
GP
NI
EMP

Solution

Monali answered on Nov 29 2021
Note 2: This solution cannot be worked out by guesswork on either side. Only additional, complete input with clear instructions on which graphs to prepare can give it direction.
Note 1: For this report, an Excel file with 1,338 data points was provided, but summary statistics for only 499 observations were given. No graphs were provided. I have not received a reply to the query I raised on this point, and I have not guessed which data points were selected out of the 1,338 total. Without correct and complete data, graphs cannot be produced, and producing graphs was not included in the brief. I have also included a regression summary in the Analysis section. Within the limitations of the data, and in the absence of a reply to the query or complete information, the report below is submitted.
Abstract
Insurance works on the principle of appropriate risk segregation. Adverse selection refers to a situation in which a company fails to segregate risk appropriately, so that an inadequate insurance pool is created. When adverse selection occurs, the company ends up losing its price advantage, as well as business, to competitors. Therefore, we need to check the effect of the explanatory variables on the dependent variable.
This analysis uses insurance data for individuals aged 18-64 years and three independent variables: BMI (Body Mass Index), Children (number of children), and Charges (insurance premium). These three are treated as the key variables in this study. Each data point has its own values of these independent variables, reflecting different risk at different ages. The analysis uses Multiple Linear Regression (MLR), and the regression summary shows the effect of the three explanatory variables on the response variable (Age).
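The abstract describes regressing Age on BMI, Children, and Charges. A minimal NumPy sketch of such a fit, reporting coefficients, standard errors, and t statistics, might look like the following. The function name `mlr_summary` is hypothetical, and the data file itself is not reproduced here, so the column arrays in the usage line are assumptions.

```python
import numpy as np


def mlr_summary(X, y):
    """Ordinary least squares with an intercept.

    X: (n, k) predictors; y: (n,) response.
    Returns (coef, std_err, t_stat), each of length k + 1 (intercept first).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # intercept column first
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - k - 1)  # unbiased residual variance estimate
    cov = sigma2 * np.linalg.inv(design.T @ design)  # covariance of coefficient estimates
    std_err = np.sqrt(np.diag(cov))
    return coef, std_err, coef / std_err
```

Usage, assuming hypothetical arrays `bmi`, `children`, `charges`, and `age` loaded from the data file: `coef, se, t = mlr_summary(np.column_stack([bmi, children, charges]), age)`. Each t statistic is then compared with a t distribution on n - k - 1 degrees of freedom at alpha = 0.05.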
Introduction
The insurance business is based on charging an adequate premium for protection against uncertainty, so risk is at its core. This analysis is of fundamental interest to an insurance company because risk differs across ages. Each explanatory variable, namely BMI (Body Mass Index), Children (number of children), and Charges (premium), takes its own value for each observation, creating an independent dataset for analysis. The MLR summary gives the mathematical relationship that quantifies and predicts the dependent variable, together with goodness-of-fit statistics that show how well the regression relationship explains the data.
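The goodness-of-fit statistics usually reported in an MLR summary are R² and adjusted R². A minimal pure-Python sketch, assuming a model with an intercept and `n_predictors` explanatory variables (three in this report), is shown below; `r_squared` is a hypothetical helper name.

```python
def r_squared(y, fitted, n_predictors):
    """Return (R^2, adjusted R^2) for a regression fitted with an intercept.

    y: observed responses; fitted: model predictions for the same observations.
    """
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((obs - pred) ** 2 for obs, pred in zip(y, fitted))  # error sum of squares
    sst = sum((obs - mean_y) ** 2 for obs in y)  # total sum of squares
    r2 = 1 - sse / sst
    # Adjusted R^2 penalizes extra predictors: 1 - (1 - R^2)(n - 1)/(n - k - 1)
    adj = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj
```

Adjusted R² is the more useful of the two when comparing models with different numbers of predictors, because plain R² never decreases when a variable is added.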
Data Set
All 1,338 data points were collected for age...
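Note 1 refers to summary statistics, and the instructor's outline asks for a descriptive statistics table. A minimal sketch using Python's standard `statistics` module follows; the function name `describe` and the dict-of-lists input format are assumptions for illustration.

```python
import statistics


def describe(columns):
    """Build a text table of N, mean, std. dev., min, median and max per variable.

    columns: dict mapping a variable name to a list of numbers (at least 2 values,
    since statistics.stdev needs two data points).
    """
    header = (f"{'Variable':<10}{'N':>6}{'Mean':>12}{'Std Dev':>12}"
              f"{'Min':>10}{'Median':>10}{'Max':>10}")
    lines = [header]
    for name, values in columns.items():
        lines.append(
            f"{name:<10}{len(values):>6}{statistics.mean(values):>12.2f}"
            f"{statistics.stdev(values):>12.2f}{min(values):>10.2f}"
            f"{statistics.median(values):>10.2f}{max(values):>10.2f}"
        )
    return "\n".join(lines)
```

Usage, with hypothetical lists read from the data file: `print(describe({"age": ages, "bmi": bmis, "children": kids, "charges": charges}))`.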