Research Project for 3341

Question

The project is to use a multiple regression analysis to analyze a data set that is of interest to you. If you have a strong interest in two-group comparisons or analysis of variance, you can do that with my concurrence.
The final report for the project should be a 5-7 page paper that describes the questions of interest, how you used your data set to analyze these questions (with details on the steps you used in your analysis), your findings about your questions of interest, and the limitations of your study. Specifically, your report should contain the following:
1. Abstract: A one-paragraph summary of what you set out to learn and what you ended up finding. It should summarize the entire report.
2. Introduction: A discussion of the questions you are interested in.
3. Data Set: Describe how the data set was collected and the variables in the data set.
4. Analysis: Describe how you used multiple regression to analyze the data set. Specifically, you should discuss how you carried out the steps in analysis discussed in class, i.e., exploration of the data to find an initial reasonable model, checking the model, and changes to the model based on that checking.
5. Results: Provide inferences about the questions of interest, with discussion.
6. Limitations of study and conclusion: Describe any limitations of your study and how they might be overcome in future research, and provide brief conclusions about the results of your study.

Data Sets
The project will be of most interest to you if you find questions of interest and a data set that are of interest to you. Examples of questions of interest are as follows:
· What properties of a baseball team best predict its success over the course of a season?
· What properties of a college are related to its rank in the U.S. News and World Report rankings?
· Is the unemployment rate related to economic measures such as interest rates, stock returns, and the inflation rate?
· What properties of a state predict the proportion of the vote that George Bush (John Key) received in it?
You will need a data set to explore your question of interest. I will be happy to help you with suggestions. The data set should ideally contain at least 30-50 observations (e.g., companies, people, countries, etc., as the case may be) and at least 4 variables (pieces of information about the observations, e.g., stock price, revenues, profits, salaries, gender, etc.); if that is not possible, exceptions will be allowed, subject to my approval. Do not be concerned if your data set is large. One of the variables should be a numerical variable that would be of interest to model or forecast (e.g., for the examples above, team winning percentage, stock price change, U.S. News and World Report rank, gas mileage, unemployment rate, and proportion of vote received, respectively). I will be happy to discuss ideas with you.
Here are a few potential sources of ideas and data:
· http://kaggle.com
· http://www.hawkeslearning.com/Statistics/dis/datasets.html
· https://www.springboard.com/blog/free-public-data-sets-data-science-project
· https://www.dataquest.io/blog/free-datasets-for-projects
· http://lib.stat.cmu.edu/DASL/

Samples
A good sample of what I'm expecting from the projects and reports is contained at the web site http://pages.stern.nyu.edu/~jsimonof/classes/1305/projdoc/. Note that these reports are for a class taught at New York University by Jeffrey Simonoff, so some of the methods used in the regression analyses may be unfamiliar to you.
1. Abstract
2. Introduction
a. What's the problem?
b. Why is it important?
c. How do you plan to solve it?
d. Who cares?
e. Why do they care?
f. (industry graph)
g. Lit Review (Background) / Industry Review
i. Describe the industry
ii. Scholar.google.com
iii. Academic papers
iv. Articles
v. What have other people done in this research in the past?
vi. *cited!!!
3. Data
a. What is your data?
b. Where did you get it?
c. Descriptive statistics table
4. Methodology
a. Describe your methodology
b. Include equations
c. Linear regression
d.
5. Results
a. Describe your results
b. Are they significant?
c. 10 step process
d. Alpha (.05)
6. Conclusion
a. The model is good
b. Why is it important?
c. This is important to whom?
d. How will this change the industry?
e. Why should I care??

Suggested Variable List
SICSTATEFYEARACTATBKVLCHDLLINVTLCTLTPPECTRETXPCOGSDVCEBITDVTGPNIEMP
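The analysis and methodology items in the assignment (a descriptive statistics table, a linear regression equation, and significance tests at alpha = .05) can be illustrated with the minimal sketch below. It uses only the Python standard library and runs on made-up synthetic data, not on any project data set; the helper names `describe`, `invert`, `ols` and `t_pvalue` are hypothetical, and the sketch is not meant to reproduce the course's own procedure. In practice, Excel's regression tool, R's `lm`, or Python's statsmodels report the same coefficient table.

```python
import math
import random
import statistics

ALPHA = 0.05  # significance level taken from the outline


def describe(name, values):
    # One row of a descriptive statistics table: name, n, mean, sd, min, max
    return (name, len(values), statistics.mean(values),
            statistics.stdev(values), min(values), max(values))


def invert(matrix):
    # Gauss-Jordan elimination with partial pivoting
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; check for collinear predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def _betacf(a, b, x):
    # Continued fraction for the regularized incomplete beta (modified Lentz)
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 300):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((a - 1.0 + m2) * (a + m2)),
                   -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-14:
            break
    return h


def t_pvalue(t, df):
    # Two-sided p-value of Student's t: I_x(df/2, 1/2) with x = df / (df + t^2)
    a, b, x = df / 2.0, 0.5, df / (df + t * t)
    if x >= 1.0:
        return 1.0
    if x <= 0.0:
        return 0.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b


def ols(y, predictors):
    # predictors: dict of name -> values; an intercept column is prepended
    names = ["Intercept"] + list(predictors)
    X = [[1.0] + [predictors[k][i] for k in predictors] for i in range(len(y))]
    p = len(names)
    # Normal equations (X'X) b = X'y, solved by explicit inversion for brevity
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    resid = [yi - sum(bj * v for bj, v in zip(beta, r)) for r, yi in zip(X, y)]
    df = len(y) - p                       # residual degrees of freedom
    s2 = sum(e * e for e in resid) / df   # estimate of the error variance
    rows = []
    for i, name in enumerate(names):
        se = math.sqrt(s2 * inv[i][i])
        t = beta[i] / se
        rows.append((name, beta[i], se, t, t_pvalue(t, df)))
    return rows


if __name__ == "__main__":
    rng = random.Random(1)  # hypothetical synthetic data, NOT the project data
    x1 = [rng.uniform(0, 10) for _ in range(40)]
    x2 = [rng.uniform(0, 5) for _ in range(40)]
    y = [2.0 + 1.5 * a - 0.8 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    for row in (describe("y", y), describe("x1", x1), describe("x2", x2)):
        print("%-3s n=%d mean=%.2f sd=%.2f min=%.2f max=%.2f" % row)
    for name, b, se, t, p in ols(y, {"x1": x1, "x2": x2}):
        verdict = "significant" if p < ALPHA else "not significant"
        print(f"{name:9s} b={b:7.3f} se={se:.3f} t={t:6.2f} p={p:.4f} ({verdict})")
```

Solving the normal equations by explicit inversion keeps the sketch short; with many or nearly collinear predictors, a QR-based solver is numerically safer.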

Monali · Accepted Answer

Note 2: This solution cannot be completed on the basis of guesswork by either side. Only additional, complete input with clear instructions on which graphs are to be prepared can give it direction.
Note 1: For this report, an Excel file with 1,338 data points was provided, but summary statistics were given for only 499 observations. No graphs were provided. I have not received a reply to the query I raised on this point, so I have not guessed which data points were selected out of the 1,338 in total. Without correct and complete data, graphs cannot be produced, and producing graphs was not included in the brief. I have included a regression summary in the analysis section. The report below is submitted within the limitations of the data, with no reply to the query and incomplete information.
Abstract
Insurance works on the principle of appropriate risk segregation. Adverse selection refers to a situation in which a company fails to segregate risk appropriately, so that an inadequate insurance pool is created. In such an event, the company ends up losing its competitive advantage on price, as well as losing business to competitors. We therefore need to check the effect of the explanatory variables on the dependent variable.
This analysis uses insurance data for individuals aged 18-64 years and three independent variables, namely BMI (Body Mass Index), Children (number of children) and Charges (insurance premium), which are treated as the key variables for this analysis. Each data point has its own values of these variables, reflecting different risk at different ages. Multiple Linear Regression (MLR) is used, and the regression summary shows the effect of the three explanatory variables on the response variable (Age).
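Written out, the model described above is the following, with one equation per insured individual i and an error term ε_i. The coefficient symbols β_0, ..., β_3 are notation introduced here, not taken from the original regression output. The second line is the ordinary least squares estimator in matrix form, where X is the n × 4 matrix whose columns are a constant, BMI, Children and Charges.

```latex
\begin{aligned}
\text{Age}_i &= \beta_0 + \beta_1\,\text{BMI}_i + \beta_2\,\text{Children}_i + \beta_3\,\text{Charges}_i + \varepsilon_i, \qquad i = 1, \dots, n \\
\hat{\boldsymbol{\beta}} &= (X^{\top}X)^{-1}X^{\top}\mathbf{y}, \qquad \mathbf{y} = (\text{Age}_1, \dots, \text{Age}_n)^{\top}
\end{aligned}
```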
Introduction
The insurance business is based on charging an adequate premium for protection against uncertainty, and risk is at its core. This analysis is of fundamental interest to insurance companies because risk differs across ages. Each observation records its own values of the explanatory variables, namely BMI (Body Mass Index), Children (number of children) and Charges (premium), together with Age, and the observations are treated as independent for the analysis. The MLR summary gives the fitted equation relating the dependent variable to the explanatory variables, which can be used to predict its value. It also gives goodness-of-fit statistics that show how well the regression relationship explains the variation in the dependent variable.
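The goodness-of-fit statistics mentioned above can be computed from the observed values and the model's fitted values. The sketch below assumes an OLS fit with an intercept and k explanatory variables (k = 3 for BMI, Children and Charges); `fit_statistics` is a hypothetical helper name, and the usage numbers are a toy least-squares fit (y = 1.9x for x = 1, ..., 4), not results from this report. Excel's regression output reports the same three quantities as R Square, Adjusted R Square and F.

```python
def fit_statistics(y, fitted, k):
    """R^2, adjusted R^2 and overall F statistic for an OLS fit with an
    intercept and k explanatory variables."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    # For OLS with an intercept, SST - SSE equals the regression sum of squares
    f_stat = ((sst - sse) / k) / (sse / (n - k - 1))
    return r2, adj_r2, f_stat


# Usage with a toy least-squares line, not results from the report
r2, adj_r2, f_stat = fit_statistics([2, 4, 5, 8], [1.9, 3.8, 5.7, 7.6], k=1)
print(f"R^2={r2:.3f}  adjusted R^2={adj_r2:.3f}  F={f_stat:.1f}")
```

Adjusted R² penalizes each added predictor, so it is the more useful of the two when comparing models with different numbers of explanatory variables.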
