Assignment Details:
· This is an individual Assignment where you need to analyze your own data and write a formal report of the analysis results.
· Start by selecting a relatively large data set to analyze. There should be at least 10 variables in the data set and more than 50 observations. Have at least two “objectives” that you’d like to answer via data analysis. State these objectives clearly in the beginning of your report.
· Use the analysis techniques we are learning this semester to analyze the data and answer your objectives. This data may come from your work/internship or from an internet source that is of interest to you. If you’re struggling to find data, check out https: for freely available data sets.
· Use the dataset (JMP) and your final project report.
· The report should be 5-10 pages in length – not too picky about length as long as you’ve done a thorough analysis and described your project. There should be nothing missing!
· Include relevant output/graphs in body of paper, but extra output in appendix. (The report should not be ALL output and figures. You need to write formal paragraphs explaining your thought process and analysis results. The reader should not need to know anything about JMP output to understand the conclusions.)
· Add appendix and source of data.

“Analysis of District Level Standardized Test Performance in Pennsylvania Public Schools”
Introduction and Objectives
A large part of the discourse surrounding the 2014 Pennsylvania gubernatorial election campaign focused on the educational spending policy of past and future administrations. Inherent in that line of discussion is the idea that the volume of money spent on education is an influence on the performance of the state’s educational institutions. Further, it implies that spending is the most important influence on educational performance. However, a measured analysis of what factors drive good school performance was absent from the forefront of this discussion.
A deep field of research exists on what decisions and school characteristics do or do not make for better academic performance, including some analysis of Pennsylvania-specific data. Past studies have concluded that a student’s socioeconomic status has a strong influence on their academic performance (Dahl & Lochner, 2005). Others suggest that financial incentives for teachers do not improve student performance (Fryer, 2011) or that smaller class sizes in lower grade levels can improve test scores (Betts, Zau & Rice, 2003). This project will examine selected characteristics of Pennsylvania school districts to determine their effect on standardized test scores.
    All data for this study is from publicly available sources. PSSA scores, average daily membership, student ethnicity data, teacher compensation, attendance rate, district staffing levels, market value and district expense amounts for the school year 2012 were taken from the Pennsylvania Department of Education website. Adult education level, marriage rate and median family income by school district are from the National Center for Education Statistics (NCES). Where available, the most recent school year’s measurements were used. PSSA scores are for Math, Reading and Science for 11th and 4th grades from the XXXXXXXXXX school year. District staffing and expense information were from the XXXXXXXXXX school year and, due to a change in state reporting methods, the most recent attendance information available was from XXXXXXXXXX. Only public schools were included in this analysis because consistent reporting was not available for many charter schools. Three school districts were excluded because they did not have scores available for all of the subjects and grade levels that were analyzed.
    Some factors in the analysis were calculated based on a combination of data fields. Any factor that references a rate based on number of students is based on the average daily membership number for the district. Proficiency rates were calculated by combining the proficient and advanced scoring groups in the PSSA data. The values for average years of adult education were calculated based on groupings in NCES data.
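The derived rates described above can be illustrated with a minimal sketch. The function names and the counts in the example are hypothetical and are not taken from the PSSA files; the sketch only shows the arithmetic of combining the proficient and advanced groups and of expressing a district total per student of average daily membership.

```python
def proficiency_rate(proficient: int, advanced: int, tested: int) -> float:
    """Percentage of tested students scoring proficient or advanced."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return 100.0 * (proficient + advanced) / tested


def per_student(amount: float, average_daily_membership: float) -> float:
    """Express a district total (e.g. expenses) per student of average daily membership."""
    if average_daily_membership <= 0:
        raise ValueError("average_daily_membership must be positive")
    return amount / average_daily_membership


# Hypothetical district: 120 proficient and 60 advanced out of 240 tested students.
print(proficiency_rate(120, 60, 240))
# Hypothetical district: 30,000,000 in expenses and 2,000 students of average daily membership.
print(per_student(30_000_000, 2_000))
```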
Analysis and Methodology
Using JMP, stepwise multiple linear regression was performed on the data set for each subject (math, reading, science) and grade level (11th, 4th). Independent variables were removed from the model if they did not meet at least a 95% significance level (p-value of XXXXXXXXXX). See Appendix A for detailed output for each model.
Each model was checked against the assumptions of multiple linear regression. See Appendix B for selected assumptions testing output. Residuals were tested for normality using the Shapiro-Wilk goodness of fit test. All sets of residuals failed this test and were shown to not be normally distributed. This failure might lead to attempts at non-parametric regression or other analysis methods, but those are outside the scope of the techniques available for this project. It can be argued that the assumption of normal residuals is not always a critical assumption of multiple linear regression when evaluating the relationship between the dependent variable and the predictor variables. “Technically, the normal distribution assumption is not necessary if you are willing to assume the model equation is correct and your only goal is to estimate its coefficients and generate predictions in such a way as to minimize mean squared error.” (Nau, 2014). In a 2013 article, Williams, Grajales & Kurkiewicz reiterate this point in saying that “the assumption of normally distributed errors is not required for multiple regression to provide regression coefficients that are unbiased and consistent, presuming that other assumptions are met. Further, as the sample size grows larger, inferences about coefficients will usually become more and more trustworthy.”
Residual-by-predicted plots were reviewed and showed fairly constant variance across all values in all instances, with no clear increasing or decreasing trend. To rule out the possibility of non-constant variance, a natural log transformation of the test scores was done and the models were re-run. Virtually no difference in the shape of the residual-by-predicted plots was seen. Scatter plots for each combination of independent variable and dependent variable were reviewed to verify the presence of something resembling a linear relationship. Most scatter plots reveal a generally linear relationship. Some were rather scattered or had a slight bend to them. Improving the linear relationships between the independent and dependent variables with transformations might be a way to improve the accuracy of these models, but was left out of scope for the purposes of this project.
Data points were reviewed for outliers. High leverage points were defined as data points with a hat matrix value that exceeded 2k/n, where k is the number of terms in the model and n is the sample size. In each model, a number of high leverage points were found, but none could be demonstrated to be inaccurate or not representative of the population measured, so they remained as part of the population analyzed.
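The 2k/n leverage rule can be sketched in a few lines of dependency-free Python. This is an illustration only, not the JMP computation used for the report: the helper names (`solve`, `leverages`, `high_leverage`) and the toy data are hypothetical, and the sketch assumes the intercept counts as one of the k terms.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Build the augmented matrix [a | b] from copies so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverages(rows):
    """Hat values h_ii = x_i' (X'X)^-1 x_i for rows of predictors (an intercept column is added)."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    return [sum(xi * zi for xi, zi in zip(x, solve(xtx, x))) for x in X]


def high_leverage(rows):
    """Return the 2k/n cutoff and the indices of observations whose hat value exceeds it."""
    k = len(rows[0]) + 1  # assumption: the intercept is counted as a model term
    cutoff = 2 * k / len(rows)
    flagged = [i for i, h in enumerate(leverages(rows)) if h > cutoff]
    return cutoff, flagged


# Toy data: one predictor, eight observations, the last far from the others.
rows = [[1], [2], [3], [4], [5], [6], [7], [20]]
cutoff, flagged = high_leverage(rows)
print(cutoff, flagged)
```

In this toy example only the last observation exceeds the cutoff. The hat values always sum to k, which is a quick sanity check on any leverage calculation.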
Variance inflation factors (VIF) for each of the six models were also reviewed to test for possible multicollinearity between predictor variables. A maximum VIF threshold of 5.0 was used. The only instance of a high VIF in all of the models was the “percentage of students from low income families” variable in the 4th grade math model. That variable has a correlation of 0.79 with median family income because they are both measures of income in the school district. The percentage of students from low income families is a more precise measurement of student socioeconomic status because it measures only the subset of the school district population that has students in school, whereas the median family income includes families that do not have school age children. For that reason, the median family income factor was deleted from the model, thus reducing the VIF for “percentage of students from low income families” below the threshold.
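As a rough illustration of how a VIF relates to correlation, the sketch below uses the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all of the other predictors. The function names are hypothetical, and the two-predictor shortcut (R_j^2 equal to the squared pairwise correlation) is an assumption made only for this example; it is not how the models in this report were fitted.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_from_r_squared(r_squared):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("r_squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared)


# With only two predictors, R_j^2 is the squared pairwise correlation.
print(round(vif_from_r_squared(0.79 ** 2), 2))
# A VIF of 5.0 corresponds to R_j^2 = 0.8.
print(round(vif_from_r_squared(0.8), 2))
```

Under that two-predictor assumption, a pairwise correlation of 0.79 on its own corresponds to a VIF of about 2.66, so a VIF above the 5.0 threshold reflects the combined association with all of the other predictors in the model, not just one pairwise correlation.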
A chart of the six models that came from this analysis and their applicable significant factors is below.
The core model equations are as follows:
% of students proficient or better in 11th Grade Math
% of students proficient or better in 4th Grade Math
% of students proficient or better in 11th Grade Reading
% of students proficient or better in 4th Grade Reading
% of students proficient or better in 11th Grade Science
% of students proficient or better in 4th Grade Science
In addition, we did some exploratory analysis to see if a more accurate model could be derived by deleting outliers from the data set. These models proved to have better fits, but carving out a large portion of the data without any evidence of inaccurate collection methods or an underlying reason for why the points are outliers does not make sense in social science applications. The presence of so many outliers is an indication of the need for additional data collection and research into the dynamics of the situation.
Model Adequacy, Shortcomings, Additional Collection and Analysis
    The adjusted R-squared values for our models ranged from 0.51 to 0.68, so there is room for additional data collection to explain more of the variation in each response variable.
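For readers unfamiliar with the statistic, adjusted R-squared penalizes R-squared for the number of predictors: adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1). The short sketch below applies that standard formula; the function name and the numbers in the example are hypothetical and are not results from these models.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)."""
    dof = n_obs - n_predictors - 1
    if dof <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / dof


# Hypothetical model: R^2 of 0.70 with 6 predictors and 100 observations.
print(round(adjusted_r_squared(0.70, 100, 6), 4))
```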
In addition, ethnic and racial influences on learning have been shown to result not from a group’s capacity to learn but from a combination of cultural and economic factors. Collecting more descriptive data for students in each district related to those factors could improve the model. This model does not consider qualitative classroom decisions as a reason for variation in testing outcomes. Data for each district related to teaching methods used, class scheduling and other administrative decisions could be collected and added to the analysis. Additionally, a data research request could be submitted to the Pennsylvania Department of Education to gain access to more granular data, including de-identified teacher and student data. Another opportunity for further study is to examine how test scores have changed in relation to the changes in each of the predictor factors over time. That type of analysis might allow for some explanation of how both funding and district administrative decisions affect testing performance.
Additional analysis that could be conducted would be to predict the scores for the XXXXXXXXXX school year using the models and measure the accuracy. However, data for that school year was not available. In 2012 Pennsylvania began a slow multi-year transition from the longstanding PSSA tests to Keystone exams that have different standards that align with federal Common Core Standards. PSSA scores for all districts at a granular detail in a single data file stopped being made available in 2012 after seventeen years of consistent public reporting. This change in transparency in an election year amidst declining aggregate state scores drew criticism from both state legislatures and academic policy experts.
    Despite being the focus of recent election debates, district spending levels in Pennsylvania were not identified as a significant factor in the variation of PSSA test scores in any subject for grades four and eleven. Percentage of non-white students and percentage of students from a low income family were the two school district characteristics that tested as significant for all subjects and grade levels analyzed. One way to look at this analysis is as a starting point for policy initiatives that would improve test results in low performing schools. Among the variety of effective tactics that research has shown to raise the performance of students from low income and minority backgrounds are introducing culturally relevant learning material, increased parental support (Museus et al., 2011), and nutrition programs (Hollar et al., 2011 and Rausch, 2013).
G. Dahl & L. Lochner, 2005. The Impact of Family Income on Child Achievement, Institute for Research on Poverty Discussion Paper, http:
EVIDENCE FROM NEW YORK CITY PUBLIC SCHOOLS, National Bureau of Economic Research Working Papers, http:
J. Betts, A. Zau & L. Rice, 2003. Determinants of Student Achievement: New Evidence from San Diego, The Public Policy Institute of California, http: XXXXXXXXXXOWI.pdf
Robert Nau, 2014, Regression Diagnostics: Testing the Assumptions of Linear Regression, http:
Williams, Grajales & Kurkiewicz, Sept 2013. Assumptions of Multiple Regression: Correcting Two Misconceptions, Practical Assessment, Research & Evaluation, Volume 18, Number 11
Dale Mezzacappa, Kevin McCorry, and Paul Socolar, 2014 October 30. Election near, but still no 2014 Pa. test scores; 2013 results showed a downward trend, http:
Samuel Museus, Robert T. Palmer, Ryan J. Davis & Dina C. Maramba, 2011. Racial and ethnic minority students' success in STEM education, Hoboken, New Jersey: Jossey-Bass, http:
Danielle Hollar, Michelle Lombardo, Gabriella Lopez-Mitnik, Theodore L. Hollar, Marie Almon, Arthur S. Agatston, Sarah E. Messiah, May 2010. “Effective Multi-level, Multi-sector, School-based Obesity Prevention Programming Improves Weight, Blood Pressure, and Academic Performance, Especially among Low-Income, Minority Children”, Journal of Health Care for the Poor and Underserved, Volume 21, Number 2, pp XXXXXXXXXX, http:
Rita Rausch, 2013, Nutrition and Academic Performance in School-Age Children The Relation to Obesity and Food Insufficiency. Journal of Nutrition and Food Sciences Volume 3, page 190
Appendix A – Model Results
JMP Data table file (with scripts for core model) is attached
A.1: core model results
A.2 - Analysis models created by deleting high leverage data points
The high leverage value is (2k/n) = XXXXXXXXXX
    K: means 6 independent variables
Answered 2 days After Mar 02, 2023


Subhi answered on Mar 05 2023
33 Votes
College is known to be one of the most stressful periods in the lives of young adults: it directly affects their health and their present and future financial situation while they struggle to build their own lives and careers (Lampman et al., 2007), which in turn adversely affects their mental and psychological health. Life satisfaction is an aspect of wellbeing and is also described as a measure of happiness. It is also an adjustment indicator affected by the lives and perceptions of the students.
Previous literature shows that stress among college students arises from academic workload, future plans, financial and family matters, and relationships with the opposite sex (Darling, Howard et al., 2007; Chao, 2012). However, previous research has also shown that stress among students contributes to positive indicators such as self-esteem and a hardworking nature, which support their mental health (Tsitsas et al., 2019). A study by Altson and Dudley concluded that older persons are more stressed about their lives than younger ones, resulting in an inverse relationship between age and various indices of life satisfaction. Gender, however, is not a consistent predictor of life satisfaction, because males and females have different perceptions and preferences regarding life: in some cases women are found to be happier than men, and in other cases the reverse. Therefore, gender is not taken as a hypothesized variable.
The objective of this project is to explore life satisfaction among college students and to examine its association with workload and positive stress mindset.
Hypothesis to be tested:
Based on previous research in this area, three hypothesized relationships were framed and tested in this study. The hypotheses are:
Hypothesis 1: there is an inverse relationship between age and life satisfaction.
Hypothesis 2: there is an inverse relationship between workload and life satisfaction.
Hypothesis 3: there is a direct relationship between positive stress mindset and life satisfaction.
Data used:
The data for this project is taken from a sample data set for analysis practice with 65 variables and 582 observations. It is used to explore life satisfaction among college students, to examine its association with workload and positive stress mindset, and to test the three hypotheses framed above.
The participants in the study consist of 579 students: 246 (42.5%) males and 333 (57.5%) females, with a mean age of 22.2 years (SD = 2.5; range = 17 to 42). Most of the students belong to the 21 or 22 year age group. With respect to relationship status, 304 (52.2%) were single, 264 (45.4%) were in a committed relationship and only 12 (2.1%) were married.
The life satisfaction scale is a self-reported measure of overall satisfaction. It consists of 5 items rated on a 7-point Likert scale from strongly disagree to strongly agree (1 to 7), where 1 indicates lower life satisfaction and 7 indicates greater life satisfaction. The internal consistency reliability (Cronbach’s alpha) of this scale was 0.84.
Perceived Positive Stress Mindset is a four-item measure that assesses the degree to which one perceives stress in a positive manner. It is rated on a 5-point Likert scale from strongly disagree to strongly agree (1 to 5). The internal consistency reliability (Cronbach’s alpha) of this scale was 0.78. The items are: “Experiencing stress facilitates learning and growth?”; “Experiencing stress inhibits learning and growth?”; “Experiencing stress improves health and vitality?”; and “The effects of stress are positive and should be utilized?”.
Workload is a three-item measure of students’ responses to: “Too much work for one person to do?”; “Never have enough time to get everything done?”; and “Amount of work expected to do is too great?”. The internal consistency reliability (Cronbach’s alpha) of this scale was 0.79. It is also rated on a 5-point Likert scale from strongly disagree to strongly agree (1 to 5), where 1 indicates a low workload and 5 indicates a high workload.
A total life satisfaction score ranging from 5 to 35 was calculated by summing the five items, where lower scores indicate lower satisfaction and higher scores indicate greater satisfaction. Similarly, positive stress mindset scores range from 4 to 20 and workload scores range from 3 to 15.
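The scale scoring and reliability figures described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names and the response data are hypothetical, totals are taken to be simple sums of the item responses (consistent with the 5 to 35 range for five 7-point items), and Cronbach’s alpha uses the standard formula alpha = k/(k - 1) * (1 - sum of item variances / variance of the total score).

```python
from statistics import variance


def scale_totals(responses):
    """Sum each respondent's item scores into one total scale score."""
    return [sum(person) for person in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([person[i] for person in responses]) for i in range(k)]
    total_variance = variance(scale_totals(responses))
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from five respondents to five 7-point life satisfaction items.
responses = [
    [7, 6, 7, 6, 7],
    [5, 5, 4, 5, 5],
    [2, 3, 2, 2, 3],
    [4, 4, 5, 4, 4],
    [6, 6, 6, 7, 6],
]
print(scale_totals(responses))
print(round(cronbach_alpha(responses), 2))
```

The same two functions apply to the four-item stress mindset scale and the three-item workload scale; only the number of items and the response range change.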
Table 1 presents the mean life satisfaction score, along with the standard deviation, to compare satisfaction according to student status, major subject, gender and relationship status. The results show that full time students are more satisfied with their lives than part time students. Life satisfaction scores are higher among students whose main subjects are finance, management, human resources and marketing, and highest among students who are studying a double major. Males and students in a committed relationship are more satisfied with their lives.
Correlation analysis is presented in Table 2 to examine the association between age and life satisfaction and to test the first hypothesis. Bivariate correlation was done to show the direction of the relationship, and the Pearson correlation value is presented. The Pearson correlation value showed that there is an inverse relationship between the age...
