

1. Find a subject area of interest where you can gather sample data. (Note: Make sure the "population" for your area of interest is large enough that it is reasonable to gather data from a "sample" rather than doing a complete census.)
   - The project should not involve research on human subjects because such research requires approval of EIU's Institutional Review Board (IRB) and there is not enough time to obtain such approval before the end of the semester. However, there are still a lot of possibilities for projects where the data is about humans as long as the data is already publicly available. An example is in the area of sports where large quantities of data are generally available, both about physical characteristics of the athletes and about their performance. Doing a project involving such publicly available data would not require IRB approval.
2. Identify a few variables for which you can collect sample data.
3. Include both categorical and quantitative data.
4. Include some variables you believe are interrelated.
5. Develop descriptive statistics (graphical and numerical) to describe the sample data you collected (including correlation and other measures for interrelated variables).
6. Make inferences about the broader population from which your sample data was drawn:
   - Confidence intervals about population parameters.
   - Hypothesis tests about population parameters.
   - Regression between interrelated variables.

These optional projects will be assessed based on the report you write including your data analysis. You should submit both a written report (in Microsoft Word .doc or .docx format, or .rtf or .pdf format) and one or more Microsoft Excel files.
The written report should describe your subject area, your questions of interest, any hypotheses you made, how you defined your variables, how you collected your data, summaries of the data (including interrelationships where appropriate), how you analyzed the data, and summaries of your analyses and results (including inferences about the broader population parameters). The Microsoft Excel files should contain your sample data, your descriptive statistics (including their calculation), and your analysis of your sample data (including inferences about the population).
Answered Same Day Dec 02, 2021

Solution

Rajeswari answered on Dec 07 2021
48414 Assignment
My subject of interest is the recent market prices of houses in Sydney. I wanted to study the real estate market, and in particular how the market price of an existing house is influenced by a few factors.
I took a sample of 400 house prices from cities around Sydney. I collected the data from internet sources and grouped them as convenient. The variables included are market price ($000), the Sydney price index, the total area in square metres and the age of the house (years).
Common sense suggests that a house's price depends on its location and on access to nearby schools and markets. Age is expected to be inversely associated with price: as the age of a house increases, its price tends to decrease.
Age can therefore also be treated as a categorical (ordinal) variable. The houses were grouped into age bands and each band was given a score: 0 to 5 years (very new) = 5, over 5 to 10 years (new) = 4, over 10 to 15 years = 3, over 15 to 20 years = 2, over 20 to 30 years = 1, and over 30 years = 0.5. This classifies the houses by age and attaches a value to each category.
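The age coding above can be expressed as a small lookup. This is a minimal Python sketch, assuming that the upper edge of each band is inclusive (the text writes 0 to 5, over 5 to 10, and so on); the function name `code_age` is hypothetical and not part of the original Excel work.

```python
def code_age(age_years: float) -> float:
    """Map a house age in years to the ordinal score used in the report.

    Assumed bands (upper edge inclusive): 0-5 -> 5, >5-10 -> 4, >10-15 -> 3,
    >15-20 -> 2, >20-30 -> 1, over 30 -> 0.5.
    """
    if age_years < 0:
        raise ValueError("age cannot be negative")
    # Each pair is (inclusive upper edge of the band, score for that band).
    bands = [(5, 5.0), (10, 4.0), (15, 3.0), (20, 2.0), (30, 1.0)]
    for upper, score in bands:
        if age_years <= upper:
            return score
    return 0.5  # older than 30 years


if __name__ == "__main__":
    for age in (0, 5, 6, 17, 30, 54):
        print(age, code_age(age))
```

The same rule could be written in Excel with nested IF formulas; the Python version just makes the band edges explicit.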
Next, the mean, median, mode and other descriptive statistics were computed with Excel's Descriptive Statistics tool.
The results were:
Statistic                 Col. 1      Market price ($000)   Sydney price index   Total area (m²)   Age of house (years)
Mean                      200.5       777.0375              106.0568             209.4885          17.755
Standard error            5.780715    3.969241              1.480512             2.267691          0.574074
Median                    200.5       777.5                 105.9                211               16.5
Mode                      #N/A        748                   120.9                234.4             12
Standard deviation        115.6143    79.38483              29.61024             45.35382          11.48149
Sample variance           13366.67    6301.951              876.7663             2056.969          131.8245
Kurtosis                  -1.2        -0.03229              0.001878             0.163011          -0.2305
Skewness                  2.24E-17    -0.1442               0.050154             -0.00848          0.553923
Range                     399         430                   177.3                263.5             54
Minimum                   1           541                   19.8                 84.7              0
Maximum                   400         971                   197.1                348.2             54
Sum                       80200       310815                42422.7              83795.4           7102
Count                     400         400                   400                  400               400
Confidence level (95.0%)  11.36447    7.80324               2.910579             4.458115          1.128588

Col. 1 has no heading in the Excel output; its values run from 1 to 400 (sum 80200), so it appears to be the serial number of each house.
The age of the houses ranges from 0 to 54 years. As described above, age was also turned into a categorical variable (very new, new, ..., old, very old), with scores assigned on the assumption that older houses have lower prices.
The mean is the average value of a variable, the median is the middle value and the mode is the most frequently occurring value. (Excel returns #N/A for the mode of the first column because none of its values repeat.)
The range is the difference between the maximum and the minimum.
The variance is a measure of dispersion showing how far the values deviate from the mean. The higher the variance, the more variable the data.
The standard deviation is the square root of the variance.
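These definitions map directly onto Python's standard `statistics` module. The sketch below uses a short, hypothetical list of prices (not the report's data) purely to show how each measure is computed; `statistics.variance` and `statistics.stdev` divide by n − 1, which matches Excel's "Sample Variance" and "Standard Deviation" rows.

```python
import math
import statistics

# Hypothetical prices in $000, for illustration only (not the report's data).
prices = [748, 812, 748, 690, 901, 775]

summary = {
    "mean": statistics.mean(prices),
    "median": statistics.median(prices),             # middle value of the sorted data
    "mode": statistics.mode(prices),                 # most frequent value
    "range": max(prices) - min(prices),              # maximum minus minimum
    "sample variance": statistics.variance(prices),  # divides by n - 1
    "standard deviation": statistics.stdev(prices),  # square root of the sample variance
}
# Standard error of the mean, as in Excel's "Standard Error" row.
summary["standard error"] = summary["standard deviation"] / math.sqrt(len(prices))

for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```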
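The "Confidence Level(95.0%)" row gives the margin of error for a 95% confidence interval for the population mean:
Margin of error (95%) ≈ 1.96 × standard error
Standard error = standard deviation / √n
(Excel actually multiplies the standard error by the t critical value with n − 1 = 399 degrees of freedom, about 1.966, which is why 1.96 × 3.969241 = 7.78 is slightly smaller than the 7.80324 shown for market price.)
Interpretation: for a randomly drawn sample, the interval sample mean ± margin of error is a 95% confidence interval for the population mean. For market price this gives 777.04 ± 7.80, i.e. about 769.23 to 784.84 ($000).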
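The margin-of-error calculation can be checked against the market price column. This is a minimal sketch; the function name `margin_of_error` is hypothetical, and the t critical value 1.9659 for 399 degrees of freedom is an assumed rounded value rather than one taken from the source.

```python
import math


def margin_of_error(sd: float, n: int, critical: float = 1.96) -> float:
    """Half-width of a 95% confidence interval for a mean: critical * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    return critical * standard_error


# Market price column from the table ($000): mean 777.0375, SD 79.38483, n = 400.
mean, sd, n = 777.0375, 79.38483, 400
moe_z = margin_of_error(sd, n)                   # normal critical value 1.96
moe_t = margin_of_error(sd, n, critical=1.9659)  # assumed t value for 399 df

print(f"z-based 95% CI: {mean - moe_z:.2f} to {mean + moe_z:.2f}")
print(f"t-based 95% CI: {mean - moe_t:.2f} to {mean + moe_t:.2f}")
```

With n = 400 the two versions differ only in the second decimal place; the t-based one is the one that reproduces Excel's 7.80324.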
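With the descriptive statistics complete, we see that the mean and median are almost equal for every variable, which indicates roughly symmetric distributions. The skewness values are close to 0 except for age (0.55, with mean 17.755 above median 16.5), which is mildly skewed to the right.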
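The skewness and kurtosis rows come from Excel's SKEW and KURT functions, which use sample-adjusted formulas. The sketch below reimplements those formulas (the function names are hypothetical) and checks them on the first numeric column of the output, whose values appear to be 1 to 400: a perfectly symmetric, flat sequence should give skewness near 0 and excess kurtosis near −1.2, as in the table.

```python
import statistics


def excel_skew(xs):
    """Sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(xs)
    mean, s = statistics.mean(xs), statistics.stdev(xs)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)


def excel_kurt(xs):
    """Sample excess kurtosis with the small-sample adjustment used by KURT."""
    n = len(xs)
    mean, s = statistics.mean(xs), statistics.stdev(xs)
    total = sum(((x - mean) / s) ** 4 for x in xs)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * total
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Assumed contents of the unlabelled first column: 1, 2, ..., 400.
index = list(range(1, 401))
print(round(excel_skew(index), 6), round(excel_kurt(index), 4))
```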
Next we check correlation, i.e. the measure of linear association between variables. House price is the dependent variable, while the Sydney price index, the area of the house in square metres and age are the independent variables.
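The correlation coefficient used below is Pearson's r. A minimal sketch of its definition follows; the function name and the sample pairs are hypothetical and for illustration only (Python 3.10+ also provides `statistics.correlation`, which computes the same quantity).

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of the same length (at least 2)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-variation
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical area (square metres) and price ($000) pairs, not the report's data.
area = [120, 150, 180, 210, 260]
price = [620, 672, 725, 779, 866]
print(round(pearson_r(area, price), 4))
```

r lies between −1 and 1; values near 0 mean little linear association, values near ±1 mean points close to a straight line.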
i) Price vs Sydney price index
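The fitted regression equation is reported as 0.02 × (price) + 90.51322. A least-squares line passes through the point of means, and 0.02 × 777.04 + 90.51 ≈ 106.05, the mean Sydney price index, so this equation predicts the Sydney price index from price: Sydney price index ≈ 0.02 × Price + 90.51322 (the variables are the other way round from the rest of the analysis). The correlation coefficient is about 0.05, nearly 0, showing a very weak association. Because the size of a slope depends on the units of the variables, it is r and the hypothesis test, rather than the small slope by itself, that show the relationship is weak.
The hypothesis test of H0: slope = 0 gives a p-value of 0.28. Since 0.28 > 0.05, we fail to reject H0: there is no significant evidence of a linear relationship between price and the Sydney price index.
The Sydney price index therefore does not help predict price, so it can be dropped as an independent variable. The scatter plot also confirms that there is no linear relationship between these two variables.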
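In simple linear regression, the test of H0: slope = 0 is equivalent to testing whether the correlation is 0, using t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The sketch below is hypothetical and approximates the t distribution with a normal distribution (reasonable for 398 degrees of freedom) so that it needs only the standard library. Because the reported r is rounded to 0.05, it gives p ≈ 0.32 rather than 0.28; an r of about 0.054 reproduces p ≈ 0.28.

```python
import math
from statistics import NormalDist


def correlation_t_test(r: float, n: int):
    """t statistic and approximate two-sided p-value for H0: rho = 0 (df = n - 2).

    The p-value uses a normal approximation to the t distribution,
    which is an assumption that only holds well for large n.
    """
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


t, p = correlation_t_test(0.05, 400)
print(f"t = {t:.3f}, approximate p = {p:.3f}")
```

Either way p is far above 0.05, so the conclusion of no significant linear relationship does not change.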
ii) Price vs square metres
Here the scatter plot shows the points lying very close to a straight line. Let us confirm this using statistical tools.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.996855
R Square             0.993719
Adjusted R Square    0.993703
Standard Error       6.299283
Observations         400

ANOVA
              df     SS          MS          F          Significance F
Regression    1      2498685     2498685     62969.38   0
Residual      398    15793.02    39.68096
Total         399    2514478

                                Coefficients   Standard Error   t Stat     P-value   Lower 95%     Upper 95%
Intercept                       411.5137       1.490299         276.1284   0         408.5838607   414.4435
Total number of square meters   1.744839       0.006953         250.937    0         1.731169666   1.758509
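The quantities in this output are linked by simple arithmetic, which makes it easy to check them by hand. The sketch below recomputes R Square, Adjusted R Square, the standard error, F and the slope's t statistic from the numbers shown above. The t value differs slightly from 250.937 only because the displayed standard error is rounded; note also that in simple regression t² equals F (250.937² ≈ 62969).

```python
import math

# Values copied from the regression output above.
n, k = 400, 1                                  # observations, predictors
ss_regression, ss_residual = 2498685, 15793.02
ss_total = ss_regression + ss_residual

r_squared = ss_regression / ss_total                            # share of variation explained
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)     # penalised for predictors
ms_residual = ss_residual / (n - k - 1)                         # residual mean square
standard_error = math.sqrt(ms_residual)                         # typical residual size, $000
f_stat = (ss_regression / k) / ms_residual
t_slope = 1.744839 / 0.006953                                   # coefficient / its standard error

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
print(f"SE = {standard_error:.4f}, F = {f_stat:.1f}, t = {t_slope:.2f}")
```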
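The correlation coefficient r is 0.997, very close to 1, indicating a very strong positive linear relationship.
The slope is 1.7448 and the intercept is 411.5137, so
Price ($000) = 1.7448 × (area in square metres) + 411.5137
i.e. each additional square metre is associated with a price about $1,745 higher.
The hypothesis test of H0: slope = 0 gives a p-value of essentially 0 (Excel shows 0), so we reject H0: the slope is significantly different from 0, and its 95% confidence interval is 1.7312 to 1.7585.
R Square is 0.9937, so area in square metres explains about 99.4% of the variation in price in this sample, with a typical prediction error (standard error) of about 6.3, i.e. about $6,300.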
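The fitted line itself comes from ordinary least squares. The sketch below shows the textbook computation (slope = Sxy/Sxx, intercept = ȳ − slope·x̄) and a prediction helper that hard-codes the coefficients from the Excel output. Both function names are hypothetical, and predictions are only meaningful inside the observed area range of 84.7 to 348.2 square metres.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope, intercept and R^2 for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx  # line passes through the point of means
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


def predict_price(area_m2: float) -> float:
    """Predicted price ($000) from the coefficients in the regression output."""
    return 1.744839 * area_m2 + 411.5137


# At the mean area the prediction should be close to the mean price (777.0375).
print(round(predict_price(209.4885), 2))
```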
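iii) Price vs age of house (coded)
Because age was coded into categories, the coded age variable is a step (piecewise) function of actual age, so the relationship is hard to judge by eye: for each coded value the points lie along a vertical line in the scatter plot. Let us therefore use the regression output.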
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.161248
R Square             0.026001
Adjusted R Square    0.023554
Standard...