Linear Regression Model For Housing Market Price
Understanding the factors that affect housing prices, and building robust prediction models, is essential both for prospective home buyers and for the real economy, as seen from various studies (Favara & Imbs, 2015). This study identifies and investigates certain factors that affect housing price.
The study is based on housing data from Australia in the period 2002 to 2017. The data is annual and the sample size used is 15. The dependent variable for this study is housing price in thousands of AUD. The housing price index is taken as a predictor of price. This is a natural choice since price indices provide a basis for tracking price movements. The index for Sydney is chosen because Sydney is Australia's largest city and a key market for real estate (Wu, Deng, & Liu, 2014). The study also includes the annual percentage change in price as another independent predictor. Any increase or decrease in the percentage change is expected to indicate whether price will rise or fall (Adelino, Schoar & Severino, 2015). A key parameter determining price is considered to be the land area of the property. A larger area is hypothesized, and has been found by earlier studies, to command a higher price (Xiao, Orford, & Webster, 2016). The area of land in square meters was therefore taken as an independent variable. Studies have also identified the age of the property in years as another independent variable (Xu et al., 2018). People often opt for an older property to reduce expense, since they can renovate an existing property and bypass the cost of building the structure from scratch (Knoll, Schularick & Steger, 2017).
Scatter plots for visual analysis
The linear relationship between the housing price of a typical real estate property and the Sydney housing price index over the period 2002 to 2017 was explored using a scatter diagram of housing price against the Sydney price index. The following figure indicates a moderately strong positive relationship. The correlation coefficient was observed to be 0.805.
Figure 1: Scatter plot of Sydney Price Index against Market Price
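The correlation coefficients reported in this section are Pearson coefficients. The following is a minimal sketch of how such a coefficient can be computed. The study's raw observations are not reproduced in this report, so the two short series below are hypothetical values chosen only for illustration, and the function name pearson_r is an assumption.

```python
import math


def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only, NOT the study's data
price_index = [100.0, 110.0, 125.0, 130.0, 150.0]
market_price = [700.0, 720.0, 760.0, 755.0, 800.0]

r = pearson_r(price_index, market_price)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 indicates a strong linear relationship, while a value near 0 indicates a weak one.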
The relationship between the annual percentage change in Sydney housing prices over the period 2002 to 2017 and the overall annual Market Price (in thousands) is depicted in the following figure. The Market Price is seen to increase with the annual percentage change, suggesting a positive relationship between the two. The correlation coefficient was found to be 0.405.
Figure 2: Scatter plot of Annual percentage change against Market Price
The linear relationship between Market Price (in thousands) and the total area of land of the property, measured in square meters, was such that the Market Price increases slightly as the area of land increases. The following figure shows this, and a mild positive correlation is suggested between the two variables. The coefficient of correlation (r) was found to be equal to 0.313.
Figure 3: Scatter plot of total area of real estate (in sq. m) against Market Price
The Market Price of housing was seen to decrease as the age of the house or real estate, measured in years, increases. The following figure shows this relationship between the two variables. A negative relationship was therefore perceived between the age of the estate and the market price. The correlation coefficient was found to be equal to -0.6779.
Figure 4: Scatter plot of Age of House (in years) against Market Price
Full model and interpretation
The linear model using these predictor variables was fitted using the ordinary least squares method and the fitted regression equation was estimated as follows:
Market Price = 548.978 + 1.963 × Sydney Price Index - 5.622 × Annual % Change + 0.519 × Total Number of Square Meters - 2.488 × Age of House (in years)
The intercept implies a market price of 548.978 thousand AUD when all the predictors are equal to zero. Holding the other predictors constant, the market price increases by 1.963 thousand AUD when the Sydney price index increases by one unit, falls by 5.622 thousand AUD when the annual percentage price change increases by one unit, increases by 0.519 thousand AUD when the area of the estate increases by one square meter, and decreases by 2.488 thousand AUD when the age of the estate or house increases by one year (Pfister, Schwarz, Carson & Jancyzk, 2013).
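The interpretation of the coefficients can be checked by evaluating the fitted equation directly. The sketch below uses the rounded coefficients from the equation above. The function name and the example input values are hypothetical and serve only to show how a one-unit change in a single predictor shifts the prediction.

```python
# Rounded coefficients taken from the fitted full-model equation
COEFFICIENTS = {
    "intercept": 548.978,
    "sydney_price_index": 1.963,
    "annual_pct_change": -5.622,
    "square_meters": 0.519,
    "age_years": -2.488,
}


def predict_market_price(sydney_price_index, annual_pct_change, square_meters, age_years):
    """Predicted market price in thousand AUD from the full model."""
    c = COEFFICIENTS
    return (
        c["intercept"]
        + c["sydney_price_index"] * sydney_price_index
        + c["annual_pct_change"] * annual_pct_change
        + c["square_meters"] * square_meters
        + c["age_years"] * age_years
    )


# Hypothetical property, for illustration only
base = predict_market_price(100.0, 5.0, 300.0, 20.0)
# Same property, one year older: prediction changes by the age coefficient
older = predict_market_price(100.0, 5.0, 300.0, 21.0)
print(round(older - base, 3))
```

The difference between the two predictions equals the age coefficient, which is what "holding the other predictors constant" means in practice.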
The following table gives the estimated coefficients as shown in the fitted regression equation above for the predictors of the linear model, their standard errors, and the value of the computed t-statistic for test of significance, the p-value for test of significance of the coefficients and the 95 percent confidence interval for the estimated coefficients (Abbott, 2014).
| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 548.9781 | 81.13154 | 6.766519 | 4.94E-05 | 368.2058 | 729.7504 |
| Sydney price Index | 1.963494 | 0.583205 | 3.366727 | 0.007161 | 0.664031 | 3.262957 |
| Annual % change | -5.6222 | 3.240109 | -1.73519 | 0.113362 | -12.8416 | 1.597209 |
| Total number of square meters | 0.519146 | 0.323909 | 1.602752 | 0.140071 | -0.20257 | 1.240859 |
| Age of house (years) | -2.48787 | 1.129751 | -2.20214 | 0.052252 | -5.00511 | 0.029376 |
Table 1: Regression Model Summary (Full Model)
Regression coefficients and their significance
The variable Sydney price index has a coefficient point estimate of 1.96, and the 95 percent interval estimate was found to be between 0.664 and 3.263. This means we can be 95 percent confident that the interval contains the actual regression coefficient of the Sydney price index. The p-value was found to be 0.007, which is less than 0.05, and thus the coefficient is significantly different from zero at the 5% level of significance (Bluman, 2015). The variable annual percentage change, on the other hand, has a coefficient point estimate of -5.622, and the 95 percent confidence interval was found to be between -12.8416 and 1.597209. This means we can be 95 percent confident that the interval contains the actual regression coefficient of annual percentage change. The p-value was found to be 0.113, which is greater than 0.05, and thus the coefficient was not found to be significantly different from zero at the 5% level of significance. The point estimate for the predictor total number of square meters was found to be 0.519 and the 95 percent confidence interval was computed to be -0.20257 to 1.240859. The p-value was found to be 0.14, which is greater than 0.05, so the hypothesis that the coefficient is zero could not be rejected at the 5% level of significance. Finally, the predictor age of house (in years) was found to have the point estimate -2.488, and the 95 percent confidence interval was found to be -5.005 to 0.029. The p-value for the test of deviation from zero was found to be 0.052, which is marginally greater than 0.05, and hence the significance test failed to reject the conjecture that the coefficient is zero at the 5% level of significance (Siegel, 2016).
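The t statistics and confidence limits in Table 1 follow from each coefficient and its standard error. The sketch below reproduces them under the assumption that the residual degrees of freedom are 15 - 4 - 1 = 10, for which the two-sided 95 percent critical value of Student's t is approximately 2.228. The critical value is hard-coded from a standard t table instead of being computed, and the names used are illustrative.

```python
# Two-sided 95% critical value of Student's t with 10 degrees of freedom
# (assumption: residual df = n - k - 1 = 15 - 4 - 1 = 10; value from a t table)
T_CRIT_95_DF10 = 2.228139

# (coefficient, standard error) pairs copied from Table 1
ESTIMATES = {
    "Sydney price index": (1.963494, 0.583205),
    "Annual % change": (-5.6222, 3.240109),
    "Total number of square meters": (0.519146, 0.323909),
    "Age of house (years)": (-2.48787, 1.129751),
}


def summarize(coef, se, t_crit=T_CRIT_95_DF10):
    """Return (t statistic, lower limit, upper limit, significant at 5%?)."""
    t_stat = coef / se
    half_width = t_crit * se
    return t_stat, coef - half_width, coef + half_width, abs(t_stat) > t_crit


for name, (coef, se) in ESTIMATES.items():
    t_stat, lower, upper, significant = summarize(coef, se)
    print(f"{name}: t = {t_stat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f}), "
          f"significant = {significant}")
```

A coefficient is significant at the 5% level exactly when its 95 percent interval excludes zero, which is why only the Sydney price index passes the test here.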
| Regression Statistics | Value |
|---|---|
| Multiple R | 0.889165 |
| R Square | 0.790614 |
| Adjusted R Square | 0.70686 |
| Standard Error | 43.88783 |
| Observations | 15 |
Table 2: Regression Statistics
Coefficient of determination
The goodness of fit of the linear regression model, as measured by the coefficient of determination R², was found to be 0.79. This is the ratio of the variation explained by the model to the total variation of the dependent variable (Draper & Smith, 2014). The coefficient of determination indicates that the model explains 79 percent of the total variation in the response variable, housing market price.
The ANOVA test shows that the p-value (Significance F = 0.002) is less than 0.05, so the model is significant at the 5% level.
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 4 | 72728.59 | 18182.15 | 9.439675 | 0.001993 |
| Residual | 10 | 19261.41 | 1926.141 | | |
| Total | 14 | 91990 | | | |
Table 3: ANOVA Test for Model Significance
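The regression statistics in Table 2 can be recovered from the sums of squares in the ANOVA table. The following sketch shows the arithmetic. The function name and the dictionary keys are illustrative, while the sums of squares, the sample size and the number of predictors are taken from the tables above.

```python
import math


def fit_statistics(ss_regression, ss_residual, n_obs, n_predictors):
    """Derive R-squared, adjusted R-squared, F and standard error from ANOVA sums of squares."""
    df_regression = n_predictors
    df_residual = n_obs - n_predictors - 1
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    # Adjusted R-squared penalizes the number of predictors
    adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
    ms_regression = ss_regression / df_regression
    ms_residual = ss_residual / df_residual
    return {
        "r_squared": r_squared,
        "adj_r_squared": adj_r_squared,
        "f_statistic": ms_regression / ms_residual,
        "standard_error": math.sqrt(ms_residual),
    }


# Values from the full-model ANOVA table: 15 observations, 4 predictors
full_model = fit_statistics(72728.59, 19261.41, n_obs=15, n_predictors=4)
for key, value in full_model.items():
    print(f"{key}: {value:.4f}")
```

The results agree with the reported R Square, Adjusted R Square, F and Standard Error to the precision shown in the tables.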
Next, a linear model with the total number of square meters of the estate as the only predictor and Market Price (in thousands) as the response was fitted to investigate the linear relationship between just these two variables. The following least squares regression equation was obtained.
Market Price = 659.143 + 0.563 × Total Number of Square Meters
The equation shows a positive linear relationship between the predictor and the response. The Market Price increases by 0.563 thousand AUD for each additional square meter of area. The following table shows the summary output of the regression fit using the least squares method.
| Term | Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95% |
|---|---|---|---|---|---|---|
| Intercept | 659.143 | 101.2221 | 6.51185 | 1.97E-05 | 440.466 | 877.8201 |
| Total number of square meters | 0.563603 | 0.473897 | 1.189294 | 0.255593 | -0.46019 | 1.587396 |
Table 3: Model 2 Summary
The slope coefficient has a 95 percent confidence interval of -0.460 to 1.587, which includes zero, and its p-value of 0.256 is greater than 0.05. The coefficient is therefore not significantly different from zero at the 5% level of significance. The intercept, by contrast, has a 95 percent confidence interval of 440.466 to 877.8201.
The following table shows the regression statistics for the model. The coefficient of determination, a measure of goodness of fit, is equal to 0.098. This means that the model explains only about 9.8 percent of the variation in the Market Price (in thousands).
| Regression Statistics | Value |
|---|---|
| Multiple R | 0.31325 |
| R Square | 0.098125 |
| Adjusted R Square | 0.02875 |
| Standard Error | 79.88619 |
| Observations | 15 |
Table 4: Model 2 Regression Statistics
The following table gives the results of the ANOVA test for significance of the overall linear model. It is seen that the p-value is greater than 0.05 and hence not significant at 5% level.
| Source | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 9026.557 | 9026.557 | 1.414421 | 0.255593 |
| Residual | 13 | 82963.44 | 6381.803 | | |
| Total | 14 | 91990 | | | |
Table 5: ANOVA Test for Model Significance for Model 2
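In a regression with a single predictor, two identities tie the tables for the second model together: R² equals the square of the Pearson correlation between predictor and response, and the ANOVA F statistic equals the square of the slope's t statistic. The sketch below checks both using the figures reported for the second model. The variable names are illustrative.

```python
import math

# Sums of squares from the ANOVA table for the second model
ss_regression = 9026.557
ss_residual = 82963.44
df_residual = 13  # 15 observations - 1 predictor - 1

# Slope and its standard error from the Model 2 summary table
slope, slope_se = 0.563603, 0.473897

r_squared = ss_regression / (ss_regression + ss_residual)
# The correlation takes the sign of the slope
r = math.copysign(math.sqrt(r_squared), slope)

f_statistic = (ss_regression / 1) / (ss_residual / df_residual)
t_statistic = slope / slope_se

print(f"R^2 = {r_squared:.6f}, r = {r:.5f}")
print(f"F = {f_statistic:.6f}, t^2 = {t_statistic ** 2:.6f}")
```

This is why the correlation of 0.313 between area and market price corresponds to a model that explains only about 9.8 percent of the variation, and why the slope's p-value and the model's Significance F are identical.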
The predicted market price from the second model for an estate with a total area of 400 square meters is obtained by substituting this value into the fitted equation (Salkind, 2016):
Predicted Market price = 659.143 + 400 × 0.563603 = 884.584 (in 000) AUD
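The same substitution can be written as a short function, which makes it easy to repeat the prediction for other areas. This is a minimal sketch using the unrounded slope from the Model 2 summary table. The function name is hypothetical.

```python
# Coefficients of the second (area-only) model, from the Model 2 summary table
INTERCEPT = 659.143
SLOPE_PER_SQUARE_METER = 0.563603


def predict_price_from_area(square_meters):
    """Predicted market price in thousand AUD for a given estate area."""
    return INTERCEPT + SLOPE_PER_SQUARE_METER * square_meters


predicted = predict_price_from_area(400)
print(round(predicted, 3))  # 884.584
```

Because the slope is not statistically significant, a prediction of this kind should be read as illustrative and not as a reliable estimate.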
The study analyzed the linear relationship between the independent variables (age of house, house price index of Sydney, annual percentage change in house price, and total area of the estate in square meters) and the dependent variable, Market Price measured in thousand AUD. The linear model shown in the following diagram was studied and its validity was investigated.
Figure 5: Conceptual Model
The linear model explaining housing price using the predictors age of house, area of estate in square meters, housing price index of Sydney and annual percentage change was found to be significant, as is apparent from the results shown in Table 3. The only variable found to have a significant effect on the market price is the Sydney price index, as shown in Table 1. The variable age of house, although it had a moderately strong negative correlation with the price, was not found to be statistically significant in the model (Rumsey, 2015).
Comparison between two regression models
To examine the impact that the total area in square meters may have on the market price, a second linear regression model was constructed with the area variable as the only predictor and the market price as the response. Comparing the coefficients of determination of the two models, the first model explains a larger percentage of the variation in the response variable (about 79 percent against about 9.8 percent), so the first model is considered better than the second, which contains only the insignificant predictor total area of the estate in square meters (Moorad & Wade, 2013). The market price for housing was then predicted using the second model for an estate area of 400 m² and found to be 884.584 thousand AUD.
However, the ANOVA test for the significance of the model in Table 5 showed that the second model is not significant, since the p-value was found to be greater than 0.05 (Anderson et al., 2016). Thus the total area of the estate, although it showed a mild positive association with the market price, was not found to be significant, at least for the data considered in this case.
Conclusion
The study concludes that Sydney's house price index is a significant factor influencing the market price of houses in Australia. The area of the estate, measured in square meters, was found to have a positive association with the market price; however, there was not enough evidence to say that it has a significant impact on market prices. Similarly, the age of the estate was found to have a negative association with the market price of housing, but there is a lack of evidence to assert that it has any major impact on market prices. The same can be said about the annual percentage change in housing price, which has a positive association with the market price but not enough evidence to assert that it has an impact in determining it. Even so, the second model predicted the market price to be 884.584 thousand AUD when the area of the estate is 400 m².
References
Abbott, M. L. (2014). Understanding educational statistics using Microsoft Excel and SPSS. John Wiley & Sons.
Adelino, M., Schoar, A., & Severino, F. (2015). House prices, collateral, and self-employment. Journal of Financial Economics, 117(2), 288-306.
Anderson, D. R., Sweeney, D. J., Williams, T. A., Camm, J. D., & Cochran, J. J. (2016). Statistics for business & economics. Nelson Education.
Bluman, A. G. (2015). Elementary Statistics: A Step by Step Approach: a Brief Version. McGraw-Hill Education.
Draper, N. R., & Smith, H. (2014). Applied regression analysis (Vol. 326). John Wiley & Sons.
Favara, G., & Imbs, J. (2015). Credit supply and the price of housing. American Economic Review, 105(3), 958-92.
Ferrero, A. (2015). House price booms, current account deficits, and low interest rates. Journal of Money, Credit and Banking, 47(S1), 261-293.
Knoll, K., Schularick, M., & Steger, T. (2017). No price like home: Global house prices, 1870-2012. American Economic Review, 107(2), 331-53.
Moorad, J. A., & Wade, M. J. (2013). Selection gradients, the opportunity for selection, and the coefficient of determination. The American Naturalist, 181(3), 291-300.
Pfister, R., Schwarz, K., Carson, R., & Jancyzk, M. (2013). Easy methods for extracting individual regression slopes: Comparing SPSS, R, and Excel. Tutorials in Quantitative Methods for Psychology, 9(2), 72-78.
Rumsey, D. J. (2015). U Can: statistics for dummies. John Wiley & Sons.
Salkind, N. J. (2016). Statistics for people who (think they) hate statistics. Sage Publications.
Siegel, A. (2016). Practical business statistics. Academic Press.
Wu, J., Deng, Y., & Liu, H. (2014). House price index construction in the nascent housing market: the case of China. The Journal of Real Estate Finance and Economics, 48(3), 522-545.
Xiao, Y., Orford, S., & Webster, C. J. (2016). Urban configuration, accessibility, and property prices: A case study of Cardiff, Wales. Environment and Planning B: Planning and Design, 43(1), 108-129.
Xu, Y., Zhang, Q., Zheng, S., & Zhu, G. (2018). House Age, Price and Rent: Implications from Land-Structure Decomposition. The Journal of Real Estate Finance and Economics, 56(2), 303-324.