The price of a house is correlated with its number of bathrooms, its age and its lot area. The correlations between house area and most of the other variables are not statistically significant. However, the correlations between lot area and number of bathrooms, and between price and house area, are significant. This suggests that multicollinearity may exist between these pairs of variables. The final check for multicollinearity is therefore done via the VIF (Variance Inflation Factor). The working rule is that the VIF should be less than 5 or, equivalently, that the tolerance (= 1/VIF) should be more than 0.2. Table 9 shows the VIF and the tolerance of the variables, and by this rule the model is free from multicollinearity.
The dummy variables d98004, d98006, d98040 and d98166 correspond to the zip codes 98004, 98006, 98040 and 98166. The zip code 98125 is chosen as the base category because it has the largest number of observations. The final model is shown in Table 8. The dependent variable is the log of the property price. The values of number of bedrooms, number of bathrooms and age of the house are very small in magnitude compared with price, house area and lot area.
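The VIF rule of thumb described above can be illustrated with a minimal sketch. It relies on the standard identity that the VIF of each regressor equals the corresponding diagonal element of the inverse of the regressors' correlation matrix. The function names (`correlation_matrix`, `invert`, `vif_table`) and the demo values are hypothetical and are not taken from the data set used in this study; in practice a library routine such as `variance_inflation_factor` in statsmodels would normally be used instead of hand-written linear algebra.

```python
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(k)
        ]
        for i in range(k)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [
        list(row) + [1.0 if i == j else 0.0 for j in range(k)]
        for i, row in enumerate(matrix)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfect collinearity: correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_table(named_columns, threshold=5.0):
    """Return (name, VIF, tolerance, passes_rule) for each regressor."""
    names = list(named_columns)
    inv = invert(correlation_matrix([named_columns[n] for n in names]))
    rows = []
    for i, name in enumerate(names):
        vif = inv[i][i]  # VIF_j is the j-th diagonal element of the inverse
        rows.append((name, vif, 1.0 / vif, vif < threshold))
    return rows


# Hypothetical demo values, NOT the study's data.
demo = {
    "bathrooms": [1, 2, 2, 3, 1, 2, 3, 4],
    "bedrooms": [2, 3, 2, 4, 3, 3, 4, 5],
    "age": [40, 12, 55, 8, 30, 21, 60, 5],
}

for name, vif, tol, ok in vif_table(demo):
    print(f"{name:10s} VIF={vif:6.2f} tolerance={tol:5.2f} passes VIF<5: {ok}")
```

Because tolerance is defined as 1/VIF, the two thresholds (VIF below 5, tolerance above 0.2) always give the same verdict, so checking either one is enough.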
The difference in scale often leads to heteroskedasticity. Therefore, the variables that are large in magnitude are used in logarithmic form, while the other variables are left as they are. The logarithmic scale also reduces the influence of extreme values, although it does not by itself guarantee that there are no outliers. To check that the final model is free from heteroskedasticity, hettest is used; its result is shown in Table 9. The model is estimated by OLS.
The coefficient of number of bathrooms is significant at the 1% level. Its value is 0.26, which implies that each additional bathroom raises the expected log of price by 0.26, that is, the price by about 30% (exp(0.26) - 1 ≈ 0.30), holding the other variables constant. The coefficient of the dummy variable for zip code 98004 is 0.87 and is significant at the 1% level. This implies that, compared to the base area, the log of price is higher by 0.87 in zip code 98004, so prices there are about 139% higher (exp(0.87) - 1 ≈ 1.39). Similarly, the coefficient of the dummy variable for zip code 98040 is 0.5 and is significant at the 1% level. This implies that, compared to the base area, the log of price is higher by 0.5 in zip code 98040, so prices there are about 65% higher (exp(0.5) - 1 ≈ 0.65).
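Because the dependent variable is the log of price, a coefficient b on a dummy variable or a one-unit change translates into a percentage change in price of (exp(b) - 1) × 100, which differs noticeably from b × 100 when b is large. The following minimal sketch applies this conversion to the three coefficients reported above. The helper name `pct_effect` and the labels are illustrative assumptions; only the coefficient values come from the text.

```python
from math import exp


def pct_effect(coef):
    """Percentage change in price implied by a coefficient in a log(price) model."""
    return (exp(coef) - 1.0) * 100.0


# Coefficient values as reported in the text; the labels are illustrative.
reported = {
    "bathrooms (one additional bathroom)": 0.26,
    "d98004 (relative to base zip 98125)": 0.87,
    "d98040 (relative to base zip 98125)": 0.50,
}

for label, b in reported.items():
    naive = b * 100.0  # the approximation that only holds for small coefficients
    print(f"{label}: b={b:.2f}, naive {naive:.0f}%, implied {pct_effect(b):.1f}%")
```

The approximation b × 100 is reasonable only for small coefficients, which is why reading 0.5 as "50% higher" understates the implied price difference.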