                            OLS Regression Results
==============================================================================
Dep. Variable:                  price   R-squared:                       0.242
Model:                            OLS   Adj. R-squared:                  0.242
Method:                 Least Squares   F-statistic:                     1623.
Date:                Sun, 16 Feb 2025   Prob (F-statistic):               0.00
Time:                        09:06:45   Log-Likelihood:            -1.4998e+05
No. Observations:               10154   AIC:                         3.000e+05
Df Residuals:                   10151   BIC:                         3.000e+05
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       -2.26e+04   1.92e+04     -1.177      0.239   -6.02e+04     1.5e+04
bed        -2.675e+04   7072.066     -3.783      0.000   -4.06e+04   -1.29e+04
house_size   391.1845      8.311     47.067      0.000     374.893     407.476
==============================================================================
Omnibus:                    12672.728   Durbin-Watson:                   1.202
Prob(Omnibus):                  0.000   Jarque-Bera (JB):          3902146.529
Skew:                           6.543   Prob(JB):                         0.00
Kurtosis:                      98.141   Cond. No.                     6.85e+03
==============================================================================
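The columns of the coefficient table are related: t is the coefficient divided by its standard error, and the 95% interval is roughly the coefficient plus or minus 1.96 standard errors. The sketch below re-derives both from the printed values as a sanity check. The dictionary layout and the names `rows`, `T_CRIT`, `t_stats` and `intervals` are illustrative, and 1.96 is an assumed large-sample approximation to the t quantile that statsmodels actually uses, so small rounding differences from the table are expected.

```python
# Values copied from the summary table (rounded as printed there).
rows = {
    "bed": {"coef": -2.675e4, "std_err": 7072.066},
    "house_size": {"coef": 391.1845, "std_err": 8.311},
}

# Assumption: with 10151 residual degrees of freedom, the 97.5% t quantile
# is very close to the normal value 1.96.
T_CRIT = 1.96

t_stats = {}
intervals = {}
for name, row in rows.items():
    # t statistic: how many standard errors the estimate is away from zero.
    t_stats[name] = row["coef"] / row["std_err"]
    # Approximate 95% confidence interval around the estimate.
    half_width = T_CRIT * row["std_err"]
    intervals[name] = (row["coef"] - half_width, row["coef"] + half_width)
    print(name, round(t_stats[name], 3), intervals[name])
```

The recomputed values should agree with the table to within rounding of the printed coefficients and standard errors.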
For every additional square foot of house size, the predicted price rises by about $391.18, holding the number of bedrooms constant.
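To make the interpretation concrete, here is a minimal sketch that plugs the printed coefficients into the fitted equation by hand. It assumes the two-predictor model shown in the summary (bed and house_size only), uses the coefficients as rounded in the table, and the helper name `predict_price` and the 3-bedroom, 2,300 square foot house are illustrative choices.

```python
# Coefficients copied from the summary table (rounded as printed there).
CONST = -2.26e4
COEF_BED = -2.675e4
COEF_HOUSE_SIZE = 391.1845


def predict_price(bed, house_size):
    """Evaluate price = const + b_bed * bed + b_size * house_size."""
    return CONST + COEF_BED * bed + COEF_HOUSE_SIZE * house_size


# Same number of bedrooms, one extra square foot.
base = predict_price(bed=3, house_size=2300)
bigger = predict_price(bed=3, house_size=2301)

print(round(base, 2))            # predicted price of the base house
print(round(bigger - base, 4))   # change caused by one extra square foot
```

The difference between the two predictions equals the house_size coefficient, which is exactly what "per additional square foot, with bedrooms held fixed" means. Because the printed coefficients are rounded, the predicted price is only approximate.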
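import pandas as pd
import statsmodels.api as sm


def main():
    # Load the cleaned listings; the file must contain the columns used below.
    data = pd.read_csv("data/clean_house_prices.csv")
    features = ["bed", "house_size", "zip_code_99350"]
    X = data[features]
    y = data["price"]

    # Fit ordinary least squares with an explicit intercept column.
    result = sm.OLS(y, sm.add_constant(X)).fit()
    print(result.summary())

    # Three houses to price; the third differs from the first only by ZIP code.
    data_dict = {
        "bed": [2, 3, 2],
        "house_size": [1400, 2300, 1400],
        "zip_code_99350": [0, 0, 1],
    }
    df_dict = pd.DataFrame(data_dict)

    # has_constant="add" forces the intercept column even if a feature
    # happens to be constant across the new rows.
    new_data = sm.add_constant(df_dict[features], has_constant="add")
    predictions = result.predict(new_data)
    print(predictions)


if __name__ == "__main__":
    main()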