Download Tagliamonte’s that-expression data and inspect it
Phenomenon:
Question: Which linguistic and social factors affect the expression of that?
data_wrangling.py: read data/that-expression.csv and write data/clean-that-expression.csv
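A wrangling script of this kind could be sketched roughly as follows. The notes do not show the raw file's layout, so the source column names `verb` and `complementizer` and the coding values `"know"` and `"that"` are hypothetical placeholders; only the output columns `know` and `expressed` and the two file paths come from these notes.

```python
import pandas as pd

RAW_PATH = "data/that-expression.csv"
CLEAN_PATH = "data/clean-that-expression.csv"


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a modelling-ready frame with 0/1 columns `know` and `expressed`."""
    # `verb` and `complementizer` are assumed column names, not confirmed ones.
    raw = raw.dropna(subset=["verb", "complementizer"])
    return pd.DataFrame({
        # 1 when the matrix verb is "know", 0 otherwise
        "know": (raw["verb"].str.lower() == "know").astype(int),
        # 1 when "that" is expressed, 0 when it is omitted
        "expressed": (raw["complementizer"].str.lower() == "that").astype(int),
    })


def main():
    # Read the raw file, recode it, and write the cleaned copy.
    clean(pd.read_csv(RAW_PATH)).to_csv(CLEAN_PATH, index=False)
```

Calling `main()` from the project root would read the raw file and write the cleaned one; the recoding in `clean` needs adapting to whatever columns the real file contains.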
Download clean-that-expression.csv
import statsmodels.api as sm
import pandas as pd

def main():
    # loading the training dataset
    data = pd.read_csv("data/clean-that-expression.csv")
    # defining the dependent and independent variables
    X = data[["know"]]
    y = data["expressed"]
    # building the model and fitting the data
    log_reg = sm.Logit(y, sm.add_constant(X)).fit()
    print(log_reg.summary())

main()
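To read the two numbers in the summary, it may help to see what they correspond to. Assuming `know` and `expressed` are both coded 0/1 (an assumption, since the cleaned file is not shown here), a logit model with a constant and one binary predictor has a closed-form solution: the constant is the log-odds of expressed that when `know` is 0, and the coefficient on `know` is the log of the odds ratio. The sketch below computes both from a 2x2 table of counts; the helper name and the counts are made up for illustration and are not taken from the data set.

```python
import math


def logit_from_counts(expr_know, omit_know, expr_other, omit_other):
    """Closed-form logit estimates for a single 0/1 predictor.

    Arguments are token counts (all must be positive): `that` expressed
    or omitted, first for know == 1, then for know == 0.
    """
    const = math.log(expr_other / omit_other)        # log-odds when know == 0
    coef = math.log(expr_know / omit_know) - const   # log odds ratio for know
    return const, coef


# Hypothetical counts, for illustration only.
const, coef = logit_from_counts(expr_know=20, omit_know=80,
                                expr_other=50, omit_other=50)
odds_ratio = math.exp(coef)  # odds of expressed `that` with know vs. without
print(const, coef, odds_ratio)
```

With these illustrative counts, an odds ratio below 1 would mean that is expressed less often after know than after other verbs; exponentiating the fitted coefficient from the summary gives the same quantity for the real data.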