NumPy

Package and Data

import numpy as np

Recode Data

# Recode sex as a 0/1 indicator: "Female" becomes 0, every other value becomes 1
bank['sex'] = np.where(bank['sex'] == "Female", 0, 1)
bank.head()
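The recode above relies on `np.where(condition, value_if_true, value_if_false)`, which is applied element by element. The following is a minimal sketch on a small made-up array (the values in `sex` are hypothetical, not taken from the bank data). It also illustrates a consequence worth checking in real data: any value that is not exactly "Female", including misspellings or unknowns, is coded as 1.

```python
import numpy as np

# Hypothetical values standing in for the 'sex' column of the bank data.
sex = np.array(["Female", "Male", "Female", "Unknown"])

# Element-wise recode: 0 where the value equals "Female", 1 everywhere else.
sex_code = np.where(sex == "Female", 0, 1)

print(sex_code)
```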

Transform Data

# Natural log of salary; values must be strictly positive
bank['log_salary'] = np.log(bank['salary'])
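`np.log` is only finite for strictly positive inputs. The sketch below uses a hypothetical `salary` array (not the bank data) to show what happens at zero, and one common workaround, `np.log1p`, which computes log(1 + x). Whether shifting by 1 is appropriate depends on the analysis; it is shown here only as an option.

```python
import numpy as np

# Hypothetical salaries, including a zero to show the edge case.
salary = np.array([0.0, 25_000.0, 60_000.0])

# log(0) is -inf; suppress the divide-by-zero warning for this demonstration.
with np.errstate(divide="ignore"):
    logged = np.log(salary)

# log1p(x) = log(1 + x) stays finite at zero.
shifted = np.log1p(salary)

print(logged)
print(shifted)
```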

Transform Data

When do we log-transform numeric variables?

  • Normalizing skewed distributions: Many real-world variables (like income, population, prices) tend to have right-skewed distributions, where most values are small but there are some very large values. Taking the logarithm can make these distributions more symmetric and closer to normal, which is beneficial for many statistical methods.
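As a rough illustration of the effect on skewness, the sketch below simulates a right-skewed variable and compares the sample skewness before and after logging. The data are simulated from a log-normal distribution (an assumption chosen so the log is exactly normal), and `skewness` is a small helper written for this example, not a NumPy function.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated right-skewed "income": log-normal, so log(income) is normal.
income = rng.lognormal(mean=10.0, sigma=1.0, size=10_000)

def skewness(x):
    """Sample skewness: the mean cubed z-score (0 for a symmetric sample)."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 3)

skew_raw = skewness(income)          # strongly positive (long right tail)
skew_log = skewness(np.log(income))  # close to zero

print(f"skewness before log: {skew_raw:.2f}")
print(f"skewness after log:  {skew_log:.2f}")
```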

Transform Data

When do we log-transform numeric variables?

  • Managing outliers: Log transformation compresses large values, which reduces the influence of extreme observations. On the log scale, differences reflect ratios rather than absolute gaps: the difference between $1M and $2M becomes exactly the same as the difference between $10K and $20K, because both are a doubling.
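The dollar example can be checked directly. This short sketch computes both gaps on the log scale; each is a doubling, so each equals log(2).

```python
import numpy as np

# Gap between $1M and $2M on the log scale.
gap_high = np.log(2_000_000) - np.log(1_000_000)

# Gap between $10K and $20K on the log scale.
gap_low = np.log(20_000) - np.log(10_000)

# Both differences equal log(2), since log(2x) - log(x) = log(2).
print(gap_high, gap_low, np.log(2))
```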

Transform Data

When do we log-transform numeric variables?

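  • Linearizing relationships: Power-law relationships become linear after log-transforming both variables. For instance, if \(y = ax^b\), then \(\log(y) = \log(a) + b\log(x)\), which is linear in \(\log(x)\). Exponential relationships such as \(y = ae^{bx}\) become linear after logging \(y\) alone: \(\log(y) = \log(a) + bx\). This makes the relationship easier to model and interpret.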
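To see the linearization at work, the sketch below generates noise-free data from \(y = ax^b\) with made-up values \(a = 3\) and \(b = 1.5\), then fits a straight line to \(\log(y)\) against \(\log(x)\). The slope recovers \(b\) and the exponentiated intercept recovers \(a\). Real data would include noise, so the recovery would only be approximate.

```python
import numpy as np

# Hypothetical power-law parameters for the illustration.
a, b = 3.0, 1.5

x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = a * x ** b

# Fit a degree-1 polynomial on the log-log scale.
# np.polyfit returns coefficients from highest degree down: [slope, intercept].
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)

b_hat = slope              # estimate of the exponent b
a_hat = np.exp(intercept)  # estimate of the multiplier a

print(b_hat, a_hat)
```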

Transform Data

When do we log-transform numeric variables?

  • Making multiplicative relationships additive: When variables have multiplicative effects (like compound growth), logging transforms them into additive relationships, because \(\log(ab) = \log(a) + \log(b)\). Additive models such as linear regression can then represent effects that combine by multiplication on the original scale.
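Compound growth gives a concrete case. In the sketch below, the yearly growth rates are made-up numbers. The total growth factor is the product of the yearly factors, and its log is the sum of the yearly logs.

```python
import numpy as np

# Hypothetical yearly growth rates: +5%, +10%, -2%.
rates = np.array([0.05, 0.10, -0.02])

# Multiplicative view: total growth factor is the product of (1 + rate).
growth_factor = np.prod(1 + rates)

# Additive view: the log of the product is the sum of the logs.
log_growth = np.sum(np.log(1 + rates))

# Exponentiating the summed logs recovers the same growth factor.
print(growth_factor, np.exp(log_growth))
```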

Transform Data

When do we log-transform numeric variables?

  • Stabilizing variance: When the variability of data increases with its magnitude (heteroscedasticity), log transformation can make the variance more constant across the range of values. Constant error variance is an assumption behind the standard errors in ordinary least squares regression.
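A small simulation can make this concrete. The sketch below assumes multiplicative noise (each observation is a level times a random factor), which is the situation where logging stabilizes variance exactly; the levels and noise scale are made-up values. On the raw scale the standard deviation grows with the level, while on the log scale it is roughly the same in every group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical group levels spanning two orders of magnitude.
levels = np.array([10.0, 100.0, 1000.0])

# Multiplicative noise: y = level * exp(noise), with noise ~ Normal(0, 0.2).
noise = rng.normal(0.0, 0.2, size=(3, 5_000))
y = levels[:, None] * np.exp(noise)

# Spread per group on the raw scale grows with the level...
raw_sd = y.std(axis=1)

# ...but on the log scale each group has spread close to 0.2.
log_sd = np.log(y).std(axis=1)

print("raw sd:", raw_sd)
print("log sd:", log_sd)
```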