Data Wrangling

Methods to use

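Two .replace() methods: pandas.Series.str.replace() substitutes text inside the strings of a single column, while pandas.DataFrame.replace() swaps whole cell values and can target specific columns.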

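import pandas as pd

data_frame[column_name] = data_frame[column_name].str.replace("none", "0", regex=False)
data_frame[column_name] = pd.to_numeric(data_frame[column_name])
# The outer key is the column to modify; the inner dict maps old values to new ones
data_frame = data_frame.replace({column_name: {'value1': '0', 'value2': '1'}})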
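
As a minimal sketch of both methods working together, assuming a small made-up table (the column names "fare" and "survived" and all values are hypothetical, not taken from the course data set):

```python
import pandas as pd

# Hypothetical toy data; column names and values are for illustration only.
data_frame = pd.DataFrame({
    "fare": ["7.25", "none", "71.28"],
    "survived": ["no", "yes", "yes"],
})

# Series.str.replace substitutes text inside each string of one column.
data_frame["fare"] = data_frame["fare"].str.replace("none", "0", regex=False)
# The column still holds strings, so convert it to numbers.
data_frame["fare"] = pd.to_numeric(data_frame["fare"])

# DataFrame.replace with a nested dict maps whole cell values, column by column.
data_frame = data_frame.replace({"survived": {"no": "0", "yes": "1"}})
```

After this runs, "fare" is a numeric column and "survived" holds the strings "0" and "1"; a further pd.to_numeric call would be needed to make "survived" numeric as well.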

Practice

  • Clean this data set on Titanic survivors
  • Make sure you define a main() function that reads and writes the data
  • Join Gradescope (it’s free) and enroll in our course (Entry Code: 8XG2NV)
  • Submit your file(s) to the Data Wrangling assignment
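
One possible shape for the main() function asked for above is sketched here. The helper clean(), its placeholder cleaning step, and the two path parameters are assumptions made for illustration; the assignment itself does not prescribe them.

```python
import pandas as pd


def clean(data_frame: pd.DataFrame) -> pd.DataFrame:
    """Placeholder cleaning step (an assumption): drop exact duplicate rows."""
    return data_frame.drop_duplicates()


def main(input_path: str, output_path: str) -> None:
    """Read the raw CSV, clean it, and write the result to a new CSV."""
    data_frame = pd.read_csv(input_path)
    data_frame = clean(data_frame)
    # index=False keeps the row index out of the output file.
    data_frame.to_csv(output_path, index=False)
```

It could then be called as main("titanic.csv", "titanic_clean.csv"), where both file names are hypothetical. Keeping the cleaning logic in its own function leaves main() responsible only for reading and writing.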

Exploratory data analysis

Calculating descriptive statistics of a numeric variable for each group.

data_frame.groupby("category")["numeric_value"].mean().reset_index()
desc_stats = data_frame.groupby("category")["numeric_value"].agg(["mean", "std", "max", "min"]).reset_index()
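
To see what the grouped summary looks like, here is a small sketch on made-up data (the values are hypothetical; the column names follow the snippet above):

```python
import pandas as pd

# Hypothetical toy data with two groups of two rows each.
data_frame = pd.DataFrame({
    "category": ["a", "a", "b", "b"],
    "numeric_value": [1.0, 3.0, 10.0, 20.0],
})

# One row per category, one column per requested statistic.
desc_stats = (
    data_frame.groupby("category")["numeric_value"]
    .agg(["mean", "std", "max", "min"])
    .reset_index()
)
print(desc_stats)
```

The result has the columns category, mean, std, max and min. Calling reset_index() turns the group labels back into an ordinary column, which is convenient when passing the table to a plotting function.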

Bar plots

import seaborn as sns

sns.barplot(data=desc_stats, x="category", y="mean")