/path/to/bin/python3 -m pip install "altair[all]"
We will be working with Gapminder data.
More specifically, we will use two indicators: car deaths per 100,000 people, and cars, trucks, and buses per 1,000 persons.
We also need the geo country entity data to get country names.
/path/to/bin/python3 -m pip install polars
/path/to/bin/python3 -m pip install "pyarrow>=11.0.0"
Once installed, you can import polars:
We use .read_csv() to read the data in:
Schema([('geo', String), ('time', Int64), ('car_deaths_per_100000_people', Float64)])
We also need the two other data sets:
shape: (5, 3)
┌─────┬──────┬──────────────────────────────┐
│ geo ┆ time ┆ car_deaths_per_100000_people │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════╪══════╪══════════════════════════════╡
│ alb ┆ 2006 ┆ 5.978 │
│ ant ┆ 1988 ┆ 3.299 │
│ ant ┆ 1989 ┆ 7.132 │
│ ant ┆ 1990 ┆ 5.636 │
│ ant ┆ 1991 ┆ 13.15 │
└─────┴──────┴──────────────────────────────┘
We will use .join() to join the data frames:
print(df.select(["geo", "name", "car_deaths_per_100000_people", "cars_trucks_and_buses_per_1000_persons"]).head())
shape: (5, 4)
┌─────┬───────────┬──────────────────────────────┬─────────────────────────────────┐
│ geo ┆ name ┆ car_deaths_per_100000_people ┆ cars_trucks_and_buses_per_1000… │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ str ┆ f64 ┆ f64 │
╞═════╪═══════════╪══════════════════════════════╪═════════════════════════════════╡
│ alb ┆ Albania ┆ 5.978 ┆ 97.32 │
│ are ┆ UAE ┆ 31.85 ┆ 313.1 │
│ arg ┆ Argentina ┆ 8.682 ┆ 313.9 │
│ arm ┆ Armenia ┆ 7.899 ┆ 104.8 │
│ aus ┆ Australia ┆ 5.972 ┆ 644.0 │
└─────┴───────────┴──────────────────────────────┴─────────────────────────────────┘
df = df.rename({"cars_trucks_and_buses_per_1000_persons": "vehicles_per_1000_persons"})
print(df.select(["geo","time", "name", "car_deaths_per_100000_people", "vehicles_per_1000_persons"]).head())
shape: (5, 5)
┌─────┬──────┬───────────┬──────────────────────────────┬───────────────────────────┐
│ geo ┆ time ┆ name ┆ car_deaths_per_100000_people ┆ vehicles_per_1000_persons │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ str ┆ f64 ┆ f64 │
╞═════╪══════╪═══════════╪══════════════════════════════╪═══════════════════════════╡
│ alb ┆ 2006 ┆ Albania ┆ 5.978 ┆ 97.32 │
│ are ┆ 2007 ┆ UAE ┆ 31.85 ┆ 313.1 │
│ arg ┆ 2007 ┆ Argentina ┆ 8.682 ┆ 313.9 │
│ arm ┆ 2007 ┆ Armenia ┆ 7.899 ┆ 104.8 │
│ aus ┆ 2003 ┆ Australia ┆ 5.972 ┆ 644.0 │
└─────┴──────┴───────────┴──────────────────────────────┴───────────────────────────┘
First we need to import altair:
Then we can plot a scatterplot (point chart):
We can map the name of the country to tooltip:
Adding legend interaction takes a few steps. First, we define a selection object that binds to the legend:
Then we change the color in the chart, and add our selection object as a param:
We first need to create a user input (in this case a slider, or range) and then a variable for it.
We can then use the new filter_var to filter our data. We also need to make sure we add it as a param.
alt.Chart(df).mark_point().encode(
x = "car_deaths_per_100000_people",
y = "vehicles_per_1000_persons",
color = alt.condition(selection, "income_groups", alt.value("lightgray")),
tooltip = "name"
).transform_filter(
alt.datum["car_deaths_per_100000_people"] <= filter_var
).add_params(
selection,
filter_var
).interactive()
brush = alt.selection_interval()
chart = alt.Chart(df).mark_point().encode(
x = "car_deaths_per_100000_people",
y = "vehicles_per_1000_persons",
color=alt.condition(brush, "income_groups",alt.value("lightgray")),
tooltip = "name"
).add_params(
selection,
brush
)
chart & chart.encode(x = alt.X("time:T"))
brush = alt.selection_interval()
chart = alt.Chart(df).mark_point().encode(
x = "car_deaths_per_100000_people",
y = "vehicles_per_1000_persons",
color=alt.condition(brush, "income_groups",alt.value("lightgray")),
tooltip = "name"
).add_params(
selection,
brush
)
bars = alt.Chart(df).mark_bar().encode(
x='car_deaths_per_100000_people',
y='name',
color='name'
).transform_filter(
brush
)
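# & binds more tightly than |, so the time chart and the bars are stacked
# vertically and placed to the right of the scatterplot.
chart | (chart.encode(x = alt.X("time:T")) & bars)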
You can also use Plotly for dashboards.