Dashboards with Altair

Altair package

  • A declarative visualization library for Python
  • Built on top of the Vega-Lite grammar
/path/to/bin/python3 -m pip install "altair[all]"

Data

We will be working with Gapminder data.

Specifically, we will use two indicators: car deaths per 100,000 people, and cars, trucks and buses per 1,000 persons.

We also need the geo country entity data to map country codes (the geo column) to country names.

Polars

Polars API reference

  • High-performance DataFrame library
  • Designed for fast and efficient data processing
/path/to/bin/python3 -m pip install polars
/path/to/bin/python3 -m pip install "pyarrow>=11.0.0"

Once installed, you can import Polars:

import polars as pl

Reading data in

We use pl.read_csv() to read a CSV file into a DataFrame. Printing .schema shows the column types Polars inferred:

car_deaths = pl.read_csv("data/car_deaths.csv")
print(car_deaths.schema)
Schema([('geo', String), ('time', Int64), ('car_deaths_per_100000_people', Float64)])

We also need the two other data sets:

car_trucks_buses = pl.read_csv("data/cars_trucks_and_buses.csv")
countries = pl.read_csv("data/entities-geo-country.csv")

Looking at data

print(car_deaths.head())
shape: (5, 3)
┌─────┬──────┬──────────────────────────────┐
│ geo ┆ time ┆ car_deaths_per_100000_people │
│ --- ┆ ---  ┆ ---                          │
│ str ┆ i64  ┆ f64                          │
╞═════╪══════╪══════════════════════════════╡
│ alb ┆ 2006 ┆ 5.978                        │
│ ant ┆ 1988 ┆ 3.299                        │
│ ant ┆ 1989 ┆ 7.132                        │
│ ant ┆ 1990 ┆ 5.636                        │
│ ant ┆ 1991 ┆ 13.15                        │
└─────┴──────┴──────────────────────────────┘

Joining dataframes

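We use .join() twice. The first call is an inner join (the default) on geo and time, so it keeps only the country-years present in both indicator tables. The second call is a left join that adds the country attributes, such as name and income_groups, by matching geo against the country column. Rows without a match get null values:

df = car_deaths.join(car_trucks_buses, on=["geo", "time"])
df = df.join(countries, left_on="geo", right_on="country", how="left")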

Check results

print(df.select(["geo", "name", "car_deaths_per_100000_people", "cars_trucks_and_buses_per_1000_persons"]).head())
shape: (5, 4)
┌─────┬───────────┬──────────────────────────────┬─────────────────────────────────┐
│ geo ┆ name      ┆ car_deaths_per_100000_people ┆ cars_trucks_and_buses_per_1000… │
│ --- ┆ ---       ┆ ---                          ┆ ---                             │
│ str ┆ str       ┆ f64                          ┆ f64                             │
╞═════╪═══════════╪══════════════════════════════╪═════════════════════════════════╡
│ alb ┆ Albania   ┆ 5.978                        ┆ 97.32                           │
│ are ┆ UAE       ┆ 31.85                        ┆ 313.1                           │
│ arg ┆ Argentina ┆ 8.682                        ┆ 313.9                           │
│ arm ┆ Armenia   ┆ 7.899                        ┆ 104.8                           │
│ aus ┆ Australia ┆ 5.972                        ┆ 644.0                           │
└─────┴───────────┴──────────────────────────────┴─────────────────────────────────┘

Rename columns

df = df.rename({"cars_trucks_and_buses_per_1000_persons": "vehicles_per_1000_persons"})
print(df.select(["geo", "time", "name", "car_deaths_per_100000_people", "vehicles_per_1000_persons"]).head())
shape: (5, 5)
┌─────┬──────┬───────────┬──────────────────────────────┬───────────────────────────┐
│ geo ┆ time ┆ name      ┆ car_deaths_per_100000_people ┆ vehicles_per_1000_persons │
│ --- ┆ ---  ┆ ---       ┆ ---                          ┆ ---                       │
│ str ┆ i64  ┆ str       ┆ f64                          ┆ f64                       │
╞═════╪══════╪═══════════╪══════════════════════════════╪═══════════════════════════╡
│ alb ┆ 2006 ┆ Albania   ┆ 5.978                        ┆ 97.32                     │
│ are ┆ 2007 ┆ UAE       ┆ 31.85                        ┆ 313.1                     │
│ arg ┆ 2007 ┆ Argentina ┆ 8.682                        ┆ 313.9                     │
│ arm ┆ 2007 ┆ Armenia   ┆ 7.899                        ┆ 104.8                     │
│ aus ┆ 2003 ┆ Australia ┆ 5.972                        ┆ 644.0                     │
└─────┴──────┴───────────┴──────────────────────────────┴───────────────────────────┘

Scatterplot

First, we import Altair:

import altair as alt

Then we can plot a scatterplot (point chart):

alt.Chart(df).mark_point().encode(
    x = "car_deaths_per_100000_people",
    y = "vehicles_per_1000_persons"
)

Add a categorical variable

Mapping income_groups to color gives each income group its own color and adds a legend:

alt.Chart(df).mark_point().encode(
    x = "car_deaths_per_100000_people",
    y = "vehicles_per_1000_persons",
    color = "income_groups"
)

Add interaction – hovering

We can map the country name to the tooltip channel, so that hovering over a point shows which country it represents:

alt.Chart(df).mark_point().encode(
    x = "car_deaths_per_100000_people",
    y = "vehicles_per_1000_persons",
    color = "income_groups",
    tooltip = "name"
)

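Add interaction – zooming

Calling .interactive() binds the axis scales to the mouse, so you can drag to pan and scroll to zoom: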

alt.Chart(df).mark_point().encode(
    x = "car_deaths_per_100000_people",
    y = "vehicles_per_1000_persons",
    color = "income_groups",
    tooltip = "name"
).interactive()

Add interaction – legend selection

Adding legend interaction takes a few steps. First, we define a point selection on the income_groups field and bind it to the legend, so that clicking a legend entry selects that group:

selection = alt.selection_point(fields=["income_groups"], bind="legend")

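Then we make the color conditional on the selection, so that selected income groups keep their color and the rest turn light gray. Finally, we register the selection with .add_params():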

alt.Chart(df).mark_point().encode(
    x = "car_deaths_per_100000_people",
    y = "vehicles_per_1000_persons",
    color = alt.condition(selection, "income_groups", alt.value("lightgray")),
    tooltip = "name"
).add_params(
    selection
).interactive()

Add filter slider

We first need to create a user input (in this case a slider, i.e. a range input) and then a parameter bound to it.

slider = alt.binding_range(min=0, max=35, step=0.05,
                           name='car_deaths_per_100000_people:')
filter_var = alt.param(value=35, bind=slider)

We can then use filter_var in .transform_filter() to keep only the points at or below the slider value. We also need to register it with .add_params():

alt.Chart(df).mark_point().encode(
    x = "car_deaths_per_100000_people",
    y = "vehicles_per_1000_persons",
    color = alt.condition(selection, "income_groups", alt.value("lightgray")),
    tooltip = "name"
).transform_filter(
    alt.datum["car_deaths_per_100000_people"] <= filter_var
).add_params(
    selection,
    filter_var
).interactive()

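Add multiple charts

The | operator places charts side by side, and & stacks them vertically. Here the second panel reuses the chart with time on the x axis: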

chart = alt.Chart(df).mark_point().encode(
    x = "car_deaths_per_100000_people",
    y = "vehicles_per_1000_persons",
    color = alt.condition(selection, "income_groups", alt.value("lightgray")),
    tooltip = "name"
).add_params(
    selection
).interactive()

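# "time" holds integer years; with ":T" Vega-Lite would read them as millisecond
# timestamps, so encode them as quantitative and format the axis labels as integers
chart | chart.encode(x=alt.X("time:Q", axis=alt.Axis(format="d")))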

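Selection for multiple charts

alt.selection_interval() creates a brush. Drag a rectangle over a panel to select the points inside it, and the remaining points turn light gray. Both panels are built from the same chart object, so they share the brush: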

brush = alt.selection_interval()

chart = alt.Chart(df).mark_point().encode(
    x = "car_deaths_per_100000_people",
    y = "vehicles_per_1000_persons",
    color=alt.condition(brush, "income_groups", alt.value("lightgray")),
    tooltip = "name"
).add_params(
    selection,
    brush
)

# Integer years: encode as quantitative rather than temporal
chart & chart.encode(x=alt.X("time:Q", axis=alt.Axis(format="d")))

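Add bar chart for selection

.transform_filter(brush) keeps only the rows inside the brush, so the bar chart shows the car death rates of the selected countries: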

brush = alt.selection_interval()

chart = alt.Chart(df).mark_point().encode(
    x = "car_deaths_per_100000_people",
    y = "vehicles_per_1000_persons",
    color=alt.condition(brush, "income_groups", alt.value("lightgray")),
    tooltip = "name"
).add_params(
    selection,
    brush
)

bars = alt.Chart(df).mark_bar().encode(
    x='car_deaths_per_100000_people',
    y='name',
    color='name'
).transform_filter(
    brush
)

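# & binds more tightly than |, so the time panel and the bars stack on the right;
# integer years are encoded as quantitative rather than temporal
chart | (chart.encode(x=alt.X("time:Q", axis=alt.Axis(format="d"))) & bars)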

Plotly

You can also use Plotly to build dashboards.