Apply Munzner’s What-Why-How framework, mapping each visualization to its data-task-idiom trio, to evaluate its quality:
Remember also the general guidelines:
Visualization by Stephan Teodosescu (@steodosescu)
Overall, the visualizations are good at showing changes in the number of flights over time for countries in Europe. However, it’s not clear what the colors mean (encoding). Also, counting departing and arriving flights together obscures information; showing departing and arriving flights separately would be more meaningful (why). The line changes get harder to follow the lower a country is ranked, and the colors don’t help here either (encoding).
Visualization by Pauline Baudry @PauBaudry
All bars sum to 100% rather than separating won from disqualified, which makes comparisons difficult (why). This is especially problematic because these are not all the ingredients used, only the top 10 (data/encoding). The light color is very similar to the background, and the white font makes the text very hard to read (encoding).
Visualization by Nicola Rennie @nrennie35
The number of matches played might not be the ideal measure for understanding which countries perform better (why), so it’s not clear what the point is here. The plot is hard to read because too many lines overlap (encoding). Maybe a comparison of the number of matches played by winners would be more informative (why).
Visualization by Dan Oehm @danoehm
Not everyone will recognize the countries’ flags; making the plot interactive, with the country name showing up, would make it more accessible. There is a nice interpretation of the results in the text at the top (why), but I am not sure the interpretation has any actual meaning. The home vs. away distinction is not meaningful here (data); it is just the way the data is organized. The more I try to understand this plot, the harder it gets.
Variables can be:
Color schemes that best represent each type of variable:
year | state | id | total | DEMOCRAT | REPUBLICAN | democrat_difference | republican_difference |
---|---|---|---|---|---|---|---|
2020 | ALABAMA | 1 | 2323282 | 0.3656999 | 0.6203164 | -0.2546165 | 0.2546165 |
2020 | ALASKA | 2 | 359530 | 0.4277195 | 0.5283314 | -0.1006119 | 0.1006119 |
2020 | ARIZONA | 4 | 3387326 | 0.4936469 | 0.4905598 | 0.0030871 | -0.0030871 |
2020 | ARKANSAS | 5 | 1219069 | 0.3477506 | 0.6239573 | -0.2762067 | 0.2762067 |
2020 | CALIFORNIA | 6 | 17500881 | 0.6348395 | 0.3432072 | 0.2916322 | -0.2916322 |
2020 | COLORADO | 8 | 3279980 | 0.5501107 | 0.4160413 | 0.1340694 | -0.1340694 |
2020 | CONNECTICUT | 9 | 1823857 | 0.5926073 | 0.3918712 | 0.2007361 | -0.2007361 |
2020 | DELAWARE | 10 | 504346 | 0.5874301 | 0.3977488 | 0.1896813 | -0.1896813 |
2020 | DISTRICT OF COLUMBIA | 11 | 344356 | 0.9214969 | 0.0539732 | 0.8675237 | -0.8675237 |
2020 | FLORIDA | 12 | 11067456 | 0.4786145 | 0.5121982 | -0.0335837 | 0.0335837 |
* Categorical unordered: state, id
* Categorical/discrete ordered: year
* Numeric discrete: total
* Numeric continuous: DEMOCRAT, REPUBLICAN, democrat_difference (divergent), republican_difference (divergent)
1. Qualitative color scheme for state
2. Divergent color scale, with red representing more republican and blue more democratic – make zero white
3. Red bars for republican votes, Blue for democratic votes. No need to map color to state since state is mapped to one of the axes.
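The divergent scale suggested above for democrat_difference can be sketched as a small function. This is only an illustration under stated assumptions: the name `divergingColor`, the plain red/white/blue endpoints, and the linear fade are choices made for this example, not part of the original answer.

```javascript
// Hypothetical helper: map a vote-share difference in [-1, 1] to an RGB string.
// Negative values (more Republican) shade toward red, positive values
// (more Democratic) shade toward blue, and zero is white.
function divergingColor(democratDifference) {
  // Clamp to the expected range so out-of-range inputs stay valid colors.
  const d = Math.max(-1, Math.min(1, democratDifference));
  const strength = Math.abs(d); // 0 = white, 1 = fully saturated
  const fade = Math.round(255 * (1 - strength));
  return d < 0
    ? `rgb(255, ${fade}, ${fade})` // red side
    : `rgb(${fade}, ${fade}, 255)`; // blue side
}

// A few rows from the table above (state, democrat_difference).
const sampleStates = [
  ["ALABAMA", -0.2546165],
  ["ARIZONA", 0.0030871],
  ["DISTRICT OF COLUMBIA", 0.8675237]
];
for (const [state, diff] of sampleStates) {
  console.log(state, divergingColor(diff));
}
```

Because white sits exactly at zero, near-tied states such as Arizona come out almost white, while lopsided results are strongly tinted. In a Vega specification, a linear scale with a three-value domain such as [-1, 0, 1] and a matching three-color range should give a similar effect.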
id | title | author | year | total_weeks | first_week | debut_rank | best_rank |
---|---|---|---|---|---|---|---|
0 | “H” IS FOR HOMICIDE | Sue Grafton | 1991 | 15 | 1991-05-05 | 1 | 2 |
1 | “I” IS FOR INNOCENT | Sue Grafton | 1992 | 11 | 1992-04-26 | 14 | 2 |
10 | ‘’G’’ IS FOR GUMSHOE | Sue Grafton | 1990 | 6 | 1990-05-06 | 4 | 8 |
100 | A DOG’S JOURNEY | W. Bruce Cameron | 2012 | 1 | 2012-05-27 | 3 | 14 |
1000 | CHANGING FACES | Kimberla Lawson Roby | 2006 | 1 | 2006-02-19 | 11 | 14 |
1001 | CHAOS | Patricia Cornwell | 2016 | 3 | 2016-12-04 | 1 | 7 |
1002 | CHAPTERHOUSE: DUNE | Frank Herbert | 1985 | 16 | 1985-04-21 | 9 | 2 |
1003 | CHARADE | Sandra Brown | 1994 | 5 | 1994-05-01 | 7 | 10 |
1004 | CHARLESTON | John Jakes | 2002 | 4 | 2002-08-25 | 7 | 12 |
1005 | CHARLOTTE GRAY | Sebastian Faulks | 1999 | 1 | 1999-03-14 | 12 | 17 |
* Categorical unordered: author, title, id
* Categorical/discrete ordered: year, first_week
* Numeric discrete: total_weeks, debut_rank, best_rank
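One way to make this classification explicit when loading the data in Vega is to declare a type per column through the `parse` option of the data format. The fragment below is a sketch, not part of the original solution: the variable name `booksData` is hypothetical, the URL is the one used in the specifications in this document, and the column-to-type mapping simply follows the list above.

```javascript
// Sketch of a Vega data entry with explicit type parsing for the books table.
// Columns not listed under `parse` (id, title, author) are left as strings.
const booksData = {
  name: "books",
  url: "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-05-10/nyt_titles.tsv",
  format: {
    type: "tsv",
    parse: {
      year: "number",        // ordered, discrete
      total_weeks: "number", // numeric discrete
      debut_rank: "number",  // numeric discrete
      best_rank: "number",   // numeric discrete
      first_week: "date"     // ordered, e.g. 1991-05-05
    }
  }
};

console.log(Object.keys(booksData.format.parse));
```

Declaring the types up front avoids having to coerce string values to numbers later inside transforms.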
Complete the Vega specification for the three plots (they all use the same data – NY Times Best Sellers).
var spec = {
  $schema: "https://vega.github.io/schema/vega/v5.json",
  description: "NY Times Best Sellers of All Times",
  width: 800,
  height: 400,
  padding: 50,
  data: [
    {
      name: "books",
      url: "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-05-10/nyt_titles.tsv",
      format: { type: "tsv" }
    },
    {
      name: "aggregate",
      source: "books",
      transform: [
        {
          type: "aggregate",
          groupby: ["year"],
          fields: ["total_weeks"],
          ops: ["mean"],
          as: ["total_weeks"]
        }
      ]
    }
  ],
  scales: [
    {
      name: "xScale",
      type: "linear",
      domain: { field: "year", data: "aggregate" },
      range: "width",
      zero: false
    },
    {
      name: "yScale",
      type: "linear",
      domain: { field: "total_weeks", data: "aggregate" },
      range: "height",
      zero: false
    }
  ],
  axes: [
    {
      scale: "xScale",
      orient: "bottom",
      format: "d",
      title: "Year"
    },
    {
      scale: "yScale",
      orient: "left",
      title: "average number of weeks in the NY Times Best Sellers list"
    }
  ],
  marks: [
    {
      type: "symbol",
      from: { data: "aggregate" },
      encode: {
        enter: {
          x: { field: "year", scale: "xScale" },
          y: { field: "total_weeks", scale: "yScale" }
        }
      }
    }
  ],
  title: {
    text: "NY Times Best Sellers Books by Average Total Weeks"
  }
};
var spec = {
  $schema: "https://vega.github.io/schema/vega/v5.json",
  description: "NY Times Best Sellers of All Times",
  width: 800,
  height: 800,
  padding: 50,
  data: [
    {
      name: "books",
      url: "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-05-10/nyt_titles.tsv",
      format: { type: "tsv" },
      transform: [
        {
          // Dividing by 1 coerces total_weeks to a number, so the sort and
          // filter below compare numerically even if the value was read as a string.
          type: "formula",
          expr: "datum.total_weeks / 1",
          as: "total_weeks"
        },
        {
          type: "collect",
          sort: { field: "total_weeks", order: "descending" }
        },
        {
          // Keep only the books that stayed in the list for more than 94 weeks.
          type: "filter",
          expr: "datum.total_weeks > 94"
        }
      ]
    }
  ],
  scales: [
    {
      name: "yScale",
      type: "linear",
      domain: [2020, 1931],
      range: "height",
      zero: false
    },
    {
      name: "xScale",
      type: "linear",
      domain: { field: "total_weeks", data: "books" },
      range: "width"
    }
  ],
  axes: [
    {
      scale: "xScale",
      orient: "bottom",
      title: "total weeks in the NY Times Best Sellers list"
    },
    {
      scale: "yScale",
      orient: "left",
      format: "d",
      title: "Year"
    }
  ],
  marks: [
    {
      type: "rect",
      from: { data: "books" },
      encode: {
        enter: {
          y: { field: "year", scale: "yScale" },
          x: { field: "total_weeks", scale: "xScale" },
          x2: { value: 0, scale: "xScale" },
          height: { value: 3 }
        }
      }
    },
    {
      type: "text",
      from: { data: "books" },
      encode: {
        enter: {
          text: { signal: "datum.title + ' by ' + datum.author + ' ' + datum.year" },
          y: { field: "year", scale: "yScale" },
          x: { field: "total_weeks", scale: "xScale" },
          align: { value: "right" }
        }
      }
    }
  ],
  title: {
    text: "NY Times Best Sellers Books with longest total weeks in list",
    subtitle: "The 1980s didn't have any books in the list for more than 94 weeks"
  }
};
var spec = {
  $schema: "https://vega.github.io/schema/vega/v5.json",
  description: "NY Times Best Sellers of All Times",
  width: 800,
  height: 400,
  padding: 50,
  data: [
    {
      name: "books",
      url: "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-05-10/nyt_titles.tsv",
      format: { type: "tsv" }
    },
    {
      name: "aggregate",
      source: "books",
      transform: [
        {
          type: "aggregate",
          groupby: ["year"]
        }
      ]
    }
  ],
  scales: [
    {
      name: "xScale",
      type: "linear",
      domain: { field: "year", data: "aggregate" },
      range: "width",
      zero: false
    },
    {
      name: "yScale",
      type: "linear",
      domain: { field: "count", data: "aggregate" },
      range: "height",
      zero: true
    }
  ],
  axes: [
    {
      scale: "xScale",
      orient: "bottom",
      title: "Year",
      format: "d"
    },
    {
      scale: "yScale",
      orient: "left",
      title: "Number of Books in the NY Times Best Sellers list"
    }
  ],
  marks: [
    {
      type: "rect",
      from: { data: "aggregate" },
      encode: {
        enter: {
          x: { field: "year", scale: "xScale" },
          y: { field: "count", scale: "yScale" },
          y2: { value: 0, scale: "yScale" },
          width: { value: 5 }
        }
      }
    }
  ],
  title: {
    text: "Total NY Times Best Sellers Books by year"
  }
};
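To sanity-check what the aggregate transforms in the first and third specifications are meant to produce (mean total_weeks per year and number of books per year), the same grouping can be sketched in plain JavaScript. The function name `summarizeByYear` and the output field `mean_weeks` are hypothetical names chosen for this example; the input rows are taken from the sample table shown earlier, written as strings the way a TSV reader might deliver them.

```javascript
// Group rows by year, then compute the count and the mean of total_weeks
// for each year, sorted by year ascending.
function summarizeByYear(rows) {
  const groups = new Map();
  for (const row of rows) {
    const year = +row.year; // coerce in case values arrive as strings
    const weeks = +row.total_weeks;
    const group = groups.get(year) ?? { year, count: 0, sum: 0 };
    group.count += 1;
    group.sum += weeks;
    groups.set(year, group);
  }
  return [...groups.values()]
    .map(({ year, count, sum }) => ({ year, count, mean_weeks: sum / count }))
    .sort((a, b) => a.year - b.year);
}

// Three rows from the sample table (year, total_weeks).
const sampleBooks = [
  { year: "1991", total_weeks: "15" },
  { year: "1992", total_weeks: "11" },
  { year: "1985", total_weeks: "16" }
];
console.log(summarizeByYear(sampleBooks));
```

In the table excerpt each year appears only once, so every count is 1 and each mean equals the single value; on the full dataset the same function would return one summary row per year.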
Consider the three plots from the previous questions when answering the following questions:
The New York Times Best Sellers are up-to-date and authoritative lists of the most popular books in the United States, based on sales in the past week. They can be used as a measure of what Americans have been reading across the years. The average number of weeks books stayed in the list increased from the 1930s to the 1970s and has gradually decreased since then. Did books get less popular, or were more books simply being published, so that public attention is divided among the many reading options available? Looking at the data in a more fine-grained manner shows that no books stayed in the list for more than 94 weeks between 1960 and 1990. At the same time, when we investigate the total number of books in the list per year, there was a decrease after 1960, with the number of books increasing steadily since the 1980s. So it does seem that the 1970s and 1980s were not great decades for popular books, but we have had an abundance of books since the 2000s, with a number of books staying in the list well over 100 weeks in total.
The 1970s and 1980s were not great for the book industry (not a lot of popular books in the NY Times Best Sellers list). There has been an abundance of books in the list since the 2000s, with a number of books staying in the list well over 100 weeks total. It is not clear if this is an indication that people are reading more, but it is an indication that more books are being published.
Bookstores that want to make informed decisions on what to stock their shelves with.