1 Viz critique

Apply Munzner’s What-Why-How framework, mapping each visualization to the data-task-idiom trio, to evaluate its quality:

Remember also the general guidelines:

1.1 Visualization A

Visualization by Stephan Teodosescu (@steodosescu)

Overall, the visualizations are good at showing changes in the number of flights over time for countries in Europe. However, it is not clear what the colors mean (encoding). Also, counting departing and arriving flights together obscures information; showing departing and arriving flights separately would be more meaningful (why). The line changes get harder to follow the lower a country is ranked, and the colors don’t help here either (encoding).

1.2 Visualization B

Visualization by Pauline Baudry @PauBaudry

Every bar sums to 100% instead of showing won and disqualified separately, which makes comparisons difficult (why). This is especially problematic because the bars cover only the top 10 ingredients, not all the ingredients used (data/encoding). The light color is very similar to the background, and the white font is very hard to read (encoding).

1.3 Visualization C

Visualization by Nicola Rennie @nrennie35

The number of matches played might not be the ideal measure for understanding which countries perform better (why), so it is not clear what the point is here. The plot is hard to read because too many lines overlap (encoding). Comparing the number of matches played by winners might be more informative (why).

1.4 Visualization D

Visualization by Dan Oehm @danoehm

Not everyone will recognize the countries’ flags; making the plot interactive, so that the country name shows up, would make it more accessible. The text at the top gives a nice interpretation of the results (why), but I am not sure the interpretation has any actual meaning. The home vs. away distinction is not meaningful here (data); it is just the way the data is organized. The more I try to understand this plot, the harder it gets.

2 Variable mapping to viz encodings

Variables can be:

Color schemes that best represent each type of variable:

2.1 What variable types are present in the following data:

year state id total DEMOCRAT REPUBLICAN democrat_difference republican_difference
2020 ALABAMA 1 2323282 0.3656999 0.6203164 -0.2546165 0.2546165
2020 ALASKA 2 359530 0.4277195 0.5283314 -0.1006119 0.1006119
2020 ARIZONA 4 3387326 0.4936469 0.4905598 0.0030871 -0.0030871
2020 ARKANSAS 5 1219069 0.3477506 0.6239573 -0.2762067 0.2762067
2020 CALIFORNIA 6 17500881 0.6348395 0.3432072 0.2916322 -0.2916322
2020 COLORADO 8 3279980 0.5501107 0.4160413 0.1340694 -0.1340694
2020 CONNECTICUT 9 1823857 0.5926073 0.3918712 0.2007361 -0.2007361
2020 DELAWARE 10 504346 0.5874301 0.3977488 0.1896813 -0.1896813
2020 DISTRICT OF COLUMBIA 11 344356 0.9214969 0.0539732 0.8675237 -0.8675237
2020 FLORIDA 12 11067456 0.4786145 0.5121982 -0.0335837 0.0335837

* Categorical unordered: state, id
* Categorical/discrete ordered: year
* Numeric discrete: total
* Numeric continuous: DEMOCRAT, REPUBLICAN, democrat_difference (divergent), republican_difference (divergent)
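The "divergent" label can be checked with a little arithmetic: democrat_difference appears to be DEMOCRAT minus REPUBLICAN, so it is negative where the Republican share is larger and positive where the Democratic share is larger. A minimal sketch, assuming that definition (the function name is made up for illustration) and using two rows copied from the table above:

```javascript
// Two rows copied from the table above.
const voteRows = [
  { state: "ALABAMA", DEMOCRAT: 0.3656999, REPUBLICAN: 0.6203164 },
  { state: "ARIZONA", DEMOCRAT: 0.4936469, REPUBLICAN: 0.4905598 }
];

// Assumed definition: difference of the two vote shares, rounded to the
// 7 decimals shown in the table.
function democratDifference(row) {
  return Number((row.DEMOCRAT - row.REPUBLICAN).toFixed(7));
}

const differences = voteRows.map((row) => ({
  state: row.state,
  democrat_difference: democratDifference(row)
}));
// ALABAMA: -0.2546165 (below zero), ARIZONA: 0.0030871 (above zero)
```

Because the values fall on both sides of a meaningful zero, a diverging color scale centered on zero suits this variable.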

2.1.1 What variables would you map to build the following visualizations based on these data:

  1. A line plot showing the percentage of votes for the Democratic candidate across the years for the state of Arizona: x mapped to year, y mapped to DEMOCRAT. Optionally, split by state with an interactive element, so that state is mapped to color/interaction.
  2. A map plot with percent democrat/republican votes per state: democrat_difference mapped to fill for each shape.
  3. A bar plot showing the 5 states that voted the most republican and the 5 states that voted the most democrat: fill mapped to democrat/republican, state mapped to y (so that we can read labels horizontally), total votes mapped to x. Optionally, an interactive element mapped to year.
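The first mapping above (year to x, DEMOCRAT to y, one state) could be sketched as a Vega spec along these lines. This is only an illustration: the function name, chart size, stroke color, and the inline `values` data are assumptions, and only the single 2020 Arizona row from the table is passed in.

```javascript
// Build a Vega line-chart spec: year -> x, DEMOCRAT -> y, filtered to one state.
// `rows` is assumed to be an array of objects using the table's column names.
function democratLineSpec(rows, state) {
  return {
    $schema: "https://vega.github.io/schema/vega/v5.json",
    width: 600,
    height: 300,
    data: [
      {
        name: "votes",
        values: rows,
        transform: [
          // keep only the requested state
          { type: "filter", expr: "datum.state === '" + state + "'" },
          // line marks connect points in data order, so sort by year
          { type: "collect", sort: { field: "year" } }
        ]
      }
    ],
    scales: [
      { name: "xScale", type: "linear", domain: { data: "votes", field: "year" }, range: "width", zero: false },
      { name: "yScale", type: "linear", domain: [0, 1], range: "height" }
    ],
    axes: [
      { scale: "xScale", orient: "bottom", format: "d", title: "Year" },
      { scale: "yScale", orient: "left", format: "%", title: "Share of votes for the Democratic candidate" }
    ],
    marks: [
      {
        type: "line",
        from: { data: "votes" },
        encode: {
          enter: {
            x: { field: "year", scale: "xScale" },
            y: { field: "DEMOCRAT", scale: "yScale" },
            stroke: { value: "steelblue" }
          }
        }
      }
    ]
  };
}

// Only the 2020 row is available in the table above; more years would be needed for a real line.
const arizonaSpec = democratLineSpec(
  [{ year: 2020, state: "ARIZONA", DEMOCRAT: 0.4936469 }],
  "ARIZONA"
);
```

Fixing the y domain to [0, 1] keeps vote shares comparable across states if the state filter is later driven by an interactive element.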

2.1.2 What color scheme would you use for each of the visualizations above?

  1. Qualitative color scheme for state.
  2. Divergent color scale, with red representing more republican and blue more democratic; make zero white.
  3. Red bars for republican votes, blue bars for democratic votes. No need to map color to state, since state is already mapped to one of the axes.
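The divergent scale for the map can be illustrated with a small hand-rolled function. This is a sketch of the idea, not what a plotting library does internally: it assumes democrat_difference lies in [-1, 1], uses pure red and pure blue as the end points, and the function name is made up.

```javascript
// Map democrat_difference in [-1, 1] to a red-white-blue fill color.
function divergingFill(diff) {
  const t = Math.max(-1, Math.min(1, diff)); // clamp to the assumed domain
  const fade = Math.round(255 * (1 - Math.abs(t))); // 255 at zero, 0 at the extremes
  // Negative values (more republican) keep the red channel at full strength;
  // positive values (more democratic) keep the blue channel at full strength.
  return t < 0 ? `rgb(255, ${fade}, ${fade})` : `rgb(${fade}, ${fade}, 255)`;
}

const tie = divergingFill(0);            // "rgb(255, 255, 255)", white at zero
const allRepublican = divergingFill(-1); // "rgb(255, 0, 0)"
const allDemocrat = divergingFill(1);    // "rgb(0, 0, 255)"
```

In practice a library's diverging scale would handle this, but the key design choice is the same: zero sits at the neutral midpoint, and intensity grows with distance from it.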

2.2 What variable types are present in the following data (NYTimes best sellers):

id title author year total_weeks first_week debut_rank best_rank
0 “H” IS FOR HOMICIDE Sue Grafton 1991 15 1991-05-05 1 2
1 “I” IS FOR INNOCENT Sue Grafton 1992 11 1992-04-26 14 2
10 ‘’G’’ IS FOR GUMSHOE Sue Grafton 1990 6 1990-05-06 4 8
100 A DOG’S JOURNEY W. Bruce Cameron 2012 1 2012-05-27 3 14
1000 CHANGING FACES Kimberla Lawson Roby 2006 1 2006-02-19 11 14
1001 CHAOS Patricia Cornwell 2016 3 2016-12-04 1 7
1002 CHAPTERHOUSE: DUNE Frank Herbert 1985 16 1985-04-21 9 2
1003 CHARADE Sandra Brown 1994 5 1994-05-01 7 10
1004 CHARLESTON John Jakes 2002 4 2002-08-25 7 12
1005 CHARLOTTE GRAY Sebastian Faulks 1999 1 1999-03-14 12 17

* Categorical unordered: author, title, id
* Categorical/discrete ordered: year, first_week
* Numeric discrete: total_weeks, debut_rank, best_rank

2.2.1 What visualization would you build to answer the following questions? Include which variables you would map to each encoding, and what color scheme you would use:

  1. What are the top 10 books that stayed the most weeks in the NYTimes best sellers list? Bar plot with title mapped to the y axis and total number of weeks mapped to x.
  2. How has the debut ranking for books by Stephen King changed over time? Line plot with year mapped to x and debut_rank mapped to y.
     2.1 How does debut rank for Stephen King compare with debut rank for Danielle Steel over time? Line plot with year mapped to x, debut_rank mapped to y, and a qualitative color mapped to author.
  3. Which books had the largest difference between best rank and debut rank? Line plot with rank type (debut rank vs. best rank) mapped to x and the actual rank number for debut_rank and best_rank mapped to y, using two colors: one for positive change, another for negative change. Annotate the plot with the title of the book (or make it show up when hovering over the line).
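The data preparation behind the first question (a top-N bar plot) comes down to sorting by total_weeks and keeping the first N rows. A minimal sketch, using five rows copied from the sample table above and N = 3 so the result stays readable; the function name is made up for illustration:

```javascript
// Five rows copied from the sample table (title and total_weeks only).
const sampleBooks = [
  { title: "CHAOS", total_weeks: 3 },
  { title: "CHAPTERHOUSE: DUNE", total_weeks: 16 },
  { title: "CHARADE", total_weeks: 5 },
  { title: "CHARLESTON", total_weeks: 4 },
  { title: "CHARLOTTE GRAY", total_weeks: 1 }
];

// Sort a copy in descending order of total_weeks and keep the first n rows.
function topByWeeks(books, n) {
  return [...books].sort((a, b) => b.total_weeks - a.total_weeks).slice(0, n);
}

const top3Titles = topByWeeks(sampleBooks, 3).map((book) => book.title);
// ["CHAPTERHOUSE: DUNE", "CHARADE", "CHARLESTON"]
```

Keeping the rows in sorted order also gives the bar order for the y axis, so the longest-running book ends up at the top of the plot.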

3 Vega spec completion

Complete the Vega specification for the three plots (they all use the same data – NY Times Best Sellers).

var spec = {
  $schema: "https://vega.github.io/schema/vega/v5.json",
  description: "NY Times Best Sellers of All Times",
  width: 800,
  height: 400,
  padding: 50,
  data: [
    {
      name: "books",
      url: "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-05-10/nyt_titles.tsv",
      format: { type: "tsv" }
    },
    { 
      name: "aggregate",
      source: "books",
      transform: [
        {
          // mean of total_weeks per year; the result is stored under the same field name
          type: "aggregate",
          groupby: ["year"],
          fields: ["total_weeks"],
          ops: ["mean"],
          as: ["total_weeks"]
        }
      ]
    }
  ],
  scales: [
    {
      name: "xScale",
      type: "linear",
      domain: { field: "year", data: "aggregate" },
      range: "width",
      zero: false
    },
    {
      name: "yScale",
      type: "linear",
      domain: { field: "total_weeks", data: "aggregate" },
      range: "height",
      zero: false
    }
  ],
  axes: [
    {
      scale: "xScale",
      orient: "bottom",
      format: "d",
      title: "Year"
     
    },
    {
      scale: "yScale",
      orient: "left",
      title: "average number of weeks in the NY Times Best Sellers list"
    }
  ],
  marks: [
    {
      type: "symbol",
      from: { data: "aggregate" },
      encode: {
        enter: {
          x: { field: "year", scale: "xScale" },
          y: { field: "total_weeks", scale: "yScale" }
        }
      }
    }
  ],
  title: {
    text: "NY Times Best Sellers Books by Average Total Weeks"
  }
};
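For readers unfamiliar with the aggregate transform used in the spec above, the following plain JavaScript sketch shows roughly what "mean of total_weeks grouped by year" computes. It is not Vega's implementation; the rows are hypothetical toy values (not taken from the dataset), and the function name is made up.

```javascript
// Hypothetical toy rows, with string values as an unparsed TSV load would give.
const toyRows = [
  { year: "2000", total_weeks: "4" },
  { year: "2000", total_weeks: "8" },
  { year: "2001", total_weeks: "3" }
];

// Group rows by year and average total_weeks within each group.
function meanWeeksByYear(rows) {
  const groups = new Map(); // year -> { sum, n }
  for (const row of rows) {
    const group = groups.get(row.year) || { sum: 0, n: 0 };
    group.sum += Number(row.total_weeks);
    group.n += 1;
    groups.set(row.year, group);
  }
  // One output row per year, mirroring the "as" name used in the spec.
  return Array.from(groups, ([year, group]) => ({
    year,
    total_weeks: group.sum / group.n
  }));
}

const yearlyMeans = meanWeeksByYear(toyRows);
// [{ year: "2000", total_weeks: 6 }, { year: "2001", total_weeks: 3 }]
```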

var spec = {
  $schema: "https://vega.github.io/schema/vega/v5.json",
  description: "NY Times Best Sellers of All Times",
  width: 800,
  height: 800,
  padding: 50,
  data: [
    {
      name: "books",
      url: "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-05-10/nyt_titles.tsv",
      format: { type: "tsv" },
      transform: [
        {
          // TSV fields load as strings; dividing by 1 coerces total_weeks to a number
          type: "formula",
          expr: "datum.total_weeks / 1",
          as: "total_weeks"
        },
        {
          type: "collect",
          sort: { field: "total_weeks", order: "descending"}
        },
        {
          type: "filter",
          expr: "datum.total_weeks > 94"
        }
      ]
    }
  ],
  scales: [
    {
      name: "yScale",
      type: "linear",
      domain: [2020, 1931],
      range: "height",
      zero: false
    },
    {
      name: "xScale",
      type: "linear",
      domain: { field: "total_weeks", data: "books" },
      range: "width"
    }
  ],
  axes: [
    {
      scale: "xScale",
      orient: "bottom",
      title: "total weeks in the NY Times Best Sellers list"
    },
    {
      scale: "yScale",
      orient: "left",
      format: "d",
      title: "Year"
      
    }
  ],
  marks: [
    {
      type: "rect",
      from: { data: "books" },
      encode: {
        enter: {
          y: { field: "year", scale: "yScale" },
          x: { field: "total_weeks", scale: "xScale" },
          x2: { value: 0, scale: "xScale" },
          height: { value: 3 }
        }
      }
    },
    {
      type: "text",
      from: {data : "books" },
      encode: {
        enter: {
          text: { signal: "datum.title + ' by ' + datum.author + ' ' + datum.year" },
          y: { field: "year", scale: "yScale" },
          x: { field: "total_weeks", scale: "xScale" },
          align: { value: "right"}
        }
      }
    }
  ],
  title: {
    text: "NY Times Best Sellers Books with longest total weeks in list",
    subtitle: "The 1980s didn't have any books in the list for more than 94 weeks"
  }
};
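The formula transform `datum.total_weeks / 1` in the spec above is presumably there because fields read from a TSV file without a parse directive arrive as strings. The sketch below, using made-up values, shows why that matters for the sort step: strings sort character by character, while numbers sort by magnitude.

```javascript
// Made-up week counts, as strings (the way an unparsed TSV field arrives).
const rawWeeks = ["94", "100", "15"];

// Default string sort compares character by character: "100" < "15" < "94".
const sortedAsText = [...rawWeeks].sort();

// Dividing by 1 coerces each string to a number, as the formula transform does.
const sortedAsNumbers = rawWeeks
  .map((weeks) => weeks / 1)
  .sort((a, b) => b - a); // descending, like the collect transform in the spec

// sortedAsText    -> ["100", "15", "94"]
// sortedAsNumbers -> [100, 94, 15]
```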

var spec = {
  $schema: "https://vega.github.io/schema/vega/v5.json",
  description: "NY Times Best Sellers of All Times",
  width: 800,
  height: 400,
  padding: 50,
  data: [
    {
      name: "books",
      url: "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-05-10/nyt_titles.tsv",
      format: { type: "tsv" }
    },
    { 
      name: "aggregate",
      source: "books",
      transform: [
        {
          // no fields/ops given, so this defaults to a row count per year (output field "count")
          type: "aggregate",
          groupby: ["year"]
        }
      ]
    }
  ],
  scales: [
    {
      name: "xScale",
      type: "linear",
      domain: { field: "year", data: "aggregate" },
      range: "width",
      zero: false
    },
    {
      name: "yScale",
      type: "linear",
      domain: { field: "count", data: "aggregate" },
      range: "height",
      zero: true
    }
  ],
  axes: [
    {
      scale: "xScale",
      orient: "bottom",
      title: "Year",
      format: "d"
     
    },
    {
      scale: "yScale",
      orient: "left",
      title: "Number of Books in the NY Times Best Sellers list"
    }
  ],
  marks: [
    {
      type: "rect",
      from: { data: "aggregate" },
      encode: {
        enter: {
          x: { field: "year", scale: "xScale" },
          y: { field: "count", scale: "yScale" },
          y2: { value: 0, scale: "yScale" },
          width: { value: 5 }
        }
      }
    }
  ],
  title: {
    text: "Total NY Times Best Sellers Books by year"
  }
};

4 Audiences, purposes, and storytelling

Consider the three plots from the previous questions when answering the following questions:

4.1 What would be the 3-minute story that the three plots tell?

The New York Times Best Sellers lists are up-to-date, authoritative lists of the most popular books in the United States, based on sales in the past week. They can be used as a measure of what Americans have been reading across the years. The average number of weeks books stayed in the list increased from the 1930s to the 1970s, then gradually decreased. Did books get less popular, or were more books simply being published, so that public attention was divided among the many reading options available? Looking at the data in a more fine-grained manner shows that no books stayed in the list for more than 94 weeks between 1960 and 1990. At the same time, the total number of books in the list per year decreased after 1960 and has been increasing steadily since the 1980s. So it does seem that the 1970s and 1980s were not great decades for popular books, but there has been an abundance of books since the 2000s, with a number of books staying in the list well over 100 weeks in total.

4.2 What is the big idea?

The 1970s and 1980s were not great for the book industry (not a lot of popular books in the NY Times Best Sellers list). There has been an abundance of books in the list since the 2000s, with a number of books staying in the list well over 100 weeks total. It is not clear if this is an indication that people are reading more, but it is an indication that more books are being published.

4.3 What would the audience be for this story?

Bookstores that want to make informed decisions on what to stock their shelves with.