November 15, 2022

Agenda

  • Strategies for multidimensional data visualization:
    • direct visualization (covered last class)
    • projections (dimensionality reduction)
      • Principal Component Analysis (PCA) visualization

Dimensionality Reduction


  • methods that map multidimensional data from a high-dimensional space to a low-dimensional space (also called projections)
  • the projection space is also called the display, embedding, or image space

Examples of linear projection methods:

  • principal component analysis
  • linear discriminant analysis
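To make "linear projection" concrete: each linear method listed above maps a data point to the display space with a single matrix multiplication. In generic notation (the symbols here are ours, not from the slides), with $d$ original dimensions and $k$ display dimensions:

```latex
\mathbf{y} = W^{\top}\mathbf{x}, \qquad
\mathbf{x} \in \mathbb{R}^{d},\quad
W \in \mathbb{R}^{d \times k},\quad
\mathbf{y} \in \mathbb{R}^{k},\quad k < d
```

For PCA, $\mathbf{x}$ is mean-centered first and the columns of $W$ are the $k$ leading eigenvectors of the covariance matrix; LDA instead chooses $W$ so that known classes are well separated. Nonlinear methods, by contrast, cannot be written as a single matrix product.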

Example of a nonlinear projection method:

  • isometric feature mapping (Isomap)

Principal Component Analysis (PCA)


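  • Introduced by Karl Pearson in 1901 and still widely used today
  • The main idea is to reduce dimensionality by projecting the data onto a few new axes (the principal components) that retain as much of the original variance as possible
  • The first principal component explains the largest share of the variance
  • The principal components are uncorrelated and ordered by decreasing variance
  • Limitation: as a linear method, PCA does not capture nonlinear structure well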
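The properties listed above can be checked on a small example. The following is a minimal, dependency-free sketch of PCA, assuming power iteration with deflation to find the eigenvectors of the covariance matrix (in practice you would use a library such as NumPy or scikit-learn). The toy data are made up for illustration and are not the McDonald's data. The explained variance ratio of component $k$ is its eigenvalue divided by the sum of all eigenvalues (the trace of the covariance matrix).

```python
import math
import random


def mean_center(rows):
    """Subtract each column's mean so every variable is centered at zero."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mat_vec(m, v):
    return [dot(row, v) for row in m]


def pca(rows, k, iters=500, seed=0):
    """Return (components, explained_variance_ratio, scores) for the top k PCs.

    Each eigenvector is found by power iteration; it is then removed from the
    matrix (deflation) so the next power iteration finds the next component.
    """
    centered = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_var = sum(cov[i][i] for i in range(d))  # trace = sum of eigenvalues
    rng = random.Random(seed)
    m = [row[:] for row in cov]
    components, variances = [], []
    for _ in range(k):
        # Random unit start vector (fixed seed for reproducibility).
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        norm = math.sqrt(dot(v, v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(dot(w, w))
            if norm == 0.0:  # no variance left; keep the current v
                break
            v = [x / norm for x in w]
        lam = dot(v, mat_vec(m, v))  # Rayleigh quotient = variance along v
        components.append(v)
        variances.append(lam)
        # Deflation: subtract lam * v v^T so v no longer dominates.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    scores = [[dot(r, c) for c in components] for r in centered]
    explained = [lam / total_var for lam in variances]
    return components, explained, scores


# Toy data (made up): y is roughly 2 * x, so almost all variance lies on one axis.
data = [[-2, -3.9], [-1, -2.1], [0, 0.0], [1, 2.1], [2, 3.9]]
components, explained, scores = pca(data, 2)
print("explained variance ratio:", [round(e, 4) for e in explained])
```

On this data the first component carries nearly all of the variance, and the two components (and their scores) are orthogonal, matching the bullet points above.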

Case Study: McDonald’s Menu Data PCA

McDonald’s Menu Data

PCA results

Adding rule marks

Check the rule mark documentation for Vega.

Add rule marks for the loadings data (you will also need to add scales), mapping x and y to zero, x2 to PC1, and y2 to PC2, so that each rule runs from the origin to that variable's (PC1, PC2) loading.
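A sketch of the scales and rule mark this step asks for, built as a Python dictionary and printed as Vega JSON. The data set name `loadings`, the scale names `xscale`/`yscale`, and the stroke color are assumptions; rename them to match your own spec. Writing `{"scale": ..., "value": 0}` passes the constant 0 through the scale, so every rule starts at the data-space origin.

```python
import json


def loading_rules_spec(data_name="loadings", x_field="PC1", y_field="PC2"):
    """Return Vega scales plus a rule mark drawing one segment per loading."""
    scales = [
        # "zero": True keeps 0 inside the domain so the origin is always visible.
        {"name": "xscale", "type": "linear", "zero": True,
         "domain": {"data": data_name, "field": x_field}, "range": "width"},
        {"name": "yscale", "type": "linear", "zero": True,
         "domain": {"data": data_name, "field": y_field}, "range": "height"},
    ]
    rule = {
        "type": "rule",
        "from": {"data": data_name},  # one rule per row of the loadings data
        "encode": {
            "enter": {
                "x": {"scale": "xscale", "value": 0},      # start at the origin
                "y": {"scale": "yscale", "value": 0},
                "x2": {"scale": "xscale", "field": x_field},  # end at the loading
                "y2": {"scale": "yscale", "field": y_field},
                "stroke": {"value": "steelblue"},
            }
        },
    }
    return {"scales": scales, "marks": [rule]}


print(json.dumps(loading_rules_spec(), indent=2))
```

The printed `scales` and `marks` arrays can be merged into an existing spec that already defines the `loadings` data, for example in the online Vega editor.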

Add the other PCs

How can you replicate the plot you created to show PC3 vs. PC4, and PC5 vs. PC6?
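One possible approach, sketched under assumptions: wrap the scales, axes, and rule mark for one pair of components in a Vega `group` mark, and emit one group per pair, offset horizontally. This assumes the `loadings` data set has fields `PC1` through `PC6`; the panel size and gap are hypothetical values. Explicit pixel ranges are used inside each group so the scales match the group's size.

```python
import json

PANEL = 200  # hypothetical panel width/height in pixels
GAP = 40     # hypothetical horizontal gap between panels


def pc_panel(x_pc, y_pc, offset, data_name="loadings"):
    """Build one Vega group mark plotting loadings of x_pc against y_pc."""
    xs, ys = f"x_{x_pc}", f"y_{y_pc}"  # scale names unique to this panel
    return {
        "type": "group",
        "encode": {"enter": {
            "x": {"value": offset}, "y": {"value": 0},
            "width": {"value": PANEL}, "height": {"value": PANEL},
        }},
        "scales": [
            {"name": xs, "type": "linear", "zero": True,
             "domain": {"data": data_name, "field": x_pc}, "range": [0, PANEL]},
            {"name": ys, "type": "linear", "zero": True,
             "domain": {"data": data_name, "field": y_pc}, "range": [PANEL, 0]},
        ],
        "axes": [
            {"orient": "bottom", "scale": xs, "title": x_pc},
            {"orient": "left", "scale": ys, "title": y_pc},
        ],
        "marks": [{
            "type": "rule",
            "from": {"data": data_name},
            "encode": {"enter": {
                "x": {"scale": xs, "value": 0}, "y": {"scale": ys, "value": 0},
                "x2": {"scale": xs, "field": x_pc},
                "y2": {"scale": ys, "field": y_pc},
                "stroke": {"value": "steelblue"},
            }},
        }],
    }


PAIRS = [("PC1", "PC2"), ("PC3", "PC4"), ("PC5", "PC6")]
marks = [pc_panel(x, y, i * (PANEL + GAP)) for i, (x, y) in enumerate(PAIRS)]
print(json.dumps({"marks": marks}, indent=2))
```

Generating the panels in a loop avoids copy-pasting the same block three times; the top-level spec's `width` then needs to be large enough for all three panels.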

Vega-Lite

What is Vega-Lite?

Vega-Lite is a higher-level language built on top of Vega that automates some constructions (for example, default scales, axes, and legends) and makes the JSON specification significantly shorter.

Vega-Lite makes it fast to create common plots.

“Compared to Vega, Vega-Lite provides a more concise and convenient form to author common visualizations. As Vega-Lite can compile its specifications to Vega specifications, users may use Vega-Lite as the primary visualization tool and, if needed, transition to use the lower-level Vega for advanced use cases.”
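For comparison with the Vega rule-mark exercise, here is a minimal sketch of a Vega-Lite scatter plot of PCA scores, again built as a Python dictionary. The file name `pca_scores.csv` and its `PC1`/`PC2` columns are hypothetical. No scales or axes are declared: Vega-Lite derives them from the `encoding`, which is much of why its specs are shorter than the equivalent Vega.

```python
import json


def pca_scatter_vl(data_url="pca_scores.csv", x_pc="PC1", y_pc="PC2"):
    """Return a Vega-Lite v5 spec plotting one PC's scores against another's."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "data": {"url": data_url},  # format inferred from the .csv extension
        "mark": "point",
        "encoding": {
            "x": {"field": x_pc, "type": "quantitative"},
            "y": {"field": y_pc, "type": "quantitative"},
        },
    }


print(json.dumps(pca_scatter_vl(), indent=2))
```

Plotting another pair, such as PC3 vs. PC4, only requires changing the two field names.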

Examples of Vega-Lite