Visualization mapping links variables in the data to things you can see in your plot.
If you have not done so already, download the penguin data and set up your analysis environment:
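Here is a minimal setup sketch. It assumes the data come from the palmerpenguins R package rather than from a downloaded file. That package's penguins data frame uses the column names (bill_depth_mm, body_mass_g, and so on) that appear later in this section.

```r
# Assumption: ggplot2 and palmerpenguins are installed, e.g. with
# install.packages(c("ggplot2", "palmerpenguins"))
library(ggplot2)        # ggplot(), aes(), and the geom_*() functions
library(palmerpenguins) # provides the `penguins` data frame

# Overview of the data: one row per penguin, one column per variable
str(penguins)
```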
We have a total of 8 variables. Take a moment to decide whether each variable is categorical or numeric.
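One way to check your answers, again assuming the palmerpenguins version of the data, is to list how R stores each column. Factor columns are categorical, while numeric and integer columns are numeric. Storage type is only a hint, though. For example, year is stored as an integer but takes only a few distinct values.

```r
library(palmerpenguins)

# First class of each column: "factor" (categorical) or "numeric"/"integer"
var_classes <- vapply(penguins, function(col) class(col)[1], character(1))
print(var_classes)
```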
The first mappings to consider are x and y (our axes). Some plots need only x, like histograms, because the y axis information is calculated by the histogram function itself.

What is the relationship between bill length and bill depth?
Hypothesis: bill length and bill depth are positively related, that is, as bill length increases, so does bill depth.
Note that these two variables share the same unit of measure (both length and depth are given in mm, i.e., millimeters).
Map variables to the two axes: bill length to y and bill depth to x.
We start with our data, then map our variables to the axes using the ggplot() and aes() (i.e., aesthetics) functions.
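Here is a sketch of this step, assuming the penguins data frame from the palmerpenguins package. The data go into ggplot(), and the axis mappings go inside aes().

```r
library(ggplot2)
library(palmerpenguins)

# Data first, then the aesthetic mapping inside aes():
# bill depth on the x axis, bill length on the y axis.
# With no geom layer yet, the plot shows only the axes, with no data drawn.
p_base <- ggplot(data = penguins,
                 mapping = aes(x = bill_depth_mm, y = bill_length_mm))
print(p_base)
```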
Let’s add the geometric object (geom) that draws the points:
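A minimal scatterplot sketch, again assuming the palmerpenguins data. The mapping stays in ggplot(), and geom_point() is added as a layer with +.

```r
library(ggplot2)
library(palmerpenguins)

# geom_point() draws one point per row (penguin). Rows with a missing
# bill measurement are dropped, and ggplot2 reports this as a warning.
p_scatter <- ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point()
print(p_scatter)
```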
Always decide first what to map to x (and often also what to map to y). Then build your other mappings.

Next, let’s bring in the categorical variable species. For geom_point, the mapping we need is color. Let’s add that to our aesthetics mapping (i.e., inside the aes() function, which in turn is inside the ggplot() function).
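A sketch of the color mapping, assuming the palmerpenguins data. Because color is inside aes(), ggplot2 picks a color for each species and adds a legend.

```r
library(ggplot2)
library(palmerpenguins)

# color = species inside aes(): one hue per species, plus a legend.
# ggplot2 stores this aesthetic under its British spelling, "colour".
p_color <- ggplot(penguins,
                  aes(x = bill_depth_mm, y = bill_length_mm, color = species)) +
  geom_point()
print(p_color)
```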
Let’s see how fill is used. For that we need a different type of plot, such as a bar plot. We can use geom_bar() with only x mapped to bill_depth_mm (the y values are calculated by geom_bar() itself, and its default stat is count). We will also keep the mapping of species to color.
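A sketch of this bar plot, assuming the palmerpenguins data. No y is mapped because the default count stat computes the bar heights. With color mapped to species, only the bar outlines are colored.

```r
library(ggplot2)
library(palmerpenguins)

# Only x is mapped; stat_count() (geom_bar's default stat) computes y.
# color = species colors the bar outlines, not their interiors.
p_bar <- ggplot(penguins, aes(x = bill_depth_mm, color = species)) +
  geom_bar()
print(p_bar)
```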
In a bar plot, the color mapping determines the outline color of each bar. To map a variable to the fill color of the bars, use the fill mapping instead.
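This sketch shows the same bar plot with fill instead of color, again assuming the palmerpenguins data.

```r
library(ggplot2)
library(palmerpenguins)

# fill = species colors the inside of each bar segment by species.
p_fill <- ggplot(penguins, aes(x = bill_depth_mm, fill = species)) +
  geom_bar()
print(p_fill)
```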
You can also set a fixed color by passing the color parameter directly to the geom function (e.g., inside geom_point()), outside of the aes() function, and naming a specific color. Let’s change the color of the dots in our first scatterplot:
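A sketch of a fixed color, assuming the palmerpenguins data. The color name "darkorange" is an arbitrary choice for illustration; any valid R color name or hex code works. Because color is set outside aes(), it is not tied to any variable and no legend appears.

```r
library(ggplot2)
library(palmerpenguins)

# color is a plain parameter of geom_point(), not an aesthetic mapping,
# so every point gets the same color and no legend is drawn.
p_fixed <- ggplot(penguins, aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(color = "darkorange")
print(p_fixed)
```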
Two other common mappings are size, for numeric variables, and shape, for categorical variables. Here’s the same scatterplot but with only shape mapped to the categorical variable species.
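A sketch of the shape mapping, assuming the palmerpenguins data. ggplot2's default shape palette handles up to six categories, so the three species fit comfortably.

```r
library(ggplot2)
library(palmerpenguins)

# shape = species: a different point symbol per species, plus a legend.
p_shape <- ggplot(penguins,
                  aes(x = bill_depth_mm, y = bill_length_mm, shape = species)) +
  geom_point()
print(p_shape)
```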
Let’s now map both shape and color to the categorical variable species.
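Here is a sketch with both mappings, assuming the palmerpenguins data. Mapping two aesthetics to the same variable encodes species redundantly. Because both aesthetics share a variable and a title, ggplot2 merges them into a single legend.

```r
library(ggplot2)
library(palmerpenguins)

# Redundant encoding: species drives both the point symbol and the color.
p_shape_color <- ggplot(penguins,
                        aes(x = bill_depth_mm, y = bill_length_mm,
                            shape = species, color = species)) +
  geom_point()
print(p_shape_color)
```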
The size mapping works better for numeric continuous variables than for categorical ones. Let’s map size to body_mass_g (i.e., penguin body mass in grams).
Remember that some visual channels are processed more easily than others, and color is one of the best channels in terms of visual processing.