Visualization mapping links variables in the data to things you can see in your plot.
If you have not done so already, download the penguins data and set up your analysis environment:
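A minimal setup sketch, assuming the data come from the palmerpenguins package (the source does not say where to download them; a CSV copy would be read with read.csv() instead):

```r
# Assumption: ggplot2 and palmerpenguins are the packages used in this lesson.
# Install them once if they are missing:
# install.packages(c("ggplot2", "palmerpenguins"))
library(ggplot2)
library(palmerpenguins)

# palmerpenguins provides the `penguins` data frame; peek at the first rows.
penguin_data <- penguins
head(penguin_data)
```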
We have a total of 8 variables. Take a moment to check whether each variable is categorical or numeric.
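One way to check the variable types, sketched here under the assumption that the data are the `penguins` data frame from the palmerpenguins package:

```r
library(palmerpenguins)

# str() lists every column with its type: factors are categorical,
# while numeric and integer columns are numeric.
str(penguins)

# The same information as a named vector of class names.
variable_types <- sapply(penguins, class)
variable_types
```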
The most basic mappings are x and y (our axes). Some plots need only x, like histograms, because the y axis information is calculated by the histogram function itself.
What is the relationship between bill length and bill depth?
The two variables of interest are bill length and bill depth. A reasonable expectation is that as bill length increases, so does bill depth.
Note that these two variables are in the same unit of measure (i.e., length and depth are both given in mm or millimeters).
Map variables to the two axes: bill length to y and bill depth to x.
We start with our data, then we map our variables to our axes using the ggplot() and aes() (i.e., aesthetics) functions.
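A sketch of this first step, assuming the palmerpenguins `penguins` data frame and its column name `bill_length_mm` (the bill length column is not named in the text, so treat it as an assumption):

```r
library(ggplot2)
library(palmerpenguins)

# Data first, then the aesthetic mappings: bill depth on x, bill length on y.
base_plot <- ggplot(data = penguins,
                    mapping = aes(x = bill_depth_mm, y = bill_length_mm))

# No geom has been added yet, so printing draws only the axes.
base_plot
```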
Let’s add the geometrics:
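For a scatter plot, the geom is likely geom_point(). The sketch below assumes the palmerpenguins `penguins` data and the column name `bill_length_mm`:

```r
library(ggplot2)
library(palmerpenguins)

# Same data and axis mappings, now with a point layer added using `+`.
scatter_plot <- ggplot(penguins,
                       aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point()

scatter_plot
```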
Decide what to map to x (and often also what to map to y) first. Then you build your other mappings. The next variable to map is species. For geom_point the mapping we need is color. Let’s add that to our aesthetics mapping (i.e., inside the aes() function, which in turn is inside the ggplot() function).
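The color mapping might look like this, again assuming the palmerpenguins `penguins` data and the column name `bill_length_mm`:

```r
library(ggplot2)
library(palmerpenguins)

# color = species sits inside aes(), so each species gets its own point color
# and ggplot2 adds a legend automatically.
color_plot <- ggplot(penguins,
                     aes(x = bill_depth_mm, y = bill_length_mm,
                         color = species)) +
  geom_point()

color_plot
```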
Let’s see how fill would be used. For that, we need a different type of plot, such as a bar plot. We can use geom_bar() with only x mapped to bill_depth_mm (the values for y are calculated by the geom_bar() function, and the default stat is count). We will also keep the mapping of species to color.
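A sketch of that bar plot, assuming the palmerpenguins `penguins` data:

```r
library(ggplot2)
library(palmerpenguins)

# Only x is mapped; geom_bar() counts the rows at each bill depth value.
# color = species is kept from the scatter plot.
bar_outline_plot <- ggplot(penguins,
                           aes(x = bill_depth_mm, color = species)) +
  geom_bar()

bar_outline_plot
```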
The mapping color in a bar plot determines the outline color of each bar. To map to the fill color of the bars, you need to use the fill mapping.
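Swapping color for fill could be sketched as follows, assuming the palmerpenguins `penguins` data:

```r
library(ggplot2)
library(palmerpenguins)

# fill = species colors the inside of each bar rather than its outline.
bar_fill_plot <- ggplot(penguins,
                        aes(x = bill_depth_mm, fill = species)) +
  geom_bar()

bar_fill_plot
```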
You can always set a fixed color using the color parameter in the geometrics (e.g., inside geom_point()) by not using the aes() function and naming a specific color. Let’s change the color of the dots in our first scatterplot:
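A sketch of a fixed color, assuming the palmerpenguins `penguins` data and the column name `bill_length_mm`. The color "blue" is an arbitrary choice for illustration; the text does not name a specific one:

```r
library(ggplot2)
library(palmerpenguins)

# color is set outside aes(), so every point gets the same color
# and no legend is drawn for it.
fixed_color_plot <- ggplot(penguins,
                           aes(x = bill_depth_mm, y = bill_length_mm)) +
  geom_point(color = "blue")  # "blue" is an assumed example color

fixed_color_plot
```

Because the color is a constant rather than a mapping, it carries no information about the data.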
Two other common mappings are size, for numeric variables, and shape, for categorical variables. Here’s the same scatter plot but with only shape mapped to the categorical variable species.
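The shape-only version might look like this, assuming the palmerpenguins `penguins` data and the column name `bill_length_mm`:

```r
library(ggplot2)
library(palmerpenguins)

# shape = species gives each species its own point symbol; color is not mapped.
shape_plot <- ggplot(penguins,
                     aes(x = bill_depth_mm, y = bill_length_mm,
                         shape = species)) +
  geom_point()

shape_plot
```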
Let’s now map shape and color to the categorical variable species.
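A sketch with both mappings, assuming the palmerpenguins `penguins` data and the column name `bill_length_mm`:

```r
library(ggplot2)
library(palmerpenguins)

# The same variable drives two channels, so species is encoded redundantly
# by both point symbol and point color.
shape_color_plot <- ggplot(penguins,
                           aes(x = bill_depth_mm, y = bill_length_mm,
                               shape = species, color = species)) +
  geom_point()

shape_color_plot
```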
The size mapping is better suited to continuous numeric variables than to categorical variables. Let’s map size to body_mass_g (i.e., penguin body mass in grams).
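The size mapping could be sketched like this, assuming the palmerpenguins `penguins` data and the column name `bill_length_mm`:

```r
library(ggplot2)
library(palmerpenguins)

# size = body_mass_g scales each point by the penguin's body mass in grams.
size_plot <- ggplot(penguins,
                    aes(x = bill_depth_mm, y = bill_length_mm,
                        size = body_mass_g)) +
  geom_point()

size_plot
```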
Remember that some channels are better processed than others, with color being one of the best channels in terms of visual processing.