class: center, middle, inverse, title-slide

.title[
# Slides 04 - Metaphors with Graphics
]
.subtitle[
## From Code to Geometry
]
.author[
### Arvind Venkatadri
]
.institute[
### Srishti Manipal Institute
]
.date[
### (2022-08-09)
]

---

class: middle, center

## How does one read Shakespeare?

![shakespeare](https://media.giphy.com/media/oveqQA2LxpwYg/giphy.gif)

~~To code or not to code, that is the question...~~

---

# What is a Grammar of Graphics?

## Code looks and reads like **English**.

## Has **verbs**, **nouns**, some **adjectives**....

--

- Describes information/ideas/concepts from *any* **source domain**.

--

- **GEOMETRY** as the *target domain*: what comes out of R is predominantly "geometry".

---

layout: false

# How do we express visuals in words?

.font120[
- **Data** to be visualized
]

--

.font120[
- **.hlb[Geom]etric objects** that appear on the plot
]

--

.font120[
- **.hlb[Aes]thetic mappings** from data to visual component
]

--

.font120[
- **.hlb[Stat]istics** transform data on the way to visualization
]

--

.font120[
- **.hlb[Coord]inates** organize location of geometric objects
]

--

.font120[
- **.hlb[Scale]s** define the range of values for aesthetics
]

--

.font120[
- **.hlb[Facet]s** group into subplots
]

---

# The Essence of ggplot

all `ggplot2`

- `aes(x = , y = )` (aesthetics)
- `aes(x = , y = , color = )` (add color)
- `aes(x = , y = , size = )` (add size)
- `+ facet_wrap(~ )` (facetting)
- `+ scale_*()` (add a scale)

---

# gg is for Grammar of Graphics

.left-column[
### Data

### Aesthetics

### Geoms

```r
+ geom_*()
```
]

.right-column[
<img src="04-Metaphors-with-Graphics_files/figure-html/geom_demo-1.png" width="850px" height="350px" />
]

---

# [The Five-Named Graphs](http://moderndive.com/3-viz.html#FiveNG)

- Scatterplot: `geom_point()`
- Line graph: `geom_line()`
- Histogram: `geom_histogram()`
- Boxplot: `geom_boxplot()`
- Bar graph: `geom_bar()` or `geom_col()` (see [Lab 02](../02-Pronouns-and-Data.html))

---

## Chunk: penguins

```r
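# Assumption (not stated on the slides): `penguins` is the dataset shipped with
# the palmerpenguins package, and the pipe (%>%), drop_na(), slice_sample(),
# ggplot2 and the `diamonds` data all come from loading the tidyverse.
library(tidyverse)
library(palmerpenguins)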
head(penguins)
```

```
## # A tibble: 6 × 8
##   species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex     year
##   <fct>   <fct>              <dbl>         <dbl>             <int>       <int> <fct>  <int>
*## 1 Adelie  Torgersen           39.1          18.7               181        3750 male    2007
*## 2 Adelie  Torgersen           39.5          17.4               186        3800 female  2007
*## 3 Adelie  Torgersen           40.3          18                 195        3250 female  2007
## 4 Adelie  Torgersen           NA            NA                  NA          NA <NA>    2007
## 5 Adelie  Torgersen           36.7          19.3               193        3450 female  2007
## 6 Adelie  Torgersen           39.3          20.6               190        3650 male    2007
```

These are the first few rows of the `penguins` dataset. A few observations contain missing values (**NA**). Let us remove them for now.

```r
penguins <- penguins %>% drop_na()
```

---

## Chunk: Mapping

.pull-left[

```r
*ggplot(penguins)
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/first-plot1a-out-1.png" width="504" />
]

---

## Chunk: Mapping

.pull-left[

```r
ggplot(data = penguins,
*      mapping = aes(x = bill_length_mm,
*                    y = body_mass_g))
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/first-plot1b-out-1.png" width="504" />
]

---

## Chunk: Mapping

.pull-left[

```r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = body_mass_g)) +
* geom_point()
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/first-plot1c-out-1.png" width="504" />
]

---

## Chunk: Mapping

.pull-left[

```r
ggplot(data = penguins,
       mapping = aes(x = bill_length_mm,
                     y = body_mass_g)) +
  geom_point() +
* geom_smooth(method = "lm")
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/first-plot1d-out-1.png" width="504" />
]

---

### Chunk: Geom_Point_Position_Colour

.pull-left[

```r
*ggplot(data = penguins)
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-2-1.png" width="504" />
]

---

### Chunk: Geom_Point_Position_Colour

.pull-left[

```r
ggplot(data = penguins,
*      aes(x = bill_length_mm,
*          y = body_mass_g,
*          color = island))
```

We can leave out the "mapping" word and just
use **aes**. Why is there no plot? 🤔 💭 Right!! We have not used a `geom` command yet!!
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-3-1.png" width="504" />
]

---

### Chunk: Geom_Point_Position_Colour

.pull-left[

```r
ggplot(data = penguins,
       aes(x = bill_length_mm,
           y = body_mass_g,
           color = island)) +
* geom_point() +
* ggtitle("A point geom with position, color aesthetics")
```

Note that the points are located by **position** coordinates on both the x and y axes, and **coloured** by the `island` variable.
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-4-1.png" width="504" />
]

---

### Chunk: Geom_Point_Position_Colour

.pull-left[

```r
ggplot(data = penguins,
       aes(x = bill_length_mm,
           y = body_mass_g,
           color = island)) +
* geom_point(size = 4) +
* ggtitle("A point geom with position color and size aesthetics")
```

Note that the points are located by **position** coordinates on both the x and y axes, and **coloured** by the `island` variable. And we've fixed `size = 4` outside `aes()`, so every point gets the same size!
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-5-1.png" width="504" />
]

---

## Alpha

.pull-left[

```r
diamonds %>%
  # Sample some 20% of the data
  slice_sample(prop = 0.2) %>%
  ggplot(.) +
* geom_point(aes(x = carat,
*                y = price))
```

Are the points all overlapping? Can we see them better?
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-6-1.png" width="504" />
]

---

## Alpha

.pull-left[

```r
diamonds %>%
  # Sample some 20% of the data
  slice_sample(prop = 0.2) %>%
  ggplot(.) +
* geom_point(aes(x = carat, y = price),
*            # alpha outside the aes() !!!
*            alpha = 0.2) +
  labs(title = "Points plotted with Alpha")
```

With `alpha = 0.2` the points are semi-transparent, so dense regions appear darker. Can we see the overlapping points better now?
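
Here is a minimal sketch of the difference between *setting* alpha (outside `aes()`) and *mapping* alpha (inside `aes()`). It assumes only `ggplot2` and its bundled `diamonds` data. The names `p_fixed` and `p_mapped`, and the choice of `depth` as the mapped column, are illustrative and not from the original slides.

```r
library(ggplot2)

# Setting: alpha is a constant given outside aes(), so every point gets 0.2
p_fixed <- ggplot(diamonds) +
  geom_point(aes(x = carat, y = price), alpha = 0.2)

# Mapping: alpha sits inside aes(), so transparency varies with a data column
# (`depth` is an arbitrary choice, used only for illustration)
p_mapped <- ggplot(diamonds) +
  geom_point(aes(x = carat, y = price, alpha = depth))
```

Printing `p_fixed` or `p_mapped` draws the plot. A mapped aesthetic also gets a legend, while a fixed one does not.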
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-7-1.png" width="90%" />
]

---

## Chunk: Box Plot

.pull-left[

```r
ggplot(diamonds) +
* geom_boxplot(aes(x = cut, y = price)) +
  labs(title = "Box Plot")
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-8-1.png" width="90%" />
]

---

## Chunk: Box Plot

.pull-left[

```r
ggplot(diamonds) +
* geom_boxplot(aes(x = cut,
*                  y = price,
*                  fill = cut)) +
  labs(title = "Box Plot")
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-9-1.png" width="90%" />
]

---

## Chunk: Geom_Bar_1

.pull-left[

```r
*ggplot(data = penguins)
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-10-1.png" width="90%" />
]

---

## Chunk: Geom_Bar_1

.pull-left[

```r
ggplot(data = penguins) +
* aes(x = species)
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-11-1.png" width="90%" />
]

---

## Chunk: Geom_Bar_1

.pull-left[

```r
ggplot(data = penguins) +
  aes(x = species) +
* geom_bar() +
* ggtitle("A bar geom with position and height aesthetics")
```

The bars are plotted with **positions** on the x-axis, defined by the `species` variable, and **heights** mapped to the y-axis.

How did the graph "know" the heights of the bars? `geom_bar()` computes a `count` statistic internally. Many `geom_*()` functions perform such internal computations, and the computed values are accessible to programmers.
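
The internal count can be sketched by hand. The example below is an illustration, not the slide author's code: it swaps in the `diamonds` data (bundled with `ggplot2`) so that it runs without an extra data package, and the names `cut_counts`, `p_bar`, `p_col` and `p_explicit` are made up for this sketch.

```r
library(ggplot2)
library(dplyr)

# geom_bar() counts the rows in each category itself (its default stat is "count")
p_bar <- ggplot(diamonds, aes(x = cut)) +
  geom_bar()

# The same bars by hand: count first, then draw the given heights with geom_col()
cut_counts <- diamonds %>% count(cut)
p_col <- ggplot(cut_counts, aes(x = cut, y = n)) +
  geom_col()

# The computed variable can also be named explicitly with after_stat()
p_explicit <- ggplot(diamonds, aes(x = cut, y = after_stat(count))) +
  geom_bar()
```

All three plots should show bars of the same heights. `layer_data(p_bar)` returns the table that `geom_bar()` computed, including its `count` column.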
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-12-1.png" width="90%" />
]

---

## Geom_Bar_Position_Stack_and_Dodge

.pull-left[
When a bar chart uses a second variable (here mapped to `fill`), we have a few more **position** options:

```r
ggplot(penguins,
*      aes(x = species,
*          fill = island))
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-13-1.png" width="90%" />
]

---

## Geom_Bar_Position_Stack_and_Dodge

.pull-left[
When a bar chart uses a second variable (here mapped to `fill`), we have a few more **position** options:

```r
ggplot(penguins,
       aes(x = species,
           fill = island)) +
* geom_bar() +
* ggtitle(label = "A stacked bar chart")
```

The bars are filled with colours from the `island` variable and are **stacked** in **position**, which is the default for `geom_bar()`.
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-14-1.png" width="90%" />
]

---

## Geom_Bar_Position_Stack_and_Dodge

.pull-left[
And here we use the `dodge` option:

```r
ggplot(penguins,
       aes(x = species,
           fill = island)) +
* geom_bar(position = "dodge") +
* ggtitle(label =
*           "A dodged bar chart")
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-15-1.png" width="90%" />
]

---

## Facetting

.pull-left[

```r
*ggplot(penguins)
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-16-1.png" width="90%" />
]

---

## Facetting

.pull-left[

```r
ggplot(penguins) +
* aes(x = flipper_length_mm,
*     y = body_mass_g)
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-17-1.png" width="90%" />
]

---

## Facetting

.pull-left[

```r
ggplot(penguins) +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
* geom_point()
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-18-1.png" width="90%" />
]

---

## Facetting

.pull-left[

```r
ggplot(penguins) +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  geom_point() +
* facet_wrap(~island) +
* ggtitle("A point geom graph with facets")
```

The graph has **split** into small multiples, with one panel for each of the islands.
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-19-1.png" width="90%" />
]

---

## Still more Facetting

.pull-left[

```r
ggplot(penguins) +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
* geom_point()
```

What if we have even more "factor" variables? We have `island` and `species`...can we split further?
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-20-1.png" width="90%" />
]

---

## Still more Facetting

.pull-left[

```r
ggplot(penguins) +
  aes(x = flipper_length_mm,
      y = body_mass_g) +
  geom_point() +
* facet_grid(species~island) +
* ggtitle("A point geom graph with grid facets")
```

The graph has **split** into a grid of panels, with one row for each species **and** one column for each island.
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-21-1.png" width="90%" />
]

---

class: middle, center, inverse

## And shall we look briefly at colour?

---

## Finally...Colour!! (Just a bit)

.pull-left[

```r
diamonds %>%
  slice_sample(prop = 0.2) %>%
  ggplot(.) +
* geom_point(aes(x = carat, y = price))
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-22-1.png" width="90%" />
]

---

## Finally...Colour!! (Just a bit)

.pull-left[

```r
diamonds %>%
  slice_sample(prop = 0.2) %>%
  ggplot(.) +
  geom_point(aes(x = carat, y = price, colour = cut), size = 3) +
* scale_colour_brewer(palette = "Set3") +
  labs(title = "Brewer Colour Palette (Set3)")
```

`scale_colour_brewer()` takes its palettes from the `RColorBrewer` package. Type `RColorBrewer::display.brewer.all()` in your Console to see which palettes are available.
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-23-1.png" width="90%" />
]

---

## Chunk: Colour!! (Just a bit)

.pull-left[

```r
diamonds %>%
  slice_sample(prop = 0.2) %>%
  ggplot(.) +
  geom_point(aes(x = carat, y = price, colour = cut), size = 3) +
* scale_colour_viridis_d() +
  labs(title = "Viridis Palette",
       subtitle = "The Default in ggplot")
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-24-1.png" width="504" />
]

---

## Chunk: Colour!! (Just a bit)

.pull-left[

```r
diamonds %>%
  slice_sample(prop = 0.2) %>%
  ggplot(.) +
  geom_point(aes(x = carat, y = price, colour = cut), size = 3) +
* scale_colour_viridis_d(option = "magma") +
* labs(title = "Viridis Palette, Option Magma")
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-25-1.png" width="504" />
]

---

## Chunk: Colour!! (Just a bit)

.pull-left[

```r
diamonds %>%
  slice_sample(prop = 0.2) %>%
  ggplot(.) +
  geom_point(aes(x = carat, y = price, colour = cut), size = 3) +
* scale_colour_viridis_d(option = "inferno") +
  labs(title = "Viridis Palette, Option Inferno")
```
]

.pull-right[
<img src="04-Metaphors-with-Graphics_files/figure-html/unnamed-chunk-26-1.png" width="504" />
]

---

## Conclusion

- `ggplot()` takes a data frame/tibble as its `data` argument
- The `aes`-thetic arguments can be `x`, `y`, `colour`, `shape`, `alpha`, for example...
- The `geom_*()` commands specify the kind of plot
- Together, the `ggplot2` package offers a **Grammar** of near-English commands which allow us to plot data in various ways.

---

# References

1. [Wickham, Hadley. (2010). "A Layered Grammar of Graphics". *Journal of Computational and Graphical Statistics*, 19(1).](http://www.jstor.org.proxy.uchicago.edu/stable/25651297)
2. [Wilkinson, Leland. (2005). *The Grammar of Graphics*. (UChicago authentication required)](http://link.springer.com.proxy.uchicago.edu/book/10.1007%2F0-387-28695-0)