---
output: github_document
editor_options:
  chunk_output_type: console
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  fig.path = "man/figures/tidyexplain-",
  cache = TRUE
)
library(tidyexplain)
set_font_size(11, 26)
```

[gganimate]: https://github.com/thomasp85/gganimate#README
[dplyr-two-table]: https://dplyr.tidyverse.org/articles/two-table.html
[r4ds-set-ops]: http://r4ds.had.co.nz/relation-data.html#set-operations

# Tidy Animated Verbs

Garrick Aden-Buie -- [@grrrck](https://twitter.com/grrrck) -- [garrickadenbuie.com](https://www.garrickadenbuie.com).
David Zimmermann -- [@dav_zim](https://twitter.com/dav_zim) -- [datashenanigan.wordpress.com](https://datashenanigan.wordpress.com/)

Set operations contributed by [Tyler Grant Smith](https://github.com/TylerGrantSmith).

[![Binder](http://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/gadenbuie/tidy-animated-verbs/master?urlpath=rstudio)
[![CC0](https://img.shields.io/badge/license_(images)_-CC0-green.svg)](https://creativecommons.org/publicdomain/zero/1.0/)
[![MIT](https://img.shields.io/badge/license_(code)_-MIT-green.svg)](https://opensource.org/licenses/MIT)

- Mutating Joins: [`inner_join()`](#inner-join), [`left_join()`](#left-join), [`right_join()`](#right-join), [`full_join()`](#full-join)
- Filtering Joins: [`semi_join()`](#semi-join), [`anti_join()`](#anti-join)
- Set Operations: [`union()`](#union), [`union_all()`](#union-all), [`intersect()`](#intersection), [`setdiff()`](#set-difference)
- Tidyr Operations: [`gather()`](#gather), [`spread()`](#spread)
- Learn more about
  - [Relational Data](#relational-data)
  - [gganimate](#gganimate)

Please feel free to use these images for teaching or learning about action verbs from the [tidyverse](https://tidyverse.org).
You can directly download the [original animations](images/) or static images in [svg](images/static/svg/) or [png](images/static/png/) formats, or you can use the [scripts](R/) to recreate the images locally.

Currently, the animations cover the [dplyr two-table verbs][dplyr-two-table] and I'd like to expand the animations to include more verbs from the tidyverse.
[Suggestions are welcome!](https://github.com/gadenbuie/tidy-animated-verbs/issues)

## Installing

The in-development version of `tidyexplain` can be installed with `devtools`:

```r
# install.packages("devtools")
devtools::install_github("gadenbuie/tidy-animated-verbs")
library(tidyexplain)
```

## Mutating Joins

```{r intial-dfs}
x <- dplyr::data_frame(
  id = 1:3,
  x = paste0("x", 1:3)
)
y <- dplyr::data_frame(
  id = (1:4)[-3],
  y = paste0("y", (1:4)[-3])
)
animate_full_join(x, y, by = c("id"), export = "first")
```

```{r}
x
y
```

### Inner Join

> All rows from `x` where there are matching values in `y`, and all columns from `x` and `y`.

```{r inner-join}
animate_inner_join(x, y, by = "id")
```

```{r}
dplyr::inner_join(x, y, by = "id")
```

### Left Join

> All rows from `x`, and all columns from `x` and `y`. Rows in `x` with no match in `y` will have `NA` values in the new columns.

```{r left-join}
animate_left_join(x, y, by = "id")
```

```{r}
dplyr::left_join(x, y, by = "id")
```

### Left Join (Extra Rows in y)

> ... If there are multiple matches between `x` and `y`, all combinations of the matches are returned.

```{r left-join-extra}
y_extra <- dplyr::bind_rows(y, dplyr::data_frame(id = 2, y = "y5"))
y_extra # has multiple rows with the key from `x`
animate_left_join(x, y_extra, by = "id", title_size = 22)
```

```{r}
dplyr::left_join(x, y_extra, by = "id")
```

### Right Join

> All rows from `y`, and all columns from `x` and `y`. Rows in `y` with no match in `x` will have `NA` values in the new columns.

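
The description above mirrors the left join with the roles of the two tables swapped. As a minimal sketch of that relationship, the snippet below uses base R's `merge()` rather than dplyr, so it needs no extra packages. The names `x_tbl` and `y_tbl` are hypothetical stand-ins for the example tables and are not part of `tidyexplain`.

```r
# Stand-alone copies of the example tables (hypothetical names, chosen so
# they do not clash with `x` and `y` used elsewhere in this document).
x_tbl <- data.frame(id = 1:3, x = paste0("x", 1:3), stringsAsFactors = FALSE)
y_tbl <- data.frame(
  id = c(1L, 2L, 4L),
  y = paste0("y", c(1, 2, 4)),
  stringsAsFactors = FALSE
)

# all.y = TRUE keeps every row of y_tbl: base R's analogue of a right join.
right <- merge(x_tbl, y_tbl, by = "id", all.y = TRUE)

# Swapping the tables and using all.x = TRUE is a left join of y_tbl with x_tbl.
left_swapped <- merge(y_tbl, x_tbl, by = "id", all.x = TRUE)

# Both results hold the same rows and values; only the column order differs.
all.equal(right[, c("id", "x", "y")], left_swapped[, c("id", "x", "y")])
```

The row with `id` 4 exists only in `y_tbl`, so its `x` value is `NA` in both results.
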
```{r right-join}
animate_right_join(x, y, by = "id")
```

```{r}
dplyr::right_join(x, y, by = "id")
```

### Full Join

> All rows and all columns from both `x` and `y`. Where there are not matching values, returns `NA` for the one missing.

```{r full-join}
animate_full_join(x, y, by = "id")
```

```{r}
dplyr::full_join(x, y, by = "id")
```

## Filtering Joins

### Semi Join

> All rows from `x` where there are matching values in `y`, keeping just columns from `x`.

```{r semi-join}
animate_semi_join(x, y, by = "id")
```

```{r}
dplyr::semi_join(x, y, by = "id")
```

### Anti Join

> All rows from `x` where there are not matching values in `y`, keeping just columns from `x`.

```{r anti-join}
animate_anti_join(x, y, by = "id")
```

```{r}
dplyr::anti_join(x, y, by = "id")
```

## Set Operations

```{r intial-dfs-so}
x <- dplyr::data_frame(
  x = c(1, 1, 2),
  y = c("a", "b", "a")
)
y <- dplyr::data_frame(
  x = c(1, 2),
  y = c("a", "b")
)
animate_union(x, y, export = "first")
```

```{r}
x
y
```

### Union

> All unique rows from `x` and `y`.

```{r union}
animate_union(x, y)
```

```{r}
dplyr::union(x, y)
```

```{r union-y-x}
animate_union(y, x)
dplyr::union(y, x)
```

### Union All

> All rows from `x` and `y`, keeping duplicates.

```{r union-all}
animate_union_all(x, y)
```

```{r}
dplyr::union_all(x, y)
```

### Intersection

> Common rows in both `x` and `y`, keeping just unique rows.

```{r intersect}
animate_intersect(x, y)
```

```{r}
dplyr::intersect(x, y)
```

### Set Difference

> All rows from `x` which are not also rows in `y`, keeping just unique rows.

```{r setdiff}
animate_setdiff(x, y)
```

```{r}
dplyr::setdiff(x, y)
```

```{r setdiff-y-x}
animate_setdiff(y, x)
dplyr::setdiff(y, x)
```

## Tidy Data and `gather()`, `spread()` functionality

[Tidy data](http://r4ds.had.co.nz/tidy-data.html#tidy-data-1) follows three rules:

1. Each variable has its own column.
2. Each observation has its own row.
3. Each value has its own cell.


Many of the tools in the [tidyverse](https://tidyverse.org) expect data to be formatted as a tidy dataset, and the [tidyr](https://tidyr.tidyverse.org) package provides functions to help you organize your data into tidy data.

```{r}
long <- dplyr::data_frame(
  year = c(2010, 2011, 2010, 2011, 2010, 2011),
  person = c("Alice", "Alice", "Bob", "Bob", "Charlie", "Charlie"),
  sales = c(105, 110, 100, 97, 90, 95)
)
wide <- dplyr::data_frame(
  year = 2010:2011,
  Alice = c(105, 110),
  Bob = c(100, 97),
  Charlie = c(90, 95)
)
```

### Gather

> Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. You use `gather()` when you notice that your column names are not names of variables, but values of a variable.

```{r gather}
set_font_size(4, 15)
set_anim_options(anim_options(cell_width = 2))
animate_gather(wide, key = "person", value = "sales", -year)
```

```{r}
tidyr::gather(wide, key = "person", value = "sales", -year)
```

### Spread

> Spread a key-value pair across multiple columns. Use it when a column contains observations from multiple variables.

```{r spread}
animate_spread(long, key = "person", value = "sales")
```

```{r}
tidyr::spread(long, key = "person", value = "sales")
```

## Learn More

### Relational Data

The [Relational Data](http://r4ds.had.co.nz/relation-data.html) chapter of the [R for Data Science](http://r4ds.had.co.nz/) book by Garrett Grolemund and Hadley Wickham is an excellent resource for learning more about relational data.

The [dplyr two-table verbs vignette][dplyr-two-table] and Jenny Bryan's [Cheatsheet for dplyr join functions](http://stat545.com/bit001_dplyr-cheatsheet.html) are also great resources.

### gganimate

The animations were made possible by the newly re-written [gganimate] package by [Thomas Lin Pedersen](https://github.com/thomasp85) (original by [Dave Robinson](https://github.com/dgrtwo)).
The [package readme][gganimate] provides an excellent (and quick) introduction to gganimate.