---
output: github_document
editor_options:
  chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  echo = TRUE,
  warning = FALSE,
  message = FALSE,
  fig.path = "man/figures/tidyexplain-",
  cache = TRUE
)
library(tidyexplain)
set_font_size(11, 26)
```
[gganimate]: https://github.com/thomasp85/gganimate#README
[dplyr-two-table]: https://dplyr.tidyverse.org/articles/two-table.html
[r4ds]: http://r4ds.had.co.nz/
[r4ds-relational]: http://r4ds.had.co.nz/relational-data.html
[r4ds-set-ops]: http://r4ds.had.co.nz/relational-data.html#set-operations
[r4ds-tidy-data]: http://r4ds.had.co.nz/tidy-data.html#tidy-data-1
[tidyverse]: https://tidyverse.org
[tidyr]: https://tidyr.tidyverse.org
# Tidy Animated Verbs
Garrick Aden-Buie -- [@grrrck](https://twitter.com/grrrck) -- [garrickadenbuie.com](https://www.garrickadenbuie.com).
David Zimmermann -- [@dav_zim](https://twitter.com/dav_zim) -- [datashenanigan.wordpress.com](https://datashenanigan.wordpress.com/)
Set operations contributed by [Tyler Grant Smith](https://github.com/TylerGrantSmith).
[Launch on Binder](https://mybinder.org/v2/gh/gadenbuie/tidy-animated-verbs/master?urlpath=rstudio)
[CC0](https://creativecommons.org/publicdomain/zero/1.0/)
[MIT](https://opensource.org/licenses/MIT)
- Mutating Joins: [`inner_join()`](#inner-join), [`left_join()`](#left-join),
[`right_join()`](#right-join), [`full_join()`](#full-join)
- Filtering Joins: [`semi_join()`](#semi-join), [`anti_join()`](#anti-join)
- Set Operations: [`union()`](#union), [`union_all()`](#union-all), [`intersect()`](#intersection), [`setdiff()`](#set-difference)
- Tidyr Operations: [`gather()`](#gather), [`spread()`](#spread)
- Learn more about
  - [Using the animations and images](#usage)
  - [Relational Data](#relational-data)
  - [gganimate](#gganimate)
Please feel free to use these images for teaching or learning about action verbs from the [tidyverse](https://tidyverse.org).
You can directly download the [original animations](images/) or static images in [svg](images/static/svg/) or [png](images/static/png/) formats, or you can use the [scripts](R/) to recreate the images locally.
Currently, the animations cover the [dplyr two-table verbs][dplyr-two-table] and the tidyr functions `gather()` and `spread()`, and I'd like to expand them to include more verbs from the tidyverse.
[Suggestions are welcome!](https://github.com/gadenbuie/tidy-animated-verbs/issues)
## Installing
The in-development version of `tidyexplain` can be installed with `devtools`:
```r
# install.packages("devtools")
devtools::install_github("gadenbuie/tidy-animated-verbs")
library(tidyexplain)
```
## Mutating Joins
> A mutating join allows you to combine variables from two tables. It first matches observations by their keys, then copies across variables from one table to the other.
> [R for Data Science: Mutating joins](http://r4ds.had.co.nz/relational-data.html#mutating-joins)
```{r intial-dfs}
x <- dplyr::data_frame(
  id = 1:3,
  x = paste0("x", 1:3)
)
y <- dplyr::data_frame(
  id = (1:4)[-3],
  y = paste0("y", (1:4)[-3])
)
animate_full_join(x, y, by = c("id"), export = "first")
```
```{r}
x
y
```
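
To make the quoted description concrete, here is a minimal base R sketch of the two steps of a mutating join: match observations by key, then copy variables across. It uses plain data frames with the same contents as `x` and `y` above. The names `x_df`, `y_df`, `idx`, and `manual_left` are illustrative only, and this is not how dplyr implements joins: `match()` finds only the first match per key, so the sketch does not cover duplicate keys.

```r
# Plain data frames with the same contents as `x` and `y` (illustrative names)
x_df <- data.frame(id = 1:3, x = paste0("x", 1:3), stringsAsFactors = FALSE)
y_df <- data.frame(id = c(1L, 2L, 4L), y = paste0("y", c(1, 2, 4)),
                   stringsAsFactors = FALSE)

# Step 1: match observations by key.
# For each id in x_df, find the row of y_df with the same id (NA if none).
idx <- match(x_df$id, y_df$id)

# Step 2: copy the variable across.
# Indexing with NA yields NA, which mirrors the behavior of a left join.
manual_left <- data.frame(x_df, y = y_df$y[idx], stringsAsFactors = FALSE)
manual_left
```

The row for `id = 3` has no partner in `y_df`, so its `y` value is `NA`; the row of `y_df` with `id = 4` is never looked up and is therefore dropped.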
### Inner Join
> All rows from `x` where there are matching values in `y`, and all columns from `x` and `y`.
```{r inner-join}
animate_inner_join(x, y, by = "id")
```
```{r}
dplyr::inner_join(x, y, by = "id")
```
### Left Join
> All rows from `x`, and all columns from `x` and `y`. Rows in `x` with no match in `y` will have `NA` values in the new columns.
```{r left-join}
animate_left_join(x, y, by = "id")
```
```{r}
dplyr::left_join(x, y, by = "id")
```
### Left Join (Extra Rows in y)
> ... If there are multiple matches between `x` and `y`, all combinations of the matches are returned.
```{r left-join-extra}
y_extra <- dplyr::bind_rows(y, dplyr::data_frame(id = 2, y = "y5"))
y_extra # has multiple rows with the key from `x`
animate_left_join(x, y_extra, by = "id",
                  anim_opts = anim_options(title_size = 22))
```
```{r}
dplyr::left_join(x, y_extra, by = "id")
```
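
Because duplicate keys in `y` add rows to the result, it can help to check for them before joining. The following is a small sketch of one way to do that with `dplyr::count()`; the tables are rebuilt here so the block is self-contained, and the names `x_tbl`, `y_tbl`, `dup_keys`, and `joined` are illustrative only.

```r
library(dplyr)

# Same contents as `x` and `y_extra` above (illustrative names)
x_tbl <- tibble(id = c(1, 2, 3), x = paste0("x", 1:3))
y_tbl <- tibble(id = c(1, 2, 4, 2), y = c("y1", "y2", "y4", "y5"))

# Keys that occur more than once in the right-hand table
dup_keys <- y_tbl %>%
  count(id) %>%
  filter(n > 1)
dup_keys

# Each duplicated key that matches a row of x_tbl adds one extra row
joined <- left_join(x_tbl, y_tbl, by = "id")
nrow(x_tbl)
nrow(joined)
```

Comparing `nrow()` before and after the join is a quick way to notice that a key you assumed to be unique is not.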
### Right Join
> All rows from `y`, and all columns from `x` and `y`. Rows in `y` with no match in `x` will have `NA` values in the new columns.
```{r right-join}
animate_right_join(x, y, by = "id")
```
```{r}
dplyr::right_join(x, y, by = "id")
```
### Full Join
> All rows and all columns from both `x` and `y`. Where there are not matching values, returns `NA` for the one missing.
```{r full-join}
animate_full_join(x, y, by = "id")
```
```{r}
dplyr::full_join(x, y, by = "id")
```
## Filtering Joins
> Filtering joins match observations in the same way as mutating joins, but affect the observations, not the variables.
> ... Semi-joins are useful for matching filtered summary tables back to the original rows.
> ... Anti-joins are useful for diagnosing join mismatches.
> [R for Data Science: Filtering Joins](http://r4ds.had.co.nz/relational-data.html#filtering-joins)
### Semi Join
> All rows from `x` where there are matching values in `y`, keeping just columns from `x`.
```{r semi-join}
animate_semi_join(x, y, by = "id")
```
```{r}
dplyr::semi_join(x, y, by = "id")
```
### Anti Join
> All rows from `x` where there are not matching values in `y`, keeping just columns from `x`.
```{r anti-join}
animate_anti_join(x, y, by = "id")
```
```{r}
dplyr::anti_join(x, y, by = "id")
```
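
For a single key column, filtering joins behave like a `filter()` on key membership. The sketch below shows that correspondence using tables with the same contents as `x` and `y`; the names `x_tbl`, `y_tbl`, `semi_manual`, and `anti_manual` are illustrative only, and the `%in%` shortcut is an assumption that holds for one key column, not a general replacement for multi-column joins.

```r
library(dplyr)

# Same contents as `x` and `y` in the join examples (illustrative names)
x_tbl <- tibble(id = 1:3, x = paste0("x", 1:3))
y_tbl <- tibble(id = c(1L, 2L, 4L), y = paste0("y", c(1, 2, 4)))

# Semi join: keep rows of x_tbl whose key appears in y_tbl
semi_manual <- filter(x_tbl, id %in% y_tbl$id)

# Anti join: keep rows of x_tbl whose key does not appear in y_tbl
anti_manual <- filter(x_tbl, !(id %in% y_tbl$id))

# Compare the kept keys with the dplyr verbs
identical(semi_manual$id, semi_join(x_tbl, y_tbl, by = "id")$id)
identical(anti_manual$id, anti_join(x_tbl, y_tbl, by = "id")$id)
```

Neither result gains a `y` column, which is the difference from the mutating joins above: only the rows of `x_tbl` are affected.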
## Set Operations
> Set operations are occasionally useful when you want to break a single complex filter into simpler pieces.
> All these operations work with a complete row, comparing the values of every variable.
> These expect the x and y inputs to have the same variables, and treat the observations like sets.
> [R for Data Science: Set operations](http://r4ds.had.co.nz/relational-data.html#set-operations)
```{r intial-dfs-so}
x <- dplyr::data_frame(
  x = c(1, 1, 2),
  y = c("a", "b", "a")
)
y <- dplyr::data_frame(
  x = c(1, 2),
  y = c("a", "b")
)
animate_union(x, y, export = "first")
```
```{r}
x
y
```
### Union
> All unique rows from `x` and `y`.
```{r union}
animate_union(x, y)
```
```{r}
dplyr::union(x, y)
```
```{r union-y-x}
animate_union(y, x)
dplyr::union(y, x)
```
### Union All
> All rows from `x` and `y`, keeping duplicates.
```{r union-all}
animate_union_all(x, y)
```
```{r}
dplyr::union_all(x, y)
```
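
The difference between `union()` and `union_all()` is only whether duplicate rows are kept. As a rough sketch of that relationship, removing duplicates from the `union_all()` result should give the same set of rows as `union()`. The tables are rebuilt here so the block is self-contained; `x_tbl`, `y_tbl`, `all_rows`, and `unique_rows` are illustrative names.

```r
# Same contents as `x` and `y` in the set operation examples (illustrative names)
x_tbl <- dplyr::tibble(x = c(1, 1, 2), y = c("a", "b", "a"))
y_tbl <- dplyr::tibble(x = c(1, 2), y = c("a", "b"))

# union_all() stacks the tables; the row (1, "a") occurs in both, so it appears twice
all_rows <- dplyr::union_all(x_tbl, y_tbl)
nrow(all_rows)

# Dropping duplicate rows leaves one copy of each complete row
unique_rows <- dplyr::distinct(all_rows)
nrow(unique_rows)

# Same set of rows as union(), regardless of row order
dplyr::setequal(unique_rows, dplyr::union(x_tbl, y_tbl))
```

`setequal()` is used for the comparison because set operations treat rows as a set, so row order is not part of the result's meaning.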
### Intersection
> Common rows in both `x` and `y`, keeping just unique rows.
```{r intersect}
animate_intersect(x, y)
```
```{r}
dplyr::intersect(x, y)
```
### Set Difference
> All rows from `x` which are not also rows in `y`, keeping just unique rows.
```{r setdiff}
animate_setdiff(x, y)
```
```{r}
dplyr::setdiff(x, y)
```
```{r setdiff-y-x}
animate_setdiff(y, x)
dplyr::setdiff(y, x)
```
## Tidy Data and `gather()`, `spread()` functionality
[Tidy data](http://r4ds.had.co.nz/tidy-data.html#tidy-data-1) follows
three rules:
1. Each variable has its own column.
2. Each observation has its own row.
3. Each value has its own cell.
Many of the tools in the [tidyverse](https://tidyverse.org) expect data
to be formatted as a tidy dataset and the
[tidyr](https://tidyr.tidyverse.org) package provides functions to help
you organize your data into tidy data.
```{r}
long <- dplyr::data_frame(
  year = c(2010, 2011, 2010, 2011, 2010, 2011),
  person = c("Alice", "Alice", "Bob", "Bob", "Charlie", "Charlie"),
  sales = c(105, 110, 100, 97, 90, 95)
)
wide <- dplyr::data_frame(
  year = 2010:2011,
  Alice = c(105, 110),
  Bob = c(100, 97),
  Charlie = c(90, 95)
)
```
### Gather
> Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. You use `gather()` when you notice that your column names are not names of variables, but *values* of a variable.
```{r gather}
set_font_size(4.5, 15)
animate_gather(wide, key = "person", value = "sales", -year)
```
```{r}
tidyr::gather(wide, key = "person", value = "sales", -year)
```
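
If you are using tidyr 1.0.0 or later (an assumption about your installed version), `pivot_longer()` is the newer function for the same reshaping. The sketch below compares the two on a table with the same contents as `wide`; `wide_df`, `long_gather`, and `long_pivot` are illustrative names. The two results contain the same rows but in a different order, so they are sorted before comparing.

```r
library(tidyr)

# Same contents as `wide` above (illustrative name)
wide_df <- data.frame(
  year = 2010:2011,
  Alice = c(105, 110),
  Bob = c(100, 97),
  Charlie = c(90, 95)
)

# gather(): every column except `year` becomes a person/sales pair
long_gather <- gather(wide_df, key = "person", value = "sales", -year)

# pivot_longer(): the same reshaping with differently named arguments
long_pivot <- pivot_longer(wide_df, cols = -year,
                           names_to = "person", values_to = "sales")

# Same columns and same values; only the row order differs
names(long_gather)
names(long_pivot)
identical(sort(long_gather$sales), sort(long_pivot$sales))
```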
### Spread
> Spread a key-value pair across multiple columns. Use it when a column contains observations from multiple variables.
```{r spread}
animate_spread(long, key = "person", value = "sales")
```
```{r}
tidyr::spread(long, key = "person", value = "sales")
```
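
`gather()` and `spread()` undo each other, which the following round trip illustrates. It is a small sketch on a table with the same contents as `wide`; `wide_df`, `long_df`, and `back` are illustrative names.

```r
library(tidyr)

# Same contents as `wide` above (illustrative name)
wide_df <- data.frame(
  year = 2010:2011,
  Alice = c(105, 110),
  Bob = c(100, 97),
  Charlie = c(90, 95)
)

# Wide to long: one row per year/person combination
long_df <- gather(wide_df, key = "person", value = "sales", -year)

# Long to wide: one column per person again
back <- spread(long_df, key = "person", value = "sales")

# The column names and the values match the starting table
identical(names(wide_df), names(back))
back
```

The round trip works here because every year/person combination occurs exactly once in the long table; `spread()` needs the key and the remaining columns to identify each row uniquely.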
## Learn More
[Tidy data][r4ds-tidy-data] follows three rules:
1. Each variable has its own column.
1. Each observation has its own row.
1. Each value has its own cell.
Many of the tools in the [tidyverse] expect data to be formatted as a tidy dataset and the [tidyr] package provides functions to help you organize your data into tidy data.
```{r tidyr-wide-long}
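# Attach the packages that provide bind_rows()/mutate(), map()/imap(), and ylim()
library(dplyr)
library(purrr)
library(ggplot2)
# The sg_* tables and the plotting helpers are assumed to come from this script
source("R/tidyr_spread_gather.R")
# Combine each example table with its label rows
tidy_plots <- list()
tidy_plots$wide <- bind_rows(sg_wide, sg_wide_labels)
tidy_plots$long <- bind_rows(sg_long, sg_long_labels)
# Cells whose value matches "id", "key", or "val" get small black text;
# all other cells get larger white text. Then plot each table.
tidy_plots <- map(tidy_plots, ~ mutate(.,
  .text_color = ifelse(grepl("id|key|val", value), "black", "white"),
  .text_size = ifelse(grepl("id|key|val", value), 6, 10)
)) %>%
  imap(~ plot_data(.x, .y))
tidy_plots$wide <- tidy_plots$wide + ylim(-6.5, 0.5)
save_static_plot(cowplot::plot_grid(plotlist = tidy_plots, axis = "t"), "original-dfs-tidy")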
```

```{r echo=TRUE}
wide
long
```
### Spread and Gather
`spread(data, key, value)`
> Spread a key-value pair across multiple columns.
> Use it when a column contains observations from multiple variables.
`gather(data, key = "key", value = "value", ...)`
> Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed.
> You use `gather()` when you notice that your column names are not names of variables, but *values* of a variable.

```{r echo=TRUE}
# Assumes `wide` has columns `x` through `z` and `long` has `key` and `val` columns
tidyr::gather(wide, key, val, x:z)
tidyr::spread(long, key, val)
```