| @@ -17,7 +17,12 @@ knitr::opts_chunk$set( | |||
| [gganimate]: https://github.com/thomasp85/gganimate#README | |||
| [dplyr-two-table]: https://dplyr.tidyverse.org/articles/two-table.html | |||
| [r4ds-set-ops]: http://r4ds.had.co.nz/relation-data.html#set-operations | |||
| [r4ds]: http://r4ds.had.co.nz/ | |||
| [r4ds-relational]: http://r4ds.had.co.nz/relational-data.html | |||
| [r4ds-set-ops]: http://r4ds.had.co.nz/relational-data.html#set-operations | |||
| [r4ds-tidy-data]: http://r4ds.had.co.nz/tidy-data.html#tidy-data-1 | |||
| [tidyverse]: https://tidyverse.org | |||
| [tidyr]: https://tidyr.tidyverse.org | |||
| # Tidy Animated Verbs | |||
| @@ -53,8 +58,8 @@ Currently, the animations cover the [dplyr two-table verbs][dplyr-two-table] and | |||
| ### Relational Data | |||
| The [Relational Data](http://r4ds.had.co.nz/relation-data.html) chapter of the | |||
| [R for Data Science](http://r4ds.had.co.nz/) book by Garrett Grolemund and Hadley Wickham | |||
| The [Relational Data][r4ds-relational] chapter of the | |||
| [R for Data Science][r4ds] book by Garrett Grolemund and Hadley Wickham | |||
| is an excellent resource for learning more about relational data. | |||
| The [dplyr two-table verbs vignette][dplyr-two-table] | |||
| @@ -70,6 +75,9 @@ The [package readme][gganimate] provides an excellent (and quick) introduction t | |||
| ## Mutating Joins | |||
| > A mutating join allows you to combine variables from two tables. It first matches observations by their keys, then copies across variables from one table to the other. | |||
| > [R for Data Science: Mutating joins](http://r4ds.had.co.nz/relational-data.html#mutating-joins) | |||
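As a rough illustration of the definition quoted above, the sketch below joins two small tables on a shared key. The tables `x` and `y` and their values are made up for this example and are not necessarily the data used in the animations.

```r
library(dplyr)

# Hypothetical example tables sharing the key column `id`
x <- tibble(id = c(1L, 2L, 3L), x = c("x1", "x2", "x3"))
y <- tibble(id = c(1L, 2L, 4L), y = c("y1", "y2", "y4"))

# Keep only ids present in both tables
inner <- inner_join(x, y, by = "id")

# Keep every row of x; ids without a match in y get NA in column `y`
left <- left_join(x, y, by = "id")

# Keep every id found in either table
full <- full_join(x, y, by = "id")
```

In each case the key is matched first and the non-key columns are then copied across, which is why `left` carries an `NA` in `y` for the unmatched `id`.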
| ```{r intial-dfs} | |||
| source("R/00_base_join.R") | |||
| df_names <- data_frame( | |||
| @@ -165,6 +173,11 @@ full_join(x, y, by = "id") | |||
| ## Filtering Joins | |||
| > Filtering joins match observations in the same way as mutating joins, but affect the observations, not the variables. | |||
| > ... Semi-joins are useful for matching filtered summary tables back to the original rows. | |||
| > ... Anti-joins are useful for diagnosing join mismatches. | |||
| > [R for Data Science: Filtering Joins](http://r4ds.had.co.nz/relational-data.html#filtering-joins) | |||
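A minimal sketch of the two filtering joins, using made-up tables `x` and `y` (assumptions for this example only). Neither result gains columns from `y`; only the rows of `x` are filtered.

```r
library(dplyr)

# Hypothetical example tables sharing the key column `id`
x <- tibble(id = c(1L, 2L, 3L), x = c("x1", "x2", "x3"))
y <- tibble(id = c(1L, 2L, 4L), y = c("y1", "y2", "y4"))

# Rows of x that have a match in y
semi <- semi_join(x, y, by = "id")

# Rows of x that have no match in y
anti <- anti_join(x, y, by = "id")
```

Together, `semi` and `anti` partition the rows of `x`, which is what makes `anti_join()` handy for spotting keys that failed to match.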
| ### Semi Join | |||
| > All rows from `x` where there are matching values in `y`, keeping just columns from `x`. | |||
| @@ -195,6 +208,11 @@ anti_join(x, y, by = "id") | |||
| ## Set Operations | |||
| > Set operations are occasionally useful when you want to break a single complex filter into simpler pieces. | |||
| > All these operations work with a complete row, comparing the values of every variable. | |||
| > These expect the x and y inputs to have the same variables, and treat the observations like sets. | |||
| > [R for Data Science: Set operations](http://r4ds.had.co.nz/relational-data.html#set-operations) | |||
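The following sketch shows the set operations comparing complete rows. The tables are hypothetical and were chosen so that exactly one row appears in both.

```r
library(dplyr)

# Hypothetical tables with the same variables; only the row ("1", "a") is shared
x <- tibble(x = c("1", "1", "2"), y = c("a", "b", "a"))
y <- tibble(x = c("1", "2"), y = c("a", "b"))

all_rows <- union(x, y)      # unique rows found in x or y
shared   <- intersect(x, y)  # rows found in both x and y
only_x   <- setdiff(x, y)    # rows in x that are not in y
only_y   <- setdiff(y, x)    # rows in y that are not in x
```

Note that `setdiff()` is not symmetric, so `only_x` and `only_y` differ.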
| ```{r intial-dfs-so} | |||
| source("R/00_base_set.R") | |||
| df_names <- data_frame( | |||
| @@ -298,6 +316,14 @@ setdiff(y, x) | |||
| ## Tidy Data | |||
| [Tidy data][r4ds-tidy-data] follows three rules: | |||
| 1. Each variable has its own column. | |||
| 1. Each observation has its own row. | |||
| 1. Each value has its own cell. | |||
| Many of the tools in the [tidyverse] expect tidy data, and the [tidyr] package provides functions to help you reshape your data into that form. | |||
| ```{r tidyr-wide-long} | |||
| source("R/tidyr_spread_gather.R") | |||
| @@ -327,11 +353,13 @@ long | |||
| `spread(data, key, value)` | |||
| > Spread a key-value pair across multiple columns. | |||
| > Spread a key-value pair across multiple columns. | |||
| > Use it when a column contains observations from multiple variables. | |||
| `gather(data, key = "key", value = "value", ...)` | |||
| > Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. You use `gather()` when you notice that you have columns that are not variables. | |||
| > Gather takes multiple columns and collapses into key-value pairs, duplicating all other columns as needed. | |||
| > You use `gather()` when you notice that your column names are not names of variables, but *values* of a variable. | |||
|  | |||
| @@ -50,10 +50,10 @@ welcome\!](https://github.com/gadenbuie/tidy-animated-verbs/issues) | |||
| ### Relational Data | |||
| The [Relational Data](http://r4ds.had.co.nz/relation-data.html) chapter | |||
| of the [R for Data Science](http://r4ds.had.co.nz/) book by Garrett | |||
| Grolemund and Hadley Wickham is an excellent resource for learning more | |||
| about relational data. | |||
| The [Relational Data](http://r4ds.had.co.nz/relational-data.html) | |||
| chapter of the [R for Data Science](http://r4ds.had.co.nz/) book by | |||
| Garrett Grolemund and Hadley Wickham is an excellent resource for | |||
| learning more about relational data. | |||
| The [dplyr two-table verbs | |||
| vignette](https://dplyr.tidyverse.org/articles/two-table.html) and Jenny | |||
| @@ -72,6 +72,12 @@ excellent (and quick) introduction to gganimte. | |||
| ## Mutating Joins | |||
| > A mutating join allows you to combine variables from two tables. It | |||
| > first matches observations by their keys, then copies across variables | |||
| > from one table to the other. | |||
| > [R for Data Science: Mutating | |||
| > joins](http://r4ds.had.co.nz/relational-data.html#mutating-joins) | |||
| <img src="images/static/png/original-dfs.png" width="480px" /> | |||
| ``` r | |||
| @@ -187,6 +193,13 @@ full_join(x, y, by = "id") | |||
| ## Filtering Joins | |||
| > Filtering joins match observations in the same way as mutating joins, | |||
| > but affect the observations, not the variables. … Semi-joins are | |||
| > useful for matching filtered summary tables back to the original rows. | |||
| > … Anti-joins are useful for diagnosing join mismatches. | |||
| > [R for Data Science: Filtering | |||
| > Joins](http://r4ds.had.co.nz/relational-data.html#filtering-joins) | |||
| ### Semi Join | |||
| > All rows from `x` where there are matching values in `y`, keeping just | |||
| @@ -220,6 +233,14 @@ anti_join(x, y, by = "id") | |||
| ## Set Operations | |||
| > Set operations are occasionally useful when you want to break a single | |||
| > complex filter into simpler pieces. All these operations work with a | |||
| > complete row, comparing the values of every variable. These expect the | |||
| > x and y inputs to have the same variables, and treat the observations | |||
| > like sets. | |||
| > [R for Data Science: Set | |||
| > operations](http://r4ds.had.co.nz/relational-data.html#set-operations) | |||
| <img src="images/static/png/original-dfs-set-ops.png" width="480px" /> | |||
| ``` r | |||
| @@ -328,6 +349,18 @@ setdiff(y, x) | |||
| ## Tidy Data | |||
| [Tidy data](http://r4ds.had.co.nz/tidy-data.html#tidy-data-1) follows | |||
| three rules: | |||
| 1. Each variable has its own column. | |||
| 2. Each observation has its own row. | |||
| 3. Each value has its own cell. | |||
| Many of the tools in the [tidyverse](https://tidyverse.org) expect tidy | |||
| data, and the [tidyr](https://tidyr.tidyverse.org) package provides | |||
| functions to help you reshape your data into that form. | |||
|  | |||
| ``` r | |||
| @@ -353,13 +386,15 @@ long | |||
| `spread(data, key, value)` | |||
| > Spread a key-value pair across multiple columns. | |||
| > Spread a key-value pair across multiple columns. Use it when a | |||
| > column contains observations from multiple variables. | |||
| `gather(data, key = "key", value = "value", ...)` | |||
| > Gather takes multiple columns and collapses into key-value pairs, | |||
| > duplicating all other columns as needed. You use `gather()` when you | |||
| > notice that you have columns that are not variables. | |||
| > notice that your column names are not names of variables, but *values* | |||
| > of a variable. | |||
|  | |||