Tidy Animated Verbs

Garrick Aden-Buie – @grrrck – garrickadenbuie.com

Binder CC0

Mutating Joins

x
#> # A tibble: 3 x 2
#>      id x    
#>   <int> <chr>
#> 1     1 x1   
#> 2     2 x2   
#> 3     3 x3
y
#> # A tibble: 3 x 2
#>      id y    
#>   <int> <chr>
#> 1     1 y1   
#> 2     2 y2   
#> 3     4 y4
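
The printed tibbles above do not show how `x` and `y` are built. A minimal sketch of one construction that reproduces them follows; the exact calls are an assumption, not taken from the text. It also lists which `id` keys the tables share, since every join below depends on that.

```r
library(dplyr)

# Assumed construction; the source shows only the printed tibbles.
x <- tibble(id = 1:3, x = paste0("x", 1:3))
y <- tibble(id = c(1L, 2L, 4L), y = paste0("y", c(1, 2, 4)))

# Keys present in both tables, and keys unique to each one.
shared <- intersect(x$id, y$id)
only_in_x <- setdiff(x$id, y$id)
only_in_y <- setdiff(y$id, x$id)

shared
only_in_x
only_in_y
```

Keys 1 and 2 appear in both tables, 3 appears only in `x`, and 4 only in `y`.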

Inner Join

All rows from x where there are matching values in y, and all columns from x and y.

inner_join(x, y, by = "id")
#> # A tibble: 2 x 3
#>      id x     y    
#>   <int> <chr> <chr>
#> 1     1 x1    y1   
#> 2     2 x2    y2

Left Join

All rows from x, and all columns from x and y. Rows in x with no match in y will have NA values in the new columns.

left_join(x, y, by = "id")
#> # A tibble: 3 x 3
#>      id x     y    
#>   <int> <chr> <chr>
#> 1     1 x1    y1   
#> 2     2 x2    y2   
#> 3     3 x3    <NA>

Left Join (Extra Rows in y)

… If there are multiple matches between x and y, all combinations of the matches are returned.

y_extra # has multiple rows with the key from `x`
#> # A tibble: 4 x 2
#>      id y    
#>   <dbl> <chr>
#> 1     1 y1   
#> 2     2 y2   
#> 3     4 y4   
#> 4     2 y5
left_join(x, y_extra, by = "id")
#> # A tibble: 4 x 3
#>      id x     y    
#>   <dbl> <chr> <chr>
#> 1     1 x1    y1   
#> 2     2 x2    y2   
#> 3     2 x2    y5   
#> 4     3 x3    <NA>
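
One way `y_extra` could be built is sketched below; the construction is an assumption chosen to match the printed output, including the `<dbl>` type of `id`. Counting keys before joining shows where the extra row comes from.

```r
library(dplyr)

# Assumed construction of the example tables (not shown in the source).
x <- tibble(id = 1:3, x = paste0("x", 1:3))
y <- tibble(id = c(1L, 2L, 4L), y = paste0("y", c(1, 2, 4)))

# `id = 2` is a double, so binding it to the integer `id` column
# promotes the whole column to <dbl>.
y_extra <- bind_rows(y, tibble(id = 2, y = "y5"))

# Number of rows per key in y_extra; key 2 occurs twice.
key_counts <- count(y_extra, id)
key_counts

# The row of x with id 2 is repeated once per matching row of y_extra.
joined <- left_join(x, y_extra, by = "id")
nrow(joined)
```

Depending on the installed dplyr version, the join may emit a warning about multiple matches; the result is the same either way.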

Right Join

All rows from y, and all columns from x and y. Rows in y with no match in x will have NA values in the new columns.

right_join(x, y, by = "id")
#> # A tibble: 3 x 3
#>      id x     y    
#>   <int> <chr> <chr>
#> 1     1 x1    y1   
#> 2     2 x2    y2   
#> 3     4 <NA>  y4
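
A right join can also be read as a left join with the tables swapped. The sketch below checks this on the example data, assuming `x` and `y` are constructed as shown in the code; only the column order differs, so the columns are reordered before comparing.

```r
library(dplyr)

# Assumed construction of the example tables (not shown in the source).
x <- tibble(id = 1:3, x = paste0("x", 1:3))
y <- tibble(id = c(1L, 2L, 4L), y = paste0("y", c(1, 2, 4)))

right <- right_join(x, y, by = "id")

# Swap the tables, then put the columns back in x-first order.
swapped <- left_join(y, x, by = "id") %>%
  select(all_of(c("id", "x", "y")))

# Compare as plain data frames.
same <- isTRUE(all.equal(as.data.frame(right), as.data.frame(swapped)))
same
```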

Full Join

All rows and all columns from both x and y. Where a row has no match in the other table, the columns from that table are filled with NA.

full_join(x, y, by = "id")
#> # A tibble: 4 x 3
#>      id x     y    
#>   <int> <chr> <chr>
#> 1     1 x1    y1   
#> 2     2 x2    y2   
#> 3     3 x3    <NA> 
#> 4     4 <NA>  y4
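
The rows of a full join can be accounted for as matched rows plus the unmatched rows from each side. The following sketch verifies that on the example data, assuming `x` and `y` are constructed as shown; it uses `anti_join()`, described under Filtering Joins, to pick out the unmatched rows.

```r
library(dplyr)

# Assumed construction of the example tables (not shown in the source).
x <- tibble(id = 1:3, x = paste0("x", 1:3))
y <- tibble(id = c(1L, 2L, 4L), y = paste0("y", c(1, 2, 4)))

full <- full_join(x, y, by = "id")

matched <- inner_join(x, y, by = "id")  # keys in both tables
only_x  <- anti_join(x, y, by = "id")   # keys only in x
only_y  <- anti_join(y, x, by = "id")   # keys only in y

# The three pieces together account for every row of the full join.
nrow(full) == nrow(matched) + nrow(only_x) + nrow(only_y)
```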

Filtering Joins

Semi Join

All rows from x where there are matching values in y, keeping just columns from x.

semi_join(x, y, by = "id")
#> # A tibble: 2 x 2
#>      id x    
#>   <int> <chr>
#> 1     1 x1   
#> 2     2 x2
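
On this data a semi join keeps the same rows of `x` as an inner join, but the two differ when `y` has repeated keys: a semi join never duplicates rows of `x`. A small sketch, assuming the tables are built as shown and reusing a `y_extra` table with a second row for key 2:

```r
library(dplyr)

# Assumed construction of the example tables (not shown in the source).
x <- tibble(id = 1:3, x = paste0("x", 1:3))
y <- tibble(id = c(1L, 2L, 4L), y = paste0("y", c(1, 2, 4)))
y_extra <- bind_rows(y, tibble(id = 2, y = "y5"))

# Inner join: the row with id 2 appears once per match in y_extra.
inner_rows <- nrow(inner_join(x, y_extra, by = "id"))

# Semi join: each row of x appears at most once, and only x's columns remain.
semi <- semi_join(x, y_extra, by = "id")

inner_rows
nrow(semi)
names(semi)
```

Here the inner join returns 3 rows while the semi join returns 2.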

Anti Join

All rows from x where there are no matching values in y, keeping just columns from x.

anti_join(x, y, by = "id")
#> # A tibble: 1 x 2
#>      id x    
#>   <int> <chr>
#> 1     3 x3
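
For a single key column, the filtering joins behave like `filter()` with `%in%`. The sketch below, assuming `x` and `y` are built as shown, checks that equivalence for the anti join and confirms that the semi join and anti join together split `x` into two non-overlapping parts.

```r
library(dplyr)

# Assumed construction of the example tables (not shown in the source).
x <- tibble(id = 1:3, x = paste0("x", 1:3))
y <- tibble(id = c(1L, 2L, 4L), y = paste0("y", c(1, 2, 4)))

anti <- anti_join(x, y, by = "id")
semi <- semi_join(x, y, by = "id")

# Equivalent filter for a single key column.
y_ids <- y$id
via_filter <- filter(x, !id %in% y_ids)
same_as_filter <- isTRUE(all.equal(as.data.frame(anti), as.data.frame(via_filter)))

# Recombining the two filtering joins recovers x.
recombined <- bind_rows(semi, anti) %>% arrange(id)
covers_x <- isTRUE(all.equal(as.data.frame(recombined), as.data.frame(x)))

same_as_filter
covers_x
```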

Learn More

Relational Data

The Relational Data chapter of the R for Data Science book by Garrett Grolemund and Hadley Wickham is an excellent resource for learning more about relational data.

The dplyr two-table verbs vignette and Jenny Bryan’s Cheatsheet for dplyr join functions are also great resources.

gganimate

The animations were made possible by the newly re-written gganimate package by Thomas Lin Pedersen (original by Dave Robinson). The package readme provides an excellent (and quick) introduction to gganimate.