
Update readme

- cite rtweet!
- general formatting
- gathertweet timeline
master
Garrick Aden-Buie 7 years ago
Parent
Commit
b81867358d
2 changed files with 127 additions and 31 deletions
  1. README.Rmd (+26, -3)
  2. README.md (+101, -28)

README.Rmd (+26, -3)

  ```{r setup, include = FALSE}
  knitr::opts_chunk$set(
  collapse = TRUE,
+ cache = TRUE,
  comment = "",
  prompt = TRUE,
  fig.path = "man/figures/README-",
  ```


  [rtweet]: https://rtweet.info
+ [cron]: https://en.wikipedia.org/wiki/Cron


- # gathertweet
+ <h1 style="font-weight: normal;">gathe<strong>rtweet</strong></h1>


  The goal of gathertweet is to provide a simple command line utility that wraps key functions from [rtweet].


- The magic of **gathertweet** is that it grants you the power to **quickly set up twitter monitoring and tweet gathering** while saving you from the pain of **writing a bunch of boilerplate code to save new tweets without losing previously collected tweets, join multiple searches, update tweet stats, simplify stored tweets, and more**.
+ The magic of **gathertweet** is that it grants you the power to **quickly set up twitter monitoring and tweet gathering** while saving you from the pain of **writing a bunch of boilerplate code** to
+
+ - save new tweets without losing previously collected tweets,
+ - join multiple searches,
+ - update tweet stats,
+ - simplify stored tweets,
+ - schedule easily with [cron],
+ - and more...
+
+ gathe**rtweet** is a thin wrapper around [rtweet], the excellent R interface to Twitter written by [Mike Kearney](https://mikewk.com/).
+ If you use gathertweet, please ensure that you [cite rtweet directly](https://rtweet.info/authors.html).
+
+ ```{r}
+ citation("rtweet")
+ ```


  ## Installation


  ```


  which adds `gathertweet` to `/usr/local/bin` as a symlink (you can adjust where this link is created in `install_gathertweet()`).
- If you need admin rights to install, try `sudo Rscript -e "gathertweet::install_gathertweet()` from the command line.
+ If you need admin rights to install, try
+ ```
+ sudo Rscript -e "gathertweet::install_gathertweet()"
+ ```
+ from the command line.



  ## Example


+ ### Use gathertweet from the command line
+
  Create a directory to store tweets


  ```bash

README.md (+101, -28)



  <!-- README.md is generated from README.Rmd. Please edit that file -->


- # gathertweet
+ <h1 style="font-weight: normal;">
+
+ gathe<strong>rtweet</strong>
+
+ </h1>


  The goal of gathertweet is to provide a simple command line utility that
  wraps key functions from [rtweet](https://rtweet.info).


  The magic of **gathertweet** is that it grants you the power to
  **quickly set up twitter monitoring and tweet gathering** while saving
- you from the pain of **writing a bunch of boilerplate code to save new
- tweets without losing previously collected tweets, join multiple
- searches, update tweet stats, simplify stored tweets, and more**.
+ you from the pain of **writing a bunch of boilerplate code** to
+
+ - save new tweets without losing previously collected tweets,
+ - join multiple searches,
+ - update tweet stats,
+ - simplify stored tweets,
+ - schedule easily with [cron](https://en.wikipedia.org/wiki/Cron),
+ - and more…
+
+ gathe**rtweet** is a thin wrapper around [rtweet](https://rtweet.info),
+ the excellent R interface to Twitter written by [Mike
+ Kearney](https://mikewk.com/). If you use gathertweet, please ensure
+ that you [cite rtweet directly](https://rtweet.info/authors.html).
+
+ ``` r
+ > citation("rtweet")
+
+ To cite rtweet use:
+
+ Kearney, M. W. (2018). rtweet: Collecting Twitter Data. R
+ package version 0.6.7 Retrieved from
+ https://cran.r-project.org/package=rtweet
+
+ A BibTeX entry for LaTeX users is
+
+ @Manual{rtweet-package,
+ title = {rtweet: Collecting Twitter Data},
+ author = {Michael W. Kearney},
+ year = {2018},
+ note = {R package version 0.6.7},
+ url = {https://cran.r-project.org/package=rtweet},
+ }
+ ```


  ## Installation




  which adds `gathertweet` to `/usr/local/bin` as a symlink (you can
  adjust where this link is created in `install_gathertweet()`). If you
- need admin rights to install, try `sudo Rscript -e
- "gathertweet::install_gathertweet()` from the command line.
+ need admin rights to install, try
+
+ sudo Rscript -e "gathertweet::install_gathertweet()"
+
+ from the command line.


  ## Example


+ ### Use gathertweet from the command line
+
  Create a directory to store tweets


  ``` bash


  ``` bash
  > gathertweet search --n 100 --quiet "#rstats"
- [2019-01-29 21:54:37] [INFO] ---- gathertweet search start ----
- [2019-01-29 21:54:37] [INFO] Searching for "#rstats"
- [2019-01-29 21:54:37] [INFO] Gathered 100 tweets
- [2019-01-29 21:54:38] [INFO] Total of 100 tweets in tweets.rds
- [2019-01-29 21:54:38] [INFO] ---- gathertweet search complete ----
+ [2019-05-04 14:52:15] [INFO] ---- gathertweet search start ----
+ [2019-05-04 14:52:15] [INFO] Searching for "#rstats"
+ [2019-05-04 14:52:16] [INFO] Gathered 100 tweets
+ [2019-05-04 14:52:16] [INFO] Total of 100 tweets in tweets.rds
+ [2019-05-04 14:52:16] [INFO] ---- gathertweet search complete ----
  ```


  Get more tweets, automatically starting from end of the last search


  ``` bash
  > gathertweet search --n 100 --quiet "#rstats"
- [2019-01-29 21:55:39] [INFO] ---- gathertweet search start ----
- [2019-01-29 21:55:39] [INFO] Searching for "#rstats"
- [2019-01-29 21:55:39] [INFO] Tweets from 1090438050835038208
- [2019-01-29 21:55:39] [INFO] Gathered 1 tweets
- [2019-01-29 21:55:39] [INFO] Total of 100 tweets in tweets.rds
- [2019-01-29 21:55:39] [INFO] ---- gathertweet search complete ----
+ [2019-05-04 14:53:17] [INFO] ---- gathertweet search start ----
+ [2019-05-04 14:53:17] [INFO] Searching for "#rstats"
+ [2019-05-04 14:53:17] [INFO] Tweets from 1124748486971359232
+ [2019-05-04 14:53:17] [INFO] Gathered 1 tweets
+ [2019-05-04 14:53:17] [INFO] Total of 100 tweets in tweets.rds
+ [2019-05-04 14:53:17] [INFO] ---- gathertweet search complete ----
  ```


  Update the stored data about those \#rstats tweets


  ``` bash
  > gathertweet update
- [2019-01-29 21:55:40] [INFO] ---- gathertweet update start ----
- [2019-01-29 21:55:40] [INFO] Updating tweets in tweets.rds
- [2019-01-29 21:55:40] [INFO] Getting 100 tweets
- [2019-01-29 21:55:41] [INFO] ---- gathertweet update complete ----
+ [2019-05-04 14:53:18] [INFO] ---- gathertweet update start ----
+ [2019-05-04 14:53:18] [INFO] Updating tweets in tweets.rds
+ [2019-05-04 14:53:18] [INFO] Getting 100 tweets
+ [2019-05-04 14:53:19] [INFO] ---- gathertweet update complete ----
  ```


  ``` bash
  > ls -lh
  total 40K
- -rw-rw-r-- 1 garrick garrick 40K Jan 29 21:55 tweets.rds
+ -rw-rw-r-- 1 garrick garrick 39K May 4 14:53 tweets.rds
+ ```
+
+ Gather user timelines
+
+ ``` bash
+ > gathertweet timeline hadleywickham jennybryan dataandme
+ [2019-05-04 21:11:54] [INFO] ---- gathertweet timeline start ----
+ [2019-05-04 21:11:54] [INFO] Gathering tweets by hadleywickham, jennybryan, dataandme
+ [2019-05-04 21:12:23] [INFO] Gathered 7368 tweets from 3 users
+ [2019-05-04 21:12:23] [INFO] Total of 7368 tweets in tweets.rds
+ [2019-05-04 21:12:23] [INFO] ---- gathertweet timeline complete ----
+ ```
+
+ ### Schedule tweet gathering using cron
+
+ The primary use case of gathertweet is to make it easy to set up
+ [cron](https://en.wikipedia.org/wiki/Cron) to periodically gather
+ tweets. Here’s a simple example to download all tweets matching the
+ search term `rstats OR tidyverse` every night at midnight. The tweets
+ are stored, by default, in `tweets.rds` in `~/rstats-tweets`.
+
+ ``` bash
+ crontab -e
+
+ # m h dom mon dow command
+ 0 0 * * * (cd ~/rstats-tweets && ~/bin/gathertweet search --polite 'rstats OR tidyverse' >>gathertweet.log)
  ```


  ## Documentation
  Usage:
  gathertweet search [--file=<file>] [options] [--] <terms>...
+ gathertweet timeline [options] [--] <users>...
  gathertweet update [--file=<file> --token=<token> --backup --backup-dir=<dir> --polite --debug-args]
  gathertweet simplify [--file=<file> --output=<output> --debug-args --polite] [<fields>...]
  --backup Create a backup of existing tweet file before writing any new files
  --backup-dir <dir> Location for backups, use "" for current directory. [default: backups]
  --debug-args Print values of the arguments only
+ --and-simplify Create additional simplified tweet set with default values.
+ Run `gathertweet simplify` manually for more control.
- search:
+ search and timeline:
  -n, --n <n> Number of tweets to return [default: 18000]
- --type <type> Type of search results: "recent", "mixed", or "popular". [default: recent]
  --include_rts Logical indicating whether retweets should be included
- --geocode <geocode> Geographical limiter of the template "latitude,longitude,radius"
  --max_id <max_id> Return results with an ID less than (older than) or equal to max_id
+ search:
+ --type <type> Type of search results: "recent", "mixed", or "popular". [default: recent]
+ --geocode <geocode> Geographical limiter of the template "latitude,longitude,radius"
  --since_id <since_id> Return results with an ID greather than (newer than) or equal to since_id,
  automatically extracted from the existing tweets <file>, if it exists, and
- ignored when <max_id> is set. "none" for all available tweets. [default: last]
- --and-simplify Create additional simplified tweet set with default values.
- Run `gathertweet simplify` manually for more control.
+ ignored when <max_id> is set. Use "none" for all available tweets,
+ or "last" for the maximum seen status_id in existing tweets. [default: last]
+ timeline:
+ --home If included, returns home-timeline instead of user-timeline.
  simplify:
  --output <output> Output file, default is input file with `_simplified` appended to name.
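
The crontab entry added in the README's scheduling section can also be generated instead of typed by hand. The following is a minimal sketch and not part of this commit: `make_cron_line` is a hypothetical helper, and the midnight schedule, the `~/bin/gathertweet` path, the `--polite` flag, and the `gathertweet.log` file name are copied from the README example. It only prints the line; it does not install it or run gathertweet.

```shell
#!/bin/sh
# Hypothetical helper (not part of gathertweet): print a crontab entry that
# runs a polite gathertweet search every night at midnight.
#   $1: directory in which the tweets file is stored
#   $2: search terms, emitted as one single-quoted argument
# Assumption: the search terms contain no single quotes.
make_cron_line() {
  printf "0 0 * * * (cd %s && ~/bin/gathertweet search --polite '%s' >>gathertweet.log)\n" "$1" "$2"
}

# Reproduces the entry shown in the README example.
make_cron_line '~/rstats-tweets' 'rstats OR tidyverse'
```

The printed line can be pasted into the editor opened by `crontab -e`.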
