Update readme

- cite rtweet!
- general formatting
- gathertweet timeline
master
Garrick Aden-Buie 7 years ago
Parent
Commit
b81867358d
2 changed files with 127 additions and 31 deletions
  1. README.Rmd (+26, -3)
  2. README.md (+101, -28)

README.Rmd (+26, -3)

@@ -7,6 +7,7 @@ output: github_document
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
cache = TRUE,
comment = "",
prompt = TRUE,
fig.path = "man/figures/README-",
@@ -18,12 +19,27 @@ knitr::opts_knit$set(root.dir = ".tmp/rstats")
```

[rtweet]: https://rtweet.info
[cron]: https://en.wikipedia.org/wiki/Cron

# gathertweet
<h1 style="font-weight: normal;">gathe<strong>rtweet</strong></h1>

The goal of gathertweet is to provide a simple command line utility that wraps key functions from [rtweet].

The magic of **gathertweet** is that it grants you the power to **quickly set up twitter monitoring and tweet gathering** while saving you from the pain of **writing a bunch of boilerplate code to save new tweets without losing previously collected tweets, join multiple searches, update tweet stats, simplify stored tweets, and more**.
The magic of **gathertweet** is that it grants you the power to **quickly set up twitter monitoring and tweet gathering** while saving you from the pain of **writing a bunch of boilerplate code** to

- save new tweets without losing previously collected tweets,
- join multiple searches,
- update tweet stats,
- simplify stored tweets,
- schedule easily with [cron],
- and more...

gathe**rtweet** is a thin wrapper around [rtweet], the excellent R interface to Twitter written by [Mike Kearney](https://mikewk.com/).
If you use gathertweet, please ensure that you [cite rtweet directly](https://rtweet.info/authors.html).

```{r}
citation("rtweet")
```

## Installation

@@ -42,10 +58,17 @@ gathertweet::install_gathertweet()
```

which adds `gathertweet` to `/usr/local/bin` as a symlink (you can adjust where this link is created in `install_gathertweet()`).
If you need admin rights to install, try `sudo Rscript -e "gathertweet::install_gathertweet()` from the command line.
If you need admin rights to install, try
```
sudo Rscript -e "gathertweet::install_gathertweet()"
```
from the command line.


## Example

### Use gathertweet from the command line

Create a directory to store tweets

```bash

README.md (+101, -28)

@@ -1,16 +1,50 @@

<!-- README.md is generated from README.Rmd. Please edit that file -->

# gathertweet
<h1 style="font-weight: normal;">

gathe<strong>rtweet</strong>

</h1>

The goal of gathertweet is to provide a simple command line utility that
wraps key functions from [rtweet](https://rtweet.info).

The magic of **gathertweet** is that it grants you the power to
**quickly set up twitter monitoring and tweet gathering** while saving
you from the pain of **writing a bunch of boilerplate code to save new
tweets without losing previously collected tweets, join multiple
searches, update tweet stats, simplify stored tweets, and more**.
you from the pain of **writing a bunch of boilerplate code** to

- save new tweets without losing previously collected tweets,
- join multiple searches,
- update tweet stats,
- simplify stored tweets,
- schedule easily with [cron](https://en.wikipedia.org/wiki/Cron),
- and more…

gathe**rtweet** is a thin wrapper around [rtweet](https://rtweet.info),
the excellent R interface to Twitter written by [Mike
Kearney](https://mikewk.com/). If you use gathertweet, please ensure
that you [cite rtweet directly](https://rtweet.info/authors.html).

``` r
> citation("rtweet")

To cite rtweet use:

Kearney, M. W. (2018). rtweet: Collecting Twitter Data. R
package version 0.6.7 Retrieved from
https://cran.r-project.org/package=rtweet

A BibTeX entry for LaTeX users is

@Manual{rtweet-package,
title = {rtweet: Collecting Twitter Data},
author = {Michael W. Kearney},
year = {2018},
note = {R package version 0.6.7},
url = {https://cran.r-project.org/package=rtweet},
}
```

## Installation

@@ -30,11 +64,16 @@ gathertweet::install_gathertweet()

which adds `gathertweet` to `/usr/local/bin` as a symlink (you can
adjust where this link is created in `install_gathertweet()`). If you
need admin rights to install, try `sudo Rscript -e
"gathertweet::install_gathertweet()` from the command line.
need admin rights to install, try

sudo Rscript -e "gathertweet::install_gathertweet()"

from the command line.

## Example

### Use gathertweet from the command line

Create a directory to store tweets

``` bash
@@ -46,39 +85,65 @@ Get 100 \#rstats tweets

``` bash
> gathertweet search --n 100 --quiet "#rstats"
[2019-01-29 21:54:37] [INFO] ---- gathertweet search start ----
[2019-01-29 21:54:37] [INFO] Searching for "#rstats"
[2019-01-29 21:54:37] [INFO] Gathered 100 tweets
[2019-01-29 21:54:38] [INFO] Total of 100 tweets in tweets.rds
[2019-01-29 21:54:38] [INFO] ---- gathertweet search complete ----
[2019-05-04 14:52:15] [INFO] ---- gathertweet search start ----
[2019-05-04 14:52:15] [INFO] Searching for "#rstats"
[2019-05-04 14:52:16] [INFO] Gathered 100 tweets
[2019-05-04 14:52:16] [INFO] Total of 100 tweets in tweets.rds
[2019-05-04 14:52:16] [INFO] ---- gathertweet search complete ----
```

Get more tweets, automatically starting from end of the last search

``` bash
> gathertweet search --n 100 --quiet "#rstats"
[2019-01-29 21:55:39] [INFO] ---- gathertweet search start ----
[2019-01-29 21:55:39] [INFO] Searching for "#rstats"
[2019-01-29 21:55:39] [INFO] Tweets from 1090438050835038208
[2019-01-29 21:55:39] [INFO] Gathered 1 tweets
[2019-01-29 21:55:39] [INFO] Total of 100 tweets in tweets.rds
[2019-01-29 21:55:39] [INFO] ---- gathertweet search complete ----
[2019-05-04 14:53:17] [INFO] ---- gathertweet search start ----
[2019-05-04 14:53:17] [INFO] Searching for "#rstats"
[2019-05-04 14:53:17] [INFO] Tweets from 1124748486971359232
[2019-05-04 14:53:17] [INFO] Gathered 1 tweets
[2019-05-04 14:53:17] [INFO] Total of 100 tweets in tweets.rds
[2019-05-04 14:53:17] [INFO] ---- gathertweet search complete ----
```
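
The incremental behaviour in the log above follows from the `--since_id` option, which the usage text in this README documents as defaulting to `last`. Below is a minimal sketch of the two modes. The `gt` helper, the `DRY_RUN` variable, and the two function names are hypothetical and not part of gathertweet; they exist only so the commands can be inspected without calling the Twitter API.

```bash
# Hypothetical helper (not part of gathertweet): print the command when
# DRY_RUN=1, otherwise run the real gathertweet executable.
gt() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "gathertweet $*"
  else
    gathertweet "$@"
  fi
}

# Incremental search: relies on the documented default (--since_id last),
# so only tweets newer than those already stored are requested.
incremental_search() {
  gt search --n 100 --quiet "$1"
}

# Full search: "--since_id none" asks for all available tweets,
# regardless of what is already stored.
full_search() {
  gt search --n 100 --quiet --since_id none "$1"
}
```

With `DRY_RUN=1`, calling `full_search "#rstats"` only prints the command line it would run, which is a cheap way to check the flags before spending API requests.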

Update the stored data about those \#rstats tweets

``` bash
> gathertweet update
[2019-01-29 21:55:40] [INFO] ---- gathertweet update start ----
[2019-01-29 21:55:40] [INFO] Updating tweets in tweets.rds
[2019-01-29 21:55:40] [INFO] Getting 100 tweets
[2019-01-29 21:55:41] [INFO] ---- gathertweet update complete ----
[2019-05-04 14:53:18] [INFO] ---- gathertweet update start ----
[2019-05-04 14:53:18] [INFO] Updating tweets in tweets.rds
[2019-05-04 14:53:18] [INFO] Getting 100 tweets
[2019-05-04 14:53:19] [INFO] ---- gathertweet update complete ----
```

``` bash
> ls -lh
total 40K
-rw-rw-r-- 1 garrick garrick 40K Jan 29 21:55 tweets.rds
-rw-rw-r-- 1 garrick garrick 39K May 4 14:53 tweets.rds
```

Gather user timelines

``` bash
> gathertweet timeline hadleywickham jennybryan dataandme
[2019-05-04 21:11:54] [INFO] ---- gathertweet timeline start ----
[2019-05-04 21:11:54] [INFO] Gathering tweets by hadleywickham, jennybryan, dataandme
[2019-05-04 21:12:23] [INFO] Gathered 7368 tweets from 3 users
[2019-05-04 21:12:23] [INFO] Total of 7368 tweets in tweets.rds
[2019-05-04 21:12:23] [INFO] ---- gathertweet timeline complete ----
```

### Schedule tweet gathering using cron

The primary use case of gathertweet is to make it easy to set up
[cron](https://en.wikipedia.org/wiki/Cron) to periodically gather
tweets. Here’s a simple example to download all tweets matching the
search term `rstats OR tidyverse` every night at midnight. The tweets
are stored, by default, in `tweets.rds` in `~/rstats-tweets`.

``` bash
crontab -e

# m h dom mon dow command
0 0 * * * (cd ~/rstats-tweets && ~/bin/gathertweet search --polite 'rstats OR tidyverse' >>gathertweet.log)
```
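
The crontab entry above redirects only standard output to the log, so error messages would go wherever cron sends stderr. One possible variation is a small wrapper script that cron calls instead. This is a sketch under stated assumptions: the `run_gather` function name and the `GATHERTWEET_BIN` variable are illustrative and not part of gathertweet.

```bash
#!/bin/sh
# Hypothetical cron wrapper; run_gather and GATHERTWEET_BIN are
# illustrative names, not something gathertweet provides.

# Path to the gathertweet executable; override it for custom installs.
GATHERTWEET_BIN="${GATHERTWEET_BIN:-gathertweet}"

# run_gather DIR TERMS...
# Runs one search from inside DIR and appends both stdout and stderr
# to DIR/gathertweet.log. The subshell keeps the caller's working
# directory unchanged.
run_gather() {
  dir="$1"
  shift
  (
    cd "$dir" || exit 1
    "$GATHERTWEET_BIN" search --polite "$@" >>gathertweet.log 2>&1
  )
}

# In the script that cron calls, finish with a line such as:
#   run_gather "$HOME/rstats-tweets" 'rstats OR tidyverse'
```

The crontab line then shrinks to the schedule plus the path to this script, and the search terms live in one place that can be edited without touching the crontab.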

## Documentation
@@ -91,6 +156,7 @@ total 40K
Usage:
gathertweet search [--file=<file>] [options] [--] <terms>...
gathertweet timeline [options] [--] <users>...
gathertweet update [--file=<file> --token=<token> --backup --backup-dir=<dir> --polite --debug-args]
gathertweet simplify [--file=<file> --output=<output> --debug-args --polite] [<fields>...]
@@ -119,17 +185,24 @@ total 40K
--backup Create a backup of existing tweet file before writing any new files
--backup-dir <dir> Location for backups, use "" for current directory. [default: backups]
--debug-args Print values of the arguments only
--and-simplify Create additional simplified tweet set with default values.
Run `gathertweet simplify` manually for more control.
search:
search and timeline:
-n, --n <n> Number of tweets to return [default: 18000]
--type <type> Type of search results: "recent", "mixed", or "popular". [default: recent]
--include_rts Logical indicating whether retweets should be included
--geocode <geocode> Geographical limiter of the template "latitude,longitude,radius"
--max_id <max_id> Return results with an ID less than (older than) or equal to max_id
search:
--type <type> Type of search results: "recent", "mixed", or "popular". [default: recent]
--geocode <geocode> Geographical limiter of the template "latitude,longitude,radius"
--since_id <since_id> Return results with an ID greater than (newer than) or equal to since_id,
automatically extracted from the existing tweets <file>, if it exists, and
ignored when <max_id> is set. "none" for all available tweets. [default: last]
--and-simplify Create additional simplified tweet set with default values.
Run `gathertweet simplify` manually for more control.
ignored when <max_id> is set. Use "none" for all available tweets,
or "last" for the maximum seen status_id in existing tweets. [default: last]
timeline:
--home If included, returns home-timeline instead of user-timeline.
simplify:
--output <output> Output file, default is input file with `_simplified` appended to name.
