Update README with more information

master
Garrick Aden-Buie 7 years ago
Parent
commit
8143bd8056
3 files changed with 147 additions and 46 deletions
  1. .gitignore (+1, -0)
  2. README.Rmd (+49, -8)
  3. README.md (+97, -38)

.gitignore (+1, -0)

@@ -3,6 +3,7 @@
.Rhistory
.RData
.DS_Store
.tmp

# Directories that start with _
_*/

README.Rmd (+49, -8)

@@ -12,12 +12,18 @@ knitr::opts_chunk$set(
fig.path = "man/figures/README-",
out.width = "100%"
)
if (dir.exists(".tmp")) unlink(".tmp", recursive = TRUE)
dir.create(".tmp/rstats", recursive = TRUE, showWarnings = FALSE)
knitr::opts_knit$set(root.dir = ".tmp/rstats")
```

[rtweet]: https://rtweet.info

# gathertweet

The goal of gathertweet is to provide a simple command line utility that wraps key functions from [rtweet].

__gathertweet__ removes the boilerplate code required to run periodic Twitter searches and plays well with cron.
The magic of **gathertweet** is that it grants you the power to **quickly set up twitter monitoring and tweet gathering** while saving you from the pain of **writing a bunch of boilerplate code to save new tweets without losing previously collected tweets, join multiple searches, update tweet stats, simplify stored tweets, and more**.

## Installation

@@ -35,23 +41,58 @@ Once you've installed the package, you need to run
gathertweet::install_gathertweet()
```

which adds `gathertweet` to `/usr/local/bin` as a symlink (you can adjust were this link is created).
which adds `gathertweet` to `/usr/local/bin` as a symlink (you can adjust where this link is created in `install_gathertweet()`).
If you need admin rights to install, try `sudo Rscript -e "gathertweet::install_gathertweet()"` from the command line.

## Example

Create a directory to store tweets

```bash
# Get 100 #rstats tweets
gathertweet search --n 100 --quiet "#rtats"
mkdir rstats
cd rstats
```

```{r include=FALSE}
knitr::opts_knit$set(root.dir = ".tmp/rstats")
```

Get 100 #rstats tweets

# Get more tweets, automatically starting from end of the last search
```{bash}
gathertweet search --n 100 --quiet "#rstats"
```

# Update the stored data about those #rstats tweets
Get more tweets, automatically starting from end of the last search

```{r include=FALSE}
Sys.sleep(60)
```

```{bash}
gathertweet search --n 100 --quiet "#rstats"
```

Update the stored data about those #rstats tweets

```{bash}
gathertweet update
```

```{bash}
ls -lh
```

## Documentation

```{bash}
gathertweet --help
```{bash gathertweet_help, echo=FALSE, eval=FALSE}
../../inst/gathertweet.R --help
```

```bash
> gathertweet --help
```

```{bash gathertweet_help-out, echo=FALSE}
<<gathertweet_help>>
```

README.md (+97, -38)

@@ -4,10 +4,13 @@
# gathertweet

The goal of gathertweet is to provide a simple command line utility that
wraps key functions from \[rtweet\].
wraps key functions from [rtweet](https://rtweet.info).

**gathertweet** removes the boilerplate code required to run periodic
Twitter searches and plays well with cron.
The magic of **gathertweet** is that it grants you the power to
**quickly set up twitter monitoring and tweet gathering** while saving
you from the pain of **writing a bunch of boilerplate code to save new
tweets without losing previously collected tweets, join multiple
searches, update tweet stats, simplify stored tweets, and more**.

## Installation

@@ -26,51 +29,107 @@ gathertweet::install_gathertweet()
```

which adds `gathertweet` to `/usr/local/bin` as a symlink (you can
adjust were this link is created).
adjust where this link is created in `install_gathertweet()`). If you
need admin rights to install, try `sudo Rscript -e
"gathertweet::install_gathertweet()"` from the command line.

## Example

Create a directory to store tweets

``` bash
mkdir rstats
cd rstats
```

Get 100 \#rstats tweets

``` bash
# Get 100 #rstats tweets
gathertweet search --n 100 --quiet "#rtats"
> gathertweet search --n 100 --quiet "#rstats"
[2019-01-24 14:17:19] [INFO] ---- gathertweet search start ----
[2019-01-24 14:17:19] [INFO] Searching for "#rstats"
[2019-01-24 14:17:20] [INFO] Gathered 96 tweets
[2019-01-24 14:17:20] [INFO] Total of 96 tweets in tweets.rds
[2019-01-24 14:17:20] [INFO] ---- gathertweet search complete ----
```

# Get more tweets, automatically starting from end of the last search
gathertweet search --n 100 --quiet "#rstats"
Get more tweets, automatically starting from end of the last search

# Update the stored data about those #rstats tweets
gathertweet update
``` bash
> gathertweet search --n 100 --quiet "#rstats"
[2019-01-24 14:18:21] [INFO] ---- gathertweet search start ----
[2019-01-24 14:18:21] [INFO] Searching for "#rstats"
[2019-01-24 14:18:21] [INFO] Tweets from 1088513093242679296
[2019-01-24 14:18:21] [INFO] Gathered 1 tweets
[2019-01-24 14:18:21] [INFO] Total of 96 tweets in tweets.rds
[2019-01-24 14:18:21] [INFO] ---- gathertweet search complete ----
```

Update the stored data about those \#rstats tweets

``` bash
> gathertweet update
[2019-01-24 14:18:22] [INFO] ---- gathertweet update start ----
[2019-01-24 14:18:22] [INFO] Updating tweets in tweets.rds
[2019-01-24 14:18:22] [INFO] Getting 96 tweets
[2019-01-24 14:18:23] [INFO] ---- gathertweet update complete ----
```

``` bash
> ls -lh
total 40K
-rw-rw-r-- 1 garrick garrick 39K Jan 24 14:18 tweets.rds
```
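
Since each search picks up where the previous one ended, the commands above can be scheduled with cron. The following is a minimal sketch, not part of gathertweet itself: it only prints a crontab entry for review. The 15-minute interval and the `TWEET_DIR` variable are assumptions for illustration; the `/usr/local/bin` path is the default symlink location mentioned under Installation, and `--polite`, `--backup`, and `--quiet` are taken from the help text.

```shell
#!/bin/sh
# Assumption: tweets are stored in ~/rstats unless TWEET_DIR is set.
TWEET_DIR="${TWEET_DIR:-$HOME/rstats}"

# Full path, since cron often runs with a minimal PATH.
GATHERTWEET="/usr/local/bin/gathertweet"

# --polite allows only one search/update process at a time;
# --backup saves the existing tweet file before new files are written.
SEARCH_CMD="$GATHERTWEET search --polite --backup --quiet \"#rstats\""

# Assumed schedule: every 15 minutes.
CRON_LINE="*/15 * * * * cd \"$TWEET_DIR\" && $SEARCH_CMD"

# Print the entry so it can be reviewed and pasted into `crontab -e`.
printf '%s\n' "$CRON_LINE"
```

Printing the entry rather than installing it keeps the sketch side-effect free; adjust the interval to stay within the search rate limit.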

## Documentation

``` bash
> gathertweet --help
Gather tweets from the command line

Usage:
gathertweet search [--file=<file>] [options] [--] <terms>...
gathertweet update [--file=<file> --backup --polite --token --debug-args]

Arguments
<terms> Search terms. Individual search terms are queried separately,
but duplicated tweets are removed from the stored results.

Options:
-h --help Show this screen.
--file=<file> Name of RDS file where tweets are stored [default: tweets.rds]
-n, --n <n> Number of tweets to return [default: 18000]
--type <type> Type of search results: "recent", "mixed", or "popular". [default: recent]
--include_rts Logical indicating whether retweets should be included
--geocode <geocode> Geographical limiter of the template "latitude,longitude,radius"
--max_id <max_id> Return results with an ID less than (older than) or equal to max_id
--since_id <since_id> Return results with an ID greater than (newer than) or equal to since_id,
automatically extracted from the existing tweets <file>, if it exists, and
ignored when <max_id> is set. [default: last]
--no-parse Disable parsing of the results
--token <token> See {rtweet} for more information
--retryonratelimit Wait and retry when rate limited (only relevant when n exceeds 18000 tweets)
--quiet Disable printing of {rtweet} processing/retrieval messages
--polite Only allow one process (search|update) to run at a time
--backup Create a backup of existing tweet file before writing any new files
--debug-args Print values of the arguments only
```

Gather tweets from the command line
Usage:
gathertweet search [--file=<file>] [options] [--] <terms>...
gathertweet update [--file=<file> --token=<token> --backup --backup-dir=<dir> --polite --debug-args]
gathertweet simplify [--file=<file> --output=<output> --debug-args --polite <fields>...]
Arguments
<terms> Search terms. Individual search terms are queried separately,
but duplicated tweets are removed from the stored results.
Each search term counts against the 15 minute rate limit of 180
searches, which can be avoided by manually joining search terms
into a single query. WARNING: Wrap queries with spaces in
'single quotes': double quotes are allowed inside single quotes only.
<fields> Tweet fields that should be included. Default value will include
`status_id`, `created_at`, `user_id`, `screen_name`, `text`,
`favorite_count`, `retweet_count`, `is_quote`, `hashtags`,
`mentions_screen_name`, `profile_url`, `profile_image_url`,
`media_url`, `urls_url`, `urls_expanded_url`.
Options:
-h --help Show this screen.
--file=<file> Name of RDS file where tweets are stored [default: tweets.rds]
--no-parse Disable parsing of the results
--token <token> See {rtweet} for more information
--retryonratelimit Wait and retry when rate limited (only relevant when n exceeds 18000 tweets)
--quiet Disable printing of {rtweet} processing/retrieval messages
--polite Only allow one process (search|update) to run at a time
--backup Create a backup of existing tweet file before writing any new files
--backup-dir <dir> Location for backups, use "" for current directory. [default: backups]
--debug-args Print values of the arguments only
search:
-n, --n <n> Number of tweets to return [default: 18000]
--type <type> Type of search results: "recent", "mixed", or "popular". [default: recent]
--include_rts Logical indicating whether retweets should be included
--geocode <geocode> Geographical limiter of the template "latitude,longitude,radius"
--max_id <max_id> Return results with an ID less than (older than) or equal to max_id
--since_id <since_id> Return results with an ID greater than (newer than) or equal to since_id,
automatically extracted from the existing tweets <file>, if it exists, and
ignored when <max_id> is set. "none" for all available tweets. [default: last]
--and-simplify Create additional simplified tweet set with default values.
Run `gathertweet simplify` manually for more control.
simplify:
--output=<output> Output file, default is input file with `_simplified` appended to name.
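
The `<terms>` note above says that each term counts against the rate limit and that terms can be joined manually into a single query. A minimal sketch of that joining step follows; `join_terms` is a hypothetical helper, not part of gathertweet, and it assumes the `OR` operator of Twitter search. The script prints the resulting command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper: join all arguments into one query separated by " OR ".
join_terms() {
  joined=""
  for term in "$@"; do
    if [ -z "$joined" ]; then
      joined="$term"
    else
      joined="$joined OR $term"
    fi
  done
  printf '%s\n' "$joined"
}

# Example hashtags are assumptions chosen for illustration.
query=$(join_terms "#rstats" "#tidyverse")

# Dry run: show the command, wrapping the query in single quotes
# as the help text recommends for queries containing spaces.
printf "gathertweet search --n 100 --quiet '%s'\n" "$query"
```

Joining terms this way trades per-term bookkeeping for a single request per run, which is the approach the help text suggests for staying under the 180-searches-per-15-minutes limit.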
