# Created by use_targets().
# Follow the comments below to fill in this target script.
# Then follow the manual to check and run the pipeline:
# https://books.ropensci.org/targets/walkthrough.html#inspect-the-pipeline

# Load packages required to define the pipeline:
library(targets)

# Set target options:
tar_option_set(
  packages = strsplit(desc::desc_get_field("Depends"), ", ")[[1]],
  # For distributed computing in tar_make(), supply a {crew} controller
  # as discussed at https://books.ropensci.org/targets/crew.html.
  controller = crew::crew_controller_local(workers = 24),
  # debug = "path_receipts_parquet_8d195f7e",
  # cue = tar_cue(mode = "never")
  error = "null"
)

# Run the R scripts in the R/ folder with your custom functions:
tar_source()

# Replace the target list below with your own:
list(
  tar_target(path_report_list_csv, "../data-raw/report_list.csv", format = "file"),
  tar_target(path_report_list_raw, prepare_report_list(path_report_list_csv)),
  tar_target(report_list_raw, arrow::read_parquet(path_report_list_raw)),
  tar_target(
    dirs_all_src,
    fs::dir_ls("../data-raw/reports", glob = "**/all", recurse = TRUE, type = "directory"),
    format = "file"
  ),
  # This comes from Will's answer in https://stackoverflow.com/a/70293576
  # We're basically tricking targets into letting us branch over a file target
  tar_target(dirs_all_names, dirs_all_src),
  tar_target(dirs_all, {dirs_all_src; dirs_all_names}, pattern = map(dirs_all_names), format = "file"),
  tar_target(
    dirs_receipts_src,
    fs::dir_ls("../data-raw/reports", glob = "**/receipts", recurse = TRUE, type = "directory"),
    format = "file"
  ),
  tar_target(dirs_receipts_names, dirs_receipts_src),
  tar_target(dirs_receipts, {dirs_receipts_src; dirs_receipts_names}, pattern = map(dirs_receipts_names), format = "file"),
  tar_target(
    dirs_expenditures_src,
    fs::dir_ls("../data-raw/reports", glob = "**/expenditures", recurse = TRUE, type = "directory"),
    format = "file"
  ),
  tar_target(dirs_expenditures_names, dirs_expenditures_src),
  tar_target(dirs_expenditures, {dirs_expenditures_src; dirs_expenditures_names}, pattern = map(dirs_expenditures_names), format = "file"),
  tar_target(
    paths_all_parquet,
    write_prepared_report_export(dirs_all, report_list_raw),
    pattern = map(dirs_all),
    format = "file"
  ),
  tar_target(
    path_receipts_parquet,
    write_prepared_receipts_parquet(dirs_receipts, report_list_raw),
    pattern = map(dirs_receipts),
    format = "file"
  ),
  tar_target(
    path_expenditures_parquet,
    write_prepared_expenditures_parquet(dirs_expenditures, report_list_raw),
    pattern = map(dirs_expenditures),
    format = "file"
  ),
  tar_target(path_data_prep_cover, { paths_all_parquet; "../data-prep/cover" }, format = "file"),
  tar_target(path_data_prep_officers, { paths_all_parquet; "../data-prep/officers" }, format = "file"),
  tar_target(path_data_prep_receipts, { paths_all_parquet; "../data-prep/receipts" }, format = "file"),
  tar_target(path_data_prep_expenditures, { paths_all_parquet; "../data-prep/expenditures" }, format = "file"),
  tar_target(
    cover_raw,
    arrow::open_dataset(path_data_prep_cover, partitioning = "sboe_id") |> dplyr::collect()
  ),
  tar_target(
    report_dates,
    process_report_dates(report_list_raw, cover_raw)
  ),
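  tar_target(
    path_report_dates, {
      out_path <- "../data-prep/report_dates/part-0.parquet"
      fs::dir_create(fs::path_dir(out_path))
      arrow::write_parquet(report_dates, out_path)
      # arrow::write_parquet() returns its input invisibly, so return the
      # path explicitly and track it as a file like the other path_* targets.
      out_path
    },
    format = "file"
  ),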
  tar_target(
    report_amended_score,
    calc_report_amended_score(report_dates)
  ),
  tar_target(
    addresses_raw,
    prep_collect_addresses_raw(
      path_officers = path_data_prep_officers,
      path_receipts = path_data_prep_receipts,
      path_expenditures = path_data_prep_expenditures,
      path_candidate_listing = path_candidate_listing,
      path_voters = NULL # path_voters_parquet
    ),
    format = "parquet"
  ),
  tar_target(
    path_addresses_db,
    prepare_addresses_lookup_db(addresses_raw$address)
  ),
  # This report list uses the latest amended report -----
  tar_target(
    report_list,
    process_report_list(report_list_raw, report_amended_score)
  ),
  tar_target(committees, prepare_committees(cover_raw, report_list)),
  tar_target(candidates, prepare_candidates(path_data_prep_officers, report_list)),
  # Outside data sources -----
  tar_target(candidate_listing, get_candidate_listing(2016:2023)),
  tar_target(path_candidate_listing, write_parquet(candidate_listing, "../data-prep/candidate_listing/part-0.parquet")),
  ## Voter registration records
  tar_target(path_voters_txt, voter_statewide_download(), cue = tar_cue("never")), # << invalidate to get latest
  tar_target(
    path_voters_parquet,
    voter_statewide_convert_parquet(path_voters_txt),
    cue = tar_cue("never")
  )
)