---
title: PROBLEMS
author: Garrick Aden-Buie
format: pdf
execute:
  echo: true
---

## Setup

```{r}
library(tidyverse)
library(fs)
pkgload::load_all(here::here("process"))
```

```{r load-data}
cf <- cf_prep_db_create()
```

```{r load-data-report_list}
report_list <-
  cf$report_list |>
  collect() |>
  mutate(across(doc_name, as_report_factor))
```

### Problem scoping

To help determine the size of the problem, count the expenditures and receipts in each report and total their amounts. Reports with no rows in either table are kept with zeros.

```{r}
# Number and total amount of expenditures per report
expenditures_by_report <-
  cf$expenditures |>
  summarize(
    n_expenses = n(),
    total_expenses = sum(amount),
    .by = report_id
  ) |>
  collect() |>
  # Bring in reports that have no expenditures at all
  full_join(report_list["report_id"], by = "report_id") |>
  replace_na(list(n_expenses = 0, total_expenses = 0))

# Number and total amount of receipts per report
receipts_by_report <-
  cf$receipts |>
  summarize(
    n_receipts = n(),
    total_receipts = sum(amount),
    .by = report_id
  ) |>
  collect() |>
  # Bring in reports that have no receipts at all
  full_join(report_list["report_id"], by = "report_id") |>
  replace_na(list(n_receipts = 0, total_receipts = 0))
```

## Doc search problems

```{r}
# Compare the report type listed in the document search with the one on the cover
report_cover_report_type <-
  report_list |>
  mutate(report_type_listed = paste(year, doc_name)) |>
  select(report_id, sboe_id, report_type_listed) |>
  left_join(
    cf$cover |> select(sboe_id, report_id, report_type_cover = report_type) |> collect(),
    by = c("sboe_id", "report_id")
  )

# How often do the two sources agree?
report_cover_report_type |> count(report_type_listed == report_type_cover)

# Reports where they disagree
report_cover_report_type |> filter(report_type_listed != report_type_cover)
```

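One caveat with the comparison above: `filter()` keeps only rows where the condition is `TRUE`, so a report whose cover is missing (`report_type_cover` is `NA` after the left join) shows up under `NA` in the `count()` output but not in the mismatch list. The following is a minimal sketch on made-up data, not the real `report_list`; `toy_types`, `toy_mismatch` and `toy_missing` are hypothetical names used only for illustration.

```{r}
library(dplyr)

# Hypothetical toy data: one match, one mismatch, one missing cover
toy_types <- tibble::tibble(
  report_id          = 1:3,
  report_type_listed = c(
    "2017 Year End Semi-Annual",
    "2020 Mid Year Semi-Annual",
    "2020 Year End Semi-Annual"
  ),
  report_type_cover  = c(
    "2017 Year End Semi-Annual",
    "2020 Year End Semi-Annual",
    NA
  )
)

# `!=` evaluates to NA for the missing cover, so filter() drops that row
toy_mismatch <- toy_types |> filter(report_type_listed != report_type_cover)

# Missing covers have to be asked for explicitly
toy_missing <- toy_types |> filter(is.na(report_type_cover))

toy_mismatch
toy_missing
```

If missing covers should be treated as problems too, the two conditions can be combined in one `filter()` with `|`.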
```{r}
# Size of the mismatched reports, smallest total receipts first
report_cover_report_type |>
  filter(report_type_listed != report_type_cover) |>
  left_join(expenditures_by_report, by = "report_id") |>
  left_join(receipts_by_report, by = "report_id") |>
  arrange(total_receipts)
```

In some of these cases, the cover is probably wrong:

```{r}
report_cover_report_type |>
  filter(report_type_listed != report_type_cover) |>
  # Reporting period stated on the cover
  left_join(
    cf$cover |> select(report_id, date_from, date_to) |> collect(),
    by = "report_id"
  ) |>
  # Report type that the official schedule assigns to that period
  left_join(
    reporting_schedule() |>
      mutate(report_type_sched = paste(year, doc_name)) |>
      select(report_type_sched, sboe_start_date, sboe_end_date),
    by = c(date_from = "sboe_start_date", date_to = "sboe_end_date")
  )
```

## Dates

```{r}
# Namespace-qualified so the chunk does not depend on targets being attached
report_dates <- targets::tar_read(report_dates, store = here::here("process/_targets"))
```

```{r}
report_dates |> filter(sboe_start_date != cover_start_date) # 3,422
report_dates |> filter(sboe_end_date != cover_end_date)     # 590
report_dates |> filter(received_image < cover_start_date)   # 60
report_dates |> filter(received_image < cover_end_date)     # 222
report_dates |> filter(received_data < cover_start_date)    # 2
report_dates |> filter(received_data < cover_end_date)      # 45
report_dates |> filter(cover_date_filed < cover_end_date)   # 950
```

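The counts noted in the comments above have to be updated by hand whenever the data changes. As a hedged alternative, a single `summarize()` can tabulate several of the checks at once. The sketch below runs on made-up data that borrows the column names of `report_dates`; `toy_dates` and `toy_date_problems` are hypothetical names, and the sketch assumes the real table is an in-memory data frame.

```{r}
library(dplyr)

# Hypothetical toy data standing in for `report_dates` (values are made up)
toy_dates <- tibble::tibble(
  sboe_start_date  = as.Date(c("2020-01-01", "2020-01-01", "2020-07-01")),
  cover_start_date = as.Date(c("2020-01-01", "2020-01-15", NA)),
  sboe_end_date    = as.Date(c("2020-06-30", "2020-06-30", "2020-12-31")),
  cover_end_date   = as.Date(c("2020-06-30", "2020-06-30", "2020-12-30")),
  cover_date_filed = as.Date(c("2020-07-10", "2020-06-01", "2021-01-05"))
)

# One row with one count per check; na.rm = TRUE skips comparisons that
# involve a missing date, matching what filter() does with NA
toy_date_problems <-
  toy_dates |>
  summarize(
    n_reports        = n(),
    start_mismatch   = sum(sboe_start_date != cover_start_date, na.rm = TRUE),
    end_mismatch     = sum(sboe_end_date != cover_end_date, na.rm = TRUE),
    filed_before_end = sum(cover_date_filed < cover_end_date, na.rm = TRUE),
    start_missing    = sum(is.na(sboe_start_date) | is.na(cover_start_date))
  )

toy_date_problems
```

The `start_missing` column is an addition for illustration: it counts the rows that the `!=` comparison silently skips.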
## Picking amended

Picking the correct amended report is problematic because no date in the `report_list` can really be trusted.

### Interestingly problematic

```{r}
# STA-C3235N-C-001 2017 Year End Semi-Annual
# WAK-56BLZN-C-001 2020 Mid Year Semi-Annual CITIZENS FOR TOMMY MATTHEWS
# STA-Z6M8TR-C-001 2017 Year End Semi-Annual FIREFIGHTERS FOR RESPON
```