Candidate Listing Resolution

Author

Garrick Aden-Buie

Published

January 5, 2024

Overview

The complete dataset involves a number of tables from four data sources:

  1. The Campaign Finance Reports from the NC State Board of Elections (SBOE);
  2. The Candidate Listing from the NC SBOE;
  3. The Voter Registration from the NC SBOE; and
  4. Resolved addresses from the U.S. Census Bureau.

The diagram below outlines the tables in the uploaded dataset and their general relationship to each other.

flowchart LR
  subgraph "Campaign Finance reports"
      reports
      cover
      officers
      receipts
      expenses
      committees
      receipts_payer
      expenses_payee
  end

  subgraph "Candidate Listing"
    cl_candidates
    cl_elections
    cl_name_on_ballot
    cl_party
    cl_contact
  end

  committees --> reports

  reports --> cover
  reports --> officers
  reports --> receipts
  reports --> expenses

  receipts --> receipts_payer
  expenses --> expenses_payee

  cl_candidates --> cl_elections
  cl_elections --> cl_name_on_ballot
  cl_elections --> cl_party
  cl_elections --> cl_contact

  committee_candidate <--> committees
  committee_candidate <--> cl_candidates

Campaign finance reports

The primary goal of this project was to collect and organize the campaign finance reports. The core tables of this portion of the dataset are:

  1. reports: This table provides a master list of reports filed with the SBOE.
  2. committees: This table extracts the most recent committee information from the filed reports. If you’re interested in a particular committee, this is likely the place you’ll want to start.
  3. receipts, expenses: These table provides a list of received contributions and expenses paid by the committee.
  4. receipts_payer, expenses_payee: These tables provide the payer/payee information for the receipts and expenses, extracted from the receipts and expenses tables. I haven’t de-duplicated the records in this table (yet).
  5. cover: Each report has a cover “page” where key information about the committee or the period being reported is provided.
  6. officers: Each committee has a list of officers. This table provides a master list of officers for all committees.

All of the above tables have both sboe_id and report_id columns. sboe_id uniquely identifies a committee by its SBOE-assigned ID, and report_id uniquely identifies an individual campaign finance report. In the committees table, report_id refers to the latest report from which the committee contact information was extracted.

The sboe_id is always the same for a given committee and is the best way to identify a committee. The report_id refers to a specific filing of a report, and I’ve included only the most recently filed reports in this database. Note that amended filings receive a new report ID, the report_id may change in the future if or when the committee files an amendment.

Candidate listing

To augment the campaign finance data, I’ve included several tables extracted from each year’s candidate listing. The primary table of interest is cl_candidates. It contains individual candidates from the candidate listing and their contact information and party affiliation. If you’re looking for a specific candidate, but don’t know their SBOE ID or their committee’s name, this is the place to start.

Candidates from the candidate listing are linked to a specific SBOE committee via the committee_candidate table, which matches a candidate_id with an sboe_id. Note that the candidate_id is an ID I’ve created to help organize the candidate listing – although I’m sure the SBOE has its own internal ID for each candidate, they don’t include them in the data they publish. This means that the candidate_id may change when the candidate listing is updated.

Additional data sources

Many tables have an address_lookup column. When this column is found in a table, it serves as the key for matching the address of that row with the resolved addresses in the addresses table. These addresses have been passed through the geocoding services provided by the U.S. Census Bureau, so the addresses is also a useful way to get the latitude and longitude on an address for mapping purposes.

Additionally, I’ve included the voter registration data. It’s not currently linked to any other tables, but it could be useful in observing trends in voter registration across counties, or for exploring demographic trends in the voting population.

Finding a candidate

You can often start by searching for a candidate in the cl_candidates table. Here’s an example looking for John Bell.

SELECT *
FROM cl_candidates
WHERE (
  first_name = 'JOHN' AND
  middle_name = 'RICHARD' AND
  last_name = 'BELL'
)
1 records
candidate_id name_on_ballot first_name middle_name last_name name_suffix_lbl party_last party_most contest_n contest_first contest_latest street city state zip_code phone email address_lookup
924 John Bell JOHN RICHARD BELL IV REP REP 4 2016-11-08 2022-11-08 501 HOLLAND HILL DR GOLDSBORO NC 27530 (919) 344-6324 NA 501 HOLLAND HILL DR, GOLDSBORO, NC, 27530

Then, join the result with the committee_candidate to find the candidate’s sboe_id, if a match has been identified. The candidate-committee linking uses probabilistic matching. which allows for some flexibility in the matching process. Note that not every candidate is linked to a committee and there are many more committees than candidates.

WITH john_bell AS (
  SELECT *
  FROM cl_candidates
  WHERE (
    first_name = 'JOHN' AND
    middle_name = 'RICHARD' AND
    last_name = 'BELL'
  )
)

SELECT *
FROM committee_candidate
WHERE candidate_id IN (
  SELECT candidate_id
  FROM john_bell
)
1 records
sboe_id candidate_id
STA-8S285O-C-001 924

The candidate listing also gives you a complete history of the candidate’s election history. You might find some historically interesting information by joining cl_candidates with

  • cl_elections for specific election contests,
  • cl_name_on_ballot for the candidate’s name on the ballot,
  • cl_party for the party affiliation of a candidate in an election, and
  • cl_contact for contact information for a candidate.

Here’s an example combining the above tables to show the last three elections for John Bell.

WITH john_bell AS (
  SELECT *
  FROM cl_candidates
  WHERE (
    first_name = 'JOHN' AND
    middle_name = 'RICHARD' AND
    last_name = 'BELL'
  )
)

SELECT *
FROM cl_elections
LEFT JOIN cl_name_on_ballot USING (candidate_id, election_dt)
LEFT JOIN cl_party USING (candidate_id, election_dt)
LEFT JOIN cl_contact USING (candidate_id, election_dt)
WHERE candidate_id IN (
  SELECT candidate_id
  FROM john_bell
)
ORDER BY election_dt DESC
LIMIT 3
3 records
election_dt county_name contest_name candidate_id first_name middle_name last_name name_suffix_lbl name_on_ballot first_name middle_name last_name name_suffix_lbl party_candidate street city state zip_code phone email address_lookup
2022-11-08 WAYNE NC HOUSE OF REPRESENTATIVES DISTRICT 010 924 JOHN RICHARD BELL IV John Bell JOHN RICHARD BELL IV REP 501 HOLLAND HILL DR GOLDSBORO NC 27530 NA NA 501 HOLLAND HILL DR, GOLDSBORO, NC, 27530
2020-11-03 GREENE NC HOUSE OF REPRESENTATIVES DISTRICT 010 924 JOHN RICHARD BELL IV John Bell JOHN RICHARD BELL IV REP 501 HOLLAND HILL DR GOLDSBORO NC 27530 (919) 344-6324 NA 501 HOLLAND HILL DR, GOLDSBORO, NC, 27530
2020-11-03 JOHNSTON NC HOUSE OF REPRESENTATIVES DISTRICT 010 924 JOHN RICHARD BELL IV John Bell JOHN RICHARD BELL IV REP 501 HOLLAND HILL DR GOLDSBORO NC 27530 (919) 344-6324 NA 501 HOLLAND HILL DR, GOLDSBORO, NC, 27530

Campaign Finance Reports

Within the campaign finance report data, the best place to get start is with the reports or committees table.

The reports table provides a master list of reports filed with the SBOE. These include only the most up-to-date reports, taking into account amended filings, so you do not need to worry about filtering out outdated reports.

Here are the first 5 reports filed by John Bell’s committee (sboe_id='STA-8S285O-C-001' taken from the previous query).

SELECT *
FROM reports
WHERE sboe_id='STA-8S285O-C-001'
LIMIT 5
5 records
year doc_name doc_order sboe_id report_id amended image_id received_image received_data start_date end_date sboe_start_date sboe_end_date cover_start_date cover_end_date cover_date_filed
2016 First Quarter 1 STA-8S285O-C-001 164748 TRUE 231339 2019-03-01 2019-03-01 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2019-02-23
2016 Second Quarter 2 STA-8S285O-C-001 164749 TRUE 231340 2019-03-01 2019-03-01 2016-03-01 2016-06-30 2016-03-01 2016-06-30 2016-03-01 2016-06-30 2019-02-23
2016 Third Quarter 3 STA-8S285O-C-001 164751 TRUE 231341 2019-03-01 2019-03-01 2016-07-01 2016-10-22 2016-07-01 2016-10-22 2016-07-01 2016-10-22 2019-02-25
2016 Fourth Quarter 4 STA-8S285O-C-001 164753 TRUE 231342 2019-03-01 2019-03-01 2016-10-23 2016-12-31 2016-10-23 2016-12-31 2016-10-23 2016-12-31 2019-02-25
2017 Mid Year Semi-Annual 5 STA-8S285O-C-001 164754 TRUE 231343 2019-03-01 2019-03-01 2017-01-01 2017-06-30 2017-01-01 2017-06-30 2017-01-01 2017-06-30 2019-02-25

Each report has a cover page, which is included in the cover table, but I’ve extracted the most recent name and contact information for the committee into the committees table. Here, you’re guaranteed to get a single row per committee, for example with John Bell’s committee:

SELECT *
FROM committees
WHERE sboe_id='STA-8S285O-C-001'
1 records
sboe_id committee_name report_id street_1 street_2 city state zip_code address_lookup committee_type fund_type fund_name
STA-8S285O-C-001 JOHN BELL COMMITTEE 209592 501 HOLLAND HILL DRIVE NA GOLDSBORO NC 27530 501 HOLLAND HILL DRIVE, GOLDSBORO, NC, 27530 Candidate Committee NA NA

Next, we have the receipts and expenses tables. Both are similarly structured, so I’ll just demonstrate how to use receipts. First, we’ll filter the reports table to get the 2016 Q1 report record for John Bell’s committee. Then, we can INNER JOIN this table with the receipts table, which returns all of the receipts for the reports in the filtered_reports common table expression (CTE, i.e. the temporary tables created using WITH ___ AS (__query__)).

WITH filtered_reports AS (
  SELECT *
  FROM reports
  WHERE sboe_id='STA-8S285O-C-001'
    AND year=2016
    AND doc_name='First Quarter'
)

SELECT *
FROM receipts
INNER JOIN filtered_reports USING (report_id)
LIMIT 5
5 records
sboe_id report_id payer_id group_id occur_date amount sum_to_date is_aggregated receipt_type_desc receipt_type_code is_donation account_abbr form_of_payment_desc is_prior year doc_name doc_order sboe_id amended image_id received_image received_data start_date end_date sboe_start_date sboe_end_date cover_start_date cover_end_date cover_date_filed
STA-8S285O-C-001 164748 542341 12161117 2016-02-22 500 500 FALSE Other Political Committee Contribution CPCM TRUE 01 Check NA 2016 First Quarter 1 STA-8S285O-C-001 TRUE 231339 2019-03-01 2019-03-01 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2019-02-23
STA-8S285O-C-001 164748 431328 12161098 2016-02-22 1000 1000 FALSE Other Political Committee Contribution CPCM TRUE 01 Check NA 2016 First Quarter 1 STA-8S285O-C-001 TRUE 231339 2019-03-01 2019-03-01 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2019-02-23
STA-8S285O-C-001 164748 253762 12161106 2016-02-12 500 500 FALSE Other Political Committee Contribution CPCM TRUE 01 Check NA 2016 First Quarter 1 STA-8S285O-C-001 TRUE 231339 2019-03-01 2019-03-01 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2019-02-23
STA-8S285O-C-001 164748 76519 12161112 2016-02-12 500 500 FALSE Other Political Committee Contribution CPCM TRUE 01 Check NA 2016 First Quarter 1 STA-8S285O-C-001 TRUE 231339 2019-03-01 2019-03-01 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2019-02-23
STA-8S285O-C-001 164748 183769 12161104 2016-01-15 250 250 FALSE Other Political Committee Contribution CPCM TRUE 01 Electronic Funds Transfer NA 2016 First Quarter 1 STA-8S285O-C-001 TRUE 231339 2019-03-01 2019-03-01 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2016-01-01 2016-02-29 2019-02-23

Once you have a list of receipts, you can start to do some analysis on these, like finding the total money received by contribution type. This time I’m filtering to John Bell’s 2020 Q3 report.

WITH filtered_reports AS (
  SELECT *
  FROM reports
  WHERE sboe_id='STA-8S285O-C-001'
    AND year=2020
    AND doc_name='Third Quarter'
)

SELECT
  receipt_type_desc,
  receipt_type_code,
  is_donation,
  SUM(amount) as total
FROM receipts
INNER JOIN filtered_reports USING (report_id)
GROUP BY
  receipt_type_desc,
  receipt_type_code,
  is_donation
ORDER BY total DESC
4 records
receipt_type_desc receipt_type_code is_donation total
Other Political Committee Contribution CPCM TRUE 178773.92
Individual Contribution IND TRUE 100977.00
Party Contribution PPTY TRUE 4558.35
Outside Source OUTS TRUE 660.00

Receipts and Payer Information

As you may have noticed above, each receipt is linked to a payer in the receipts_payer table via the payer_id column.

At this time, I have not de-duplicated the receipts_payer table, so there may be multiple records for a single payer. Separating the receipts and receipts_payer tables will allow us to de-duplicate payer information in the future.

Compared with the committee and candidate information, the payer records are much noisier and don’t have any highly reliable fields that we can use to de-duplicate the records. The probabilistic matching I used for linking candidates and committees will work but it’s a relatively large engineering lift. The challenge is that there are about 750,000 unique payer records, which certainly isn’t big data, but deduplicating the records requires more than 2.8 Billion comparisons (to compare every record with every other record).

For now, I’ll show you how to use the receipts_payer table to look for specific donors. I recommend starting with receipts_payer, finding all payer_id values that match your person(s) of interest, and then working from those records back to the receipts table.

As an example, let’s find the top donors who have contributed the most to NC candidates since 2016. The first CTE (donations_total) uses receipts to count the total amount donated by each payer_id. The second CTE (top_donors) filters the donors down to the top 20 donors. Finally, we join top_donors with receipts_payer to get the payer information for each donor.

This gives us the “top 20” donors, but it’s clear that some of them are the same person with slightly different payer information.

WITH
  donations_total AS (
    SELECT payer_id, SUM(amount) AS total
    FROM receipts
    WHERE (NOT((payer_id IS NULL)))
    GROUP BY payer_id
  ),
  top_donors AS (
    SELECT payer_id, total
    FROM (
      SELECT *, RANK() OVER (ORDER BY total DESC) AS donor_rank
      FROM donations_total
    )
    WHERE (donor_rank <= 20)
  )

SELECT
  top_donors.*,
  org_name,
  profession,
  employers_name,
  address_lookup
FROM top_donors
LEFT JOIN receipts_payer
  ON (top_donors.payer_id = receipts_payer.payer_id)
ORDER BY UPPER(org_name)
Displaying records 1 - 10
payer_id total org_name profession employers_name address_lookup
623503 22877799 Aggregated Individual Contribution , , ,
8616 18832500 CITIZENS FOR A BETTER NORTH CAROLINA PO BOX 97875, RALEIGH, NC, 27624
83606 4570881 Cooper for North Carolina PO BOX 10587, RALEIGH, NC, 27605
266214 4050500 DAGA PEOPLE’S LAWYER PROJECT 1350 I ST NW, WASHINGTON, DC, 20005
637770 9920000 Democratic Action 1225 I ST NW, WASHINGTON, DC, 20005
64001 3008075 Friends of Tim Moore 305 EAST KING STREET, KINGS MOUNTAIN, NC, 28086
635144 21875000 GOOD GOVERNMENT COALITION 6412 BRANDON AVE, SPRINGFIELD, VA, 22150
593023 2970000 LEAGUE OF CONSERVATION VOTERS 740 15TH ST NW, WASHINGTON, DC, 20005
456039 10222018 MISTY SMITHEY REGISTER OF DEEDS WILKES CO 403 E. MAIN ST, WILKESBORO, NC, 28697
288844 16925066 NC DEMOCRATIC LEADERSHIP COMMITTEE 434 FAYETTEVILLE STREET - STE 2020, RALEIGH, NC, 27601

Let’s pick out Greg Lindberg from the list above. (I’m sure you’re familiar with Greg Lindberg but I wasn’t, and wow!)

SELECT
  payer_id,
  org_name,
  profession,
  employers_name,
  address_lookup
FROM receipts_payer
WHERE (
  UPPER(org_name) LIKE 'GREG%LINDBERG%'
)
Displaying records 1 - 10
payer_id org_name profession employers_name address_lookup
15730 GREG LINDBERG CEO/CHAIRMAN ELI GLOBAL 3406 STAGECOACH ROAD, DURHAM, NC, 27713
64005 Greg Lindberg Owner Eli Global 2222 SEDWICK ROAD, DURHAM, NC, 27713
116488 Greg Lindberg Investor Eli Global, LLC 3406 STAGECOACH RD, DURHAM, NC, 27713
119328 GREG E. LINDBERG OWNER ELI GLOBAL 2222 SEDWICK ROAD, DURHAM, NC, 27713
183876 GREG E LINDBERG PRESIDENT ELI GLOBAL, LLC 2222 SEDWICK ROAD, DURHAM, NC, 27713
234988 GREG LINDBERG OWNER ELI GLOBAL 3406 STAGECOACH RD, DURHAM, NC, 27713
306178 GREG LINDBERG INVESTOR ELI GLOBAL, LLC 3406 STAGECOACH ROAD, DURHAM, NC, 27713
311006 GREG E LINDBERG OWNER/CEO ELI GLOBAL 2222 SEDWICK RD, DURHAM, NC, 27713
346536 GREG E LINDBERG CEO ELI GLOBAL 2222 SEDWICK RD, DURHAM, NC, 27713
472930 Greg Lindberg Investor Eli Global LLC 3406 STAGECOACH RD, DURHAM, NC, 27713

All of the above records are definitely the same person, so, because I’m curious, let’s find out how much money Greg Lindberg has donated.

WITH greg_lindberg AS (
  SELECT
    payer_id,
    org_name,
    profession,
    employers_name,
    address_lookup
  FROM receipts_payer
  WHERE (
    UPPER(org_name) LIKE 'GREG%LINDBERG%'
  )
)

SELECT SUM(amount) AS total
FROM receipts
INNER JOIN greg_lindberg USING (payer_id)

The result of that query tells us that Greg Lindberg has donated $5,000,532 to NC candidates since 2016. This number tracks with an article from The News & Observer that reports Greg Lindberg donated around $7.5 million from 2016 to 2018, a figure that includes donations to federal PACs.

Now we can find out where Greg Lindberg has donated his money. The first CTE picks out the payer records for Greg Lindberg and the second finds his total donations to each committee. Finally, we join the total donations results with the committees table to get the committee names (and pick out the top 10).

WITH
  greg_lindberg AS (
    SELECT
      payer_id,
      org_name,
      profession,
      employers_name,
      address_lookup
    FROM receipts_payer
    WHERE (
      UPPER(org_name) LIKE 'GREG%LINDBERG%'
    )
  ),
  greg_lindberg_donations AS (
    SELECT sboe_id, SUM(amount) AS total
    FROM receipts
    INNER JOIN greg_lindberg USING (payer_id)
    GROUP BY sboe_id
  )

SELECT
  greg_lindberg_donations.sboe_id,
  committees.committee_name,
  greg_lindberg_donations.total
FROM greg_lindberg_donations
LEFT JOIN committees USING (sboe_id)
ORDER BY total DESC
LIMIT 10
SBOE ID Committee Name Total Donations
STA-C4184N-C-001 NC Republican Party $1,490,000
STA-P0J4M6-C-001 NC REP COUNCIL OF STATE COMMITTEE-DONT USE $1,416,000
STA-25CP77-C-001 TRUTH AND PROSPERITY INC $1,000,000
STA-C3839N-C-001 North Carolina Democratic Party - State $500,000
STA-90G81R-C-001 NORTH CAROLINA OPPORTUNITY COMMITTEE $450,000
STA-C4207N-C-001 CHATHAM COUNTY REC $100,000
STA-N4PW5W-C-001 The Goodwin Committee $9,970
133-3309X1-C-001 FARAD ALI FOR MAYOR $5,200
STA-C0808N-C-002 BRENT JACKSON FOR NC SENATE $5,200
STA-C1649N-C-001 DAVID LEWIS FOR NC HOUSE $5,200

And now, also because I’m curious, here is a plot showing Lindberg’s donations over time. Things went south for Lindberg early in 2019, so it’s unsurprising that his donations have dropped off after 2018.

WITH
  greg_lindberg AS (
    SELECT
      payer_id,
      org_name,
      profession,
      employers_name,
      address_lookup
    FROM receipts_payer
    WHERE (
      UPPER(org_name) LIKE 'GREG%LINDBERG%'
    )
  )

SELECT YEAR(occur_date) AS year, SUM(amount) AS total
FROM receipts
INNER JOIN greg_lindberg USING (payer_id)
GROUP BY year