flowchart LR
subgraph "Campaign Finance reports"
reports
cover
officers
receipts
expenses
committees
receipts_payer
expenses_payee
end
subgraph "Candidate Listing"
cl_candidates
cl_elections
cl_name_on_ballot
cl_party
cl_contact
end
committees --> reports
reports --> cover
reports --> officers
reports --> receipts
reports --> expenses
receipts --> receipts_payer
expenses --> expenses_payee
cl_candidates --> cl_elections
cl_elections --> cl_name_on_ballot
cl_elections --> cl_party
cl_elections --> cl_contact
committee_candidate <--> committees
committee_candidate <--> cl_candidates
Candidate Listing Resolution
Overview
The complete dataset involves a number of tables from four data sources:
- The Campaign Finance Reports from the NC State Board of Elections (SBOE);
- The Candidate Listing from the NC SBOE;
- The Voter Registration from the NC SBOE; and
- Resolved addresses from the U.S. Census Bureau.
The diagram below outlines the tables in the uploaded dataset and their general relationship to each other.
Campaign finance reports
The primary goal of this project was to collect and organize the campaign finance reports. The core tables of this portion of the dataset are:
reports: This table provides a master list of reports filed with the SBOE.committees: This table extracts the most recent committee information from the filed reports. If you’re interested in a particular committee, this is likely the place you’ll want to start.receipts,expenses: These table provides a list of received contributions and expenses paid by the committee.receipts_payer,expenses_payee: These tables provide the payer/payee information for the receipts and expenses, extracted from thereceiptsandexpensestables. I haven’t de-duplicated the records in this table (yet).cover: Each report has a cover “page” where key information about the committee or the period being reported is provided.officers: Each committee has a list of officers. This table provides a master list of officers for all committees.
All of the above tables have both sboe_id and report_id columns. sboe_id uniquely identifies a committee by its SBOE-assigned ID, and report_id uniquely identifies an individual campaign finance report. In the committees table, report_id refers to the latest report from which the committee contact information was extracted.
The sboe_id is always the same for a given committee and is the best way to identify a committee. The report_id refers to a specific filing of a report, and I’ve included only the most recently filed reports in this database. Note that amended filings receive a new report ID, the report_id may change in the future if or when the committee files an amendment.
Candidate listing
To augment the campaign finance data, I’ve included several tables extracted from each year’s candidate listing. The primary table of interest is cl_candidates. It contains individual candidates from the candidate listing and their contact information and party affiliation. If you’re looking for a specific candidate, but don’t know their SBOE ID or their committee’s name, this is the place to start.
Candidates from the candidate listing are linked to a specific SBOE committee via the committee_candidate table, which matches a candidate_id with an sboe_id. Note that the candidate_id is an ID I’ve created to help organize the candidate listing – although I’m sure the SBOE has its own internal ID for each candidate, they don’t include them in the data they publish. This means that the candidate_id may change when the candidate listing is updated.
Additional data sources
Many tables have an address_lookup column. When this column is found in a table, it serves as the key for matching the address of that row with the resolved addresses in the addresses table. These addresses have been passed through the geocoding services provided by the U.S. Census Bureau, so the addresses is also a useful way to get the latitude and longitude on an address for mapping purposes.
Additionally, I’ve included the voter registration data. It’s not currently linked to any other tables, but it could be useful in observing trends in voter registration across counties, or for exploring demographic trends in the voting population.
Finding a candidate
You can often start by searching for a candidate in the cl_candidates table. Here’s an example looking for John Bell.
SELECT *
FROM cl_candidates
WHERE (
first_name = 'JOHN' AND
middle_name = 'RICHARD' AND
last_name = 'BELL'
)| candidate_id | name_on_ballot | first_name | middle_name | last_name | name_suffix_lbl | party_last | party_most | contest_n | contest_first | contest_latest | street | city | state | zip_code | phone | address_lookup | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 924 | John Bell | JOHN | RICHARD | BELL | IV | REP | REP | 4 | 2016-11-08 | 2022-11-08 | 501 HOLLAND HILL DR | GOLDSBORO | NC | 27530 | (919) 344-6324 | NA | 501 HOLLAND HILL DR, GOLDSBORO, NC, 27530 |
Then, join the result with the committee_candidate to find the candidate’s sboe_id, if a match has been identified. The candidate-committee linking uses probabilistic matching. which allows for some flexibility in the matching process. Note that not every candidate is linked to a committee and there are many more committees than candidates.
WITH john_bell AS (
SELECT *
FROM cl_candidates
WHERE (
first_name = 'JOHN' AND
middle_name = 'RICHARD' AND
last_name = 'BELL'
)
)
SELECT *
FROM committee_candidate
WHERE candidate_id IN (
SELECT candidate_id
FROM john_bell
)| sboe_id | candidate_id |
|---|---|
| STA-8S285O-C-001 | 924 |
The candidate listing also gives you a complete history of the candidate’s election history. You might find some historically interesting information by joining cl_candidates with
cl_electionsfor specific election contests,cl_name_on_ballotfor the candidate’s name on the ballot,cl_partyfor the party affiliation of a candidate in an election, andcl_contactfor contact information for a candidate.
Here’s an example combining the above tables to show the last three elections for John Bell.
WITH john_bell AS (
SELECT *
FROM cl_candidates
WHERE (
first_name = 'JOHN' AND
middle_name = 'RICHARD' AND
last_name = 'BELL'
)
)
SELECT *
FROM cl_elections
LEFT JOIN cl_name_on_ballot USING (candidate_id, election_dt)
LEFT JOIN cl_party USING (candidate_id, election_dt)
LEFT JOIN cl_contact USING (candidate_id, election_dt)
WHERE candidate_id IN (
SELECT candidate_id
FROM john_bell
)
ORDER BY election_dt DESC
LIMIT 3| election_dt | county_name | contest_name | candidate_id | first_name | middle_name | last_name | name_suffix_lbl | name_on_ballot | first_name | middle_name | last_name | name_suffix_lbl | party_candidate | street | city | state | zip_code | phone | address_lookup | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2022-11-08 | WAYNE | NC HOUSE OF REPRESENTATIVES DISTRICT 010 | 924 | JOHN | RICHARD | BELL | IV | John Bell | JOHN | RICHARD | BELL | IV | REP | 501 HOLLAND HILL DR | GOLDSBORO | NC | 27530 | NA | NA | 501 HOLLAND HILL DR, GOLDSBORO, NC, 27530 |
| 2020-11-03 | GREENE | NC HOUSE OF REPRESENTATIVES DISTRICT 010 | 924 | JOHN | RICHARD | BELL | IV | John Bell | JOHN | RICHARD | BELL | IV | REP | 501 HOLLAND HILL DR | GOLDSBORO | NC | 27530 | (919) 344-6324 | NA | 501 HOLLAND HILL DR, GOLDSBORO, NC, 27530 |
| 2020-11-03 | JOHNSTON | NC HOUSE OF REPRESENTATIVES DISTRICT 010 | 924 | JOHN | RICHARD | BELL | IV | John Bell | JOHN | RICHARD | BELL | IV | REP | 501 HOLLAND HILL DR | GOLDSBORO | NC | 27530 | (919) 344-6324 | NA | 501 HOLLAND HILL DR, GOLDSBORO, NC, 27530 |
Campaign Finance Reports
Within the campaign finance report data, the best place to get start is with the reports or committees table.
The reports table provides a master list of reports filed with the SBOE. These include only the most up-to-date reports, taking into account amended filings, so you do not need to worry about filtering out outdated reports.
Here are the first 5 reports filed by John Bell’s committee (sboe_id='STA-8S285O-C-001' taken from the previous query).
SELECT *
FROM reports
WHERE sboe_id='STA-8S285O-C-001'
LIMIT 5| year | doc_name | doc_order | sboe_id | report_id | amended | image_id | received_image | received_data | start_date | end_date | sboe_start_date | sboe_end_date | cover_start_date | cover_end_date | cover_date_filed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2016 | First Quarter | 1 | STA-8S285O-C-001 | 164748 | TRUE | 231339 | 2019-03-01 | 2019-03-01 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2019-02-23 |
| 2016 | Second Quarter | 2 | STA-8S285O-C-001 | 164749 | TRUE | 231340 | 2019-03-01 | 2019-03-01 | 2016-03-01 | 2016-06-30 | 2016-03-01 | 2016-06-30 | 2016-03-01 | 2016-06-30 | 2019-02-23 |
| 2016 | Third Quarter | 3 | STA-8S285O-C-001 | 164751 | TRUE | 231341 | 2019-03-01 | 2019-03-01 | 2016-07-01 | 2016-10-22 | 2016-07-01 | 2016-10-22 | 2016-07-01 | 2016-10-22 | 2019-02-25 |
| 2016 | Fourth Quarter | 4 | STA-8S285O-C-001 | 164753 | TRUE | 231342 | 2019-03-01 | 2019-03-01 | 2016-10-23 | 2016-12-31 | 2016-10-23 | 2016-12-31 | 2016-10-23 | 2016-12-31 | 2019-02-25 |
| 2017 | Mid Year Semi-Annual | 5 | STA-8S285O-C-001 | 164754 | TRUE | 231343 | 2019-03-01 | 2019-03-01 | 2017-01-01 | 2017-06-30 | 2017-01-01 | 2017-06-30 | 2017-01-01 | 2017-06-30 | 2019-02-25 |
Each report has a cover page, which is included in the cover table, but I’ve extracted the most recent name and contact information for the committee into the committees table. Here, you’re guaranteed to get a single row per committee, for example with John Bell’s committee:
SELECT *
FROM committees
WHERE sboe_id='STA-8S285O-C-001'| sboe_id | committee_name | report_id | street_1 | street_2 | city | state | zip_code | address_lookup | committee_type | fund_type | fund_name |
|---|---|---|---|---|---|---|---|---|---|---|---|
| STA-8S285O-C-001 | JOHN BELL COMMITTEE | 209592 | 501 HOLLAND HILL DRIVE | NA | GOLDSBORO | NC | 27530 | 501 HOLLAND HILL DRIVE, GOLDSBORO, NC, 27530 | Candidate Committee | NA | NA |
Next, we have the receipts and expenses tables. Both are similarly structured, so I’ll just demonstrate how to use receipts. First, we’ll filter the reports table to get the 2016 Q1 report record for John Bell’s committee. Then, we can INNER JOIN this table with the receipts table, which returns all of the receipts for the reports in the filtered_reports common table expression (CTE, i.e. the temporary tables created using WITH ___ AS (__query__)).
WITH filtered_reports AS (
SELECT *
FROM reports
WHERE sboe_id='STA-8S285O-C-001'
AND year=2016
AND doc_name='First Quarter'
)
SELECT *
FROM receipts
INNER JOIN filtered_reports USING (report_id)
LIMIT 5| sboe_id | report_id | payer_id | group_id | occur_date | amount | sum_to_date | is_aggregated | receipt_type_desc | receipt_type_code | is_donation | account_abbr | form_of_payment_desc | is_prior | year | doc_name | doc_order | sboe_id | amended | image_id | received_image | received_data | start_date | end_date | sboe_start_date | sboe_end_date | cover_start_date | cover_end_date | cover_date_filed |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| STA-8S285O-C-001 | 164748 | 542341 | 12161117 | 2016-02-22 | 500 | 500 | FALSE | Other Political Committee Contribution | CPCM | TRUE | 01 | Check | NA | 2016 | First Quarter | 1 | STA-8S285O-C-001 | TRUE | 231339 | 2019-03-01 | 2019-03-01 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2019-02-23 |
| STA-8S285O-C-001 | 164748 | 431328 | 12161098 | 2016-02-22 | 1000 | 1000 | FALSE | Other Political Committee Contribution | CPCM | TRUE | 01 | Check | NA | 2016 | First Quarter | 1 | STA-8S285O-C-001 | TRUE | 231339 | 2019-03-01 | 2019-03-01 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2019-02-23 |
| STA-8S285O-C-001 | 164748 | 253762 | 12161106 | 2016-02-12 | 500 | 500 | FALSE | Other Political Committee Contribution | CPCM | TRUE | 01 | Check | NA | 2016 | First Quarter | 1 | STA-8S285O-C-001 | TRUE | 231339 | 2019-03-01 | 2019-03-01 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2019-02-23 |
| STA-8S285O-C-001 | 164748 | 76519 | 12161112 | 2016-02-12 | 500 | 500 | FALSE | Other Political Committee Contribution | CPCM | TRUE | 01 | Check | NA | 2016 | First Quarter | 1 | STA-8S285O-C-001 | TRUE | 231339 | 2019-03-01 | 2019-03-01 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2019-02-23 |
| STA-8S285O-C-001 | 164748 | 183769 | 12161104 | 2016-01-15 | 250 | 250 | FALSE | Other Political Committee Contribution | CPCM | TRUE | 01 | Electronic Funds Transfer | NA | 2016 | First Quarter | 1 | STA-8S285O-C-001 | TRUE | 231339 | 2019-03-01 | 2019-03-01 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2016-01-01 | 2016-02-29 | 2019-02-23 |
Once you have a list of receipts, you can start to do some analysis on these, like finding the total money received by contribution type. This time I’m filtering to John Bell’s 2020 Q3 report.
WITH filtered_reports AS (
SELECT *
FROM reports
WHERE sboe_id='STA-8S285O-C-001'
AND year=2020
AND doc_name='Third Quarter'
)
SELECT
receipt_type_desc,
receipt_type_code,
is_donation,
SUM(amount) as total
FROM receipts
INNER JOIN filtered_reports USING (report_id)
GROUP BY
receipt_type_desc,
receipt_type_code,
is_donation
ORDER BY total DESC| receipt_type_desc | receipt_type_code | is_donation | total |
|---|---|---|---|
| Other Political Committee Contribution | CPCM | TRUE | 178773.92 |
| Individual Contribution | IND | TRUE | 100977.00 |
| Party Contribution | PPTY | TRUE | 4558.35 |
| Outside Source | OUTS | TRUE | 660.00 |
Receipts and Payer Information
As you may have noticed above, each receipt is linked to a payer in the receipts_payer table via the payer_id column.
At this time, I have not de-duplicated the receipts_payer table, so there may be multiple records for a single payer. Separating the receipts and receipts_payer tables will allow us to de-duplicate payer information in the future.
Compared with the committee and candidate information, the payer records are much noisier and don’t have any highly reliable fields that we can use to de-duplicate the records. The probabilistic matching I used for linking candidates and committees will work but it’s a relatively large engineering lift. The challenge is that there are about 750,000 unique payer records, which certainly isn’t big data, but deduplicating the records requires more than 2.8 Billion comparisons (to compare every record with every other record).
For now, I’ll show you how to use the receipts_payer table to look for specific donors. I recommend starting with receipts_payer, finding all payer_id values that match your person(s) of interest, and then working from those records back to the receipts table.
As an example, let’s find the top donors who have contributed the most to NC candidates since 2016. The first CTE (donations_total) uses receipts to count the total amount donated by each payer_id. The second CTE (top_donors) filters the donors down to the top 20 donors. Finally, we join top_donors with receipts_payer to get the payer information for each donor.
This gives us the “top 20” donors, but it’s clear that some of them are the same person with slightly different payer information.
WITH
donations_total AS (
SELECT payer_id, SUM(amount) AS total
FROM receipts
WHERE (NOT((payer_id IS NULL)))
GROUP BY payer_id
),
top_donors AS (
SELECT payer_id, total
FROM (
SELECT *, RANK() OVER (ORDER BY total DESC) AS donor_rank
FROM donations_total
)
WHERE (donor_rank <= 20)
)
SELECT
top_donors.*,
org_name,
profession,
employers_name,
address_lookup
FROM top_donors
LEFT JOIN receipts_payer
ON (top_donors.payer_id = receipts_payer.payer_id)
ORDER BY UPPER(org_name)| payer_id | total | org_name | profession | employers_name | address_lookup |
|---|---|---|---|---|---|
| 623503 | 22877799 | Aggregated Individual Contribution | , , , | ||
| 8616 | 18832500 | CITIZENS FOR A BETTER NORTH CAROLINA | PO BOX 97875, RALEIGH, NC, 27624 | ||
| 83606 | 4570881 | Cooper for North Carolina | PO BOX 10587, RALEIGH, NC, 27605 | ||
| 266214 | 4050500 | DAGA PEOPLE’S LAWYER PROJECT | 1350 I ST NW, WASHINGTON, DC, 20005 | ||
| 637770 | 9920000 | Democratic Action | 1225 I ST NW, WASHINGTON, DC, 20005 | ||
| 64001 | 3008075 | Friends of Tim Moore | 305 EAST KING STREET, KINGS MOUNTAIN, NC, 28086 | ||
| 635144 | 21875000 | GOOD GOVERNMENT COALITION | 6412 BRANDON AVE, SPRINGFIELD, VA, 22150 | ||
| 593023 | 2970000 | LEAGUE OF CONSERVATION VOTERS | 740 15TH ST NW, WASHINGTON, DC, 20005 | ||
| 456039 | 10222018 | MISTY SMITHEY | REGISTER OF DEEDS | WILKES CO | 403 E. MAIN ST, WILKESBORO, NC, 28697 |
| 288844 | 16925066 | NC DEMOCRATIC LEADERSHIP COMMITTEE | 434 FAYETTEVILLE STREET - STE 2020, RALEIGH, NC, 27601 |
Let’s pick out Greg Lindberg from the list above. (I’m sure you’re familiar with Greg Lindberg but I wasn’t, and wow!)
SELECT
payer_id,
org_name,
profession,
employers_name,
address_lookup
FROM receipts_payer
WHERE (
UPPER(org_name) LIKE 'GREG%LINDBERG%'
)| payer_id | org_name | profession | employers_name | address_lookup |
|---|---|---|---|---|
| 15730 | GREG LINDBERG | CEO/CHAIRMAN | ELI GLOBAL | 3406 STAGECOACH ROAD, DURHAM, NC, 27713 |
| 64005 | Greg Lindberg | Owner | Eli Global | 2222 SEDWICK ROAD, DURHAM, NC, 27713 |
| 116488 | Greg Lindberg | Investor | Eli Global, LLC | 3406 STAGECOACH RD, DURHAM, NC, 27713 |
| 119328 | GREG E. LINDBERG | OWNER | ELI GLOBAL | 2222 SEDWICK ROAD, DURHAM, NC, 27713 |
| 183876 | GREG E LINDBERG | PRESIDENT | ELI GLOBAL, LLC | 2222 SEDWICK ROAD, DURHAM, NC, 27713 |
| 234988 | GREG LINDBERG | OWNER | ELI GLOBAL | 3406 STAGECOACH RD, DURHAM, NC, 27713 |
| 306178 | GREG LINDBERG | INVESTOR | ELI GLOBAL, LLC | 3406 STAGECOACH ROAD, DURHAM, NC, 27713 |
| 311006 | GREG E LINDBERG | OWNER/CEO | ELI GLOBAL | 2222 SEDWICK RD, DURHAM, NC, 27713 |
| 346536 | GREG E LINDBERG | CEO | ELI GLOBAL | 2222 SEDWICK RD, DURHAM, NC, 27713 |
| 472930 | Greg Lindberg | Investor | Eli Global LLC | 3406 STAGECOACH RD, DURHAM, NC, 27713 |
All of the above records are definitely the same person, so, because I’m curious, let’s find out how much money Greg Lindberg has donated.
WITH greg_lindberg AS (
SELECT
payer_id,
org_name,
profession,
employers_name,
address_lookup
FROM receipts_payer
WHERE (
UPPER(org_name) LIKE 'GREG%LINDBERG%'
)
)
SELECT SUM(amount) AS total
FROM receipts
INNER JOIN greg_lindberg USING (payer_id)The result of that query tells us that Greg Lindberg has donated $5,000,532 to NC candidates since 2016. This number tracks with an article from The News & Observer that reports Greg Lindberg donated around $7.5 million from 2016 to 2018, a figure that includes donations to federal PACs.
Now we can find out where Greg Lindberg has donated his money. The first CTE picks out the payer records for Greg Lindberg and the second finds his total donations to each committee. Finally, we join the total donations results with the committees table to get the committee names (and pick out the top 10).
WITH
greg_lindberg AS (
SELECT
payer_id,
org_name,
profession,
employers_name,
address_lookup
FROM receipts_payer
WHERE (
UPPER(org_name) LIKE 'GREG%LINDBERG%'
)
),
greg_lindberg_donations AS (
SELECT sboe_id, SUM(amount) AS total
FROM receipts
INNER JOIN greg_lindberg USING (payer_id)
GROUP BY sboe_id
)
SELECT
greg_lindberg_donations.sboe_id,
committees.committee_name,
greg_lindberg_donations.total
FROM greg_lindberg_donations
LEFT JOIN committees USING (sboe_id)
ORDER BY total DESC
LIMIT 10| SBOE ID | Committee Name | Total Donations |
|---|---|---|
| STA-C4184N-C-001 | NC Republican Party | $1,490,000 |
| STA-P0J4M6-C-001 | NC REP COUNCIL OF STATE COMMITTEE-DONT USE | $1,416,000 |
| STA-25CP77-C-001 | TRUTH AND PROSPERITY INC | $1,000,000 |
| STA-C3839N-C-001 | North Carolina Democratic Party - State | $500,000 |
| STA-90G81R-C-001 | NORTH CAROLINA OPPORTUNITY COMMITTEE | $450,000 |
| STA-C4207N-C-001 | CHATHAM COUNTY REC | $100,000 |
| STA-N4PW5W-C-001 | The Goodwin Committee | $9,970 |
| 133-3309X1-C-001 | FARAD ALI FOR MAYOR | $5,200 |
| STA-C0808N-C-002 | BRENT JACKSON FOR NC SENATE | $5,200 |
| STA-C1649N-C-001 | DAVID LEWIS FOR NC HOUSE | $5,200 |
And now, also because I’m curious, here is a plot showing Lindberg’s donations over time. Things went south for Lindberg early in 2019, so it’s unsurprising that his donations have dropped off after 2018.
WITH
greg_lindberg AS (
SELECT
payer_id,
org_name,
profession,
employers_name,
address_lookup
FROM receipts_payer
WHERE (
UPPER(org_name) LIKE 'GREG%LINDBERG%'
)
)
SELECT YEAR(occur_date) AS year, SUM(amount) AS total
FROM receipts
INNER JOIN greg_lindberg USING (payer_id)
GROUP BY year