10 things every data journalist should know NR14, Hamburg Jennifer LaFleur Center for Investigative Reporting
A bit about CIR Nonprofit investigative newsroom Public interest investigative journalism Based near San Francisco About 80 staff Print, web, radio and tv
A little data journalism history 1952 1967 1980s …
#1 data is a powerful reporting tool It takes you beyond the anecdote
And It’s easier than dealing with this
#1 data is a powerful reporting tool Contrasts are in the data
Caution: This slide contains extreme nerdiness
#1 data is a powerful reporting tool Contrasts are in the data Your most powerful figures are in the data
Source: California Health Dept. data, Medicare billing data Findings: Some hospitals had “alarming rates of a Third World nutritional disorder among its Medicare patients.”
#1 data is a powerful reporting tool Contrasts are in the data Your most powerful figures are in the data You can make connections you might not be able to make otherwise
Data: Youth prison workers, criminal convictions and grievance data Findings: Employees with criminal backgrounds were more likely to be accused of abusing inmates.
Data: Federal bridge inspections and stimulus funding. Findings: Some of the nation’s worst bridges did not get stimulus funds.
#1 data is a powerful reporting tool Contrasts are in the data Your most powerful figures are in the data You can make connections you might not be able to make otherwise You can test assumptions
Source: NHTSA complaint data Findings: “… unintended acceleration has been a problem across the auto industry.”
#2 data comes from many places
Where’s the data? If something is inspected Licensed Enforced or Purchased …There probably is a database
Where’s the data? If there is a report Or a form There probably is a database
Where’s the data? Sometimes data is readily available online for download
Where’s the data? Sometimes you have to scrape it. That usually involves programs that automate searching tasks on Web sites.
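As a rough illustration of what such a program does, here is a minimal Python sketch that pulls the rows out of an HTML table using only the standard library. The page content, the `TableScraper` class name, and the column names are all hypothetical; a real scraper would also have to fetch the pages from the site and cope with messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would download this from a site.
PAGE = """<table>
<tr><th>Facility</th><th>Violations</th></tr>
<tr><td>Stand A</td><td>3</td></tr>
<tr><td>Stand B</td><td>0</td></tr>
</table>"""


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


scraper = TableScraper()
scraper.feed(PAGE)
rows = scraper.rows
for row in rows:
    print(row)
```

The result is a list of rows that can be written to a spreadsheet or database for analysis.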
Where’s the data? More often you need to go to an agency or source to get the data
Source: School district credit card purchases Findings: District card holders made questionable purchases with their cards.
#3 people who keep data don’t always want to give it up
Getting electronic information Know the law. Know what information you want. Do your homework Know what the appropriate cost should be. Know who does the data entry. Get to know the computer people.
Just another way of saying no Huge costs Delay tactics “Oh you silly little journalist” Sending you the wrong thing “Your request was unclear” HIPAA Privacy Privatization
#4 Sometimes holes in data can be a story
#5 Even when there is no data, you can use techniques for sampling and building a database. Sampling Physical surveys – go look at one Testing Questionnaires, polls and surveys Building from documents
We built a database of 500 people who had been granted or denied pardons during the Bush administration. We started with a list of nearly 2,000 people. From that, we pulled a random sample. Then we spent months researching the individuals. We found that even after controlling for other factors, whites were more likely to get a pardon.
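Drawing a random sample from a list can be sketched in a few lines of Python. This is an illustration only, not the method used for the pardons project: the applicant identifiers, the `draw_sample` helper, and the seed value are hypothetical, and the sizes simply echo the figures above.

```python
import random


def draw_sample(population, k, seed):
    """Return k items chosen at random without replacement."""
    # A fixed seed makes the draw repeatable, so others can audit it.
    rng = random.Random(seed)
    return rng.sample(population, k)


# Hypothetical stand-in for a list of about 2,000 people.
applicants = [f"applicant-{i:04d}" for i in range(1, 2001)]

sample = draw_sample(applicants, k=500, seed=2014)
print(len(sample), "people selected for research")
```

Recording the seed alongside the original list lets anyone reproduce exactly the same sample later.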
To examine food safety, the Center for Investigative Reporting in Bosnia sampled food, literally, and had it tested in labs.
SVT surveyed 355 counties and districts about drug control – all replied (Courtesy Helena Bengtsson)
#6 Sometimes the crowd can help you
#7 There are many data tools – choose the right one Spreadsheets Databases Mapping Statistics Programming
Source: Salary data and other charter school records Findings: Reporters found nepotism in charter schools, and administrators earning six-figure salaries to run schools with only a few hundred or a couple of thousand students
Source: Washington Health Department data Findings: “MRSA has been quietly killing in hospitals for decades.” But no one had tracked it until this story.
Source: City Budget Findings: Some neighborhoods suffer more than others as mayor cuts budgets
SOURCE: Local health department inspection reports FINDINGS: At 28% of the venues, more than half of the concession stands or restaurants had been cited for at least one "critical" or "major" health violation.
#8 Sharing data is good, but give it context and be sure it is right
Source: EPA and state data on hazardous chemical locations Findings: Dallas County has 900+ sites that store hazardous chemicals
Source: Medicaid outcomes data for dialysis facilities Findings: A CMS online tool did not tell the whole story about facilities. In some counties, the gaps in measures such as survival rate were vast.
Source: Dam inspection data from Texas and federal government Findings: Dam records had not been updated to account for population growth
#9 Data intended for one purpose can be used in other ways
Source: 311 calls for downed trees Findings: After a tornado swept across New York City, 311 calls for downed trees helped trace its path
Disparities in water usage “Water use highest in poor areas of the city” Mapping and statistical analysis
#10: No data is perfect
Check your data
• Read the documentation. Understand the contents of every field.
• Know how many records you should have.
• Check counts and totals against reports.
• Are all possibilities included? All states, all counties, correct ranges?
• Check for missing data, duplicates, internal problems
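Several of these checks can be automated. The Python sketch below runs them against a tiny made-up file; the column names, the list of valid state codes, and the expected count and total (which would normally come from the agency's own report) are all assumptions for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical data; a real check would read the agency's file instead.
RAW = """id,state,amount
1,TX,100
2,CA,250
2,CA,250
3,,75
4,ZZ,-10
"""

VALID_STATES = {"TX", "CA", "NY"}  # assumption: the codes we expect to see


def check_data(text, expected_rows, expected_total):
    """Return a list of human-readable problems found in the CSV text."""
    rows = list(csv.DictReader(io.StringIO(text)))
    problems = []

    # Know how many records you should have.
    if len(rows) != expected_rows:
        problems.append(f"row count: got {len(rows)}, expected {expected_rows}")

    # Check totals against the published report.
    total = sum(float(r["amount"]) for r in rows)
    if abs(total - expected_total) > 0.005:
        problems.append(f"total: got {total}, expected {expected_total}")

    # Check for duplicates.
    id_counts = Counter(r["id"] for r in rows)
    for record_id, n in id_counts.items():
        if n > 1:
            problems.append(f"duplicate id {record_id} appears {n} times")

    # Check for missing values, unexpected categories and bad ranges.
    for r in rows:
        if not r["state"]:
            problems.append(f"missing state in id {r['id']}")
        elif r["state"] not in VALID_STATES:
            problems.append(f"unexpected state {r['state']} in id {r['id']}")
        if float(r["amount"]) < 0:
            problems.append(f"negative amount in id {r['id']}")

    return problems


problems = check_data(RAW, expected_rows=4, expected_total=415.0)
for p in problems:
    print(p)
```

Each reported problem is a question to take back to the people who keep the data, not a reason to discard records automatically.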