Originally published in GQ on February 19, 2021.
It perhaps doesn’t say great things about the U.S. government’s response to the coronavirus pandemic that a plucky volunteer organization has grown into one of the most trusted data sources on COVID-19 in the U.S. But that’s the reality: The COVID Tracking Project, an improvised effort supported by The Atlantic, was founded in the earliest days of the pandemic, after four journalists and data scientists realized there wasn’t a good centralized source for essential stats like the number of tests administered and patients currently in the hospital with the virus.
So every day for the last eleven months the project coordinated an effort of mostly volunteers to manually gather the latest public health data from all 50 states, Washington D.C., and five territories. They then worked to translate that information for the public, producing daily charts and analysis on the scale of the pandemic, which have been cited everywhere from local broadcasters to executive branch briefings. These daily updates on the number of tests, cases, hospitalizations and deaths have been invaluable to journalists and public health officials and, for millions of people, have become one of the few steady fixtures of this last year.
Earlier this month the group announced it would be ending its daily data compilation work on March 7, the one-year anniversary of its founding. Ahead of their last day, GQ spoke with two of the group’s co-founders, Alexis Madrigal and Erin Kissane, about the terrifying early days of the pandemic, why the government wasn’t doing this work, and their decision to shut down.
GQ: Can you both tell me about how this started?
Alexis Madrigal: I was talking with Robinson Meyer, another staff writer at The Atlantic, a lot in February about how we were worried about COVID. Rob realized that the number of people being tested in the U.S. was not actually known. He called me up one day and he was like, “Imagine we’re reporters on the Army Corps of Engineers beat, five days before Hurricane Katrina. Like what the fuck are we doing here? We should do something.”
We decided to try and count and compile the number of people who had been tested by calling all the states. We came up with a count of less than 2,000—when the Trump administration had been talking about having deployed millions of tests. Which meant the number of cases being reported was also an enormous undercount.
We published our first article [From March 6, Exclusive: The Strongest Evidence Yet That America Is Botching Coronavirus Testing] and after that I got an email from Jeff Hammerbacher, a college friend of mine who went on to build data systems at Facebook and then became a bioinformatics guy. He asked me if I had used his spreadsheet to write our article. I was like, “What spreadsheet?” He linked us to it, and that Google spreadsheet became the basis of what we do at COVID Tracking Project.
Erin had a lot of experience in managing distributed news projects, so she came on as a fourth founder. We made this cattle call for volunteers, and that was it. Now it’s 340 days later and we’re still doing it.
Erin Kissane: Rob and I had been doing late night anxiety texts in February about how we just didn’t have eyes on the virus in the United States. As soon as I saw that Rob and Alexis had done this work, I got in touch.
You both grasped that this would be a bigger deal earlier than most Americans, and earlier than most journalists.
EK: There were so many concerns about not alarming people and not overreacting. But it happens that I’m just a person interested in pandemics and I also have an autoimmune condition, so I’m particularly concerned about respiratory viruses. I also read a lot of news out of China. It just seemed so bad and U.S. coverage in January and into February was so much about how it’s probably not going to come here, but it felt like there was very little attention on how bad it actually was in Wuhan.
And so your project quickly became the best place for testing data. Correct me if I’m wrong, but as I understand it, at some point the CDC did start collecting a lot of similar data, but failed to package it in a way that the public could easily digest. At what point did that start happening?
AM: The short answer is I think it was roughly 100 days before the CDC released a testing dataset. It wasn’t until much, much later, in the fall, that the CDC put out a dataset on current hospitalizations.
But there’s two things, there’s data availability—is anything there? And then there’s data quality—are there reasons to suspect the data is not complete? And what we found with the CDC’s testing data is that there were major problems. Each federal data pipeline matches up differently with the stitched-together data from the 56 jurisdictions, and our job is to figure that out. That’s a lot of what our work became.
EK: May 9 is when the CDC began posting testing data, cases, and deaths all together in their COVID tracker. We did a pretty in-depth research report on that, and found the testing data was really quite dramatically off for a lot of states. In some cases, it was much higher than what states reported, in some cases much lower. So back in May we felt we couldn’t stop our project. When the federal hospitalization data came out, we did a lot of work to try to explain to our data users that their data was actually quite good.
We didn’t try to build a dashboard that was easy to use. We sort of backed ourselves into providing that. At first it was just journalists and data nerds, but eventually we brought on people with more science communication expertise. I think something we’re feeling very heartened and encouraged by is that some combination of the CDC and HHS [the Department of Health and Human Services] now seem quite committed to doing science communication about the details of this pandemic—with regular briefings and all those things.
EK: The metrics we track are different, and we also work at the state level, and some of the other trackers work at the county level. The other trackers that I’m aware of are primarily scrapers, and our work is entirely manual. We have humans who go in and collect the numbers, more humans who check the numbers and double check them. The benefit of continuing to work this way, instead of moving to automation, has been that we are very, very close to tiny definitional changes in the data. We can dig through PDFs, and we can spot tiny blips in ways that a scraper might miss. It’s a very labor-intensive way of doing work, but it’s really about keeping the institutional knowledge about what exactly each number means.
AM: Unlike most of the others, we weren’t trying to build a standalone-destination tracker. Our role was quite different. We were building a node that fed information to a lot of those other trackers, as well as people who were extremely interested in some of the in-depth texture of COVID statistics. We gave top-lines, but that was not the primary goal of the project.
How many people would you say were involved with the project? And how did it break down between paid staff and volunteers?
AM: I think about 900 people have flowed through in some way, and about 400 have done a shift and entered data. So 400 Americans have really contributed to this dataset for their fellow folks. On an average week, I think it’s about 250-300 active people, and on a given data shift, there’s only so many slots, so it’s probably like 30 people on a given shift.
EK: On a given day we have the new folks, the checkers, the more experienced folks, the double checkers, and then the shift leads.
AM: Then there’s reporter folks who go out to the states, data infrastructure people, data quality people, and then there’s been about 30 paid staffers for the last few months.
So your project is winding down. While I know you didn’t launch this with the intention to run it forever, you all have created a trusted institution at a time when some of our other institutions have come to be seen as less trustworthy. So why are you ending next month, as opposed to the summer or once we get through the pandemic?
EK: We wanted to wind down as soon as we thought the federal government was doing a good enough job that we could hand it off. That sounds arrogant to say, but let’s be clear, there were deficits. Our orientation has been from the beginning that we would only go as long as we had to. And the reason for that is that we really want people to be looking at, working with, banging on, and using the federal data.
We don’t want to be a barrier to full attention on the federal data. That was really an ethical concern for us: We think it’s properly the role of the federal government. Our data can only get so good, because we’re at the wrong end of the pipeline: We can only look at what’s on public dashboards, and there’s a lot of work on those metrics that happens before they get to the dashboards. The federal government can see things we can’t see, they can do things we can’t do.
AM: We’ve had tons of interactions with states and the federal government to know that people have been making really, really good faith efforts to collect data. It’s easy to say now that the government does not appear to be cooking the books—and has not appeared to be cooking the books—but that was not at all clear through most of 2020.
EK: This has all been very ad hoc, and the people doing this work, whether they’re getting paid or not, they’re doing it because they need to be doing the work for themselves, for their country. They see it as their responsibility but it’s not a sustainable situation. We haven’t ever paid our people what they’re worth. This work should not be done by volunteers.
A real turning point for us was when we decided not to collect vaccine data. That was a strategic and tactical decision, because we wanted to put attention and pressure on the feds to track it.
I understand you’ve given some recommendations to the Biden transition team. Can you say more about that, and why you feel better about passing the torch?
AM: The number one thing is that the people we have been pressuring at HHS to deliver have really been delivering. When we first made contact with people there, they said, “We’d really like to make things more open and transparent.” And we said, “Great, let’s see that.” And week after week we continue to get more and more releases, and information.
EK: One of the things that has happened over the course of this project is that we’ve developed relationships with most of the states, with people in their public health departments, who have really helped us understand what they could and couldn’t do. And something that we’re trying to do now, as we make these recommendations to the federal government, is to include those perspectives and things that we learned about what is actually possible for states, where there are resource problems, tech system problems. We hope these can be seeds for the federal government to do the deep, difficult long-term work of rebuilding the country’s public health infrastructure, which is what it’ll take to do a really good job on the data. That’s a very long project that needs to be done, and it hopefully can be nudged along by the pressure around COVID.
What needs to happen to be better able to track things in the future?
AM: The short answer is that it’s nuts to run a country, from a public health perspective, in the way that we do. Each governor and state control an enormous amount of information. The federal government can request things, even mandate it, but they’re not providing the systems that go along with those mandates. It’s not so much tech capacity, narrowly construed. It’s more like state capacity, and within that there are counties with their own capacity issues.
If we really want to go about fixing this in a deep, systemic way, you build up that capacity from the county level on up. But that does require federal coordination.
EK: Right, the federal government can compel uniform reporting. There are states that can refuse but the feds do have a lot more authority than an outside organization like us to get clean, standardized data. And we’ve seen, like with the federal hospitalization data, that they can do this. We just think they need to better provide resources to support state collection of that data, to help build capacity. I’m sure you’ve seen the reporting about how many people have quit their local public health departments this year all over the country because they’re so burnt out.
When you look back on the project, what were some interesting or particularly meaningful ways you saw the work impact the world?
AM: All of it, but the bottom up way the project hit people is what made us feel particularly good. Like when we’d hear from individual people that their family members had changed their decisions because they were able to see through our data that this was real and they should take it seriously. Also things we heard from the actual people doing heroic work on the frontlines in healthcare.
The thing that was oftentimes dispiriting was seeing how much use the data was getting in governments at all levels. While that should maybe occasionally feel gratifying, it actually felt destabilizing because it made us realize the state of play in the world.
EK: We wanted to help media organizations do accountability reporting, and we did see huge pickups from media organizations, including tons and tons of broadcast stations. That was really meaningful for us, and it was also important to show media organizations that they could trust the data coming from states. We’ve seen very little malfeasance from states. We’ve seen mistakes. We’ve seen big backlogs that made things look weird. But really for the most part we’ve done the work of saying, “Look, you can trust this information.”
But I think also seeing our data cited by two different administrations has been unsettling. The hardest thing on this entire project for me has been when we learned the federal government didn’t have something that we thought they were just sitting on.
Interview has been edited and condensed.