Lessons from the COVID data wizards

Published: March 25, 2022

In March 2020, Beth Blauer started hearing anecdotally that COVID-19 was disproportionately affecting Black people in the United States. But the numbers to confirm that disparity were “very limited”, says Blauer, a data and public-policy specialist at Johns Hopkins University in Baltimore, Maryland. So, her team, which had developed one of the most popular tools for tracking the spread of COVID-19 around the world, added a new graphic to their website: a colour-coded map tracking which US states were — and were not — sharing infection and death data broken down by race and ethnicity.

They posted the map to their data dashboard — the Coronavirus Resource Center — in mid-April 2020 and promoted it through social media and blogs. At the time, just 26 states included racial information with their death data. “Then we started to see the map rapidly filling in,” says Blauer. By the middle of May 2020, 40 states were reporting that information. For Blauer, the change showed that people were paying attention. “And it confirmed that we have the ability to influence what’s happening here,” she says.

COVID-19 dashboards mushroomed around the world in 2020 as data scientists and journalists shifted their work to tracking and presenting information on the pandemic — from infection and death rates, to vaccination data and other variables. “You didn’t have any data set before that was so essential to how you plan your life,” says Lisa Charlotte Muth, a data designer and blogger at Datawrapper, a Berlin-based company that helps newsrooms and journalists to enrich their reporting with embeddable charts. “The weather, maybe, was the closest thing you could compare it to.” The growth in the service’s popularity was impressive. In January 2020 — before the pandemic — Datawrapper had 260 million chart views on its clients’ websites. By April that year, that monthly figure had shot up to more than 4.7 billion.

Policymakers, too, have leaned on COVID-19 data dashboards and charts to guide important decisions. And they had hundreds of local and global examples to reference, including academic enterprises such as the Coronavirus Resource Center, as well as government websites and news-media projects.

The New York City Department of Health was among Datawrapper’s clients. And Blauer notes that she has hosted regular webinars with several US mayors, walking them through her team’s metrics. She is confident, she says, that the “data informed policy”.

The architects of these dashboards put in long hours and faced considerable challenges, including incomplete and inconsistent data, misconceptions and misunderstandings about how the information was collected, and efforts to twist the messages that the dashboards present. As these data wranglers continue to try to inform individuals and public-health officials, they are learning lessons that will help to navigate the next stage of the pandemic, as well as other social and public-health issues — from crime to climate change.

Hard data

The Johns Hopkins dashboard originated at a meeting between Lauren Gardner, who studies civil and systems engineering at Johns Hopkins, and her PhD student, Ensheng Dong. In early January 2020, Dong began closely following cases of a type of pneumonia emerging in his home country, China. “He was hearing things directly from his friends and family,” says Gardner.

Dong was concerned for their well-being, and began pulling data from a Chinese website, DXY.cn. He and Gardner spent days and nights tracking the information on a Google sheet. Then they built a map to go alongside that dynamic spreadsheet and made both available to the public. “We literally decided this one afternoon and built the initial version of the dashboard that night,” says Gardner. “It seemed like a manageable, simple task, given the scale of the problem at the time. Of course, we didn’t know the scale that this would grow to.” Just weeks later, the website had upwards of 4 billion queries a day.

Gardner and Dong eventually moved the data to a GitHub repository, a cloud-based data-storage and -management space that maintains a file history. Their initial global map, with its recognizable red dots proportional to case counts, is still updated every hour. Blauer and others joined the effort early and expanded it with a multi-layered, interactive dashboard to help people digest the data.

Ideally, data that are this important for public health should be freely available, machine-readable and standardized. From the start, the team realized that they were not. Compiling complete and consistent COVID-19 data was “very manual and very messy”, says Gardner. “We were scrambling, collecting and validating reported data as fast as we could.” Because COVID-19 data were not yet provided on any public-health agency’s website, they looked elsewhere, including on Facebook and Twitter posts and in one-off news and media announcements. Even after agencies launched official data pages, both sourcing and formatting remained an issue. Gardner says that some of the data the team collects are still not machine-readable. “There should be a standardized way in which the data is provided and shared publicly, as well as what is shared,” says Gardner. “That would’ve made our job a lot easier.”
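As a rough illustration of what "machine-readable and standardized" could mean in practice, the Python sketch below writes records to CSV using one fixed set of column names. The schema (date, region, cases, deaths), the function name and the placeholder values are assumptions made for illustration; they are not the format used by the Johns Hopkins team or by any public-health agency.

```python
import csv
import io

# Hypothetical shared schema: every publisher would use the same column names.
FIELDS = ["date", "region", "cases", "deaths"]


def to_standard_csv(records):
    """Serialize records to CSV text with a fixed header and column order.

    A record that lacks one of the agreed fields raises KeyError, so gaps
    are caught at publication time rather than silently passed downstream.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow({field: record[field] for field in FIELDS})
    return buffer.getvalue()


# Placeholder values only, not real counts.
example = [{"date": "2020-04-15", "region": "XX", "cases": 0, "deaths": 0}]
print(to_standard_csv(example))
```

A file in this shape can be read by a script without manual copying, which is the property Gardner describes as missing from much of the early reporting.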

Lauren Gardner sits at a conference table in front of a monitor showing the Johns Hopkins COVID dashboard.

Blauer has been vocal, blogging about the need for greater accessibility and standardization of data, including the use of consistent categories and naming conventions for age, gender, race and ethnicity. “Demographic data is a complete mess,” she says. Racial and ethnic categories are tricky because they are regarded differently in different countries. But even in a single US state, Blauer found category definitions changed depending on whether they were linked to vaccination rates, cases or deaths. She has made creative moves to fill in the blanks, such as when her team revealed which states were and weren’t collecting race and ethnicity data. Blauer and her team have confirmed that the pandemic has had inequitable impacts. As of September 2021, for example, Black residents of Washington DC made up 45% of the population, but 76% of COVID-19-associated deaths.
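One simple way to express the size of that gap is to divide a group's share of deaths by its share of the population. The short Python sketch below does this for the Washington DC figures quoted above; the function name and the choice of this particular ratio are illustrative assumptions, not a metric attributed to Blauer's team.

```python
def representation_ratio(share_of_deaths: float, share_of_population: float) -> float:
    """Return a group's share of deaths divided by its share of the population.

    A value of 1.0 means deaths are proportional to population; values above
    1.0 mean the group is over-represented among deaths.
    """
    if share_of_population <= 0:
        raise ValueError("share_of_population must be positive")
    return share_of_deaths / share_of_population


# Figures quoted in the article for Black residents of Washington DC
# (as of September 2021): 45% of the population, 76% of deaths.
ratio = representation_ratio(0.76, 0.45)
print(f"{ratio:.2f}")  # prints 1.69
```

On these figures, the share of deaths is roughly 1.7 times the share of the population.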

The Johns Hopkins team was not alone in its struggles. Hannah Ritchie, head of research at the non-profit organization Our World in Data in Oxford, UK, recalls her efforts to copy data from PDFs. She also points to some of the consequences of incomplete and inconsistent data. For example, differences in access to COVID-19 testing data can result in inaccurate comparisons. “It can often lead you to conclude that some countries have not been touched by the pandemic,” says Ritchie. “That is just not true.”

Ritchie also fears that the gains that have been made in data collection and visualization could easily be lost before the global pandemic is over. “A lot of these data projects are seen as one-off things,” she says. “As rich countries start to get more back to normal because of high vaccination rates, for example, will they turn around and just let these projects die?” Some dashboards have already stopped their efforts. And government efforts to collect and display data in real time are slowing in many parts of the world.

The above is excerpted from an article published in Nature on March 23, 2022. View the full article.
