Governments and public health systems need accurate and timely information about the characteristics and behaviour of COVID-19 to respond appropriately to this ongoing public health emergency. Researchers, public health authorities, and the general public benefit from reliable and expeditious data to evaluate the impact of the coronavirus pandemic on health care systems and to plan an appropriate policy response at all levels of government. Currently, governments and policymakers throughout the world are forced to make decisions and take actions based on mathematical models developed for other diseases and/or on the experience of other countries in which the outbreak was detected early and has already run its course. In this situation, high-quality institutional datasets are a prerequisite for the analyses that public health, an inherently data-intensive domain, requires. Effective data quality assessment during data collection helps ensure concordant outcomes across studies conducted worldwide.

There are several institutional repositories of public health data with the capability of electronic data collection and dissemination, such as the datasets of public health information systems (PHIS), each with its own data quality assessment methods and standards. However, poor data quality and coding errors in PHIS are not a new issue and can lead to inaccurate inferences about health interventions. For COVID-19, the multi-source datasets of the World Health Organization (WHO), the European Centre for Disease Prevention and Control (ECDC), and the Chinese Center for Disease Control and Prevention (Chinese CDC) are reputable references for global BI dashboards and academic research, comprising counts of confirmed, death, severe, suspected, and recovered cases. These resources are widely used to monitor trends in the virus outbreak and to assess the risks of the pandemic across countries and regions.
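
To make that kind of monitoring concrete, the sketch below pulls the ECDC daily counts into a tidy table with pandas. It is a minimal illustration only: the CSV endpoint and the column names reflect the format the ECDC exposed during 2020 and are assumptions here, so they should be checked against the current portal before use.

```python
# Minimal sketch: load ECDC daily COVID-19 counts for trend monitoring.
# The CSV endpoint and column names are assumptions based on the 2020 format.
import pandas as pd

ECDC_CSV = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv"  # assumed endpoint

def load_ecdc_daily(url: str = ECDC_CSV) -> pd.DataFrame:
    """Return daily reported cases and deaths per country, sorted by date."""
    df = pd.read_csv(url)
    df["dateRep"] = pd.to_datetime(df["dateRep"], dayfirst=True)  # dates were DD/MM/YYYY
    return (df.rename(columns={"countriesAndTerritories": "country"})
              .loc[:, ["dateRep", "country", "cases", "deaths"]]
              .sort_values(["country", "dateRep"]))

if __name__ == "__main__":
    daily = load_ecdc_daily()
    print(daily.groupby("country")[["cases", "deaths"]].sum().head())
```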

In the recently published paper

Ashofteh, Afshin, and Jorge M. Bravo. "A Study on the Quality of Novel Coronavirus (COVID-19) Official Datasets." Statistical Journal of the IAOS (2020): 1–11. https://doi.org/10.3233/SJI-200674 (full text: https://content.iospress.com/articles/statistical-journal-of-the-iaos/sji200674)

we analysed and compared the quality of the official datasets available for COVID-19. We used comparative statistical analysis to evaluate the accuracy of data collection by one national (Chinese Center for Disease Control and Prevention) and two international (World Health Organization; European Centre for Disease Prevention and Control) organisations, based on the value of systematic measurement errors. We combined Excel files, text mining techniques, and manual data entry to extract the COVID-19 data from official reports and to generate an accurate profile for comparison. The findings show noticeable and increasing measurement errors in the three datasets as the pandemic outbreak expanded and more countries contributed data to the official repositories, raising data comparability concerns and pointing to the need for better coordination and harmonised statistical methods. The study offers a combined COVID-19 dataset and dashboard with minimum systematic measurement errors, as well as valuable insights into the potential problems of using databanks without carefully examining the metadata and additional documentation that describe the overall context of the data.
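
As a rough illustration of this kind of source-to-source comparison, the sketch below aligns cumulative confirmed-case series from two repositories by date and summarises the signed discrepancy between them as a simple proxy for a systematic measurement error. The column names, the bias statistic, and the toy numbers are illustrative assumptions, not the exact method or data of the paper.

```python
# Sketch: compare cumulative confirmed counts reported by two sources and
# summarise the systematic (signed) discrepancy between them.
import pandas as pd

def systematic_discrepancy(a: pd.DataFrame, b: pd.DataFrame,
                           on: str = "date", value: str = "confirmed") -> pd.Series:
    """Signed difference (source A minus source B) on the dates both report."""
    merged = a.merge(b, on=on, suffixes=("_a", "_b")).set_index(on)
    return merged[f"{value}_a"] - merged[f"{value}_b"]

# Toy numbers standing in for, e.g., WHO vs. ECDC cumulative counts.
who = pd.DataFrame({"date": pd.date_range("2020-02-01", periods=4),
                    "confirmed": [14_557, 17_391, 20_630, 24_554]})
ecdc = pd.DataFrame({"date": pd.date_range("2020-02-01", periods=4),
                     "confirmed": [14_628, 17_387, 20_626, 24_550]})

diff = systematic_discrepancy(who, ecdc)
print(diff.describe())            # spread of daily discrepancies
print("mean bias:", diff.mean())  # a persistent offset suggests a systematic error
```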

The dataset and dashboard are available at:

Ashofteh, Afshin; Bravo, Jorge (2020), "COVID-19 data set resulted from a study on the quality of Novel Corona-virus official datasets", Mendeley Data, v1, doi: 10.17632/nw5m4hs3jr.1 (https://dx.doi.org/10.17632/nw5m4hs3jr.1). The accompanying dashboard is published as v2, doi: 10.17632/nw5m4hs3jr.2 (http://dx.doi.org/10.17632/nw5m4hs3jr.2).

The description of the dataset comparisons provides valuable insights into the potential problems of using databanks that aggregate information from many countries without carefully examining the metadata and additional documentation that describe the content and the overall context of the data. Developing guidelines, standards, and ontologies for data documentation is crucial for researchers and policymakers to understand the context in which data are created and collected. Moreover, the changing way in which confirmed cases and deaths have been classified in China points to similar problems that may arise in other countries, and these require careful forensic analysis on a regular basis to understand how definitions are applied and to what extent data are comparable; a simple check of this kind is sketched below. There is a growing need for harmonisation and standardisation of data gathering, reporting, and data analysis processes.
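
One routine "forensic" check could flag days on which the daily increment of a cumulative series deviates sharply from its recent trend, which often marks a change in case definitions or reporting rules rather than a real epidemiological jump. The sketch below is a minimal version of such a check; the rolling-median baseline, the threshold, and the toy series (with an artificial jump standing in for a reclassification event such as the mid-February 2020 change in China) are all assumptions for illustration.

```python
# Sketch: flag daily increments that deviate sharply from the recent trend,
# as a hint that reporting definitions may have changed on that date.
import pandas as pd

def flag_definition_breaks(cumulative: pd.Series, window: int = 7,
                           threshold: float = 5.0) -> pd.Series:
    """Return dates where the daily increase exceeds `threshold` times
    the rolling median of the preceding daily increases."""
    daily = cumulative.diff().dropna()
    baseline = daily.rolling(window, min_periods=3).median().shift(1)
    return daily[daily > threshold * baseline]

# Toy cumulative series with an artificial jump on one reporting day.
idx = pd.date_range("2020-02-05", periods=12)
cum = pd.Series([20_000, 22_000, 24_100, 26_300, 28_500, 30_800, 33_100,
                 35_400, 50_000, 52_300, 54_700, 57_100], index=idx)

print(flag_definition_breaks(cum))  # expected to flag the day of the jump
```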

Although this analysis was conducted at a relatively early stage of the pandemic and additional datasets have become available in the course of time, the discussion on the identification of measurement errors remains timely, useful, and important.
