Data Cleaning

Data cleaning is essential for ensuring accuracy and removing duplicates or inconsistencies in bibliometric datasets. This process involves standardizing author names, consolidating variations (e.g., "J. Doe" and "John Doe"), and formatting data for compatibility with bibliometric tools like Excel, VOSviewer, or CiteSpace.

Example:

In Excel, duplicate records in a PubMed-exported bibliography can be removed with the built-in "Remove Duplicates" feature. For instance:

  • Articles authored by "John Doe" and "J. Doe" need to be reconciled under a single author name.
  • Similarly, journals named both as "J Dent Res" and "Journal of Dental Research" can be standardized.

Software tools like EndNote or Zotero can further streamline this process by automating duplicate detection.
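The consolidation and deduplication steps above can be sketched in a few lines of Python. The records and the variant-to-canonical mappings below are hypothetical; in practice the mappings would be built manually or from an authority list such as a journal abbreviation index.

```python
# Hypothetical PubMed-exported records: (author, journal, title)
records = [
    ("John Doe", "Journal of Dental Research", "Enamel remineralization"),
    ("J. Doe", "J Dent Res", "Enamel remineralization"),
    ("A. Smith", "J Dent Res", "Fluoride varnish outcomes"),
]

# Mappings that consolidate known variants to one canonical form
# (assumed to be compiled beforehand; not derived automatically here).
author_map = {"J. Doe": "John Doe"}
journal_map = {"J Dent Res": "Journal of Dental Research"}

def standardize(record):
    """Replace known author/journal variants with their canonical form."""
    author, journal, title = record
    return (author_map.get(author, author),
            journal_map.get(journal, journal),
            title)

# Standardize first, then deduplicate while preserving order
# (dict keys keep insertion order in Python 3.7+).
cleaned = list(dict.fromkeys(standardize(r) for r in records))
```

After standardization, the first two records become identical and collapse into one, leaving two unique entries. The same cleaned rows can then be exported for VOSviewer or CiteSpace.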
