I’m conducting a meta-analysis of observational studies (over 2,000 eligible articles) and, given the complexity and volume of the data, I’m exploring AI or machine learning tools to assist with data extraction.

Unlike RCTs, these studies use a wide range of statistical analyses (e.g., different regression models), and variables are reported inconsistently. For example:

  • It’s often unclear which group is used as the reference in logistic regressions.
  • Age is reported variously as mean, median, or categorized into non-standard age groups.
  • Important variables and outcomes are embedded in narrative text or poorly structured tables.
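
To make the reference-category issue concrete, here is a minimal sketch of the harmonization step I currently do by hand once the reference group has been identified. It assumes a binary exposure (e.g., male vs. female), strictly positive estimates, and a confidence interval that is symmetric on the log scale (a Wald-type 95% CI). The function names `flip_reference` and `log_or_and_se` are my own, not from any package.

```python
import math


def flip_reference(or_value, ci_low, ci_high):
    """Re-express an odds ratio against the opposite reference group.

    Only valid for a two-level exposure: OR(B vs A) = 1 / OR(A vs B).
    The interval bounds are inverted and swapped.
    """
    if min(or_value, ci_low, ci_high) <= 0:
        raise ValueError("odds ratio and CI bounds must be positive")
    return 1.0 / or_value, 1.0 / ci_high, 1.0 / ci_low


def log_or_and_se(or_value, ci_low, ci_high, z=1.96):
    """Convert an OR and its CI to a log odds ratio and standard error.

    Assumption: the CI was built as exp(log OR +/- z * SE), so the SE is
    the log-scale interval width divided by 2 * z.
    """
    log_or = math.log(or_value)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return log_or, se


# Hypothetical study reporting OR = 2.0 (95% CI 1.25 to 4.0) for male vs. female
flipped = flip_reference(2.0, 1.25, 4.0)   # same effect, female vs. male
pooled_input = log_or_and_se(*flipped)     # log OR and SE for pooling
```

Flipping the reference changes the sign of the log odds ratio but leaves its standard error unchanged, which is a useful sanity check on anything a tool extracts.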

Given these challenges, I’m looking for AI/ML tools that can support:

  • Extraction from full-text PDFs, not just abstracts
  • Understanding and interpreting statistical outputs (e.g., odds ratios, reference categories such as male vs. female, regression coefficients, correlations)
  • Managing heterogeneous variable formats across studies
  • Integration with tools like Covidence, Excel, or RevMan (optional but helpful)
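
For context, the baseline I am comparing against is a simple rule-based pass over text already extracted from the PDFs. The sketch below is illustrative only: the pattern covers just a couple of common phrasings (e.g., "aOR 1.45, 95% CI 1.10-1.91"), the sample sentence is made up, and names such as `extract_odds_ratios` are hypothetical. It does not identify the reference group, which is exactly where I hope AI/ML tools can help.

```python
import csv
import io
import re

NUM = r"\d+(?:\.\d+)?"

# Matches e.g. "aOR 1.45, 95% CI 1.10-1.91" or
# "odds ratio = 0.72 (95% confidence interval: 0.55 to 0.94)"
OR_PATTERN = re.compile(
    rf"\b(?P<measure>a?OR|odds ratio)\b\s*[=:,]?\s*(?P<est>{NUM})"
    rf"\s*[\(\[;,]?\s*(?:95\s*%\s*)?(?:CI|confidence interval)\s*[=:,]?\s*"
    rf"(?P<low>{NUM})\s*(?:-|–|to|,)\s*(?P<high>{NUM})",
    re.IGNORECASE,
)

FIELDS = ["measure", "estimate", "ci_low", "ci_high", "needs_review"]


def extract_odds_ratios(text):
    """Return one dict per odds ratio with a CI found in the text."""
    rows = []
    for m in OR_PATTERN.finditer(text):
        est, low, high = (float(m.group(k)) for k in ("est", "low", "high"))
        rows.append({
            "measure": m.group("measure"),
            "estimate": est,
            "ci_low": low,
            "ci_high": high,
            # Flag rows where the estimate falls outside its own interval
            "needs_review": not (low <= est <= high),
        })
    return rows


def to_csv(rows):
    """Serialize rows to CSV text that Excel can open."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = (
    "Male sex was associated with mortality (aOR 1.45, 95% CI 1.10-1.91). "
    "For smoking, the odds ratio = 0.72 (95% confidence interval: 0.55 to 0.94)."
)
rows = extract_odds_ratios(sample)
csv_text = to_csv(rows)
```

A pass like this misses estimates reported only in tables or in unusual phrasings, so every row would still need manual checking against the source article.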

🔍 Have you used any AI or NLP-based tools for complex data extraction in large-scale systematic reviews? I’d appreciate any recommendations, workflows, or lessons learned — particularly from those working with non-interventional studies.

Thanks in advance for your insights!
