Let's compile the online data repositories proposed to enhance the reproducibility of articles. I'll add my options, and you can add yours, with their pros and cons, so we can explore the possibilities and limitations. Here is my list:

CODE

  • GitHub (https://github.com): the most widely used platform for version control and collaborative development, commonly used in academia to share research code and documentation. Pros: Git-based version control, collaborative development, integration with CI/CD tools and academic repositories (e.g., Zenodo DOI integration). Cons: Not designed for data preservation or formal citation, lacks structured metadata for academic datasets and reproducibility standards.
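As one concrete way to use the GitHub–Zenodo link mentioned above: a repository can include a CITATION.cff file, which both GitHub and Zenodo read for citation metadata when a release is archived. The values below are purely illustrative:

```
# CITATION.cff — hypothetical example; field names follow the Citation File Format
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "example-analysis-code"
version: "1.0.0"
date-released: "2024-01-15"
authors:
  - family-names: "Doe"
    given-names: "Jane"
```

With this file in place, a tagged GitHub release archived to Zenodo carries author and version information without extra manual entry.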
  • GitLab (https://gitlab.com): an open-source DevOps platform that supports version-controlled academic code repositories with integrated CI/CD and institutional hosting options. Pros: Stronger privacy controls and self-hosting support, integrated issue tracking and CI, ideal for academic institutions. Cons: Smaller academic user base than GitHub, less third-party integration for academic publishing (e.g., Zenodo linkage not as seamless).
  • Code Ocean (https://codeocean.com): a cloud-based open-science platform for sharing executable research code and data as reproducible "capsules," tailored for academic transparency and open science. Pros: Supports fully reproducible research capsules, DOI assignment, peer review-friendly, journal integrations (e.g., Nature, IEEE). Cons: Limited free tier, commercial platform, smaller user community compared to GitHub.
  • Hugging Face (https://huggingface.co): an open AI hub for sharing machine learning models, datasets, and demos, with a strong community and support for reproducible, citation-ready AI research. Pros: Model hub with versioning and citation, dataset sharing, integration with leading ML frameworks (PyTorch, TensorFlow), strong academic presence. Cons: Primarily focused on NLP and ML communities, less suited for non-AI disciplines, not ideal for storing broader project files or raw research data.
  • Jupyter Notebook + nbviewer / Binder (https://mybinder.org / https://nbviewer.org): Jupyter-based tools like Binder and nbviewer allow researchers to share interactive, executable code notebooks for reproducible computational experiments. Pros: Great for tutorials, reproducibility, and open science education; integrates with GitHub and Zenodo. Cons: Not a repository per se—relies on external hosting (e.g., GitHub), not ideal for long-term archival.
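As a sketch of how Binder is typically used: a public GitHub repository containing notebooks plus a dependency file can be launched interactively via a URL of the following form (the user, repo, and branch names are placeholders):

```
# Binder launch URL pattern ("gh" = GitHub; angle-bracketed parts are placeholders)
https://mybinder.org/v2/gh/<user>/<repo>/<branch>

# A requirements.txt in the repository root tells Binder what to install, e.g.:
numpy
matplotlib
```

This is why Binder pairs naturally with GitHub: the repository itself is the unit of sharing, and Binder only adds the execution environment.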
  • Dockstore (https://dockstore.org): a platform for sharing bioinformatics tools and workflows using Docker and CWL/WDL, widely adopted in genomics and biomedical research. Pros: Standards-compliant (e.g., GA4GH), strong reproducibility, integration with major cloud platforms and bioinformatics pipelines. Cons: Limited to life sciences; requires workflow language expertise (WDL, CWL, Nextflow).
DATA SHARE

  • Figshare (https://figshare.com): a general-purpose open-access repository that allows researchers to upload and share a wide variety of research outputs, including datasets, figures, and presentations. Pros: DOI assignment for every upload, accepts nearly any output type, versioning, easy to use. Cons: Limited storage for free users, commercial ownership (part of Digital Science) may raise concerns for some institutions.
  • Zenodo (https://zenodo.org): an open-access research data repository developed by CERN and OpenAIRE, designed to support sharing of data, software, and publications. Pros: Free to use, provides DOIs, strong integration with GitHub, EU-supported, and non-commercial. Cons: File size limit (50GB per upload), less customization of metadata compared to others.
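For programmatic uploads, Zenodo also exposes a REST API. The sketch below only builds the JSON metadata payload that API expects; the field names (title, upload_type, description, creators) come from Zenodo's deposition API, while the helper function name and sample values are my own illustration:

```python
# Sketch: build the "metadata" payload for a Zenodo deposition.
# build_zenodo_metadata is a hypothetical helper; the nested field names
# follow Zenodo's REST deposition API.
def build_zenodo_metadata(title, creator_names, description):
    return {
        "metadata": {
            "title": title,
            "upload_type": "dataset",  # e.g. "dataset", "software", "publication"
            "description": description,
            "creators": [{"name": name} for name in creator_names],
        }
    }

# Hypothetical usage with placeholder values:
payload = build_zenodo_metadata(
    "Example survey data", ["Doe, Jane"], "Raw responses for a pilot study."
)
print(payload["metadata"]["upload_type"])  # dataset
```

The payload would then be sent to the deposition endpoint with an access token; the actual upload step is omitted here.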
  • Dryad (https://datadryad.org): a curated, non-profit repository for data underlying scientific and medical publications. Pros: Focus on data curation, compliance with journal and funder requirements, and DOI assignment. Cons: Submission fee required, primarily designed for datasets associated with published research.
  • IEEE DataPort (https://ieee-dataport.org): IEEE DataPort is a data repository focused on datasets in engineering, technology, and computer science, supporting both open and subscription-based access. Pros: Supports very large datasets (up to 2TB), DOI assignments, and integration with IEEE publications. Cons: Some features require a subscription, and there is a narrower disciplinary focus.
  • Harvard Dataverse (https://dataverse.harvard.edu): an open-source data repository platform for sharing, citing, and preserving research data, widely used by academic institutions worldwide. Pros: Institutional support, versioning, DOI assignment, and support for metadata standards. Cons: Primarily geared toward datasets rather than other research outputs; requires setup for institutional hosting unless Harvard's instance is used.
  • Mendeley Data (https://data.mendeley.com): an Elsevier-hosted repository for sharing datasets across disciplines, with an emphasis on citation and collaboration. Pros: Easy integration with Elsevier journals, DOI assignment, and support for large files. Cons: Commercial ownership (Elsevier), limited customization for metadata and access controls.
Please let me know what you think about these options. Which ones do you use? If any options are missing from the list, please share them!
