Federated Learning (FL) with Differential Privacy (DP) is an advanced approach that allows training models across decentralized data sources while ensuring individual data privacy. To facilitate research and development in this field, several datasets are commonly used. Here are some of the best datasets for federated learning with differential privacy:

MNIST (Modified National Institute of Standards and Technology database): Description: A large database of handwritten digits commonly used for training image processing systems. Usage: Due to its simplicity and well-understood nature, MNIST is often used for initial testing of FL and DP algorithms (a client-partitioning sketch appears after the list below). Link: MNIST Dataset
CIFAR-10 and CIFAR-100 (Canadian Institute For Advanced Research): Description: CIFAR-10 consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class; CIFAR-100 is similar but has 100 classes with 600 images each. Usage: Used to evaluate the performance of FL and DP algorithms on more complex image classification tasks. Link: CIFAR-10 and CIFAR-100
Fashion-MNIST: Description: A dataset of Zalando's article images, intended as a drop-in replacement for the original MNIST dataset but with greater classification difficulty. Usage: Provides a more challenging benchmark for image classification tasks in FL and DP settings. Link: Fashion-MNIST
EMNIST: Description: An extension of MNIST that adds handwritten letters alongside digits. Usage: Useful for FL tasks that go beyond digit classification to include character recognition. Link: EMNIST Dataset
Shakespeare: Description: A text dataset derived from the works of William Shakespeare, used for text prediction tasks. Usage: Commonly used for evaluating FL algorithms on natural language processing (NLP) tasks. Link: Shakespeare Dataset
Google Landmark Dataset: Description: A large-scale dataset for landmark recognition and retrieval. Usage: Suitable for evaluating FL algorithms on large-scale image recognition tasks. Link: Google Landmark Dataset
LFW (Labeled Faces in the Wild): Description: A database of face photographs designed for studying the problem of unconstrained face recognition. Usage: Good for experimenting with privacy-preserving techniques in face recognition tasks. Link: LFW Dataset
PUMS (Public Use Microdata Sample): Description: Datasets provided by the U.S. Census Bureau that contain individual records from the census, anonymized to protect privacy. Usage: Ideal for socio-economic and demographic research under FL and DP frameworks. Link: PUMS Dataset
Texas Hospital Discharge Data: Description: Hospital discharge data including patient demographics, diagnoses, and treatments. Usage: Useful for healthcare-related federated learning scenarios that require robust privacy protections (a DP-SGD training sketch appears after this list). Link: Texas Hospital Discharge Data
MovieLens: Description: A collection of datasets from the MovieLens website containing user ratings of movies. Usage: Suitable for recommendation system research, allowing evaluation of FL and DP methods in collaborative filtering. Link: MovieLens Dataset
FEMNIST (Federated Extended MNIST): Description: A federated version of EMNIST in which the handwritten characters are partitioned by writer, so each writer naturally acts as a client. Usage: Specifically designed for federated learning research, making it highly relevant for experiments with DP (see the federated loading sketch after this list). Link: FEMNIST Dataset
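Centralized benchmarks such as MNIST, CIFAR-10/100, and Fashion-MNIST have no inherent client structure, so federated experiments usually simulate clients by partitioning the training set. Below is a minimal sketch of one common approach, a Dirichlet-based non-IID split; the client count, the concentration parameter alpha, and the use of torchvision for loading are illustrative assumptions rather than part of any dataset above.

```python
import numpy as np
from torchvision import datasets, transforms

def dirichlet_partition(labels, num_clients=100, alpha=0.5, seed=0):
    """Split example indices into non-IID client shards.

    For each class, its examples are divided among clients according to
    proportions drawn from a Dirichlet(alpha) distribution; smaller alpha
    gives more skewed (more non-IID) clients.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]
    for cls in np.unique(labels):
        cls_idx = np.where(labels == cls)[0]
        rng.shuffle(cls_idx)
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        # Cumulative proportions become split points within this class.
        split_points = (np.cumsum(proportions)[:-1] * len(cls_idx)).astype(int)
        for client_id, shard in enumerate(np.split(cls_idx, split_points)):
            client_indices[client_id].extend(shard.tolist())
    return client_indices

if __name__ == "__main__":
    # MNIST here; CIFAR-10/100 or Fashion-MNIST work the same way.
    train = datasets.MNIST("./data", train=True, download=True,
                           transform=transforms.ToTensor())
    shards = dirichlet_partition(train.targets, num_clients=100, alpha=0.5)
    print("client 0 holds", len(shards[0]), "examples")
```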
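Datasets that already ship with a per-user split, such as Federated EMNIST, Shakespeare, and FEMNIST, can be loaded directly in federated form. The sketch below assumes TensorFlow Federated is installed and uses its tff.simulation.datasets helpers; the element keys and download behavior should be verified against the installed version.

```python
import tensorflow_federated as tff

# Federated EMNIST: images are already grouped by the writer who produced them.
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()
print("number of writers (clients):", len(emnist_train.client_ids))

# Each client id maps to a tf.data.Dataset of that client's examples.
first_client = emnist_train.client_ids[0]
client_ds = emnist_train.create_tf_dataset_for_client(first_client)
for example in client_ds.take(1):
    print(example["label"], example["pixels"].shape)

# The Shakespeare dataset is exposed the same way, with one client per speaking role.
shakespeare_train, shakespeare_test = tff.simulation.datasets.shakespeare.load_data()
```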
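On the privacy side, a common pattern is to make each client's local update differentially private with DP-SGD (per-example gradient clipping plus Gaussian noise). The following sketch assumes PyTorch with the Opacus library; the toy model, noise multiplier, and clipping norm are illustrative choices, not recommended settings.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-in for one client's local MNIST shard (28x28 grayscale images).
images = torch.rand(512, 1, 28, 28)
labels = torch.randint(0, 10, (512,))
loader = DataLoader(TensorDataset(images, labels), batch_size=64)

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
optimizer = optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# Attach DP-SGD: per-example gradient clipping plus Gaussian noise.
privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=loader,
    noise_multiplier=1.0,   # illustrative; tune for the target epsilon
    max_grad_norm=1.0,      # per-example gradient clipping bound
)

for epoch in range(1):          # one local epoch of the client update
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

# Report the privacy budget spent so far (delta is an illustrative choice).
print("epsilon:", privacy_engine.get_epsilon(delta=1e-5))
```

In a full FL pipeline, the resulting noisy local updates would then be aggregated by the server, for example with federated averaging, optionally combined with secure aggregation.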
When selecting datasets for federated learning with differential privacy, it’s crucial to consider the specific research goals, such as the type of data (images, text, numerical), the complexity of the task, and the need for privacy preservation. These datasets provide a good starting point for exploring various aspects of FL and DP in different application domains.