Details: 62 different classes (10 digits, 26 lowercase, 26 uppercase), images are 28 by 28 pixels (with the option to make them all 128 by 128 pixels), 3500 users
Task: Image Classification
Sentiment140
Overview: Text Dataset of Tweets
Details: 660,120 users
Task: Sentiment Analysis
Shakespeare
Overview: Text Dataset of Shakespeare Dialogues
Details: 1129 users (reduced to 660 with our choice of sequence length. See bug.)
Task: Next-Character Prediction
Celeba
Overview: Image Dataset based on the Large-scale CelebFaces Attributes Dataset
Details: 9343 users (we exclude celebrities with fewer than 5 images)
Task: Image Classification (Smiling vs. Not smiling)
Synthetic Dataset
Overview: We propose a process to generate synthetic, challenging federated datasets. The high-level goal is to create devices whose true models are device-dependent. For a description of the full generative process, please refer to the paper (an illustrative sketch follows this entry).
Details: The user can customize the number of devices, the number of classes, and the number of dimensions, among other parameters
Task: Classification
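The exact generative process is the one described in the paper and implemented in this dataset's directory; the snippet below is only a minimal sketch of the underlying idea (each device gets its own "true" linear model, which then labels that device's features), written in plain NumPy. The function names, distributions, and parameters here (generate_device, model_scale, the log-normal sample counts) are illustrative assumptions, not the repository's code.

```python
import numpy as np

def generate_device(num_samples, num_classes, num_dims, rng, model_scale=1.0):
    # Illustrative only: draw a device-specific linear model, then label
    # Gaussian features with it, so each device's true model differs.
    W = rng.normal(0.0, model_scale, size=(num_dims, num_classes))
    b = rng.normal(0.0, model_scale, size=num_classes)
    x = rng.normal(0.0, 1.0, size=(num_samples, num_dims))
    y = np.argmax(x @ W + b, axis=1)
    return x, y

def generate_dataset(num_devices=100, num_classes=5, num_dims=60, seed=0):
    rng = np.random.default_rng(seed)
    data = {}
    for k in range(num_devices):
        # Skew the number of samples per device to mimic federated heterogeneity.
        n_k = int(rng.lognormal(mean=3.0, sigma=1.0)) + 5
        data[f"device_{k}"] = generate_device(n_k, num_classes, num_dims, rng)
    return data
```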
Reddit
Overview: We preprocess the Reddit data released by pushshift.io corresponding to December 2017.
Details: 1,660,820 users with a total of 56,587,343 comments.
Task: Next-word Prediction.
Notes
Install the libraries listed in requirements.txt
For example, with pip, run pip3 install -r requirements.txt
Go to the directory of the respective dataset for instructions on generating the data
On macOS, check that wget is installed and working
The models directory contains instructions on running the baseline reference implementations
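Once a dataset has been generated, the resulting JSON files can be inspected with a few lines of Python. This is a minimal sketch that assumes each generated file uses a users / num_samples / user_data layout (with user_data mapping a user id to its examples); the path in the usage comment is hypothetical, so adjust both to your generated output.

```python
import json
import os

def read_leaf_dir(data_dir):
    """Load every JSON file in data_dir.

    Assumes each file contains the keys 'users', 'num_samples', and
    'user_data'; adjust if your generated files differ.
    """
    users, num_samples, user_data = [], [], {}
    for fname in os.listdir(data_dir):
        if not fname.endswith(".json"):
            continue
        with open(os.path.join(data_dir, fname)) as f:
            blob = json.load(f)
        users.extend(blob["users"])
        num_samples.extend(blob["num_samples"])
        user_data.update(blob["user_data"])
    return users, num_samples, user_data

# Example (hypothetical path): inspect a generated training split.
# users, counts, data = read_leaf_dir("data/femnist/data/train")
# print(len(users), "users,", sum(counts), "samples")
```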