I come from a strong engineering background, having spent over a decade in the industry as a Software Engineer. Most recently, I was at Meta AI as a Research Engineer in Privacy Preserving ML group. During my time there, I led the development of Opacus, an open-source PyTorch library for training models with Differential Privacy that has gained significant community adoption. I also helped design StopNCII.org, a privacy-preserving platform that uses on-device perceptual hashing to combat non-consensual intimate image sharing.
contact
I echo the standing invitation: I like getting email and I like talking to people. Please do reach out if you’re interested in discussing research, collaboration opportunities, or the future of privacy in AI!
State-of-the-art membership inference attacks (MIAs) typically require training many reference models, making it difficult to scale these attacks to large pre-trained language models (LLMs). As a result, prior research has either relied on weaker attacks that avoid training reference models (e.g., fine-tuning attacks), or on stronger attacks applied to small-scale models and datasets. However, weaker attacks have been shown to be brittle - achieving close-to-arbitrary success - and insights from strong attacks in simplified settings do not translate to today’s LLMs. These challenges have prompted an important question: are the limitations observed in prior work due to attack design choices, or are MIAs fundamentally ineffective on LLMs? We address this question by scaling LiRA - one of the strongest MIAs - to GPT-2 architectures ranging from 10M to 1B parameters, training reference models on over 20B tokens from the C4 dataset. Our results advance the understanding of MIAs on LLMs in three key ways: (1) strong MIAs can succeed on pre-trained LLMs; (2) their effectiveness, however, remains limited (e.g., AUC<0.7) in practical settings; and, (3) the relationship between MIA success and related privacy metrics is not as straightforward as prior work has suggested.
USENIX 2025
Free Record-Level Privacy Risk Evaluation Through Artifact-Based Methods
Joseph Pollock*, Igor Shilov*, Euodia Dodd, and Yves-Alexandre Montjoye
In 34th USENIX Security Symposium (USENIX Security 25), 2025
Membership inference attacks (MIAs) are widely used to empirically assess the privacy risks of samples used to train a target machine learning model. State-of-the-art methods however require training hundreds of shadow models, with the same size and architecture of the target model, solely to evaluate the privacy risk. While one might be able to afford this for small models, the cost often becomes prohibitive for medium and large models. We here instead propose a novel approach to identify the at-risk samples using only artifacts available during training, with little to no additional computational overhead. Our method analyzes individual per-sample loss traces and uses them to identify the vulnerable data samples. We demonstrate the effectiveness of our artifact-based approach through experiments on the CIFAR10 dataset, showing high precision in identifying vulnerable samples as determined by a SOTA shadow model-based MIA (LiRA). Impressively, our method reaches the same precision as another SOTA MIA when measured against LiRA, despite it being orders of magnitude cheaper. We then show LT-IQR to outperform alternative loss aggregation methods, perform ablation studies on hyperparameters, and validate the robustness of our method to the target metric. Finally, we study the evolution of the vulnerability score distribution throughout training as a metric for model-level risk assessment.
SaTML 2025
SoK: Membership Inference Attacks on LLMs are Rushing Nowhere (and How to Fix It)
Matthieu Meeus, Igor Shilov, Shubham Jain, Manuel Faysse, Marek Rei, and Yves-Alexandre Montjoye
In 3rd IEEE Conference on Secure and Trustworthy Machine Learning, 2025
Whether LLMs memorize their training data and what this means, from privacy leakage to detecting copyright violations – has become a rapidly growing area of research over the last two years. In recent months, more than 10 new methods have been proposed to perform Membership Inference Attacks (MIAs) against LLMs. Contrary to traditional MIAs which rely on fixed – but randomized – records or models, these methods are mostly evaluated on datasets collected post-hoc. Sets of members and non-members, used to evaluate the MIA, are constructed using informed guesses after the release of a model. This lack of randomization raises concerns of a distribution shift between members and non-members. In the first part, we review the literature on MIAs against LLMs. While most work focuses on sequence-level MIAs evaluated in post-hoc setups, we show that a range of target models, motivations and units of interest have been considered in the literature. We then quantify distribution shifts present in the 6 datasets used in the literature, ranging from books to papers, using a bag of word classifier. Our analysis reveals that all of them suffer from severe distribution shifts. This challenges the validity of using such setups to measure LLM memorization and may undermine the benchmarking of recently proposed methods. Yet, all hope might not be lost. In the second part, we introduce important considerations to properly evaluate MIAs against LLMs and discuss potential ways forward: randomized test splits, injections of randomized (unique) sequences, randomized finetuning, and post-hoc control methods. While each option comes with its advantages and limitations, we believe they collectively provide solid grounds to guide the development of MIA methods and study LLM memorization. We conclude by proposing comprehensive, easy-to-use benchmarks for sequence- and document-level MIAs against LLMs.
ICML 2024
Copyright Traps for Large Language Models
Matthieu Meeus*, Igor Shilov*, Manuel Faysse, and Yves-Alexandre Montjoye
In Forty-first International Conference on Machine Learning, 2024
Questions of fair use of copyright-protected content to train Large Language Models (LLMs) are being very actively debated. Document-level inference has been proposed as a new task: inferring from black-box access to the trained model whether a piece of content has been seen during training. SOTA methods however rely on naturally occurring memorization of (part of) the content. While very effective against models that memorize a lot, we hypothesize–and later confirm–that they will not work against models that do not naturally memorize, e.g. medium-size 1B models. We here propose to use copyright traps, the inclusion of fictitious entries in original content, to detect the use of copyrighted materials in LLMs with a focus on models where memorization does not naturally occur. We carefully design an experimental setup, randomly inserting traps into original content (books) and train a 1.3B LLM. We first validate that the use of content in our target model would be undetectable using existing methods. We then show, contrary to intuition, that even medium-length trap sentences repeated a significant number of times (100) are not detectable using existing methods. However, we show that longer sequences repeated a large number of times can be reliably detected (AUC=0.75) and used as copyright traps. We further improve these results by studying how the number of times a sequence is seen improves detectability, how sequences with higher perplexity tend to be memorized more, and how taking context into account further improves detectability.
NeurIPS Workshop
Opacus: User-Friendly Differential Privacy Library in PyTorch
Ashkan Yousefpour, Igor Shilov, Alexandre Sablayrolles, Davide Testuggine, Karthik Prasad, Mani Malek, and 6 more authors
In NeurIPS Workshop on Privacy in Machine Learning, 2021
We introduce Opacus, a free, open-source PyTorch library for training deep learning models with differential privacy (hosted at opacus.ai). Opacus is designed for simplicity, flexibility, and speed. It provides a simple and user-friendly API, and enables machine learning practitioners to make a training pipeline private by adding as little as two lines to their code. It supports a wide variety of layers, including multi-head attention, convolution, LSTM, GRU (and generic RNN), and embedding, right out of the box and provides the means for supporting other user-defined layers. Opacus computes batched per-sample gradients, providing higher efficiency compared to the traditional "micro batch" approach. In this paper we present Opacus, detail the principles that drove its implementation and unique features, and benchmark it against other frameworks for training models with differential privacy as well as standard PyTorch.
Opacus: User-Friendly Differential Privacy Library in PyTorch
Tech Lead (Meta)
I was the lead developer and maintainer of Opacus, a PyTorch library to train ML models with Differential Privacy. With 1k+ stars on github and 300+ citations, the tool is helping to advance the state-of-the-art in privacy preserving ML both internally and for the wider community of researchers.
I have lead the team developing a privacy-preserving platform helping combat non-consensual intimate image sharing, a joint effort between Meta and a UK-based NGO running “Revenge Porn Helpline”. The platform takes advantage of on-device perceptual hashing to protect privacy. Since it’s launch, many major social media platforms has signed up as industry partners, inclusing Reddit, Snap, TilTok, OnlyFans and PornHub.