e-mail Google Scholar LinkedIn Twitter Resume

Veselin Stoyanov

Applied Research Leader excited about turning current powerful AI models into new products that enhance productivity and creativity. Strong track record of innovating in AI and NLP to solve real-world problems. Led teams to create industry-standard pretraining and large LM methods such as RoBERTa, XLM-R and MultiRay and apply them to improve products, e.g., reduce the prevalence of hate speech and bullying posts. Experienced in building and motivating high-performing diverse teams and mentoring researchers and engineers.

Experience

Tome AI

Head of ML/AI: Apr 2023 - current

Leading efforts to develop new paradigms for AI-powered products in the productivity space.

Facebook / Meta Inc, Menlo Park, CA

Applied Research Scientist Manager: Jul 2018 - Nov 2022
Research Scientist: Jan 2013 - Jul 2018

Project Highlights

MultiRay
Built a service to run multiple very large and accurate models on the same input, and share the majority of the computational costs. MultiRay makes it possible for very accurate self-supervised models to be run on every piece of content. (paper, blog)
Cross-lingual NLP through XLM-R
Trained XLM-R, a state-of-the-art large-scale multilingual language model (paper, blog) and applied it to extend Integrity classifiers to many languages (blog). Built on previous work on multilingual word embeddings (blog).
RoBERTa and applications to Integrity
Trained RoBERTa, a robustly optimized BERT pretraining approach and a state-of-the-art self-supervised method (paper, blog, blog). Applied it to identifying violations such as hate speech (blog) and bullying (paper, blog).
Neural Machine Translation
Shipped the first large-scale commercial Neural MT system with big improvements to translation quality. (blog, news)
NLP for Search
Shipped several impactful NLP features to Facebook Search including phonetic name search, intent classification and keyword typeahead.

Center for Language and Speech Processing (CLSP), Johns Hopkins University

Assistant Research Scientist: Oct 2010 - Jan 2013

Computing Innovation Fellowship (awarded by CRA).
Performed research on Machine Learning for Structured Prediction.

Education

Cornell University

PhD in Computer Science: Aug 2010
MSc in Computer Science: Aug 2006

Advisor: Prof. Claire Cardie. Thesis title: Opinion Summarization: Automatically Creating Useful Representations of Opinions Expressed in Text.

University of Delaware

Honors BSc, with Distinction in Computer Science: May 2002

Graduated Summa Cum Laude; GPA: 4.00/4.00. Minors in Mathematics and Cognitive Science.

Selected Publications

Full publication list available on Google Scholar

RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov
BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension
Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer
Unsupervised Cross-lingual Representation Learning at Scale
Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov
XNLI: Evaluating Cross-lingual Sentence Representations
Alexis Conneau, Guillaume Lample, Ruty Rinott, Adina Williams, Samuel R Bowman, Holger Schwenk, Veselin Stoyanov
Supervised Contrastive Learning for Pre-trained Language Model Fine-tuning
Beliz Gunel, Jingfei Du, Alexis Conneau, Veselin Stoyanov
Emerging Cross-lingual Structure in Pretrained Language Models
Alexis Conneau, Shijie Wu, Haoran Li, Luke Zettlemoyer, Veselin Stoyanov
Pretrained Encyclopedia: Weakly supervised knowledge-pretrained language model
Wenhan Xiong, Jingfei Du, William Wang, Veselin Stoyanov
Preserving integrity in online social networks
Alon Halevy, Cristian Canton-Ferrer, Hao Ma, Umut Ozertem, Patrick Pantel, Marzieh Saeidi, Fabrizio Silvestri, Veselin Stoyanov
Empirical risk minimization of graphical model parameters given approximate inference, decoding, and model structure
Veselin Stoyanov, Alexander Ropson, Jason Eisner
Conundrums in noun phrase coreference resolution: Making sense of the state-of-the-art
Veselin Stoyanov, Nathan Gilbert, Claire Cardie, Ellen Riloff

Personal

Outside of work, I am an avid runner. I enjoy cooking, all things culinary, and traveling. I love learning languages and can speak Bulgarian, English, Spanish, and some Russian, Serbian, Croatian and Japanese.
