Kayo Yin
Hi, I'm Kayo
/kajo iɴ/
kayoyin🥸berkeley.edu
Vitae ⋅ Bio ⋅ Fun
Anonymous feedback
Hello! I'm a PhD student at UC Berkeley advised by Jacob Steinhardt and Dan Klein, affiliated with Berkeley AI Research and Berkeley NLP.
I work on AI alignment and NLP for signed languages, and I also dabble in linguistics and cognitive science. I am grateful to be supported by a Future of Life PhD fellowship.
I did my master's at Carnegie Mellon University where I was fortunate to be advised by Graham Neubig, and I did my undergrad in math+cs at École Polytechnique. I also interned at Microsoft Research and DeepMind. In a previous life, I wanted to become a classical musician and I have a CEM from Conservatoire Frédéric Chopin.
Misc: I was born in Kobe, Japan and grew up in Paris, France. I took bagpipe lessons at CMU. I'm learning American Sign Language. I like to play music, backcountry snowboard, road bike, and do jigsaw puzzles. Here are some of my favorite books.
Upcoming events:
2025-12-06 Attending NeurIPS to co-organize the Mechanistic Interpretability Workshop (San Diego, CA).
2025-10-10 Attending COLM to co-organize PragLM (Montréal, Canada).
2025-07-15 Attending ICML to present Which Attention Heads Matter for In-Context Learning? (Vancouver, Canada).
Past news:
2025-04-03 Gave an invited talk at the Stanford NLP Seminar.
2025-02-24 Gave an invited talk at CMU Accessibility Lunch Seminar.
2025-01-14 Gave a stage presentation at TISLR.
2024-12-14 Co-organized the SoLaR Workshop at NeurIPS.
2024-10-23 Gave an invited talk at NLPコロキウム.
2024-07-27 Co-organized the Mechanistic Interpretability Workshop at ICML.
2024-06-19 Gave an invited talk at University of Melbourne.
2024-05-02 Gave an invited talk at EPFL.
2023-10-27 Gave an invited talk at Université Laval.
2023-07-10 Extremely thrilled to receive the Best Resource Paper award at ACL 2023!
2023-05-15 I started my internship at Microsoft Research! Ping me if you want to meet up in NYC :)
2023-04-28 Gave an invited talk at KUNGFU.AI.
2023-04-26 Gave an invited talk at Sony CSL.
2023-02-10 Gave an invited talk at the University of Chicago and Toyota Technological Institute at Chicago.
2022-12-19 Gave an invited talk at the University of Melbourne.
2022-12-11 Extremely honored to receive the Best Paper Honorable Mention award at EMNLP 2022!
2022-08-19 Gave an invited talk at the Workshop on Pronouns and Machine Translation.
2022-07-27 Gave an invited presentation at IJCAI on Including Signed Languages in NLP. My first in-person conference yay!
2022-07-09 Gave a keynote talk at the Queer in AI Workshop @NAACL.
2022-06-06 I started my internship at DeepMind! If you're in London this summer, let's meet up :)
2022-05-19 Guested on the NLP Highlights Podcast.
2022-04-15 I will join UC Berkeley for my PhD next Fall!
2021-11-05 Gave an invited talk at DeepMind on Natural Language Processing for Signed Languages.
2021-10-07 Gave an invited talk at University of Pittsburgh on Extending Neural Machine Translation to Dialogue and Signed Languages.
2021-09-23 Extremely honored to be selected as a Siebel Scholar Class of 2022!
2021-09-17 Gave an invited talk at SIGTYP on Understanding, Improving and Evaluating Context Usage in Context-aware Machine Translation.
2021-07-05 Extremely thrilled to receive the Best Theme Paper award at ACL 2021!
2021-03-01 Gave an invited talk at Unbabel on Do Context-Aware Translation Models Pay the Right Attention?
2020-10-18 Gave an invited talk at Computer Vision Talks on Sign Language Translation with Transformers.
2020-09-21 Extremely honored to be awarded Global Winner in Computer Science at The Global Undergraduate Awards 2020!
2020-08-31 Started my Master's degree at CMU LTI!
Publications
* = equal contribution
-
Which Attention Heads Matter for In-Context Learning?
Kayo Yin, Jacob Steinhardt.
ICML 2025.
PDF ⋅ Code ⋅ Tweet
Large language models (LLMs) exhibit impressive in-context learning (ICL) capability, enabling them to perform new tasks using only a few demonstrations in the prompt. Two different mechanisms have been proposed to explain ICL: induction heads that find and copy relevant tokens, and function vector (FV) heads whose activations compute a latent encoding of the ICL task. To better understand which of the two distinct mechanisms drives ICL, we study and compare induction heads and FV heads in 12 language models. Through detailed ablations, we discover that few-shot ICL performance depends primarily on FV heads, especially in larger models. In addition, we uncover that FV and induction heads are connected: many FV heads start as induction heads during training before transitioning to the FV mechanism. This leads us to speculate that induction facilitates learning the more complex FV mechanism that ultimately drives ICL.
@article{yin-icl-heads,
title = {Which Attention Heads Matter for In-Context Learning?},
author = {Yin, Kayo and Steinhardt, Jacob},
journal = {2025 International Conference on Machine Learning},
month = {July},
year = {2025}
}
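To make the ablation methodology concrete, here is a minimal sketch (my own illustration, not the paper's code) of zero-ablating a single attention head in GPT-2 with HuggingFace transformers. The layer and head indices are arbitrary placeholders, and the paper studies 12 models with more careful ablations than this toy setup.

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER, HEAD = 5, 1  # hypothetical head to ablate
head_dim = model.config.n_embd // model.config.n_head

def ablate_head(module, inputs):
    # inputs[0] holds the concatenated per-head outputs, shape (batch, seq, n_embd).
    # Zeroing one head's slice before the output projection removes exactly that
    # head's additive contribution to the residual stream.
    hidden = inputs[0].clone()
    hidden[..., HEAD * head_dim:(HEAD + 1) * head_dim] = 0.0
    return (hidden,)

handle = model.transformer.h[LAYER].attn.c_proj.register_forward_pre_hook(ablate_head)
prompt = "france -> paris\njapan -> tokyo\nitaly ->"
with torch.no_grad():
    out = model.generate(**tok(prompt, return_tensors="pt"), max_new_tokens=3)
handle.remove()
print(tok.decode(out[0]))

Comparing few-shot accuracy with and without such interventions, across candidate induction and FV heads, is the basic shape of the head-importance experiments described above.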
-
Understanding In-context Learning of Addition via Activation Subspaces
Xinyan Hu, Kayo Yin, Michael I. Jordan, Lijie Chen, Jacob Steinhardt.
arXiv 2025.
To perform in-context learning, language models must extract signals from individual few-shot examples, aggregate these into a learned prediction rule, and then apply this rule to new examples. How is this implemented in the forward pass of modern transformer models? To study this, we consider a structured family of few-shot learning tasks for which the true prediction rule is to add an integer k to the input. We find that Llama-3-8B attains high accuracy on this task for a range of k, and localize its few-shot ability to just three attention heads. We further show that each head represents the extracted signals in a six-dimensional subspace, where four of the dimensions track the unit digit and the other two dimensions track overall magnitude. We finally examine how these heads extract information from individual few-shot examples, identifying a self-correction mechanism in which mistakes from earlier examples are suppressed by later examples. Our findings shed light on the computational structure of pretrained transformer models and suggest that tasks are localizable to low-dimensional activation subspaces.
@article{hu-icl-add,
title = {Understanding In-context Learning of Addition via Activation Subspaces},
author = {Hu, Xinyan and Yin, Kayo and Jordan, Michael I. and Chen, Lijie and Steinhardt, Jacob},
journal = {Preprint},
month = {May},
year = {2025}
}
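As a rough illustration of how such a low-dimensional subspace might be localized, the sketch below runs PCA (via SVD) over cached head activations; the data here is a synthetic stand-in, and the paper's actual procedure may differ.

import numpy as np

rng = np.random.default_rng(0)
n_prompts, d_model, k_dims = 500, 128, 6

# Synthetic stand-in: activations = signal confined to a 6-dim subspace + noise.
basis = np.linalg.qr(rng.normal(size=(d_model, k_dims)))[0]
acts = rng.normal(size=(n_prompts, k_dims)) @ basis.T
acts += 0.1 * rng.normal(size=(n_prompts, d_model))

# PCA by SVD of the centered activation matrix.
centered = acts - acts.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
var_explained = (s ** 2) / (s ** 2).sum()
print("variance captured by top 6 components:", var_explained[:6].sum())

# Projecting onto the top-6 subspace preserves nearly all of the signal,
# mirroring the finding that the task-relevant information is low-dimensional.
proj = centered @ vt[:6].T @ vt[:6]
print("relative reconstruction error:", np.linalg.norm(proj - centered) / np.linalg.norm(centered))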
-
SHADES: Towards a Multilingual Assessment of Stereotypes in Large Language Models
Margaret Mitchell, ... Kayo Yin ... Aurélie Névéol, Zeerak Talat, et al.
NAACL 2025.
PDF ⋅ Code
Large Language Models (LLMs) reproduce and exacerbate the social biases present in their training data, and resources to quantify this issue are limited. While research has attempted to identify and mitigate such biases, most efforts have been concentrated around English, lagging the rapid advancement of LLMs in multilingual settings. In this paper, we introduce a new multilingual parallel dataset SHADES to help address this issue, designed for examining culturally-specific stereotypes that may be learned by LLMs. The dataset includes stereotypes from 20 regions around the world and 16 languages, spanning multiple identity categories subject to discrimination worldwide. We demonstrate its utility in a series of exploratory evaluations for both “base” and “instruction-tuned” language models. Our results suggest that stereotypes are consistently reflected across models and languages, with some languages and models indicating much stronger stereotype biases than others.
@article{mitchell-etal-2025-shades,
title = "{SHADES}: Towards a Multilingual Assessment of Stereotypes in Large Language Models",
author = "Mitchell, Margaret and Attanasio, Giuseppe and Baldini, Ioana and Clinciu, Miruna and Clive, Jordan and Delobelle, Pieter and Dey, Manan and
Hamilton, Sil and Dill, Timm and Doughman, Jad and Dutt, Ritam and Ghosh, Avijit and Forde, Jessica Zosa and Holtermann, Carolin and
Kaffee, Lucie-Aim{\'e}e and Laud, Tanmay and Lauscher, Anne and Lopez-Davila, Roberto L and Masoud, Maraim and Nangia, Nikita and
Ovalle, Anaelia and Pistilli, Giada and Radev, Dragomir and Savoldi, Beatrice and Raheja, Vipul and Qin, Jeremy and
Ploeger, Esther and Subramonian, Arjun and Dhole, Kaustubh and Sun, Kaiser and Djanibekov, Amirbek and Mansurov, Jonibek and
Yin, Kayo and Cueva, Emilio Villa and Mukherjee, Sagnik and Huang, Jerry and Shen, Xudong and Gala, Jay and Al-Ali, Hamdan and
Tair Djanibekov and Mukhituly, Nurdaulet and Nie, Shangrui and Sharma, Shanya and Stanczak, Karolina and Szczechla, Eliza and
Timponi Torrent, Tiago and Tunuguntla, Deepak and Viridiano, Marcelo and Van Der Wal, Oskar and Yakefu, Adina and Zhang, Mike and Zink, Sydney and
N{\'e}v{\'e}ol, Aur{\'e}lie and Talat, Zeerak",
journal = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics",
month = {April},
year = {2025}
}
-
Evaluating the Diversity and Quality of LLM Generated Content
Alexander Shypula, Shuo Li, Botong Zhang, Vishakh Padmakumar, Kayo Yin, Osbert Bastani.
COLM 2025.
PDF
Recent work suggests that preference-tuning techniques--including Reinforcement Learning from Human Preferences (RLHF) methods like PPO and GRPO, as well as alternatives like DPO--reduce diversity, creating a dilemma given that such models are widely deployed in applications requiring diverse outputs. To address this, we introduce a framework for measuring effective semantic diversity--diversity among outputs that meet quality thresholds--which better reflects the practical utility of large language models (LLMs). Using open-ended tasks that require no human intervention, we find counterintuitive results: although preference-tuned models--especially those trained via RL--exhibit reduced lexical and syntactic diversity, they produce greater effective semantic diversity than SFT or base models, not from increasing diversity among high-quality outputs, but from generating more high-quality outputs overall. We discover that preference tuning reduces syntactic diversity while preserving semantic diversity--revealing a distinction between diversity in form and diversity in content that traditional metrics often overlook. Our analysis further shows that smaller models are consistently more parameter-efficient at generating unique content within a fixed sampling budget, offering insights into the relationship between model scaling and diversity. These findings have important implications for applications that require diverse yet high-quality outputs, from creative assistance to synthetic data generation.
@article{shypula2025evaluating,
title={Evaluating the Diversity and Quality of LLM Generated Content},
author = {Shypula, Alexander and Li, Shuo and Zhang, Botong and Padmakumar, Vishakh and Yin, Kayo and Bastani, Osbert},
journal = {Conference on Language Modeling (COLM)},
month = {October},
year = {2025}
}
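One plausible way to operationalize "effective semantic diversity" as described above (my hedged reading, not necessarily the paper's exact metric) is to filter generations by a quality threshold and measure pairwise semantic distance among the survivors:

import numpy as np

def effective_semantic_diversity(embeddings, quality, threshold=0.5):
    # Mean pairwise cosine distance among outputs whose quality >= threshold;
    # returns 0.0 when fewer than two outputs clear the bar.
    keep = embeddings[quality >= threshold]
    if len(keep) < 2:
        return 0.0
    unit = keep / np.linalg.norm(keep, axis=1, keepdims=True)
    sims = unit @ unit.T
    iu = np.triu_indices(len(keep), k=1)
    return float(np.mean(1.0 - sims[iu]))

rng = np.random.default_rng(0)
emb = rng.normal(size=(20, 64))  # stand-in sentence embeddings
qual = rng.uniform(size=20)      # stand-in quality scores in [0, 1]
print(effective_semantic_diversity(emb, qual))

Under this reading, a model can score higher either by spreading out its high-quality outputs or simply by producing more outputs that clear the threshold, which is the distinction the abstract draws.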
-
ASL STEM Wiki: Dataset and Benchmark for Interpreting STEM Articles
Kayo Yin, Chinmay Singh, Fyodor O Minakov, Vanessa Milan, Hal Daumé III, Cyril Zhang, Alex Xijie Lu, Danielle Bragg.
EMNLP 2024.
PDF ⋅ Code ⋅ Video ⋅ Tweet
Deaf and hard-of-hearing (DHH) students face significant barriers in accessing science, technology, engineering, and mathematics (STEM) education, notably due to the scarcity of STEM resources in signed languages. To help address this, we introduce ASL STEM Wiki: a parallel corpus of 254 Wikipedia articles on STEM topics in English, interpreted into 300 hours of American Sign Language (ASL). ASL STEM Wiki is the first continuous signing dataset focused on STEM, facilitating the development of AI resources for STEM education in ASL. We identify several use cases of ASL STEM Wiki with human-centered applications. For example, because this dataset highlights the frequent use of fingerspelling in technical language, which inhibits DHH students' ability to learn, we develop models to identify fingerspelled signs---which may later be used to query for appropriate ASL signs to suggest to interpreters.
@inproceedings{yin24emnlp,
title = {{ASL} {STEM} Wiki: Dataset and Benchmark for Interpreting {STEM} Articles},
author = {Yin, Kayo and Singh, Chinmay and O Minakov, Fyodor and Milan, Vanessa and Daum{\'e} III, Hal and Zhang, Cyril and Xijie Lu, Alex and Bragg, Danielle},
booktitle = {Annual Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {November},
year = {2024}
}
-
Using Language Models to Disambiguate Lexical Choices in Translation
Josh Barua, Sanjay Subramanian, Kayo Yin, Alane Suhr.
EMNLP 2024.
PDF ⋅ Code ⋅ Tweet
In translation, a concept represented by a single word in a source language can have multiple variations in a target language. We introduce a dataset and evaluate language models on the task of lexical selection, which requires using context to identify which variation is most appropriate for a source text. We work with native speakers of nine languages, including seven low-resource languages, to collect a dataset of 1,377 sentence pairs that exhibit cross-lingual concept variation when translating from English. We evaluate recent LLMs and neural machine translation systems on lexical selection, with the best-performing model, GPT-4, achieving from 67 to 85% accuracy across languages. Finally, we use language models to generate English rules describing target-language concept variations. Providing weaker models with high-quality lexical rules improves accuracy substantially, in some cases reaching or outperforming GPT-4.
@inproceedings{barua24emnlp,
title = {Using Language Models to Disambiguate Lexical Choices in Translation},
author = {Barua, Josh and Subramanian, Sanjay and Yin, Kayo and Suhr, Alane},
booktitle = {Annual Conference on Empirical Methods in Natural Language Processing (EMNLP)},
month = {November},
year = {2024}
}
-
American Sign Language Handshapes Reflect Pressures for Communicative Efficiency
Kayo Yin, Terry Regier, Dan Klein.
ACL 2024.
PDF ⋅ Code ⋅ Tweet
Communicative efficiency is a key topic in linguistics and cognitive psychology, with many studies demonstrating how the pressure to communicate with minimal effort guides the form of natural language. However, this phenomenon is rarely explored in signed languages. This paper shows how handshapes in American Sign Language (ASL) reflect these efficiency pressures and provides new evidence of communicative efficiency in the visual-gestural modality.
We focus on hand configurations in native ASL signs and signs borrowed from English to compare efficiency pressures from both ASL and English usage. First, we develop new methodologies to quantify the articulatory effort needed to produce handshapes and the perceptual effort required to recognize them. Then, we analyze correlations between communicative effort and usage statistics in ASL or English. Our findings reveal that frequent ASL handshapes are easier to produce and that pressures for communicative efficiency mostly come from ASL usage, rather than from English lexical borrowing.
@inproceedings{yin24acl,
title = {American Sign Language Handshapes Reflect Pressures for Communicative Efficiency},
author = {Yin, Kayo and Regier, Terry and Klein, Dan},
booktitle = {Annual Conference of the Association for Computational Linguistics (ACL)},
month = {August},
year = {2024}
}
-
🏆 Best Resource Paper
When Does Translation Require Context? A Data-driven, Multilingual Exploration
Patrick Fernandes*, Kayo Yin*, Emmy Liu, André F. T. Martins, Graham Neubig.
ACL 2023.
PDF ⋅ Code ⋅ Video ⋅ Tweet
Although proper handling of discourse phenomena significantly contributes to the quality of machine translation (MT), common translation quality metrics do not adequately capture them. Recent works in context-aware MT attempt to target a small set of these phenomena during evaluation. In this paper, we propose a new metric, P-CXMI, which allows us to identify translations that require context systematically and confirm the difficulty of previously studied phenomena as well as uncover new ones that have not been addressed in previous work. We then develop the Multilingual Discourse-Aware (MuDA) benchmark, a series of taggers for these phenomena in 14 different language pairs, which we use to evaluate context-aware MT. We find that state-of-the-art context-aware MT models find marginal improvements over context-agnostic models on our benchmark, which suggests current models do not handle these ambiguities effectively. We release code and data to invite the MT research community to increase efforts on context-aware translation on discourse phenomena and languages that are currently overlooked.
@inproceedings{fernandes23acl,
title = {When Does Translation Require Context? A Data-driven, Multilingual Exploration},
author = {Fernandes, Patrick and Yin, Kayo and Liu, Emmy and Martins, Andr{\'e} F. T. and Neubig, Graham},
booktitle = {Annual Conference of the Association for Computational Linguistics (ACL)},
month = {July},
year = {2023}
}
-
🏆 Best Paper Runner-Up
Interpreting Language Models with Contrastive Explanations
Kayo Yin, Graham Neubig.
EMNLP 2022.
PDF ⋅ Code ⋅ Video ⋅ Tweet
Model interpretability methods are often used to explain NLP model decisions on tasks such as text classification, where the output space is relatively small. However, when applied to language generation, where the output space often consists of tens of thousands of tokens, these methods are unable to provide informative explanations. Language models must consider various features to predict a token, such as its part of speech, number, tense, or semantics. Existing explanation methods conflate evidence for all these features into a single explanation, which is less interpretable for human understanding. To disentangle the different decisions in language modeling, we focus on explaining language models contrastively: we look for salient input tokens that explain why the model predicted one token instead of another. We demonstrate that contrastive explanations are quantifiably better than non-contrastive explanations in verifying major grammatical phenomena, and that they significantly improve contrastive model simulatability for human observers. We also identify groups of contrastive decisions where the model uses similar evidence, and we are able to characterize what input tokens models use during various language generation decisions.
@inproceedings{yin2022interpreting,
title = "Interpreting Language Models with Contrastive Explanations",
author = "Yin, Kayo and Neubig, Graham",
booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = dec,
year = "2022",
}
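The contrastive idea is straightforward to sketch: attribute the difference between the logits of a target token and a foil token, rather than the probability of a single token. Below is a minimal gradient-times-input version for GPT-2 (an illustration under my own assumptions, not the paper's released code; the example tokens are arbitrary).

import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The keys to the cabinet"
target, foil = " are", " is"  # plural vs. singular continuation
ids = tok(prompt, return_tensors="pt").input_ids
embeds = model.transformer.wte(ids).detach().requires_grad_(True)

logits = model(inputs_embeds=embeds).logits[0, -1]
contrast = logits[tok.encode(target)[0]] - logits[tok.encode(foil)[0]]
contrast.backward()

# Gradient x input, summed over the hidden dimension, scores each input token
# for how much it pushes the model toward " are" rather than " is".
saliency = (embeds.grad[0] * embeds[0]).sum(-1)
for t, s in zip(tok.convert_ids_to_tokens(ids[0]), saliency.tolist()):
    print(f"{t:>12s} {s:+.3f}")

A non-contrastive explanation would instead attribute the target logit alone, conflating evidence for number, tense, and semantics, which is what the contrastive formulation disentangles.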
-
Signed Coreference Resolution
Kayo Yin, Kenneth DeHaan, Malihe Alikhani.
EMNLP 2021.
PDF ⋅ Code ⋅ Video
Coreference resolution is key to many natural language processing tasks and yet has only been explored for spoken languages. In signed languages, space is primarily used to establish reference. Solving coreference resolution for signed languages would not only enable higher-level Sign Language Processing systems, but also enhance our understanding of language in different modalities and of situated references, which are key problems in studying grounded language. In this paper, we: (1) introduce Signed Coreference Resolution, a new challenge for coreference modeling and Sign Language Processing; (2) collect an annotated corpus of German Sign Language with gold labels for coreference together with an annotation software for the task; (3) explore features of hand gesture, iconicity, and spatial situated properties and move forward to propose a set of linguistically informed heuristics and unsupervised models for the task; (4) put forward several proposals about ways to address the complexities of this challenge effectively. Finally, we invite the NLP community to collaborate with signing communities and direct efforts towards SCR to close this gap.
@inproceedings{yin-etal-2021-signed,
title = "Signed Coreference Resolution",
author = "Yin, Kayo and DeHaan, Kenneth and Alikhani, Malihe",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.405",
pages = "4950--4961",
}
-
When is Wall a Pared and when a Muro?: Extracting Rules Governing Lexical Selection
Aditi Chaudhary, Kayo Yin, Antonios Anastasopoulos, Graham Neubig.
EMNLP 2021.
PDF ⋅ Code ⋅ Tweet
Learning fine-grained distinctions between vocabulary items is a key challenge in learning a new language. For example, the noun ``wall'' has different lexical manifestations in Spanish -- ``pared'' refers to an indoor wall while ``muro'' refers to an outside wall. However, this variety of lexical distinction may not be obvious to non-native learners unless the distinction is explained in such a way. In this work, we present a method for automatically identifying fine-grained lexical distinctions, and extracting rules explaining these distinctions in a human- and machine-readable format. We confirm the quality of these extracted rules in a language learning setup for two languages, Spanish and Greek, where we use the rules to teach non-native speakers when to translate a given ambiguous word into its different possible translations.
@inproceedings{chaudhary21emnlp,
title = "When is Wall a Pared and when a Muro?: Extracting Rules Governing Lexical Selection",
author = "Chaudhary, Aditi and Yin, Kayo and Anastasopoulos, Antonios and Neubig, Graham",
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
month = nov,
year = "2021",
}
-
🏆 Best Theme Paper
Including Signed Languages in Natural Language Processing
Kayo Yin, Amit Moryossef, Julie Hochgesang, Yoav Goldberg, Malihe Alikhani.
ACL 2021.
PDF ⋅ Video ⋅ Tweet
Signed languages are the primary means of communication for many deaf and hard of hearing individuals. Since signed languages exhibit all the fundamental linguistic properties of natural language, we believe that tools and theories of Natural Language Processing (NLP) are crucial towards its modeling. However, existing research in Sign Language Processing (SLP) seldom attempt to explore and leverage the linguistic organization of signed languages. This position paper calls on the NLP community to include signed languages as a research area with high social and scientific impact. We first discuss the linguistic properties of signed languages to consider during their modeling. Then, we review the limitations of current SLP models and identify the open challenges to extend NLP to signed languages. Finally, we urge (1) the adoption of an efficient tokenization method; (2) the development of linguistically-informed models; (3) the collection of real-world signed language data; (4) the inclusion of local signed language communities as an active and leading voice in the direction of research.
@inproceedings{yin-etal-2021-including,
title = "Including Signed Languages in Natural Language Processing",
author = "Yin, Kayo and Moryossef, Amit and Hochgesang, Julie and Goldberg, Yoav and Alikhani, Malihe",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.570",
pages = "7347--7360",
abstract = "Signed languages are the primary means of communication for many deaf and hard of hearing individuals. Since signed languages exhibit all the fundamental linguistic properties of natural language, we believe that tools and theories of Natural Language Processing (NLP) are crucial towards its modeling. However, existing research in Sign Language Processing (SLP) seldom attempt to explore and leverage the linguistic organization of signed languages. This position paper calls on the NLP community to include signed languages as a research area with high social and scientific impact. We first discuss the linguistic properties of signed languages to consider during their modeling. Then, we review the limitations of current SLP models and identify the open challenges to extend NLP to signed languages. Finally, we urge (1) the adoption of an efficient tokenization method; (2) the development of linguistically-informed models; (3) the collection of real-world signed language data; (4) the inclusion of local signed language communities as an active and leading voice in the direction of research.",
}
-
Do Context-Aware Translation Models Pay the Right Attention?
Kayo Yin, Patrick Fernandes, Danish Pruthi, Aditi Chaudhary, André F. T. Martins, Graham Neubig.
ACL 2021.
PDF ⋅ Code ⋅ Video ⋅ Tweet
Context-aware machine translation models are designed to leverage contextual information, but often fail to do so. As a result, they inaccurately disambiguate pronouns and polysemous words that require context for resolution. In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words? Are models paying large amounts of attention to the same context? What if we explicitly train them to do so? To answer these questions, we introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations that professional translators found useful for pronoun disambiguation. Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words. Furthermore, we measure the degree of alignment between the model's attention scores and the supporting context from SCAT, and apply a guided attention strategy to encourage agreement between the two.
@inproceedings{yin-etal-2021-context,
title = "Do Context-Aware Translation Models Pay the Right Attention?",
author = "Yin, Kayo and Fernandes, Patrick and Pruthi, Danish and Chaudhary, Aditi and Martins, Andr{\'e} F. T. and Neubig, Graham",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.65",
pages = "788--801",
abstract = "Context-aware machine translation models are designed to leverage contextual information, but often fail to do so. As a result, they inaccurately disambiguate pronouns and polysemous words that require context for resolution. In this paper, we ask several questions: What contexts do human translators use to resolve ambiguous words? Are models paying large amounts of attention to the same context? What if we explicitly train them to do so? To answer these questions, we introduce SCAT (Supporting Context for Ambiguous Translations), a new English-French dataset comprising supporting context words for 14K translations that professional translators found useful for pronoun disambiguation. Using SCAT, we perform an in-depth analysis of the context used to disambiguate, examining positional and lexical characteristics of the supporting words. Furthermore, we measure the degree of alignment between the model{'}s attention scores and the supporting context from SCAT, and apply a guided attention strategy to encourage agreement between the two.",
}
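For intuition, a guided attention objective of the kind mentioned above can be sketched as a KL penalty between the model's attention over source tokens and a distribution built from human-marked supporting words (an assumed formulation, not necessarily the paper's exact loss):

import torch
import torch.nn.functional as F

def guided_attention_loss(attn, support_mask):
    # attn: (batch, src_len) model attention when translating an ambiguous word.
    # support_mask: (batch, src_len), 1.0 on tokens translators marked as supporting.
    human = support_mask / support_mask.sum(dim=-1, keepdim=True)
    return F.kl_div(attn.clamp_min(1e-9).log(), human, reduction="batchmean")

attn = torch.softmax(torch.randn(2, 8), dim=-1)   # stand-in attention weights
mask = torch.zeros(2, 8)
mask[:, [1, 4]] = 1.0                             # stand-in SCAT-style annotations
print(guided_attention_loss(attn, mask))          # added to the usual NMT loss

Minimizing this term alongside the translation loss encourages the model to attend to the same context that professional translators found useful.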
-
Measuring and Increasing Context Usage in Context-Aware Machine Translation
Patrick Fernandes, Kayo Yin, Graham Neubig, André F. T. Martins.
ACL 2021.
PDF ⋅ Code ⋅ Tweet
Recent work in neural machine translation has demonstrated both the necessity and feasibility of using inter-sentential context -- context from sentences other than those currently being translated. However, while many current methods present model architectures that theoretically can use this extra context, it is often not clear how much they do actually utilize it at translation time. In this paper, we introduce a new metric, conditional cross-mutual information, to quantify the usage of context by these models. Using this metric, we measure how much document-level machine translation systems use particular varieties of context. We find that target context is referenced more than source context, and that conditioning on a longer context has a diminishing effect on results. We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models. Experiments show that our method increases context usage and that this reflects on the translation quality according to metrics such as BLEU and COMET, as well as performance on anaphoric pronoun resolution and lexical cohesion contrastive datasets.
@inproceedings{fernandes-etal-2021-measuring,
title = "Measuring and Increasing Context Usage in Context-Aware Machine Translation",
author = "Fernandes, Patrick and Yin, Kayo and Neubig, Graham and Martins, Andr{\'e} F. T.",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
month = aug,
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.acl-long.505",
pages = "6467--6478",
abstract = "Recent work in neural machine translation has demonstrated both the necessity and feasibility of using inter-sentential context, context from sentences other than those currently being translated. However, while many current methods present model architectures that theoretically can use this extra context, it is often not clear how much they do actually utilize it at translation time. In this paper, we introduce a new metric, conditional cross-mutual information, to quantify usage of context by these models. Using this metric, we measure how much document-level machine translation systems use particular varieties of context. We find that target context is referenced more than source context, and that including more context has a diminishing affect on results. We then introduce a new, simple training method, context-aware word dropout, to increase the usage of context by context-aware models. Experiments show that our method not only increases context usage, but also improves the translation quality according to metrics such as BLEU and COMET, as well as performance on anaphoric pronoun resolution and lexical cohesion contrastive datasets.",
}
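Reconstructed from the abstract (the paper's exact definition may differ in its conditioning details), conditional cross-mutual information compares the model's likelihood of the reference translation with and without the extra context:

% LaTeX sketch: q is the translation model, x the current source sentence,
% y the reference translation, and C the additional (inter-sentential) context.
\[
\mathrm{CXMI}(C \rightarrow Y \mid X)
  = \mathbb{E}_{(x,\, y,\, C)}\!\left[
      \log \frac{q(y \mid x, C)}{q(y \mid x)}
    \right]
\]
% A larger value means the model assigns higher likelihood to the reference
% once it can see C, i.e. it actually uses the context.

Context-aware word dropout is then a training-time intervention whose effect, per the abstract, is to push this quantity up.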
-
Data Augmentation for Sign Language Gloss Translation
Amit Moryossef*, Kayo Yin*, Graham Neubig, Yoav Goldberg.
MTSummit 2021 Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL).
PDF
Sign language translation (SLT) is often decomposed into video-to-gloss recognition and gloss-to-text translation, where a gloss is a sequence of transcribed spoken-language words in the order in which they are signed. We focus here on gloss-to-text translation, which we treat as a low-resource neural machine translation (NMT) problem. However, unlike traditional low-resource NMT, gloss-to-text translation differs because gloss-text pairs often have a higher lexical overlap and lower syntactic overlap than pairs of spoken languages. We exploit this lexical overlap and handle syntactic divergence by proposing two rule-based heuristics that generate pseudo-parallel gloss-text pairs from monolingual spoken language text. By pre-training on the thus obtained synthetic data, we improve translation from American Sign Language (ASL) to English and German Sign Language (DGS) to German by up to 3.14 and 2.20 BLEU, respectively.
@inproceedings{moryossef-etal-2021-data,
title = "Data Augmentation for Sign Language Gloss Translation",
author = "Moryossef, Amit and Yin, Kayo and Neubig, Graham and Goldberg, Yoav",
booktitle = "Proceedings of the 1st International Workshop on Automatic Translation for Signed and Spoken Languages (AT4SSL)",
month = aug,
year = "2021",
address = "Virtual",
publisher = "Association for Machine Translation in the Americas",
url = "https://aclanthology.org/2021.mtsummit-at4ssl.1",
pages = "1--11",
abstract = "Sign language translation (SLT) is often decomposed into video-to-gloss recognition and gloss to-text translation, where a gloss is a sequence of transcribed spoken-language words in the order in which they are signed. We focus here on gloss-to-text translation, which we treat as a low-resource neural machine translation (NMT) problem. However, unlike traditional low resource NMT, gloss-to-text translation differs because gloss-text pairs often have a higher lexical overlap and lower syntactic overlap than pairs of spoken languages. We exploit this lexical overlap and handle syntactic divergence by proposing two rule-based heuristics that generate pseudo-parallel gloss-text pairs from monolingual spoken language text. By pre-training on this synthetic data, we improve translation from American Sign Language (ASL) to English and German Sign Language (DGS) to German by up to 3.14 and 2.20 BLEU, respectively.", }
-
🏆 Global Undergraduate Award
Better Sign Language Translation with STMC-Transformer
Kayo Yin, Jesse Read.
COLING 2020.
PDF ⋅ Code
Sign Language Translation (SLT) first uses a Sign Language Recognition (SLR) system to extract sign language glosses from videos. Then, a translation system generates spoken language translations from the sign language glosses. This paper focuses on the translation system and introduces the STMC-Transformer which improves on the current state-of-the-art by over 5 and 7 BLEU respectively on gloss-to-text and video-to-text translation of the PHOENIX-Weather 2014T dataset. On the ASLG-PC12 corpus, we report an increase of over 16 BLEU. We also demonstrate the problem in current methods that rely on gloss supervision. The video-to-text translation of our STMC-Transformer outperforms translation of GT glosses. This contradicts previous claims that GT gloss translation acts as an upper bound for SLT performance and reveals that glosses are an inefficient representation of sign language. For future SLT research, we therefore suggest an end-to-end training of the recognition and translation models, or using a different sign language annotation scheme.
@inproceedings{yin-read-2020-better,
title = "Better Sign Language Translation with {STMC}-Transformer",
author = "Yin, Kayo and Read, Jesse",
booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
month = dec,
year = "2020",
address = "Barcelona, Spain (Online)", publisher = "International Committee on Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.coling-main.525",
doi = "10.18653/v1/2020.coling-main.525",
pages = "5975--5989",
}
-
Sign Language Translation with Transformers
Kayo Yin, Jesse Read.
ECCV 2020 Workshop on Sign Language Recognition, Translation and Production (SLRTP).
PDF ⋅ Code ⋅ Video
This paper improves the translation system in Sign Language Translation (SLT) by using Transformers. We report a wide range of experimental results for various Transformer setups and introduce a novel end-to-end SLT system combining Spatial-Temporal Multi-Cue (STMC) and Transformer networks. Our methodology improves on the current state-of-the-art by over 5 and 7 BLEU respectively on ground truth (GT) glosses and predicted glosses of the PHOENIX-Weather 2014T dataset. On the ASLG-PC12 corpus, we report an improvement of over 16 BLEU. Our findings also reveal that end-to-end translation with predicted glosses outperforms translation on GT glosses. This shows the potential for further improvement in SLT by either jointly training the SLR and translation systems or by revising the gloss annotation scheme.
@inproceedings{yin2020attention,
title={{Attention is All You Sign: Sign Language Translation with Transformers}},
author={Yin, Kayo and Read, Jesse},
booktitle={Sign Language Recognition, Translation and Production (SLRTP) Workshop-Extended Abstracts},
volume={4},
year={2020}
}
Talks
2025
- Talk NLP for Signed Languages: Challenges and Opportunities (Stanford NLP Seminar)
- Talk AI for Signed Languages: Challenges and Opportunities (CMU Accessibility Lunch Seminar)
- Talk Pressures for Communicative Efficiency in American Sign Language (TISLR)
2024
- Talk Natural Language Processing for Signed Languages [手話の自然言語処理] (NLPコロキウム)
- Talk Natural Language Processing for Signed Languages (University of Melbourne)
- Talk Natural Language Processing for Signed Languages (EPFL)
2023
- Talk Including Signed Languages in Natural Language Processing [L'inclusion des langues des signes dans le traitement du langage naturel] - v0 (Université Laval)
- Talk Interpreting Language Models with Contrastive Explanations (KUNGFU.AI)
- Talk Natural Language Processing for Signed Languages - v0 (Sony CSL)
- Talk Natural Language Processing for Signed Languages - v0 (University of Chicago & Toyota Technological Institute at Chicago)
2022
- Talk Interpreting Language Models with Contrastive Explanations (University of Melbourne)
- Talk Understanding, Improving and Evaluating Context Usage in Context-aware Translation (Workshop on Pronouns and Machine Translation)
- Talk Queer Impostor Syndrome (Queer in AI Workshop @ NAACL)
- Talk Including Signed Languages in NLP (NLP Highlights)
- Interview Student Spotlight: Kayo Yin
2021
- Talk Natural Language Processing for Signed Languages - v0 (DeepMind)
- Interview Alumni Stories: Kayo Yin, Class of 2017
- Talk Extending Neural Machine Translation to Dialogue and Signed Languages (University of Pittsburgh)
- Talk Understanding, Improving and Evaluating Context Usage in Context-aware Machine Translation (SIGTYP)
- Interview LTI Master's Student Urges NLP Focus on Signed Languages
- Talk Do Context-Aware Translation Models Pay the Right Attention? (Unbabel)
2020
- Talk Sign Language Translation with Transformers (UA Global Summit)
- Interview Bachelor of Science Alumni - Kayo at Carnegie Mellon University
- Talk Sign Language Translation with Transformers (Computer Vision Talks)
- Interview Graduate of l’X Bachelor of Science, Kayo Yin wins the 2020 Global Undergraduate Awards
- Interview Testimony: Kayo Yin, BX2020 Maths/Computer Science Student
Selected Awards
- 2023 Best Resource Paper, ACL
- 2023-2027 Future of Life Fellowship
- 2022 Best Paper Honorable Mention, EMNLP
- 2022-2023 Berkeley Fellowship
- 2021-2022 Siebel Scholarship
- 2021 Best Theme Paper, ACL
- 2020-2022 Carnegie Mellon University Research Fellowship
- 2020 Global Winner, The Global Undergraduate Awards
- 2015 Gold medal, Concours Kangourou des Mathématiques (6th place out of 13011)
- 2012 Gold medal, Concours Kangourou des Mathématiques (5th place out of 53937)
Copyright © Kayo Yin 2021-2025
Last updated January 1, 2024
