About
I am a postdoctoral researcher at the Institute for Logic, Language and Computation (ILLC) at the University of Amsterdam. My interests lie at the intersection of natural language processing and probabilistic modelling. My research is part of the European UTTER project.
Current Interests
- Probabilistic Modelling
- Natural Language Generation
- Minimum Bayes Risk
Selected Publications
An Approximate Sampler for Energy-Based Models with Divergence Diagnostics
Bryan Eikema, Germán Kruszewski, Christopher R. Dance, Hady Elsahar, Marc Dymetman in Transactions on Machine Learning Research, 2022
Energy-based models (EBMs) allow flexible specifications of probability distributions. However, sampling from EBMs is non-trivial, usually requiring approximate techniques such as Markov chain Monte Carlo (MCMC). A major downside of MCMC sampling is that it is often impossible to compute the divergence of the sampling distribution from the target distribution: therefore, the quality of the samples cannot be guaranteed. Here, we introduce quasi-rejection sampling (QRS), a simple extension of rejection sampling that performs approximate sampling, but, crucially, does provide divergence diagnostics (in terms of f-divergences, such as KL divergence and total variation distance). We apply QRS to sampling from discrete EBMs over text for controlled generation. We show that we can sample from such EBMs with arbitrary precision in exchange for sampling efficiency and quantify the trade-off between the two by means of the aforementioned diagnostics.
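A minimal Python sketch of the idea (illustrative, not the paper's implementation): draw from a tractable proposal q, accept each draw x with probability min(1, P(x)/(beta q(x))), and reuse the importance weights to estimate how far the induced sampling distribution is from the target. The function names are mine, and the TVD estimator below is one self-normalised importance-sampling instantiation of the kind of diagnostic the abstract refers to.

import math
import random

def qrs_sample(log_P, log_q, sample_q, beta, n):
    """Quasi-rejection sampling (sketch). Unlike exact rejection sampling,
    beta need not upper-bound P(x)/q(x); accepted samples then follow
    p_beta(x) proportional to min(P(x), beta * q(x)), which approaches the
    target p(x) proportional to P(x) as beta grows, at the cost of a lower
    acceptance rate."""
    accepted = []
    while len(accepted) < n:
        x = sample_q()  # x ~ q
        log_accept = min(0.0, log_P(x) - math.log(beta) - log_q(x))
        if random.random() < math.exp(log_accept):
            accepted.append(x)
    return accepted

def tvd_estimate(log_w, beta):
    """Estimate TVD(p, p_beta) from log importance weights
    log_w[i] = log P(x_i) - log q(x_i) of proposal samples x_i ~ q,
    using the identity p(x)/p_beta(x) = (Z_beta / Z) * max(1, w(x)/beta)."""
    w = [math.exp(lw) for lw in log_w]   # importance weights (sketch: ignores overflow)
    w_b = [min(wi, beta) for wi in w]    # truncated weights min(w, beta)
    Z = sum(w) / len(w)                  # estimates Z (up to a shared constant)
    Z_b = sum(w_b) / len(w_b)            # estimates Z_beta (same constant)
    return 0.5 * sum(wb * abs((Z_b / Z) * max(1.0, wi / beta) - 1.0)
                     for wi, wb in zip(w, w_b)) / (Z_b * len(w))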
Sampling-Based Approximations to Minimum Bayes Risk Decoding for Neural Machine Translation
Bryan Eikema and Wilker Aziz in Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
In NMT we search for the mode of the model distribution to form predictions. The mode and other high-probability translations found by beam search have been shown to often be inadequate in a number of ways. This prevents improving translation quality through better search, as these idiosyncratic translations end up selected by the decoding algorithm, a problem known as the beam search curse. Recently, an approximation to minimum Bayes risk (MBR) decoding has been proposed as an alternative decision rule that would likely not suffer from the same problems. We analyse this approximation and establish that it has no equivalent to the beam search curse. We then design approximations that decouple the cost of exploration from the cost of robust estimation of expected utility. This allows for much larger hypothesis spaces, which we show to be beneficial. We also show that mode-seeking strategies can aid in constructing compact sets of promising hypotheses and that MBR is effective in identifying good translations in them. We conduct experiments on three language pairs varying in amounts of resources available: English into and from German, Romanian, and Nepali.
@inproceedings{eikema-aziz-2022-sampling,
    title = "Sampling-Based Approximations to Minimum Bayes Risk Decoding for Neural Machine Translation",
    author = "Eikema, Bryan and
      Aziz, Wilker",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Online and Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
}
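In code, the sampling-based decision rule amounts to a few lines (a sketch under my own naming, not the paper's implementation); the point the paper makes is that the candidate set and the pseudo-references used to estimate expected utility can be generated, and sized, independently.

def mbr_decode(candidates, references, utility):
    """Sampling-based MBR (sketch): return the candidate h maximising the
    Monte Carlo estimate of expected utility,
    (1/N) * sum over references r of utility(h, r).

    candidates: hypothesis set, e.g. built by beam search or sampling
    references: unbiased samples from the model p(y|x)
    utility:    any sentence-level similarity, e.g. chrF"""
    def expected_utility(h):
        return sum(utility(h, r) for r in references) / len(references)
    return max(candidates, key=expected_utility)

With sacrebleu, for instance, this could be called as mbr_decode(candidates, samples, lambda h, r: sacrebleu.sentence_chrf(h, [r]).score).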
Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation
Bryan Eikema and Wilker Aziz in Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020 (Best Paper Award)
Recent studies have revealed a number of pathologies of neural machine translation (NMT) systems. Hypotheses explaining these mostly suggest there is something fundamentally wrong with NMT as a model or its training algorithm, maximum likelihood estimation (MLE). Most of this evidence was gathered using maximum a posteriori (MAP) decoding, a decision rule aimed at identifying the highest-scoring translation, i.e. the mode. We argue that the evidence corroborates the inadequacy of MAP decoding more than it casts doubt on the model and its training algorithm. In this work, we show that translation distributions do reproduce various statistics of the data well, but that beam search strays from such statistics. We show that some of the known pathologies and biases of NMT are due to MAP decoding and not to NMT's statistical assumptions or MLE. In particular, we show that the most likely translations under the model accumulate so little probability mass that the mode can be considered essentially arbitrary. We therefore advocate for the use of decision rules that take into account the translation distribution holistically. We show that an approximation to minimum Bayes risk decoding gives competitive results confirming that NMT models do capture important aspects of translation well in expectation.
@inproceedings{eikema-aziz-2020-is,
title = "Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation",
author = "Eikema, Bryan and
Aziz, Wilker",
booktitle = "Proceedings of the 28th International Conference on Computational Linguistics",
month = dec,
year = "2020",
address = "Barcelona, Spain",
publisher = "Association for Computational Linguistics",
}
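To make the probability-mass argument concrete, a back-of-the-envelope check along the paper's lines (hypothetical model interface; sample_fn and log_prob_fn stand in for any NMT system that supports ancestral sampling and scoring):

import math
from collections import Counter

def inspect_mode_mass(sample_fn, log_prob_fn, beam_output, n=1000):
    """Sketch, not the paper's code. sample_fn() returns one unbiased
    (ancestral) sample y ~ p(y|x) as a string; log_prob_fn(y) returns
    log p(y|x) under the model; beam_output is the approximate mode
    found by beam search. If p(beam_output) is tiny and hardly any
    sample repeats, the mode carries negligible probability mass."""
    samples = [sample_fn() for _ in range(n)]
    top, freq = Counter(samples).most_common(1)[0]
    print(f"p(beam output) under the model: {math.exp(log_prob_fn(beam_output)):.3e}")
    print(f"most frequent of {n} samples occurs {freq} times: {top!r}")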
Talks
- Cambridge NLIP Seminar Series 2022: Decoding is deciding under uncertainty
- AI Seminar Series KU 2022: A Distribution-Aware Decision Rule for NMT
- Unbabel, ILLC CLS 2021: The Inadequacy of the Mode in NMT
- COLING 2020: Is MAP Decoding All You Need? The Inadequacy of the Mode in Neural Machine Translation