WACO: Word-Aligned Contrastive Learning for Speech Translation

Ouyang, Siqi; Ye, Rong; Li, Lei

arXiv:2212.09359 (cs)

[Submitted on 19 Dec 2022 (v1), last revised 7 Jul 2023 (this version, v3)]

Abstract: End-to-end Speech Translation (E2E ST) aims to translate source speech directly into target text. Existing ST methods perform poorly when only an extremely small amount of speech-text data is available for training. We observe that an ST model's performance correlates closely with the embedding similarity between speech and its source transcript. In this paper, we propose Word-Aligned COntrastive learning (WACO), a simple and effective method for extremely low-resource speech-to-text translation. Our key idea is to bridge word-level representations of the speech and text modalities via contrastive learning. We evaluate WACO and other methods on the MuST-C dataset, a widely used ST benchmark, and on the low-resource Maltese-English direction from IWSLT 2023. Our experiments demonstrate that WACO outperforms the best baseline by 9+ BLEU points with only 1 hour of parallel ST data. Code is available at this https URL.
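
The idea of bridging word-level speech and text representations with a contrastive objective can be illustrated with a small sketch. This is not the authors' implementation. It assumes that word boundaries in the speech frames are already known (for example, from a forced aligner), that a spoken word is represented by mean-pooling its frames, and that the objective is a standard InfoNCE loss over cosine similarities. The function names, the toy vectors, and the temperature value are all hypothetical.

```python
import math

def mean_pool(frames, start, end):
    """Average the frame vectors in [start, end) into one word-level vector."""
    span = frames[start:end]
    dim = len(span[0])
    return [sum(f[d] for f in span) / len(span) for d in range(dim)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def word_contrastive_loss(speech_words, text_words, temperature=0.2):
    """InfoNCE over words: text word i is the positive for speech word i,
    and every other text word in the list acts as a negative.
    The temperature default is an arbitrary illustrative value."""
    total = 0.0
    for i, s in enumerate(speech_words):
        logits = [cosine(s, t) / temperature for t in text_words]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - logits[i]
    return total / len(speech_words)

# Toy data: four speech frames covering two words, two frames each.
frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
word_spans = [(0, 2), (2, 4)]          # assumed output of a forced aligner
text_words = [[1.0, 0.0], [0.0, 1.0]]  # assumed text-side word embeddings

speech_words = [mean_pool(frames, s, e) for s, e in word_spans]
aligned = word_contrastive_loss(speech_words, text_words)
swapped = word_contrastive_loss(speech_words, text_words[::-1])
print(aligned, swapped)
```

In this toy setup the loss is small when each spoken word is paired with its own text embedding and much larger when the pairing is swapped, which is the signal a contrastive objective of this kind uses to pull matching speech and text words together.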

Comments:	ACL 2023 Poster
Subjects:	Computation and Language (cs.CL); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Cite as:	arXiv:2212.09359 [cs.CL]
	(or arXiv:2212.09359v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2212.09359

Submission history

From: Siqi Ouyang
[v1] Mon, 19 Dec 2022 10:49:35 UTC (2,644 KB)
[v2] Tue, 27 Jun 2023 02:15:24 UTC (1,712 KB)
[v3] Fri, 7 Jul 2023 04:56:14 UTC (1,712 KB)
