Weiting Tan

Center for Language and Speech Processing, Johns Hopkins University

Hi👋, I’m Weiting (Steven) Tan! I’m currently a PhD candidate in Computer Science at the Center for Language and Speech Processing (CLSP), Johns Hopkins University, advised by Prof. Philipp Koehn. Previously, I completed my undergraduate and master’s degrees in Computer Science at JHU. Go Hop!

My research interests lie at the intersection of machine learning and natural language processing. In particular, I am excited by efficient and scalable representation learning methods for cross-modal applications. I am also interested in derivatives pricing and hedging strategies.

Recently, I have focused most of my effort on enabling multimodal agents to reason, use tools, and interact with users and environments seamlessly. I also work on LLMs for unified multimodal understanding and generation.

If you have anything to share with me, please feel free to contact me through my email: wtan12 at jhu.edu

news

Sep 25, 2025	My summer internship work, Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents, is now on arXiv. Our training and inference code for multi-turn RL will be released soon.
Aug 20, 2025	Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation accepted to EMNLP 2025!
May 20, 2025	STAR and SSR accepted to IWSLT 2025! Please come to our presentation, see you in Vienna!
May 01, 2025	Starting a new internship at ByteDance Seed! I will be working on iterative (multi-turn, multi-modal) tool-use agents.
Sep 25, 2024	DiffNorm accepted to NeurIPS 2024! Please come to our poster, see you in Vancouver!
Aug 02, 2024	I will work on Multi-modal LLMs as a part-time student researcher at Meta in Fall 2024 & Spring 2025.
Feb 20, 2024	I will be interning at Meta AI (FAIR) in summer 2024, working on speech large language models. Looking forward to the new project!
Apr 07, 2023	I will be staying at Johns Hopkins University for my PhD, working with Prof. Philipp Koehn!

selected publications

arXiv

Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents

Weiting Tan, Xinghua Qu, Ming Tu, and 4 more authors

2025

EMNLP

Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation

Weiting Tan, Jiachen Lian, Hirofumi Inaguma, and 3 more authors

2025

IWSLT

SSR: Alignment-Aware Modality Connector for Speech Language Models

Weiting Tan, Hirofumi Inaguma, Ning Dong, and 2 more authors

2024

Abs

Fusing speech into a pre-trained language model (SpeechLM) usually suffers from inefficient encoding of long-form speech and catastrophic forgetting of the pre-trained text modality. We propose SSR-Connector (Segmented Speech Representation Connector) for better modality fusion. Leveraging speech-text alignments, our approach segments and compresses speech features to match the granularity of text embeddings. Additionally, we introduce a two-stage training pipeline that includes distillation and fine-tuning phases to mitigate catastrophic forgetting. SSR-Connector outperforms existing mechanisms for speech-text modality fusion, consistently achieving better speech understanding (e.g., +10 accuracy on StoryCloze and +20 on Speech-MMLU) while preserving pre-trained text ability.

NeurIPS

DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation

Weiting Tan, Jingyu Zhang, Lingfeng Shen, and 2 more authors

2024

Abs

Non-autoregressive Transformers (NATs) have recently been applied in direct speech-to-speech translation systems, which convert speech across different languages without intermediate text data. Although NATs generate high-quality outputs and offer faster inference than autoregressive models, they tend to produce incoherent and repetitive results due to complex data distributions (e.g., acoustic and linguistic variations in speech). In this work, we introduce DiffNorm, a diffusion-based normalization strategy that simplifies data distributions for training NAT models. After training with a self-supervised noise estimation objective, DiffNorm constructs normalized target data by denoising synthetically corrupted speech features. Additionally, we propose to regularize NATs with classifier-free guidance, improving model robustness and translation quality by randomly dropping out source information during training. Our strategies result in a notable improvement of about +7 ASR-BLEU for English-Spanish (En-Es) and +2 ASR-BLEU for English-French (En-Fr) translations on the CVSS benchmark, while attaining over 14x speedup for En-Es and 5x speedup for En-Fr translations compared to autoregressive baselines.

IWSLT

Streaming Sequence Transduction through Dynamic Compression

Weiting Tan, Yunmo Chen, Tongfei Chen, and 5 more authors

2024

Abs

We introduce STAR (Stream Transduction with Anchor Representations), a novel Transformer-based model designed for efficient sequence-to-sequence transduction over streams. STAR dynamically segments input streams to create compressed anchor representations, achieving nearly lossless compression (12x) in Automatic Speech Recognition (ASR) and outperforming existing methods. Moreover, STAR demonstrates superior segmentation and latency-quality trade-offs in simultaneous speech-to-text tasks, optimizing latency, memory footprint, and quality.

ICML

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Haoran Xu, Amr Sharaf, Yunmo Chen, and 5 more authors

In Proceedings of the 41st International Conference on Machine Learning, 21–27 Jul 2024

Abs

Moderate-sized large language models (LLMs) – those with 7B or 13B parameters – exhibit promising machine translation (MT) performance. However, even the top-performing 13B LLM-based translation models, like ALMA, do not match the performance of state-of-the-art conventional encoder-decoder translation models or larger-scale LLMs such as GPT-4. In this study, we bridge this performance gap. We first assess the shortcomings of supervised fine-tuning (SFT) for LLMs in the MT task, emphasizing the quality issues present in the reference data, despite being human-generated. Then, in contrast to SFT, which mimics reference translations, we introduce Contrastive Preference Optimization (CPO), a novel approach that trains models to avoid generating adequate but not perfect translations. Applying CPO to ALMA models with only 22K parallel sentences and 12M parameters yields significant improvements. The resulting model, called ALMA-R, can match or exceed the performance of the WMT competition winners and GPT-4 on WMT’21, WMT’22 and WMT’23 test datasets.

NAACL

Narrowing the Gap between Zero- and Few-shot Machine Translation by Matching Styles

Weiting Tan, Haoran Xu, Lingfeng Shen, and 5 more authors

In Findings of the Association for Computational Linguistics: NAACL 2024, Jun 2024

Abs

Large language models trained primarily in a monolingual setting have demonstrated their ability to generalize to machine translation using zero- and few-shot examples with in-context learning. However, even though zero-shot translations are relatively good, there remains a discernible gap between their performance and that of the few-shot setting. In this paper, we investigate the factors contributing to this gap and find that it can largely be closed (by about 70%) by matching the writing styles of the target corpus. Additionally, we explore potential approaches to enhance zero-shot baselines without the need for parallel demonstration examples, providing valuable insights into how these methods contribute to improving translation metrics.

ACL

The Language Barrier: Dissecting Safety Challenges of LLMs in Multilingual Contexts

Lingfeng Shen, Weiting Tan, Sihao Chen, and 6 more authors

In Findings of the Association for Computational Linguistics: ACL 2024, Aug 2024

Abs

As the influence of large language models (LLMs) spans across global communities, their safety challenges in multilingual settings become paramount for alignment research. This paper examines the variations in safety challenges faced by LLMs across different languages and discusses approaches to alleviating such concerns. By comparing how state-of-the-art LLMs respond to the same set of malicious prompts written in higher- vs. lower-resource languages, we observe that (1) LLMs tend to generate unsafe responses much more often when a malicious prompt is written in a lower-resource language, and (2) LLMs tend to generate more irrelevant responses to malicious prompts in lower-resource languages. To understand where the discrepancy can be attributed, we study the effect of instruction tuning with reinforcement learning from human feedback (RLHF) or supervised finetuning (SFT) on the HH-RLHF dataset. Surprisingly, while training with high-resource languages improves model alignment, training in lower-resource languages yields minimal improvement. This suggests that the bottleneck of cross-lingual alignment is rooted in the pretraining stage. Our findings highlight the challenges in cross-lingual LLM safety, and we hope they inform future research in this direction.

EACL

Multilingual Representation Distillation with Contrastive Learning

Weiting Tan, Kevin Heffernan, Holger Schwenk, and 1 more author

In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, May 2023

Abs

Multilingual sentence representations from large models encode semantic information from two or more languages and can be used for different cross-lingual information retrieval and matching tasks. In this paper, we integrate contrastive learning into multilingual representation distillation and use it for quality estimation of parallel sentences (i.e., find semantically similar sentences that can be used as translations of each other). We validate our approach with multilingual similarity search and corpus filtering tasks. Experiments across different low-resource languages show that our method greatly outperforms previous sentence encoders such as LASER, LASER3, and LaBSE.

EMNLP

Flatness-Aware Prompt Selection Improves Accuracy and Sample Efficiency

Lingfeng Shen, Weiting Tan, Boyuan Zheng, and 1 more author

In Findings of the Association for Computational Linguistics: EMNLP 2023, Dec 2023

Abs

With growing capabilities of large language models, prompting them has become the dominant way to access them. This has motivated the development of strategies for automatically selecting effective language prompts. In this paper, we introduce pFlat (prompt flatness), a new metric to quantify the expected utility of a language prompt. This metric is inspired by flatness regularization in statistical learning that quantifies the robustness of the model towards its parameter perturbations. We provide theoretical foundations for this metric and its relationship with other prompt selection metrics, providing a comprehensive understanding of existing methods. Empirically, we show that combining pFlat with existing metrics improves both performance and sample efficiency. Our metric outperforms the previous prompt selection metrics with an average increase of 10% in Pearson correlation across 6 classification benchmarks, and the prompt selected by our metric gains 5% higher accuracy than previous metrics across the benchmarks.

EMNLP

Condensing Multilingual Knowledge with Lightweight Language-Specific Modules

Haoran Xu, Weiting Tan, Shuyue Li, and 4 more authors

In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, Dec 2023

Abs

Incorporating language-specific (LS) modules or Mixture-of-Experts (MoE) is a proven way to boost multilingual model performance, but scaling these approaches to hundreds of languages or experts tends to be hard to manage. We present Language-specific Matrix Synthesis (LMS), a novel method that addresses the issue. LMS utilizes parameter-efficient and lightweight modules, reducing the number of parameters while outperforming existing methods, e.g., +1.73 BLEU over Switch Transformer on OPUS-100 multilingual translation. Additionally, we introduce Fuse Distillation (FD) to condense multilingual knowledge from multiple LS modules into a single shared module, improving model inference and storage efficiency. Our approach demonstrates superior scalability and performance compared to state-of-the-art methods.
