❌ Traditional reward models = Slow 🚶
🔹They score entire responses post-generation 📜
🔹LLMs must generate fully before evaluation ⏳
✅ GenARM = Fast 🏎️
🔹 Predicts next-token rewards on the fly ⚡
🔹 Guides LLMs token by token—drastically improving efficiency! 💡
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment
Yuancheng Xu1, Udari Madhushani Sehwag2, Alec Koppel2, Sicheng Zhu1, Bang An1, Furong Huang1, Sumitra Ganesh2
ICLR, 2025
Abstract
Large Language Models (LLMs) exhibit impressive capabilities but require careful alignment with human preferences.
Traditional training-time methods finetune LLMs using human preference datasets but incur significant training costs and require repeated
training to handle diverse user preferences. Test-time alignment methods address this by using reward models (RMs) to guide frozen LLMs without retraining.
However, existing test-time approaches rely on trajectory-level RMs which are designed to evaluate complete responses, making them unsuitable for autoregressive
text generation that requires computing next-token rewards from partial responses. To address this, we introduce GenARM, a test-time alignment
approach that leverages the Autoregressive Reward Model--a novel reward parametrization designed to predict next-token rewards
for efficient and effective autoregressive generation. Theoretically, we demonstrate that this parametrization can provably guide frozen LLMs toward any distribution
achievable by traditional RMs within the KL-regularized reinforcement learning framework. Experimental results show that GenARM significantly outperforms prior test-time
alignment baselines and matches the performance of training-time methods. Additionally, GenARM enables efficient weak-to-strong guidance, aligning larger LLMs with
smaller RMs without the high costs of training larger models. Furthermore, GenARM supports multi-objective alignment, allowing real-time trade-offs between preference
dimensions and catering to diverse user preferences without retraining.
TL;DR: GenARM uses an autoregressive reward model to efficiently guide a base LLM for test-time alignment, outperforming prior methods and enabling weak-to-strong guidance and multi-objective alignment.
Why Do We Need an Autoregressive Reward Model?
What’s an Autoregressive Reward Model?
Unlike conventional trajectory-level reward models, GenARM parametrizes rewards at the token level:
🔹 Rewards decompose naturally as a sum of per-token log probabilities (see the formula sketch below) 🔄
🔹 Each token selection is guided dynamically 🎯
Figure: Parametrization of the Autoregressive Reward Model.
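In symbols, the parametrization above can be written as a sum of per-token log-probabilities under the autoregressive reward model (a minimal LaTeX sketch of the idea described on this page; the symbol \pi_r for the reward model is my notation, and any scaling constants used in the paper are omitted):

\[
  r(x, y) \;=\; \sum_{t=1}^{|y|} \log \pi_r\!\left(y_t \mid x,\, y_{<t}\right)
\]

Because the trajectory reward is a sum over tokens, the model can produce a next-token reward \log \pi_r(y_t \mid x, y_{<t}) at every decoding step, which is exactly what token-by-token guidance needs.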
How Does GenARM Work?
Training Phase:
✅ Learns next-token rewards from trajectory-level preference data 📊
✅ Ensures that preferred responses accumulate higher total rewards (objective sketched below) 💯
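As a rough sketch, this training step can be read as a Bradley-Terry-style preference objective applied to the summed token-level rewards (my notation: y^w is the preferred and y^l the dispreferred response in a preference pair, \sigma the sigmoid; the paper's exact loss and scaling may differ):

\[
  \mathcal{L}(\pi_r) \;=\; -\,\mathbb{E}_{(x,\, y^w,\, y^l)}
  \Big[ \log \sigma\big( r(x, y^w) - r(x, y^l) \big) \Big],
  \qquad r(x, y) = \sum_{t} \log \pi_r\!\left(y_t \mid x,\, y_{<t}\right)
\]

Minimizing this loss pushes the summed log-probabilities of preferred responses above those of dispreferred ones, which is the "higher total rewards" condition stated above.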
🚀 Inference Phase:
✅ Combines LLM logits + next-token rewards to dynamically guide generation (see the decoding sketch below) 🔄
💡 No model retraining. Just plug, play, and align!
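A minimal Python sketch of one guided decoding step, assuming you already have next-token logits from the frozen base LLM and next-token log-probabilities from the autoregressive RM (the tensor names, the beta hyperparameter, and this softmax formulation are illustrative assumptions, not the authors' exact implementation):

import torch

def guided_next_token_distribution(base_logits: torch.Tensor,
                                    reward_log_probs: torch.Tensor,
                                    beta: float = 1.0) -> torch.Tensor:
    """Combine frozen-LLM logits with next-token rewards from the autoregressive RM.

    base_logits:      [vocab_size] next-token logits of the frozen base LLM.
    reward_log_probs: [vocab_size] next-token log-probabilities of the autoregressive
                      reward model, read as next-token rewards.
    beta:             guidance strength (here, larger beta means weaker reward guidance).
    """
    # Shift the base logits by the reward-model scores; this corresponds to sampling
    # from a distribution proportional to pi_base * pi_r^(1/beta).
    guided_logits = base_logits + (1.0 / beta) * reward_log_probs
    # Normalize into a proper next-token distribution.
    return torch.softmax(guided_logits, dim=-1)

# Toy usage with a vocabulary of 5 tokens:
base_logits = torch.randn(5)
reward_log_probs = torch.log_softmax(torch.randn(5), dim=-1)
probs = guided_next_token_distribution(base_logits, reward_log_probs, beta=1.0)
next_token = torch.multinomial(probs, num_samples=1)

The base LLM is never updated: the reward model only nudges each next-token distribution, which is why no retraining is needed.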
How Well Does It Perform?
🔥 Fastest test-time alignment method—significantly outperforms baselines!
🔥 Achieves 90% of fine-tuned performance—without retraining!
🔥 Weak-to-strong guidance: Uses a 7B RM to align a 70B LLM, saving HUGE compute costs!
💡 More power, less compute! 🏆
BibTeX
@inproceedings{xu2025genarm,
  title={GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-Time Alignment},
  author={Xu, Yuancheng and Sehwag, Udari Madhushani and Koppel, Alec and Zhu, Sicheng and An, Bang and Huang, Furong and Ganesh, Sumitra},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}