# Text Generation Inference
Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and T5.
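Once a TGI server is running, text generation is a single HTTP call to its `/generate` endpoint. The sketch below builds the JSON body that endpoint expects and sends it with the standard library; the server URL (`localhost:8080`) is an assumption based on the common port mapping, so adjust it to wherever your server listens.

```python
import json
from urllib import request

def build_generate_request(prompt, max_new_tokens=64, temperature=0.7):
    """Build the JSON body for TGI's /generate endpoint."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            "temperature": temperature,
        },
    }

def send(body, url="http://localhost:8080/generate"):
    """POST the request body; the response carries `generated_text`."""
    req = request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

For example, `send(build_generate_request("Why is the sky blue?"))` returns a JSON object whose `generated_text` field holds the completion.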
Text Generation Inference implements many optimizations and features, such as:
- Simple launcher to serve the most popular LLMs
- Production ready (distributed tracing with OpenTelemetry, Prometheus metrics)
- Tensor Parallelism for faster inference on multiple GPUs
- Token streaming using Server-Sent Events (SSE)
- Continuous batching of incoming requests for increased total throughput
- Optimized transformers code for inference using Flash Attention and Paged Attention on the most popular architectures
- Quantization with bitsandbytes and GPTQ
- Safetensors weight loading
- Watermarking with *A Watermark for Large Language Models*
- Logits warper (temperature scaling, top-p, top-k, repetition penalty)
- Stop sequences
- Log probabilities
- Fine-tuning Support: Utilize fine-tuned models for specific tasks to achieve higher accuracy and performance.
- Guidance: Enable function calling and tool-use by forcing the model to generate structured outputs based on your own predefined output schemas.
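The token-streaming feature above delivers tokens over Server-Sent Events: each event is a `data: {...}` line whose JSON payload carries one token, with the final event also carrying the full `generated_text`. A minimal sketch of parsing such a stream (the event shape shown is illustrative, not an exhaustive description of TGI's schema):

```python
import json

def parse_sse_events(lines):
    """Extract JSON payloads from Server-Sent Event lines.

    Each event arrives as a `data: {...}` line; blank lines
    separate events and are skipped.
    """
    events = []
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            events.append(json.loads(line[len("data:"):].strip()))
    return events

# Illustrative stream: one token per event, the last event also
# carrying the complete generated text.
sample = [
    'data: {"token": {"text": "Hello"}, "generated_text": null}',
    "",
    'data: {"token": {"text": " world"}, "generated_text": "Hello world"}',
]
tokens = [e["token"]["text"] for e in parse_sse_events(sample)]
# tokens == ["Hello", " world"]
```

Streaming this way lets a client render tokens as they are produced instead of waiting for the whole completion.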
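The logits warpers listed above (temperature scaling, top-k, top-p, repetition penalty) all reshape the next-token distribution before sampling. A minimal pure-Python sketch of two of them, temperature and top-k, not TGI's actual implementation:

```python
import math

def warp_logits(logits, temperature=1.0, top_k=0):
    """Apply temperature scaling, then top-k filtering, to raw logits.

    temperature < 1 sharpens the distribution, > 1 flattens it;
    top_k > 0 keeps only the k highest logits and masks the rest
    to -inf so they get zero probability after softmax.
    """
    warped = [x / temperature for x in logits]
    if top_k > 0:
        threshold = sorted(warped, reverse=True)[top_k - 1]
        warped = [x if x >= threshold else float("-inf") for x in warped]
    return warped

def softmax(logits):
    """Turn (possibly masked) logits into a probability distribution."""
    m = max(x for x in logits if x != float("-inf"))
    exps = [math.exp(x - m) if x != float("-inf") else 0.0 for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax(warp_logits([2.0, 1.0, 0.1], temperature=0.5, top_k=2))
# The third logit falls outside the top 2, so probs[2] == 0.0
```

The sampler then draws the next token from `probs`; chaining several warpers in sequence is how parameters like `top_p` and `repetition_penalty` compose in practice.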
Text Generation Inference is used in production by multiple projects, such as:
- HuggingChat, an open-source interface for open-access models, such as OpenAssistant and Llama
- OpenAssistant, an open-source community effort to train LLMs in the open
- nat.dev, a playground to explore and compare LLMs.