Developer companion repo for working with NVIDIA's Nemotron models: inference, fine-tuning, agents, visual reasoning, deployment.
nemotron/
β
βββ usage-cookbook/ Usage cookbooks (how to deploy, and simple model usage guides)
β
β
βββ use-case-examples/ Examples of leveraging Nemotron Models in Agentic Workflows and more
NVIDIA Nemotronβ’ is a family of open, high-efficiency models with fully transparent training data, weights, and recipes.
Nemotron models are designed for agentic AI workflows β they excel at coding, math, scientific reasoning, tool calling, instruction following, and visual reasoning (for the VL models).
They are optimized for deployment across a spectrum of compute tiers (edge, single GPU, data center) and support frameworks like NeMo and TensorRT-LLM, vLLM, and SGLang, with NIM microservice options for scalable serving.
- Usage Cookbook - Practical deployment and simple model usage guides for Nemotron models
- Use Case Examples - Practical use-case examples and apps
- Nemotron Developer Page - Developer resources for the Nemotron family of models
- Nemotron Research Hub - Research affiliated with the Nemotron effort
- Nemotron Datasets - Datasets part of various Nemotron collections, from pre-training to post-training
Have an idea for improving Nemotron models? Visit the Nemotron Ideas Portal to:
- π³οΈ Vote on existing feature requests
- π Submit your own ideas and suggestions
- π See what the community is requesting
Your feedback helps shape the future of Nemotron models!
Full, reproducible training pipelines will be included in the nemotron package at src/nemotron/recipes/.
- π¨ Synthetic Data Generation - Scripts to generate synthetic datasets using NVIDIA-NeMo/DataDesigner
- ποΈ Data Curation - Scripts to prepare training data using NVIDIA-NeMo/Curator
- π Training - Complete training loops with hyperparameters using:
- NVIDIA-NeMo/Megatron-Bridge for Megatron models
- NVIDIA-NeMo/Automodel for HuggingFace models
- NVIDIA-NeMo/NeMo-RL when RL is needed
- π Evaluation - Benchmark evaluation on standard suites using NVIDIA-NeMo/Evaluator
- π Documentation - Detailed explanations of each stage
Learn how to deploy and use the models through an API.
| Model | Best For | Key Features | Trade-offs | Resources |
|---|---|---|---|---|
| NVIDIA-Nemotron-3-Nano | High-throughput agentic workflows, reasoning, tool-use, chat | β’ 31.6B total / 3.6B active (MoE) β’ Hybrid Mamba-Transformer MoE β’ 1M-token context window β’ Reasoning ON/OFF + thinking budget |
Sparse MoE trades total params for efficiency | π Cookbooks |
| Llama-3.3-Nemotron-Super-49B-v1.5 | Production deployments needing strong reasoning with efficiency | β’ 128K context β’ Single H200 GPU β’ RAG & tool calling β’ Optimized via NAS |
Balances accuracy & throughput | π Cookbooks |
| NVIDIA-Nemotron-Nano-9B-v2 | Resource-constrained environments needing flexible reasoning | β’ 9B params β’ Hybrid Mamba-2 architecture β’ Controllable reasoning traces β’ Unified reasoning/non-reasoning |
Smaller model with configurable reasoning | π Cookbooks |
| NVIDIA-Nemotron-Nano-12B-v2-VL | Document intelligence and video understanding | β’ 12B VLM β’ Video & multi-image reasoning β’ Controllable reasoning (/think mode) β’ Efficient Video Sampling (EVS) |
Vision-language with configurable reasoning | π Cookbooks |
| Llama-3.1-Nemotron-Safety-Guard-8B-v3 | Multilingual content moderation with cultural nuance | β’ 9 languages β’ 23 safety categories β’ Cultural sensitivity β’ NeMo Guardrails integration |
Focused on safety/moderation tasks | π Cookbooks |
| Nemotron-Parse (link coming soon!) | Document parsing for RAG and AI agents | β’ VLM for document parsing β’ Table extraction (LaTeX) β’ Semantic segmentation β’ Spatial grounding (bbox) |
Specialized for document structure | π Cookbooks |
Below is an outline of the end-to-end use case examples provided in the use-case-examples directory. These scenarios demonstrate practical applications that go beyond basic model inference.
-
Agentic Workflows
Orchestration of multi-step AI agents, integrating planning, context management, and external tools/APIs. -
Retrieval-Augmented Generation (RAG) Systems
Building pipelines that combine retrieval components (vector databases, search APIs) with Nemotron models for grounded, accurate outputs. -
Integration with External Tools & APIs
Examples of Nemotron models powering applications with structured tool calling, function execution, or data enrichment. -
Production-Ready Application Patterns
Architectures supporting scalability, monitoring, data pipelines, and real-world deployment considerations.
See the
use-case-examples/subfolders for in-depth, runnable examples illustrating these concepts.
We welcome contributions! Whether it's examples, recipes, or other tools you'd find useful.
Please read our Contributing Guidelines before submitting pull requests.
- Contributing Guidelines - How to contribute to this project
- Changelog - Version history and changes
Apache 2.0 License - see LICENSE file for details.
NVIDIA Nemotron - Open, transparent, and reproducible.