NVIDIA Tensor Cores
Unprecedented Acceleration for Generative AI
Tensor Cores enable mixed-precision computing, dynamically adapting calculations to accelerate throughput while preserving accuracy and providing enhanced security. The latest generation of Tensor Cores is faster than ever across a broad array of AI and high-performance computing (HPC) tasks. From 4X speedups in training trillion-parameter generative AI models to a 30X increase in inference performance, NVIDIA Tensor Cores accelerate every workload in modern AI factories.
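The core mixed-precision pattern is low-precision inputs multiplied together with products accumulated at higher precision. A minimal pure-Python sketch of that idea, simulating FP16 storage with the `struct` module and using a full-precision Python float as a stand-in for the wider accumulator (illustrative only, not NVIDIA's implementation):

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to IEEE half precision (the low-precision storage format)."""
    return struct.unpack('e', struct.pack('e', x))[0]

def mixed_precision_dot(a, b):
    """FP16 inputs, higher-precision accumulation: the Tensor Core arithmetic pattern."""
    acc = 0.0  # accumulator kept at higher precision, like an FP32 register
    for x, y in zip(a, b):
        acc += to_fp16(x) * to_fp16(y)
    return acc

# Each 0.1 is rounded to the nearest FP16 value, but the running sum
# does not lose additional precision because the accumulator is wider.
print(mixed_precision_dot([0.1] * 1000, [0.1] * 1000))
```

Keeping the accumulator wide is what lets hardware halve the storage and bandwidth cost of the inputs without the rounding error compounding across a long reduction.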
NVIDIA Blackwell Tensor Cores
Fifth Generation
The Blackwell architecture delivers a 30X speedup over the previous NVIDIA Hopper™ generation for massive models such as GPT-MoE-1.8T. This performance boost comes from the fifth generation of Tensor Cores, which add new precisions, including community-defined microscaling formats, delivering high accuracy while making it easier to substitute for higher-precision formats.
New Precision Formats
As generative AI models explode in size and complexity, it’s critical to improve training and inference performance. To meet these compute needs, Blackwell Tensor Cores support new quantization formats and precisions, including community-defined microscaling formats.
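Microscaling (MX) formats, as defined by the Open Compute Project community, pair a small block of low-precision elements with one shared power-of-two scale. A hedged sketch of the idea using the FP4 E2M1 element grid of MXFP4; the scale-selection and rounding details here are illustrative assumptions, not NVIDIA's implementation:

```python
import math

# Representable magnitudes of an FP4 E2M1 element (2 exponent bits, 1 mantissa bit).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_block_mxfp4(block):
    """Quantize one block to MXFP4-style values: one shared power-of-two scale
    plus a 4-bit element per value. Returns the dequantized block so the
    rounding error is easy to inspect."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0] * len(block)
    # Shared scale: smallest power of two that maps amax into the grid's range (<= 6.0).
    scale = 2.0 ** math.ceil(math.log2(amax / 6.0))

    def nearest(v):
        mag = min(FP4_GRID, key=lambda g: abs(g - abs(v)))
        return math.copysign(mag, v)

    return [nearest(v / scale) * scale for v in block]

print(quantize_block_mxfp4([0.13, -0.02, 0.75, 0.4]))
```

Because every block carries its own scale, a block of small values keeps fine resolution even though each element is only 4 bits, which is the property that makes these formats viable for weights and activations in large models.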
Second-Generation Transformer Engine
The second-generation Transformer Engine uses custom Blackwell Tensor Core technology combined with NVIDIA® TensorRT™-LLM and NeMo™ Framework innovations to accelerate inference and training for large language models (LLMs) and mixture-of-experts (MoE) models. The Transformer Engine is fueled by the Tensor Cores’ FP4 precision, doubling performance and efficiency while maintaining high accuracy for current and next-generation MoE models.
The Transformer Engine brings real-time performance to today's LLMs, so enterprises can optimize business processes by deploying state-of-the-art generative AI models with affordable economics.
NVIDIA Hopper Architecture Tensor Cores
Fourth Generation
Since the introduction of Tensor Core technology, NVIDIA GPUs have increased their peak performance by 60X, fueling the democratization of computing for AI and HPC. The NVIDIA Hopper architecture advances fourth-generation Tensor Cores with the Transformer Engine, using FP8 to deliver 6X higher performance over FP16 for trillion-parameter-model training. Combined with 3X more performance at TF32, FP64, FP16, and INT8 precisions, Hopper Tensor Cores deliver speedups across all workloads.
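The FP8 recipe follows a common pattern: scale a tensor so its largest magnitude lands at the top of the E4M3 range, round each element to the nearest representable FP8 value, then rescale. A simplified per-tensor sketch (the real Transformer Engine scaling is more elaborate, e.g. using an amax history; this is illustrative only):

```python
import math

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (4 exponent bits, 3 mantissa bits)

def round_to_e4m3(x: float) -> float:
    """Round to the nearest FP8 E4M3 value, saturating at the format maximum."""
    if x == 0.0:
        return 0.0
    sign = math.copysign(1.0, x)
    mag = min(abs(x), E4M3_MAX)   # saturate instead of overflowing
    exp = max(math.floor(math.log2(mag)), -6)  # subnormals share exponent -6
    step = 2.0 ** (exp - 3)       # value spacing given 3 mantissa bits
    return sign * round(mag / step) * step

def fp8_quantize(values):
    """Per-tensor scaled FP8: scale so amax maps to E4M3_MAX, round, rescale."""
    amax = max(abs(v) for v in values) or 1.0
    scale = E4M3_MAX / amax
    return [round_to_e4m3(v * scale) / scale for v in values]

print(fp8_quantize([1.0, 2.0, 3.0, 4.0]))
```

The scale factor is what keeps the narrow FP8 dynamic range usable: without it, values far from the E4M3 range would saturate or flush to zero.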
The Most Powerful End-to-End AI and HPC Data Center Platform
Tensor Cores are essential building blocks of the complete NVIDIA data center solution that incorporates hardware, networking, software, libraries, and optimized AI models and applications from the NVIDIA NGC™ catalog. The most powerful end-to-end AI and HPC platform, it allows researchers to deliver real-world results and deploy solutions into production at scale.
| | Blackwell | Hopper |
| --- | --- | --- |
| Supported Tensor Core precisions | FP64, TF32, BF16, FP16, FP8, INT8, FP6, FP4 | FP64, TF32, BF16, FP16, FP8, INT8 |
| Supported CUDA® Core precisions | FP64, FP32, FP16, BF16 | FP64, FP32, FP16, BF16, INT8 |
*Preliminary specifications; subject to change.
Learn More About NVIDIA Blackwell.