Blog – PyTorch
Blog
SuperOffload: Unleashing the Power of Large-Scale LLM Training on Superchips
TLDR: Efficient full-parameter fine-tuning of GPT-OSS-20B & Qwen3-14B models on a single NVIDIA GH200 and…
When Quantization Isn’t Enough: Why 2:4 Sparsity Matters
Blog, Community
TL;DR Combining 2:4 sparsity with quantization offers a powerful approach to compress large language models…
TorchAO Quantized Models and Quantization Recipes Now Available on HuggingFace Hub
Blog
PyTorch now offers native quantized variants of Phi4-mini-instruct, Qwen3, SmolLM3-3B and gemma-3-270m-it through a collaboration…
Experience in Reducing PT2 Compilation Time for Meta Internal Workloads
Blog
The Challenge of PyTorch 2.0 Compilation Since the release of PyTorch 2.0 (PT2) and its…
High-performance quantized LLM inference on Intel CPUs with native PyTorch
Blog
PyTorch 2.8 has just been released with a set of exciting new features, including a…
Intel PyTorch Team | September 17, 2025
PyTorch 2.8 Brings Native XCCL Support to Intel GPUs: Case Studies from Argonne National Laboratory
Blog
Intel announces a major enhancement for distributed training in PyTorch 2.8: the native integration of…
Intel PyTorch Team, Argonne National Laboratory | September 12, 2025
Disaggregated Inference at Scale with PyTorch & vLLM
Blog, Community
Key takeaways: PyTorch and vLLM have been organically integrated to accelerate cutting-edge generative AI applications,…
Distributed Checkpoint: Efficient checkpointing in large-scale jobs
Blog
As training jobs become larger, the likelihood of failures such as preemptions, crashes, or infrastructure…
Yellow Teaming on Arm: A look inside our responsible AI workshop
Blog, Community
A few months back, I traveled to Berlin to attend the WeAreDevelopers World Congress. During…
Annie Tallund | September 5, 2025
Fast 2-Simplicial Attention: Hardware-Efficient Kernels in TLX
Blog
In this blog post, we explore the kernel design details presented in the paper Fast…
PyTorch 2.8+TorchAO: Unlock Efficient LLM Inference on Intel® AI PCs
Blog
Large Language Models (LLMs) have transformed tasks across numerous industries, including drafting emails, generating code,…
Intel PyTorch Team | September 3, 2025
Accelerating 2K scale pre-training up to 1.28x with TorchAO, MXFP8 and TorchTitan on Crusoe B200 Cluster
Blog
TL;DR: 1.22x - 1.28x training acceleration with MXFP8, equivalent convergence compared to BF16. We recently…
A Primer on LLM Post-Training
Blog
Large Language Models (LLMs) have revolutionized how we write and consume documents. In the past…
Davide Testuggine | August 26, 2025
DRAMA Model Inference Efficiency Boosted by 1.7x-2.3x
Blog
TL;DR NJTs (Nested Jagged Tensors) boost DRAMA model inference efficiency by 1.7x-2.3x, making it more…
Shreya Goyal | August 22, 2025
ZenFlow: Stall-Free Offloading Engine for LLM Training
Blog
Introduction ZenFlow is a new extension to DeepSpeed introduced in summer 2025, designed as a…
Accelerating MoE’s with a Triton Persistent Cache-Aware Grouped GEMM Kernel
Blog
In this post, we present an optimized Triton BF16 Grouped GEMM kernel for running training…
Less Wright, Adnan Hoque, Garrett Goon | August 18, 2025
PyTorch Wheel Variants, the Frontier of Python Packaging
Blog
Tweet by charliemarsh, creator of uv. PyTorch is the leading machine learning framework for developing and…
Eli Uriegas | August 13, 2025
PyTorch Day China Recap
Blog, Community
On June 7, 2025, PyTorch Day China was held in Beijing, co-hosted by PyTorch Foundation…
PyTorch Foundation | August 12, 2025
Introducing Mixed Precision Training in Opacus
Blog
Introduction We integrate mixed and low-precision training with Opacus to unlock increased throughput and training…
Iden Kalemaj, Huanyu Zhang | August 12, 2025
Bringing Generative AI to the Masses with ExecuTorch and KleidiAI
Blog
Key Takeaways: ExecuTorch 0.7 now enables KleidiAI by default, delivering automatic acceleration on Arm CPUs…
© 2025 PyTorch. Copyright © The Linux Foundation®. All rights reserved.