This page tracks the ongoing development of Glow. It documents the goals for upcoming development iterations, the status of some high-level tasks, and relevant information that can help people join the ongoing efforts.
Top-Level Tasks
Load additional quantized neural networks
Quantization is the process of converting a neural network that computes with 32-bit floating point operations into one that uses 8-bit integer arithmetic. Glow can quantize existing floating point networks using Profile Guided Quantization and then run the quantized model for inference. Glow has also started to support loading quantized Caffe2/ONNX models directly. The goal of this top-level task is to extend the loaders to cover additional quantized Caffe2 operators (https://github.com/pytorch/pytorch/tree/master/caffe2/quantization/server) and quantized ONNX operators.
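For illustration, the sketch below shows the kind of affine int8 mapping that profile-guided quantization produces: a scale and offset are derived from the minimum and maximum values observed while profiling, and every float is rounded onto the signed 8-bit grid. This is a minimal NumPy sketch of the arithmetic only, not Glow's implementation; the function names are made up for the example.

```python
import numpy as np

# Minimal sketch of affine int8 quantization (not Glow's implementation):
# scale and offset come from the min/max observed during profiling, and
# q = round(x / scale) + offset maps floats onto the signed 8-bit grid.
def choose_params(observed_min, observed_max, qmin=-128, qmax=127):
    observed_min = min(observed_min, 0.0)   # keep zero exactly representable
    observed_max = max(observed_max, 0.0)
    scale = (observed_max - observed_min) / (qmax - qmin)
    offset = int(round(qmin - observed_min / scale))
    return scale, offset

def quantize(x, scale, offset, qmin=-128, qmax=127):
    return np.clip(np.round(x / scale) + offset, qmin, qmax).astype(np.int8)

def dequantize(q, scale, offset):
    return (q.astype(np.float32) - offset) * scale

# A tensor whose profiled range was [-4.2, 6.7]: the round-trip error stays
# within about half the quantization step (scale / 2).
scale, offset = choose_params(-4.2, 6.7)
x = np.random.uniform(-4.2, 6.7, size=8).astype(np.float32)
print(np.abs(dequantize(quantize(x, scale, offset), scale, offset) - x).max())
```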
Rewrite the execution engine to run on multiple devices asynchronously
Glow is designed as a compiler and execution engine for neural network hardware accelerators. The current implementation of the execution engine is very basic and exposes a simple single-device synchronous run method. The goal of this top-level task is to rewrite the execution engine and implement an asynchronous execution mechanism that can be extended to run code on multiple accelerators concurrently. The execution engine will need to manage the state of multiple devices, queue incoming requests, and track the state of buffers on the host and on each device.
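As a rough illustration of that direction, the sketch below models an asynchronous run interface: each device owns a request queue drained by a worker thread, and callers get back futures they can wait on. The class and method names (DeviceWorker, run_async) are hypothetical and do not correspond to Glow's runtime API.

```python
import queue
import threading
from concurrent.futures import Future

# Hypothetical sketch of an asynchronous, multi-device run interface; the
# names (DeviceWorker, run_async) are illustrative, not Glow's runtime API.
class DeviceWorker:
    """Owns one accelerator: a request queue drained by a worker thread."""

    def __init__(self, device_id):
        self.device_id = device_id
        self.requests = queue.Queue()
        threading.Thread(target=self._loop, daemon=True).start()

    def _loop(self):
        while True:
            inputs, fut = self.requests.get()
            # Stand-in for: copy inputs to the device, run the compiled
            # function, copy the outputs back into host buffers.
            fut.set_result({"device": self.device_id, "outputs": inputs})

    def run_async(self, inputs):
        """Enqueue a request and return a future immediately."""
        fut = Future()
        self.requests.put((inputs, fut))
        return fut

# Spread four inference requests round-robin over two devices and wait.
workers = [DeviceWorker(i) for i in range(2)]
futures = [workers[i % 2].run_async({"batch": i}) for i in range(4)]
print([f.result() for f in futures])
```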
Complete the ONNXIFI integration with PyTorch
Glow integrates with PyTorch through the ONNXIFI interface, which offloads the compute graph from PyTorch onto Glow. This top-level task tracks the work to fully implement the ONNXIFI specification and to qualify the compiler against the ONNXIFI test suite.
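Conceptually, offloading means the framework hands the backend the operators it can execute and keeps the rest for itself. The toy sketch below only shows that split; real ONNXIFI partitioning works on the connected graph structure and on the operator support the backend reports, and every name here is made up for the example.

```python
# Toy sketch of what "offloading" means (hypothetical names, not PyTorch,
# ONNXIFI, or Glow code): split the operators between the ones the backend
# supports and the ones the framework keeps running natively.
SUPPORTED_BY_BACKEND = {"Conv", "Relu", "MaxPool", "FC"}

model_ops = ["Conv", "Relu", "MaxPool", "CustomOp", "Softmax"]

offloaded = [op for op in model_ops if op in SUPPORTED_BY_BACKEND]
fallback = [op for op in model_ops if op not in SUPPORTED_BY_BACKEND]

print("run on the backend via ONNXIFI:", offloaded)  # ['Conv', 'Relu', 'MaxPool']
print("run natively in the framework:", fallback)    # ['CustomOp', 'Softmax']
```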
Improve support for training accelerators
Glow is designed to support both inference and training accelerators. The initial bring-up effort focused on inference accelerators; in the next development iterations we will focus on improving and productizing support for training accelerators. This work includes adding more gradient operators, improving gradient-check test coverage, and enabling asynchronous communication patterns in the runtime.
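For context, a gradient check compares an operator's analytic gradient against a central finite-difference estimate. The sketch below shows that standard technique on a toy function using NumPy; it is a generic illustration, not Glow's test harness.

```python
import numpy as np

# Illustrative numerical gradient check: compare an analytic gradient with a
# central finite-difference estimate. Generic sketch, not Glow's test code.
def f(x):
    return np.sum(x ** 3)        # example differentiable function

def analytic_grad(x):
    return 3.0 * x ** 2          # its known gradient

def numeric_grad(func, x, eps=1e-5):
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step.flat[i] = eps
        grad.flat[i] = (func(x + step) - func(x - step)) / (2.0 * eps)
    return grad

x = np.random.randn(4)
assert np.allclose(analytic_grad(x), numeric_grad(f, x), atol=1e-4)
print("gradient check passed")
```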