0.19.8
2e3da2c
Nebius
InfiniBand clusters
The nebius backend now supports InfiniBand clusters. A cluster is automatically created when you apply a fleet configuration with placement: cluster and supported GPUs, e.g. 8xH100 or 8xH200.
type: fleet
name: my-fleet
nodes: 2
placement: cluster
resources:
  gpu: H100,H200:8
A suitable InfiniBand fabric for the cluster is selected automatically. You can also limit the allowed fabrics in the backend settings.
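As a rough illustration of limiting fabrics, the backend settings might look like the fragment below. The fabrics key name and the fabric identifiers are assumptions not confirmed by these notes, and credentials are omitted; check the backend reference for the exact schema.

```yaml
projects:
- name: main
  backends:
  - type: nebius
    # creds omitted for brevity
    # Assumed key name and placeholder fabric identifiers:
    # only these fabrics would be considered for new clusters.
    fabrics: [fabric-2, fabric-3]
```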
Once the cluster is provisioned, you can benefit from its high-speed networking when running distributed tasks, such as NCCL tests or Hugging Face TRL.
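A minimal sketch of a distributed task that could run on such a fleet is shown below. It assumes the image provides torchrun, that train.py is your own script (a placeholder here), and that dstack sets the DSTACK_* variables used for node discovery; the task name and port are hypothetical.

```yaml
type: task
name: train-distrib        # hypothetical name
nodes: 2                   # one job per node in the two-node cluster
commands:
  # train.py is a placeholder for your own training script
  - torchrun
    --nproc-per-node=$DSTACK_GPUS_PER_NODE
    --nnodes=$DSTACK_NODES_NUM
    --node-rank=$DSTACK_NODE_RANK
    --master-addr=$DSTACK_MASTER_NODE_IP
    --master-port=29500
    train.py
resources:
  gpu: H100:8
```

Requesting the same GPU count per node as the fleet lets the task land on the cluster's instances and use their InfiniBand interconnect.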
ARM
dstack now supports compute instances with ARM CPUs. To request ARM CPUs in a run or fleet configuration, specify the arm architecture in the resources.cpu property:
resources:
  cpu: arm:4.. # 4 or more ARM cores
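For context, here is a sketch of a complete run configuration that uses this property. The task name and command are illustrative only.

```yaml
type: task
name: arm-check            # hypothetical name
commands:
  # Prints the machine architecture; on 64-bit ARM Linux this is typically aarch64
  - uname -m
resources:
  cpu: arm:4..             # 4 or more ARM cores
```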
If the hosts in an SSH fleet have ARM CPUs, dstack will automatically detect them and enable their use.
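An SSH fleet over ARM hosts needs no architecture setting of its own. The sketch below assumes the usual ssh_config layout; the fleet name, user, key path, and addresses are all placeholders.

```yaml
type: fleet
name: my-arm-fleet               # hypothetical name
ssh_config:
  user: ubuntu                   # placeholder SSH user
  identity_file: ~/.ssh/id_rsa   # placeholder key path
  hosts:
    - 192.0.2.10                 # placeholder addresses
    - 192.0.2.11
```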
To see available offers with ARM CPUs, pass --cpu arm to the dstack offer command.
Lambda
GH200
With the lambda backend, it's now possible to use GH200 instances, which come with an ARM-based 72-core NVIDIA Grace CPU and an NVIDIA H200 Tensor Core GPU connected by a high-bandwidth, memory-coherent NVIDIA NVLink-C2C interconnect.
type: dev-environment
name: my-env
ide: vscode
resources:
  gpu: GH200:1
If Lambda has GH200 on-demand instances available at the time, you'll see them when you run dstack apply:
$ dstack apply -f .dstack.yml
#  BACKEND             RESOURCES                                      INSTANCE TYPE  PRICE
1  lambda (us-east-3)  cpu=arm:64 mem=464GB disk=4399GB GH200:96GB:1  gpu_1x_gh200   $1.49
Note that if no GH200 is available at the moment, you can specify the retry policy in your run configuration so that dstack can run the configuration once the GPU becomes available.
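A sketch of what that could look like for the dev environment above, assuming the retry property accepts on_events and duration; the one-hour window is an arbitrary example value.

```yaml
type: dev-environment
name: my-env
ide: vscode
resources:
  gpu: GH200:1
retry:
  # Keep trying when no matching capacity is available
  on_events: [no-capacity]
  # Give up after this long (example value)
  duration: 1h
```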
Azure
Managed identities
The new vm_managed_identity backend setting allows you to configure the managed identity that is assigned to VMs created in the azure backend.
projects:
- name: main
  backends:
  - type: azure
    subscription_id: 06c82ce3-28ff-4285-a146-c5e981a9d808
    tenant_id: f84a7584-88e4-4fd2-8e97-623f0a715ee1
    creds:
      type: default
    vm_managed_identity: dstack-rg/my-managed-identity
Make sure that dstack has the required permissions for managed identities to work.
What's changed
- Fix: handle OSError from os.get_terminal_size() in CLI table rendering for non-TTY environments by @vuyelwadr in #2599
- Clarify how retry works for tasks and services by @r4victor in #2600
- [Docs] Added Tenstorrent example by @peterschmidt85 in #2596
- Lambda: Docker: use cgroupfs driver by @un-def in #2603
- Don't collect Prometheus metrics on container-based backends by @un-def in #2605
- Support Nebius InfiniBand clusters by @jvstme in #2604
- Add ARM64 support by @un-def in #2595
- Allow to configure Nebius InfiniBand fabrics by @jvstme in #2607
- Support vm_managed_identity for Azure by @r4victor in #2608
- Fix API quota hitting when provisioning many A3 instances by @r4victor in #2610
New contributors
- @vuyelwadr made their first contribution in #2599
Full changelog: 0.19.7...0.19.8