0.19.8


Nebius

InfiniBand clusters

The nebius backend now supports InfiniBand clusters. A cluster is created automatically when you apply a fleet configuration that sets placement: cluster and requests supported GPUs, e.g. 8xH100 or 8xH200.

type: fleet
name: my-fleet
nodes: 2
placement: cluster
resources:
  gpu: H100,H200:8

A suitable InfiniBand fabric for the cluster is selected automatically. You can also limit the allowed fabrics in the backend settings.
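As a rough sketch, limiting the fabrics might look like the server configuration below. The fabrics property name and the fabric identifiers (fabric-3, fabric-4) are assumptions for illustration and are not taken from these notes; check the nebius backend reference for the exact setting and the fabrics available in your region.

```yaml
projects:
- name: main
  backends:
  - type: nebius
    # Credentials are omitted here; configure them as usual for the nebius backend.
    # Assumed setting: only these fabrics may be used for InfiniBand clusters.
    fabrics:
    - fabric-3  # hypothetical fabric name
    - fabric-4  # hypothetical fabric name
```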

Once the cluster is provisioned, you can benefit from its high-speed networking when running distributed tasks, such as NCCL tests or Hugging Face TRL.
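For illustration, a distributed task that runs on such a fleet could be sketched as follows. The run name and train.py are hypothetical, torchrun is assumed to be available in the image, and the DSTACK_* variables are the ones dstack is expected to set for multi-node tasks; treat this as a starting point rather than a tested recipe.

```yaml
type: task
name: train-distrib  # hypothetical run name
nodes: 2             # matches the two-node fleet above
commands:
  # train.py is a placeholder for your own training script
  - |
    torchrun \
      --nproc-per-node=$DSTACK_GPUS_PER_NODE \
      --node-rank=$DSTACK_NODE_RANK \
      --nnodes=$DSTACK_NODES_NUM \
      --master-addr=$DSTACK_MASTER_NODE_IP \
      --master-port=12345 \
      train.py
resources:
  gpu: H100:8
```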

ARM

dstack now supports compute instances with ARM CPUs. To request ARM CPUs in a run or fleet configuration, specify the arm architecture in the resources.cpu property:

resources:
  cpu: arm:4..  # 4 or more ARM cores

If the hosts in an SSH fleet have ARM CPUs, dstack will automatically detect them and enable their use.

To see available offers with ARM CPUs, pass --cpu arm to the dstack offer command.

Lambda

GH200

With the lambda backend, it's now possible to use GH200 instances that come with an ARM-based 72-core NVIDIA Grace CPU and an NVIDIA H200 Tensor Core GPU, connected with a high-bandwidth, memory-coherent NVIDIA NVLink-C2C interconnect.

type: dev-environment
name: my-env
ide: vscode
resources:
  gpu: GH200:1

If Lambda has GH200 on-demand instances at the time, you'll see them when you run dstack apply:

$ dstack apply -f .dstack.yml
 #   BACKEND             RESOURCES                                      INSTANCE TYPE  PRICE
 1   lambda (us-east-3)  cpu=arm:64 mem=464GB disk=4399GB GH200:96GB:1  gpu_1x_gh200   $1.49

Note that if no GH200 is available at the moment, you can specify a retry policy in your run configuration so that dstack runs the configuration once the GPU becomes available.
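A minimal sketch of such a retry policy, assuming a task and a one-hour retry window (the run name, command, and duration are illustrative choices, not values from these notes):

```yaml
type: task
name: gh200-task  # hypothetical run name
commands:
  - nvidia-smi
resources:
  gpu: GH200:1
retry:
  # Keep trying when no matching capacity is available
  on_events: [no-capacity]
  # Stop retrying after this period
  duration: 1h
```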

Azure

Managed identities

The new vm_managed_identity backend setting allows you to configure the managed identity that is assigned to VMs created in the azure backend.

projects:
- name: main
  backends:
  - type: azure
    subscription_id: 06c82ce3-28ff-4285-a146-c5e981a9d808
    tenant_id: f84a7584-88e4-4fd2-8e97-623f0a715ee1
    creds:
      type: default
    vm_managed_identity: dstack-rg/my-managed-identity

Make sure that dstack has the required permissions for managed identities to work.

What's changed

Fix: handle OSError from os.get_terminal_size() in CLI table rendering for non-TTY environments by @vuyelwadr in #2599
Clarify how retry works for tasks and services by @r4victor in #2600
[Docs] Added Tenstorrent example by @peterschmidt85 in #2596
Lambda: Docker: use cgroupfs driver by @un-def in #2603
Don't collect Prometheus metrics on container-based backends by @un-def in #2605
Support Nebius InfiniBand clusters by @jvstme in #2604
Add ARM64 support by @un-def in #2595
Allow to configure Nebius InfiniBand fabrics by @jvstme in #2607
Support vm_managed_identity for Azure by @r4victor in #2608
Fix API quota hitting when provisioning many A3 instances by @r4victor in #2610

New contributors

@vuyelwadr made their first contribution in #2599

Full changelog: 0.19.7...0.19.8
