CANN: implement the SSM_CONV operator (#17737)

CANN: implement SSM_CONV operator

Co-authored-by: Aleksei Lobanov <zeromarblectm@gmail.com>
Co-authored-by: Sujin Kang <waterjin326@gmail.com>

CANN: remove custom error limit for SSM_CONV
CANN: merge SSM_CONV tensor shape/strides into one line

Co-authored-by: Sujin Kang <waterjin326@gmail.com>
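
SSM_CONV is the short causal convolution used by Mamba-style state-space layers: every channel is convolved with its own small kernel, with no mixing across channels. The sketch below is illustrative only. It assumes the layout suggested by ggml's `ggml_ssm_conv`, where each channel's input row already holds the `d_conv - 1` previous samples followed by the `n_t` new ones; `ssm_conv` is a hypothetical name, and the actual CANN kernel works on strided device tensors rather than Python lists.

```python
def ssm_conv(x, w):
    """Depthwise causal 1-D convolution (illustrative sketch, not ggml code).

    x[ch] holds (d_conv - 1) cached past samples followed by n_t new samples.
    w[ch] holds the d_conv kernel taps for channel ch.
    Returns out[t][ch] for t in range(n_t).
    """
    n_ch = len(x)
    d_conv = len(w[0])
    n_t = len(x[0]) - (d_conv - 1)
    out = [[0.0] * n_ch for _ in range(n_t)]
    for ch in range(n_ch):
        for t in range(n_t):
            # Sliding window over this channel only: no cross-channel mixing.
            out[t][ch] = sum(x[ch][t + k] * w[ch][k] for k in range(d_conv))
    return out


# Two channels, d_conv = 3, n_t = 2
print(ssm_conv([[1, 2, 3, 4], [0, 0, 1, 0]], [[1, 1, 1], [0, 0, 1]]))  # [[6, 1], [9, 0]]
```

Because the cached samples are stored in front of the new ones, no explicit padding is needed and output step `t` only ever reads inputs at or before its own position.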

macOS/iOS:

Linux:

Windows:

openEuler:

ggml-cuda: fix regex for arch list (#18371)

ggml-cuda: fix regex for arch list
make regex exact
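
The notes only say the regex was made exact. The general pitfall is a pattern that matches a substring of an architecture entry instead of the whole entry. The Python sketch below contrasts the two behaviours using a hypothetical pattern; the real check presumably lives in the CUDA build scripts and its pattern may differ.

```python
import re

# Hypothetical pattern for one CUDA architecture entry such as "86",
# "89-real" or "120a-virtual"; this is NOT the regex used by llama.cpp.
ARCH = r"[0-9]+[af]?(-real|-virtual)?"


def is_arch_loose(entry):
    # re.search succeeds if any substring matches, so "sm_86x" slips through.
    return re.search(ARCH, entry) is not None


def is_arch_exact(entry):
    # re.fullmatch only succeeds if the entire entry matches the pattern.
    return re.fullmatch(ARCH, entry) is not None


for e in ["86", "120a-virtual", "sm_86x"]:
    print(e, is_arch_loose(e), is_arch_exact(e))
```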

cuda: optimize cumsum cub path (#18362)

cuda: optimize cumsum cub path
remove heavy perf test
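
A cumulative sum is an inclusive prefix scan. CUB provides device-wide scans such as `cub::DeviceScan::InclusiveSum`, which internally use a single-pass decoupled look-back scheme. The Python sketch below shows the simpler two-pass blocked formulation instead, only to illustrate how a scan is split across blocks; it is not the code path that this PR optimizes, and `blocked_inclusive_scan` is a hypothetical name.

```python
from itertools import accumulate


def blocked_inclusive_scan(xs, block=4):
    """Two-pass blocked inclusive prefix sum (cumsum), the textbook GPU pattern."""
    # 1) Scan each block independently, as one thread block would on a GPU.
    blocks = [list(accumulate(xs[i:i + block])) for i in range(0, len(xs), block)]
    # 2) An exclusive scan of the block totals gives each block's offset.
    offsets, running = [], 0
    for b in blocks:
        offsets.append(running)
        running += b[-1]
    # 3) Add each block's offset to all of its elements.
    return [v + off for b, off in zip(blocks, offsets) for v in b]


print(blocked_inclusive_scan([1, 2, 3, 4, 5, 6, 7], block=3))  # [1, 3, 6, 10, 15, 21, 28]
```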

ggml-cuda: fix blackwell native builds (#18361)

ggml-cuda: fix blackwell native builds

Replace 12x with 12xa in native architectures

replace for GGML_NATIVE=OFF too
only replace for native
remove 120f-virtual for default compilation

Co-authored-by: Aman Gupta
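
The commit notes describe a list rewrite: natively detected 12x (Blackwell) architectures become 12xa, and, after review, only for native builds. In CUDA, an `a` suffix (as in `sm_90a`) selects architecture-specific features that are not forward compatible; the notes above do not state which features motivated the change. A minimal Python sketch of the rule, with a hypothetical function name; the real change is in the CUDA build configuration.

```python
import re


def fix_blackwell_native(archs, native):
    """Hypothetical helper: rewrite 12x entries to 12xa (e.g. "120" -> "120a"),
    but only when the list came from native architecture detection."""
    if not native:
        return list(archs)
    # Only exact three-digit 12x entries are rewritten; "120a" is left alone.
    return [a + "a" if re.fullmatch(r"12[0-9]", a) else a for a in archs]


print(fix_blackwell_native(["86", "120"], native=True))   # ['86', '120a']
print(fix_blackwell_native(["86", "120"], native=False))  # ['86', '120']
```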

model : support for LlamaBidirectionalModel architecture (#18220)

model: llama-embed-nemotron
minor: python lint
changed arch-name
templated llm_build_llama to be used for both llama and llama-embed arch
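
As the name suggests, a bidirectional Llama variant (used here for the llama-embed-nemotron embedding model) lets each token attend to later tokens as well as earlier ones. That is presumably why a single templated graph builder can serve both architectures. The sketch below shows only the mask difference; `attention_mask` is a hypothetical helper, not part of llama.cpp.

```python
NEG_INF = float("-inf")


def attention_mask(n_tokens, causal):
    """Additive attention mask: 0.0 = may attend, -inf = masked out.

    causal=True  -> token i sees tokens 0..i (decoder LLM)
    causal=False -> every token sees every token (bidirectional encoder)
    """
    return [[0.0 if (not causal or j <= i) else NEG_INF for j in range(n_tokens)]
            for i in range(n_tokens)]


print(attention_mask(3, causal=True))
print(attention_mask(3, causal=False))
```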

vulkan: fix command buffer corruption in ggml_backend_vk_event_wait (#18302)

CANN : refactor ACL graph cache (#17752)

Move the graph property checking code into methods of the LRU cache.

Signed-off-by: Wang Weixuan <wangweixvan@gmail.com>
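
Moving the property check into the cache means callers only ask the cache whether a matching graph exists. A minimal Python sketch of that shape, assuming a hypothetical `GraphLRUCache` keyed by a tuple of (op, shape) pairs; the properties the CANN backend actually compares are not listed in these notes.

```python
from collections import OrderedDict


class GraphLRUCache:
    """Hypothetical LRU cache of compiled graphs keyed by graph properties.
    The property check lives in the cache's own methods, not in callers."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # properties -> compiled graph

    @staticmethod
    def _props(graph):
        # Hypothetical "graph properties": op name plus tensor shape per node.
        return tuple((node["op"], tuple(node["shape"])) for node in graph)

    def find(self, graph):
        key = self._props(graph)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        return None

    def insert(self, graph, compiled):
        key = self._props(graph)
        self._entries[key] = compiled
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


cache = GraphLRUCache(capacity=2)
cache.insert([{"op": "ADD", "shape": [4]}], "compiled-add")
print(cache.find([{"op": "ADD", "shape": [4]}]))  # compiled-add
```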

vulkan: use fewer FA rows for small cache runs (#18280)

CANN: use yarn_ramp cache in ROPE (#17725)
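
In YaRN, the ramp factor for a rotary dimension depends only on the dimension index and the correction range, not on the token position, so it can be computed once and reused for every position; that is presumably what the cache in this PR exploits. The sketch below mirrors the formula of ggml's CPU helper `rope_yarn_ramp`; `yarn_ramp_cache` and `yarn_theta` are hypothetical names, not the CANN implementation.

```python
def yarn_ramp(low, high, i0):
    # Same shape as ggml's rope_yarn_ramp: 1 for low dims, falling to 0 for high dims.
    y = (i0 // 2 - low) / max(0.001, high - low)
    return 1.0 - min(1.0, max(0.0, y))


def yarn_ramp_cache(n_dims, low, high, ext_factor):
    # Position-independent, so computed once per rotary pair and reused.
    return [yarn_ramp(low, high, i0) * ext_factor for i0 in range(0, n_dims, 2)]


def yarn_theta(theta_extrap, freq_scale, ramp_mix):
    # Blend interpolated and extrapolated angles using the cached ramp value.
    theta_interp = freq_scale * theta_extrap
    return theta_interp * (1 - ramp_mix) + theta_extrap * ramp_mix


print(yarn_ramp_cache(8, low=1.0, high=3.0, ext_factor=1.0))  # [1.0, 1.0, 0.5, 0.0]
```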

common: add LLAMA_ARG_OVERRIDE_TENSOR env var for -ot arg (#18267)
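
`-ot` (`--override-tensor`) takes comma-separated `<tensor name pattern>=<buffer type>` pairs, and llama.cpp documents that a command-line argument takes precedence over its environment variable. A simplified Python sketch of that resolution, with a hypothetical function name; the real parser is C++ code in `common` and handles more cases.

```python
import os


def resolve_override_tensor(cli_value, env=os.environ):
    """Hypothetical sketch: the -ot value wins, LLAMA_ARG_OVERRIDE_TENSOR is a fallback."""
    raw = cli_value if cli_value is not None else env.get("LLAMA_ARG_OVERRIDE_TENSOR")
    if not raw:
        return []
    pairs = []
    for item in raw.split(","):
        pattern, sep, buft = item.partition("=")
        if not sep or not pattern or not buft:
            raise ValueError(f"invalid override: {item!r}")
        pairs.append((pattern, buft))
    return pairs


print(resolve_override_tensor(None, {"LLAMA_ARG_OVERRIDE_TENSOR": "exps=CPU"}))
```

Env-var fallbacks like this are convenient for containers and service managers, where editing the command line is harder than setting the environment.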

Releases: ggml-org/llama.cpp

b7541

b7540

b7539

b7538

b7531

b7530

b7529

b7527

b7526

b7525
