IREE Release v3.6.0
Compiler
- Added the FissionTransferOpsInControlFlow pass for shared memory prefetching, improving padded convolution performance. (#21018)
- Refactored iree_gpu.multi_mma to iree_codegen.inner_tiled, enabling arbitrarily many operands and centralizing methods. (#21000, #21062)
- Added support for distributing vector.constant_mask ops, aligning with existing mask behavior. (#20708)
- Added a scaled MMA layout descriptor attribute to support scale operands in MMA ops. (#21141)
- Added early bufferization op support (store_to_buffer, load_from_buffer) in destination-passing-style conversion. (#21136)
- Added GPU MMA intrinsic sorting by K-alignment and size preferences. (#21128)
- Simplified GPU intrinsic management using a new GPUIntrinsicType field. (#21103)
- Added the #iree_gpu.promote_with_cache_swizzle attribute to control operand promotion behavior. (#21105)
- Added the GPUApplyPaddingLevel pass and vectorization masking to reduce shared memory roundtrips. (#21074)
- Added support for propagating expand_shape through tensor.concat, enabling fusion with attention ops. (#21158)
- Added ROCm ping-pong matmul support for BF16 (large/medium, expanded). (#21267)
- Bug Fixes and Robustness Updates (#21036, #21037, #20108, #21063, #21069, #21047, #21166, #21160, #21121, #21113, #21132, #21118, #21151, #21190, #21244, #21237, #21355, #21345, #21270, #21245, #21126, #20977, #21137, #21241, #21337, #21353, #21351, #21295, #21281, #21324)
- Linalg Extension Improvements (#21021, #21138, #21090, #21106, #20263, #21220, #21217, #21116, #21338, #21316, #21309, #21189)
- Enhanced Testing, Debugging and Documentation (#21145, #21143, #21242, #21273, #21229, #21374, #21368, #21335, #21280, #21324)
Runtime
- Added AMDGPU executable implementation with no-op cache, supporting verified, topology-wide loading and optimized kernel argument management for dispatches. (#21040)
- Enabled auto torch input conversion triggered by function argument and result types to streamline input handling. (#21067)
- Added rematerialize parallel ops support in the vector distribute pipeline to improve elementwise operation fusion. (#21073)
- Introduced skeleton AMDGPU buffer handle and handle pool with external and transient buffer types supporting async allocations and device pointer resolution. (#21044)
- Added support for group_any in iree_thread_affinity_t to assign threads to processor groups (e.g., NUMA nodes) instead of specific CPUs, aiding loosely coordinated thread pools. (#21089)
- Added _base variants for all string view integer parsing functions, aligning with standard C APIs, and cleaned up HIP driver integer parsing code. (#21086)
- Added iree_hal_amdgpu_system_t to manage shared HSA/topology/pools resources across physical devices in a logical device. (#21043)
- Added device-side AMDGPU signal and queue utility headers derived from HSA spec and ROCR implementation. (#21042)
- Implemented AMDGPU command buffer host-side and device-side, supporting recording, execution, and segmented command buffers with conditional branch groundwork. (#21123)
- Added a device->host service worker that mimics HSA/AQL queue semantics for device-to-host communication, enabling future tooling compatibility. (#21094)
- Added blit kernels and device-side enqueue support as initial implementations for copy operations, enabling CTS test passes. (#21057)
- Added device-side tracing macros and ringbuffer trace buffer, laying groundwork for on-device tracing interoperable with host tooling like Tracy. (#21046)
- Added AMDGPU semaphore allocation and pooling with host-side HAL support; device-side semaphore implementation and external semaphore imports are forthcoming. (#21201)
- Enhanced loop fission pass (FissionTransferOpsInControlFlow) to support loops containing multiple transfer_read/write pairs, improving IR simplification with additional pattern application. (#21213)
- Introduced IREE_ENABLE_RUNTIME_COVERAGE CMake mode to enable LLVM coverage for runtime libraries, test binaries, and tools, along with scripts to generate LCOV reports and IDE integration. (#21191)
- Added iree-hal-drivers-amdgpu-tests target to enable building all AMDGPU HAL tests together easily via IDE actions. (#21389)
- Implemented AMDGPU logical and physical devices with skeleton queues support, allowing multiple virtual queues per logical device and preparing for host- and device-side queue operations. (#21251)
- Fixes and Stability Enhancements: (#21056, #21060, #21061, #21153, #21200)
- Testing, Debuggability and Tooling: (#21046, #21191, #21389, #21094)
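The coverage mode and test aggregation target above can be exercised from a source checkout; a minimal sketch (the build directory name and generator choice are assumptions, only the CMake option and target names come from the notes above):

```shell
# Configure a build with LLVM coverage instrumentation for the runtime
# libraries, test binaries, and tools (IREE_ENABLE_RUNTIME_COVERAGE is the
# new CMake mode from #21191).
cmake -G Ninja -B ../iree-build -S . -DIREE_ENABLE_RUNTIME_COVERAGE=ON

# Build all AMDGPU HAL tests in one step via the aggregate target from #21389.
cmake --build ../iree-build --target iree-hal-drivers-amdgpu-tests
```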
Change Log
Git History
What's Changed
- [Codegen][GPU] Creating FissionTransferOpsInControlFlow to assist convolution prefetching by @jerryyin in #21018
- [LLVMGPU] Add lowering strategy selection for map_scatter by @Max191 in #21034
- [LinalgExt] Add argmax op with roundtrip and invalid mlir test by @bangtianliu in #21021
- Expose creation of FileHandles from FDs to python. by @AWoloszyn in #21016
- [Codegen] Generalize MultiMmaInterfaceAttr to InnerTileDescAttrInterface by @krzysz00 in #21000
- [Codegen] Fix FoldCollapseShapeIntoInterfaceTensorStoreFullSlice by @IanWood1 in #21036
- Adding `iree_hal_amdgpu_executable_t` implementation + no-op cache. by @benvanik in #21040
- [GPU] Handle transient private values in control-flow when prefetching by @nirvedhmeshram in #21037
- Ensuring unique names for outlined `hal.dispatch.extern` ops. by @benvanik in #21055
- [NFC] simplify check in scf.if stage selection when prefetching by @nirvedhmeshram in #21054
- Only consider executables with no external variants for linking. by @benvanik in #21056
- Translate flat operand index into segment relative index. by @benvanik in #21060
- [Dispatch] Only bubble reshapes when possibly blocking fusion by @IanWood1 in #20108
- [mlir][GPU] Make small reductions go down tile and fuse pipeline. by @MaheshRavishankar in #21063
- Bump llvm to llvm/llvm-project@80ea5f46df3e by @pashu123 in #21065
- Triggering auto torch input conversion based on func arg/result types. by @benvanik in #21067
- [Codegen] Add reshape map_scatter folding to BlockDynamicDimensions by @Max191 in #21047
- Add default option to only do loop fission for unit trip loops by @nirvedhmeshram in #21069
- [GPU] Add rematerialize parallel ops in the vector distribute pipeline by @pashu123 in #21073
- Bump version to 3.6.0 after 3.5.0 release. by @ScottTodd in #21078
- Guard HIP macro against redefinition by @erieaton-amd in #21061
- Bump llvm to llvm/llvm-project@0a64630 by @pashu123 in #21072
- Bump llvm to llvm/llvm-project@bc7ea63 by @pashu123 in #21083
- [build] Fix Bazel dependency in EncodingUtils for shared library builds by @AGindinson in #21085
- [LLVMGPU] Enable hip e2e tests for map_scatter by @Max191 in #21079
- Adding dummy AMDGPU channel/event. by @benvanik in #21041
- Adding device-side AMDGPU signal/queue utils. by @benvanik in #21042
- Adding iree_hal_amdgpu_system_t to manage HSA/topology/pools. by @benvanik in #21043
- Adding _base variants of all string view int parsing and clean up hip options. by @benvanik in #21086
- Adding support for group_any in iree_thread_affinity_t. by @benvanik in #21089
- Adding skeleton AMDGPU buffer handle and handle pool. by @benvanik in #21044
- Adding skeleton AMDGPU allocator. by @benvanik in #21093
- [LLVMGPU] Delete LLVMGPUPadAndVectorDistribute by @Groverkss in #21095
- [VectorDistribution] Add support for distributing vector.constant_mask by @Groverkss in #20708
- [Codegen] Generalize iree_gpu.multi_mma to iree_codegen.inner_tiled by @krzysz00 in #21062
- Don't erase the target executable in the loop using it. by @benvanik in #21097
- Implement PartitionableLoopsInterface for tensor.concat by @IanWood1 in #21082
- [LLVMGPU] Add relayout combination behind a flag by @Max191 in #21076
- [Encoding][LLVMGPU] Add encoding fusion e2e test by @Max191 in #21088
- Enable the linalg.mmt4d operation and add mmt4d microkernels for the riscv64 by @adeel10x in #20263
- [HAL] Refactor memory property attributes by @ziereis in #21005
- [runtime] Add riscv `pause` instruction for spinning by @NoumanAmir657 in #21075
- Reland "[Codegen][ROCDL] Drop nominal support for dynamic shared mem (#21020)" by @MaheshRavishankar in #21102
- Switch experimental to false on windows release packages by @zeeshanhaque21 in #21104
- [Codegen] Port AMDGPU device lib implementations to MLIR rewrites by @keshavvinayak01 in #20598
- Bump dawidd6/action-download-artifact from 10 to 11 in the github-actions group by @dependabot[bot] in #21109
- Adding device-side tracing macros and a device-side trace buffer. by @benvanik in #21046
- Adding blit kernels and device-side enqueuing. by @benvanik in #21057
- Adding skeleton device->host service worker. by @benvanik in #21094
- Add padding, masking and fold vector.transfer_write -> vector.transfer_read to avoid memory roundtrips by @nicolasvasilache in #21074
- [Codegen][GPU] Move operand promotion control to attribute interface by @qedawkins in #21098
- [HIP] Emit error for non-zero dynamic shared memory by @qedawkins in #21118
- [Codegen][GPU] Add promotion attribute for setting cache swizzling by @qedawkins in #21105
- Force install python version 3.13.5 for windows by @jitesh-gupta in #21120
- Add tiling interface to `tensor.concat` by @IanWood1 in #21081
- [Codegen][GPU] Sort intrinsic according to k alignment - Step 1 of 2 - Track MmaInterfaceAttr via field instead of index by @jerryyin in #21103
- [NFC] Extract common reshape patterns to dedicated file by @jtuyls in #21111
- Extract reshape into interface folding tests into dedicated file by @jtuyls in #21112
- [Flow] Fix crash when flow.return has no operands by @IanWood1 in #21132
- [GlobalOpt] Don't modify concat in dispatch by @IanWood1 in #21129
- Change linux arm64 runners to newly available github hosted runners by @jitesh-gupta in #21131
- [compiler] remove uses of memref::ExpandOps pass by @ftynse in #21113
- [Dispatch Creation] Improve extract_slice expand_shape bubbling by @IanWood1 in #21121
- [LinalgExt] fix arg_compare op with region and start index by @bangtianliu in #21106
- Integrate LLVM at 029f8892 by @bjacob in #21140
- [Flow] Improve reduction dispatch names by @IanWood1 in #21139
- [doc] Add tips of reading input from a file by @jinchen62 in #21143
- [doc] Fix tip render to mkdocs style by @jinchen62 in #21145
- Integrate LLVM at 836201f by @bjacob in #21148
- [Codegen][GPU] Sort intrinsic according to k alignment - Step 2 of 2 - Creating intrinsic sort routine by @jerryyin in #21128
- Integrate LLVM at 227f759644 by @bjacob in #21156
- Adding AMDGPU command buffer implementation. by @benvanik in #21123
- [VectorDistribute] Implement layout analysis for transfer_gather by @Groverkss in #21164
- Don't use dl_tensor.byte_offset when exporting capsules. by @AWoloszyn in #21153
- [LinalgExt] Add simple vectorization for map_scatter by @Max191 in #21090
- [Codegen] Simplify tensor load/store padding materialization by @jtuyls in #21160
- [Util] Add folder for assumes of X / C * C by @qedawkins in #21168
- Integrate LLVM at c5b256a0e480 by @lialan in #21162
- CMake: catch some recurring problems with LLVM configuration. by @bjacob in #21174
- [Encoding] Rename testing purpose encodings to follow the convention. by @hanhanW in #21144
- [VectorDistribution] Add pattern to distribute transfer_gather ops by @Groverkss in #20764
- [Codegen] Support early bufferization ops in ConvertToDPS by @Max191 in #21136
- [Codegen][GPU] Prevent vector transfer fission from applying on loops with side-effecting ops by @rkayaith in #21166
- [Encoding] Refresh practical encodings to follow the naming convention. by @hanhanW in #21146
- [ROCMTarget] Add pass for applying builtin specialization patterns by @qedawkins in #21001
- [Encoding][NFC] Improve the docs for Encoding dialect. by @hanhanW in #21147
- Expand all affine applies before and during Flow by @qedawkins in #21169
- [LinalgExt] support converting argcompare to loops. by @bangtianliu in #21138
- [CPU] Add option to `LLVMCPUTileRootAndFuseProducerConsumer` to tile with `scf.forall` by @AaronStGeorge in #21009
- [Codegen][GPU] Add an inner tiled op descriptor for scaled MMA by @krzysz00 in #21141
- [Codegen][GPU] Generalize ConcretizeMmaShapes to arbitrary inner tiles by @krzysz00 in #21142
- Re-enable e2e pack.mlir tests for RISC-V targets. by @hanhanW in #21179
- [Codegen][Tuner] expose python binding for attention op details by @bangtianliu in #21170
- [LLVMGPU] Re-run alloc hoisting after SCFToControlFlow by @rkayaith in #21193
- Expand on commit access policies. by @ScottTodd in #21205
- [DispatchCreation] Don't pad on attention in producer dispatch by @jtuyls in #21134
- [CPU] Use option to tile with `scf.forall` in TileRootAndFuseProducerConsumer pass by @AaronStGeorge in #21198
- Pinning ninja on the Windows CI to 1.12.1 due to a 1.13.0 bug. by @benvanik in #21208
- [LinalgExt] add TilingInterface support for ArgCompareOp by @bangtianliu in #21077
- Fixing typo in #21142 that was causing failures on MSVC. by @benvanik in #21211
- Fixing bool->iree_status_t cast error. by @benvanik in #21212
- Bump LLVM to 4ac4726d00644f6c6b0e2de1df0d00deed0015bf by @nicolasvasilache in #21175
- [Codegen] Fix undefined behavior in InnerTileOp expansion by @jtuyls in #21218
- [Codegen][GPU] Move LLVMGPUPrefetching pass to be invoked from only amdgpu backend by @jerryyin in #21190
- [NFC] remove redundant checks in the TilingInterface by @bangtianliu in #21225
- Integrate LLVM @ c73e5e3e209c by @lialan in #21224
- [LinalgExt] add e2e tests for argcompare op by @bangtianliu in #21217
- Revert "Force install python version 3.13.5 for windows" by @saienduri in #21215
- [Codegen] Fix multiple function support in materialize user configs by @qedawkins in #21227
- Fix missing return in `DeviceOptimalAttr::joinOR` by @rkayaith in #21228
- [LinalgExt] Implement Unit Dim folding for slice dimensions by @Groverkss in #21220
- [Codegen][GPU] Adding scheduling barrier between compute and write stage in prefetcher by @jerryyin in #21151
- Adding IREE_ENABLE_RUNTIME_COVERAGE cmake mode. by @benvanik in #21191
- [Codegen][GPU] Support fission of loops with multiple transfer_reads/writes by @rkayaith in #21213
- Extending AMDGPU tests, fixing issues, and cleaning up comments. by @benvanik in #21200
- [Integrate] Drop revert for vectorization API change by @Max191 in #21239
- [Codegen][Tuner] expose python binding isa_attention_op by @bangtianliu in #21216
- [Codegen] Fix TileLargeTensors handling of dynamic reduction dims by @qedawkins in #21244
- [DispatchCreation] Add pass to hoist scalar ops out of dispatch regions by @qedawkins in #21210
- Adding AMDGPU semaphore (WIP) and semaphore pool. by @benvanik in #21201
- Integrate LLVM to llvm/llvm-project@a99fee69 by @yzhang93 in #21242
- [Dispatch Creation] Add concat expand_shape bubbling by @dan-garvey in #21158
- [ROCm] Fix typo in R9700 SKU definition. NFC. by @kuhar in #21247
- [ROCMTarget] Make all pingpong arithmetic nsw and nuw by @qedawkins in #21248
- [Integrate] Drop the revert of unknown type conversion in bufferization. by @hanhanW in #21243
- [Codegen] Add pass to propagate constant offsets towards accesses by @qedawkins in #21236
- [CodeGen] Re-enable memref::AssumeAlignmentOp for SPIRV pipelines. by @hanhanW in #21133
- [Integrate] Update bufferization related codes for upstream custom types support. by @hanhanW in #21250
- [Codegen] Change swizzle hint offset logic to use arith by @qedawkins in #21237
- [NFC] Make internal LLVMGPU APIs for vector_distribute available by @nicolasvasilache in #21161
- [VectorExt] Implement BufferizationInterface for transfer_gather by @Groverkss in #21219
- [VectorExt] Implement masked vectorization for iree_linalg_ext.gather by @Groverkss in #21189
- [CodeGen] Fix gather fusion on vector distribute path by @pashu123 in #21117
- [StableHLO] Fix ArrayRef(std::nullopt) deprecation warnings by @qedawkins in #21257
- [mlir][DispatchCreation] Avoid SSA violation due to consumer fusion while forming dispatches by @MaheshRavishankar in #21186
- [CPU] Use scf.forall for TileRootAndFuseProducerConsumer by default. by @hanhanW in #21260
- Bump ncipollo/release-action from 1.16.0 to 1.18.0 in the github-actions group by @dependabot[bot] in #21254
- [Encoding] Add new identity encoding attribute by @jtuyls in #21258
- [mlir][Codegen] Remove workaround for handling consumer fusion along multiple operands. by @MaheshRavishankar in #21171
- [DT] fixup(MaterializeEncodingPatterns) remove legacy type conversions by @egebeysel in #21262
- Integrate LLVM to llvm/llvm-project@5ed852f7 by @yzhang93 in #21263
- Add link to the LLVM Social Bangalore talk by @pashu123 in #21265
- [Codegen] Fix specialize exports never applies check by @qedawkins in #21270
- [integrate|compiler] Drop carried LLVM reverts and use `ub.poison` in some transfer reads by @fabianmcg in #21259
- [ROCM] Ping pong matmul Bf16 matcher by @sebvince in #21267
- Integrate LLVM to llvm/llvm-project@e3edc1bd by @yzhang93 in #21272
- [Util] Fix assume.int operand deduplication canonicalizer by @qedawkins in #21273
- [Codegen] Fix lhs/rhs batch offsets size in vector contract distribution by @jtuyls in #21238
- Pad OnlineAttention by @nicolasvasilache in #21152
- [Codegen] Generalize ukernel strided_outer_dims and fix GPU ukernel bug by @Max191 in #21249
- [Codegen] Support inner_tiled and load_from_buffer in ConvertAccGEMMToGEMMPass by @Max191 in #21245
- [LinalgExt] Improve scatter unit dim folding by @IanWood1 in #21271
- [Codegen] Support dynamic dimensions in collapse_shape into interface store folding by @jtuyls in #21126
- [Codegen] Don't fold workgroup loops during workgroup tiling by @Max191 in #21137
- [DispatchCreation] Fold collapse(expand) unit dims by @IanWood1 in #21274
- [VectorDistribute][NFC] Refactor subgroup reduction distribution by @Groverkss in #21305
- [DT] add control function to FoldIntoPackUnpackPatterns by @egebeysel in #21276
- [CPU] Disable lowering_config propagation for Mmt4dTilingExpert pipeline by @hanhanW in #21298
- [Preprocessing][NFC] Drop dependency workaround from TransposeMatmulPass. by @hanhanW in #21296
- [CodeGen] Add a pass that patches func ops for debugging purpose. by @hanhanW in #21229
- [Integrate] Integrate llvm-project @0f391d6f51217de5cb6735b17f359eb078bbe94e by @Max191 in #21302
- [DataTiling] Enable layout transformation combination in GPU DT e2e tests by @Max191 in #21163
- [DT] Improve encoding hoisting pass. by @hanhanW in #21275
- [LinalgExt] Add a `DeviceMappingInterfaceAttribute` to tag split-k loops by @MaheshRavishankar in #21309
- [CodeGen][NFC] Delete DeadMemAlloc patterns. by @hanhanW in #21310
- [Codegen][AMDGPU] Allow vector distribute configuration selection to handle `scf.forall` from split-reduction. by @MaheshRavishankar in #21281
- [Codegen][Common] Teach `VerifyWorkgroupDistribution` to allow `scf.forall` generated by split reduction. by @MaheshRavishankar in #21282
- Revert "[Codegen] Don't fold workgroup loops during workgroup tiling (#21137)" by @Max191 in #21318
- [Codegen] Bubble up/down reshape operations before blocking dynamic dimensions by @jtuyls in #21241
- [docs] Add 2025 AsiaLLVM talk about data-tiling from Hanhan. by @hanhanW in #21308
- [CPU] Introduce dictionary-based lowering_config attribute. by @hanhanW in #21312
- [TensorExt] Add new operation that is a placeholder for modifying number of workgroups for split reduction. by @MaheshRavishankar in #21314
- [docs] Correct default values in optimization options. by @hanhanW in #21321
- [CodeGen][NFC] Retire native_vector_sizes from LoweringConfigAttr. by @hanhanW in #21322
- [Codegen] Add StoreToBufferOp to vector distribution dispatch check by @jtuyls in #21294
- [CodeGen][NFC] Simplify logging with LDBG for Utils.cpp. by @hanhanW in #21328
- [LinalgExt] Improve Attention partial tiling new batch dimension insertion by @Groverkss in #21316
- [CPU] Implement OpAsmDialectInterface for IREE::CPU::LoweringConfigAttr. by @hanhanW in #21325
- [CodeGen] Make TilingConfig compatible with LoweringConfigAttrInterface. by @hanhanW in #21323
- [CPU] Teach TilingConfig about IREE::CPU::LoweringConfigAttr. by @hanhanW in #21327
- [GPU] Remove ROCm llvm plugin by @efric in #21311
- [CPU] Switch mmt4d pipeline to use IREE::CPU::LoweringConfigAttr. by @hanhanW in #21326
- [Dispatch Creation] Make producer fusable via interchange by @IanWood1 in #20977
- [Codegen] Add AMDGPU specific narrow type emulation pass for AMDGPUDialect by @qedawkins in #21333
- [TensorExt] Mirror a tensor_ext version of flow.tensor.bitcast by @qedawkins in #21277
- Move all producing uses of flow.bitcast to tensor_ext by @qedawkins in #21279
- Cleanup uses of --verify-diagnostics in tests by @qedawkins in #21340
- [Integrate] Bump LLVM to 77914c96dfc55562404d18c1ab777137055679db by @Groverkss in #21341
- [VectorDistribute] Fix buffer reduction during subgroup reduction by @Groverkss in #21315
- [CPU] Use TilingConfig for lowering_config propagation. by @hanhanW in #21336
- [CPU] Implement lowering_config propagation for IREE::CPU::LoweringConfigAttr. by @hanhanW in #21337
- [Codegen][Tuner] remove decomposition attr for attention op by @bangtianliu in #21345
- [CPU] Switch all LinalgExt dispatches to root-based tiling pipeline. by @hanhanW in #21338
- [GlobalOpt] Delete experimental FuseSiluHorizontalMatmul pass. by @hanhanW in #21350
- [Codegen] Resolve scf.forall operations created during split-reduction by @MaheshRavishankar in #21324
- [Codegen][GPU] Fuse nested warp and lane foralls by @Max191 in #21295
- [LinalgExt] Add decomposition for vector map_scatter by @Max191 in #21116
- [DispatchCreation] Add pass to cast away unsupported element types by @qedawkins in #21339
- [CPU] Improve TileRootAndFuseProducerConsumer like TileAndFuse pass. by @hanhanW in #21351
- [CPU] Refactor logic to LoweringConfigGenerator. by @hanhanW in #21352
- [LLVMGPU] Add canonicalization for select(pred, true, false) -> broadcast(pred) by @Groverkss in #21342
- [VectorExt] Use ub.poison for padding in vector_ext vectorization by @Groverkss in #21362
- [Codegen] Fix 1x1 Conv2D to Matmul pass ordering by @HalfBloodPrince010 in #21355
- Fix unsupported bitcasting of complex operands by @qedawkins in #21367
- [CPU][NFCI] Update tile sizes selection workaround if it has dynamic shape. by @hanhanW in #21353
- [Codegen] Combine layout transformation after GPUFuseAndHoistParallelLoops by @YashDeshpande25 in #21206
- [Stream] New ElideAsyncTransfersPass by @ziereis in #21029
- Fix failure in Windows build. by @MaheshRavishankar in #21369
- [DispatchCreation] Add a pass to split long running reduction loops. by @MaheshRavishankar in #21280
- Integrate LLVM to llvm/llvm-project@3ed3a33 by @bangtianliu in #21364
- [NFC] Fix smallvector size in kernel configattrs by @efric in #21348
- Integrate LLVM to llvm/llvm-project@bda5602 by @bangtianliu in #21377
- [CPU] Switch convolution pipelines to IREE::CPU::LoweringConfigAttr. by @hanhanW in #21347
- [CPU] Use IREE::CPU::TilingLevel in TileRootAndFuseProducerConsumer pass by @hanhanW in #21370
- [Stream] Fix dominance error for multi-result dispatches by @IanWood1 in #21368
- [codegen][gpu] Add the `iree-rocdl-use-buffer-instructions` pass by @fabianmcg in #21335
- [DispatchCreation] Avoid hoisting set encodings on scalar tensors by @jtuyls in #21376
- [Codegen] Distribute workgroups along X dim by @Max191 in #21334
- Add e2e test for split reduction using tiling. by @MaheshRavishankar in #21374
- [Dispatch Creation] Remove assert and handle null map by @IanWood1 in #21380
- [e2e] Adding default tuning specs tests. by @lialan in #21383
- Use `ShapedType::isStatic`. NFC. by @kuhar in #21385
- Reapply "[Codegen] Don't fold workgroup loops during workgroup tiling (#21137)" by @Max191 in #21382
- Implementing AMDGPU logical/physical devices and skeleton queues. by @benvanik in #21251
- Integrate LLVM to llvm/llvm-project@d9190f8 by @bangtianliu in #21388
- Adding `iree-hal-drivers-amdgpu-tests` target. by @benvanik in #21389
- [CPU] Switch pack/unpack dispatches to use IREE::CPU::LoweringConfigAttr. by @hanhanW in #21392
- [CPU] Teach SplitReduction about IREE::CPU::LoweringConfigAttr. by @hanhanW in #21391
- Add linalg.softmax e2e tests. by @hanhanW in #21396
- [Codegen][GPU] Canonicalize to remove the empty extract slice in combineLayoutTransformation pass by @jerryyin in #21395
- [Attention] Use multiple subgroups for memory bound attention by @Groverkss in #21363
- [CPU] Teach TilingConfig::getVectorTileSizes about CPU lowering config. by @hanhanW in #21397
- [CPU] Get rootOp based on lowering config in TileRootAndFuseProducerConsumer pass. by @hanhanW in #21394
- Integrate LLVM to llvm/llvm-project@e0cce5c by @bangtianliu in #21398
- [WIP] Expose multi-use fusion flag to pipeline options. by @IanWood1 in #21400
- [CPU][NFC] Switch infusible pack tests to use imperfect tiling case. by @hanhanW in #21404
New Contributors
- @adeel10x made their first contribution in #20263
- @NoumanAmir657 made their first contribution in #21075
- @zeeshanhaque21 made their first contribution in #21104
- @keshavvinayak01 made their first contribution in #20598
- @jitesh-gupta made their first contribution in #21120
- @sebvince made their first contribution in #21267
- @efric made their first contribution in #21311
- @HalfBloodPrince010 made their first contribution in #21355
Full Changelog: v3.5.0...v3.6.0