IREE Release v3.6.0
Compiler
- Added the FissionTransferOpsInControlFlow pass for shared memory prefetching, improving padded convolution performance. (#21018)
- Refactored iree_gpu.multi_mma to iree_codegen.inner_tiled, enabling arbitrarily many operands and centralizing methods. (#21000, #21062)
- Added support for distributing vector.constant_mask ops, aligning with existing mask behavior. (#20708)
- Added a scaled MMA layout descriptor attribute to support scale operands in MMA ops. (#21141)
- Added early bufferization op support (store_to_buffer, load_from_buffer) in destination-passing-style conversion. (#21136)
- Added GPU MMA intrinsic sorting by K-alignment and size preferences. (#21128)
- Simplified GPU intrinsic management using a new GPUIntrinsicType field. (#21103)
- Added the #iree_gpu.promote_with_cache_swizzle attribute to control operand promotion behavior. (#21105)
- Added the GPUApplyPaddingLevel pass and vectorization masking to reduce shared memory roundtrips. (#21074)
- Added support for propagating expand_shape through tensor.concat, enabling fusion with attention ops. (#21158)
- Added ROCm ping-pong matmul support for BF16 (large/medium, expanded). (#21267)
- Bug Fixes and Robustness Updates (#21036, #21037, #20108, #21063, #21069, #21047, #21166, #21160, #21121, #21113, #21132, #21118, #21151, #21190, #21244, #21237, #21355, #21345, #21270, #21245, #21126, #20977, #21137, #21241, #21337, #21353, #21351, #21295, #21281, #21324)
- Linalg Extension Improvements (#21021, #21138, #21090, #21106, #20263, #21220, #21217, #21116, #21338, #21316, #21309, #21189)
- Enhanced Testing, Debugging and Documentation (#21145, #21143, #21242, #21273, #21229, #21374, #21368, #21335, #21280, #21324)
Runtime
- Added AMDGPU executable implementation with no-op cache, supporting verified, topology-wide loading and optimized kernel argument management for dispatches. (#21040)
- Enabled auto torch input conversion triggered by function argument and result types to streamline input handling. (#21067)
- Added rematerialize parallel ops support in the vector distribute pipeline to improve elementwise operation fusion. (#21073)
- Introduced skeleton AMDGPU buffer handle and handle pool with external and transient buffer types supporting async allocations and device pointer resolution. (#21044)
- Added support for group_any in iree_thread_affinity_t to assign threads to processor groups (e.g., NUMA nodes) instead of specific CPUs, aiding loosely coordinated thread pools. (#21089)
- Added _base variants for all string view integer parsing functions, aligning with standard C APIs, and cleaned up HIP driver integer parsing code. (#21086)
- Added iree_hal_amdgpu_system_t to manage shared HSA/topology/pools resources across physical devices in a logical device. (#21043)
- Added device-side AMDGPU signal and queue utility headers derived from HSA spec and ROCR implementation. (#21042)
- Implemented AMDGPU command buffer host-side and device-side, supporting recording, execution, and segmented command buffers with conditional branch groundwork. (#21123)
- Added a device->host service worker that mimics HSA/AQL queue semantics for device-to-host communication, enabling future tooling compatibility. (#21094)
- Added blit kernels and device-side enqueue support as initial implementations for copy operations, enabling CTS test passes. (#21057)
- Added device-side tracing macros and ringbuffer trace buffer, laying groundwork for on-device tracing interoperable with host tooling like Tracy. (#21046)
- Added AMDGPU semaphore allocation and pooling with host-side HAL support; device-side semaphore implementation and external semaphore imports are forthcoming. (#21201)
- Enhanced loop fission pass (FissionTransferOpsInControlFlow) to support loops containing multiple transfer_read/write pairs, improving IR simplification with additional pattern application. (#21213)
- Introduced IREE_ENABLE_RUNTIME_COVERAGE CMake mode to enable LLVM coverage for runtime libraries, test binaries, and tools, along with scripts to generate LCOV reports and IDE integration. (#21191)
- Added iree-hal-drivers-amdgpu-tests target to enable building all AMDGPU HAL tests together easily via IDE actions. (#21389)
- Implemented AMDGPU logical and physical devices with skeleton queues support, allowing multiple virtual queues per logical device and preparing for host- and device-side queue operations. (#21251)
- Fixes and Stability Enhancements: (#21056, #21060, #21061, #21153, #21200)
- Testing, Debuggability and Tooling: (#21046, #21191, #21389, #21094)
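The coverage mode and test aggregation target above can be exercised from a source checkout; a minimal sketch (the build directory name and generator choice are assumptions, only the CMake option and target names come from the notes above):

```shell
# Configure a build with LLVM coverage instrumentation for the runtime
# libraries, test binaries, and tools (IREE_ENABLE_RUNTIME_COVERAGE is the
# new CMake mode from #21191).
cmake -G Ninja -B ../iree-build -S . -DIREE_ENABLE_RUNTIME_COVERAGE=ON

# Build all AMDGPU HAL tests in one step via the aggregate target from #21389.
cmake --build ../iree-build --target iree-hal-drivers-amdgpu-tests
```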
Change Log
Git History
What's Changed
- [Codegen][GPU] Creating FissionTransferOpsInControlFlow to assist convolution prefetching by @jerryyin in #21018
- [LLVMGPU] Add lowering strategy selection for map_scatter by @Max191 in #21034
- [LinalgExt] Add argmax op with roundtrip and invalid mlir test by @bangtianliu in #21021
- Expose creation of FileHandles from FDs to python. by @AWoloszyn in #21016
- [Codegen] Generalize MultiMmaInterfaceAttr to InnerTileDescAttrInterface by @krzysz00 in #21000
- [Codegen] Fix FoldCollapseShapeIntoInterfaceTensorStoreFullSlice by @IanWood1 in #21036
- Adding `iree_hal_amdgpu_executable_t` implementation + no-op cache. by @benvanik in #21040
- [GPU] Handle transient private values in control-flow when prefetching by @nirvedhmeshram in #21037
- Ensuring unique names for outlined `hal.dispatch.extern` ops. by @benvanik in #21055
- [NFC] simplify check in scf.if stage selection when prefetching by @nirvedhmeshram in #21054
- Only consider executables with no external variants for linking. by @benvanik in #21056
- Translate flat operand index into segment relative index. by @benvanik in #21060
- [Dispatch] Only bubble reshapes when possibly blocking fusion by @IanWood1 in #20108
- [mlir][GPU] Make small reductions go down tile and fuse pipeline. by @MaheshRavishankar in #21063
- Bump llvm to llvm/llvm-project@80ea5f46df3e by @pashu123 in #21065
- Triggering auto torch input conversion based on func arg/result types. by @benvanik in #21067
- [Codegen] Add reshape map_scatter folding to BlockDynamicDimensions by @Max191 in #21047
- Add default option to only do loop fission for unit trip loops by @nirvedhmeshram in #21069
- [GPU] Add rematerialize parallel ops in the vector distribute pipeline by @pashu123 in #21073
- Bump version to 3.6.0 after 3.5.0 release. by @ScottTodd in #21078
- Guard HIP macro against redefinition by @erieaton-amd in #21061
- Bump llvm to llvm/llvm-project@0a64630 by @pashu123 in #21072
- Bump llvm to llvm/llvm-project@bc7ea63 by @pashu123 in #21083
- [build] Fix Bazel dependency in EncodingUtils for shared library builds by @AGindinson in #21085
- [LLVMGPU] Enable hip e2e tests for map_scatter by @Max191 in #21079
- Adding dummy AMDGPU channel/event. by @benvanik in #21041
- Adding device-side AMDGPU signal/queue utils. by @benvanik in #21042
- Adding iree_hal_amdgpu_system_t to manage HSA/topology/pools. by @benvanik in #21043
- Adding _base variants of all string view int parsing and clean up hip options. by @benvanik in #21086
- Adding support for group_any in iree_thread_affinity_t. by @benvanik in #21089
- Adding skeleton AMDGPU buffer handle and handle pool. by @benvanik in #21044
- Adding skeleton AMDGPU allocator. by @benvanik in #21093
- [LLVMGPU] Delete LLVMGPUPadAndVectorDistribute by @Groverkss in #21095
- [VectorDistribution] Add support for distributing vector.constant_mask by @Groverkss in #20708
- [Codegen] Generalize iree_gpu.multi_mma to iree_codegen.inner_tiled by @krzysz00 in #21062
- Don't erase the target executable in the loop using it. by @benvanik in #21097
- Implement PartitionableLoopsInterface for tensor.concat by @IanWood1 in #21082
- [LLVMGPU] Add relayout combination behind a flag by @Max191 in #21076
- [Encoding][LLVMGPU] Add encoding fusion e2e test by @Max191 in #21088
- Enable the linalg.mmt4d operation and add mmt4d microkernels for the riscv64 by @adeel10x in #20263
- [HAL] Refactor memory property attributes by @ziereis in #21005
- [runtime] Add riscv `pause` instruction for spinning by @NoumanAmir657 in #21075
- Reland "[Codegen][ROCDL] Drop nominal support for dynamic shared mem (#21020)" by @MaheshRavishankar in #21102
- Switch experimental to false on windows release packages by @zeeshanhaque21 in #21104
- [Codegen] Port AMDGPU device lib implementations to MLIR rewrites by @keshavvinayak01 in #20598
- Bump dawidd6/action-download-artifact from 10 to 11 in the github-actions group by @dependabot[bot] in #21109
- Adding device-side tracing macros and a device-side trace buffer. by @benvanik in #21046
- Adding blit kernels and device-side enqueuing. by @benvanik in #21057
- Adding skeleton device->host service worker. by @benvanik in #21094
- Add padding, masking and fold vector.transfer_write -> vector.transfer_read to avoid memory roundtrips by @nicolasvasilache in #21074
- [Codegen][GPU] Move operand promotion control to attribute interface by @qedawkins in #21098
- [HIP] Emit error for non-zero dynamic shared memory by @qedawkins in #21118
- [Codegen][GPU] Add promotion attribute for setting cache swizzling by @qedawkins in #21105
- Force install python version 3.13.5 for windows by @jitesh-gupta in #21120
- Add tiling interface to `tensor.concat` by @IanWood1 in #21081
- [Codegen][GPU] Sort intrinsic according to k alignment - Step 1 of 2 - Track MmaInterfaceAttr via field instead of index by @jerryyin in #21103
- [NFC] Extract common reshape patterns to dedicated file by @jtuyls in #21111
- Extract reshape into interface folding tests into dedicated file by @jtuyls in #21112
- [Flow] Fix crash when flow.return has no operands by @IanWood1 in #21132
- [GlobalOpt] Don't modify concat in dispatch by @IanWood1 in #21129
- Change linux arm64 runners to newly available github hosted runners by @jitesh-gupta in #21131
- [compiler] remove uses of memref::ExpandOps pass by @ftynse in #21113
- [Dispatch Creation] Improve extract_slice expand_shape bubbling by @IanWood1 in #21121
- [LinalgExt] fix arg_compare op with region and start index by @bangtianliu in #21106
- Integrate LLVM at 029f8892 by @bjacob in #21140
- [Flow] Improve reduction dispatch names by @IanWood1 in #21139
- [doc] Add tips of reading input from a file by @jinchen62 in #21143
- [doc] Fix tip render to mkdocs style by @jinchen62 in #21145
- Integrate LLVM at 836201f by @bjacob in #21148
- [Codegen][GPU] Sort intrinsic according to k alignment - Step 2 of 2 - Creating intrinsic sort routine by @jerryyin in #21128
- Integrate LLVM at 227f759644 by @bjacob in #21156
- Adding AMDGPU command buffer implementation. by @benvanik in #21123
- [VectorDistribute] Implement layout analysis for transfer_gather by @Groverkss in #21164
- Don't use dl_tensor.byte_offset when exporting capsules. by @AWoloszyn in #21153
- [LinalgExt] Add simple vectorization for map_scatter by @Max191 in #21090
- [Codegen] Simplify tensor load/store padding materialization by @jtuyls in #21160
- [Util] Add folder for assumes of X / C * C by @qedawkins in #21168
- Integrate LLVM at c5b256a0e480 by @lialan in #21162
- CMake: catch some recurring problems with LLVM configuration. by @bjacob in #21174
- [Encoding] Rename testing purpose encodings to follow the convention. by @hanhanW in #21144
- [VectorDistribution] Add pattern to distribute transfer_gather ops by @Groverkss in #20764
- [Codegen] Support early bufferization ops in ConvertToDPS by @Max191 in #21136
- [Codegen][GPU] Prevent vector transfer fission from applying on loops with side-effecting ops by @rkayaith in #21166
- [Encoding] Refresh practical encodings to follow the naming convention. by @hanhanW in #21146
- [ROCMTarget] Add pass for applying builtin specialization patterns by @qedawkins in #21001
- [Encoding][NFC] Improve the docs for Encoding dialect. by @hanhanW in #21147
- Expand all affine applies before and during Flow by @qedawkins in #21169
- [LinalgExt] support converting argcompare to loops. by @bangtianliu in #21138
- [CPU] Add option to `LLVMCPUTileRootAndFuseProducerConsumer` to tile with `scf.forall` by @AaronStGeorge in #21009
- [Codegen][GPU] Add an inner tiled op descriptor for scaled MMA by @krzysz00 in #21141
- [Codegen][GPU] Generalize ConcretizeMmaShapes to arbitrary inner tiles by @krzysz00 in #21142
- Re-enable e2e pack.mlir tests for RISC-V targets. by @hanhanW in #21179
- [Codegen][Tuner] expose python binding for attention op details by @bangtianliu in #21170
- [LLVMGPU] Re-run alloc hoisting after SCFToControlFlow by @rkayaith in #21193
- Expand on commit access policies. by @ScottTodd in #21205
- [DispatchCreation] Don't pad on attention in producer dispatch by @jtuyls in #21134
- [CPU] Use option to tile with `scf.forall` in TileRootAndFuseProducerConsumer pass by @AaronStGeorge in #21198
- Pinning ninja on the Windows CI to 1.12.1 due to a 1.13.0 bug. by @benvanik in #21208
- [LinalgExt] add TilingInterface support for ArgCompareOp by @bangtianliu in #21077
- Fixing typo in #21142 that was causing failures on MSVC. by @benvanik in #21211
- Fixing bool->iree_status_t cast error. by @benvanik in #21212
- Bump LLVM to 4ac4726d00644f6c6b0e2de1df0d00deed0015bf by @nicolasvasilache in #21175
- [Codegen] Fix undefined behavior in InnerTileOp expansion by @jtuyls in #21218
- [Codegen][GPU] Move LLVMGPUPrefetching pass to be invoked from only amdgpu backend by @jerryyin in #21190
- [NFC] remove redundant checks in the TilingInterface by @bangtianliu in #21225
- Integrate LLVM @ c73e5e3e209c by @lialan in #21224
- [LinalgExt] add e2e tests for argcompare op by @bangtianliu in #21217
- Revert "Force install python version 3.13.5 for windows" by @saienduri in #21215
- [Codegen] Fix multiple function support in materialize user configs by @qedawkins in #21227
- Fix missing return in `DeviceOptimalAttr::joinOR` by @rkayaith in #21228
- [LinalgExt] Implement Unit Dim folding for slice dimensions by @Groverkss in #21220
- [Codegen][GPU] Adding scheduling barrier between compute and write stage in prefetcher by @jerryyin in #21151
- Adding IREE_ENABLE_RUNTIME_COVERAGE cmake mode. by @benvanik in #21191
- [Codegen][GPU] Support fission of loops with multiple transfer_reads/writes by @rkayaith in #21213
- Extending AMDGPU tests, fixing issues, and cleaning up comments. by @benvanik in #21200
- [Integrate] Drop revert for vectorization API change by @Max191 in #21239
- [Codegen][Tuner] expose python binding isa_attention_op by @bangtianliu in #21216
- [Codegen] Fix TileLargeTensors handling of dynamic reduction dims by @qedawkins in #21244
- [DispatchCreation] Add pass to hoist scalar ops out of dispatch regions by @qedawkins in #21210
- Adding AMDGPU semaphore (WIP) and semaphore pool. by @benvanik in #21201
- Integrate LLVM to llvm/llvm-project@a99fee69 by @yzhang93 in #21242
- [Dispatch Creation] Add concat expand_shape bubbling by @dan-garvey in #21158
- [ROCm] Fix typo in R9700 SKU definition. NFC. by @kuhar in #21247
- [ROCMTarget] Make all pingpong arithmetic nsw and nuw by @qedawkins in #21248
- [Integrate] Drop the revert of unknown type conversion in bufferization. by @hanhanW in #21243
- [Codegen] Add pass to propagate constant offsets towards accesses by @qedawkins in #21236
- [CodeGen] Re-enable memref::AssumeAlignmentOp for SPIRV pipelines. by @hanhanW in #21133
- [Integrate] Update bufferization related codes for upstream custom types support. by @hanhanW in #21250
- [Codegen] Change swizzle hint offset logic to use arith by @qedawkins in #21237
- [NFC] Make internal LLVMGPU APIs for vector_distribute available by @nicolasvasilache in #21161
- [VectorExt] Implement BufferizationInterface for transfer_gather by @Groverkss in #21219
- [VectorExt] Implement masked vectorization for iree_linalg_ext.gather by @Groverkss in #21189
- [CodeGen] Fix gather fusion on vector distribute path by @pashu123 in #21117
- [StableHLO] Fix ArrayRef(std::nullopt) deprecation warnings by @qedawkins in #21257
- [mlir][DispatchCreation] Avoid SSA violation due to consumer fusion while forming dispatches by @MaheshRavishankar in #21186
- [CPU] Use scf.forall for TileRootAndFuseProducerConsumer by default. by @hanhanW in #21260
- Bump ncipollo/release-action from 1.16.0 to 1.18.0 in the github-actions group by @dependabot[bot] in #21254
- [Encoding] Add new identity encoding attribute by @jtuyls in #21258
- [mlir][Codegen] Remove workaround for handling consumer fusion along multiple operands. by @MaheshRavishankar in #21171
- [DT] fixup(MaterializeEncodingPatterns) remove legacy type conversions by @egebeysel in #21262
- Integrate LLVM to llvm/llvm-project@5ed852f7 by @yzhang93 in #21263
- Add link to the LLVM Social Bangalore talk by @pashu123 in #21265
- [Codegen] Fix specialize exports never applies check by @qedawkins in #21270
- [integrate|compiler] Drop carried LLVM reverts and use `ub.poison` in some transfer reads by @fabianmcg in #21259
- [ROCM] Ping pong matmul Bf16 matcher by @sebvince in #21267
- Integrate LLVM to llvm/llvm-project@e3edc1bd by @yzhang93 in #21272
- [Util] Fix assume.int operand deduplication canonicalizer by @qedawkins in #21273
- [Codegen] Fix lhs/rhs batch offsets size in vector contract distribution by @jtuyls in #21238
- Pad OnlineAttention by @nicolasvasilache in #21152
- [Codegen] Generalize ukernel strided_outer_dims and fix GPU ukernel bug by @Max191 in #21249
- [Codegen] Support inner_tiled and load_from_buffer in ConvertAccGEMMToGEMMPass by @Max191 in #21245
- [LinalgExt] Improve scatter unit dim folding by @IanWood1 in #21271
- [Codegen] Support dynamic dimensions in collapse_shape into interface store folding by @jtuyls in #21126
- [Codegen] Don't fold workgroup loops during workgroup tiling by @Max191 in #21137
- [DispatchCreation] Fold collapse(expand) unit dims by @IanWood1 in #21274
- [VectorDistribute][NFC] Refactor subgroup reduction distribution by @Groverkss in #21305
- [DT] add control function to FoldIntoPackUnpackPatterns by @egebeysel in #21276
- [CPU] Disable lowering_config propagation for Mmt4dTilingExpert pipeline by @hanhanW in #21298
- [Preprocessing][NFC] Drop dependency workaround from TransposeMatmulPass. by @hanhanW in #21296
- [CodeGen] Add a pass that patches func ops for debugging purpose. by @hanhanW in #21229
- [Integrate] Integrate llvm-project @0f391d6f51217de5cb6735b17f359eb078bbe94e by @Max191 in #21302
- [DataTiling] Enable layout transformation combination in GPU DT e2e tests by @Max191 in #21163
- [DT] Improve encoding hoisting pass. by @hanhanW in #21275
- [LinalgExt] Add a `DeviceMappingInterfaceAttribute` to tag split-k loops by @MaheshRavishankar in #21309
- [CodeGen][NFC] Delete DeadMemAlloc patterns. by @hanhanW in #21310
- [Codegen][AMDGPU] Allow vector distribute configuration selection to handle `scf.forall` from split-reduction. by @MaheshRavishankar in #21281
- [Codegen][Common] Teach `VerifyWorkgroupDistribution` to allow `scf.forall` generated by split reduction. by @MaheshRavishankar in #21282
- Revert "[Codegen] Don't fold workgroup loops during workgroup tiling (#21137)" by @Max191 in #21318
- [Codegen] Bubble up/down reshape operations before blocking dynamic dimensions by @jtuyls in #21241
- [docs] Add 2025 AsiaLLVM talk about data-tiling from Hanhan. by @hanhanW in #21308
- [CPU] Introduce dictionary-based lowering_config attribute. by @hanhanW in #21312
- [TensorExt] Add new operation that is a placeholder for modifying number of workgroups for split reduction. by @MaheshRavishankar in #21314
- [docs] Correct default values in optimization options. by @hanhanW in #21321
- [CodeGen][NFC] Retire native_vector_sizes from LoweringConfigAttr. by @hanhanW in #21322
- [Codegen] Add StoreToBufferOp to vector distribution dispatch check by @jtuyls in #21294
- [CodeGen][NFC] Simplify logging with LDBG for Utils.cpp. by @hanhanW in #21328
- [LinalgExt] Improve Attention partial tiling new batch dimension insertion by @Groverkss in #21316
- [CPU] Implement OpAsmDialectInterface for IREE::CPU::LoweringConfigAttr. by @hanhanW in #21325
- [CodeGen] Make TilingConfig compatible with LoweringConfigAttrInterface. by @hanhanW in #21323
- [CPU] Teach TilingConfig about IREE::CPU::LoweringConfigAttr. by @hanhanW in #21327
- [GPU] Remove ROCm llvm plugin by @efric in #21311
- [CPU] Switch mmt4d pipeline to use IREE::CPU::LoweringConfigAttr. by @hanhanW in #21326
- [Dispatch Creation] Make producer fusable via interchange by @IanWood1 in #20977
- [Codegen] Add AMDGPU specific narrow type emulation pass for AMDGPUDialect by @qedawkins in #21333
- [TensorExt] Mirror a tensor_ext version of flow.tensor.bitcast by @qedawkins in #21277
- Move all producing uses of flow.bitcast to tensor_ext by @qedawkins in #21279
- Cleanup uses of --verify-diagnostics in tests by @qedawkins in #21340
- [Integrate] Bump LLVM to 77914c96dfc55562404d18c1ab777137055679db by @Groverkss in #21341
- [VectorDistribute] Fix buffer reduction during subgroup reduction by @Groverkss in #21315
- [CPU] Use TilingConfig for lowering_config propagation. by @hanhanW in #21336
- [CPU] Implement lowering_config propagation for IREE::CPU::LoweringConfigAttr. by @hanhanW in #21337
- [Codegen][Tuner] remove decomposition attr for attention op by @bangtianliu in #21345
- [CPU] Switch all LinalgExt dispatches to root-based tiling pipeline. by @hanhanW in #21338
- [GlobalOpt] Delete experimental FuseSiluHorizontalMatmul pass. by @hanhanW in #21350
- [Codegen] Resolve scf.forall operations created during split-reduction by @MaheshRavishankar in #21324
- [Codegen][GPU] Fuse nested warp and lane foralls by @Max191 in #21295
- [LinalgExt] Add decomposition for vector map_scatter by @Max191 in #21116
- [DispatchCreation] Add pass to cast away unsupported element types by @qedawkins in #21339
- [CPU] Improve TileRootAndFuseProducerConsumer like TileAndFuse pass. by @hanhanW in #21351
- [CPU] Refactor logic to LoweringConfigGenerator. by @hanhanW in #21352
- [LLVMGPU] Add canonicalization for select(pred, true, false) -> broadcast(pred) by @Groverkss in #21342
- [VectorExt] Use ub.poison for padding in vector_ext vectorization by @Groverkss in #21362
- [Codegen] Fix 1x1 Conv2D to Matmul pass ordering by @HalfBloodPrince010 in #21355
- Fix unsupported bitcasting of complex operands by @qedawkins in #21367
- [CPU][NFCI] Update tile sizes selection workaround if it has dynamic shape. by @hanhanW in #21353
- [Codegen] Combine layout transformation after GPUFuseAndHoistParallelLoops by @YashDeshpande25 in #21206
- [Stream] New ElideAsyncTransfersPass by @ziereis in #21029
- Fix failure in Windows build. by @MaheshRavishankar in #21369
- [DispatchCreation] Add a pass to split long running reduction loops. by @MaheshRavishankar in #21280
- Integrate LLVM to llvm/llvm-project@3ed3a33 by @bangtianliu in #21364
- [NFC] Fix smallvector size in kernel configattrs by @efric in #21348
- Integrate LLVM to llvm/llvm-project@bda5602 by @bangtianliu in #21377
- [CPU] Switch convolution pipelines to IREE::CPU::LoweringConfigAttr. by @hanhanW in #21347
- [CPU] Use IREE::CPU::TilingLevel in TileRootAndFuseProducerConsumer pass by @hanhanW in #21370
- [Stream] Fix dominance error for multi-result dispatches by @IanWood1 in #21368
- [codegen][gpu] Add the `iree-rocdl-use-buffer-instructions` pass by @fabianmcg in #21335
- [DispatchCreation] Avoid hoisting set encodings on scalar tensors by @jtuyls in #21376
- [Codegen] Distribute workgroups along X dim by @Max191 in #21334
- Add e2e test for split reduction using tiling. by @MaheshRavishankar in #21374
- [Dispatch Creation] Remove assert and handle null map by @IanWood1 in #21380
- [e2e] Adding default tuning specs tests. by @lialan in #21383
- Use `ShapedType::isStatic`. NFC. by @kuhar in #21385
- Reapply "[Codegen] Don't fold workgroup loops during workgroup tiling (#21137)" by @Max191 in #21382
- Implementing AMDGPU logical/physical devices and skeleton queues. by @benvanik in #21251
- Integrate LLVM to llvm/llvm-project@d9190f8 by @bangtianliu in #21388
- Adding `iree-hal-drivers-amdgpu-tests` target. by @benvanik in #21389
- [CPU] Switch pack/unpack dispatches to use IREE::CPU::LoweringConfigAttr. by @hanhanW in #21392
- [CPU] Teach SplitReduction about IREE::CPU::LoweringConfigAttr. by @hanhanW in #21391
- Add linalg.softmax e2e tests. by @hanhanW in #21396
- [Codegen][GPU] Canonicalize to remove the empty extract slice in combineLayoutTransformation pass by @jerryyin in #21395
- [Attention] Use multiple subgroups for memory bound attention by @Groverkss in #21363
- [CPU] Teach TilingConfig::getVectorTileSizes about CPU lowering config. by @hanhanW in #21397
- [CPU] Get rootOp based on lowering config in TileRootAndFuseProducerConsumer pass. by @hanhanW in #21394
- Integrate LLVM to llvm/llvm-project@e0cce5c by @bangtianliu in #21398
- [WIP] Expose multi-use fusion flag to pipeline options. by @IanWood1 in #21400
- [CPU][NFC] Switch infusible pack tests to use imperfect tiling case. by @hanhanW in #21404
New Contributors
- @adeel10x made their first contribution in #20263
- @NoumanAmir657 made their first contribution in #21075
- @zeeshanhaque21 made their first contribution in #21104
- @keshavvinayak01 made their first contribution in #20598
- @jitesh-gupta made their first contribution in #21120
- @sebvince made their first contribution in #21267
- @efric made their first contribution in #21311
- @HalfBloodPrince010 made their first contribution in #21355
Full Changelog: v3.5.0...v3.6.0