Propagate binding correlation information to LLVM codegen for mutable binding fusion by kimm240 · Pull Request #23184 · iree-org/iree

kimm240 · 2026-01-19T05:23:28Z

Problem

When --iree-stream-resource-alias-mutable-bindings=true is enabled, the FuseDispatchBindings pass fuses multiple bindings that refer to the same underlying storage. However, the correlation information about which bindings alias or don't alias was not propagated to LLVM code generation, causing LLVM's alias analysis to be overly conservative. This prevents optimizations like vectorization, especially in cases with multiple in-place operands (e.g., the sort_3d test case with two in-place operands of different types).

This addresses issue #7737.

Solution

We propagate binding correlation information from the Stream dialect through to LLVM code generation by:

Storing correlation attributes in Stream dialect: The FuseDispatchBindings pass now attaches stream.binding_correlation and stream.binding_noalias attributes to function arguments, indicating which bindings are in the same correlation group (same resource, different offsets) or point to different resources.
Reading attributes in LLVM codegen: The ConvertFunc class in LLVMGPU/ConvertToLLVM.cpp reads these attributes and applies LLVM noalias attributes to function arguments, enabling LLVM's alias analysis to make better optimization decisions.

Implementation Details

FuseDispatchBindings Pass (`FuseDispatchBindings.cpp`)

In updateExecutableSignature, we now generate two types of attributes:

stream.binding_correlation: Indicates bindings in the same correlation group (same resource, different offsets - they don't alias each other)
stream.binding_noalias: Indicates bindings pointing to different resources (guaranteed not to alias)

// For bindings in different correlation groups (different resources)
if (binding.value().correlationMap != otherBinding.value().correlationMap) {
  noaliasIndices.push_back(IntegerAttr::get(..., otherBinding.index()));
}
if (!noaliasIndices.empty()) {
  funcOp.setArgAttr(bindingArg.getArgNumber(), "stream.binding_noalias",
                   ArrayAttr::get(funcOp.getContext(), noaliasIndices));
}

LLVMGPU ConvertToLLVM (`LLVMGPU/ConvertToLLVM.cpp`)

The ConvertFunc class reads the correlation attributes from the original function and applies LLVM noalias attributes:

// Read noalias information
if (auto noaliasAttr = funcOp.getArgAttrOfType<ArrayAttr>(
        binding, "stream.binding_noalias")) {
  // ... process and store ...
}

// Apply LLVM noalias attribute
newFuncOp.setArgAttr(idx, LLVM::LLVMDialect::getNoAliasAttrName(), unit);

Verification

The implementation has been verified with the sort_3d test case:

IR was made by
../iree-build/tools/iree-compile --iree-hal-target-device=cuda --iree-cuda-target=sm_86 --mlir-print-ir-after-all test.mlir -o test.vmfb 2> ir_dump.log.

test.mlir is it. which is @sort_3d test in dispatch_linalg_on_tensors.mlir :

util.func public @sort_3d(%arg0: tensor<?x?x?xi32>, %arg1 : tensor<?x?x?xf32>)  
    -> (tensor<?x?x?xi32>, tensor<?x?x?xf32>) {  
  %0, %1 = iree_linalg_ext.sort dimension(0)  
      outs(%arg0, %arg1 : tensor<?x?x?xi32>, tensor<?x?x?xf32>) {  
      ^bb0(%arg2: i32, %arg3: i32, %arg4 : f32, %arg5 : f32):  
        %2 = arith.cmpf ogt, %arg4, %arg5 : f32  
        iree_linalg_ext.yield %2 : i1  
      } -> tensor<?x?x?xi32>, tensor<?x?x?xf32>  
  util.return %0, %1 : tensor<?x?x?xi32>, tensor<?x?x?xf32>  
}

Stream dialect: stream.binding_noalias attributes are correctly generated:

func.func @dispatch(
  %arg0: !stream.binding {stream.binding_noalias = [1 : i32]},
  %arg1: !stream.binding {stream.binding_noalias = [0 : i32]}
)

It is in after FuseDispatchBindingsPass.

LLVM dialect: llvm.noalias attributes are correctly applied:

llvm.func @dispatch(
  %arg0: !llvm.ptr<1> {llvm.noalias, llvm.nonnull, llvm.noundef},
  %arg1: !llvm.ptr<1> {llvm.noalias, llvm.nonnull, llvm.noundef}
)

It is in after ConvertToNVVMPass.

Future Work

Implement similar support for LLVMCPU backend (currently only LLVMGPU is supported)
Consider using alias.scope metadata for more precise alias information (as mentioned in the TODO comments)

krzysz00

Is this going to be stacked on top of another PR that implements these TODOs? Otherwise, I figure gesturing at a feature that doesn't quite exist doesn't really make sense

krzysz00 · 2026-01-19T16:10:50Z

+      if (auto correlationAttr = funcOp.getArgAttrOfType<ArrayAttr>(
+              binding, "stream.binding_correlation")) {
+        SmallVector<int64_t> correlatedBindings;
+        for (auto attr : correlationAttr) {


Can we spell out the types here?

... more importantly

Can we turn this as corralatedBindings.getAsValueRange

If you're worried about the types, you can add a dialect attribute verifier to stream tAt'll check the format of these attributes

I fix it in commit commit 0638ea0.

krzysz00 · 2026-01-19T16:20:52Z

+      if (correlationIt != bindingCorrelationMap.end() && !correlationIt->second.empty()) {
+        // This binding is in a correlation group. The bindings in the same
+        // group don't alias each other (different offsets).
+        // TODO: In the future, we could use alias.scope metadata to be more


Ah, this is where the code will go for annotating the different subspans of a binding as noasias etc.

Some of that would be made simpler by something like https://discourse.llvm.org/t/n-ary-separate-objects-non-argument-noalias-groups/88420 landing.

I can also flag #22236 as an interesting related piece of work

Thanks for the guidance! I'm very grateful for the high-level context and the links you shared.

Regarding the TODOs: I agree that gesturing at features that don't yet exist can be confusing. I've removed those comments in commit faf30dd.

However, seeing the discussions in the LLVM community you shared, I’m open to exploring the implementation of that.

MaheshRavishankar

Not reviewed the code yet, but I think for the code that lowers to LLVM GPU, we should be OK just adding noalias to all operands. The fact that they are the same bindings shouldnt matter, cause the generated LLVM function takes a different pointer for each argument. One or more pointers could have been set by using the same base but different offsets, but LLVM shouldnt care about this.

kimm240 · 2026-01-20T02:37:28Z

@MaheshRavishankar
The reason I implemented the correlation tracking is to be conservative and safe. While most GPU dispatch arguments don't overlap, I was concerned about cases where multiple bindings might point to the exact same resource (true aliasing).

If we can guarantee that IREE's high-level flow/stream always produces non-overlapping subspans for separate bindings, then simply applying noalias to all operands would be simpler.

MaheshRavishankar · 2026-01-20T02:48:46Z

@MaheshRavishankar
The reason I implemented the correlation tracking is to be conservative and safe. While most GPU dispatch arguments don't overlap, I was concerned about cases where multiple bindings might point to the exact same resource (true aliasing).

If we can guarantee that IREE's high-level flow/stream always produces non-overlapping subspans for separate bindings, then simply applying noalias to all operands would be simpler.

Yes. It is guaranteed.

kuhar · 2026-01-20T02:51:58Z

Two subspans into the same binding may overlap, distinct bindings do not

kimm240 · 2026-01-20T06:03:40Z

I completely agree that applying a blanket noalias to all pointer arguments is a simpler approach and works effectively under IREE's current guarantee that distinct bindings never overlap.

However, I have preserved the correlation tracking logic in this PR to prepare for the following future scenarios:
Foundation for alias.scope Metadata:

As discussed in the LLVM forum thread, the next step for maximizing GPU performance is moving beyond function attributes to granular !alias.scope and !noalias metadata. And My implementation provides the exact 'pointer grouping' information required to emit this metadata.

While distinct bindings don't overlap, subspans within the same binding might. The correlation logic is the only way to distinguish these cases. By keeping this infrastructure, we can safely optimize instructions even when multiple subspans are fused into a single binding argument.

Relying on a global "no-overlap" guarantee is fine for now, but explicit correlation makes the codegen much more resilient to future changes in resource management or binding fusion strategies.

I’d appreciate it if we could keep this logic as a foundation for upcoming metadata work.

MaheshRavishankar · 2026-01-20T16:23:51Z

Two subspans into the same binding may overlap, distinct bindings do not

Are you sure? Two subspans into the same binding I think do not overlap. Thats how the combine bindings in Stream works. @benvanik could you confirm that in a dispatch the binding subspans do not overlap.

kuhar · 2026-01-20T19:50:24Z

Two subspans into the same binding may overlap, distinct bindings do not

Are you sure? Two subspans into the same binding I think do not overlap.

I remember debugging an elementwise dispatch that had two inputs coming from the same buffer at different base offsets.

MaheshRavishankar · 2026-01-20T21:55:47Z

Ok, maybe I am getting confused. I haven't looked at in a while. The subspans are independent, that is guaranteed. Whether the offset is added to the base inside the kernel or outside. I don't remember now. Would be good to check

krzysz00 · 2026-01-20T22:17:02Z

... the subspan condition is actually slightly weaker, per a bunch of previous discussions I had with @benvanik on the writeonly stuff. The summary of that is that, as best as I can recall, we obey Rust rules here: a given region of memory can either be covered by N immutable subspans (including 0) or exactly 1 mutable subspan.

So multiple read-only subspans aliasing each other is allowed, though one could probably get away with noaliasing them anyway.

MaheshRavishankar · 2026-01-21T17:26:02Z

... the subspan condition is actually slightly weaker, per a bunch of previous discussions I had with @benvanik on the writeonly stuff. The summary of that is that, as best as I can recall, we obey Rust rules here: a given region of memory can either be covered by N immutable subspans (including 0) or exactly 1 mutable subspan.

So multiple read-only subspans aliasing each other is allowed, though one could probably get away with noaliasing them anyway.

Well, then if there is a way to say the reads dont alias with the writes, thats enough I guess. I am not blocking on this PR... but the logic seems more complicated, and I thought we could just use some existing gaurantees. I am not reviewing this actively. If @kuhar and @krzysz00 are fine with this, it works for me.

kimm240 · 2026-02-04T02:01:02Z

Hi all! Regarding the previous discussion, as @kuhar and @krzysz00 mentioned, read-only subspans can indeed overlap, so I believe the current correlation tracking logic is necessary for safety and for the future alias.scope metadata work.

If you think any further refinements or additional implementations are needed to better align this with IREE's architecture, please let me know. I’m more than happy to follow up and implement them.

kuhar

Can we add some tests for this?

kimm240 · 2026-02-04T09:43:51Z

@kuhar
This test confirms that FuseDispatchBindings correctly generates stream.binding_noalias when distinct resources are allocated and alias_mutable_bindings is set to false. It ensures the analysis logic properly identifies non-aliasing bindings at the Stream level.

Please Give me any feedback if there is an issue in test.

kuhar · 2026-02-05T00:39:15Z

+      int64_t binding = subspan.getBinding().getSExtValue();
+      // Try to read correlation information from the original function.
+      // Note: This assumes the correlation attribute was preserved through
+      // Stream → HAL conversion. If not, we'll need to preserve it explicitly.


nit: stick to ascii

Suggested change

// Stream → HAL conversion. If not, we'll need to preserve it explicitly.

// Stream -> HAL conversion. If not, we'll need to preserve it explicitly.

I fix it in commit ca20bad

kuhar · 2026-02-05T00:40:44Z

+// After ConvertToNVVMPass: llvm.noalias attributes should be applied.
+// LLVM-LABEL: llvm.func @dispatch
+// LLVM-SAME: (%[[LLVM_ARG0:.+]]: !llvm.ptr {llvm.align = 16 : i32, llvm.noalias, llvm.nonnull, llvm.noundef},
+// LLVM-SAME:  %[[LLVM_ARG1:.+]]: !llvm.ptr {llvm.align = 16 : i32, llvm.noalias, llvm.nonnull, llvm.noundef}


These are never checked

I delete it for clarity.

kuhar · 2026-02-05T00:41:04Z

+// 2. LLVM dialect: After ConvertToNVVMPass, llvm.noalias attributes are
+//    applied to function arguments


this one is missing from the test

@kuhar
I add test for LLVM dialect in commit 6c07283

kimm240 · 2026-03-16T04:46:46Z

Hi @kuhar @krzysz00, just a gentle ping on this PR.
I addressed all @kuhar 's previous feedback, including adding the requested NVVM/LLVM dialect tests and fixing the ascii comments, back in early February.
Could you please take another look when you have a chance? Please let me know if there are any further refinements needed. Thanks for your time!

kuhar · 2026-03-16T20:24:58Z

Can you rebase this? We plan to have a release in a couple of days, I'd hold off with merging this until it's out.

kimm240 · 2026-03-17T00:56:09Z

Hi @kuhar, thank you for the update. I have already rebased this PR onto the latest main. I'm happy to wait until the upcoming release is out. Thanks again for the review!

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

…tchBindings.cpp Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

kimm240 · 2026-04-09T02:33:59Z

Hi @kuhar! As requested, I've rebased the PR onto the latest main branch again. Please let me know when you have a chance to take a look. Thanks!

kimm240 · 2026-04-14T09:20:41Z

@kuhar @krzysz00 @bangtianliu
Prev CI is reproducibly failing on llama_8b_fp16/quality_gfx942 with an output mismatch, so this may indicate a correctness issue. (Latest Commit may solve it, but I can't sure what it is.)

If this is confirmed as a correctness bug, we likely need to decide whether this is:

acceptable drift from aliasing-related optimization (and therefore an expected/weight update case), or
a logic issue that requires making noalias propagation more conservative.

I can’t reliably reproduce this exact test locally in my current environment, so I can’t confidently conclude which direction is correct yet.

Guidance on the preferred direction would be greatly appreciated.

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

kimm240 · 2026-04-17T01:43:34Z

Change FfileCheck for this pr

kimm240 requested review from Groverkss, MaheshRavishankar, Max191, benvanik, hanhanW, krzysz00, kuhar, nirvedhmeshram and qedawkins as code owners January 19, 2026 05:23

bangtianliu reviewed Jan 19, 2026

View reviewed changes

Comment thread compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispatchBindings.cpp Outdated

krzysz00 reviewed Jan 19, 2026

View reviewed changes

MaheshRavishankar reviewed Jan 20, 2026

View reviewed changes

kuhar reviewed Jan 20, 2026

View reviewed changes

kimm240 force-pushed the kimm240/alias branch 2 times, most recently from 9ff7ddc to 207ad2e Compare January 20, 2026 05:18

kuhar reviewed Feb 4, 2026

View reviewed changes

Comment thread compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp Outdated

Comment thread compiler/src/iree/compiler/Codegen/LLVMGPU/ConvertToLLVM.cpp Outdated

kuhar reviewed Feb 4, 2026

View reviewed changes

Comment thread compiler/src/iree/compiler/Codegen/LLVMGPU/test/convert_to_nvvm_noalias_e2e.mlir Outdated

kuhar reviewed Feb 5, 2026

View reviewed changes

Hyun Gyu Kim and others added 20 commits April 9, 2026 11:22

1

77d003c

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

2

5720d2d

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

use braces with single-line if statements and loops

542e278

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Use type and correlatedBindings.getAsValueRange

be9c92d

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Delete Todo

293db43

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Formatting

3bf6a5e

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

spell out the type: subspan

e0c4a1f

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Delete variables not used

8f2da3c

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

use structured bindings for the index and the value

d3a9d91

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

create a Builder and do something like b.getI32Attr(...);

bec13a9

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Update compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispa…

2a21d42

…tchBindings.cpp Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Update compiler/src/iree/compiler/Dialect/Stream/Transforms/FuseDispa…

aaef7e6

…tchBindings.cpp Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com> Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Change code convention to llvm::map_to_vector

10303c0

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Add Tmp Test

a3dbeaf

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Add Tmp Test 2

baac4d5

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

delete test tmp1

7f8bdaa

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Edit Stream Test

ccff936

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Use stream.noalias info for converting LLVM IR

94a006c

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

Add Conversion From HAL to LLVM and Test

1dd73f9

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

nit: stick to ascii

ce8604e

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

kimm240 force-pushed the kimm240/alias branch from ca20bad to ce8604e Compare April 9, 2026 02:25

kimm240 force-pushed the kimm240/alias branch from c079c98 to 9304185 Compare April 14, 2026 09:17

change test for filecheck

27bec17

Signed-off-by: Hyun Gyu Kim <kimm240@telepix.net>

kimm240 force-pushed the kimm240/alias branch from 9304185 to 27bec17 Compare April 17, 2026 01:19

	// Stream → HAL conversion. If not, we'll need to preserve it explicitly.
	// Stream -> HAL conversion. If not, we'll need to preserve it explicitly.

		// 2. LLVM dialect: After ConvertToNVVMPass, llvm.noalias attributes are
		// applied to function arguments

Conversation

kimm240 commented Jan 19, 2026

Problem

Solution

Implementation Details

FuseDispatchBindings Pass (FuseDispatchBindings.cpp)

LLVMGPU ConvertToLLVM (LLVMGPU/ConvertToLLVM.cpp)

Verification

Future Work

Uh oh!

Uh oh!

krzysz00 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MaheshRavishankar left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kimm240 commented Jan 20, 2026

Uh oh!

MaheshRavishankar commented Jan 20, 2026

Uh oh!

kuhar commented Jan 20, 2026

Uh oh!

kimm240 commented Jan 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

MaheshRavishankar commented Jan 20, 2026

Uh oh!

kuhar commented Jan 20, 2026

Uh oh!

MaheshRavishankar commented Jan 20, 2026

Uh oh!

krzysz00 commented Jan 20, 2026

Uh oh!

MaheshRavishankar commented Jan 21, 2026

Uh oh!

kimm240 commented Feb 4, 2026

Uh oh!

kuhar left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

kimm240 commented Feb 4, 2026

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kimm240 commented Mar 16, 2026

Uh oh!

kuhar commented Mar 16, 2026

FuseDispatchBindings Pass (`FuseDispatchBindings.cpp`)

LLVMGPU ConvertToLLVM (`LLVMGPU/ConvertToLLVM.cpp`)

kimm240 commented Jan 20, 2026 •

edited

Loading