Generalize noop_with_empty_axes handling across all Reduce operators #26436
Description
This PR fixes the behavior of the reduction operators so it's aligned with the ONNX specification.
See the ONNX ReduceSumSquare specification for the definition of noop_with_empty_axes and the expected behavior when axes=[].
Main changes:
Added ApplyNoopEmptyAxesElementwise(), which performs elementwise operations according to the aggregator type:
If the aggregator defines Pre/Post operations (e.g., abs, square, sqrt, log), they are applied elementwise on each element of the input without reduction.
Otherwise, a direct memory copy (memcpy) is performed to efficiently produce an identical output.
Introduced a compile-time trait system ReduceAggTraits to detect whether each aggregator defines a PreOp and/or PostOp.
This allows compile-time specialization and avoids redundant runtime checks.
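A minimal sketch of how such a trait-based dispatch could look, assuming C++17. The aggregator structs, the HasPreOp/HasPostOp traits, and ApplyNoopSketch are hypothetical names for illustration; they are not the actual ReduceAggTraits or ApplyNoopEmptyAxesElementwise() code from this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical aggregators (illustration only, not the ONNX Runtime classes).
struct SumAgg {};  // no pre/post op: the no-op case is a plain copy
struct SumSquareAgg {  // pre-op only
  static float PreOp(float x) { return x * x; }
};
struct L2Agg {  // pre-op and post-op
  static float PreOp(float x) { return x * x; }
  static float PostOp(float x) { return std::sqrt(x); }
};

// Detection idiom: the primary template answers "no"; the partial
// specialization is selected only when Agg::PreOp(float) is well-formed.
template <typename Agg, typename = void>
struct HasPreOp : std::false_type {};
template <typename Agg>
struct HasPreOp<Agg, std::void_t<decltype(Agg::PreOp(0.0f))>> : std::true_type {};

template <typename Agg, typename = void>
struct HasPostOp : std::false_type {};
template <typename Agg>
struct HasPostOp<Agg, std::void_t<decltype(Agg::PostOp(0.0f))>> : std::true_type {};

// Assumed behavior for axes=[] with noop_with_empty_axes=1:
// no reduction, only the elementwise pre/post ops (or a copy if there are none).
template <typename Agg>
void ApplyNoopSketch(const std::vector<float>& in, std::vector<float>& out) {
  assert(out.size() == in.size());  // caller preallocates output with the input shape
  if constexpr (!HasPreOp<Agg>::value && !HasPostOp<Agg>::value) {
    // Identity case: one bulk copy, no per-element work.
    if (!in.empty()) std::memcpy(out.data(), in.data(), in.size() * sizeof(float));
  } else {
    for (std::size_t i = 0; i < in.size(); ++i) {
      float v = in[i];
      if constexpr (HasPreOp<Agg>::value) v = Agg::PreOp(v);
      if constexpr (HasPostOp<Agg>::value) v = Agg::PostOp(v);
      out[i] = v;
    }
  }
}
```

Because the branches are selected with if constexpr, the identity instantiation contains no loop at all, and the elementwise instantiations contain no runtime check for whether a pre-op or post-op exists.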
Updated the generic reduction paths (CommonReduce1Loop and CommonReduce2Loops)
to invoke ApplyNoopEmptyAxesElementwise() when axes=[] and noop_with_empty_axes=1.
These paths are used by all reduction operators, ensuring consistent handling across the whole operator family.
Fixed the conditional inside CommonFastReduceCopy: replaced the previous unconditional memcpy and return true with a safe return false, since empty-axes cases are now fully handled upstream by ApplyNoopEmptyAxesElementwise().
This keeps the control flow explicit and prevents unintended fallback copies.
Added unit tests for all affected reduction operators (ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp), and corrected one test that previously expected identity output; it now expects the Pre/Post operations to be applied, per the ONNX specification.
These updates ensure spec-compliant behavior for all Reduce ops and slightly improve performance for identity cases.
Motivation and Context
Before this fix, some Reduce operators (e.g., ReduceSumSquare, ReduceL1, ReduceL2, ReduceLogSum, ReduceLogSumExp) did not follow the ONNX spec when axes=[] and noop_with_empty_axes=1.
Per the ONNX specification:
No reduction should occur if axes is empty and noop_with_empty_axes=1.
Operators with Pre/Post (e.g., abs, square, sqrt, log) must apply them elementwise.
Others should return the input unchanged.
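The rules above can be illustrated with a small reference function for ReduceSumSquare on a 1-D tensor. ReduceSumSquare1D is a hypothetical helper written for this description, not ONNX Runtime code, and it covers only the 1-D case with keepdims=1.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical reference semantics for ReduceSumSquare on a 1-D tensor.
std::vector<float> ReduceSumSquare1D(const std::vector<float>& x,
                                     const std::vector<int64_t>& axes,
                                     bool noop_with_empty_axes) {
  // A 1-D tensor has a single valid axis: 0 (or -1 counted from the end).
  for (int64_t a : axes) assert(a == 0 || a == -1);

  if (axes.empty() && noop_with_empty_axes) {
    // No reduction: only the elementwise pre-op (square) is applied.
    std::vector<float> y;
    y.reserve(x.size());
    for (float v : x) y.push_back(v * v);
    return y;
  }

  // Empty axes with noop_with_empty_axes=0 reduces over all axes,
  // which for a 1-D tensor is the same as axes=[0].
  float acc = 0.0f;
  for (float v : x) acc += v * v;
  return {acc};
}
```

With input {1, -2, 3} and empty axes, the no-op flag yields {1, 4, 9}, while the default flag yields the full reduction {14}.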
Fixes #26288