Arm64 Hardware Intrinsics APIs in .NET

Introduction

In my vectorization using .NET APIs blog, I describe SIMD datatypes Vector64<T> and Vector128<T> that operates on ‘Arm64 hardware intrinsic’ APIs present under System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 class. In this post I will describe those hardware intrinsic APIs by showing sample code usage along with examples and generated Arm64 code. This will help people in understanding these APIs so they can use them to optimize their .NET code written to target Arm64. Since there are 360 APIs, describing all of them in a single post will be overwhelming. So I have divided these APIs among 8 blogs and will demonstrate 45 APIs in each blog. This is part 4 of that blog series. You can checkout my previous blogs at:

Most of the description of these APIs is adapted and referenced from Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile document. You can also refer to the description of SIMD and Floating-point instructions description at Arm developer docs page.

The blog page is programmatically generated and might contain mistakes. If you find any mistake, please leave a comment and I will address it.

APIs covered

FusedMultiplySubtractScalarBySelectedScalar	MinAcross
FusedSubtractHalving	MinNumber
Insert	MinNumberAcross
InsertScalar	MinNumberPairwise
InsertSelectedScalar	MinNumberPairwiseScalar
LeadingSignCount	MinNumberScalar
LeadingZeroCount	MinPairwise
LoadAndInsertScalar	MinPairwiseScalar
LoadAndReplicateToVector128	MinScalar
LoadAndReplicateToVector64	Multiply
LoadVector128	MultiplyAdd
LoadVector64	MultiplyAddByScalar
Max	MultiplyAddBySelectedScalar
MaxAcross	MultiplyByScalar
MaxNumber	MultiplyBySelectedScalar
MaxNumberAcross	MultiplyBySelectedScalarWideningLower
MaxNumberPairwise	MultiplyBySelectedScalarWideningLowerAndAdd
MaxNumberPairwiseScalar	MultiplyBySelectedScalarWideningLowerAndSubtract
MaxNumberScalar	MultiplyBySelectedScalarWideningUpper
MaxPairwise	MultiplyBySelectedScalarWideningUpperAndAdd
MaxPairwiseScalar	MultiplyBySelectedScalarWideningUpperAndSubtract
MaxScalar	MultiplyDoublingByScalarSaturateHigh
Min

1. FusedMultiplySubtractScalarBySelectedScalar

Vector64<double> FusedMultiplySubtractScalarBySelectedScalar(Vector64<double> minuend, Vector64<double> left, Vector128<double> right, byte rightIndex)

This method multiplies the vector elements in the left vector by the rightIndex element in the right vector, and subtracts the results from the vector elements of the minuend vector and returns the result.

private Vector64<double> FusedMultiplySubtractScalarBySelectedScalarTest(Vector64<double> minuend, Vector64<double> left, Vector128<double> right, byte rightIndex)
{
  return AdvSimd.Arm64.FusedMultiplySubtractScalarBySelectedScalar(minuend, left, right, 0);
}
// minuend = <11.5>
// left = <11.5>
// right = <11.5, 12.5>
// rightIndex = 0
// Result = <-120.75>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> FusedMultiplySubtractScalarBySelectedScalar(Vector64<float> minuend, Vector64<float> left, Vector64<float> right, byte rightIndex)
Vector64<float> FusedMultiplySubtractScalarBySelectedScalar(Vector64<float> minuend, Vector64<float> left, Vector128<float> right, byte rightIndex)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:FusedMultiplySubtractScalarBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],ubyte):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )  simd16  ->   d2         HFA(simd16) 
;* V03 arg3         [V03    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fmls    d0, d1, v2.d[0]
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

2. FusedSubtractHalving

Vector64<byte> FusedSubtractHalving(Vector64<byte> left, Vector64<byte> right)

This method subtracts the corresponding vector elements in the right vector from those of left vector, shifts each result right one bit, stores the result in a vector, and returns the result vector.

private Vector64<byte> FusedSubtractHalvingTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.FusedSubtractHalving(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <251, 251, 251, 251, 251, 251, 251, 251>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> FusedSubtractHalving(Vector64<short> left, Vector64<short> right)
Vector64<int> FusedSubtractHalving(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> FusedSubtractHalving(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> FusedSubtractHalving(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> FusedSubtractHalving(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> FusedSubtractHalving(Vector128<byte> left, Vector128<byte> right)
Vector128<short> FusedSubtractHalving(Vector128<short> left, Vector128<short> right)
Vector128<int> FusedSubtractHalving(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> FusedSubtractHalving(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> FusedSubtractHalving(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> FusedSubtractHalving(Vector128<uint> left, Vector128<uint> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:FusedSubtractHalvingTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uhsub   v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

3. Insert

Vector64<byte> Insert(Vector64<byte> vector, byte index, byte data)

This method copies the vector vector in result vector with the element at index set to data value.

private Vector64<byte> InsertTest(Vector64<byte> vector, byte index, byte data)
{
  return AdvSimd.Insert(vector, 4, 200);
}
// vector = <11, 12, 13, 14, 15, 16, 17, 18>
// index = 4
// data = 200
// Result = <11, 12, 13, 14, 200, 16, 17, 18>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> Insert(Vector64<short> vector, byte index, short data)
Vector64<int> Insert(Vector64<int> vector, byte index, int data)
Vector64<sbyte> Insert(Vector64<sbyte> vector, byte index, sbyte data)
Vector64<float> Insert(Vector64<float> vector, byte index, float data)
Vector64<ushort> Insert(Vector64<ushort> vector, byte index, ushort data)
Vector64<uint> Insert(Vector64<uint> vector, byte index, uint data)
Vector128<byte> Insert(Vector128<byte> vector, byte index, byte data)
Vector128<double> Insert(Vector128<double> vector, byte index, double data)
Vector128<short> Insert(Vector128<short> vector, byte index, short data)
Vector128<int> Insert(Vector128<int> vector, byte index, int data)
Vector128<long> Insert(Vector128<long> vector, byte index, long data)
Vector128<sbyte> Insert(Vector128<sbyte> vector, byte index, sbyte data)
Vector128<float> Insert(Vector128<float> vector, byte index, float data)
Vector128<ushort> Insert(Vector128<ushort> vector, byte index, ushort data)
Vector128<uint> Insert(Vector128<uint> vector, byte index, uint data)
Vector128<ulong> Insert(Vector128<ulong> vector, byte index, ulong data)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:InsertTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte,ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mov     w0, #200
            ins     v0.b[4], w0
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

4. InsertScalar

Vector128<double> InsertScalar(Vector128<double> result, byte resultIndex, Vector64<double> value)

This method copies the result vector in a result vector, except the element at resultIndex of result is set to that from value vector.

private Vector128<double> InsertScalarTest(Vector128<double> result, byte resultIndex, Vector64<double> value)
{
  return AdvSimd.InsertScalar(result, 1, value);
}
// result = <5.5, 5.5>
// resultIndex = 1
// value = <15.5>
// Result = <5.5, 15.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<long> InsertScalar(Vector128<long> result, byte resultIndex, Vector64<long> value)
Vector128<ulong> InsertScalar(Vector128<ulong> result, byte resultIndex, Vector64<ulong> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:InsertScalarTest(System.Runtime.Intrinsics.Vector128`1[Double],ubyte,System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;  V02 arg2         [V02,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ins     v0.d[1], v1.d[0]
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

5. InsertSelectedScalar

Vector64<byte> InsertSelectedScalar(Vector64<byte> result, byte resultIndex, Vector64<byte> value, byte valueIndex)

This method copies the result vector in a result vector, except the element at resultIndex of result is set to that of valueIndex element of value vector.

private Vector64<byte> InsertSelectedScalarTest(Vector64<byte> result, byte resultIndex, Vector64<byte> value, byte valueIndex)
{
  return AdvSimd.Arm64.InsertSelectedScalar(result, 0, value, 1);
}
// result = <11, 12, 13, 14, 15, 16, 17, 18>
// resultIndex = 0
// value = <21, 22, 23, 24, 25, 26, 27, 28>
// valueIndex = 1
// Result = <22, 12, 13, 14, 15, 16, 17, 18>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<byte> InsertSelectedScalar(Vector64<byte> result, byte resultIndex, Vector128<byte> value, byte valueIndex)
Vector64<short> InsertSelectedScalar(Vector64<short> result, byte resultIndex, Vector64<short> value, byte valueIndex)
Vector64<short> InsertSelectedScalar(Vector64<short> result, byte resultIndex, Vector128<short> value, byte valueIndex)
Vector64<int> InsertSelectedScalar(Vector64<int> result, byte resultIndex, Vector64<int> value, byte valueIndex)
Vector64<int> InsertSelectedScalar(Vector64<int> result, byte resultIndex, Vector128<int> value, byte valueIndex)
Vector64<sbyte> InsertSelectedScalar(Vector64<sbyte> result, byte resultIndex, Vector64<sbyte> value, byte valueIndex)
Vector64<sbyte> InsertSelectedScalar(Vector64<sbyte> result, byte resultIndex, Vector128<sbyte> value, byte valueIndex)
Vector64<float> InsertSelectedScalar(Vector64<float> result, byte resultIndex, Vector64<float> value, byte valueIndex)
Vector64<float> InsertSelectedScalar(Vector64<float> result, byte resultIndex, Vector128<float> value, byte valueIndex)
Vector64<ushort> InsertSelectedScalar(Vector64<ushort> result, byte resultIndex, Vector64<ushort> value, byte valueIndex)
Vector64<ushort> InsertSelectedScalar(Vector64<ushort> result, byte resultIndex, Vector128<ushort> value, byte valueIndex)
Vector64<uint> InsertSelectedScalar(Vector64<uint> result, byte resultIndex, Vector64<uint> value, byte valueIndex)
Vector64<uint> InsertSelectedScalar(Vector64<uint> result, byte resultIndex, Vector128<uint> value, byte valueIndex)
Vector128<byte> InsertSelectedScalar(Vector128<byte> result, byte resultIndex, Vector64<byte> value, byte valueIndex)
Vector128<byte> InsertSelectedScalar(Vector128<byte> result, byte resultIndex, Vector128<byte> value, byte valueIndex)
Vector128<double> InsertSelectedScalar(Vector128<double> result, byte resultIndex, Vector128<double> value, byte valueIndex)
Vector128<short> InsertSelectedScalar(Vector128<short> result, byte resultIndex, Vector64<short> value, byte valueIndex)
Vector128<short> InsertSelectedScalar(Vector128<short> result, byte resultIndex, Vector128<short> value, byte valueIndex)
Vector128<int> InsertSelectedScalar(Vector128<int> result, byte resultIndex, Vector64<int> value, byte valueIndex)
Vector128<int> InsertSelectedScalar(Vector128<int> result, byte resultIndex, Vector128<int> value, byte valueIndex)
Vector128<long> InsertSelectedScalar(Vector128<long> result, byte resultIndex, Vector128<long> value, byte valueIndex)
Vector128<sbyte> InsertSelectedScalar(Vector128<sbyte> result, byte resultIndex, Vector64<sbyte> value, byte valueIndex)
Vector128<sbyte> InsertSelectedScalar(Vector128<sbyte> result, byte resultIndex, Vector128<sbyte> value, byte valueIndex)
Vector128<float> InsertSelectedScalar(Vector128<float> result, byte resultIndex, Vector64<float> value, byte valueIndex)
Vector128<float> InsertSelectedScalar(Vector128<float> result, byte resultIndex, Vector128<float> value, byte valueIndex)
Vector128<ushort> InsertSelectedScalar(Vector128<ushort> result, byte resultIndex, Vector64<ushort> value, byte valueIndex)
Vector128<ushort> InsertSelectedScalar(Vector128<ushort> result, byte resultIndex, Vector128<ushort> value, byte valueIndex)
Vector128<uint> InsertSelectedScalar(Vector128<uint> result, byte resultIndex, Vector64<uint> value, byte valueIndex)
Vector128<uint> InsertSelectedScalar(Vector128<uint> result, byte resultIndex, Vector128<uint> value, byte valueIndex)
Vector128<ulong> InsertSelectedScalar(Vector128<ulong> result, byte resultIndex, Vector128<ulong> value, byte valueIndex)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:InsertSelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte,System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;  V02 arg2         [V02,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V03 arg3         [V03    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ins     v0.b[0], v1.b[1]
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

6. LeadingSignCount

Vector64<short> LeadingSignCount(Vector64<short> value)

This method counts the number of leading bits of individual elements of value vector that have the same value as the most significant bit and stores the result in result vector. This count does not include the most significant bit of the input.

private Vector64<short> LeadingSignCountTest(Vector64<short> value)
{
  return AdvSimd.LeadingSignCount(value);
}
// value = <32757, 165, 0, 15>
// Result = <0, 7, 15, 11>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> LeadingSignCount(Vector64<int> value)
Vector64<sbyte> LeadingSignCount(Vector64<sbyte> value)
Vector128<short> LeadingSignCount(Vector128<short> value)
Vector128<int> LeadingSignCount(Vector128<int> value)
Vector128<sbyte> LeadingSignCount(Vector128<sbyte> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:LeadingSignCountTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            cls     v16.4h, v0.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

7. LeadingZeroCount

Vector64<byte> LeadingZeroCount(Vector64<byte> value)

This method counts the number of binary zero bits before the first binary one bit in individual elements of the value vector , and writes the result to the result vector.

private Vector64<byte> LeadingZeroCountTest(Vector64<byte> value)
{
  return AdvSimd.LeadingZeroCount(value);
}
// value = <32757, 165, 0, 15>
// Result = <1, 8, 16, 12>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> LeadingZeroCount(Vector64<short> value)
Vector64<int> LeadingZeroCount(Vector64<int> value)
Vector64<sbyte> LeadingZeroCount(Vector64<sbyte> value)
Vector64<ushort> LeadingZeroCount(Vector64<ushort> value)
Vector64<uint> LeadingZeroCount(Vector64<uint> value)
Vector128<byte> LeadingZeroCount(Vector128<byte> value)
Vector128<short> LeadingZeroCount(Vector128<short> value)
Vector128<int> LeadingZeroCount(Vector128<int> value)
Vector128<sbyte> LeadingZeroCount(Vector128<sbyte> value)
Vector128<ushort> LeadingZeroCount(Vector128<ushort> value)
Vector128<uint> LeadingZeroCount(Vector128<uint> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:LeadingZeroCountTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            clz     v16.8b, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

8. LoadAndInsertScalar

Vector64<byte> LoadAndInsertScalar(Vector64<byte> value, byte index, byte* address)

This method loads a single-element structure from memory at address and writes the result to the specified index of thevalue vector without affecting the other elements of the result vector.

private Vector64<byte> LoadAndInsertScalarTest(Vector64<byte> value, byte index, byte* address)
{
  return AdvSimd.LoadAndInsertScalar(value, 2, address);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// index = 2
// address = Address of byte[]{ 21, 22, 23, 24, 25, 26, 27, 28 }
// Result = <11, 12, 21, 14, 15, 16, 17, 18>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> LoadAndInsertScalar(Vector64<short> value, byte index, short* address)
Vector64<int> LoadAndInsertScalar(Vector64<int> value, byte index, int* address)
Vector64<sbyte> LoadAndInsertScalar(Vector64<sbyte> value, byte index, sbyte* address)
Vector64<float> LoadAndInsertScalar(Vector64<float> value, byte index, float* address)
Vector64<ushort> LoadAndInsertScalar(Vector64<ushort> value, byte index, ushort* address)
Vector64<uint> LoadAndInsertScalar(Vector64<uint> value, byte index, uint* address)
Vector128<byte> LoadAndInsertScalar(Vector128<byte> value, byte index, byte* address)
Vector128<double> LoadAndInsertScalar(Vector128<double> value, byte index, double* address)
Vector128<short> LoadAndInsertScalar(Vector128<short> value, byte index, short* address)
Vector128<int> LoadAndInsertScalar(Vector128<int> value, byte index, int* address)
Vector128<long> LoadAndInsertScalar(Vector128<long> value, byte index, long* address)
Vector128<sbyte> LoadAndInsertScalar(Vector128<sbyte> value, byte index, sbyte* address)
Vector128<float> LoadAndInsertScalar(Vector128<float> value, byte index, float* address)
Vector128<ushort> LoadAndInsertScalar(Vector128<ushort> value, byte index, ushort* address)
Vector128<uint> LoadAndInsertScalar(Vector128<uint> value, byte index, uint* address)
Vector128<ulong> LoadAndInsertScalar(Vector128<ulong> value, byte index, ulong* address)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:LoadAndInsertScalarTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte,long):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T01] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;  V02 arg2         [V02,T00] (  3,  3   )    long  ->   x1        
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mov     v16.8b, v0.8b
            ld1     {v16.b}[2], [x1]
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 28, prolog size 8

9. LoadAndReplicateToVector128

Vector128<byte> LoadAndReplicateToVector128(byte* address)

This method loads a single-element structure from memory at address and replicates the value to all the elements of the result vector.

private Vector128<byte> LoadAndReplicateToVector128Test(byte* address)
{
  return AdvSimd.LoadAndReplicateToVector128(address);
}
// address = Address of byte[]{ 11}
// Result = <11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<short> LoadAndReplicateToVector128(short* address)
Vector128<int> LoadAndReplicateToVector128(int* address)
Vector128<sbyte> LoadAndReplicateToVector128(sbyte* address)
Vector128<float> LoadAndReplicateToVector128(float* address)
Vector128<ushort> LoadAndReplicateToVector128(ushort* address)
Vector128<uint> LoadAndReplicateToVector128(uint* address)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> LoadAndReplicateToVector128(double* address)
Vector128<long> LoadAndReplicateToVector128(long* address)
Vector128<ulong> LoadAndReplicateToVector128(ulong* address)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:LoadAndReplicateToVector128Test(long):System.Runtime.Intrinsics.Vector128`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )    long  ->   x0        
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ld1r    {v16.16b}, [x0]
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

10. LoadAndReplicateToVector64

Vector64<byte> LoadAndReplicateToVector64(byte* address)

This method loads a single-element structure from memory at address and replicates the value to all the elements of the result vector.

private Vector64<byte> LoadAndReplicateToVector64Test(byte* address)
{
  return AdvSimd.LoadAndReplicateToVector64(address);
}
// address = Address of byte[]{ 11}
// Result = <11, 11, 11, 11, 11, 11, 11, 11>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> LoadAndReplicateToVector64(short* address)
Vector64<int> LoadAndReplicateToVector64(int* address)
Vector64<sbyte> LoadAndReplicateToVector64(sbyte* address)
Vector64<float> LoadAndReplicateToVector64(float* address)
Vector64<ushort> LoadAndReplicateToVector64(ushort* address)
Vector64<uint> LoadAndReplicateToVector64(uint* address)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:LoadAndReplicateToVector64Test(long):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )    long  ->   x0        
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ld1r    {v16.8b}, [x0]
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

11. LoadVector128

Vector128<byte> LoadVector128(byte* address)

This method loads a multiple-element structure like array from memory at address and writes it to the result vector. If the elements in memory don?t fill up all the elements of result vector, then the remaining are set to 0.

private Vector128<byte> LoadVector128Test(byte* address)
{
  return AdvSimd.LoadVector128(address);
}
// address = Address of  new byte[14] { 21, 22, 23, 24, 25, 26, 27, 28, 1, 2, 23, 24, 25, 26}
// Result = <21, 22, 23, 24, 25, 26, 27, 28, 1, 2, 23, 24, 25, 26, 0, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<double> LoadVector128(double* address)
Vector128<short> LoadVector128(short* address)
Vector128<int> LoadVector128(int* address)
Vector128<long> LoadVector128(long* address)
Vector128<sbyte> LoadVector128(sbyte* address)
Vector128<float> LoadVector128(float* address)
Vector128<ushort> LoadVector128(ushort* address)
Vector128<uint> LoadVector128(uint* address)
Vector128<ulong> LoadVector128(ulong* address)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:LoadVector128Test(long):System.Runtime.Intrinsics.Vector128`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )    long  ->   x0        
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ld1     {v16.16b}, [x0]
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

12. LoadVector64

Vector64<byte> LoadVector64(byte* address)

private Vector64<byte> LoadVector64Test(byte* address)
{
  return AdvSimd.LoadVector64(address);
}
// address = Address of  new byte[14] { 21, 22, 23, 24, 25, 26, 27, 28, 1, 2, 23, 24, 25, 26}
// Result = <21, 22, 23, 24, 25, 26, 27, 28>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> LoadVector64(double* address)
Vector64<short> LoadVector64(short* address)
Vector64<int> LoadVector64(int* address)
Vector64<long> LoadVector64(long* address)
Vector64<sbyte> LoadVector64(sbyte* address)
Vector64<float> LoadVector64(float* address)
Vector64<ushort> LoadVector64(ushort* address)
Vector64<uint> LoadVector64(uint* address)
Vector64<ulong> LoadVector64(ulong* address)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:LoadVector64Test(long):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )    long  ->   x0        
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ld1     {v16.8b}, [x0]
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

13. Max

Vector64<byte> Max(Vector64<byte> left, Vector64<byte> right)

This method compares corresponding elements in the left and right vectors, places the larger of each pair in the result vector, and returns the result vector.

private Vector64<byte> MaxTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.Max(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <21, 22, 23, 24, 25, 26, 27, 28>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> Max(Vector64<short> left, Vector64<short> right)
Vector64<int> Max(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> Max(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Max(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Max(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Max(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> Max(Vector128<byte> left, Vector128<byte> right)
Vector128<short> Max(Vector128<short> left, Vector128<short> right)
Vector128<int> Max(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> Max(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Max(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Max(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Max(Vector128<uint> left, Vector128<uint> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Max(Vector128<double> left, Vector128<double> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            umax    v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

14. MaxAcross

Vector64<byte> MaxAcross(Vector64<byte> value)

This method compares all the vector elements in the value vector, and writes the largest value element in result vector at 0th index while other elements are set to 0.

private Vector64<byte> MaxAcrossTest(Vector64<byte> value)
{
  return AdvSimd.Arm64.MaxAcross(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <18, 0, 0, 0, 0, 0, 0, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<short> MaxAcross(Vector64<short> value)
Vector64<sbyte> MaxAcross(Vector64<sbyte> value)
Vector64<ushort> MaxAcross(Vector64<ushort> value)
Vector64<byte> MaxAcross(Vector128<byte> value)
Vector64<short> MaxAcross(Vector128<short> value)
Vector64<int> MaxAcross(Vector128<int> value)
Vector64<sbyte> MaxAcross(Vector128<sbyte> value)
Vector64<float> MaxAcross(Vector128<float> value)
Vector64<ushort> MaxAcross(Vector128<ushort> value)
Vector64<uint> MaxAcross(Vector128<uint> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxAcrossTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            umaxv   b16, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

15. MaxNumber

Vector64<float> MaxNumber(Vector64<float> left, Vector64<float> right)

This method compares corresponding elements in the left and right vectors, places the larger of each pair in the result vector, and returns the result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value.

private Vector64<float> MaxNumberTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.MaxNumber(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <21.5, 22.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> MaxNumber(Vector128<float> left, Vector128<float> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> MaxNumber(Vector128<double> left, Vector128<double> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxNumberTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fmaxnm  v16.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

16. MaxNumberAcross

Vector64<float> MaxNumberAcross(Vector128<float> value)

This method compares all the vector elements in the value vector, and writes the largest value element in result vector at 0th index while other elements are set to 0. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result of the comparison is the numerical value, otherwise the result is identical to MaxScalar().

private Vector64<float> MaxNumberAcrossTest(Vector128<float> value)
{
  return AdvSimd.Arm64.MaxNumberAcross(value);
}
// value = <11.5, 12.5, 13.5, 14.5>
// Result = <14.5, 0>

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxNumberAcrossTest(System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fmaxnmv s16, v0.4s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

17. MaxNumberPairwise

Vector64<float> MaxNumberPairwise(Vector64<float> left, Vector64<float> right)

This method creates a vector by concatenating the vector elements of left vector followed by those of the right vector, compares adjacent vector elements and writes the largest of each pair in a result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result is the numerical value.

private Vector64<float> MaxNumberPairwiseTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.Arm64.MaxNumberPairwise(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <12.5, 22.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> MaxNumberPairwise(Vector128<double> left, Vector128<double> right)
Vector128<float> MaxNumberPairwise(Vector128<float> left, Vector128<float> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxNumberPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fmaxnmp v16.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

18. MaxNumberPairwiseScalar

Vector64<float> MaxNumberPairwiseScalar(Vector64<float> value)

This method creates a vector by concatenating the vector elements of left vector followed by those of the right vector, compares adjacent vector elements and writes the largest of each pair in 0th element of result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result is the numerical value.

private Vector64<float> MaxNumberPairwiseScalarTest(Vector64<float> value)
{
  return AdvSimd.Arm64.MaxNumberPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <12.5, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<double> MaxNumberPairwiseScalar(Vector128<double> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxNumberPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fmaxnmp s16, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

19. MaxNumberScalar

Vector64<double> MaxNumberScalar(Vector64<double> left, Vector64<double> right)

This method compares corresponding vector elements in left and right vector and stores the larger value in a result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value.

private Vector64<double> MaxNumberScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.MaxNumberScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <11.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> MaxNumberScalar(Vector64<float> left, Vector64<float> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxNumberScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fmaxnm  d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

20. MaxPairwise

Vector64<byte> MaxPairwise(Vector64<byte> left, Vector64<byte> right)

This method creates a vector by concatenating the vector elements of the left after the vector elements of the right vector, reads each pair of adjacent vector elements in the vectors, writes the largest of each pair into a result vector, and writes the vector to the result vector.

private Vector64<byte> MaxPairwiseTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.MaxPairwise(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <12, 14, 16, 18, 22, 24, 26, 28>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> MaxPairwise(Vector64<short> left, Vector64<short> right)
Vector64<int> MaxPairwise(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> MaxPairwise(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> MaxPairwise(Vector64<float> left, Vector64<float> right)
Vector64<ushort> MaxPairwise(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MaxPairwise(Vector64<uint> left, Vector64<uint> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<byte> MaxPairwise(Vector128<byte> left, Vector128<byte> right)
Vector128<double> MaxPairwise(Vector128<double> left, Vector128<double> right)
Vector128<short> MaxPairwise(Vector128<short> left, Vector128<short> right)
Vector128<int> MaxPairwise(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> MaxPairwise(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> MaxPairwise(Vector128<float> left, Vector128<float> right)
Vector128<ushort> MaxPairwise(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> MaxPairwise(Vector128<uint> left, Vector128<uint> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            umaxp   v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

21. MaxPairwiseScalar

Vector64<float> MaxPairwiseScalar(Vector64<float> value)

This method compares two vector elements in the value vector and writes the largest of the floating-point values as a scalar to the result vector.

private Vector64<float> MaxPairwiseScalarTest(Vector64<float> value)
{
  return AdvSimd.Arm64.MaxPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <12.5, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<double> MaxPairwiseScalar(Vector128<double> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fmaxp   s16, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

22. MaxScalar

Vector64<double> MaxScalar(Vector64<double> left, Vector64<double> right)

This method compares theleft and right vector, and writes the larger of the two floating-point values to the result vector.

private Vector64<double> MaxScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.MaxScalar(left, right);
}
// left = <11.5>
// right = <10.5>
// Result = <11.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> MaxScalar(Vector64<float> left, Vector64<float> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MaxScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fmax    d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

23. Min

Vector64<byte> Min(Vector64<byte> left, Vector64<byte> right)

This method compares corresponding elements in the left and right vectors, places the smaller of each pair in the result vector, and returns the result vector.

private Vector64<byte> MinTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.Min(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <11, 12, 13, 14, 15, 16, 17, 18>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> Min(Vector64<short> left, Vector64<short> right)
Vector64<int> Min(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> Min(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Min(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Min(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Min(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> Min(Vector128<byte> left, Vector128<byte> right)
Vector128<short> Min(Vector128<short> left, Vector128<short> right)
Vector128<int> Min(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> Min(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Min(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Min(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Min(Vector128<uint> left, Vector128<uint> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Min(Vector128<double> left, Vector128<double> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            umin    v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

24. MinAcross

Vector64<byte> MinAcross(Vector64<byte> value)

This method compares all the vector elements in the value vector, and writes the smaller value element in result vector at 0th index while other elements are set to 0.

private Vector64<byte> MinAcrossTest(Vector64<byte> value)
{
  return AdvSimd.Arm64.MinAcross(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <11, 0, 0, 0, 0, 0, 0, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<short> MinAcross(Vector64<short> value)
Vector64<sbyte> MinAcross(Vector64<sbyte> value)
Vector64<ushort> MinAcross(Vector64<ushort> value)
Vector64<byte> MinAcross(Vector128<byte> value)
Vector64<short> MinAcross(Vector128<short> value)
Vector64<int> MinAcross(Vector128<int> value)
Vector64<sbyte> MinAcross(Vector128<sbyte> value)
Vector64<float> MinAcross(Vector128<float> value)
Vector64<ushort> MinAcross(Vector128<ushort> value)
Vector64<uint> MinAcross(Vector128<uint> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinAcrossTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uminv   b16, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

25. MinNumber

Vector64<float> MinNumber(Vector64<float> left, Vector64<float> right)

This method compares corresponding elements in the left and right vectors, places the smaller of each pair in the result vector, and returns the result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value.

private Vector64<float> MinNumberTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.MinNumber(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <11.5, 12.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> MinNumber(Vector128<float> left, Vector128<float> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> MinNumber(Vector128<double> left, Vector128<double> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinNumberTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fminnm  v16.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

26. MinNumberAcross

Vector64<float> MinNumberAcross(Vector128<float> value)

This method compares all the vector elements in the value vector, and writes the smaller value element in result vector at 0th index while other elements are set to 0. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result of the comparison is the numerical value, otherwise the result is identical to MaxScalar().

private Vector64<float> MinNumberAcrossTest(Vector128<float> value)
{
  return AdvSimd.Arm64.MinNumberAcross(value);
}
// value = <11.5, 12.5, 13.5, 14.5>
// Result = <11.5, 0>

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinNumberAcrossTest(System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fminnmv s16, v0.4s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

27. MinNumberPairwise

Vector64<float> MinNumberPairwise(Vector64<float> left, Vector64<float> right)

This method creates a vector by concatenating the vector elements of left vector followed by those of the right vector, compares adjacent vector elements and writes the smallest of each pair in a result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result is the numerical value.

private Vector64<float> MinNumberPairwiseTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.Arm64.MinNumberPairwise(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <11.5, 21.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> MinNumberPairwise(Vector128<double> left, Vector128<double> right)
Vector128<float> MinNumberPairwise(Vector128<float> left, Vector128<float> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinNumberPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fminnmp v16.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

28. MinNumberPairwiseScalar

Vector64<float> MinNumberPairwiseScalar(Vector64<float> value)

This method creates a vector by concatenating the vector elements of left vector followed by those of the right vector, compares adjacent vector elements and writes the smallest of each pair in 0th element of result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result is the numerical value.

private Vector64<float> MinNumberPairwiseScalarTest(Vector64<float> value)
{
  return AdvSimd.Arm64.MinNumberPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <11.5, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<double> MinNumberPairwiseScalar(Vector128<double> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinNumberPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fminnmp s16, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

29. MinNumberScalar

Vector64<double> MinNumberScalar(Vector64<double> left, Vector64<double> right)

This method compares corresponding vector elements in left and right vector and stores the smaller value in a result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard. If one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value.

private Vector64<double> MinNumberScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.MinNumberScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <11.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> MinNumberScalar(Vector64<float> left, Vector64<float> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinNumberScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fminnm  d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

30. MinPairwise

Vector64<byte> MinPairwise(Vector64<byte> left, Vector64<byte> right)

This method creates a vector by concatenating the vector elements of the left after the vector elements of the right vector, reads each pair of adjacent vector elements in the vectors, writes the smallest of each pair into a result vector, and writes the vector to the result vector.

private Vector64<byte> MinPairwiseTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.MinPairwise(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <11, 13, 15, 17, 21, 23, 25, 27>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> MinPairwise(Vector64<short> left, Vector64<short> right)
Vector64<int> MinPairwise(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> MinPairwise(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> MinPairwise(Vector64<float> left, Vector64<float> right)
Vector64<ushort> MinPairwise(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MinPairwise(Vector64<uint> left, Vector64<uint> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<byte> MinPairwise(Vector128<byte> left, Vector128<byte> right)
Vector128<double> MinPairwise(Vector128<double> left, Vector128<double> right)
Vector128<short> MinPairwise(Vector128<short> left, Vector128<short> right)
Vector128<int> MinPairwise(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> MinPairwise(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> MinPairwise(Vector128<float> left, Vector128<float> right)
Vector128<ushort> MinPairwise(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> MinPairwise(Vector128<uint> left, Vector128<uint> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uminp   v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

31. MinPairwiseScalar

Vector64<float> MinPairwiseScalar(Vector64<float> value)

This method compares two vector elements in the value vector and writes the smallest of the floating-point values as a scalar to the result vector.

private Vector64<float> MinPairwiseScalarTest(Vector64<float> value)
{
  return AdvSimd.Arm64.MinPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <11.5, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<double> MinPairwiseScalar(Vector128<double> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fminp   s16, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

32. MinScalar

Vector64<double> MinScalar(Vector64<double> left, Vector64<double> right)

This method compares theleft and right vector, and writes the smaller of the two floating-point values to the result vector.

private Vector64<double> MinScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.MinScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <11.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> MinScalar(Vector64<float> left, Vector64<float> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MinScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fmin    d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

33. Multiply

Vector64<byte> Multiply(Vector64<byte> left, Vector64<byte> right)

This method performs multiplication of corresponding vector elements in left and right vectors, writes the product to the result vector.

private Vector64<byte> MultiplyTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.Multiply(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <231, 8, 43, 80, 119, 160, 203, 248>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> Multiply(Vector64<short> left, Vector64<short> right)
Vector64<int> Multiply(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> Multiply(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Multiply(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Multiply(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Multiply(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> Multiply(Vector128<byte> left, Vector128<byte> right)
Vector128<short> Multiply(Vector128<short> left, Vector128<short> right)
Vector128<int> Multiply(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> Multiply(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Multiply(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Multiply(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Multiply(Vector128<uint> left, Vector128<uint> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Multiply(Vector128<double> left, Vector128<double> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mul     v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

34. MultiplyAdd

Vector64<byte> MultiplyAdd(Vector64<byte> addend, Vector64<byte> left, Vector64<byte> right)

This method multiplies corresponding elements in the vectors of the left and right vectors, and accumulates the product with the vector elements of the addend and returns the accumulated result.

private Vector64<byte> MultiplyAddTest(Vector64<byte> addend, Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.MultiplyAdd(addend, left, right);
}
// addend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <2, 22, 23, 24, 25, 26, 27, 28>
// right = <3, 32, 33, 34, 35, 36, 37, 38>
// Result = <17, 204, 4, 62, 122, 184, 248, 58>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> MultiplyAdd(Vector64<short> addend, Vector64<short> left, Vector64<short> right)
Vector64<int> MultiplyAdd(Vector64<int> addend, Vector64<int> left, Vector64<int> right)
Vector64<sbyte> MultiplyAdd(Vector64<sbyte> addend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> MultiplyAdd(Vector64<ushort> addend, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MultiplyAdd(Vector64<uint> addend, Vector64<uint> left, Vector64<uint> right)
Vector128<byte> MultiplyAdd(Vector128<byte> addend, Vector128<byte> left, Vector128<byte> right)
Vector128<short> MultiplyAdd(Vector128<short> addend, Vector128<short> left, Vector128<short> right)
Vector128<int> MultiplyAdd(Vector128<int> addend, Vector128<int> left, Vector128<int> right)
Vector128<sbyte> MultiplyAdd(Vector128<sbyte> addend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> MultiplyAdd(Vector128<ushort> addend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> MultiplyAdd(Vector128<uint> addend, Vector128<uint> left, Vector128<uint> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyAddTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mla     v0.8b, v1.8b, v2.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

35. MultiplyAddByScalar

Vector64<short> MultiplyAddByScalar(Vector64<short> addend, Vector64<short> left, Vector64<short> right)

This method multiplies the vector elements in the left by the 0th element value in the right, and accumulates the product with the vector elements of the addend vector and return the result vector.

private Vector64<short> MultiplyAddByScalarTest(Vector64<short> addend, Vector64<short> left, Vector64<short> right)
{
  return AdvSimd.MultiplyAddByScalar(addend, left, right);
}
// addend = <11, 12, 13, 14>
// left = <21, 22, 23, 24>
// right = <31, 32, 33, 34>
// Result = <662, 694, 726, 758>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> MultiplyAddByScalar(Vector64<int> addend, Vector64<int> left, Vector64<int> right)
Vector64<ushort> MultiplyAddByScalar(Vector64<ushort> addend, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MultiplyAddByScalar(Vector64<uint> addend, Vector64<uint> left, Vector64<uint> right)
Vector128<short> MultiplyAddByScalar(Vector128<short> addend, Vector128<short> left, Vector64<short> right)
Vector128<int> MultiplyAddByScalar(Vector128<int> addend, Vector128<int> left, Vector64<int> right)
Vector128<ushort> MultiplyAddByScalar(Vector128<ushort> addend, Vector128<ushort> left, Vector64<ushort> right)
Vector128<uint> MultiplyAddByScalar(Vector128<uint> addend, Vector128<uint> left, Vector64<uint> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyAddByScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mla     v0.4h, v1.4h, v2.h[0]
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

36. MultiplyAddBySelectedScalar

Vector64<short> MultiplyAddBySelectedScalar(Vector64<short> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)

This method multiplies the vector elements in the left by the rightIndex element value in the right, and accumulates the product with the vector elements of the addend vector and return the result vector.

private Vector64<short> MultiplyAddBySelectedScalarTest(Vector64<short> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
  return AdvSimd.MultiplyAddBySelectedScalar(addend, left, right, 3);
}
// addend = <100, 100, 100, 100>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 3
// Result = <364, 388, 412, 436>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> MultiplyAddBySelectedScalar(Vector64<short> addend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<int> MultiplyAddBySelectedScalar(Vector64<int> addend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<int> MultiplyAddBySelectedScalar(Vector64<int> addend, Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector64<ushort> MultiplyAddBySelectedScalar(Vector64<ushort> addend, Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector64<ushort> MultiplyAddBySelectedScalar(Vector64<ushort> addend, Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector64<uint> MultiplyAddBySelectedScalar(Vector64<uint> addend, Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector64<uint> MultiplyAddBySelectedScalar(Vector64<uint> addend, Vector64<uint> left, Vector128<uint> right, byte rightIndex)
Vector128<short> MultiplyAddBySelectedScalar(Vector128<short> addend, Vector128<short> left, Vector64<short> right, byte rightIndex)
Vector128<short> MultiplyAddBySelectedScalar(Vector128<short> addend, Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<int> MultiplyAddBySelectedScalar(Vector128<int> addend, Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<int> MultiplyAddBySelectedScalar(Vector128<int> addend, Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<ushort> MultiplyAddBySelectedScalar(Vector128<ushort> addend, Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<ushort> MultiplyAddBySelectedScalar(Vector128<ushort> addend, Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<uint> MultiplyAddBySelectedScalar(Vector128<uint> addend, Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<uint> MultiplyAddBySelectedScalar(Vector128<uint> addend, Vector128<uint> left, Vector128<uint> right, byte rightIndex)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyAddBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;* V03 arg3         [V03    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mla     v0.4h, v1.4h, v2.h[3]
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

37. MultiplyByScalar

Vector64<short> MultiplyByScalar(Vector64<short> left, Vector64<short> right)

This method multiplies corresponding vector elements in the left by the 0th element of right vector and returns the result vector.

private Vector64<short> MultiplyByScalarTest(Vector64<short> left, Vector64<short> right)
{
  return AdvSimd.MultiplyByScalar(left, right);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <231, 252, 273, 294>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> MultiplyByScalar(Vector64<int> left, Vector64<int> right)
Vector64<float> MultiplyByScalar(Vector64<float> left, Vector64<float> right)
Vector64<ushort> MultiplyByScalar(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MultiplyByScalar(Vector64<uint> left, Vector64<uint> right)
Vector128<short> MultiplyByScalar(Vector128<short> left, Vector64<short> right)
Vector128<int> MultiplyByScalar(Vector128<int> left, Vector64<int> right)
Vector128<float> MultiplyByScalar(Vector128<float> left, Vector64<float> right)
Vector128<ushort> MultiplyByScalar(Vector128<ushort> left, Vector64<ushort> right)
Vector128<uint> MultiplyByScalar(Vector128<uint> left, Vector64<uint> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> MultiplyByScalar(Vector128<double> left, Vector64<double> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyByScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mul     v16.4h, v0.4h, v1.h[0]
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

38. MultiplyBySelectedScalar

Vector64<short> MultiplyBySelectedScalar(Vector64<short> left, Vector64<short> right, byte rightIndex)

This method multiplies corresponding vector elements in the left by the rightIndex element of right vector and returns the result vector.

private Vector64<short> MultiplyBySelectedScalarTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
  return AdvSimd.MultiplyBySelectedScalar(left, right, 3);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 3
// Result = <264, 288, 312, 336>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> MultiplyBySelectedScalar(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<int> MultiplyBySelectedScalar(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<int> MultiplyBySelectedScalar(Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector64<float> MultiplyBySelectedScalar(Vector64<float> left, Vector64<float> right, byte rightIndex)
Vector64<float> MultiplyBySelectedScalar(Vector64<float> left, Vector128<float> right, byte rightIndex)
Vector64<ushort> MultiplyBySelectedScalar(Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector64<ushort> MultiplyBySelectedScalar(Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector64<uint> MultiplyBySelectedScalar(Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector64<uint> MultiplyBySelectedScalar(Vector64<uint> left, Vector128<uint> right, byte rightIndex)
Vector128<short> MultiplyBySelectedScalar(Vector128<short> left, Vector64<short> right, byte rightIndex)
Vector128<short> MultiplyBySelectedScalar(Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<int> MultiplyBySelectedScalar(Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<int> MultiplyBySelectedScalar(Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<float> MultiplyBySelectedScalar(Vector128<float> left, Vector64<float> right, byte rightIndex)
Vector128<float> MultiplyBySelectedScalar(Vector128<float> left, Vector128<float> right, byte rightIndex)
Vector128<ushort> MultiplyBySelectedScalar(Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<ushort> MultiplyBySelectedScalar(Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalar(Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalar(Vector128<uint> left, Vector128<uint> right, byte rightIndex)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> MultiplyBySelectedScalar(Vector128<double> left, Vector128<double> right, byte rightIndex)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mul     v16.4h, v0.4h, v1.h[3]
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

39. MultiplyBySelectedScalarWideningLower

Vector128<int> MultiplyBySelectedScalarWideningLower(Vector64<short> left, Vector64<short> right, byte rightIndex)

This method multiplies each vector element in the left vector by the rightIndex vector element of the right vector, places the product in a result vector, and returns the result vector. As seen in example below, the result vector element int size is twice as long as the elements that are multiplied short.

private Vector128<int> MultiplyBySelectedScalarWideningLowerTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
  return AdvSimd.MultiplyBySelectedScalarWideningLower(left, right, 3);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 3
// Result = <264, 288, 312, 336>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningLower(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLower(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLower(Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLower(Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLower(Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLower(Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLower(Vector64<uint> left, Vector128<uint> right, byte rightIndex)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            smull   v16.4s, v0.4h, v1.h[3]
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

40. MultiplyBySelectedScalarWideningLowerAndAdd

Vector128<int> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<int> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)

This method multiplies corresponding values in the left and right vectors, and accumulates the results with the vector elements of the addend vector and returns the result vector. As seen in example below, the result vector element’s size int is twice as long as that of input’s size short.

private Vector128<int> MultiplyBySelectedScalarWideningLowerAndAddTest(Vector128<int> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
  return AdvSimd.MultiplyBySelectedScalarWideningLowerAndAdd(addend, left, right, 2);
}
// addend = <1000, 1000, 1000, 1000>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 2
// Result = <1253, 1276, 1299, 1322>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<int> addend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<long> addend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<long> addend, Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<uint> addend, Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<uint> addend, Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<ulong> addend, Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<ulong> addend, Vector64<uint> left, Vector128<uint> right, byte rightIndex)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningLowerAndAddTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;* V03 arg3         [V03    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            smlal   v0.4s, v1.4h, v2.h[2]
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

41. MultiplyBySelectedScalarWideningLowerAndSubtract

Vector128<int> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<int> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)

This method multiplies corresponding values in the left and right vectors, and substracts the results from the vector elements of the minuend vector and returns the result. As seen in example below, the result vector element’s size int is twice as long as that of input’s size short.

private Vector128<int> MultiplyBySelectedScalarWideningLowerAndSubtractTest(Vector128<int> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
  return AdvSimd.MultiplyBySelectedScalarWideningLowerAndSubtract(minuend, left, right, 2);
}
// minuend = <1000, 1000, 1000, 1000>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 2
// Result = <747, 724, 701, 678>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<int> minuend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<long> minuend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<long> minuend, Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<uint> minuend, Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<uint> minuend, Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<ulong> minuend, Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<ulong> minuend, Vector64<uint> left, Vector128<uint> right, byte rightIndex)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningLowerAndSubtractTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;* V03 arg3         [V03    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            smlsl   v0.4s, v1.4h, v2.h[2]
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

42. MultiplyBySelectedScalarWideningUpper

Vector128<int> MultiplyBySelectedScalarWideningUpper(Vector128<short> left, Vector64<short> right, byte rightIndex)

private Vector128<int> MultiplyBySelectedScalarWideningUpperTest(Vector128<short> left, Vector64<short> right, byte rightIndex)
{
  return AdvSimd.MultiplyBySelectedScalarWideningUpper(left, right, 2);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// rightIndex = 2
// Result = <345, 368, 391, 414>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningUpper(Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpper(Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpper(Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpper(Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpper(Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpper(Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpper(Vector128<uint> left, Vector128<uint> right, byte rightIndex)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            smull2  v16.4s, v0.8h, v1.h[2]
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

43. MultiplyBySelectedScalarWideningUpperAndAdd

Vector128<int> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<int> addend, Vector128<short> left, Vector64<short> right, byte rightIndex)

This method multiplies corresponding values in left and right vectors, and accumulates the results with the vector elements of the addend vector and return the result vector. As seen in example below, the result vector element int size is twice as long as the elements that are multiplied short.

private Vector128<int> MultiplyBySelectedScalarWideningUpperAndAddTest(Vector128<int> addend, Vector128<short> left, Vector64<short> right, byte rightIndex)
{
  return AdvSimd.MultiplyBySelectedScalarWideningUpperAndAdd(addend, left, right, 0);
}
// addend = <1000, 1000, 1000, 1000>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// rightIndex = 0
// Result = <1165, 1176, 1187, 1198>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<int> addend, Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<long> addend, Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<long> addend, Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<uint> addend, Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<uint> addend, Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<ulong> addend, Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<ulong> addend, Vector128<uint> left, Vector128<uint> right, byte rightIndex)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningUpperAndAddTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;* V03 arg3         [V03    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            smlal2  v0.4s, v1.8h, v2.h[0]
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

44. MultiplyBySelectedScalarWideningUpperAndSubtract

Vector128<int> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<int> minuend, Vector128<short> left, Vector64<short> right, byte rightIndex)

This method multiplies corresponding values in left and right vectors, and substracts the results with the vector elements of the minuend vector and return the result vector. As seen in example below, the result vector element int size is twice as long as the elements that are multiplied short.

private Vector128<int> MultiplyBySelectedScalarWideningUpperAndSubtractTest(Vector128<int> minuend, Vector128<short> left, Vector64<short> right, byte rightIndex)
{
  return AdvSimd.MultiplyBySelectedScalarWideningUpperAndSubtract(minuend, left, right, 0);
}
// minuend = <11, 12, 13, 14>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// rightIndex = 0
// Result = <-154, -164, -174, -184>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<int> minuend, Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<long> minuend, Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<long> minuend, Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<uint> minuend, Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<uint> minuend, Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<ulong> minuend, Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<ulong> minuend, Vector128<uint> left, Vector128<uint> right, byte rightIndex)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningUpperAndSubtractTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;* V03 arg3         [V03    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V04 OutArgs      [V04    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            smlsl2  v0.4s, v1.8h, v2.h[0]
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

45. MultiplyDoublingByScalarSaturateHigh

Vector64<short> MultiplyDoublingByScalarSaturateHigh(Vector64<short> left, Vector64<short> right)

This method multiplies each vector element in the left by the rightIndex vector element of the right vector, doubles the results, places the most significant half of the final results in a result vector.

private Vector64<short> MultiplyDoublingByScalarSaturateHighTest(Vector64<short> left, Vector64<short> right)
{
  return AdvSimd.MultiplyDoublingByScalarSaturateHigh(left, right);
}
// left = <1000, 12, 13, 14>
// right = <100, 22, 23, 24>
// Result = <3, 0, 0, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> MultiplyDoublingByScalarSaturateHigh(Vector64<int> left, Vector64<int> right)
Vector128<short> MultiplyDoublingByScalarSaturateHigh(Vector128<short> left, Vector64<short> right)
Vector128<int> MultiplyDoublingByScalarSaturateHigh(Vector128<int> left, Vector64<int> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyDoublingByScalarSaturateHighTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqdmulh v16.4h, v0.4h, v1.h[0]
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

Introduction

APIs covered

1. FusedMultiplySubtractScalarBySelectedScalar

2. FusedSubtractHalving

3. Insert

4. InsertScalar

5. InsertSelectedScalar

6. LeadingSignCount

7. LeadingZeroCount

8. LoadAndInsertScalar

9. LoadAndReplicateToVector128

10. LoadAndReplicateToVector64

11. LoadVector128

12. LoadVector64

13. Max

14. MaxAcross

15. MaxNumber

16. MaxNumberAcross

17. MaxNumberPairwise

18. MaxNumberPairwiseScalar

19. MaxNumberScalar

20. MaxPairwise

21. MaxPairwiseScalar

22. MaxScalar

23. Min

24. MinAcross

25. MinNumber

26. MinNumberAcross

27. MinNumberPairwise

28. MinNumberPairwiseScalar

29. MinNumberScalar

30. MinPairwise

31. MinPairwiseScalar

32. MinScalar

33. Multiply

34. MultiplyAdd

35. MultiplyAddByScalar

36. MultiplyAddBySelectedScalar

37. MultiplyByScalar

38. MultiplyBySelectedScalar

39. MultiplyBySelectedScalarWideningLower

40. MultiplyBySelectedScalarWideningLowerAndAdd

41. MultiplyBySelectedScalarWideningLowerAndSubtract

42. MultiplyBySelectedScalarWideningUpper

43. MultiplyBySelectedScalarWideningUpperAndAdd

44. MultiplyBySelectedScalarWideningUpperAndSubtract

45. MultiplyDoublingByScalarSaturateHigh

Leave a Comment