Arm64 Hardware Intrinsics APIs in .NET

Introduction

In my vectorization using .NET APIs blog, I describe SIMD datatypes Vector64<T> and Vector128<T> that operates on ‘Arm64 hardware intrinsic’ APIs present under System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 class. In this post I will describe those hardware intrinsic APIs by showing sample code usage along with examples and generated Arm64 code. This will help people in understanding these APIs so they can use them to optimize their .NET code written to target Arm64. Since there are 360 APIs, describing all of them in a single post will be overwhelming. So I have divided these APIs among 8 blogs and will demonstrate 45 APIs in each blog. This is part 6 of that blog series. You can checkout my previous blogs at:

Most of the description of these APIs is adapted and referenced from Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile document. You can also refer to the description of SIMD and Floating-point instructions description at Arm developer docs page.

The blog page is programmatically generated and might contain mistakes. If you find any mistake, please leave a comment and I will address it.

APIs covered

MultiplyWideningLowerAndSubtract	ReciprocalStepScalar
MultiplyWideningUpper	ReverseElement16
MultiplyWideningUpperAndAdd	ReverseElement32
MultiplyWideningUpperAndSubtract	ReverseElement8
Negate	ReverseElementBits
NegateSaturate	RoundAwayFromZero
NegateSaturateScalar	RoundAwayFromZeroScalar
NegateScalar	RoundToNearest
Not	RoundToNearestScalar
Or	RoundToNegativeInfinity
OrNot	RoundToNegativeInfinityScalar
PolynomialMultiply	RoundToPositiveInfinity
PolynomialMultiplyWideningLower	RoundToPositiveInfinityScalar
PolynomialMultiplyWideningUpper	RoundToZero
PopCount	RoundToZeroScalar
ReciprocalEstimate	ShiftArithmetic
ReciprocalEstimateScalar	ShiftArithmeticRounded
ReciprocalExponentScalar	ShiftArithmeticRoundedSaturate
ReciprocalSquareRootEstimate	ShiftArithmeticRoundedSaturateScalar
ReciprocalSquareRootEstimateScalar	ShiftArithmeticRoundedScalar
ReciprocalSquareRootStep	ShiftArithmeticSaturate
ReciprocalSquareRootStepScalar	ShiftArithmeticSaturateScalar
ReciprocalStep

1. MultiplyWideningLowerAndSubtract

Vector128<ushort> MultiplyWideningLowerAndSubtract(Vector128<ushort> minuend, Vector64<byte> left, Vector64<byte> right)

This method multiplies the vector elements in the left by the corresponding vector elements of the right vector, and subtracts the results with the vector elements from the minuend vector and return the result vector. As seen in below example, the result vector element’s size ushort is twice as long as the input element’s size byte.

private Vector128<ushort> MultiplyWideningLowerAndSubtractTest(Vector128<ushort> minuend, Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.MultiplyWideningLowerAndSubtract(minuend, left, right);
}
// minuend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <65316, 65284, 65250, 65214, 65176, 65136, 65094, 65050>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyWideningLowerAndSubtract(Vector128<int> minuend, Vector64<short> left, Vector64<short> right)
Vector128<long> MultiplyWideningLowerAndSubtract(Vector128<long> minuend, Vector64<int> left, Vector64<int> right)
Vector128<short> MultiplyWideningLowerAndSubtract(Vector128<short> minuend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> MultiplyWideningLowerAndSubtract(Vector128<uint> minuend, Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> MultiplyWideningLowerAndSubtract(Vector128<ulong> minuend, Vector64<uint> left, Vector64<uint> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyWideningLowerAndSubtractTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;  V02 arg2         [V02,T02] (  3,  3   )   simd8  ->   d2         HFA(simd8) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            umlsl   v0.8h, v1.8b, v2.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

2. MultiplyWideningUpper

Vector128<ushort> MultiplyWideningUpper(Vector128<byte> left, Vector128<byte> right)

This method multiplies corresponding vector elements in the upper-half of left and right vector, stores the result in a vector, and returns the result vector. As seen in below example, the result vector element’s size ushort is twice as long as the input element’s size byte.

private Vector128<ushort> MultiplyWideningUpperTest(Vector128<byte> left, Vector128<byte> right)
{
  return AdvSimd.MultiplyWideningUpper(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <551, 600, 651, 704, 759, 816, 875, 936>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyWideningUpper(Vector128<short> left, Vector128<short> right)
Vector128<long> MultiplyWideningUpper(Vector128<int> left, Vector128<int> right)
Vector128<short> MultiplyWideningUpper(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> MultiplyWideningUpper(Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> MultiplyWideningUpper(Vector128<uint> left, Vector128<uint> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            umull2  v16.8h, v0.16b, v1.16b
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

3. MultiplyWideningUpperAndAdd

Vector128<ushort> MultiplyWideningUpperAndAdd(Vector128<ushort> addend, Vector128<byte> left, Vector128<byte> right)

This method multiplies corresponding vector elements in the upper-half of left and right vector, and accumulates the results with the vector elements of the addend vector and returns the result vector. As seen in below example, the result vector element’s size ushort is twice as long as the input element’s size byte.

private Vector128<ushort> MultiplyWideningUpperAndAddTest(Vector128<ushort> addend, Vector128<byte> left, Vector128<byte> right)
{
  return AdvSimd.MultiplyWideningUpperAndAdd(addend, left, right);
}
// addend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <562, 612, 664, 718, 774, 832, 892, 954>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyWideningUpperAndAdd(Vector128<int> addend, Vector128<short> left, Vector128<short> right)
Vector128<long> MultiplyWideningUpperAndAdd(Vector128<long> addend, Vector128<int> left, Vector128<int> right)
Vector128<short> MultiplyWideningUpperAndAdd(Vector128<short> addend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> MultiplyWideningUpperAndAdd(Vector128<uint> addend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> MultiplyWideningUpperAndAdd(Vector128<ulong> addend, Vector128<uint> left, Vector128<uint> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyWideningUpperAndAddTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;  V02 arg2         [V02,T02] (  3,  3   )  simd16  ->   d2         HFA(simd16) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            umlal2  v0.8h, v1.16b, v2.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

4. MultiplyWideningUpperAndSubtract

Vector128<ushort> MultiplyWideningUpperAndSubtract(Vector128<ushort> minuend, Vector128<byte> left, Vector128<byte> right)

This method multiplies the vector elements in the upper-half of left by the corresponding vector elements of the right vector, and subtracts the results with the vector elements from the minuend vector and return the result vector. As seen in below example, the result vector element’s size ushort is twice as long as the input element’s size byte.

private Vector128<ushort> MultiplyWideningUpperAndSubtractTest(Vector128<ushort> minuend, Vector128<byte> left, Vector128<byte> right)
{
  return AdvSimd.MultiplyWideningUpperAndSubtract(minuend, left, right);
}
// minuend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <64996, 64948, 64898, 64846, 64792, 64736, 64678, 64618>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyWideningUpperAndSubtract(Vector128<int> minuend, Vector128<short> left, Vector128<short> right)
Vector128<long> MultiplyWideningUpperAndSubtract(Vector128<long> minuend, Vector128<int> left, Vector128<int> right)
Vector128<short> MultiplyWideningUpperAndSubtract(Vector128<short> minuend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> MultiplyWideningUpperAndSubtract(Vector128<uint> minuend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> MultiplyWideningUpperAndSubtract(Vector128<ulong> minuend, Vector128<uint> left, Vector128<uint> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:MultiplyWideningUpperAndSubtractTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;  V02 arg2         [V02,T02] (  3,  3   )  simd16  ->   d2         HFA(simd16) 
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            umlsl2  v0.8h, v1.16b, v2.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

5. Negate

Vector64<short> Negate(Vector64<short> value)

This method negates each element of value vector and returns the result vector.

private Vector64<short> NegateTest(Vector64<short> value)
{
  return AdvSimd.Negate(value);
}
// value = <11, 12, 13, 14>
// Result = <-11, -12, -13, -14>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> Negate(Vector64<int> value)
Vector64<sbyte> Negate(Vector64<sbyte> value)
Vector64<float> Negate(Vector64<float> value)
Vector128<short> Negate(Vector128<short> value)
Vector128<int> Negate(Vector128<int> value)
Vector128<sbyte> Negate(Vector128<sbyte> value)
Vector128<float> Negate(Vector128<float> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Negate(Vector128<double> value)
Vector128<long> Negate(Vector128<long> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:NegateTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            neg     v16.4h, v0.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

6. NegateSaturate

Vector64<short> NegateSaturate(Vector64<short> value)

This method negates each vector element in the value vector, stores the results in a vector and returns the result vector. All the values in this method are signed integer values. If there is an overflow with the negation, that element result is saturated.

private Vector64<short> NegateSaturateTest(Vector64<short> value)
{
  return AdvSimd.NegateSaturate(value);
}
// value = <11, 12, 13, 14>
// Result = <-11, -12, -13, -14>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> NegateSaturate(Vector64<int> value)
Vector64<sbyte> NegateSaturate(Vector64<sbyte> value)
Vector128<short> NegateSaturate(Vector128<short> value)
Vector128<int> NegateSaturate(Vector128<int> value)
Vector128<sbyte> NegateSaturate(Vector128<sbyte> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<long> NegateSaturate(Vector128<long> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:NegateSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqneg   v16.4h, v0.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

7. NegateSaturateScalar

Vector64<short> NegateSaturateScalar(Vector64<short> value)

This method negates 0th vector element in the value vector, stores the result in 0th element of a vector and returns the result vector. All non-zero elements are initialized to 0. All the values in this method are signed integer values. If there is an overflow with the negation, that element result is saturated.

private Vector64<short> NegateSaturateScalarTest(Vector64<short> value)
{
  return AdvSimd.Arm64.NegateSaturateScalar(value);
}
// value = <11, 12, 13, 14>
// Result = <-11, 0, 0, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<int> NegateSaturateScalar(Vector64<int> value)
Vector64<long> NegateSaturateScalar(Vector64<long> value)
Vector64<sbyte> NegateSaturateScalar(Vector64<sbyte> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:NegateSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqneg   h16, h0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

8. NegateScalar

Vector64<double> NegateScalar(Vector64<double> value)

This method negates the elements in the value vector and returns the result.

private Vector64<double> NegateScalarTest(Vector64<double> value)
{
  return AdvSimd.NegateScalar(value);
}
// value = <11.5>
// Result = <-11.5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> NegateScalar(Vector64<float> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<long> NegateScalar(Vector64<long> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:NegateScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            fneg    d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

9. Not

Vector64<byte> Not(Vector64<byte> value)

This method performs bitwise inverse of each element of the value vector, stores the result in a vector and returns the result vector.

private Vector64<byte> NotTest(Vector64<byte> value)
{
  return AdvSimd.Not(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <244, 243, 242, 241, 240, 239, 238, 237>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> Not(Vector64<double> value)
Vector64<short> Not(Vector64<short> value)
Vector64<int> Not(Vector64<int> value)
Vector64<long> Not(Vector64<long> value)
Vector64<sbyte> Not(Vector64<sbyte> value)
Vector64<float> Not(Vector64<float> value)
Vector64<ushort> Not(Vector64<ushort> value)
Vector64<uint> Not(Vector64<uint> value)
Vector64<ulong> Not(Vector64<ulong> value)
Vector128<byte> Not(Vector128<byte> value)
Vector128<double> Not(Vector128<double> value)
Vector128<short> Not(Vector128<short> value)
Vector128<int> Not(Vector128<int> value)
Vector128<long> Not(Vector128<long> value)
Vector128<sbyte> Not(Vector128<sbyte> value)
Vector128<float> Not(Vector128<float> value)
Vector128<ushort> Not(Vector128<ushort> value)
Vector128<uint> Not(Vector128<uint> value)
Vector128<ulong> Not(Vector128<ulong> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:NotTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            mvn     v16.8b, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

10. Or

Vector64<byte> Or(Vector64<byte> left, Vector64<byte> right)

This method performs a bitwise OR between the left and right vectors, and returns the result.

private Vector64<byte> OrTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.Or(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <31, 30, 31, 30, 31, 26, 27, 30>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> Or(Vector64<double> left, Vector64<double> right)
Vector64<short> Or(Vector64<short> left, Vector64<short> right)
Vector64<int> Or(Vector64<int> left, Vector64<int> right)
Vector64<long> Or(Vector64<long> left, Vector64<long> right)
Vector64<sbyte> Or(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Or(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Or(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Or(Vector64<uint> left, Vector64<uint> right)
Vector64<ulong> Or(Vector64<ulong> left, Vector64<ulong> right)
Vector128<byte> Or(Vector128<byte> left, Vector128<byte> right)
Vector128<double> Or(Vector128<double> left, Vector128<double> right)
Vector128<short> Or(Vector128<short> left, Vector128<short> right)
Vector128<int> Or(Vector128<int> left, Vector128<int> right)
Vector128<long> Or(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> Or(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Or(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Or(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Or(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> Or(Vector128<ulong> left, Vector128<ulong> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:OrTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            orr     v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

11. OrNot

Vector64<byte> OrNot(Vector64<byte> left, Vector64<byte> right)

This method performs a bitwise OR NOT between the left and right vectors, and returns the result.

private Vector64<byte> OrNotTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.OrNot(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <235, 237, 237, 239, 239, 245, 245, 243>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> OrNot(Vector64<double> left, Vector64<double> right)
Vector64<short> OrNot(Vector64<short> left, Vector64<short> right)
Vector64<int> OrNot(Vector64<int> left, Vector64<int> right)
Vector64<long> OrNot(Vector64<long> left, Vector64<long> right)
Vector64<sbyte> OrNot(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> OrNot(Vector64<float> left, Vector64<float> right)
Vector64<ushort> OrNot(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> OrNot(Vector64<uint> left, Vector64<uint> right)
Vector64<ulong> OrNot(Vector64<ulong> left, Vector64<ulong> right)
Vector128<byte> OrNot(Vector128<byte> left, Vector128<byte> right)
Vector128<double> OrNot(Vector128<double> left, Vector128<double> right)
Vector128<short> OrNot(Vector128<short> left, Vector128<short> right)
Vector128<int> OrNot(Vector128<int> left, Vector128<int> right)
Vector128<long> OrNot(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> OrNot(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> OrNot(Vector128<float> left, Vector128<float> right)
Vector128<ushort> OrNot(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> OrNot(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> OrNot(Vector128<ulong> left, Vector128<ulong> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:OrNotTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            orn     v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

12. PolynomialMultiply

Vector64<byte> PolynomialMultiply(Vector64<byte> left, Vector64<byte> right)

This method multiplies corresponding elements in the vectors of the left and right vectors, stores the results in a vector and returns the result vector.

private Vector64<byte> PolynomialMultiplyTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.PolynomialMultiply(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <151, 232, 243, 144, 135, 160, 171, 248>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<sbyte> PolynomialMultiply(Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<byte> PolynomialMultiply(Vector128<byte> left, Vector128<byte> right)
Vector128<sbyte> PolynomialMultiply(Vector128<sbyte> left, Vector128<sbyte> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:PolynomialMultiplyTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            pmul    v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

13. PolynomialMultiplyWideningLower

Vector128<ushort> PolynomialMultiplyWideningLower(Vector64<byte> left, Vector64<byte> right)

This method multiplies corresponding elements in the left and right vectors, stores the results in a vector and returns the result vector. As seen in below example, the result vector element’s size ushort is twice as long as the input element’s size byte.

private Vector128<ushort> PolynomialMultiplyWideningLowerTest(Vector64<byte> left, Vector64<byte> right)
{
  return AdvSimd.PolynomialMultiplyWideningLower(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <151, 232, 243, 144, 135, 416, 427, 504>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<short> PolynomialMultiplyWideningLower(Vector64<sbyte> left, Vector64<sbyte> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:PolynomialMultiplyWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            pmull   v16.8h, v0.8b, v1.8b
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

14. PolynomialMultiplyWideningUpper

Vector128<ushort> PolynomialMultiplyWideningUpper(Vector128<byte> left, Vector128<byte> right)

This method multiplies corresponding elements in the upper-half of left with corresponding elements of right vectors, stores the results in a vector and returns the result vector. As seen in below example, the result vector element’s size ushort is twice as long as the input element’s size byte.

private Vector128<ushort> PolynomialMultiplyWideningUpperTest(Vector128<byte> left, Vector128<byte> right)
{
  return AdvSimd.PolynomialMultiplyWideningUpper(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <503, 408, 403, 704, 759, 816, 779, 808>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<short> PolynomialMultiplyWideningUpper(Vector128<sbyte> left, Vector128<sbyte> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:PolynomialMultiplyWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            pmull2  v16.8h, v0.16b, v1.16b
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

15. PopCount

Vector64<byte> PopCount(Vector64<byte> value)

This method counts the number of bits that have a value of one in each element in the value vector, stores the results in a vector and returns the result vector.

private Vector64<byte> PopCountTest(Vector64<byte> value)
{
  return AdvSimd.PopCount(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <3, 2, 3, 3, 4, 1, 2, 2>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<sbyte> PopCount(Vector64<sbyte> value)
Vector128<byte> PopCount(Vector128<byte> value)
Vector128<sbyte> PopCount(Vector128<sbyte> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:PopCountTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            cnt     v16.8b, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

16. ReciprocalEstimate

Vector64<float> ReciprocalEstimate(Vector64<float> value)

This method finds an approximate reciprocal estimate for each element in the value vector, stores the results in a vector and returns the result vector.

private Vector64<float> ReciprocalEstimateTest(Vector64<float> value)
{
  return AdvSimd.ReciprocalEstimate(value);
}
// value = <11.5, 12.5>
// Result = <0.08691406, 0.079833984>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<uint> ReciprocalEstimate(Vector64<uint> value)
Vector128<float> ReciprocalEstimate(Vector128<float> value)
Vector128<uint> ReciprocalEstimate(Vector128<uint> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> ReciprocalEstimate(Vector128<double> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReciprocalEstimateTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frecpe  v16.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

17. ReciprocalEstimateScalar

Vector64<double> ReciprocalEstimateScalar(Vector64<double> value)

This method finds an approximate reciprocal estimate for each element in the value vector, stores the results in a vector and returns the result vector.

private Vector64<double> ReciprocalEstimateScalarTest(Vector64<double> value)
{
  return AdvSimd.Arm64.ReciprocalEstimateScalar(value);
}
// value = <11.5>
// Result = <0.0869140625>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalEstimateScalar(Vector64<float> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReciprocalEstimateScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frecpe  d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

18. ReciprocalExponentScalar

Vector64<double> ReciprocalExponentScalar(Vector64<double> value)

This method finds an approximate reciprocal exponent for each element in the value vector, stores the results in a vector and returns the result vector.

private Vector64<double> ReciprocalExponentScalarTest(Vector64<double> value)
{
  return AdvSimd.Arm64.ReciprocalExponentScalar(value);
}
// value = <11.5>
// Result = <0.25>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalExponentScalar(Vector64<float> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReciprocalExponentScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frecpx  d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

19. ReciprocalSquareRootEstimate

Vector64<float> ReciprocalSquareRootEstimate(Vector64<float> value)

This method calculates an approximate square root for each element in the value vector, stores the results in a vector and returns the result vector.

private Vector64<float> ReciprocalSquareRootEstimateTest(Vector64<float> value)
{
  return AdvSimd.ReciprocalSquareRootEstimate(value);
}
// value = <11.5, 12.5>
// Result = <0.29492188, 0.28222656>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<uint> ReciprocalSquareRootEstimate(Vector64<uint> value)
Vector128<float> ReciprocalSquareRootEstimate(Vector128<float> value)
Vector128<uint> ReciprocalSquareRootEstimate(Vector128<uint> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> ReciprocalSquareRootEstimate(Vector128<double> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReciprocalSquareRootEstimateTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frsqrte v16.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

20. ReciprocalSquareRootEstimateScalar

Vector64<double> ReciprocalSquareRootEstimateScalar(Vector64<double> value)

This method calculates an approximate square root for each element in the value vector, stores the results in a vector and returns the result vector.

private Vector64<double> ReciprocalSquareRootEstimateScalarTest(Vector64<double> value)
{
  return AdvSimd.Arm64.ReciprocalSquareRootEstimateScalar(value);
}
// value = <11.5>
// Result = <0.294921875>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalSquareRootEstimateScalar(Vector64<float> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReciprocalSquareRootEstimateScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frsqrte d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

21. ReciprocalSquareRootStep

Vector64<float> ReciprocalSquareRootStep(Vector64<float> left, Vector64<float> right)

This method multiplies corresponding floating-point values in the left and right vector, subtracts each of the products from 3.0, divides these results by 2.0, stores the results in a vector and returns the result vector.

private Vector64<float> ReciprocalSquareRootStepTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.ReciprocalSquareRootStep(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <-122.125, -139.125>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> ReciprocalSquareRootStep(Vector128<float> left, Vector128<float> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> ReciprocalSquareRootStep(Vector128<double> left, Vector128<double> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReciprocalSquareRootStepTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frsqrts v16.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

22. ReciprocalSquareRootStepScalar

Vector64<double> ReciprocalSquareRootStepScalar(Vector64<double> left, Vector64<double> right)

This method multiplies corresponding floating-point values in the vectors of the left and right vectors, subtracts each of the products from 3.0, divides these results by 2.0, stores the results in a vector and returns the result vector.

private Vector64<double> ReciprocalSquareRootStepScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.ReciprocalSquareRootStepScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <-64.625>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalSquareRootStepScalar(Vector64<float> left, Vector64<float> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReciprocalSquareRootStepScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frsqrts d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

23. ReciprocalStep

Vector64<float> ReciprocalStep(Vector64<float> left, Vector64<float> right)

This method multiplies the corresponding floating-point values in the left and right vectors, subtracts each of the products from 2.0, stores the results in a vector and returns the result vector.

private Vector64<float> ReciprocalStepTest(Vector64<float> left, Vector64<float> right)
{
  return AdvSimd.ReciprocalStep(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <-245.25, -279.25>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> ReciprocalStep(Vector128<float> left, Vector128<float> right)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> ReciprocalStep(Vector128<double> left, Vector128<double> right)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReciprocalStepTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frecps  v16.2s, v0.2s, v1.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

24. ReciprocalStepScalar

Vector64<double> ReciprocalStepScalar(Vector64<double> left, Vector64<double> right)

private Vector64<double> ReciprocalStepScalarTest(Vector64<double> left, Vector64<double> right)
{
  return AdvSimd.Arm64.ReciprocalStepScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <-130.25>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalStepScalar(Vector64<float> left, Vector64<float> right)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReciprocalStepScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frecps  d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

25. ReverseElement16

Vector64<int> ReverseElement16(Vector64<int> value)

Reverse bytes in each 32-bits value and returns the result.

private Vector64<int> ReverseElement16Test(Vector64<int> value)
{
  return AdvSimd.ReverseElement16(value);
}
// value = <11, 12>
// Result = <720896, 786432>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<long> ReverseElement16(Vector64<long> value)
Vector64<uint> ReverseElement16(Vector64<uint> value)
Vector64<ulong> ReverseElement16(Vector64<ulong> value)
Vector128<int> ReverseElement16(Vector128<int> value)
Vector128<long> ReverseElement16(Vector128<long> value)
Vector128<uint> ReverseElement16(Vector128<uint> value)
Vector128<ulong> ReverseElement16(Vector128<ulong> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReverseElement16Test(System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector64`1[Int32]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            rev32   v16.4h, v0.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

26. ReverseElement32

Vector64<long> ReverseElement32(Vector64<long> value)

Reverse bytes in each 64-bits value and returns the result.

private Vector64<long> ReverseElement32Test(Vector64<long> value)
{
  return AdvSimd.ReverseElement32(value);
}
// value = <11>
// Result = <47244640256>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ReverseElement32(Vector64<ulong> value)
Vector128<long> ReverseElement32(Vector128<long> value)
Vector128<ulong> ReverseElement32(Vector128<ulong> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReverseElement32Test(System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            rev64   v16.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

27. ReverseElement8

Vector64<short> ReverseElement8(Vector64<short> value)

Reverse bytes in each 16-bit half word values and returns the result.

private Vector64<short> ReverseElement8Test(Vector64<short> value)
{
  return AdvSimd.ReverseElement8(value);
}
// value = <11, 12, 13, 14>
// Result = <2816, 3072, 3328, 3584>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ReverseElement8(Vector64<int> value)
Vector64<long> ReverseElement8(Vector64<long> value)
Vector64<ushort> ReverseElement8(Vector64<ushort> value)
Vector64<uint> ReverseElement8(Vector64<uint> value)
Vector64<ulong> ReverseElement8(Vector64<ulong> value)
Vector128<short> ReverseElement8(Vector128<short> value)
Vector128<int> ReverseElement8(Vector128<int> value)
Vector128<long> ReverseElement8(Vector128<long> value)
Vector128<ushort> ReverseElement8(Vector128<ushort> value)
Vector128<uint> ReverseElement8(Vector128<uint> value)
Vector128<ulong> ReverseElement8(Vector128<ulong> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReverseElement8Test(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            rev16   v16.8b, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

28. ReverseElementBits

Vector64<byte> ReverseElementBits(Vector64<byte> value)

Reverses the bit order of all elements in value vector.

private Vector64<byte> ReverseElementBitsTest(Vector64<byte> value)
{
  return AdvSimd.Arm64.ReverseElementBits(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <208, 48, 176, 112, 240, 8, 136, 72>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<sbyte> ReverseElementBits(Vector64<sbyte> value)
Vector128<byte> ReverseElementBits(Vector128<byte> value)
Vector128<sbyte> ReverseElementBits(Vector128<sbyte> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ReverseElementBitsTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            rbit    v16.8b, v0.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

29. RoundAwayFromZero

Vector64<float> RoundAwayFromZero(Vector64<float> value)

This method rounds a vector of floating-point values in the value vector to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

private Vector64<float> RoundAwayFromZeroTest(Vector64<float> value)
{
  return AdvSimd.RoundAwayFromZero(value);
}
// value = <11.5, 12.5>
// Result = <12, 13>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundAwayFromZero(Vector128<float> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundAwayFromZero(Vector128<double> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundAwayFromZeroTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frinta  v16.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

30. RoundAwayFromZeroScalar

Vector64<double> RoundAwayFromZeroScalar(Vector64<double> value)

This method rounds a floating-point value in the value vector to an integral floating-point value of the same size using the Round to Nearest with Ties to Away rounding mode, and returns the result. As per ARM docs, zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

private Vector64<double> RoundAwayFromZeroScalarTest(Vector64<double> value)
{
  return AdvSimd.RoundAwayFromZeroScalar(value);
}
// value = <11.5>
// Result = <12>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundAwayFromZeroScalar(Vector64<float> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundAwayFromZeroScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frinta  d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

31. RoundToNearest

Vector64<float> RoundToNearest(Vector64<float> value)

This method rounds a vector of floating-point values in the value vector to integral floating-point values of the same size using the Round to Nearest rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

private Vector64<float> RoundToNearestTest(Vector64<float> value)
{
  return AdvSimd.RoundToNearest(value);
}
// value = <11.4, 12.8>
// Result = <11, 13>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundToNearest(Vector128<float> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundToNearest(Vector128<double> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundToNearestTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintn  v16.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

32. RoundToNearestScalar

Vector64<double> RoundToNearestScalar(Vector64<double> value)

private Vector64<double> RoundToNearestScalarTest(Vector64<double> value)
{
  return AdvSimd.RoundToNearestScalar(value);
}
// value = <11.4>
// Result = <11>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundToNearestScalar(Vector64<float> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundToNearestScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintn  d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

33. RoundToNegativeInfinity

Vector64<float> RoundToNegativeInfinity(Vector64<float> value)

This method rounds a floating-point value in the value vector to an integral floating-point value of the same size using the Round towards Minus Infinity rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

private Vector64<float> RoundToNegativeInfinityTest(Vector64<float> value)
{
  return AdvSimd.RoundToNegativeInfinity(value);
}
// value = <11.5, 12.5>
// Result = <11, 12>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundToNegativeInfinity(Vector128<float> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundToNegativeInfinity(Vector128<double> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundToNegativeInfinityTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintm  v16.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

34. RoundToNegativeInfinityScalar

Vector64<double> RoundToNegativeInfinityScalar(Vector64<double> value)

private Vector64<double> RoundToNegativeInfinityScalarTest(Vector64<double> value)
{
  return AdvSimd.RoundToNegativeInfinityScalar(value);
}
// value = <11.5>
// Result = <11>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundToNegativeInfinityScalar(Vector64<float> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundToNegativeInfinityScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintm  d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

35. RoundToPositiveInfinity

Vector64<float> RoundToPositiveInfinity(Vector64<float> value)

This method rounds a floating-point value in the value vector to an integral floating-point value of the same size using the Round towards Plus Infinity rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

private Vector64<float> RoundToPositiveInfinityTest(Vector64<float> value)
{
  return AdvSimd.RoundToPositiveInfinity(value);
}
// value = <11.5, 12.5>
// Result = <12, 13>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundToPositiveInfinity(Vector128<float> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundToPositiveInfinity(Vector128<double> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundToPositiveInfinityTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintp  v16.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

36. RoundToPositiveInfinityScalar

Vector64<double> RoundToPositiveInfinityScalar(Vector64<double> value)

private Vector64<double> RoundToPositiveInfinityScalarTest(Vector64<double> value)
{
  return AdvSimd.RoundToPositiveInfinityScalar(value);
}
// value = <11.5>
// Result = <12>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundToPositiveInfinityScalar(Vector64<float> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundToPositiveInfinityScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintp  d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

37. RoundToZero

Vector64<float> RoundToZero(Vector64<float> value)

This method rounds a vector of floating-point values in the value vector to integral floating-point values of the same size using the Round towards Zero rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.

private Vector64<float> RoundToZeroTest(Vector64<float> value)
{
  return AdvSimd.RoundToZero(value);
}
// value = <11.4, 12.8>
// Result = <11, 12>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundToZero(Vector128<float> value)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundToZero(Vector128<double> value)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundToZeroTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintz  v16.2s, v0.2s
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

38. RoundToZeroScalar

Vector64<double> RoundToZeroScalar(Vector64<double> value)

private Vector64<double> RoundToZeroScalarTest(Vector64<double> value)
{
  return AdvSimd.RoundToZeroScalar(value);
}
// value = <11.4>
// Result = <11>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundToZeroScalar(Vector64<float> value)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:RoundToZeroScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;# V01 OutArgs      [V01    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            frintz  d16, d0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

39. ShiftArithmetic

Vector64<short> ShiftArithmetic(Vector64<short> value, Vector64<short> count)

This method performs arithmetic shifts of each signed integer value in the value vector, by the value in corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift.

private Vector64<short> ShiftArithmeticTest(Vector64<short> value, Vector64<short> count)
{
  return AdvSimd.ShiftArithmetic(value, count);
}
// value = <11, 12, 13, 14>
// count = <18, 2, 3, -2>
// Result = <0, 48, 104, 3>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftArithmetic(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmetic(Vector64<sbyte> value, Vector64<sbyte> count)
Vector128<short> ShiftArithmetic(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftArithmetic(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftArithmetic(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftArithmetic(Vector128<sbyte> value, Vector128<sbyte> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftArithmeticTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sshl    v16.4h, v0.4h, v1.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

40. ShiftArithmeticRounded

Vector64<short> ShiftArithmeticRounded(Vector64<short> value, Vector64<short> count)

This method performs arithmetic shift of each signed integer value in the value vector, by a value in corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.

private Vector64<short> ShiftArithmeticRoundedTest(Vector64<short> value, Vector64<short> count)
{
  return AdvSimd.ShiftArithmeticRounded(value, count);
}
// value = <11, 12, 13, 14>
// count = <18, 2, 3, -2>
// Result = <0, 48, 104, 4>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftArithmeticRounded(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticRounded(Vector64<sbyte> value, Vector64<sbyte> count)
Vector128<short> ShiftArithmeticRounded(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftArithmeticRounded(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftArithmeticRounded(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftArithmeticRounded(Vector128<sbyte> value, Vector128<sbyte> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftArithmeticRoundedTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            srshl   v16.4h, v0.4h, v1.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

41. ShiftArithmeticRoundedSaturate

Vector64<short> ShiftArithmeticRoundedSaturate(Vector64<short> value, Vector64<short> count)

This method performs arithmetic shift of each vector element in the value vector, by a value of the corresponding vector element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded.

private Vector64<short> ShiftArithmeticRoundedSaturateTest(Vector64<short> value, Vector64<short> count)
{
  return AdvSimd.ShiftArithmeticRoundedSaturate(value, count);
}
// value = <11, 12, 13, 14>
// count = <18, 2, 3, -2>
// Result = <32767, 48, 104, 4>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftArithmeticRoundedSaturate(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticRoundedSaturate(Vector64<sbyte> value, Vector64<sbyte> count)
Vector128<short> ShiftArithmeticRoundedSaturate(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftArithmeticRoundedSaturate(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftArithmeticRoundedSaturate(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftArithmeticRoundedSaturate(Vector128<sbyte> value, Vector128<sbyte> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftArithmeticRoundedSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqrshl  v16.4h, v0.4h, v1.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

42. ShiftArithmeticRoundedSaturateScalar

Vector64<long> ShiftArithmeticRoundedSaturateScalar(Vector64<long> value, Vector64<long> count)

This method performs arithmetic shift of 0th element in the value vector, by a value of the corresponding 0th element in the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded.

private Vector64<long> ShiftArithmeticRoundedSaturateScalarTest(Vector64<long> value, Vector64<long> count)
{
  return AdvSimd.ShiftArithmeticRoundedSaturateScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<short> ShiftArithmeticRoundedSaturateScalar(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftArithmeticRoundedSaturateScalar(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticRoundedSaturateScalar(Vector64<sbyte> value, Vector64<sbyte> count)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftArithmeticRoundedSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqrshl  d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

43. ShiftArithmeticRoundedScalar

Vector64<long> ShiftArithmeticRoundedScalar(Vector64<long> value, Vector64<long> count)

This method performs arithmetic shift of each signed integer value in the value vector, by a value of the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.

private Vector64<long> ShiftArithmeticRoundedScalarTest(Vector64<long> value, Vector64<long> count)
{
  return AdvSimd.ShiftArithmeticRoundedScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftArithmeticRoundedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            srshl   d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

44. ShiftArithmeticSaturate

Vector64<short> ShiftArithmeticSaturate(Vector64<short> value, Vector64<short> count)

This method performs arithmetic shift of each element in the value vector, by a value of the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated.

private Vector64<short> ShiftArithmeticSaturateTest(Vector64<short> value, Vector64<short> count)
{
  return AdvSimd.ShiftArithmeticSaturate(value, count);
}
// value = <11, 12, 13, 14>
// count = <21, 22, 23, 24>
// Result = <32767, 32767, 32767, 32767>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftArithmeticSaturate(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticSaturate(Vector64<sbyte> value, Vector64<sbyte> count)
Vector128<short> ShiftArithmeticSaturate(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftArithmeticSaturate(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftArithmeticSaturate(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftArithmeticSaturate(Vector128<sbyte> value, Vector128<sbyte> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftArithmeticSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshl   v16.4h, v0.4h, v1.4h
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

45. ShiftArithmeticSaturateScalar

Vector64<long> ShiftArithmeticSaturateScalar(Vector64<long> value, Vector64<long> count)

private Vector64<long> ShiftArithmeticSaturateScalarTest(Vector64<long> value, Vector64<long> count)
{
  return AdvSimd.ShiftArithmeticSaturateScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<short> ShiftArithmeticSaturateScalar(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftArithmeticSaturateScalar(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticSaturateScalar(Vector64<sbyte> value, Vector64<sbyte> count)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftArithmeticSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshl   d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

Introduction

APIs covered

1. MultiplyWideningLowerAndSubtract

2. MultiplyWideningUpper

3. MultiplyWideningUpperAndAdd

4. MultiplyWideningUpperAndSubtract

5. Negate

6. NegateSaturate

7. NegateSaturateScalar

8. NegateScalar

9. Not

10. Or

11. OrNot

12. PolynomialMultiply

13. PolynomialMultiplyWideningLower

14. PolynomialMultiplyWideningUpper

15. PopCount

16. ReciprocalEstimate

17. ReciprocalEstimateScalar

18. ReciprocalExponentScalar

19. ReciprocalSquareRootEstimate

20. ReciprocalSquareRootEstimateScalar

21. ReciprocalSquareRootStep

22. ReciprocalSquareRootStepScalar

23. ReciprocalStep

24. ReciprocalStepScalar

25. ReverseElement16

26. ReverseElement32

27. ReverseElement8

28. ReverseElementBits

29. RoundAwayFromZero

30. RoundAwayFromZeroScalar

31. RoundToNearest

32. RoundToNearestScalar

33. RoundToNegativeInfinity

34. RoundToNegativeInfinityScalar

35. RoundToPositiveInfinity

36. RoundToPositiveInfinityScalar

37. RoundToZero

38. RoundToZeroScalar

39. ShiftArithmetic

40. ShiftArithmeticRounded

41. ShiftArithmeticRoundedSaturate

42. ShiftArithmeticRoundedSaturateScalar

43. ShiftArithmeticRoundedScalar

44. ShiftArithmeticSaturate

45. ShiftArithmeticSaturateScalar

Leave a Comment