Arm64 Hardware Intrinsics APIs in .NET

Introduction

In my vectorization using .NET APIs blog, I describe SIMD datatypes Vector64<T> and Vector128<T> that operates on ‘Arm64 hardware intrinsic’ APIs present under System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 class. In this post I will describe those hardware intrinsic APIs by showing sample code usage along with examples and generated Arm64 code. This will help people in understanding these APIs so they can use them to optimize their .NET code written to target Arm64. Since there are 360 APIs, describing all of them in a single post will be overwhelming. So I have divided these APIs among 8 blogs and will demonstrate 45 APIs in each blog. This is part 7 of that blog series. You can checkout my previous blogs at:

Most of the description of these APIs is adapted and referenced from Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile document. You can also refer to the description of SIMD and Floating-point instructions description at Arm developer docs page.

The blog page is programmatically generated and might contain mistakes. If you find any mistake, please leave a comment and I will address it.

APIs covered

ShiftArithmeticScalar	ShiftRightArithmeticAddScalar
ShiftLeftAndInsert	ShiftRightArithmeticNarrowingSaturateLower
ShiftLeftAndInsertScalar	ShiftRightArithmeticNarrowingSaturateScalar
ShiftLeftLogical	ShiftRightArithmeticNarrowingSaturateUnsignedLower
ShiftLeftLogicalSaturate	ShiftRightArithmeticNarrowingSaturateUnsignedScalar
ShiftLeftLogicalSaturateScalar	ShiftRightArithmeticNarrowingSaturateUnsignedUpper
ShiftLeftLogicalSaturateUnsigned	ShiftRightArithmeticNarrowingSaturateUpper
ShiftLeftLogicalSaturateUnsignedScalar	ShiftRightArithmeticRounded
ShiftLeftLogicalScalar	ShiftRightArithmeticRoundedAdd
ShiftLeftLogicalWideningLower	ShiftRightArithmeticRoundedAddScalar
ShiftLeftLogicalWideningUpper	ShiftRightArithmeticRoundedNarrowingSaturateLower
ShiftLogical	ShiftRightArithmeticRoundedNarrowingSaturateScalar
ShiftLogicalRounded	ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower
ShiftLogicalRoundedSaturate	ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar
ShiftLogicalRoundedSaturateScalar	ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper
ShiftLogicalRoundedScalar	ShiftRightArithmeticRoundedNarrowingSaturateUpper
ShiftLogicalSaturate	ShiftRightArithmeticRoundedScalar
ShiftLogicalSaturateScalar	ShiftRightArithmeticScalar
ShiftLogicalScalar	ShiftRightLogical
ShiftRightAndInsert	ShiftRightLogicalAdd
ShiftRightAndInsertScalar	ShiftRightLogicalAddScalar
ShiftRightArithmetic	ShiftRightLogicalNarrowingLower
ShiftRightArithmeticAdd

1. ShiftArithmeticScalar

Vector64<long> ShiftArithmeticScalar(Vector64<long> value, Vector64<long> count)

This method performs arithmetic shift of each signed integer value in the value vector, by a value of the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift.

private Vector64<long> ShiftArithmeticScalarTest(Vector64<long> value, Vector64<long> count)
{
  return AdvSimd.ShiftArithmeticScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftArithmeticScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sshl    d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

2. ShiftLeftAndInsert

Vector64<byte> ShiftLeftAndInsert(Vector64<byte> left, Vector64<byte> right, byte shift)

This method left shifts each vector element in the right vector, by shift value, and inserts the result into the corresponding vector element in the left vector such that the new zero bits created by the shift are not inserted but retain their existing value as in left vector. Bits shifted out of the left of each vector element in the right are lost.

private Vector64<byte> ShiftLeftAndInsertTest(Vector64<byte> left, Vector64<byte> right, byte shift)
{
  return AdvSimd.ShiftLeftAndInsert(left, right, 1);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <1, 2, 3, 4, 5, 6, 7, 8>
// shift = 1
// Result = <3, 4, 7, 8, 11, 12, 15, 16>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftLeftAndInsert(Vector64<short> left, Vector64<short> right, byte shift)
Vector64<int> ShiftLeftAndInsert(Vector64<int> left, Vector64<int> right, byte shift)
Vector64<sbyte> ShiftLeftAndInsert(Vector64<sbyte> left, Vector64<sbyte> right, byte shift)
Vector64<ushort> ShiftLeftAndInsert(Vector64<ushort> left, Vector64<ushort> right, byte shift)
Vector64<uint> ShiftLeftAndInsert(Vector64<uint> left, Vector64<uint> right, byte shift)
Vector128<byte> ShiftLeftAndInsert(Vector128<byte> left, Vector128<byte> right, byte shift)
Vector128<short> ShiftLeftAndInsert(Vector128<short> left, Vector128<short> right, byte shift)
Vector128<int> ShiftLeftAndInsert(Vector128<int> left, Vector128<int> right, byte shift)
Vector128<long> ShiftLeftAndInsert(Vector128<long> left, Vector128<long> right, byte shift)
Vector128<sbyte> ShiftLeftAndInsert(Vector128<sbyte> left, Vector128<sbyte> right, byte shift)
Vector128<ushort> ShiftLeftAndInsert(Vector128<ushort> left, Vector128<ushort> right, byte shift)
Vector128<uint> ShiftLeftAndInsert(Vector128<uint> left, Vector128<uint> right, byte shift)
Vector128<ulong> ShiftLeftAndInsert(Vector128<ulong> left, Vector128<ulong> right, byte shift)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftAndInsertTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sli     v0.8b, v1.8b, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

3. ShiftLeftAndInsertScalar

Vector64<long> ShiftLeftAndInsertScalar(Vector64<long> left, Vector64<long> right, byte shift)

private Vector64<long> ShiftLeftAndInsertScalarTest(Vector64<long> left, Vector64<long> right, byte shift)
{
  return AdvSimd.ShiftLeftAndInsertScalar(left, right, 1);
}
// left = <50000>
// right = <60000>
// shift = 1
// Result = <120000>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ShiftLeftAndInsertScalar(Vector64<ulong> left, Vector64<ulong> right, byte shift)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftAndInsertScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sli     d0, d1, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

4. ShiftLeftLogical

Vector64<byte> ShiftLeftLogical(Vector64<byte> value, byte count)

This method left shifts each value from a vector, by count, stores the results in a vector and returns the result vector.

private Vector64<byte> ShiftLeftLogicalTest(Vector64<byte> value, byte count)
{
  return AdvSimd.ShiftLeftLogical(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <22, 24, 26, 28, 30, 32, 34, 36>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftLeftLogical(Vector64<short> value, byte count)
Vector64<int> ShiftLeftLogical(Vector64<int> value, byte count)
Vector64<sbyte> ShiftLeftLogical(Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftLeftLogical(Vector64<ushort> value, byte count)
Vector64<uint> ShiftLeftLogical(Vector64<uint> value, byte count)
Vector128<byte> ShiftLeftLogical(Vector128<byte> value, byte count)
Vector128<short> ShiftLeftLogical(Vector128<short> value, byte count)
Vector128<long> ShiftLeftLogical(Vector128<long> value, byte count)
Vector128<sbyte> ShiftLeftLogical(Vector128<sbyte> value, byte count)
Vector128<ushort> ShiftLeftLogical(Vector128<ushort> value, byte count)
Vector128<uint> ShiftLeftLogical(Vector128<uint> value, byte count)
Vector128<ulong> ShiftLeftLogical(Vector128<ulong> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            shl     v16.8b, v0.8b, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

5. ShiftLeftLogicalSaturate

Vector64<byte> ShiftLeftLogicalSaturate(Vector64<byte> value, byte count)

This method left shifts each element in the value vector, shifts it by count, stores the results in a vector and returns the result vector. The results are truncated.

private Vector64<byte> ShiftLeftLogicalSaturateTest(Vector64<byte> value, byte count)
{
  return AdvSimd.ShiftLeftLogicalSaturate(value, 6);
}
// value = <11, 112, 13, 14, 15, 16, 17, 18>
// count = 6
// Result = <64, 255, 255, 255, 255, 255, 255, 255>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftLeftLogicalSaturate(Vector64<short> value, byte count)
Vector64<int> ShiftLeftLogicalSaturate(Vector64<int> value, byte count)
Vector64<sbyte> ShiftLeftLogicalSaturate(Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftLeftLogicalSaturate(Vector64<ushort> value, byte count)
Vector64<uint> ShiftLeftLogicalSaturate(Vector64<uint> value, byte count)
Vector128<byte> ShiftLeftLogicalSaturate(Vector128<byte> value, byte count)
Vector128<short> ShiftLeftLogicalSaturate(Vector128<short> value, byte count)
Vector128<int> ShiftLeftLogicalSaturate(Vector128<int> value, byte count)
Vector128<long> ShiftLeftLogicalSaturate(Vector128<long> value, byte count)
Vector128<sbyte> ShiftLeftLogicalSaturate(Vector128<sbyte> value, byte count)
Vector128<ushort> ShiftLeftLogicalSaturate(Vector128<ushort> value, byte count)
Vector128<uint> ShiftLeftLogicalSaturate(Vector128<uint> value, byte count)
Vector128<ulong> ShiftLeftLogicalSaturate(Vector128<ulong> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalSaturateTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uqshl   v16.8b, v0.8b, #6
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

6. ShiftLeftLogicalSaturateScalar

Vector64<long> ShiftLeftLogicalSaturateScalar(Vector64<long> value, byte count)

This method left shift each element in the value vector, by count, stores the results in a vector and returns the result vector. The results are truncated.

private Vector64<long> ShiftLeftLogicalSaturateScalarTest(Vector64<long> value, byte count)
{
  return AdvSimd.ShiftLeftLogicalSaturateScalar(value, 0);
}
// value = <11>
// count = 0
// Result = <11>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ShiftLeftLogicalSaturateScalar(Vector64<ulong> value, byte count)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<byte> ShiftLeftLogicalSaturateScalar(Vector64<byte> value, byte count)
Vector64<short> ShiftLeftLogicalSaturateScalar(Vector64<short> value, byte count)
Vector64<int> ShiftLeftLogicalSaturateScalar(Vector64<int> value, byte count)
Vector64<sbyte> ShiftLeftLogicalSaturateScalar(Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftLeftLogicalSaturateScalar(Vector64<ushort> value, byte count)
Vector64<uint> ShiftLeftLogicalSaturateScalar(Vector64<uint> value, byte count)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshl   d16, d0, #0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

7. ShiftLeftLogicalSaturateUnsigned

Vector64<ushort> ShiftLeftLogicalSaturateUnsigned(Vector64<short> value, byte count)

This method left shifts each signed integer value in the value vector, by count, saturates the shifted result to an unsigned integer value, stores the results in a vector and returns the result vector. The results are truncated.

private Vector64<ushort> ShiftLeftLogicalSaturateUnsignedTest(Vector64<short> value, byte count)
{
  return AdvSimd.ShiftLeftLogicalSaturateUnsigned(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <22, 24, 26, 28>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<uint> ShiftLeftLogicalSaturateUnsigned(Vector64<int> value, byte count)
Vector64<byte> ShiftLeftLogicalSaturateUnsigned(Vector64<sbyte> value, byte count)
Vector128<ushort> ShiftLeftLogicalSaturateUnsigned(Vector128<short> value, byte count)
Vector128<uint> ShiftLeftLogicalSaturateUnsigned(Vector128<int> value, byte count)
Vector128<ulong> ShiftLeftLogicalSaturateUnsigned(Vector128<long> value, byte count)
Vector128<byte> ShiftLeftLogicalSaturateUnsigned(Vector128<sbyte> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalSaturateUnsignedTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshlu  v16.4h, v0.4h, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

8. ShiftLeftLogicalSaturateUnsignedScalar

Vector64<ulong> ShiftLeftLogicalSaturateUnsignedScalar(Vector64<long> value, byte count)

This method shifts signed integer value in the value vector, by count, saturates the shifted result to an unsigned integer value, stores the results in a vector and returns the result vector. The results are truncated.

private Vector64<ulong> ShiftLeftLogicalSaturateUnsignedScalarTest(Vector64<long> value, byte count)
{
  return AdvSimd.ShiftLeftLogicalSaturateUnsignedScalar(value, 0);
}
// value = <11>
// count = 0
// Result = <11>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<ushort> ShiftLeftLogicalSaturateUnsignedScalar(Vector64<short> value, byte count)
Vector64<uint> ShiftLeftLogicalSaturateUnsignedScalar(Vector64<int> value, byte count)
Vector64<byte> ShiftLeftLogicalSaturateUnsignedScalar(Vector64<sbyte> value, byte count)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalSaturateUnsignedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[UInt64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshlu  d16, d0, #0
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

9. ShiftLeftLogicalScalar

Vector64<long> ShiftLeftLogicalScalar(Vector64<long> value, byte count)

This method left shifts each value in value vector, by count, stores the results in a vector and returns the result vector.

private Vector64<long> ShiftLeftLogicalScalarTest(Vector64<long> value, byte count)
{
  return AdvSimd.ShiftLeftLogicalScalar(value, 1);
}
// value = <971324>
// count = 1
// Result = <1942648>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ShiftLeftLogicalScalar(Vector64<ulong> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            shl     d16, d0, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

10. ShiftLeftLogicalWideningLower

Vector128<ushort> ShiftLeftLogicalWideningLower(Vector64<byte> value, byte count)

This method left shifts each vector element in the value vector, by the specified number of bits in count, stores the results in a vector and returns the result vector. As seen in below example, the result vector element’s size ushort is twice as long as the input vector element’s size byte.

private Vector128<ushort> ShiftLeftLogicalWideningLowerTest(Vector64<byte> value, byte count)
{
  return AdvSimd.ShiftLeftLogicalWideningLower(value, 0);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 0
// Result = <11, 12, 13, 14, 15, 16, 17, 18>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> ShiftLeftLogicalWideningLower(Vector64<short> value, byte count)
Vector128<long> ShiftLeftLogicalWideningLower(Vector64<int> value, byte count)
Vector128<short> ShiftLeftLogicalWideningLower(Vector64<sbyte> value, byte count)
Vector128<uint> ShiftLeftLogicalWideningLower(Vector64<ushort> value, byte count)
Vector128<ulong> ShiftLeftLogicalWideningLower(Vector64<uint> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ushll   v16.8h, v0.8b, #0
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

11. ShiftLeftLogicalWideningUpper

Vector128<ushort> ShiftLeftLogicalWideningUpper(Vector128<byte> value, byte count)

This method shifts each vector element in the upper-half of value vector, by the specified number of bits in count, stores the results in a vector and returns the result vector. As seen in below example, the result vector element’s size ushort is twice as long as the input vector element’s size byte.

private Vector128<ushort> ShiftLeftLogicalWideningUpperTest(Vector128<byte> value, byte count)
{
  return AdvSimd.ShiftLeftLogicalWideningUpper(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// count = 1
// Result = <38, 40, 42, 44, 46, 48, 50, 52>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> ShiftLeftLogicalWideningUpper(Vector128<short> value, byte count)
Vector128<long> ShiftLeftLogicalWideningUpper(Vector128<int> value, byte count)
Vector128<short> ShiftLeftLogicalWideningUpper(Vector128<sbyte> value, byte count)
Vector128<uint> ShiftLeftLogicalWideningUpper(Vector128<ushort> value, byte count)
Vector128<ulong> ShiftLeftLogicalWideningUpper(Vector128<uint> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],ubyte):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ushll2  v16.8h, v0.16b, #1
            mov     v0.16b, v16.16b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

12. ShiftLogical

Vector64<byte> ShiftLogical(Vector64<byte> value, Vector64<sbyte> count)

This method shifts each element in the value vector, by the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift.

private Vector64<byte> ShiftLogicalTest(Vector64<byte> value, Vector64<sbyte> count)
{
  return AdvSimd.ShiftLogical(value, count);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = <-3, 2, 3, 5, 6, 7, -7, 0>
// Result = <1, 48, 104, 192, 192, 0, 0, 18>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftLogical(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogical(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogical(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogical(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogical(Vector64<uint> value, Vector64<int> count)
Vector128<byte> ShiftLogical(Vector128<byte> value, Vector128<sbyte> count)
Vector128<short> ShiftLogical(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftLogical(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftLogical(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftLogical(Vector128<sbyte> value, Vector128<sbyte> count)
Vector128<ushort> ShiftLogical(Vector128<ushort> value, Vector128<short> count)
Vector128<uint> ShiftLogical(Vector128<uint> value, Vector128<int> count)
Vector128<ulong> ShiftLogical(Vector128<ulong> value, Vector128<long> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLogicalTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ushl    v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

13. ShiftLogicalRounded

Vector64<byte> ShiftLogicalRounded(Vector64<byte> value, Vector64<sbyte> count)

This method shifts each element in the value vector , by the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.

private Vector64<byte> ShiftLogicalRoundedTest(Vector64<byte> value, Vector64<sbyte> count)
{
  return AdvSimd.ShiftLogicalRounded(value, count);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = <-3, 2, 3, 5, 6, 7, -7, 0>
// Result = <1, 48, 104, 192, 192, 0, 0, 18>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftLogicalRounded(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalRounded(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalRounded(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalRounded(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalRounded(Vector64<uint> value, Vector64<int> count)
Vector128<byte> ShiftLogicalRounded(Vector128<byte> value, Vector128<sbyte> count)
Vector128<short> ShiftLogicalRounded(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftLogicalRounded(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftLogicalRounded(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftLogicalRounded(Vector128<sbyte> value, Vector128<sbyte> count)
Vector128<ushort> ShiftLogicalRounded(Vector128<ushort> value, Vector128<short> count)
Vector128<uint> ShiftLogicalRounded(Vector128<uint> value, Vector128<int> count)
Vector128<ulong> ShiftLogicalRounded(Vector128<ulong> value, Vector128<long> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLogicalRoundedTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            urshl   v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

14. ShiftLogicalRoundedSaturate

Vector64<byte> ShiftLogicalRoundedSaturate(Vector64<byte> value, Vector64<sbyte> count)

This method shifts each vector element of the value vector, by the corresponding vector element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. If overflow occurs with any of the results, those results are saturated.

private Vector64<byte> ShiftLogicalRoundedSaturateTest(Vector64<byte> value, Vector64<sbyte> count)
{
  return AdvSimd.ShiftLogicalRoundedSaturate(value, count);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <255, 255, 255, 255, 255, 255, 255, 255>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftLogicalRoundedSaturate(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalRoundedSaturate(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalRoundedSaturate(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalRoundedSaturate(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalRoundedSaturate(Vector64<uint> value, Vector64<int> count)
Vector128<byte> ShiftLogicalRoundedSaturate(Vector128<byte> value, Vector128<sbyte> count)
Vector128<short> ShiftLogicalRoundedSaturate(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftLogicalRoundedSaturate(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftLogicalRoundedSaturate(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftLogicalRoundedSaturate(Vector128<sbyte> value, Vector128<sbyte> count)
Vector128<ushort> ShiftLogicalRoundedSaturate(Vector128<ushort> value, Vector128<short> count)
Vector128<uint> ShiftLogicalRoundedSaturate(Vector128<uint> value, Vector128<int> count)
Vector128<ulong> ShiftLogicalRoundedSaturate(Vector128<ulong> value, Vector128<long> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLogicalRoundedSaturateTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uqrshl  v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

15. ShiftLogicalRoundedSaturateScalar

Vector64<long> ShiftLogicalRoundedSaturateScalar(Vector64<long> value, Vector64<long> count)

private Vector64<long> ShiftLogicalRoundedSaturateScalarTest(Vector64<long> value, Vector64<long> count)
{
  return AdvSimd.ShiftLogicalRoundedSaturateScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ShiftLogicalRoundedSaturateScalar(Vector64<ulong> value, Vector64<long> count)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<byte> ShiftLogicalRoundedSaturateScalar(Vector64<byte> value, Vector64<sbyte> count)
Vector64<short> ShiftLogicalRoundedSaturateScalar(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalRoundedSaturateScalar(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalRoundedSaturateScalar(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalRoundedSaturateScalar(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalRoundedSaturateScalar(Vector64<uint> value, Vector64<int> count)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLogicalRoundedSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uqrshl  d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

16. ShiftLogicalRoundedScalar

Vector64<long> ShiftLogicalRoundedScalar(Vector64<long> value, Vector64<long> count)

private Vector64<long> ShiftLogicalRoundedScalarTest(Vector64<long> value, Vector64<long> count)
{
  return AdvSimd.ShiftLogicalRoundedScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ShiftLogicalRoundedScalar(Vector64<ulong> value, Vector64<long> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLogicalRoundedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            urshl   d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

17. ShiftLogicalSaturate

Vector64<byte> ShiftLogicalSaturate(Vector64<byte> value, Vector64<sbyte> count)

This method shifts each element in the value vector, by the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. If overflow occurs with any of the results, those results are saturated.

private Vector64<byte> ShiftLogicalSaturateTest(Vector64<byte> value, Vector64<sbyte> count)
{
  return AdvSimd.ShiftLogicalSaturate(value, count);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = <-3, 2, 3, 5, 6, 7, -8, 0>
// Result = <1, 48, 104, 255, 255, 255, 0, 18>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftLogicalSaturate(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalSaturate(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalSaturate(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalSaturate(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalSaturate(Vector64<uint> value, Vector64<int> count)
Vector128<byte> ShiftLogicalSaturate(Vector128<byte> value, Vector128<sbyte> count)
Vector128<short> ShiftLogicalSaturate(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftLogicalSaturate(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftLogicalSaturate(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftLogicalSaturate(Vector128<sbyte> value, Vector128<sbyte> count)
Vector128<ushort> ShiftLogicalSaturate(Vector128<ushort> value, Vector128<short> count)
Vector128<uint> ShiftLogicalSaturate(Vector128<uint> value, Vector128<int> count)
Vector128<ulong> ShiftLogicalSaturate(Vector128<ulong> value, Vector128<long> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLogicalSaturateTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uqshl   v16.8b, v0.8b, v1.8b
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

18. ShiftLogicalSaturateScalar

Vector64<long> ShiftLogicalSaturateScalar(Vector64<long> value, Vector64<long> count)

This method shifts 0th element in the value vector, by the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. If overflow occurs with any of the results, those results are saturated.

private Vector64<long> ShiftLogicalSaturateScalarTest(Vector64<long> value, Vector64<long> count)
{
  return AdvSimd.ShiftLogicalSaturateScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ShiftLogicalSaturateScalar(Vector64<ulong> value, Vector64<long> count)

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<byte> ShiftLogicalSaturateScalar(Vector64<byte> value, Vector64<sbyte> count)
Vector64<short> ShiftLogicalSaturateScalar(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalSaturateScalar(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalSaturateScalar(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalSaturateScalar(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalSaturateScalar(Vector64<uint> value, Vector64<int> count)

See Microsoft docs here and here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLogicalSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            uqshl   d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

19. ShiftLogicalScalar

Vector64<long> ShiftLogicalScalar(Vector64<long> value, Vector64<long> count)

private Vector64<long> ShiftLogicalScalarTest(Vector64<long> value, Vector64<long> count)
{
  return AdvSimd.ShiftLogicalScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ShiftLogicalScalar(Vector64<ulong> value, Vector64<long> count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftLogicalScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ushl    d16, d0, d1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

20. ShiftRightAndInsert

Vector64<byte> ShiftRightAndInsert(Vector64<byte> left, Vector64<byte> right, byte shift)

This method right shifts each vector element in the right vector, by shift value, and inserts the result into the corresponding vector element in the left vector such that the new zero bits created by the shift are not inserted but retain their existing value as in left vector. Bits shifted out of the left of each vector element in the right are lost.

private Vector64<byte> ShiftRightAndInsertTest(Vector64<byte> left, Vector64<byte> right, byte shift)
{
  return AdvSimd.ShiftRightAndInsert(left, right, 1);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// shift = 1
// Result = <10, 11, 11, 12, 12, 13, 13, 14>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftRightAndInsert(Vector64<short> left, Vector64<short> right, byte shift)
Vector64<int> ShiftRightAndInsert(Vector64<int> left, Vector64<int> right, byte shift)
Vector64<sbyte> ShiftRightAndInsert(Vector64<sbyte> left, Vector64<sbyte> right, byte shift)
Vector64<ushort> ShiftRightAndInsert(Vector64<ushort> left, Vector64<ushort> right, byte shift)
Vector64<uint> ShiftRightAndInsert(Vector64<uint> left, Vector64<uint> right, byte shift)
Vector128<byte> ShiftRightAndInsert(Vector128<byte> left, Vector128<byte> right, byte shift)
Vector128<short> ShiftRightAndInsert(Vector128<short> left, Vector128<short> right, byte shift)
Vector128<int> ShiftRightAndInsert(Vector128<int> left, Vector128<int> right, byte shift)
Vector128<long> ShiftRightAndInsert(Vector128<long> left, Vector128<long> right, byte shift)
Vector128<sbyte> ShiftRightAndInsert(Vector128<sbyte> left, Vector128<sbyte> right, byte shift)
Vector128<ushort> ShiftRightAndInsert(Vector128<ushort> left, Vector128<ushort> right, byte shift)
Vector128<uint> ShiftRightAndInsert(Vector128<uint> left, Vector128<uint> right, byte shift)
Vector128<ulong> ShiftRightAndInsert(Vector128<ulong> left, Vector128<ulong> right, byte shift)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightAndInsertTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sri     v0.8b, v1.8b, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

21. ShiftRightAndInsertScalar

Vector64<long> ShiftRightAndInsertScalar(Vector64<long> left, Vector64<long> right, byte shift)

private Vector64<long> ShiftRightAndInsertScalarTest(Vector64<long> left, Vector64<long> right, byte shift)
{
  return AdvSimd.ShiftRightAndInsertScalar(left, right, 1);
}
// left = <11>
// right = <11>
// shift = 1
// Result = <5>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ShiftRightAndInsertScalar(Vector64<ulong> left, Vector64<ulong> right, byte shift)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightAndInsertScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sri     d0, d1, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

22. ShiftRightArithmetic

Vector64<short> ShiftRightArithmetic(Vector64<short> value, byte count)

This method right shifts each element in the value vector by count, stores the truncated results in a vector and returns the result vector. All the values in this method are signed integer values.

private Vector64<short> ShiftRightArithmeticTest(Vector64<short> value, byte count)
{
  return AdvSimd.ShiftRightArithmetic(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <5, 6, 6, 7>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftRightArithmetic(Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightArithmetic(Vector64<sbyte> value, byte count)
Vector128<short> ShiftRightArithmetic(Vector128<short> value, byte count)
Vector128<int> ShiftRightArithmetic(Vector128<int> value, byte count)
Vector128<long> ShiftRightArithmetic(Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmetic(Vector128<sbyte> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sshr    v16.4h, v0.4h, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

23. ShiftRightArithmeticAdd

Vector64<short> ShiftRightArithmeticAdd(Vector64<short> addend, Vector64<short> value, byte count)

This method right shifts each element in the value vector, by a count, and accumulates the final results with the vector elements of the addend vector and return the accumulated vector. All the values in this method are signed integer values. All results are truncated.

private Vector64<short> ShiftRightArithmeticAddTest(Vector64<short> addend, Vector64<short> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticAdd(addend, value, 1);
}
// addend = <11, 12, 13, 14>
// value = <21, 22, 23, 24>
// count = 1
// Result = <21, 23, 24, 26>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftRightArithmeticAdd(Vector64<int> addend, Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightArithmeticAdd(Vector64<sbyte> addend, Vector64<sbyte> value, byte count)
Vector128<short> ShiftRightArithmeticAdd(Vector128<short> addend, Vector128<short> value, byte count)
Vector128<int> ShiftRightArithmeticAdd(Vector128<int> addend, Vector128<int> value, byte count)
Vector128<long> ShiftRightArithmeticAdd(Vector128<long> addend, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticAdd(Vector128<sbyte> addend, Vector128<sbyte> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticAddTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ssra    v0.4h, v1.4h, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

24. ShiftRightArithmeticAddScalar

Vector64<long> ShiftRightArithmeticAddScalar(Vector64<long> addend, Vector64<long> value, byte count)

private Vector64<long> ShiftRightArithmeticAddScalarTest(Vector64<long> addend, Vector64<long> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticAddScalar(addend, value, 1);
}
// addend = <11>
// value = <11>
// count = 1
// Result = <16>

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticAddScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ssra    d0, d1, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

25. ShiftRightArithmeticNarrowingSaturateLower

Vector64<short> ShiftRightArithmeticNarrowingSaturateLower(Vector128<int> value, byte count)

This method right shifts and truncates each vector element in thevalue vector, by count, saturates each shifted result to a value that is half the original width, stores the final result into a vector, and writes the vector to the result vector. All the values in this method are signed integer values. As seen in below example, the result vector element’s size short is half as long as the source vector element’s size int.

private Vector64<short> ShiftRightArithmeticNarrowingSaturateLowerTest(Vector128<int> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticNarrowingSaturateLower(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <5, 6, 6, 7>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftRightArithmeticNarrowingSaturateLower(Vector128<long> value, byte count)
Vector64<sbyte> ShiftRightArithmeticNarrowingSaturateLower(Vector128<short> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateLowerTest(System.Runtime.Intrinsics.Vector128`1[Int32],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshrn  v16.4h, v0.4s, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

26. ShiftRightArithmeticNarrowingSaturateScalar

Vector64<short> ShiftRightArithmeticNarrowingSaturateScalar(Vector64<int> value, byte count)

This method right shifts and truncates 0th vector element in thevalue vector, by count, saturates each shifted result to a value that is half the original width, stores the final result into a vector, and writes the vector to the 0th element of result vector, other elements being set to 0. All the values in this method are signed integer values. As seen in below example, the result vector element’s size short is half as long as the source vector element’s size int.

private Vector64<short> ShiftRightArithmeticNarrowingSaturateScalarTest(Vector64<int> value, byte count)
{
  return AdvSimd.Arm64.ShiftRightArithmeticNarrowingSaturateScalar(value, 1);
}
// value = <11, 12>
// count = 1
// Result = <5, 0, 0, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<int> ShiftRightArithmeticNarrowingSaturateScalar(Vector64<long> value, byte count)
Vector64<sbyte> ShiftRightArithmeticNarrowingSaturateScalar(Vector64<short> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int32],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshrn  h16, s0, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

27. ShiftRightArithmeticNarrowingSaturateUnsignedLower

Vector64<byte> ShiftRightArithmeticNarrowingSaturateUnsignedLower(Vector128<short> value, byte count)

This method right shifts each signed integer value in the value vector, by count, saturates the result to an unsigned integer value that is half the original width, stores the results in a vector and returns the result vector. The results are truncated.

private Vector64<byte> ShiftRightArithmeticNarrowingSaturateUnsignedLowerTest(Vector128<short> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticNarrowingSaturateUnsignedLower(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <5, 6, 6, 7, 7, 8, 8, 9>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ushort> ShiftRightArithmeticNarrowingSaturateUnsignedLower(Vector128<int> value, byte count)
Vector64<uint> ShiftRightArithmeticNarrowingSaturateUnsignedLower(Vector128<long> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateUnsignedLowerTest(System.Runtime.Intrinsics.Vector128`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshrun v16.8b, v0.8h, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

28. ShiftRightArithmeticNarrowingSaturateUnsignedScalar

Vector64<byte> ShiftRightArithmeticNarrowingSaturateUnsignedScalar(Vector64<short> value, byte count)

This method right shifts signed integer value in the value vector at 0th index, by count, saturates the result to an unsigned integer value that is half the original width, stores the results in a vector and returns the result vector, other elements being set to 0. The results are truncated.

private Vector64<byte> ShiftRightArithmeticNarrowingSaturateUnsignedScalarTest(Vector64<short> value, byte count)
{
  return AdvSimd.Arm64.ShiftRightArithmeticNarrowingSaturateUnsignedScalar(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <5, 0, 0, 0, 0, 0, 0, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<ushort> ShiftRightArithmeticNarrowingSaturateUnsignedScalar(Vector64<int> value, byte count)
Vector64<uint> ShiftRightArithmeticNarrowingSaturateUnsignedScalar(Vector64<long> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateUnsignedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshrun b16, h0, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

29. ShiftRightArithmeticNarrowingSaturateUnsignedUpper

Vector128<byte> ShiftRightArithmeticNarrowingSaturateUnsignedUpper(Vector64<byte> lower, Vector128<short> value, byte count)

This method right shifts each signed integer value in the upper-half of value vector, by count, saturates the result to an unsigned integer value that is half the original width, stores the final result into a vector, and writes the vector to the upper-half of result vector, the lower-half contains values from lower vector. The results are truncated.

private Vector128<byte> ShiftRightArithmeticNarrowingSaturateUnsignedUpperTest(Vector64<byte> lower, Vector128<short> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticNarrowingSaturateUnsignedUpper(lower, value, 1);
}
// lower = <11, 12, 13, 14, 15, 16, 17, 18>
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <11, 12, 13, 14, 15, 16, 17, 18, 5, 6, 6, 7, 7, 8, 8, 9>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<ushort> ShiftRightArithmeticNarrowingSaturateUnsignedUpper(Vector64<ushort> lower, Vector128<int> value, byte count)
Vector128<uint> ShiftRightArithmeticNarrowingSaturateUnsignedUpper(Vector64<uint> lower, Vector128<long> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateUnsignedUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshrun2 v0.16b, v1.8h, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

30. ShiftRightArithmeticNarrowingSaturateUpper

Vector128<short> ShiftRightArithmeticNarrowingSaturateUpper(Vector64<short> lower, Vector128<int> value, byte count)

This method right shifts each vector element in the upper-half of value vector, by count, saturates each shifted result to a value that is half the original width, stores the final result into a vector, and writes the vector to the upper-half of result vector while lower-half contains lower vector values. All the values in this method are signed integer values. All results are truncated.

private Vector128<short> ShiftRightArithmeticNarrowingSaturateUpperTest(Vector64<short> lower, Vector128<int> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticNarrowingSaturateUpper(lower, value, 1);
}
// lower = <11, 12, 13, 14>
// value = <11, 12, 13, 14>
// count = 1
// Result = <11, 12, 13, 14, 5, 6, 6, 7>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> ShiftRightArithmeticNarrowingSaturateUpper(Vector64<int> lower, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticNarrowingSaturateUpper(Vector64<sbyte> lower, Vector128<short> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateUpperTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector128`1[Int32],ubyte):System.Runtime.Intrinsics.Vector128`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqshrn2 v0.8h, v1.4s, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

31. ShiftRightArithmeticRounded

Vector64<short> ShiftRightArithmeticRounded(Vector64<short> value, byte count)

This method right shifts each element in the value vector, by count and then rounded, stores the results in a vector and returns the result vector. All the values in this method are signed integer values.

private Vector64<short> ShiftRightArithmeticRoundedTest(Vector64<short> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticRounded(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <6, 6, 7, 7>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftRightArithmeticRounded(Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightArithmeticRounded(Vector64<sbyte> value, byte count)
Vector128<short> ShiftRightArithmeticRounded(Vector128<short> value, byte count)
Vector128<int> ShiftRightArithmeticRounded(Vector128<int> value, byte count)
Vector128<long> ShiftRightArithmeticRounded(Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticRounded(Vector128<sbyte> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            srshr   v16.4h, v0.4h, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

32. ShiftRightArithmeticRoundedAdd

Vector64<short> ShiftRightArithmeticRoundedAdd(Vector64<short> addend, Vector64<short> value, byte count)

private Vector64<short> ShiftRightArithmeticRoundedAddTest(Vector64<short> addend, Vector64<short> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticRoundedAdd(addend, value, 1);
}
// addend = <11, 12, 13, 14>
// value = <21, 22, 23, 24>
// count = 1
// Result = <22, 23, 25, 26>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftRightArithmeticRoundedAdd(Vector64<int> addend, Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightArithmeticRoundedAdd(Vector64<sbyte> addend, Vector64<sbyte> value, byte count)
Vector128<short> ShiftRightArithmeticRoundedAdd(Vector128<short> addend, Vector128<short> value, byte count)
Vector128<int> ShiftRightArithmeticRoundedAdd(Vector128<int> addend, Vector128<int> value, byte count)
Vector128<long> ShiftRightArithmeticRoundedAdd(Vector128<long> addend, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticRoundedAdd(Vector128<sbyte> addend, Vector128<sbyte> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedAddTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            srsra   v0.4h, v1.4h, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

33. ShiftRightArithmeticRoundedAddScalar

Vector64<long> ShiftRightArithmeticRoundedAddScalar(Vector64<long> addend, Vector64<long> value, byte count)

private Vector64<long> ShiftRightArithmeticRoundedAddScalarTest(Vector64<long> addend, Vector64<long> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticRoundedAddScalar(addend, value, 1);
}
// addend = <11>
// value = <11>
// count = 1
// Result = <17>

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedAddScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            srsra   d0, d1, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

34. ShiftRightArithmeticRoundedNarrowingSaturateLower

Vector64<short> ShiftRightArithmeticRoundedNarrowingSaturateLower(Vector128<int> value, byte count)

This method right shifts each vector element in the value vector, by count, saturates each shifted result to a value that is half the original width, stores the final result into a vector, and writes the vector to the result vector. All the values in this method are signed integer values. As seen in below example, the result vector element’s size short is half as long as the source vector element’s size int. The results are rounded.

private Vector64<short> ShiftRightArithmeticRoundedNarrowingSaturateLowerTest(Vector128<int> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticRoundedNarrowingSaturateLower(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <6, 6, 7, 7>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftRightArithmeticRoundedNarrowingSaturateLower(Vector128<long> value, byte count)
Vector64<sbyte> ShiftRightArithmeticRoundedNarrowingSaturateLower(Vector128<short> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateLowerTest(System.Runtime.Intrinsics.Vector128`1[Int32],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqrshrn v16.4h, v0.4s, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

35. ShiftRightArithmeticRoundedNarrowingSaturateScalar

Vector64<short> ShiftRightArithmeticRoundedNarrowingSaturateScalar(Vector64<int> value, byte count)

This method right shifts 0th element in the value vector, by count, saturates each shifted result to a value that is half the original width, stores the final result into 0th element of vector, and writes the vector to the result vector, other elements being set to 0. All the values in this method are signed integer values. As seen in below example, the result vector element’s size short is half as long as the source vector element’s size int. The results are rounded.

private Vector64<short> ShiftRightArithmeticRoundedNarrowingSaturateScalarTest(Vector64<int> value, byte count)
{
  return AdvSimd.Arm64.ShiftRightArithmeticRoundedNarrowingSaturateScalar(value, 1);
}
// value = <11, 12>
// count = 1
// Result = <6, 0, 0, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<int> ShiftRightArithmeticRoundedNarrowingSaturateScalar(Vector64<long> value, byte count)
Vector64<sbyte> ShiftRightArithmeticRoundedNarrowingSaturateScalar(Vector64<short> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int32],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqrshrn h16, s0, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

36. ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower

Vector64<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower(Vector128<short> value, byte count)

private Vector64<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLowerTest(Vector128<short> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <6, 6, 7, 7, 8, 8, 9, 9>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ushort> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower(Vector128<int> value, byte count)
Vector64<uint> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower(Vector128<long> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLowerTest(System.Runtime.Intrinsics.Vector128`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqrshrun v16.8b, v0.8h, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

37. ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar

Vector64<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar(Vector64<short> value, byte count)

private Vector64<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalarTest(Vector64<short> value, byte count)
{
  return AdvSimd.Arm64.ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <6, 0, 0, 0, 0, 0, 0, 0>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<ushort> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar(Vector64<int> value, byte count)
Vector64<uint> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar(Vector64<long> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqrshrun b16, h0, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

38. ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper

Vector128<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper(Vector64<byte> lower, Vector128<short> value, byte count)

private Vector128<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpperTest(Vector64<byte> lower, Vector128<short> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper(lower, value, 1);
}
// lower = <11, 12, 13, 14, 15, 16, 17, 18>
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <11, 12, 13, 14, 15, 16, 17, 18, 6, 6, 7, 7, 8, 8, 9, 9>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<ushort> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper(Vector64<ushort> lower, Vector128<int> value, byte count)
Vector128<uint> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper(Vector64<uint> lower, Vector128<long> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqrshrun2 v0.16b, v1.8h, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

39. ShiftRightArithmeticRoundedNarrowingSaturateUpper

Vector128<short> ShiftRightArithmeticRoundedNarrowingSaturateUpper(Vector64<short> lower, Vector128<int> value, byte count)

private Vector128<short> ShiftRightArithmeticRoundedNarrowingSaturateUpperTest(Vector64<short> lower, Vector128<int> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticRoundedNarrowingSaturateUpper(lower, value, 1);
}
// lower = <11, 12, 13, 14>
// value = <11, 12, 13, 14>
// count = 1
// Result = <11, 12, 13, 14, 6, 6, 7, 7>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> ShiftRightArithmeticRoundedNarrowingSaturateUpper(Vector64<int> lower, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticRoundedNarrowingSaturateUpper(Vector64<sbyte> lower, Vector128<short> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateUpperTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector128`1[Int32],ubyte):System.Runtime.Intrinsics.Vector128`1[Int16]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )  simd16  ->   d1         HFA(simd16) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sqrshrn2 v0.8h, v1.4s, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

40. ShiftRightArithmeticRoundedScalar

Vector64<long> ShiftRightArithmeticRoundedScalar(Vector64<long> value, byte count)

private Vector64<long> ShiftRightArithmeticRoundedScalarTest(Vector64<long> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticRoundedScalar(value, 1);
}
// value = <11>
// count = 1
// Result = <6>

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            srshr   d16, d0, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

41. ShiftRightArithmeticScalar

Vector64<long> ShiftRightArithmeticScalar(Vector64<long> value, byte count)

private Vector64<long> ShiftRightArithmeticScalarTest(Vector64<long> value, byte count)
{
  return AdvSimd.ShiftRightArithmeticScalar(value, 1);
}
// value = <11>
// count = 1
// Result = <5>

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            sshr    d16, d0, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

42. ShiftRightLogical

Vector64<byte> ShiftRightLogical(Vector64<byte> value, byte count)

This method right shifts each element in the value vector, by count, stores the results in a vector and returns the result vector. The results are truncated.

private Vector64<byte> ShiftRightLogicalTest(Vector64<byte> value, byte count)
{
  return AdvSimd.ShiftRightLogical(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <5, 6, 6, 7, 7, 8, 8, 9>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftRightLogical(Vector64<short> value, byte count)
Vector64<int> ShiftRightLogical(Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightLogical(Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftRightLogical(Vector64<ushort> value, byte count)
Vector64<uint> ShiftRightLogical(Vector64<uint> value, byte count)
Vector128<byte> ShiftRightLogical(Vector128<byte> value, byte count)
Vector128<short> ShiftRightLogical(Vector128<short> value, byte count)
Vector128<int> ShiftRightLogical(Vector128<int> value, byte count)
Vector128<long> ShiftRightLogical(Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightLogical(Vector128<sbyte> value, byte count)
Vector128<ushort> ShiftRightLogical(Vector128<ushort> value, byte count)
Vector128<uint> ShiftRightLogical(Vector128<uint> value, byte count)
Vector128<ulong> ShiftRightLogical(Vector128<ulong> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightLogicalTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            ushr    v16.8b, v0.8b, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

43. ShiftRightLogicalAdd

Vector64<byte> ShiftRightLogicalAdd(Vector64<byte> addend, Vector64<byte> value, byte count)

This method right shifts each element in the value vector, by count, and accumulates the final results with the vector elements of the addend vector and return the result vector. The results are truncated.

private Vector64<byte> ShiftRightLogicalAddTest(Vector64<byte> addend, Vector64<byte> value, byte count)
{
  return AdvSimd.ShiftRightLogicalAdd(addend, value, 1);
}
// addend = <11, 12, 13, 14, 15, 16, 17, 18>
// value = <21, 22, 23, 24, 25, 26, 27, 28>
// count = 1
// Result = <21, 23, 24, 26, 27, 29, 30, 32>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftRightLogicalAdd(Vector64<short> addend, Vector64<short> value, byte count)
Vector64<int> ShiftRightLogicalAdd(Vector64<int> addend, Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightLogicalAdd(Vector64<sbyte> addend, Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftRightLogicalAdd(Vector64<ushort> addend, Vector64<ushort> value, byte count)
Vector64<uint> ShiftRightLogicalAdd(Vector64<uint> addend, Vector64<uint> value, byte count)
Vector128<byte> ShiftRightLogicalAdd(Vector128<byte> addend, Vector128<byte> value, byte count)
Vector128<short> ShiftRightLogicalAdd(Vector128<short> addend, Vector128<short> value, byte count)
Vector128<int> ShiftRightLogicalAdd(Vector128<int> addend, Vector128<int> value, byte count)
Vector128<long> ShiftRightLogicalAdd(Vector128<long> addend, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightLogicalAdd(Vector128<sbyte> addend, Vector128<sbyte> value, byte count)
Vector128<ushort> ShiftRightLogicalAdd(Vector128<ushort> addend, Vector128<ushort> value, byte count)
Vector128<uint> ShiftRightLogicalAdd(Vector128<uint> addend, Vector128<uint> value, byte count)
Vector128<ulong> ShiftRightLogicalAdd(Vector128<ulong> addend, Vector128<ulong> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightLogicalAddTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            usra    v0.8b, v1.8b, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

44. ShiftRightLogicalAddScalar

Vector64<long> ShiftRightLogicalAddScalar(Vector64<long> addend, Vector64<long> value, byte count)

private Vector64<long> ShiftRightLogicalAddScalarTest(Vector64<long> addend, Vector64<long> value, byte count)
{
  return AdvSimd.ShiftRightLogicalAddScalar(addend, value, 1);
}
// addend = <11>
// value = <11>
// count = 1
// Result = <16>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ShiftRightLogicalAddScalar(Vector64<ulong> addend, Vector64<ulong> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightLogicalAddScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
;  V00 arg0         [V00,T00] (  3,  3   )   simd8  ->   d0         HFA(simd8) 
;  V01 arg1         [V01,T01] (  3,  3   )   simd8  ->   d1         HFA(simd8) 
;* V02 arg2         [V02    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V03 OutArgs      [V03    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            usra    d0, d1, #1
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 20, prolog size 8

45. ShiftRightLogicalNarrowingLower

Vector64<byte> ShiftRightLogicalNarrowingLower(Vector128<ushort> value, byte count)

This method right shifts each integer value in the value vector, by count, stoes the final result into a vector, and writes the vector to the result vector. As seen in below example, the result vector element’s size byte is half as long as the source vector element’s size ushort. The results are truncated.

private Vector64<byte> ShiftRightLogicalNarrowingLowerTest(Vector128<ushort> value, byte count)
{
  return AdvSimd.ShiftRightLogicalNarrowingLower(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <5, 6, 6, 7, 7, 8, 8, 9>

Similar APIs that operate on different sizes:

// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> ShiftRightLogicalNarrowingLower(Vector128<int> value, byte count)
Vector64<int> ShiftRightLogicalNarrowingLower(Vector128<long> value, byte count)
Vector64<sbyte> ShiftRightLogicalNarrowingLower(Vector128<short> value, byte count)
Vector64<ushort> ShiftRightLogicalNarrowingLower(Vector128<uint> value, byte count)
Vector64<uint> ShiftRightLogicalNarrowingLower(Vector128<ulong> value, byte count)

See Microsoft docs here, ARM docs here.

Assembly generated:

; Assembly listing for method AdvSimdMethods:ShiftRightLogicalNarrowingLowerTest(System.Runtime.Intrinsics.Vector128`1[UInt16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
;  V00 arg0         [V00,T00] (  3,  3   )  simd16  ->   d0         HFA(simd16) 
;* V01 arg1         [V01    ] (  0,  0   )   ubyte  ->  zero-ref   
;# V02 OutArgs      [V02    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
; Lcl frame size = 0
            stp     fp, lr, [sp,#-16]!
            mov     fp, sp
            shrn    v16.8b, v0.8h, #1
            mov     v0.8b, v16.8b
            ldp     fp, lr, [sp],#16
            ret     lr

; Total bytes of code 24, prolog size 8

Introduction

APIs covered

1. ShiftArithmeticScalar

2. ShiftLeftAndInsert

3. ShiftLeftAndInsertScalar

4. ShiftLeftLogical

5. ShiftLeftLogicalSaturate

6. ShiftLeftLogicalSaturateScalar

7. ShiftLeftLogicalSaturateUnsigned

8. ShiftLeftLogicalSaturateUnsignedScalar

9. ShiftLeftLogicalScalar

10. ShiftLeftLogicalWideningLower

11. ShiftLeftLogicalWideningUpper

12. ShiftLogical

13. ShiftLogicalRounded

14. ShiftLogicalRoundedSaturate

15. ShiftLogicalRoundedSaturateScalar

16. ShiftLogicalRoundedScalar

17. ShiftLogicalSaturate

18. ShiftLogicalSaturateScalar

19. ShiftLogicalScalar

20. ShiftRightAndInsert

21. ShiftRightAndInsertScalar

22. ShiftRightArithmetic

23. ShiftRightArithmeticAdd

24. ShiftRightArithmeticAddScalar

25. ShiftRightArithmeticNarrowingSaturateLower

26. ShiftRightArithmeticNarrowingSaturateScalar

27. ShiftRightArithmeticNarrowingSaturateUnsignedLower

28. ShiftRightArithmeticNarrowingSaturateUnsignedScalar

29. ShiftRightArithmeticNarrowingSaturateUnsignedUpper

30. ShiftRightArithmeticNarrowingSaturateUpper

31. ShiftRightArithmeticRounded

32. ShiftRightArithmeticRoundedAdd

33. ShiftRightArithmeticRoundedAddScalar

34. ShiftRightArithmeticRoundedNarrowingSaturateLower

35. ShiftRightArithmeticRoundedNarrowingSaturateScalar

36. ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower

37. ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar

38. ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper

39. ShiftRightArithmeticRoundedNarrowingSaturateUpper

40. ShiftRightArithmeticRoundedScalar

41. ShiftRightArithmeticScalar

42. ShiftRightLogical

43. ShiftRightLogicalAdd

44. ShiftRightLogicalAddScalar

45. ShiftRightLogicalNarrowingLower

Leave a Comment