Introduction
In my blog post on vectorization using .NET APIs, I described the SIMD datatypes Vector64<T> and Vector128<T>, which are operated on by the ‘Arm64 hardware intrinsic’ APIs present under the System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 classes. In this post I will describe those hardware intrinsic APIs by showing sample usage along with example inputs, results and the generated Arm64 code. This should help readers understand these APIs so they can use them to optimize .NET code that targets Arm64. Since there are 360 APIs, describing all of them in a single post would be overwhelming, so I have divided them among 8 blogs and will demonstrate 45 APIs in each blog. This is part 7 of that blog series. You can check out my previous blogs at:
Most of the descriptions of these APIs are adapted from the Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile document. You can also refer to the descriptions of the SIMD and floating-point instructions on the Arm developer docs page.
The blog page is programmatically generated and might contain mistakes. If you find any mistake, please leave a comment and I will address it.
APIs covered
1. ShiftArithmeticScalar
Vector64<long> ShiftArithmeticScalar(Vector64<long> value, Vector64<long> count)
This method performs an arithmetic shift of each signed integer value in the value vector by the value of the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift.
private Vector64<long> ShiftArithmeticScalarTest(Vector64<long> value, Vector64<long> count)
{
return AdvSimd.ShiftArithmeticScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>
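The way the sign of count selects the shift direction can be modelled in plain C#. This is a minimal sketch, assuming the semantics described above; `SshlModel` and `Shift` are hypothetical names, and the code does not call the intrinsic, so it runs on any platform.

```csharp
using System;

static class SshlModel
{
    // Scalar model of a signed 64-bit shift by a signed count (assumed semantics).
    // Like the instruction, only the low byte of count is used as the shift amount.
    public static long Shift(long value, long count)
    {
        int shift = unchecked((sbyte)count);
        if (shift >= 64) return 0;                      // every bit shifted out
        if (shift >= 0) return value << shift;          // positive: left shift
        if (shift <= -64) return value < 0 ? -1L : 0L;  // only copies of the sign bit remain
        return value >> -shift;                         // negative: truncating right shift
    }

    static void Main()
    {
        Console.WriteLine(Shift(11, 11));   // 22528
        Console.WriteLine(Shift(-16, -2));  // -4
    }
}
```

The first call reproduces the example above; the second shows that a negative count shifts right while preserving the sign.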
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftArithmeticScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sshl d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
2. ShiftLeftAndInsert
Vector64<byte> ShiftLeftAndInsert(Vector64<byte> left, Vector64<byte> right, byte shift)
This method left shifts each vector element in the right vector by the shift value and inserts the result into the corresponding vector element in the left vector, such that the low-order bits vacated by the shift are not filled with zeros but retain their existing values from the left vector. Bits shifted out of the left of each vector element in the right vector are lost.
private Vector64<byte> ShiftLeftAndInsertTest(Vector64<byte> left, Vector64<byte> right, byte shift)
{
return AdvSimd.ShiftLeftAndInsert(left, right, 1);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <1, 2, 3, 4, 5, 6, 7, 8>
// shift = 1
// Result = <3, 4, 7, 8, 11, 12, 15, 16>
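The insert behaviour is easier to see as a mask operation. Below is a minimal scalar sketch, assuming the semantics described above; `SliModel` and its method are hypothetical names used only for illustration, and no intrinsic is called.

```csharp
using System;

static class SliModel
{
    // Scalar model of shift-left-and-insert on 8-bit lanes:
    // the low `shift` bits come from left, the remaining bits from right << shift.
    public static byte[] ShiftLeftAndInsert(byte[] left, byte[] right, int shift)
    {
        int keepMask = (1 << shift) - 1;   // bits of left that survive
        var result = new byte[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            // The cast drops the bits shifted out of the top of the lane.
            result[i] = unchecked((byte)((right[i] << shift) | (left[i] & keepMask)));
        }
        return result;
    }

    static void Main()
    {
        byte[] left = { 11, 12, 13, 14, 15, 16, 17, 18 };
        byte[] right = { 1, 2, 3, 4, 5, 6, 7, 8 };
        Console.WriteLine(string.Join(", ", ShiftLeftAndInsert(left, right, 1)));
        // 3, 4, 7, 8, 11, 12, 15, 16
    }
}
```

With a shift of 1, only bit 0 of each left element is kept, which is why odd left values produce odd results.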
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftLeftAndInsert(Vector64<short> left, Vector64<short> right, byte shift)
Vector64<int> ShiftLeftAndInsert(Vector64<int> left, Vector64<int> right, byte shift)
Vector64<sbyte> ShiftLeftAndInsert(Vector64<sbyte> left, Vector64<sbyte> right, byte shift)
Vector64<ushort> ShiftLeftAndInsert(Vector64<ushort> left, Vector64<ushort> right, byte shift)
Vector64<uint> ShiftLeftAndInsert(Vector64<uint> left, Vector64<uint> right, byte shift)
Vector128<byte> ShiftLeftAndInsert(Vector128<byte> left, Vector128<byte> right, byte shift)
Vector128<short> ShiftLeftAndInsert(Vector128<short> left, Vector128<short> right, byte shift)
Vector128<int> ShiftLeftAndInsert(Vector128<int> left, Vector128<int> right, byte shift)
Vector128<long> ShiftLeftAndInsert(Vector128<long> left, Vector128<long> right, byte shift)
Vector128<sbyte> ShiftLeftAndInsert(Vector128<sbyte> left, Vector128<sbyte> right, byte shift)
Vector128<ushort> ShiftLeftAndInsert(Vector128<ushort> left, Vector128<ushort> right, byte shift)
Vector128<uint> ShiftLeftAndInsert(Vector128<uint> left, Vector128<uint> right, byte shift)
Vector128<ulong> ShiftLeftAndInsert(Vector128<ulong> left, Vector128<ulong> right, byte shift)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftAndInsertTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sli v0.8b, v1.8b, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
3. ShiftLeftAndInsertScalar
Vector64<long> ShiftLeftAndInsertScalar(Vector64<long> left, Vector64<long> right, byte shift)
This method left shifts the vector element in the right vector by the shift value and inserts the result into the corresponding vector element in the left vector, such that the low-order bits vacated by the shift are not filled with zeros but retain their existing values from the left vector. Bits shifted out of the left of the vector element in the right vector are lost.
private Vector64<long> ShiftLeftAndInsertScalarTest(Vector64<long> left, Vector64<long> right, byte shift)
{
return AdvSimd.ShiftLeftAndInsertScalar(left, right, 1);
}
// left = <50000>
// right = <60000>
// shift = 1
// Result = <120000>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ulong> ShiftLeftAndInsertScalar(Vector64<ulong> left, Vector64<ulong> right, byte shift)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftAndInsertScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sli d0, d1, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
4. ShiftLeftLogical
Vector64<byte> ShiftLeftLogical(Vector64<byte> value, byte count)
This method left shifts each element in the value vector by count, stores the results in a vector and returns the result vector.
private Vector64<byte> ShiftLeftLogicalTest(Vector64<byte> value, byte count)
{
return AdvSimd.ShiftLeftLogical(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <22, 24, 26, 28, 30, 32, 34, 36>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftLeftLogical(Vector64<short> value, byte count)
Vector64<int> ShiftLeftLogical(Vector64<int> value, byte count)
Vector64<sbyte> ShiftLeftLogical(Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftLeftLogical(Vector64<ushort> value, byte count)
Vector64<uint> ShiftLeftLogical(Vector64<uint> value, byte count)
Vector128<byte> ShiftLeftLogical(Vector128<byte> value, byte count)
Vector128<short> ShiftLeftLogical(Vector128<short> value, byte count)
Vector128<int> ShiftLeftLogical(Vector128<int> value, byte count)
Vector128<long> ShiftLeftLogical(Vector128<long> value, byte count)
Vector128<sbyte> ShiftLeftLogical(Vector128<sbyte> value, byte count)
Vector128<ushort> ShiftLeftLogical(Vector128<ushort> value, byte count)
Vector128<uint> ShiftLeftLogical(Vector128<uint> value, byte count)
Vector128<ulong> ShiftLeftLogical(Vector128<ulong> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
shl v16.8b, v0.8b, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
5. ShiftLeftLogicalSaturate
Vector64<byte> ShiftLeftLogicalSaturate(Vector64<byte> value, byte count)
This method left shifts each element in the value vector by count, stores the results in a vector and returns the result vector. The results are truncated. If overflow occurs with any of the results, those results are saturated.
private Vector64<byte> ShiftLeftLogicalSaturateTest(Vector64<byte> value, byte count)
{
return AdvSimd.ShiftLeftLogicalSaturate(value, 6);
}
// value = <11, 112, 13, 14, 15, 16, 17, 18>
// count = 6
// Result = <255, 255, 255, 255, 255, 255, 255, 255>
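Saturation is what separates this API from ShiftLeftLogical: 11 << 6 is 704, which does not fit in a byte, so the lane clamps to 255 where a plain shift would wrap to 192. The sketch below models one 8-bit lane in plain C#; `UqshlModel` is a hypothetical name and the code assumes the unsigned saturating semantics of the `uqshl` instruction shown in the listing.

```csharp
using System;

static class UqshlModel
{
    // Scalar model of an unsigned saturating left shift on one 8-bit lane.
    public static byte ShiftLeftSaturate(byte value, int count)
    {
        int wide = value << count;   // computed in 32 bits, so nothing is lost for count <= 7
        return wide > byte.MaxValue ? byte.MaxValue : (byte)wide;
    }

    static void Main()
    {
        Console.WriteLine(ShiftLeftSaturate(11, 6));     // 255 (704 does not fit)
        Console.WriteLine(ShiftLeftSaturate(3, 6));      // 192 (fits, so unchanged)
        Console.WriteLine(unchecked((byte)(11 << 6)));   // 192 (a plain shift wraps)
    }
}
```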
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftLeftLogicalSaturate(Vector64<short> value, byte count)
Vector64<int> ShiftLeftLogicalSaturate(Vector64<int> value, byte count)
Vector64<sbyte> ShiftLeftLogicalSaturate(Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftLeftLogicalSaturate(Vector64<ushort> value, byte count)
Vector64<uint> ShiftLeftLogicalSaturate(Vector64<uint> value, byte count)
Vector128<byte> ShiftLeftLogicalSaturate(Vector128<byte> value, byte count)
Vector128<short> ShiftLeftLogicalSaturate(Vector128<short> value, byte count)
Vector128<int> ShiftLeftLogicalSaturate(Vector128<int> value, byte count)
Vector128<long> ShiftLeftLogicalSaturate(Vector128<long> value, byte count)
Vector128<sbyte> ShiftLeftLogicalSaturate(Vector128<sbyte> value, byte count)
Vector128<ushort> ShiftLeftLogicalSaturate(Vector128<ushort> value, byte count)
Vector128<uint> ShiftLeftLogicalSaturate(Vector128<uint> value, byte count)
Vector128<ulong> ShiftLeftLogicalSaturate(Vector128<ulong> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalSaturateTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uqshl v16.8b, v0.8b, #6
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
6. ShiftLeftLogicalSaturateScalar
Vector64<long> ShiftLeftLogicalSaturateScalar(Vector64<long> value, byte count)
This method left shifts each element in the value vector by count, stores the results in a vector and returns the result vector. The results are truncated. If overflow occurs with any of the results, those results are saturated.
private Vector64<long> ShiftLeftLogicalSaturateScalarTest(Vector64<long> value, byte count)
{
return AdvSimd.ShiftLeftLogicalSaturateScalar(value, 0);
}
// value = <11>
// count = 0
// Result = <11>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ulong> ShiftLeftLogicalSaturateScalar(Vector64<ulong> value, byte count)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<byte> ShiftLeftLogicalSaturateScalar(Vector64<byte> value, byte count)
Vector64<short> ShiftLeftLogicalSaturateScalar(Vector64<short> value, byte count)
Vector64<int> ShiftLeftLogicalSaturateScalar(Vector64<int> value, byte count)
Vector64<sbyte> ShiftLeftLogicalSaturateScalar(Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftLeftLogicalSaturateScalar(Vector64<ushort> value, byte count)
Vector64<uint> ShiftLeftLogicalSaturateScalar(Vector64<uint> value, byte count)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshl d16, d0, #0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
7. ShiftLeftLogicalSaturateUnsigned
Vector64<ushort> ShiftLeftLogicalSaturateUnsigned(Vector64<short> value, byte count)
This method left shifts each signed integer value in the value vector by count, saturates the shifted result to an unsigned integer value, stores the results in a vector and returns the result vector. The results are truncated.
private Vector64<ushort> ShiftLeftLogicalSaturateUnsignedTest(Vector64<short> value, byte count)
{
return AdvSimd.ShiftLeftLogicalSaturateUnsigned(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <22, 24, 26, 28>
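The example above never leaves the unsigned range, so the saturation is not visible in it. Here is a minimal scalar sketch of one 16-bit lane, assuming the signed-to-unsigned saturating behaviour described above; `SqshluModel` is a hypothetical name and the extra inputs are my own illustrations.

```csharp
using System;

static class SqshluModel
{
    // Scalar model: left shift a signed 16-bit lane and saturate
    // the result to the unsigned 16-bit range [0, 65535].
    public static ushort ShiftLeftSaturateUnsigned(short value, int count)
    {
        long wide = (long)value << count;                     // wide enough for count <= 15
        if (wide < 0) return 0;                               // negative inputs clamp to 0
        if (wide > ushort.MaxValue) return ushort.MaxValue;   // overflow clamps to the maximum
        return (ushort)wide;
    }

    static void Main()
    {
        Console.WriteLine(ShiftLeftSaturateUnsigned(11, 1));      // 22
        Console.WriteLine(ShiftLeftSaturateUnsigned(-5, 1));      // 0
        Console.WriteLine(ShiftLeftSaturateUnsigned(30000, 2));   // 65535
    }
}
```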
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<uint> ShiftLeftLogicalSaturateUnsigned(Vector64<int> value, byte count)
Vector64<byte> ShiftLeftLogicalSaturateUnsigned(Vector64<sbyte> value, byte count)
Vector128<ushort> ShiftLeftLogicalSaturateUnsigned(Vector128<short> value, byte count)
Vector128<uint> ShiftLeftLogicalSaturateUnsigned(Vector128<int> value, byte count)
Vector128<ulong> ShiftLeftLogicalSaturateUnsigned(Vector128<long> value, byte count)
Vector128<byte> ShiftLeftLogicalSaturateUnsigned(Vector128<sbyte> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalSaturateUnsignedTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshlu v16.4h, v0.4h, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
8. ShiftLeftLogicalSaturateUnsignedScalar
Vector64<ulong> ShiftLeftLogicalSaturateUnsignedScalar(Vector64<long> value, byte count)
This method left shifts the signed integer value in the value vector by count, saturates the shifted result to an unsigned integer value, stores the result in a vector and returns the result vector. The results are truncated.
private Vector64<ulong> ShiftLeftLogicalSaturateUnsignedScalarTest(Vector64<long> value, byte count)
{
return AdvSimd.ShiftLeftLogicalSaturateUnsignedScalar(value, 0);
}
// value = <11>
// count = 0
// Result = <11>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<ushort> ShiftLeftLogicalSaturateUnsignedScalar(Vector64<short> value, byte count)
Vector64<uint> ShiftLeftLogicalSaturateUnsignedScalar(Vector64<int> value, byte count)
Vector64<byte> ShiftLeftLogicalSaturateUnsignedScalar(Vector64<sbyte> value, byte count)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalSaturateUnsignedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[UInt64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshlu d16, d0, #0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
9. ShiftLeftLogicalScalar
Vector64<long> ShiftLeftLogicalScalar(Vector64<long> value, byte count)
This method left shifts each element in the value vector by count, stores the results in a vector and returns the result vector.
private Vector64<long> ShiftLeftLogicalScalarTest(Vector64<long> value, byte count)
{
return AdvSimd.ShiftLeftLogicalScalar(value, 1);
}
// value = <971324>
// count = 1
// Result = <1942648>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ulong> ShiftLeftLogicalScalar(Vector64<ulong> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
shl d16, d0, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
10. ShiftLeftLogicalWideningLower
Vector128<ushort> ShiftLeftLogicalWideningLower(Vector64<byte> value, byte count)
This method left shifts each vector element in the value vector by the number of bits specified in count, stores the results in a vector and returns the result vector. As seen in the example below, the result vector’s element type, ushort, is twice as wide as the input vector’s element type, byte.
private Vector128<ushort> ShiftLeftLogicalWideningLowerTest(Vector64<byte> value, byte count)
{
return AdvSimd.ShiftLeftLogicalWideningLower(value, 0);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 0
// Result = <11, 12, 13, 14, 15, 16, 17, 18>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> ShiftLeftLogicalWideningLower(Vector64<short> value, byte count)
Vector128<long> ShiftLeftLogicalWideningLower(Vector64<int> value, byte count)
Vector128<short> ShiftLeftLogicalWideningLower(Vector64<sbyte> value, byte count)
Vector128<uint> ShiftLeftLogicalWideningLower(Vector64<ushort> value, byte count)
Vector128<ulong> ShiftLeftLogicalWideningLower(Vector64<uint> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ushll v16.8h, v0.8b, #0
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
11. ShiftLeftLogicalWideningUpper
Vector128<ushort> ShiftLeftLogicalWideningUpper(Vector128<byte> value, byte count)
This method left shifts each vector element in the upper half of the value vector by the number of bits specified in count, stores the results in a vector and returns the result vector. As seen in the example below, the result vector’s element type, ushort, is twice as wide as the input vector’s element type, byte.
private Vector128<ushort> ShiftLeftLogicalWideningUpperTest(Vector128<byte> value, byte count)
{
return AdvSimd.ShiftLeftLogicalWideningUpper(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// count = 1
// Result = <38, 40, 42, 44, 46, 48, 50, 52>
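The "upper half, then widen, then shift" order can be spelled out in plain C#. This is a minimal sketch under the semantics described above; `Ushll2Model` is a hypothetical name and the code models the operation with arrays rather than calling the intrinsic.

```csharp
using System;
using System.Linq;

static class Ushll2Model
{
    // Scalar model: take the upper half of the input bytes, widen each one
    // to 16 bits, then shift the widened lane left by count (0..7).
    public static ushort[] ShiftLeftWideningUpper(byte[] value, int count)
    {
        return value.Skip(value.Length / 2)
                    .Select(b => (ushort)(b << count))
                    .ToArray();
    }

    static void Main()
    {
        byte[] value = Enumerable.Range(11, 16).Select(i => (byte)i).ToArray();   // 11..26
        Console.WriteLine(string.Join(", ", ShiftLeftWideningUpper(value, 1)));
        // 38, 40, 42, 44, 46, 48, 50, 52
    }
}
```

Because each lane is widened before the shift, no bits are lost: even 255 << 7 fits in a ushort.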
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> ShiftLeftLogicalWideningUpper(Vector128<short> value, byte count)
Vector128<long> ShiftLeftLogicalWideningUpper(Vector128<int> value, byte count)
Vector128<short> ShiftLeftLogicalWideningUpper(Vector128<sbyte> value, byte count)
Vector128<uint> ShiftLeftLogicalWideningUpper(Vector128<ushort> value, byte count)
Vector128<ulong> ShiftLeftLogicalWideningUpper(Vector128<uint> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLeftLogicalWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],ubyte):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ushll2 v16.8h, v0.16b, #1
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
12. ShiftLogical
Vector64<byte> ShiftLogical(Vector64<byte> value, Vector64<sbyte> count)
This method shifts each element in the value vector by the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift.
private Vector64<byte> ShiftLogicalTest(Vector64<byte> value, Vector64<sbyte> count)
{
return AdvSimd.ShiftLogical(value, count);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = <-3, 2, 3, 5, 6, 7, -7, 0>
// Result = <1, 48, 104, 192, 192, 0, 0, 18>
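The per-element results above can be reproduced with a small scalar model. This is a sketch under the semantics described above, not the runtime's implementation; `UshlModel` and `Shift` are hypothetical names.

```csharp
using System;
using System.Linq;

static class UshlModel
{
    // Scalar model of an unsigned shift of one 8-bit lane by a signed count:
    // positive counts shift left (wrapping), negative counts shift right (truncating).
    public static byte Shift(byte value, sbyte count)
    {
        if (count >= 8 || count <= -8) return 0;   // every bit shifted out
        return count >= 0
            ? unchecked((byte)(value << count))
            : (byte)(value >> -count);
    }

    static void Main()
    {
        byte[] value = { 11, 12, 13, 14, 15, 16, 17, 18 };
        sbyte[] count = { -3, 2, 3, 5, 6, 7, -7, 0 };
        var result = value.Zip(count, (v, c) => Shift(v, c));
        Console.WriteLine(string.Join(", ", result));
        // 1, 48, 104, 192, 192, 0, 0, 18
    }
}
```

Note that 14 << 5 is 448, which wraps to 192 in an 8-bit lane; this API does not saturate.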
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftLogical(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogical(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogical(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogical(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogical(Vector64<uint> value, Vector64<int> count)
Vector128<byte> ShiftLogical(Vector128<byte> value, Vector128<sbyte> count)
Vector128<short> ShiftLogical(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftLogical(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftLogical(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftLogical(Vector128<sbyte> value, Vector128<sbyte> count)
Vector128<ushort> ShiftLogical(Vector128<ushort> value, Vector128<short> count)
Vector128<uint> ShiftLogical(Vector128<uint> value, Vector128<int> count)
Vector128<ulong> ShiftLogical(Vector128<ulong> value, Vector128<long> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLogicalTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ushl v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
13. ShiftLogicalRounded
Vector64<byte> ShiftLogicalRounded(Vector64<byte> value, Vector64<sbyte> count)
This method shifts each element in the value vector by the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.
private Vector64<byte> ShiftLogicalRoundedTest(Vector64<byte> value, Vector64<sbyte> count)
{
return AdvSimd.ShiftLogicalRounded(value, count);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = <-3, 2, 3, 5, 6, 7, -7, 0>
// Result = <1, 48, 104, 192, 192, 0, 0, 18>
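The example inputs happen to give the same results as the truncating ShiftLogical, so the rounding does not show up in them. The sketch below models only the rounding right shift for one 8-bit lane, assuming the usual "add half, then shift" rule; `UrshlModel` is a hypothetical name and the input 12 is my own illustration.

```csharp
using System;

static class UrshlModel
{
    // Scalar model of a rounding right shift on one 8-bit lane:
    // add half of the divisor before shifting, so ties round up.
    public static byte ShiftRightRounded(byte value, int n)   // n in 1..8
    {
        return (byte)((value + (1 << (n - 1))) >> n);
    }

    static void Main()
    {
        Console.WriteLine(12 >> 3);                     // 1 (truncating: 1.5 rounds down)
        Console.WriteLine(ShiftRightRounded(12, 3));    // 2 (rounding: 1.5 rounds up)
        Console.WriteLine(ShiftRightRounded(11, 3));    // 1 (1.375 rounds to 1)
        Console.WriteLine(ShiftRightRounded(17, 7));    // 0 (about 0.13 rounds to 0)
    }
}
```

The last two calls correspond to the elements with counts -3 and -7 in the example above.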
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftLogicalRounded(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalRounded(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalRounded(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalRounded(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalRounded(Vector64<uint> value, Vector64<int> count)
Vector128<byte> ShiftLogicalRounded(Vector128<byte> value, Vector128<sbyte> count)
Vector128<short> ShiftLogicalRounded(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftLogicalRounded(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftLogicalRounded(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftLogicalRounded(Vector128<sbyte> value, Vector128<sbyte> count)
Vector128<ushort> ShiftLogicalRounded(Vector128<ushort> value, Vector128<short> count)
Vector128<uint> ShiftLogicalRounded(Vector128<uint> value, Vector128<int> count)
Vector128<ulong> ShiftLogicalRounded(Vector128<ulong> value, Vector128<long> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLogicalRoundedTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
urshl v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
14. ShiftLogicalRoundedSaturate
Vector64<byte> ShiftLogicalRoundedSaturate(Vector64<byte> value, Vector64<sbyte> count)
This method shifts each vector element of the value vector by the corresponding vector element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. If overflow occurs with any of the results, those results are saturated.
private Vector64<byte> ShiftLogicalRoundedSaturateTest(Vector64<byte> value, Vector64<sbyte> count)
{
return AdvSimd.ShiftLogicalRoundedSaturate(value, count);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <255, 255, 255, 255, 255, 255, 255, 255>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftLogicalRoundedSaturate(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalRoundedSaturate(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalRoundedSaturate(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalRoundedSaturate(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalRoundedSaturate(Vector64<uint> value, Vector64<int> count)
Vector128<byte> ShiftLogicalRoundedSaturate(Vector128<byte> value, Vector128<sbyte> count)
Vector128<short> ShiftLogicalRoundedSaturate(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftLogicalRoundedSaturate(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftLogicalRoundedSaturate(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftLogicalRoundedSaturate(Vector128<sbyte> value, Vector128<sbyte> count)
Vector128<ushort> ShiftLogicalRoundedSaturate(Vector128<ushort> value, Vector128<short> count)
Vector128<uint> ShiftLogicalRoundedSaturate(Vector128<uint> value, Vector128<int> count)
Vector128<ulong> ShiftLogicalRoundedSaturate(Vector128<ulong> value, Vector128<long> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLogicalRoundedSaturateTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uqrshl v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
15. ShiftLogicalRoundedSaturateScalar
Vector64<long> ShiftLogicalRoundedSaturateScalar(Vector64<long> value, Vector64<long> count)
This method shifts the vector element of the value vector by the corresponding vector element of the count vector, stores the result in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded. If overflow occurs with any of the results, those results are saturated.
private Vector64<long> ShiftLogicalRoundedSaturateScalarTest(Vector64<long> value, Vector64<long> count)
{
return AdvSimd.ShiftLogicalRoundedSaturateScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ulong> ShiftLogicalRoundedSaturateScalar(Vector64<ulong> value, Vector64<long> count)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<byte> ShiftLogicalRoundedSaturateScalar(Vector64<byte> value, Vector64<sbyte> count)
Vector64<short> ShiftLogicalRoundedSaturateScalar(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalRoundedSaturateScalar(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalRoundedSaturateScalar(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalRoundedSaturateScalar(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalRoundedSaturateScalar(Vector64<uint> value, Vector64<int> count)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLogicalRoundedSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uqrshl d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
16. ShiftLogicalRoundedScalar
Vector64<long> ShiftLogicalRoundedScalar(Vector64<long> value, Vector64<long> count)
This method shifts the element in the value vector by the corresponding element of the count vector, stores the result in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.
private Vector64<long> ShiftLogicalRoundedScalarTest(Vector64<long> value, Vector64<long> count)
{
return AdvSimd.ShiftLogicalRoundedScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>
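For 64-bit lanes, the "add half, then shift" rule cannot be applied directly, because the addition can overflow 64 bits. One way to model the rounding without a wider type is to add back the last bit shifted out. This is a sketch of that idea, assuming the unsigned interpretation used by the `urshl` instruction in the listing; `UrshlScalarModel` is a hypothetical name.

```csharp
using System;

static class UrshlScalarModel
{
    // Scalar model of an unsigned 64-bit rounding shift by a signed count.
    // Only the low byte of count is used as the shift amount.
    public static ulong Shift(ulong value, long count)
    {
        int shift = unchecked((sbyte)count);
        if (shift >= 64) return 0;               // every bit shifted out
        if (shift >= 0) return value << shift;   // positive: left shift
        int n = -shift;                          // right shift amount, 1..128
        if (n > 64) return 0;
        if (n == 64) return value >> 63;         // only the rounding bit survives
        // Truncated quotient plus the last bit shifted out (the rounding bit).
        return (value >> n) + ((value >> (n - 1)) & 1UL);
    }

    static void Main()
    {
        Console.WriteLine(Shift(11, 11));               // 22528
        Console.WriteLine(Shift(12, -3));               // 2
        Console.WriteLine(Shift(ulong.MaxValue, -1));   // 9223372036854775808
    }
}
```

The last call is the case where adding half first would overflow; the bit-based form still gives the rounded result.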
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ulong> ShiftLogicalRoundedScalar(Vector64<ulong> value, Vector64<long> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLogicalRoundedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
urshl d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
17. ShiftLogicalSaturate
Vector64<byte> ShiftLogicalSaturate(Vector64<byte> value, Vector64<sbyte> count)
This method shifts each element in the value vector by the corresponding element of the count vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated. If overflow occurs with any of the results, those results are saturated.
private Vector64<byte> ShiftLogicalSaturateTest(Vector64<byte> value, Vector64<sbyte> count)
{
return AdvSimd.ShiftLogicalSaturate(value, count);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = <-3, 2, 3, 5, 6, 7, -8, 0>
// Result = <1, 48, 104, 255, 255, 255, 0, 18>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftLogicalSaturate(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalSaturate(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalSaturate(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalSaturate(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalSaturate(Vector64<uint> value, Vector64<int> count)
Vector128<byte> ShiftLogicalSaturate(Vector128<byte> value, Vector128<sbyte> count)
Vector128<short> ShiftLogicalSaturate(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftLogicalSaturate(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftLogicalSaturate(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftLogicalSaturate(Vector128<sbyte> value, Vector128<sbyte> count)
Vector128<ushort> ShiftLogicalSaturate(Vector128<ushort> value, Vector128<short> count)
Vector128<uint> ShiftLogicalSaturate(Vector128<uint> value, Vector128<int> count)
Vector128<ulong> ShiftLogicalSaturate(Vector128<ulong> value, Vector128<long> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLogicalSaturateTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[SByte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uqshl v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
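The per-element behaviour of ShiftLogicalSaturate on bytes can be sketched in plain C# as follows. ShiftLogicalSaturateModel is a hypothetical helper written for illustration, not part of the AdvSimd class; it reproduces the example values shown above.

```csharp
using System;

// Hypothetical scalar model of a saturating logical shift on one byte element.
static byte ShiftLogicalSaturateModel(byte value, sbyte count)
{
    if (count < 0)
    {
        // Negative count: truncating right shift; shifting by 8 or more leaves 0.
        int n = -count;
        return n >= 8 ? (byte)0 : (byte)(value >> n);
    }

    if (count >= 8)
    {
        // Every set bit would be shifted out, so any non-zero value saturates.
        return value == 0 ? (byte)0 : byte.MaxValue;
    }

    int wide = value << count; // computed in int, so nothing is lost yet
    return wide > byte.MaxValue ? byte.MaxValue : (byte)wide;
}

byte[] values = { 11, 12, 13, 14, 15, 16, 17, 18 };
sbyte[] counts = { -3, 2, 3, 5, 6, 7, -8, 0 };
byte[] results = new byte[values.Length];
for (int i = 0; i < values.Length; i++)
{
    results[i] = ShiftLogicalSaturateModel(values[i], counts[i]);
}

Console.WriteLine(string.Join(", ", results)); // 1, 48, 104, 255, 255, 255, 0, 18
```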
18. ShiftLogicalSaturateScalar
Vector64<long> ShiftLogicalSaturateScalar(Vector64<long> value, Vector64<long> count)
This method shifts the 0th element of the value vector by the 0th element of the count vector, stores the result in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The result is truncated. If overflow occurs, the result is saturated.
private Vector64<long> ShiftLogicalSaturateScalarTest(Vector64<long> value, Vector64<long> count)
{
return AdvSimd.ShiftLogicalSaturateScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ulong> ShiftLogicalSaturateScalar(Vector64<ulong> value, Vector64<long> count)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<byte> ShiftLogicalSaturateScalar(Vector64<byte> value, Vector64<sbyte> count)
Vector64<short> ShiftLogicalSaturateScalar(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftLogicalSaturateScalar(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftLogicalSaturateScalar(Vector64<sbyte> value, Vector64<sbyte> count)
Vector64<ushort> ShiftLogicalSaturateScalar(Vector64<ushort> value, Vector64<short> count)
Vector64<uint> ShiftLogicalSaturateScalar(Vector64<uint> value, Vector64<int> count)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLogicalSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uqshl d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
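For 64-bit elements there is no wider integer type to shift into, so overflow has to be detected before shifting. The sketch below shows one way to model that for the ulong overload; ShiftLeftLogicalSaturateModel is a hypothetical helper, and it only covers left shifts with 0 <= count <= 63.

```csharp
using System;

// Hypothetical scalar model of a saturating left shift on one 64-bit element.
// Assumption: 0 <= count <= 63.
static ulong ShiftLeftLogicalSaturateModel(ulong value, int count)
{
    // If value has any bit set above position (63 - count), that bit would be
    // shifted out, which means the exact result does not fit in 64 bits.
    bool overflows = value > (ulong.MaxValue >> count);
    return overflows ? ulong.MaxValue : value << count;
}

Console.WriteLine(ShiftLeftLogicalSaturateModel(11, 11));       // 22528
Console.WriteLine(ShiftLeftLogicalSaturateModel(1UL << 60, 3)); // 9223372036854775808 (still fits)
Console.WriteLine(ShiftLeftLogicalSaturateModel(1UL << 60, 4)); // 18446744073709551615 (saturated)
```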
19. ShiftLogicalScalar
Vector64<long> ShiftLogicalScalar(Vector64<long> value, Vector64<long> count)
This method shifts the element in the value vector by the corresponding element of the count vector, stores the result in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift.
private Vector64<long> ShiftLogicalScalarTest(Vector64<long> value, Vector64<long> count)
{
return AdvSimd.ShiftLogicalScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ulong> ShiftLogicalScalar(Vector64<ulong> value, Vector64<long> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftLogicalScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ushl d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
20. ShiftRightAndInsert
Vector64<byte> ShiftRightAndInsert(Vector64<byte> left, Vector64<byte> right, byte shift)
This method right shifts each vector element in the right vector by the shift value and inserts the result into the corresponding vector element of the left vector. The zero bits created at the top of each element by the shift are not inserted; those bit positions retain their existing values from the left vector. Bits shifted out of the right of each vector element in the right vector are lost.
private Vector64<byte> ShiftRightAndInsertTest(Vector64<byte> left, Vector64<byte> right, byte shift)
{
return AdvSimd.ShiftRightAndInsert(left, right, 1);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// shift = 1
// Result = <10, 11, 11, 12, 12, 13, 13, 14>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftRightAndInsert(Vector64<short> left, Vector64<short> right, byte shift)
Vector64<int> ShiftRightAndInsert(Vector64<int> left, Vector64<int> right, byte shift)
Vector64<sbyte> ShiftRightAndInsert(Vector64<sbyte> left, Vector64<sbyte> right, byte shift)
Vector64<ushort> ShiftRightAndInsert(Vector64<ushort> left, Vector64<ushort> right, byte shift)
Vector64<uint> ShiftRightAndInsert(Vector64<uint> left, Vector64<uint> right, byte shift)
Vector128<byte> ShiftRightAndInsert(Vector128<byte> left, Vector128<byte> right, byte shift)
Vector128<short> ShiftRightAndInsert(Vector128<short> left, Vector128<short> right, byte shift)
Vector128<int> ShiftRightAndInsert(Vector128<int> left, Vector128<int> right, byte shift)
Vector128<long> ShiftRightAndInsert(Vector128<long> left, Vector128<long> right, byte shift)
Vector128<sbyte> ShiftRightAndInsert(Vector128<sbyte> left, Vector128<sbyte> right, byte shift)
Vector128<ushort> ShiftRightAndInsert(Vector128<ushort> left, Vector128<ushort> right, byte shift)
Vector128<uint> ShiftRightAndInsert(Vector128<uint> left, Vector128<uint> right, byte shift)
Vector128<ulong> ShiftRightAndInsert(Vector128<ulong> left, Vector128<ulong> right, byte shift)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightAndInsertTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sri v0.8b, v1.8b, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
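The "insert" part of ShiftRightAndInsert is easier to see as a bit mask. The following is a minimal scalar sketch for one byte element; ShiftRightAndInsertModel is a hypothetical helper (not a .NET API) and it assumes 1 <= shift <= 8, the immediate range of the byte form.

```csharp
using System;

// Hypothetical scalar model of shift-right-and-insert on one byte element.
// Assumption: 1 <= shift <= 8.
static byte ShiftRightAndInsertModel(byte left, byte right, int shift)
{
    int inserted = right >> shift; // low (8 - shift) bits come from right
    int mask = 0xFF >> shift;      // bit positions that receive those bits
    // The top 'shift' bits are outside the mask and keep their value from left.
    return (byte)((left & ~mask) | inserted);
}

Console.WriteLine(ShiftRightAndInsertModel(11, 21, 1));     // 10, first element of the example above
Console.WriteLine(ShiftRightAndInsertModel(0xA0, 0x3C, 4)); // 163 (0xA3): high nibble from left, low nibble from right
Console.WriteLine(ShiftRightAndInsertModel(0xA0, 0x3C, 8)); // 160: nothing is inserted, left is unchanged
```

The example values in this section all have a zero top bit in left, which is why the results look like a plain right shift of right.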
21. ShiftRightAndInsertScalar
Vector64<long> ShiftRightAndInsertScalar(Vector64<long> left, Vector64<long> right, byte shift)
This method right shifts the element in the right vector by the shift value and inserts the result into the element of the left vector. The zero bits created at the top of the element by the shift are not inserted; those bit positions retain their existing values from the left vector. Bits shifted out of the right of the element in the right vector are lost.
private Vector64<long> ShiftRightAndInsertScalarTest(Vector64<long> left, Vector64<long> right, byte shift)
{
return AdvSimd.ShiftRightAndInsertScalar(left, right, 1);
}
// left = <11>
// right = <11>
// shift = 1
// Result = <5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ulong> ShiftRightAndInsertScalar(Vector64<ulong> left, Vector64<ulong> right, byte shift)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightAndInsertScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sri d0, d1, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
22. ShiftRightArithmetic
Vector64<short> ShiftRightArithmetic(Vector64<short> value, byte count)
This method right shifts each element in the value vector by count, stores the truncated results in a vector and returns the result vector. All the values in this method are signed integer values.
private Vector64<short> ShiftRightArithmeticTest(Vector64<short> value, byte count)
{
return AdvSimd.ShiftRightArithmetic(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <5, 6, 6, 7>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> ShiftRightArithmetic(Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightArithmetic(Vector64<sbyte> value, byte count)
Vector128<short> ShiftRightArithmetic(Vector128<short> value, byte count)
Vector128<int> ShiftRightArithmetic(Vector128<int> value, byte count)
Vector128<long> ShiftRightArithmetic(Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmetic(Vector128<sbyte> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sshr v16.4h, v0.4h, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
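An arithmetic right shift only differs from a logical one for negative inputs, which the example above does not exercise. The scalar sketch below uses a hypothetical helper, ShiftRightArithmeticModel, and assumes 1 <= count <= 16 as for the short overloads.

```csharp
using System;

// Hypothetical scalar model of an arithmetic right shift on one short element.
// Assumption: 1 <= count <= 16.
static short ShiftRightArithmeticModel(short value, int count)
{
    // C#'s >> on a signed operand replicates the sign bit, so "truncated" here
    // means the low bits are dropped and the result rounds toward negative infinity.
    return (short)(value >> count);
}

short[] values = { 11, 12, 13, 14, -11 };
foreach (short v in values)
{
    Console.WriteLine($"{v} >> 1 = {ShiftRightArithmeticModel(v, 1)}");
}
// 11 >> 1 = 5
// 12 >> 1 = 6
// 13 >> 1 = 6
// 14 >> 1 = 7
// -11 >> 1 = -6
```

Note that -11 >> 1 gives -6, whereas the C# integer division -11 / 2 gives -5, so the shift is not a drop-in replacement for division on negative values.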
23. ShiftRightArithmeticAdd
Vector64<short> ShiftRightArithmeticAdd(Vector64<short> addend, Vector64<short> value, byte count)
This method right shifts each element in the value vector by count, adds the shifted results to the corresponding elements of the addend vector and returns the accumulated vector. All the values in this method are signed integer values. All results are truncated.
private Vector64<short> ShiftRightArithmeticAddTest(Vector64<short> addend, Vector64<short> value, byte count)
{
return AdvSimd.ShiftRightArithmeticAdd(addend, value, 1);
}
// addend = <11, 12, 13, 14>
// value = <21, 22, 23, 24>
// count = 1
// Result = <21, 23, 24, 26>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> ShiftRightArithmeticAdd(Vector64<int> addend, Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightArithmeticAdd(Vector64<sbyte> addend, Vector64<sbyte> value, byte count)
Vector128<short> ShiftRightArithmeticAdd(Vector128<short> addend, Vector128<short> value, byte count)
Vector128<int> ShiftRightArithmeticAdd(Vector128<int> addend, Vector128<int> value, byte count)
Vector128<long> ShiftRightArithmeticAdd(Vector128<long> addend, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticAdd(Vector128<sbyte> addend, Vector128<sbyte> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticAddTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ssra v0.4h, v1.4h, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
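The shift-and-accumulate step can be sketched per element as below. ShiftRightArithmeticAddModel is a hypothetical helper, not a .NET API. It also assumes that the accumulation wraps on overflow rather than saturating, since there is no "Saturate" in the name of this API.

```csharp
using System;

// Hypothetical scalar model of shift-right-and-accumulate on one short element.
// Assumptions: 1 <= count <= 16, and the add wraps around on overflow.
static short ShiftRightArithmeticAddModel(short addend, short value, int count)
{
    return unchecked((short)(addend + (value >> count)));
}

short[] addends = { 11, 12, 13, 14 };
short[] values = { 21, 22, 23, 24 };
short[] sums = new short[addends.Length];
for (int i = 0; i < addends.Length; i++)
{
    sums[i] = ShiftRightArithmeticAddModel(addends[i], values[i], 1);
}

Console.WriteLine(string.Join(", ", sums));                            // 21, 23, 24, 26
Console.WriteLine(ShiftRightArithmeticAddModel(short.MaxValue, 2, 1)); // -32768 (wrapped)
```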
24. ShiftRightArithmeticAddScalar
Vector64<long> ShiftRightArithmeticAddScalar(Vector64<long> addend, Vector64<long> value, byte count)
This method right shifts the element in the value vector by count, adds the shifted result to the element of the addend vector and returns the accumulated vector. All the values in this method are signed integer values. The result is truncated.
private Vector64<long> ShiftRightArithmeticAddScalarTest(Vector64<long> addend, Vector64<long> value, byte count)
{
return AdvSimd.ShiftRightArithmeticAddScalar(addend, value, 1);
}
// addend = <11>
// value = <11>
// count = 1
// Result = <16>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticAddScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ssra d0, d1, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
25. ShiftRightArithmeticNarrowingSaturateLower
Vector64<short> ShiftRightArithmeticNarrowingSaturateLower(Vector128<int> value, byte count)
This method right shifts and truncates each vector element in the value vector by count, saturates each shifted result to a value that is half the original width, stores the results in a vector and returns the result vector. All the values in this method are signed integer values. As seen in the example below, the result vector element’s size (short) is half the source vector element’s size (int).
private Vector64<short> ShiftRightArithmeticNarrowingSaturateLowerTest(Vector128<int> value, byte count)
{
return AdvSimd.ShiftRightArithmeticNarrowingSaturateLower(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <5, 6, 6, 7>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> ShiftRightArithmeticNarrowingSaturateLower(Vector128<long> value, byte count)
Vector64<sbyte> ShiftRightArithmeticNarrowingSaturateLower(Vector128<short> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateLowerTest(System.Runtime.Intrinsics.Vector128`1[Int32],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshrn v16.4h, v0.4s, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
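The example values above are too small to trigger saturation, so here is a minimal scalar sketch that does. ShiftRightNarrowSaturateModel is a hypothetical helper for one int element narrowed to short; it assumes 1 <= count <= 16.

```csharp
using System;

// Hypothetical scalar model of shift right, then narrow int -> short with saturation.
// Assumption: 1 <= count <= 16.
static short ShiftRightNarrowSaturateModel(int value, int count)
{
    int shifted = value >> count; // truncating arithmetic shift at full width
    // Values outside the short range are clamped instead of losing their high bits.
    return (short)Math.Clamp(shifted, short.MinValue, short.MaxValue);
}

Console.WriteLine(ShiftRightNarrowSaturateModel(11, 1));      // 5
Console.WriteLine(ShiftRightNarrowSaturateModel(100000, 1));  // 32767 (50000 does not fit)
Console.WriteLine(ShiftRightNarrowSaturateModel(-100000, 1)); // -32768 (-50000 does not fit)
Console.WriteLine(ShiftRightNarrowSaturateModel(100000, 2));  // 25000 (fits, no saturation)
```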
26. ShiftRightArithmeticNarrowingSaturateScalar
Vector64<short> ShiftRightArithmeticNarrowingSaturateScalar(Vector64<int> value, byte count)
This method right shifts and truncates the 0th element of the value vector by count, saturates the shifted result to a value that is half the original width, and writes it to the 0th element of the result vector, with the other elements set to 0. All the values in this method are signed integer values. As seen in the example below, the result vector element’s size (short) is half the source vector element’s size (int).
private Vector64<short> ShiftRightArithmeticNarrowingSaturateScalarTest(Vector64<int> value, byte count)
{
return AdvSimd.Arm64.ShiftRightArithmeticNarrowingSaturateScalar(value, 1);
}
// value = <11, 12>
// count = 1
// Result = <5, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<int> ShiftRightArithmeticNarrowingSaturateScalar(Vector64<long> value, byte count)
Vector64<sbyte> ShiftRightArithmeticNarrowingSaturateScalar(Vector64<short> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int32],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshrn h16, s0, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
27. ShiftRightArithmeticNarrowingSaturateUnsignedLower
Vector64<byte> ShiftRightArithmeticNarrowingSaturateUnsignedLower(Vector128<short> value, byte count)
This method right shifts each signed integer value in the value vector by count, saturates the result to an unsigned integer value that is half the original width, stores the results in a vector and returns the result vector. The results are truncated.
private Vector64<byte> ShiftRightArithmeticNarrowingSaturateUnsignedLowerTest(Vector128<short> value, byte count)
{
return AdvSimd.ShiftRightArithmeticNarrowingSaturateUnsignedLower(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <5, 6, 6, 7, 7, 8, 8, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ushort> ShiftRightArithmeticNarrowingSaturateUnsignedLower(Vector128<int> value, byte count)
Vector64<uint> ShiftRightArithmeticNarrowingSaturateUnsignedLower(Vector128<long> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateUnsignedLowerTest(System.Runtime.Intrinsics.Vector128`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshrun v16.8b, v0.8h, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
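Because the input is signed and the output is unsigned, saturation happens at both ends: negative results clamp to 0 and large results clamp to the unsigned maximum. A minimal scalar sketch for the short to byte case follows; ShiftRightNarrowSaturateUnsignedModel is a hypothetical helper and assumes 1 <= count <= 8.

```csharp
using System;

// Hypothetical scalar model of shift right, then narrow signed short -> unsigned byte.
// Assumption: 1 <= count <= 8.
static byte ShiftRightNarrowSaturateUnsignedModel(short value, int count)
{
    int shifted = value >> count; // arithmetic shift of the signed input
    return (byte)Math.Clamp(shifted, byte.MinValue, byte.MaxValue);
}

Console.WriteLine(ShiftRightNarrowSaturateUnsignedModel(11, 1));   // 5
Console.WriteLine(ShiftRightNarrowSaturateUnsignedModel(-11, 1));  // 0 (negative saturates to 0)
Console.WriteLine(ShiftRightNarrowSaturateUnsignedModel(1000, 1)); // 255 (500 saturates to 255)
Console.WriteLine(ShiftRightNarrowSaturateUnsignedModel(508, 1));  // 254 (fits)
```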
28. ShiftRightArithmeticNarrowingSaturateUnsignedScalar
Vector64<byte> ShiftRightArithmeticNarrowingSaturateUnsignedScalar(Vector64<short> value, byte count)
This method right shifts the signed integer value at the 0th index of the value vector by count, saturates the result to an unsigned integer value that is half the original width, and writes it to the 0th element of the result vector, with the other elements set to 0. The result is truncated.
private Vector64<byte> ShiftRightArithmeticNarrowingSaturateUnsignedScalarTest(Vector64<short> value, byte count)
{
return AdvSimd.Arm64.ShiftRightArithmeticNarrowingSaturateUnsignedScalar(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <5, 0, 0, 0, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<ushort> ShiftRightArithmeticNarrowingSaturateUnsignedScalar(Vector64<int> value, byte count)
Vector64<uint> ShiftRightArithmeticNarrowingSaturateUnsignedScalar(Vector64<long> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateUnsignedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshrun b16, h0, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
29. ShiftRightArithmeticNarrowingSaturateUnsignedUpper
Vector128<byte> ShiftRightArithmeticNarrowingSaturateUnsignedUpper(Vector64<byte> lower, Vector128<short> value, byte count)
This method right shifts each signed integer value in the value vector by count, saturates the result to an unsigned integer value that is half the original width, and writes the narrowed results to the upper half of the result vector; the lower half contains the values from the lower vector. The results are truncated.
private Vector128<byte> ShiftRightArithmeticNarrowingSaturateUnsignedUpperTest(Vector64<byte> lower, Vector128<short> value, byte count)
{
return AdvSimd.ShiftRightArithmeticNarrowingSaturateUnsignedUpper(lower, value, 1);
}
// lower = <11, 12, 13, 14, 15, 16, 17, 18>
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <11, 12, 13, 14, 15, 16, 17, 18, 5, 6, 6, 7, 7, 8, 8, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<ushort> ShiftRightArithmeticNarrowingSaturateUnsignedUpper(Vector64<ushort> lower, Vector128<int> value, byte count)
Vector128<uint> ShiftRightArithmeticNarrowingSaturateUnsignedUpper(Vector64<uint> lower, Vector128<long> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateUnsignedUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshrun2 v0.16b, v1.8h, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
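The data movement of the "Upper" form can be sketched with arrays standing in for the vectors. In this sketch all eight elements of value are shifted and narrowed, and they land in elements 8..15 of the result, as the example output above shows. NarrowingSaturateUnsignedUpperModel is a hypothetical helper, not a .NET API.

```csharp
using System;

// Hypothetical array-based model of the "Upper" narrowing form.
// lower: 8 bytes passed through; value: 8 shorts that are shifted and narrowed.
static byte[] NarrowingSaturateUnsignedUpperModel(byte[] lower, short[] value, int count)
{
    byte[] result = new byte[lower.Length + value.Length];
    lower.CopyTo(result, 0); // lower half is copied unchanged

    for (int i = 0; i < value.Length; i++)
    {
        int shifted = value[i] >> count;
        // Narrowed elements fill the upper half of the result.
        result[lower.Length + i] = (byte)Math.Clamp(shifted, 0, 255);
    }

    return result;
}

byte[] lowerHalf = { 11, 12, 13, 14, 15, 16, 17, 18 };
short[] wide = { 11, 12, 13, 14, 15, 16, 17, 18 };
byte[] narrowed = NarrowingSaturateUnsignedUpperModel(lowerHalf, wide, 1);

Console.WriteLine(string.Join(", ", narrowed));
// 11, 12, 13, 14, 15, 16, 17, 18, 5, 6, 6, 7, 7, 8, 8, 9
```

A typical use is to narrow two Vector128 inputs into one Vector128: the "Lower" form produces the first half, and the "Upper" form appends the second half to it.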
30. ShiftRightArithmeticNarrowingSaturateUpper
Vector128<short> ShiftRightArithmeticNarrowingSaturateUpper(Vector64<short> lower, Vector128<int> value, byte count)
This method right shifts each vector element in the value vector by count, saturates each shifted result to a value that is half the original width, and writes the narrowed results to the upper half of the result vector, while the lower half contains the values from the lower vector. All the values in this method are signed integer values. All results are truncated.
private Vector128<short> ShiftRightArithmeticNarrowingSaturateUpperTest(Vector64<short> lower, Vector128<int> value, byte count)
{
return AdvSimd.ShiftRightArithmeticNarrowingSaturateUpper(lower, value, 1);
}
// lower = <11, 12, 13, 14>
// value = <11, 12, 13, 14>
// count = 1
// Result = <11, 12, 13, 14, 5, 6, 6, 7>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> ShiftRightArithmeticNarrowingSaturateUpper(Vector64<int> lower, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticNarrowingSaturateUpper(Vector64<sbyte> lower, Vector128<short> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticNarrowingSaturateUpperTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector128`1[Int32],ubyte):System.Runtime.Intrinsics.Vector128`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshrn2 v0.8h, v1.4s, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
31. ShiftRightArithmeticRounded
Vector64<short> ShiftRightArithmeticRounded(Vector64<short> value, byte count)
This method right shifts each element in the value vector by count, rounds the results, stores them in a vector and returns the result vector. All the values in this method are signed integer values.
private Vector64<short> ShiftRightArithmeticRoundedTest(Vector64<short> value, byte count)
{
return AdvSimd.ShiftRightArithmeticRounded(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <6, 6, 7, 7>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> ShiftRightArithmeticRounded(Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightArithmeticRounded(Vector64<sbyte> value, byte count)
Vector128<short> ShiftRightArithmeticRounded(Vector128<short> value, byte count)
Vector128<int> ShiftRightArithmeticRounded(Vector128<int> value, byte count)
Vector128<long> ShiftRightArithmeticRounded(Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticRounded(Vector128<sbyte> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
srshr v16.4h, v0.4h, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
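Rounding here means adding half of the divisor before shifting, so ties round toward positive infinity. A minimal scalar sketch for one short element is shown below; ShiftRightArithmeticRoundedModel is a hypothetical helper and assumes 1 <= count <= 16.

```csharp
using System;

// Hypothetical scalar model of a rounding arithmetic right shift on one short element.
// Assumption: 1 <= count <= 16.
static short ShiftRightArithmeticRoundedModel(short value, int count)
{
    int rounding = 1 << (count - 1); // half of 2^count
    // The sum is computed in int, so adding the rounding constant cannot overflow.
    return (short)((value + rounding) >> count);
}

short[] values = { 11, 12, 13, 14, -11 };
foreach (short v in values)
{
    Console.WriteLine($"{v} -> {ShiftRightArithmeticRoundedModel(v, 1)}");
}
// 11 -> 6
// 12 -> 6
// 13 -> 7
// 14 -> 7
// -11 -> -5
```

Compare -11 here (-5.5 rounds to -5) with the truncating ShiftRightArithmetic, where the same input gives -6.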
32. ShiftRightArithmeticRoundedAdd
Vector64<short> ShiftRightArithmeticRoundedAdd(Vector64<short> addend, Vector64<short> value, byte count)
This method right shifts each element in the value vector by count, adds the rounded results to the corresponding elements of the addend vector and returns the accumulated vector. All the values in this method are signed integer values. All results are rounded.
private Vector64<short> ShiftRightArithmeticRoundedAddTest(Vector64<short> addend, Vector64<short> value, byte count)
{
return AdvSimd.ShiftRightArithmeticRoundedAdd(addend, value, 1);
}
// addend = <11, 12, 13, 14>
// value = <21, 22, 23, 24>
// count = 1
// Result = <22, 23, 25, 26>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> ShiftRightArithmeticRoundedAdd(Vector64<int> addend, Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightArithmeticRoundedAdd(Vector64<sbyte> addend, Vector64<sbyte> value, byte count)
Vector128<short> ShiftRightArithmeticRoundedAdd(Vector128<short> addend, Vector128<short> value, byte count)
Vector128<int> ShiftRightArithmeticRoundedAdd(Vector128<int> addend, Vector128<int> value, byte count)
Vector128<long> ShiftRightArithmeticRoundedAdd(Vector128<long> addend, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticRoundedAdd(Vector128<sbyte> addend, Vector128<sbyte> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedAddTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
srsra v0.4h, v1.4h, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
33. ShiftRightArithmeticRoundedAddScalar
Vector64<long> ShiftRightArithmeticRoundedAddScalar(Vector64<long> addend, Vector64<long> value, byte count)
This method right shifts the element in the value vector by count, adds the rounded result to the element of the addend vector and returns the accumulated vector. All the values in this method are signed integer values. The result is rounded.
private Vector64<long> ShiftRightArithmeticRoundedAddScalarTest(Vector64<long> addend, Vector64<long> value, byte count)
{
return AdvSimd.ShiftRightArithmeticRoundedAddScalar(addend, value, 1);
}
// addend = <11>
// value = <11>
// count = 1
// Result = <17>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedAddScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
srsra d0, d1, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
34. ShiftRightArithmeticRoundedNarrowingSaturateLower
Vector64<short> ShiftRightArithmeticRoundedNarrowingSaturateLower(Vector128<int> value, byte count)
This method right shifts each vector element in the value vector by count, rounds each shifted result, saturates it to a value that is half the original width, stores the results in a vector and returns the result vector. All the values in this method are signed integer values. As seen in the example below, the result vector element’s size (short) is half the source vector element’s size (int).
private Vector64<short> ShiftRightArithmeticRoundedNarrowingSaturateLowerTest(Vector128<int> value, byte count)
{
return AdvSimd.ShiftRightArithmeticRoundedNarrowingSaturateLower(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <6, 6, 7, 7>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> ShiftRightArithmeticRoundedNarrowingSaturateLower(Vector128<long> value, byte count)
Vector64<sbyte> ShiftRightArithmeticRoundedNarrowingSaturateLower(Vector128<short> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateLowerTest(System.Runtime.Intrinsics.Vector128`1[Int32],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrshrn v16.4h, v0.4s, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
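Rounding and saturation interact: a value that would just fit after a truncating shift can be pushed out of range by the rounding step. The sketch below models one int element narrowed to short, rounding first and saturating afterwards. RoundedNarrowSaturateModel is a hypothetical helper and assumes 1 <= count <= 16.

```csharp
using System;

// Hypothetical scalar model: rounding shift right, then narrow int -> short with saturation.
// Assumption: 1 <= count <= 16.
static short RoundedNarrowSaturateModel(int value, int count)
{
    // Widen to long so that adding the rounding constant cannot overflow.
    long rounded = ((long)value + (1L << (count - 1))) >> count;
    return (short)Math.Clamp(rounded, short.MinValue, short.MaxValue);
}

Console.WriteLine(RoundedNarrowSaturateModel(11, 1));    // 6
Console.WriteLine(RoundedNarrowSaturateModel(65534, 1)); // 32767 (exactly fits)
Console.WriteLine(RoundedNarrowSaturateModel(65535, 1)); // 32767 (rounds to 32768, then saturates)
```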
35. ShiftRightArithmeticRoundedNarrowingSaturateScalar
Vector64<short> ShiftRightArithmeticRoundedNarrowingSaturateScalar(Vector64<int> value, byte count)
This method right shifts the 0th element of the value vector by count, rounds the shifted result, saturates it to a value that is half the original width, and writes it to the 0th element of the result vector, with the other elements set to 0. All the values in this method are signed integer values. As seen in the example below, the result vector element’s size (short) is half the source vector element’s size (int).
private Vector64<short> ShiftRightArithmeticRoundedNarrowingSaturateScalarTest(Vector64<int> value, byte count)
{
return AdvSimd.Arm64.ShiftRightArithmeticRoundedNarrowingSaturateScalar(value, 1);
}
// value = <11, 12>
// count = 1
// Result = <6, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<int> ShiftRightArithmeticRoundedNarrowingSaturateScalar(Vector64<long> value, byte count)
Vector64<sbyte> ShiftRightArithmeticRoundedNarrowingSaturateScalar(Vector64<short> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int32],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrshrn h16, s0, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
36. ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower
Vector64<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower(Vector128<short> value, byte count)
This method right shifts each signed integer value in the value vector by count, rounds the result, and saturates it to an unsigned integer value that is half the original width, so negative results become 0. The results are returned in a vector.
private Vector64<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLowerTest(Vector128<short> value, byte count)
{
return AdvSimd.ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <6, 6, 7, 7, 8, 8, 9, 9>
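The example input above never triggers saturation, so the signed-to-unsigned clamp is easy to miss. The following scalar sketch, assuming a shift count between 1 and 8, models one element of the operation. UnsignedNarrowModel and its method are hypothetical names, not part of the .NET API.

```csharp
using System;

// Hypothetical scalar model of the short -> byte rounded, unsigned-saturating, narrowing shift.
// Not part of System.Runtime.Intrinsics; for illustration only.
public static class UnsignedNarrowModel
{
    // Assumes 1 <= count <= 8.
    public static byte ShiftRoundNarrowSaturateUnsigned(short value, int count)
    {
        // Round by adding half of the discarded range, then shift arithmetically.
        int rounded = (value + (1 << (count - 1))) >> count;
        // Negative results clamp to 0, results above 255 clamp to 255.
        return (byte)Math.Clamp(rounded, 0, 255);
    }
}
```

In this model, an input of -20 with count = 1 gives 0, and an input of 1000 with count = 1 gives 255.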
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ushort> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower(Vector128<int> value, byte count)
Vector64<uint> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLower(Vector128<long> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateUnsignedLowerTest(System.Runtime.Intrinsics.Vector128`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrshrun v16.8b, v0.8h, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
37. ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar
Vector64<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar(Vector64<short> value, byte count)
This method right shifts the signed integer value at index 0 of the value vector by count, rounds the result, and saturates it to an unsigned integer value that is half the original width. The result is stored in the 0th element of the result vector and the other elements are set to 0.
private Vector64<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalarTest(Vector64<short> value, byte count)
{
return AdvSimd.Arm64.ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar(value, 1);
}
// value = <11, 12, 13, 14>
// count = 1
// Result = <6, 0, 0, 0, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<ushort> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar(Vector64<int> value, byte count)
Vector64<uint> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalar(Vector64<long> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateUnsignedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrshrun b16, h0, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
38. ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper
Vector128<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper(Vector64<byte> lower, Vector128<short> value, byte count)
This method right shifts each signed integer value in the value vector by count, rounds the result, and saturates it to an unsigned integer value that is half the original width. The narrowed results are written to the upper half of the result vector, and the lower half contains the values from the lower vector.
private Vector128<byte> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpperTest(Vector64<byte> lower, Vector128<short> value, byte count)
{
return AdvSimd.ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper(lower, value, 1);
}
// lower = <11, 12, 13, 14, 15, 16, 17, 18>
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <11, 12, 13, 14, 15, 16, 17, 18, 6, 6, 7, 7, 8, 8, 9, 9>
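The result above shows that every element of value is narrowed and that "Upper" refers to where the narrowed elements land in the result. A rough array-based sketch of that layout follows, assuming a shift count between 1 and 8. NarrowUpperModel is a hypothetical helper, not part of the .NET API.

```csharp
using System;

// Hypothetical array model of the "Upper" narrowing layout; for illustration only.
public static class NarrowUpperModel
{
    // Assumes 1 <= count <= 8.
    public static byte[] NarrowUnsignedUpper(byte[] lower, short[] value, int count)
    {
        var result = new byte[lower.Length + value.Length];
        // The lower half of the result is copied from 'lower' unchanged.
        lower.CopyTo(result, 0);
        for (int i = 0; i < value.Length; i++)
        {
            // Every element of 'value' is shifted, rounded and saturated to 0..255.
            int rounded = (value[i] + (1 << (count - 1))) >> count;
            result[lower.Length + i] = (byte)Math.Clamp(rounded, 0, 255);
        }
        return result;
    }
}
```

Calling the model with the lower and value inputs from the example and count = 1 reproduces the 16-element result listed above.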
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<ushort> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper(Vector64<ushort> lower, Vector128<int> value, byte count)
Vector128<uint> ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpper(Vector64<uint> lower, Vector128<long> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateUnsignedUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrshrun2 v0.16b, v1.8h, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
39. ShiftRightArithmeticRoundedNarrowingSaturateUpper
Vector128<short> ShiftRightArithmeticRoundedNarrowingSaturateUpper(Vector64<short> lower, Vector128<int> value, byte count)
This method right shifts each element of the value vector by count, rounds the result, and saturates it to a value that is half the original width. The narrowed results are written to the upper half of the result vector, and the lower half contains the values from the lower vector. All the values in this method are signed integer values.
private Vector128<short> ShiftRightArithmeticRoundedNarrowingSaturateUpperTest(Vector64<short> lower, Vector128<int> value, byte count)
{
return AdvSimd.ShiftRightArithmeticRoundedNarrowingSaturateUpper(lower, value, 1);
}
// lower = <11, 12, 13, 14>
// value = <11, 12, 13, 14>
// count = 1
// Result = <11, 12, 13, 14, 6, 6, 7, 7>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> ShiftRightArithmeticRoundedNarrowingSaturateUpper(Vector64<int> lower, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightArithmeticRoundedNarrowingSaturateUpper(Vector64<sbyte> lower, Vector128<short> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedNarrowingSaturateUpperTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector128`1[Int32],ubyte):System.Runtime.Intrinsics.Vector128`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrshrn2 v0.8h, v1.4s, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
40. ShiftRightArithmeticRoundedScalar
Vector64<long> ShiftRightArithmeticRoundedScalar(Vector64<long> value, byte count)
This method right shifts the element in the value vector by count, rounds the result, and returns it in a vector. All the values in this method are signed integer values.
private Vector64<long> ShiftRightArithmeticRoundedScalarTest(Vector64<long> value, byte count)
{
return AdvSimd.ShiftRightArithmeticRoundedScalar(value, 1);
}
// value = <11>
// count = 1
// Result = <6>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticRoundedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
srshr d16, d0, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
41. ShiftRightArithmeticScalar
Vector64<long> ShiftRightArithmeticScalar(Vector64<long> value, byte count)
This method right shifts the element in the value vector by count and returns the truncated result in a vector. All the values in this method are signed integer values.
private Vector64<long> ShiftRightArithmeticScalarTest(Vector64<long> value, byte count)
{
return AdvSimd.ShiftRightArithmeticScalar(value, 1);
}
// value = <11>
// count = 1
// Result = <5>
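The difference between this truncating shift and ShiftRightArithmeticRoundedScalar is easiest to see on negative inputs. The scalar sketch below models both for shift counts between 1 and 63. ShiftRightModel and its method names are hypothetical and are not part of the .NET API.

```csharp
// Hypothetical scalar models of the truncating and rounded arithmetic right shifts.
// Not part of System.Runtime.Intrinsics; for illustration only.
public static class ShiftRightModel
{
    // Truncating arithmetic shift: discards the low bits, which rounds toward negative infinity.
    public static long Truncating(long value, int count) => value >> count;

    // Rounded arithmetic shift: adds the highest discarded bit back to the truncated result.
    // Written this way so the intermediate sum cannot overflow. Assumes 1 <= count <= 63.
    public static long Rounded(long value, int count) =>
        (value >> count) + ((value >> (count - 1)) & 1);
}
```

For value = 11 and count = 1 the model gives 5 (truncating) and 6 (rounded). For value = -11 it gives -6 and -5 respectively.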
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightArithmeticScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sshr d16, d0, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
42. ShiftRightLogical
Vector64<byte> ShiftRightLogical(Vector64<byte> value, byte count)
This method performs a logical right shift of each element in the value vector by count, filling the vacated high bits with zeros, and returns the results in a vector. The results are truncated.
private Vector64<byte> ShiftRightLogicalTest(Vector64<byte> value, byte count)
{
return AdvSimd.ShiftRightLogical(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <5, 6, 6, 7, 7, 8, 8, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftRightLogical(Vector64<short> value, byte count)
Vector64<int> ShiftRightLogical(Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightLogical(Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftRightLogical(Vector64<ushort> value, byte count)
Vector64<uint> ShiftRightLogical(Vector64<uint> value, byte count)
Vector128<byte> ShiftRightLogical(Vector128<byte> value, byte count)
Vector128<short> ShiftRightLogical(Vector128<short> value, byte count)
Vector128<int> ShiftRightLogical(Vector128<int> value, byte count)
Vector128<long> ShiftRightLogical(Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightLogical(Vector128<sbyte> value, byte count)
Vector128<ushort> ShiftRightLogical(Vector128<ushort> value, byte count)
Vector128<uint> ShiftRightLogical(Vector128<uint> value, byte count)
Vector128<ulong> ShiftRightLogical(Vector128<ulong> value, byte count)
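Several of the overloads above take signed element types. A logical shift ignores the sign and fills the high bits with zeros, so a negative input becomes positive. The sketch below contrasts the two behaviours for one sbyte element, assuming a shift count between 1 and 7. LogicalVsArithmeticModel is a hypothetical name, not part of the .NET API.

```csharp
// Hypothetical scalar model contrasting logical and arithmetic right shifts on sbyte.
// Not part of System.Runtime.Intrinsics; for illustration only.
public static class LogicalVsArithmeticModel
{
    // Logical: reinterpret the bits as unsigned, shift, and zero-fill from the top.
    public static sbyte ShiftRightLogical(sbyte value, int count) =>
        unchecked((sbyte)((byte)value >> count));

    // Arithmetic: the sign bit is replicated into the vacated positions.
    public static sbyte ShiftRightArithmetic(sbyte value, int count) =>
        (sbyte)(value >> count);
}
```

For value = -2 and count = 1, the logical model returns 127 while the arithmetic model returns -1.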
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightLogicalTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ushr v16.8b, v0.8b, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
43. ShiftRightLogicalAdd
Vector64<byte> ShiftRightLogicalAdd(Vector64<byte> addend, Vector64<byte> value, byte count)
This method right shifts each element in the value vector by count, adds each shifted result to the corresponding element of the addend vector, and returns the result vector. The shifted values are truncated.
private Vector64<byte> ShiftRightLogicalAddTest(Vector64<byte> addend, Vector64<byte> value, byte count)
{
return AdvSimd.ShiftRightLogicalAdd(addend, value, 1);
}
// addend = <11, 12, 13, 14, 15, 16, 17, 18>
// value = <21, 22, 23, 24, 25, 26, 27, 28>
// count = 1
// Result = <21, 23, 24, 26, 27, 29, 30, 32>
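As a cross-check of the result above, here is a minimal array-based sketch of the shift-and-accumulate step. It assumes both arrays have the same length and that the addition wraps around on overflow rather than saturating. ShiftRightLogicalAddModel is a hypothetical helper, not part of the .NET API.

```csharp
// Hypothetical array model of shift-right-logical-and-accumulate on byte elements.
// Not part of System.Runtime.Intrinsics; for illustration only.
public static class ShiftRightLogicalAddModel
{
    // Assumes addend and value have the same length.
    public static byte[] ShiftRightLogicalAdd(byte[] addend, byte[] value, int count)
    {
        var result = new byte[addend.Length];
        for (int i = 0; i < addend.Length; i++)
        {
            // The unchecked cast keeps only the low 8 bits, so the sum wraps on overflow.
            result[i] = unchecked((byte)(addend[i] + (value[i] >> count)));
        }
        return result;
    }
}
```

For the first element of the example, 21 >> 1 is 10 and 11 + 10 is 21. In this model an addend of 250 with a value of 20 and count = 1 wraps to 4.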
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftRightLogicalAdd(Vector64<short> addend, Vector64<short> value, byte count)
Vector64<int> ShiftRightLogicalAdd(Vector64<int> addend, Vector64<int> value, byte count)
Vector64<sbyte> ShiftRightLogicalAdd(Vector64<sbyte> addend, Vector64<sbyte> value, byte count)
Vector64<ushort> ShiftRightLogicalAdd(Vector64<ushort> addend, Vector64<ushort> value, byte count)
Vector64<uint> ShiftRightLogicalAdd(Vector64<uint> addend, Vector64<uint> value, byte count)
Vector128<byte> ShiftRightLogicalAdd(Vector128<byte> addend, Vector128<byte> value, byte count)
Vector128<short> ShiftRightLogicalAdd(Vector128<short> addend, Vector128<short> value, byte count)
Vector128<int> ShiftRightLogicalAdd(Vector128<int> addend, Vector128<int> value, byte count)
Vector128<long> ShiftRightLogicalAdd(Vector128<long> addend, Vector128<long> value, byte count)
Vector128<sbyte> ShiftRightLogicalAdd(Vector128<sbyte> addend, Vector128<sbyte> value, byte count)
Vector128<ushort> ShiftRightLogicalAdd(Vector128<ushort> addend, Vector128<ushort> value, byte count)
Vector128<uint> ShiftRightLogicalAdd(Vector128<uint> addend, Vector128<uint> value, byte count)
Vector128<ulong> ShiftRightLogicalAdd(Vector128<ulong> addend, Vector128<ulong> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightLogicalAddTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
usra v0.8b, v1.8b, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
44. ShiftRightLogicalAddScalar
Vector64<long> ShiftRightLogicalAddScalar(Vector64<long> addend, Vector64<long> value, byte count)
This method right shifts the element in the value vector by count, adds the shifted result to the element of the addend vector, and returns the result vector. The shifted value is truncated.
private Vector64<long> ShiftRightLogicalAddScalarTest(Vector64<long> addend, Vector64<long> value, byte count)
{
return AdvSimd.ShiftRightLogicalAddScalar(addend, value, 1);
}
// addend = <11>
// value = <11>
// count = 1
// Result = <16>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<ulong> ShiftRightLogicalAddScalar(Vector64<ulong> addend, Vector64<ulong> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightLogicalAddScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64],ubyte):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
usra d0, d1, #1
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
45. ShiftRightLogicalNarrowingLower
Vector64<byte> ShiftRightLogicalNarrowingLower(Vector128<ushort> value, byte count)
This method right shifts each integer value in the value vector by count, narrows each result to half the original width, and returns the narrowed results in a vector. As the example below shows, the result element type byte is half as wide as the source element type ushort. The results are truncated.
private Vector64<byte> ShiftRightLogicalNarrowingLowerTest(Vector128<ushort> value, byte count)
{
return AdvSimd.ShiftRightLogicalNarrowingLower(value, 1);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// count = 1
// Result = <5, 6, 6, 7, 7, 8, 8, 9>
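Unlike the saturating variants, this narrowing keeps only the low half of each shifted value. The scalar sketch below models one element, assuming a shift count between 1 and 8. ShiftRightNarrowModel is a hypothetical name, not part of the .NET API.

```csharp
// Hypothetical scalar model of the ushort -> byte truncating narrowing shift.
// Not part of System.Runtime.Intrinsics; for illustration only.
public static class ShiftRightNarrowModel
{
    // Assumes 1 <= count <= 8.
    public static byte ShiftRightLogicalNarrow(ushort value, int count) =>
        // The unchecked cast drops the high 8 bits of the shifted value; nothing is clamped.
        unchecked((byte)(value >> count));
}
```

With count = 1, the inputs 11 and 18 give 5 and 9 as in the example. An input of 1000 gives 244, the low 8 bits of 500, where a saturating narrow would give 255.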
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> ShiftRightLogicalNarrowingLower(Vector128<int> value, byte count)
Vector64<int> ShiftRightLogicalNarrowingLower(Vector128<long> value, byte count)
Vector64<sbyte> ShiftRightLogicalNarrowingLower(Vector128<short> value, byte count)
Vector64<ushort> ShiftRightLogicalNarrowingLower(Vector128<uint> value, byte count)
Vector64<uint> ShiftRightLogicalNarrowingLower(Vector128<ulong> value, byte count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftRightLogicalNarrowingLowerTest(System.Runtime.Intrinsics.Vector128`1[UInt16],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
shrn v16.8b, v0.8h, #1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8