Introduction
In my vectorization using .NET APIs blog, I describe SIMD datatypes Vector64<T>
and Vector128<T>
that operates on ‘Arm64 hardware intrinsic’ APIs present under System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 class. In this post I will describe those hardware intrinsic APIs by showing sample code usage along with examples and generated Arm64 code. This will help people in understanding these APIs so they can use them to optimize their .NET code written to target Arm64. Since there are 360 APIs, describing all of them in a single post will be overwhelming. So I have divided these APIs among 8 blogs and will demonstrate 45 APIs in each blog. This is part 6 of that blog series. You can checkout my previous blogs at:
Most of the description of these APIs is adapted and referenced from Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile document. You can also refer to the description of SIMD and Floating-point instructions description at Arm developer docs page.
The blog page is programmatically generated and might contain mistakes. If you find any mistake, please leave a comment and I will address it.
APIs covered
1. MultiplyWideningLowerAndSubtract
Vector128<ushort> MultiplyWideningLowerAndSubtract(Vector128<ushort> minuend, Vector64<byte> left, Vector64<byte> right)
This method multiplies the vector elements in the left
by the corresponding vector elements of the right
vector, and subtracts the results with the vector elements from the minuend
vector and return the result vector. As seen in below example, the result vector element’s size ushort
is twice as long as the input element’s size byte
.
private Vector128<ushort> MultiplyWideningLowerAndSubtractTest(Vector128<ushort> minuend, Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.MultiplyWideningLowerAndSubtract(minuend, left, right);
}
// minuend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <65316, 65284, 65250, 65214, 65176, 65136, 65094, 65050>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyWideningLowerAndSubtract(Vector128<int> minuend, Vector64<short> left, Vector64<short> right)
Vector128<long> MultiplyWideningLowerAndSubtract(Vector128<long> minuend, Vector64<int> left, Vector64<int> right)
Vector128<short> MultiplyWideningLowerAndSubtract(Vector128<short> minuend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> MultiplyWideningLowerAndSubtract(Vector128<uint> minuend, Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> MultiplyWideningLowerAndSubtract(Vector128<ulong> minuend, Vector64<uint> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyWideningLowerAndSubtractTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umlsl v0.8h, v1.8b, v2.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
2. MultiplyWideningUpper
Vector128<ushort> MultiplyWideningUpper(Vector128<byte> left, Vector128<byte> right)
This method multiplies corresponding vector elements in the upper-half of left
and right
vector, stores the result in a vector, and returns the result vector. As seen in below example, the result vector element’s size ushort
is twice as long as the input element’s size byte
.
private Vector128<ushort> MultiplyWideningUpperTest(Vector128<byte> left, Vector128<byte> right)
{
return AdvSimd.MultiplyWideningUpper(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <551, 600, 651, 704, 759, 816, 875, 936>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyWideningUpper(Vector128<short> left, Vector128<short> right)
Vector128<long> MultiplyWideningUpper(Vector128<int> left, Vector128<int> right)
Vector128<short> MultiplyWideningUpper(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> MultiplyWideningUpper(Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> MultiplyWideningUpper(Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umull2 v16.8h, v0.16b, v1.16b
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
3. MultiplyWideningUpperAndAdd
Vector128<ushort> MultiplyWideningUpperAndAdd(Vector128<ushort> addend, Vector128<byte> left, Vector128<byte> right)
This method multiplies corresponding vector elements in the upper-half of left
and right
vector, and accumulates the results with the vector elements of the addend
vector and returns the result vector. As seen in below example, the result vector element’s size ushort
is twice as long as the input element’s size byte
.
private Vector128<ushort> MultiplyWideningUpperAndAddTest(Vector128<ushort> addend, Vector128<byte> left, Vector128<byte> right)
{
return AdvSimd.MultiplyWideningUpperAndAdd(addend, left, right);
}
// addend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <562, 612, 664, 718, 774, 832, 892, 954>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyWideningUpperAndAdd(Vector128<int> addend, Vector128<short> left, Vector128<short> right)
Vector128<long> MultiplyWideningUpperAndAdd(Vector128<long> addend, Vector128<int> left, Vector128<int> right)
Vector128<short> MultiplyWideningUpperAndAdd(Vector128<short> addend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> MultiplyWideningUpperAndAdd(Vector128<uint> addend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> MultiplyWideningUpperAndAdd(Vector128<ulong> addend, Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyWideningUpperAndAddTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd16 -> d2 HFA(simd16)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umlal2 v0.8h, v1.16b, v2.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
4. MultiplyWideningUpperAndSubtract
Vector128<ushort> MultiplyWideningUpperAndSubtract(Vector128<ushort> minuend, Vector128<byte> left, Vector128<byte> right)
This method multiplies the vector elements in the upper-half of left
by the corresponding vector elements of the right
vector, and subtracts the results with the vector elements from the minuend
vector and return the result vector. As seen in below example, the result vector element’s size ushort
is twice as long as the input element’s size byte
.
private Vector128<ushort> MultiplyWideningUpperAndSubtractTest(Vector128<ushort> minuend, Vector128<byte> left, Vector128<byte> right)
{
return AdvSimd.MultiplyWideningUpperAndSubtract(minuend, left, right);
}
// minuend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <64996, 64948, 64898, 64846, 64792, 64736, 64678, 64618>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> MultiplyWideningUpperAndSubtract(Vector128<int> minuend, Vector128<short> left, Vector128<short> right)
Vector128<long> MultiplyWideningUpperAndSubtract(Vector128<long> minuend, Vector128<int> left, Vector128<int> right)
Vector128<short> MultiplyWideningUpperAndSubtract(Vector128<short> minuend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> MultiplyWideningUpperAndSubtract(Vector128<uint> minuend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> MultiplyWideningUpperAndSubtract(Vector128<ulong> minuend, Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyWideningUpperAndSubtractTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd16 -> d2 HFA(simd16)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umlsl2 v0.8h, v1.16b, v2.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
5. Negate
Vector64<short> Negate(Vector64<short> value)
This method negates each element of value
vector and returns the result vector.
private Vector64<short> NegateTest(Vector64<short> value)
{
return AdvSimd.Negate(value);
}
// value = <11, 12, 13, 14>
// Result = <-11, -12, -13, -14>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> Negate(Vector64<int> value)
Vector64<sbyte> Negate(Vector64<sbyte> value)
Vector64<float> Negate(Vector64<float> value)
Vector128<short> Negate(Vector128<short> value)
Vector128<int> Negate(Vector128<int> value)
Vector128<sbyte> Negate(Vector128<sbyte> value)
Vector128<float> Negate(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Negate(Vector128<double> value)
Vector128<long> Negate(Vector128<long> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:NegateTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
neg v16.4h, v0.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
6. NegateSaturate
Vector64<short> NegateSaturate(Vector64<short> value)
This method negates each vector element in the value
vector, stores the results in a vector and returns the result vector. All the values in this method are signed integer values. If there is an overflow with the negation, that element result is saturated.
private Vector64<short> NegateSaturateTest(Vector64<short> value)
{
return AdvSimd.NegateSaturate(value);
}
// value = <11, 12, 13, 14>
// Result = <-11, -12, -13, -14>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> NegateSaturate(Vector64<int> value)
Vector64<sbyte> NegateSaturate(Vector64<sbyte> value)
Vector128<short> NegateSaturate(Vector128<short> value)
Vector128<int> NegateSaturate(Vector128<int> value)
Vector128<sbyte> NegateSaturate(Vector128<sbyte> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<long> NegateSaturate(Vector128<long> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:NegateSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqneg v16.4h, v0.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
7. NegateSaturateScalar
Vector64<short> NegateSaturateScalar(Vector64<short> value)
This method negates 0th vector element in the value
vector, stores the result in 0th element of a vector and returns the result vector. All non-zero elements are initialized to 0. All the values in this method are signed integer values. If there is an overflow with the negation, that element result is saturated.
private Vector64<short> NegateSaturateScalarTest(Vector64<short> value)
{
return AdvSimd.Arm64.NegateSaturateScalar(value);
}
// value = <11, 12, 13, 14>
// Result = <-11, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<int> NegateSaturateScalar(Vector64<int> value)
Vector64<long> NegateSaturateScalar(Vector64<long> value)
Vector64<sbyte> NegateSaturateScalar(Vector64<sbyte> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:NegateSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqneg h16, h0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
8. NegateScalar
Vector64<double> NegateScalar(Vector64<double> value)
This method negates the elements in the value
vector and returns the result.
private Vector64<double> NegateScalarTest(Vector64<double> value)
{
return AdvSimd.NegateScalar(value);
}
// value = <11.5>
// Result = <-11.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> NegateScalar(Vector64<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<long> NegateScalar(Vector64<long> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:NegateScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fneg d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
9. Not
Vector64<byte> Not(Vector64<byte> value)
This method performs bitwise inverse of each element of the value
vector, stores the result in a vector and returns the result vector.
private Vector64<byte> NotTest(Vector64<byte> value)
{
return AdvSimd.Not(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <244, 243, 242, 241, 240, 239, 238, 237>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> Not(Vector64<double> value)
Vector64<short> Not(Vector64<short> value)
Vector64<int> Not(Vector64<int> value)
Vector64<long> Not(Vector64<long> value)
Vector64<sbyte> Not(Vector64<sbyte> value)
Vector64<float> Not(Vector64<float> value)
Vector64<ushort> Not(Vector64<ushort> value)
Vector64<uint> Not(Vector64<uint> value)
Vector64<ulong> Not(Vector64<ulong> value)
Vector128<byte> Not(Vector128<byte> value)
Vector128<double> Not(Vector128<double> value)
Vector128<short> Not(Vector128<short> value)
Vector128<int> Not(Vector128<int> value)
Vector128<long> Not(Vector128<long> value)
Vector128<sbyte> Not(Vector128<sbyte> value)
Vector128<float> Not(Vector128<float> value)
Vector128<ushort> Not(Vector128<ushort> value)
Vector128<uint> Not(Vector128<uint> value)
Vector128<ulong> Not(Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:NotTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mvn v16.8b, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
10. Or
Vector64<byte> Or(Vector64<byte> left, Vector64<byte> right)
This method performs a bitwise OR between the left
and right
vectors, and returns the result.
private Vector64<byte> OrTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.Or(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <31, 30, 31, 30, 31, 26, 27, 30>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> Or(Vector64<double> left, Vector64<double> right)
Vector64<short> Or(Vector64<short> left, Vector64<short> right)
Vector64<int> Or(Vector64<int> left, Vector64<int> right)
Vector64<long> Or(Vector64<long> left, Vector64<long> right)
Vector64<sbyte> Or(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Or(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Or(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Or(Vector64<uint> left, Vector64<uint> right)
Vector64<ulong> Or(Vector64<ulong> left, Vector64<ulong> right)
Vector128<byte> Or(Vector128<byte> left, Vector128<byte> right)
Vector128<double> Or(Vector128<double> left, Vector128<double> right)
Vector128<short> Or(Vector128<short> left, Vector128<short> right)
Vector128<int> Or(Vector128<int> left, Vector128<int> right)
Vector128<long> Or(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> Or(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Or(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Or(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Or(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> Or(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:OrTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
orr v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
11. OrNot
Vector64<byte> OrNot(Vector64<byte> left, Vector64<byte> right)
This method performs a bitwise OR NOT between the left
and right
vectors, and returns the result.
private Vector64<byte> OrNotTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.OrNot(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <235, 237, 237, 239, 239, 245, 245, 243>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> OrNot(Vector64<double> left, Vector64<double> right)
Vector64<short> OrNot(Vector64<short> left, Vector64<short> right)
Vector64<int> OrNot(Vector64<int> left, Vector64<int> right)
Vector64<long> OrNot(Vector64<long> left, Vector64<long> right)
Vector64<sbyte> OrNot(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> OrNot(Vector64<float> left, Vector64<float> right)
Vector64<ushort> OrNot(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> OrNot(Vector64<uint> left, Vector64<uint> right)
Vector64<ulong> OrNot(Vector64<ulong> left, Vector64<ulong> right)
Vector128<byte> OrNot(Vector128<byte> left, Vector128<byte> right)
Vector128<double> OrNot(Vector128<double> left, Vector128<double> right)
Vector128<short> OrNot(Vector128<short> left, Vector128<short> right)
Vector128<int> OrNot(Vector128<int> left, Vector128<int> right)
Vector128<long> OrNot(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> OrNot(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> OrNot(Vector128<float> left, Vector128<float> right)
Vector128<ushort> OrNot(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> OrNot(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> OrNot(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:OrNotTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
orn v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
12. PolynomialMultiply
Vector64<byte> PolynomialMultiply(Vector64<byte> left, Vector64<byte> right)
This method multiplies corresponding elements in the vectors of the left
and right
vectors, stores the results in a vector and returns the result vector.
private Vector64<byte> PolynomialMultiplyTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.PolynomialMultiply(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <151, 232, 243, 144, 135, 160, 171, 248>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<sbyte> PolynomialMultiply(Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<byte> PolynomialMultiply(Vector128<byte> left, Vector128<byte> right)
Vector128<sbyte> PolynomialMultiply(Vector128<sbyte> left, Vector128<sbyte> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:PolynomialMultiplyTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
pmul v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
13. PolynomialMultiplyWideningLower
Vector128<ushort> PolynomialMultiplyWideningLower(Vector64<byte> left, Vector64<byte> right)
This method multiplies corresponding elements in the left
and right
vectors, stores the results in a vector and returns the result vector. As seen in below example, the result vector element’s size ushort
is twice as long as the input element’s size byte
.
private Vector128<ushort> PolynomialMultiplyWideningLowerTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.PolynomialMultiplyWideningLower(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <151, 232, 243, 144, 135, 416, 427, 504>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<short> PolynomialMultiplyWideningLower(Vector64<sbyte> left, Vector64<sbyte> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:PolynomialMultiplyWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
pmull v16.8h, v0.8b, v1.8b
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
14. PolynomialMultiplyWideningUpper
Vector128<ushort> PolynomialMultiplyWideningUpper(Vector128<byte> left, Vector128<byte> right)
This method multiplies corresponding elements in the upper-half of left
with corresponding elements of right
vectors, stores the results in a vector and returns the result vector. As seen in below example, the result vector element’s size ushort
is twice as long as the input element’s size byte
.
private Vector128<ushort> PolynomialMultiplyWideningUpperTest(Vector128<byte> left, Vector128<byte> right)
{
return AdvSimd.PolynomialMultiplyWideningUpper(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <503, 408, 403, 704, 759, 816, 779, 808>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<short> PolynomialMultiplyWideningUpper(Vector128<sbyte> left, Vector128<sbyte> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:PolynomialMultiplyWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
pmull2 v16.8h, v0.16b, v1.16b
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
15. PopCount
Vector64<byte> PopCount(Vector64<byte> value)
This method counts the number of bits that have a value of one in each element in the value
vector, stores the results in a vector and returns the result vector.
private Vector64<byte> PopCountTest(Vector64<byte> value)
{
return AdvSimd.PopCount(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <3, 2, 3, 3, 4, 1, 2, 2>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<sbyte> PopCount(Vector64<sbyte> value)
Vector128<byte> PopCount(Vector128<byte> value)
Vector128<sbyte> PopCount(Vector128<sbyte> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:PopCountTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
cnt v16.8b, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
16. ReciprocalEstimate
Vector64<float> ReciprocalEstimate(Vector64<float> value)
This method finds an approximate reciprocal estimate for each element in the value
vector, stores the results in a vector and returns the result vector.
private Vector64<float> ReciprocalEstimateTest(Vector64<float> value)
{
return AdvSimd.ReciprocalEstimate(value);
}
// value = <11.5, 12.5>
// Result = <0.08691406, 0.079833984>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<uint> ReciprocalEstimate(Vector64<uint> value)
Vector128<float> ReciprocalEstimate(Vector128<float> value)
Vector128<uint> ReciprocalEstimate(Vector128<uint> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> ReciprocalEstimate(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReciprocalEstimateTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frecpe v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
17. ReciprocalEstimateScalar
Vector64<double> ReciprocalEstimateScalar(Vector64<double> value)
This method finds an approximate reciprocal estimate for each element in the value
vector, stores the results in a vector and returns the result vector.
private Vector64<double> ReciprocalEstimateScalarTest(Vector64<double> value)
{
return AdvSimd.Arm64.ReciprocalEstimateScalar(value);
}
// value = <11.5>
// Result = <0.0869140625>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalEstimateScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReciprocalEstimateScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frecpe d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
18. ReciprocalExponentScalar
Vector64<double> ReciprocalExponentScalar(Vector64<double> value)
This method finds an approximate reciprocal exponent for each element in the value
vector, stores the results in a vector and returns the result vector.
private Vector64<double> ReciprocalExponentScalarTest(Vector64<double> value)
{
return AdvSimd.Arm64.ReciprocalExponentScalar(value);
}
// value = <11.5>
// Result = <0.25>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalExponentScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReciprocalExponentScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frecpx d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
19. ReciprocalSquareRootEstimate
Vector64<float> ReciprocalSquareRootEstimate(Vector64<float> value)
This method calculates an approximate square root for each element in the value
vector, stores the results in a vector and returns the result vector.
private Vector64<float> ReciprocalSquareRootEstimateTest(Vector64<float> value)
{
return AdvSimd.ReciprocalSquareRootEstimate(value);
}
// value = <11.5, 12.5>
// Result = <0.29492188, 0.28222656>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<uint> ReciprocalSquareRootEstimate(Vector64<uint> value)
Vector128<float> ReciprocalSquareRootEstimate(Vector128<float> value)
Vector128<uint> ReciprocalSquareRootEstimate(Vector128<uint> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> ReciprocalSquareRootEstimate(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReciprocalSquareRootEstimateTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frsqrte v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
20. ReciprocalSquareRootEstimateScalar
Vector64<double> ReciprocalSquareRootEstimateScalar(Vector64<double> value)
This method calculates an approximate square root for each element in the value
vector, stores the results in a vector and returns the result vector.
private Vector64<double> ReciprocalSquareRootEstimateScalarTest(Vector64<double> value)
{
return AdvSimd.Arm64.ReciprocalSquareRootEstimateScalar(value);
}
// value = <11.5>
// Result = <0.294921875>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalSquareRootEstimateScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReciprocalSquareRootEstimateScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frsqrte d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
21. ReciprocalSquareRootStep
Vector64<float> ReciprocalSquareRootStep(Vector64<float> left, Vector64<float> right)
This method multiplies corresponding floating-point values in the left
and right
vector, subtracts each of the products from 3.0, divides these results by 2.0, stores the results in a vector and returns the result vector.
private Vector64<float> ReciprocalSquareRootStepTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.ReciprocalSquareRootStep(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <-122.125, -139.125>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> ReciprocalSquareRootStep(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> ReciprocalSquareRootStep(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReciprocalSquareRootStepTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frsqrts v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
22. ReciprocalSquareRootStepScalar
Vector64<double> ReciprocalSquareRootStepScalar(Vector64<double> left, Vector64<double> right)
This method multiplies corresponding floating-point values in the vectors of the left
and right
vectors, subtracts each of the products from 3.0, divides these results by 2.0, stores the results in a vector and returns the result vector.
private Vector64<double> ReciprocalSquareRootStepScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.ReciprocalSquareRootStepScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <-64.625>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalSquareRootStepScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReciprocalSquareRootStepScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frsqrts d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
23. ReciprocalStep
Vector64<float> ReciprocalStep(Vector64<float> left, Vector64<float> right)
This method multiplies the corresponding floating-point values in the left
and right
vectors, subtracts each of the products from 2.0, stores the results in a vector and returns the result vector.
private Vector64<float> ReciprocalStepTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.ReciprocalStep(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <-245.25, -279.25>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> ReciprocalStep(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> ReciprocalStep(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReciprocalStepTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frecps v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
24. ReciprocalStepScalar
Vector64<double> ReciprocalStepScalar(Vector64<double> left, Vector64<double> right)
This method multiplies the corresponding floating-point values in the left
and right
vectors, subtracts each of the products from 2.0, stores the results in a vector and returns the result vector.
private Vector64<double> ReciprocalStepScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.ReciprocalStepScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <-130.25>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> ReciprocalStepScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReciprocalStepScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frecps d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
25. ReverseElement16
Vector64<int> ReverseElement16(Vector64<int> value)
Reverse bytes in each 32-bits value and returns the result.
private Vector64<int> ReverseElement16Test(Vector64<int> value)
{
return AdvSimd.ReverseElement16(value);
}
// value = <11, 12>
// Result = <720896, 786432>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<long> ReverseElement16(Vector64<long> value)
Vector64<uint> ReverseElement16(Vector64<uint> value)
Vector64<ulong> ReverseElement16(Vector64<ulong> value)
Vector128<int> ReverseElement16(Vector128<int> value)
Vector128<long> ReverseElement16(Vector128<long> value)
Vector128<uint> ReverseElement16(Vector128<uint> value)
Vector128<ulong> ReverseElement16(Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReverseElement16Test(System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector64`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
rev32 v16.4h, v0.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
26. ReverseElement32
Vector64<long> ReverseElement32(Vector64<long> value)
Reverse bytes in each 64-bits value and returns the result.
private Vector64<long> ReverseElement32Test(Vector64<long> value)
{
return AdvSimd.ReverseElement32(value);
}
// value = <11>
// Result = <47244640256>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> ReverseElement32(Vector64<ulong> value)
Vector128<long> ReverseElement32(Vector128<long> value)
Vector128<ulong> ReverseElement32(Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReverseElement32Test(System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
rev64 v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
27. ReverseElement8
Vector64<short> ReverseElement8(Vector64<short> value)
Reverse bytes in each 16-bit half word values and returns the result.
private Vector64<short> ReverseElement8Test(Vector64<short> value)
{
return AdvSimd.ReverseElement8(value);
}
// value = <11, 12, 13, 14>
// Result = <2816, 3072, 3328, 3584>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ReverseElement8(Vector64<int> value)
Vector64<long> ReverseElement8(Vector64<long> value)
Vector64<ushort> ReverseElement8(Vector64<ushort> value)
Vector64<uint> ReverseElement8(Vector64<uint> value)
Vector64<ulong> ReverseElement8(Vector64<ulong> value)
Vector128<short> ReverseElement8(Vector128<short> value)
Vector128<int> ReverseElement8(Vector128<int> value)
Vector128<long> ReverseElement8(Vector128<long> value)
Vector128<ushort> ReverseElement8(Vector128<ushort> value)
Vector128<uint> ReverseElement8(Vector128<uint> value)
Vector128<ulong> ReverseElement8(Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReverseElement8Test(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
rev16 v16.8b, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
28. ReverseElementBits
Vector64<byte> ReverseElementBits(Vector64<byte> value)
Reverses the bit order of all elements in value
vector.
private Vector64<byte> ReverseElementBitsTest(Vector64<byte> value)
{
return AdvSimd.Arm64.ReverseElementBits(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <208, 48, 176, 112, 240, 8, 136, 72>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<sbyte> ReverseElementBits(Vector64<sbyte> value)
Vector128<byte> ReverseElementBits(Vector128<byte> value)
Vector128<sbyte> ReverseElementBits(Vector128<sbyte> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ReverseElementBitsTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
rbit v16.8b, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
29. RoundAwayFromZero
Vector64<float> RoundAwayFromZero(Vector64<float> value)
This method rounds a vector of floating-point values in the value
vector to integral floating-point values of the same size using the Round to Nearest with Ties to Away rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<float> RoundAwayFromZeroTest(Vector64<float> value)
{
return AdvSimd.RoundAwayFromZero(value);
}
// value = <11.5, 12.5>
// Result = <12, 13>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundAwayFromZero(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundAwayFromZero(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundAwayFromZeroTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frinta v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
30. RoundAwayFromZeroScalar
Vector64<double> RoundAwayFromZeroScalar(Vector64<double> value)
This method rounds a floating-point value in the value
vector to an integral floating-point value of the same size using the Round to Nearest with Ties to Away rounding mode, and returns the result. As per ARM docs, zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<double> RoundAwayFromZeroScalarTest(Vector64<double> value)
{
return AdvSimd.RoundAwayFromZeroScalar(value);
}
// value = <11.5>
// Result = <12>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundAwayFromZeroScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundAwayFromZeroScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frinta d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
31. RoundToNearest
Vector64<float> RoundToNearest(Vector64<float> value)
This method rounds a vector of floating-point values in the value
vector to integral floating-point values of the same size using the Round to Nearest rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<float> RoundToNearestTest(Vector64<float> value)
{
return AdvSimd.RoundToNearest(value);
}
// value = <11.4, 12.8>
// Result = <11, 13>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundToNearest(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundToNearest(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundToNearestTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintn v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
32. RoundToNearestScalar
Vector64<double> RoundToNearestScalar(Vector64<double> value)
This method rounds a vector of floating-point values in the value
vector to integral floating-point values of the same size using the Round to Nearest rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<double> RoundToNearestScalarTest(Vector64<double> value)
{
return AdvSimd.RoundToNearestScalar(value);
}
// value = <11.4>
// Result = <11>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundToNearestScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundToNearestScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintn d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
33. RoundToNegativeInfinity
Vector64<float> RoundToNegativeInfinity(Vector64<float> value)
This method rounds a floating-point value in the value
vector to an integral floating-point value of the same size using the Round towards Minus Infinity rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<float> RoundToNegativeInfinityTest(Vector64<float> value)
{
return AdvSimd.RoundToNegativeInfinity(value);
}
// value = <11.5, 12.5>
// Result = <11, 12>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundToNegativeInfinity(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundToNegativeInfinity(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundToNegativeInfinityTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintm v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
34. RoundToNegativeInfinityScalar
Vector64<double> RoundToNegativeInfinityScalar(Vector64<double> value)
This method rounds a floating-point value in the value
vector to an integral floating-point value of the same size using the Round towards Minus Infinity rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<double> RoundToNegativeInfinityScalarTest(Vector64<double> value)
{
return AdvSimd.RoundToNegativeInfinityScalar(value);
}
// value = <11.5>
// Result = <11>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundToNegativeInfinityScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundToNegativeInfinityScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintm d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
35. RoundToPositiveInfinity
Vector64<float> RoundToPositiveInfinity(Vector64<float> value)
This method rounds a floating-point value in the value
vector to an integral floating-point value of the same size using the Round towards Plus Infinity rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<float> RoundToPositiveInfinityTest(Vector64<float> value)
{
return AdvSimd.RoundToPositiveInfinity(value);
}
// value = <11.5, 12.5>
// Result = <12, 13>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundToPositiveInfinity(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundToPositiveInfinity(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundToPositiveInfinityTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintp v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
36. RoundToPositiveInfinityScalar
Vector64<double> RoundToPositiveInfinityScalar(Vector64<double> value)
This method rounds a floating-point value in the value
vector to an integral floating-point value of the same size using the Round towards Plus Infinity rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<double> RoundToPositiveInfinityScalarTest(Vector64<double> value)
{
return AdvSimd.RoundToPositiveInfinityScalar(value);
}
// value = <11.5>
// Result = <12>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundToPositiveInfinityScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundToPositiveInfinityScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintp d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
37. RoundToZero
Vector64<float> RoundToZero(Vector64<float> value)
This method rounds a vector of floating-point values in the value
vector to integral floating-point values of the same size using the Round towards Zero rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<float> RoundToZeroTest(Vector64<float> value)
{
return AdvSimd.RoundToZero(value);
}
// value = <11.4, 12.8>
// Result = <11, 12>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> RoundToZero(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> RoundToZero(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundToZeroTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintz v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
38. RoundToZeroScalar
Vector64<double> RoundToZeroScalar(Vector64<double> value)
This method rounds a vector of floating-point values in the value
vector to integral floating-point values of the same size using the Round towards Zero rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<double> RoundToZeroScalarTest(Vector64<double> value)
{
return AdvSimd.RoundToZeroScalar(value);
}
// value = <11.4>
// Result = <11>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> RoundToZeroScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:RoundToZeroScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintz d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
39. ShiftArithmetic
Vector64<short> ShiftArithmetic(Vector64<short> value, Vector64<short> count)
This method performs arithmetic shifts of each signed integer value in the value
vector, by the value in corresponding element of the count
vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a truncating right shift.
private Vector64<short> ShiftArithmeticTest(Vector64<short> value, Vector64<short> count)
{
return AdvSimd.ShiftArithmetic(value, count);
}
// value = <11, 12, 13, 14>
// count = <18, 2, 3, -2>
// Result = <0, 48, 104, 3>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftArithmetic(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmetic(Vector64<sbyte> value, Vector64<sbyte> count)
Vector128<short> ShiftArithmetic(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftArithmetic(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftArithmetic(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftArithmetic(Vector128<sbyte> value, Vector128<sbyte> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftArithmeticTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sshl v16.4h, v0.4h, v1.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
40. ShiftArithmeticRounded
Vector64<short> ShiftArithmeticRounded(Vector64<short> value, Vector64<short> count)
This method performs arithmetic shift of each signed integer value in the value
vector, by a value in corresponding element of the count
vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.
private Vector64<short> ShiftArithmeticRoundedTest(Vector64<short> value, Vector64<short> count)
{
return AdvSimd.ShiftArithmeticRounded(value, count);
}
// value = <11, 12, 13, 14>
// count = <18, 2, 3, -2>
// Result = <0, 48, 104, 4>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftArithmeticRounded(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticRounded(Vector64<sbyte> value, Vector64<sbyte> count)
Vector128<short> ShiftArithmeticRounded(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftArithmeticRounded(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftArithmeticRounded(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftArithmeticRounded(Vector128<sbyte> value, Vector128<sbyte> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftArithmeticRoundedTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
srshl v16.4h, v0.4h, v1.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
41. ShiftArithmeticRoundedSaturate
Vector64<short> ShiftArithmeticRoundedSaturate(Vector64<short> value, Vector64<short> count)
This method performs arithmetic shift of each vector element in the value
vector, by a value of the corresponding vector element of the count
vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded.
private Vector64<short> ShiftArithmeticRoundedSaturateTest(Vector64<short> value, Vector64<short> count)
{
return AdvSimd.ShiftArithmeticRoundedSaturate(value, count);
}
// value = <11, 12, 13, 14>
// count = <18, 2, 3, -2>
// Result = <32767, 48, 104, 4>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftArithmeticRoundedSaturate(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticRoundedSaturate(Vector64<sbyte> value, Vector64<sbyte> count)
Vector128<short> ShiftArithmeticRoundedSaturate(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftArithmeticRoundedSaturate(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftArithmeticRoundedSaturate(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftArithmeticRoundedSaturate(Vector128<sbyte> value, Vector128<sbyte> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftArithmeticRoundedSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrshl v16.4h, v0.4h, v1.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
42. ShiftArithmeticRoundedSaturateScalar
Vector64<long> ShiftArithmeticRoundedSaturateScalar(Vector64<long> value, Vector64<long> count)
This method performs arithmetic shift of 0th element in the value
vector, by a value of the corresponding 0th element in the count
vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are rounded.
private Vector64<long> ShiftArithmeticRoundedSaturateScalarTest(Vector64<long> value, Vector64<long> count)
{
return AdvSimd.ShiftArithmeticRoundedSaturateScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<short> ShiftArithmeticRoundedSaturateScalar(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftArithmeticRoundedSaturateScalar(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticRoundedSaturateScalar(Vector64<sbyte> value, Vector64<sbyte> count)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftArithmeticRoundedSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqrshl d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
43. ShiftArithmeticRoundedScalar
Vector64<long> ShiftArithmeticRoundedScalar(Vector64<long> value, Vector64<long> count)
This method performs arithmetic shift of each signed integer value in the value
vector, by a value of the corresponding element of the count
vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. If the shift value is negative, it is a rounding right shift.
private Vector64<long> ShiftArithmeticRoundedScalarTest(Vector64<long> value, Vector64<long> count)
{
return AdvSimd.ShiftArithmeticRoundedScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftArithmeticRoundedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
srshl d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
44. ShiftArithmeticSaturate
Vector64<short> ShiftArithmeticSaturate(Vector64<short> value, Vector64<short> count)
This method performs arithmetic shift of each element in the value
vector, by a value of the corresponding element of the count
vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated.
private Vector64<short> ShiftArithmeticSaturateTest(Vector64<short> value, Vector64<short> count)
{
return AdvSimd.ShiftArithmeticSaturate(value, count);
}
// value = <11, 12, 13, 14>
// count = <21, 22, 23, 24>
// Result = <32767, 32767, 32767, 32767>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> ShiftArithmeticSaturate(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticSaturate(Vector64<sbyte> value, Vector64<sbyte> count)
Vector128<short> ShiftArithmeticSaturate(Vector128<short> value, Vector128<short> count)
Vector128<int> ShiftArithmeticSaturate(Vector128<int> value, Vector128<int> count)
Vector128<long> ShiftArithmeticSaturate(Vector128<long> value, Vector128<long> count)
Vector128<sbyte> ShiftArithmeticSaturate(Vector128<sbyte> value, Vector128<sbyte> count)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftArithmeticSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshl v16.4h, v0.4h, v1.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
45. ShiftArithmeticSaturateScalar
Vector64<long> ShiftArithmeticSaturateScalar(Vector64<long> value, Vector64<long> count)
This method performs arithmetic shift of each element in the value
vector, by a value of the corresponding element of the count
vector, stores the results in a vector and returns the result vector. If the shift value is positive, the operation is a left shift. Otherwise, it is a right shift. The results are truncated.
private Vector64<long> ShiftArithmeticSaturateScalarTest(Vector64<long> value, Vector64<long> count)
{
return AdvSimd.ShiftArithmeticSaturateScalar(value, count);
}
// value = <11>
// count = <11>
// Result = <22528>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<short> ShiftArithmeticSaturateScalar(Vector64<short> value, Vector64<short> count)
Vector64<int> ShiftArithmeticSaturateScalar(Vector64<int> value, Vector64<int> count)
Vector64<sbyte> ShiftArithmeticSaturateScalar(Vector64<sbyte> value, Vector64<sbyte> count)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:ShiftArithmeticSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqshl d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8