Introduction
In my vectorization using .NET APIs blog, I described the SIMD datatypes Vector64<T> and Vector128<T>, which operate on the Arm64 hardware intrinsic APIs in the System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 classes. In this post I describe those hardware intrinsic APIs through sample code, example inputs and outputs, and the Arm64 code generated for each. This should help you understand these APIs so you can use them to optimize .NET code that targets Arm64. Since there are 360 APIs, describing all of them in a single post would be overwhelming, so I have divided them among 8 blogs, each demonstrating 45 APIs. This is part 4 of that blog series. You can check out my previous blogs at:
Most of the API descriptions are adapted from the Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile. You can also refer to the descriptions of the SIMD and floating-point instructions on the Arm developer documentation page.
The blog page is programmatically generated and might contain mistakes. If you find any mistake, please leave a comment and I will address it.
APIs covered
1. FusedMultiplySubtractScalarBySelectedScalar
Vector64<double> FusedMultiplySubtractScalarBySelectedScalar(Vector64<double> minuend, Vector64<double> left, Vector128<double> right, byte rightIndex)
This method multiplies the scalar in the left vector by the rightIndex element of the right vector, subtracts the product from the scalar in the minuend vector, and returns the result. The multiply and subtract are fused, so the result is rounded only once.
private Vector64<double> FusedMultiplySubtractScalarBySelectedScalarTest(Vector64<double> minuend, Vector64<double> left, Vector128<double> right, byte rightIndex)
{
return AdvSimd.Arm64.FusedMultiplySubtractScalarBySelectedScalar(minuend, left, right, 0);
}
// minuend = <11.5>
// left = <11.5>
// right = <11.5, 12.5>
// rightIndex = 0
// Result = <-120.75>
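To check the example arithmetic without Arm64 hardware, here is a minimal scalar model. FmlsModel and MultiplySubtractBySelected are hypothetical names used only for illustration; the sketch assumes .NET Core 3.0 or later for Math.FusedMultiplyAdd, which rounds the multiply-add only once, matching the fused behavior described above.

```csharp
using System;

// Hypothetical scalar model of FusedMultiplySubtractScalarBySelectedScalar.
// It reproduces the arithmetic only; it does not call the hardware intrinsic.
public static class FmlsModel
{
    public static double MultiplySubtractBySelected(double minuend, double left, double[] right, int rightIndex)
    {
        // minuend - left * right[rightIndex] with a single rounding.
        // Negating 'left' is exact, so FMA(-left, r, minuend) is the fused multiply-subtract.
        return Math.FusedMultiplyAdd(-left, right[rightIndex], minuend);
    }
}
```

With the example inputs, 11.5 - 11.5 * 11.5 = 11.5 - 132.25 = -120.75.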
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<float> FusedMultiplySubtractScalarBySelectedScalar(Vector64<float> minuend, Vector64<float> left, Vector64<float> right, byte rightIndex)
Vector64<float> FusedMultiplySubtractScalarBySelectedScalar(Vector64<float> minuend, Vector64<float> left, Vector128<float> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedMultiplySubtractScalarBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector128`1[Double],ubyte):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd16 -> d2 HFA(simd16)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmls d0, d1, v2.d[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
2. FusedSubtractHalving
Vector64<byte> FusedSubtractHalving(Vector64<byte> left, Vector64<byte> right)
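This method subtracts each element of the right vector from the corresponding element of the left vector, shifts each difference right by one bit, and returns the resulting vector. The difference is computed without intermediate overflow, so for unsigned elements a negative difference wraps only after the shift: (11 - 21) >> 1 = -5, which is 251 as a byte.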
private Vector64<byte> FusedSubtractHalvingTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.FusedSubtractHalving(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <251, 251, 251, 251, 251, 251, 251, 251>
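The wrap-around to 251 is easier to see in a scalar model that forms the difference in a wider type before shifting. This is a sketch under that assumption; HalvingSubtractModel is a hypothetical helper, not part of .NET, and it models only the byte (UHSUB) and sbyte (SHSUB) overloads.

```csharp
// Hypothetical scalar model of FusedSubtractHalving for 8-bit lanes.
// The difference is formed as an int, so it cannot overflow before the shift.
public static class HalvingSubtractModel
{
    public static byte[] SubtractHalving(byte[] left, byte[] right)
    {
        var result = new byte[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            int diff = left[i] - right[i];             // may be negative, e.g. 11 - 21 = -10
            result[i] = unchecked((byte)(diff >> 1));  // arithmetic shift, then keep the low 8 bits
        }
        return result;
    }

    public static sbyte[] SubtractHalving(sbyte[] left, sbyte[] right)
    {
        var result = new sbyte[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            int diff = left[i] - right[i];             // range -255..255, fits easily in an int
            result[i] = unchecked((sbyte)(diff >> 1)); // halved value always fits in 8 bits
        }
        return result;
    }
}
```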
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> FusedSubtractHalving(Vector64<short> left, Vector64<short> right)
Vector64<int> FusedSubtractHalving(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> FusedSubtractHalving(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> FusedSubtractHalving(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> FusedSubtractHalving(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> FusedSubtractHalving(Vector128<byte> left, Vector128<byte> right)
Vector128<short> FusedSubtractHalving(Vector128<short> left, Vector128<short> right)
Vector128<int> FusedSubtractHalving(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> FusedSubtractHalving(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> FusedSubtractHalving(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> FusedSubtractHalving(Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:FusedSubtractHalvingTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uhsub v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
3. Insert
Vector64<byte> Insert(Vector64<byte> vector, byte index, byte data)
This method returns a copy of vector in which the element at index is replaced by data.
private Vector64<byte> InsertTest(Vector64<byte> vector, byte index, byte data)
{
return AdvSimd.Insert(vector, 4, 200);
}
// vector = <11, 12, 13, 14, 15, 16, 17, 18>
// index = 4
// data = 200
// Result = <11, 12, 13, 14, 200, 16, 17, 18>
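A common pattern is to call the intrinsic only when the hardware supports it and fall back to a portable API otherwise. The sketch below assumes .NET 5 or later (where AdvSimd is available) and uses Vector64.WithElement as the fallback; InsertExample and InsertAtFour are hypothetical names. The index is written as the constant 4, as in the sample above.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class InsertExample
{
    // Returns a copy of 'vector' with element 4 replaced by 'data'.
    public static Vector64<byte> InsertAtFour(Vector64<byte> vector, byte data)
    {
        if (AdvSimd.IsSupported)
        {
            // Arm64 path: expected to become an INS instruction, as in the listing for this section.
            return AdvSimd.Insert(vector, 4, data);
        }

        // Portable fallback for other platforms.
        return vector.WithElement(4, data);
    }
}
```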
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> Insert(Vector64<short> vector, byte index, short data)
Vector64<int> Insert(Vector64<int> vector, byte index, int data)
Vector64<sbyte> Insert(Vector64<sbyte> vector, byte index, sbyte data)
Vector64<float> Insert(Vector64<float> vector, byte index, float data)
Vector64<ushort> Insert(Vector64<ushort> vector, byte index, ushort data)
Vector64<uint> Insert(Vector64<uint> vector, byte index, uint data)
Vector128<byte> Insert(Vector128<byte> vector, byte index, byte data)
Vector128<double> Insert(Vector128<double> vector, byte index, double data)
Vector128<short> Insert(Vector128<short> vector, byte index, short data)
Vector128<int> Insert(Vector128<int> vector, byte index, int data)
Vector128<long> Insert(Vector128<long> vector, byte index, long data)
Vector128<sbyte> Insert(Vector128<sbyte> vector, byte index, sbyte data)
Vector128<float> Insert(Vector128<float> vector, byte index, float data)
Vector128<ushort> Insert(Vector128<ushort> vector, byte index, ushort data)
Vector128<uint> Insert(Vector128<uint> vector, byte index, uint data)
Vector128<ulong> Insert(Vector128<ulong> vector, byte index, ulong data)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:InsertTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte,ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mov w0, #200
ins v0.b[4], w0
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
4. InsertScalar
Vector128<double> InsertScalar(Vector128<double> result, byte resultIndex, Vector64<double> value)
This method returns a copy of the result vector in which the element at resultIndex is replaced by the element of the value vector.
private Vector128<double> InsertScalarTest(Vector128<double> result, byte resultIndex, Vector64<double> value)
{
return AdvSimd.InsertScalar(result, 1, value);
}
// result = <5.5, 5.5>
// resultIndex = 1
// value = <15.5>
// Result = <5.5, 15.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<long> InsertScalar(Vector128<long> result, byte resultIndex, Vector64<long> value)
Vector128<ulong> InsertScalar(Vector128<ulong> result, byte resultIndex, Vector64<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:InsertScalarTest(System.Runtime.Intrinsics.Vector128`1[Double],ubyte,System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector128`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
; V02 arg2 [V02,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ins v0.d[1], v1.d[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
5. InsertSelectedScalar
Vector64<byte> InsertSelectedScalar(Vector64<byte> result, byte resultIndex, Vector64<byte> value, byte valueIndex)
This method returns a copy of the result vector in which the element at resultIndex is replaced by the element at valueIndex of the value vector.
private Vector64<byte> InsertSelectedScalarTest(Vector64<byte> result, byte resultIndex, Vector64<byte> value, byte valueIndex)
{
return AdvSimd.Arm64.InsertSelectedScalar(result, 0, value, 1);
}
// result = <11, 12, 13, 14, 15, 16, 17, 18>
// resultIndex = 0
// value = <21, 22, 23, 24, 25, 26, 27, 28>
// valueIndex = 1
// Result = <22, 12, 13, 14, 15, 16, 17, 18>
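The same guarded pattern works here: use the Arm64 intrinsic when available, otherwise combine GetElement and WithElement. This sketch assumes .NET 5 or later; InsertSelectedScalarExample and CopyElement are hypothetical names, and the indices are fixed constants as in the sample above.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

public static class InsertSelectedScalarExample
{
    // Returns a copy of 'result' with element 0 replaced by element 1 of 'value'.
    public static Vector64<byte> CopyElement(Vector64<byte> result, Vector64<byte> value)
    {
        if (AdvSimd.Arm64.IsSupported)
        {
            // Arm64 path: expected to become "ins v0.b[0], v1.b[1]" as in the listing for this section.
            return AdvSimd.Arm64.InsertSelectedScalar(result, 0, value, 1);
        }

        // Portable fallback: read the source lane, then write it into the destination lane.
        return result.WithElement(0, value.GetElement(1));
    }
}
```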
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<byte> InsertSelectedScalar(Vector64<byte> result, byte resultIndex, Vector128<byte> value, byte valueIndex)
Vector64<short> InsertSelectedScalar(Vector64<short> result, byte resultIndex, Vector64<short> value, byte valueIndex)
Vector64<short> InsertSelectedScalar(Vector64<short> result, byte resultIndex, Vector128<short> value, byte valueIndex)
Vector64<int> InsertSelectedScalar(Vector64<int> result, byte resultIndex, Vector64<int> value, byte valueIndex)
Vector64<int> InsertSelectedScalar(Vector64<int> result, byte resultIndex, Vector128<int> value, byte valueIndex)
Vector64<sbyte> InsertSelectedScalar(Vector64<sbyte> result, byte resultIndex, Vector64<sbyte> value, byte valueIndex)
Vector64<sbyte> InsertSelectedScalar(Vector64<sbyte> result, byte resultIndex, Vector128<sbyte> value, byte valueIndex)
Vector64<float> InsertSelectedScalar(Vector64<float> result, byte resultIndex, Vector64<float> value, byte valueIndex)
Vector64<float> InsertSelectedScalar(Vector64<float> result, byte resultIndex, Vector128<float> value, byte valueIndex)
Vector64<ushort> InsertSelectedScalar(Vector64<ushort> result, byte resultIndex, Vector64<ushort> value, byte valueIndex)
Vector64<ushort> InsertSelectedScalar(Vector64<ushort> result, byte resultIndex, Vector128<ushort> value, byte valueIndex)
Vector64<uint> InsertSelectedScalar(Vector64<uint> result, byte resultIndex, Vector64<uint> value, byte valueIndex)
Vector64<uint> InsertSelectedScalar(Vector64<uint> result, byte resultIndex, Vector128<uint> value, byte valueIndex)
Vector128<byte> InsertSelectedScalar(Vector128<byte> result, byte resultIndex, Vector64<byte> value, byte valueIndex)
Vector128<byte> InsertSelectedScalar(Vector128<byte> result, byte resultIndex, Vector128<byte> value, byte valueIndex)
Vector128<double> InsertSelectedScalar(Vector128<double> result, byte resultIndex, Vector128<double> value, byte valueIndex)
Vector128<short> InsertSelectedScalar(Vector128<short> result, byte resultIndex, Vector64<short> value, byte valueIndex)
Vector128<short> InsertSelectedScalar(Vector128<short> result, byte resultIndex, Vector128<short> value, byte valueIndex)
Vector128<int> InsertSelectedScalar(Vector128<int> result, byte resultIndex, Vector64<int> value, byte valueIndex)
Vector128<int> InsertSelectedScalar(Vector128<int> result, byte resultIndex, Vector128<int> value, byte valueIndex)
Vector128<long> InsertSelectedScalar(Vector128<long> result, byte resultIndex, Vector128<long> value, byte valueIndex)
Vector128<sbyte> InsertSelectedScalar(Vector128<sbyte> result, byte resultIndex, Vector64<sbyte> value, byte valueIndex)
Vector128<sbyte> InsertSelectedScalar(Vector128<sbyte> result, byte resultIndex, Vector128<sbyte> value, byte valueIndex)
Vector128<float> InsertSelectedScalar(Vector128<float> result, byte resultIndex, Vector64<float> value, byte valueIndex)
Vector128<float> InsertSelectedScalar(Vector128<float> result, byte resultIndex, Vector128<float> value, byte valueIndex)
Vector128<ushort> InsertSelectedScalar(Vector128<ushort> result, byte resultIndex, Vector64<ushort> value, byte valueIndex)
Vector128<ushort> InsertSelectedScalar(Vector128<ushort> result, byte resultIndex, Vector128<ushort> value, byte valueIndex)
Vector128<uint> InsertSelectedScalar(Vector128<uint> result, byte resultIndex, Vector64<uint> value, byte valueIndex)
Vector128<uint> InsertSelectedScalar(Vector128<uint> result, byte resultIndex, Vector128<uint> value, byte valueIndex)
Vector128<ulong> InsertSelectedScalar(Vector128<ulong> result, byte resultIndex, Vector128<ulong> value, byte valueIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:InsertSelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte,System.Runtime.Intrinsics.Vector64`1[Byte],ubyte):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
; V02 arg2 [V02,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ins v0.b[0], v1.b[1]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
6. LeadingSignCount
Vector64<short> LeadingSignCount(Vector64<short> value)
This method counts, for each element of the value vector, the number of consecutive bits after the most significant bit that have the same value as the most significant bit, and stores the counts in the result vector. The most significant bit itself is not counted.
private Vector64<short> LeadingSignCountTest(Vector64<short> value)
{
return AdvSimd.LeadingSignCount(value);
}
// value = <32757, 165, 0, 15>
// Result = <0, 7, 15, 11>
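A scalar model makes the counting rule concrete: flipping a negative value turns its sign bits into leading zeros, so the count becomes a 16-bit leading zero count minus one for the sign bit itself. LeadingSignCountModel is a hypothetical helper; it assumes System.Numerics.BitOperations (.NET Core 3.0 or later) and models 16-bit lanes only.

```csharp
using System.Numerics;

// Hypothetical scalar model of LeadingSignCount (CLS) for one 16-bit lane.
public static class LeadingSignCountModel
{
    public static int CountLeadingSignBits(short value)
    {
        int v = value;
        if (v < 0)
        {
            v = ~v; // negative input: the copies of the sign bit become leading zeros
        }

        // v is now in [0, 32767]; count leading zeros within 16 bits,
        // then subtract 1 because the sign bit itself is not counted.
        int leadingZeros16 = BitOperations.LeadingZeroCount((uint)v) - 16;
        return leadingZeros16 - 1;
    }
}
```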
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> LeadingSignCount(Vector64<int> value)
Vector64<sbyte> LeadingSignCount(Vector64<sbyte> value)
Vector128<short> LeadingSignCount(Vector128<short> value)
Vector128<int> LeadingSignCount(Vector128<int> value)
Vector128<sbyte> LeadingSignCount(Vector128<sbyte> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:LeadingSignCountTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
cls v16.4h, v0.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
7. LeadingZeroCount
Vector64<byte> LeadingZeroCount(Vector64<byte> value)
This method counts the number of zero bits before the first one bit in each element of the value vector and writes the counts to the result vector.
private Vector64<byte> LeadingZeroCountTest(Vector64<byte> value)
{
return AdvSimd.LeadingZeroCount(value);
}
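// value = <32757, 165, 0, 15> (16-bit values; these results correspond to the Vector64<ushort> overload, not the byte overload above)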
// Result = <1, 8, 16, 12>
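Because the count depends on the lane width, the same bit pattern gives different results for 8-bit and 16-bit elements. This hypothetical LeadingZeroCountModel (assuming BitOperations from .NET Core 3.0 or later) covers both widths; the 16-bit overload reproduces the example results above.

```csharp
using System.Numerics;

// Hypothetical scalar model of LeadingZeroCount (CLZ) for one lane.
public static class LeadingZeroCountModel
{
    // 8-bit lane: the 32-bit count includes 24 high bits that are always zero.
    public static int LeadingZeroCount(byte value) => BitOperations.LeadingZeroCount((uint)value) - 24;

    // 16-bit lane: subtract the 16 high bits that are always zero.
    public static int LeadingZeroCount(ushort value) => BitOperations.LeadingZeroCount((uint)value) - 16;
}
```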
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> LeadingZeroCount(Vector64<short> value)
Vector64<int> LeadingZeroCount(Vector64<int> value)
Vector64<sbyte> LeadingZeroCount(Vector64<sbyte> value)
Vector64<ushort> LeadingZeroCount(Vector64<ushort> value)
Vector64<uint> LeadingZeroCount(Vector64<uint> value)
Vector128<byte> LeadingZeroCount(Vector128<byte> value)
Vector128<short> LeadingZeroCount(Vector128<short> value)
Vector128<int> LeadingZeroCount(Vector128<int> value)
Vector128<sbyte> LeadingZeroCount(Vector128<sbyte> value)
Vector128<ushort> LeadingZeroCount(Vector128<ushort> value)
Vector128<uint> LeadingZeroCount(Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:LeadingZeroCountTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
clz v16.8b, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
8. LoadAndInsertScalar
Vector64<byte> LoadAndInsertScalar(Vector64<byte> value, byte index, byte* address)
This method loads a single-element structure from memory at address, writes it to the element at index of the value vector, and returns the result; the other elements are unchanged.
private unsafe Vector64<byte> LoadAndInsertScalarTest(Vector64<byte> value, byte index, byte* address)
{
return AdvSimd.LoadAndInsertScalar(value, 2, address);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// index = 2
// address = Address of byte[]{ 21, 22, 23, 24, 25, 26, 27, 28 }
// Result = <11, 12, 21, 14, 15, 16, 17, 18>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> LoadAndInsertScalar(Vector64<short> value, byte index, short* address)
Vector64<int> LoadAndInsertScalar(Vector64<int> value, byte index, int* address)
Vector64<sbyte> LoadAndInsertScalar(Vector64<sbyte> value, byte index, sbyte* address)
Vector64<float> LoadAndInsertScalar(Vector64<float> value, byte index, float* address)
Vector64<ushort> LoadAndInsertScalar(Vector64<ushort> value, byte index, ushort* address)
Vector64<uint> LoadAndInsertScalar(Vector64<uint> value, byte index, uint* address)
Vector128<byte> LoadAndInsertScalar(Vector128<byte> value, byte index, byte* address)
Vector128<double> LoadAndInsertScalar(Vector128<double> value, byte index, double* address)
Vector128<short> LoadAndInsertScalar(Vector128<short> value, byte index, short* address)
Vector128<int> LoadAndInsertScalar(Vector128<int> value, byte index, int* address)
Vector128<long> LoadAndInsertScalar(Vector128<long> value, byte index, long* address)
Vector128<sbyte> LoadAndInsertScalar(Vector128<sbyte> value, byte index, sbyte* address)
Vector128<float> LoadAndInsertScalar(Vector128<float> value, byte index, float* address)
Vector128<ushort> LoadAndInsertScalar(Vector128<ushort> value, byte index, ushort* address)
Vector128<uint> LoadAndInsertScalar(Vector128<uint> value, byte index, uint* address)
Vector128<ulong> LoadAndInsertScalar(Vector128<ulong> value, byte index, ulong* address)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:LoadAndInsertScalarTest(System.Runtime.Intrinsics.Vector64`1[Byte],ubyte,long):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T01] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;* V01 arg1 [V01 ] ( 0, 0 ) ubyte -> zero-ref
; V02 arg2 [V02,T00] ( 3, 3 ) long -> x1
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mov v16.8b, v0.8b
ld1 {v16.b}[2], [x1]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 28, prolog size 8
9. LoadAndReplicateToVector128
Vector128<byte> LoadAndReplicateToVector128(byte* address)
This method loads a single-element structure from memory at address and replicates the value to all the elements of the result vector.
private unsafe Vector128<byte> LoadAndReplicateToVector128Test(byte* address)
{
return AdvSimd.LoadAndReplicateToVector128(address);
}
// address = Address of byte[]{ 11}
// Result = <11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11>
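Outside Arm64-specific code, the result of LoadAndReplicateToVector128 can be expressed with the portable Vector128.Create broadcast overload. In this sketch ReplicateExample is a hypothetical name, and the pointer is replaced by a ReadOnlySpan<byte> so the code runs without an unsafe context; it models the result, not the ld1r instruction the JIT emits.

```csharp
using System;
using System.Runtime.Intrinsics;

public static class ReplicateExample
{
    // Reads one byte and broadcasts it to all 16 lanes of the result.
    public static Vector128<byte> LoadAndReplicate(ReadOnlySpan<byte> source) => Vector128.Create(source[0]);
}
```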
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<short> LoadAndReplicateToVector128(short* address)
Vector128<int> LoadAndReplicateToVector128(int* address)
Vector128<sbyte> LoadAndReplicateToVector128(sbyte* address)
Vector128<float> LoadAndReplicateToVector128(float* address)
Vector128<ushort> LoadAndReplicateToVector128(ushort* address)
Vector128<uint> LoadAndReplicateToVector128(uint* address)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> LoadAndReplicateToVector128(double* address)
Vector128<long> LoadAndReplicateToVector128(long* address)
Vector128<ulong> LoadAndReplicateToVector128(ulong* address)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:LoadAndReplicateToVector128Test(long):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) long -> x0
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ld1r {v16.16b}, [x0]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
10. LoadAndReplicateToVector64
Vector64<byte> LoadAndReplicateToVector64(byte* address)
This method loads a single-element structure from memory at address and replicates the value to all the elements of the result vector.
private unsafe Vector64<byte> LoadAndReplicateToVector64Test(byte* address)
{
return AdvSimd.LoadAndReplicateToVector64(address);
}
// address = Address of byte[]{ 11}
// Result = <11, 11, 11, 11, 11, 11, 11, 11>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> LoadAndReplicateToVector64(short* address)
Vector64<int> LoadAndReplicateToVector64(int* address)
Vector64<sbyte> LoadAndReplicateToVector64(sbyte* address)
Vector64<float> LoadAndReplicateToVector64(float* address)
Vector64<ushort> LoadAndReplicateToVector64(ushort* address)
Vector64<uint> LoadAndReplicateToVector64(uint* address)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:LoadAndReplicateToVector64Test(long):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) long -> x0
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ld1r {v16.8b}, [x0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
11. LoadVector128
Vector128<byte> LoadVector128(byte* address)
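This method loads 16 bytes (one full Vector128<T>) from memory starting at address and returns them as the result vector. It always reads all 16 bytes: it does not check bounds and does not zero-fill a shorter buffer, so the caller must make sure that much memory is readable. In the example below the array has only 14 elements, so the last two bytes were read from past the end of the array; they happened to be 0 in this run, but that is not guaranteed.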
private unsafe Vector128<byte> LoadVector128Test(byte* address)
{
return AdvSimd.LoadVector128(address);
}
// address = Address of new byte[14] { 21, 22, 23, 24, 25, 26, 27, 28, 1, 2, 23, 24, 25, 26}
// Result = <21, 22, 23, 24, 25, 26, 27, 28, 1, 2, 23, 24, 25, 26, 0, 0>
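Because LoadVector128 never zero-fills, a tail shorter than 16 bytes should be copied into a zeroed buffer before it is loaded. The sketch below is one way to do that under stated assumptions: SafeLoadExample and LoadPadded are hypothetical names, and MemoryMarshal.Read stands in for the load so the code runs on any platform. On Arm64 the buffer could instead be pinned and passed to AdvSimd.LoadVector128.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class SafeLoadExample
{
    // Loads up to 16 bytes from 'source'. Shorter inputs are copied into a zeroed
    // 16-byte buffer first, so nothing past the end of 'source' is ever read.
    public static Vector128<byte> LoadPadded(ReadOnlySpan<byte> source)
    {
        if (source.Length >= 16)
        {
            return MemoryMarshal.Read<Vector128<byte>>(source); // reads exactly the first 16 bytes
        }

        Span<byte> buffer = stackalloc byte[16];
        buffer.Clear();           // zero explicitly instead of relying on stackalloc contents
        source.CopyTo(buffer);    // copy the short tail; remaining bytes stay 0
        return MemoryMarshal.Read<Vector128<byte>>(buffer);
    }
}
```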
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<double> LoadVector128(double* address)
Vector128<short> LoadVector128(short* address)
Vector128<int> LoadVector128(int* address)
Vector128<long> LoadVector128(long* address)
Vector128<sbyte> LoadVector128(sbyte* address)
Vector128<float> LoadVector128(float* address)
Vector128<ushort> LoadVector128(ushort* address)
Vector128<uint> LoadVector128(uint* address)
Vector128<ulong> LoadVector128(ulong* address)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:LoadVector128Test(long):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) long -> x0
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ld1 {v16.16b}, [x0]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
12. LoadVector64
Vector64<byte> LoadVector64(byte* address)
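This method loads 8 bytes (one full Vector64<T>) from memory starting at address and returns them as the result vector. Like LoadVector128, it always reads the full 8 bytes without bounds checks or zero-filling, so the caller must make sure that much memory is readable. In the example below only the first 8 of the 14 array elements are loaded.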
private unsafe Vector64<byte> LoadVector64Test(byte* address)
{
return AdvSimd.LoadVector64(address);
}
// address = Address of new byte[14] { 21, 22, 23, 24, 25, 26, 27, 28, 1, 2, 23, 24, 25, 26}
// Result = <21, 22, 23, 24, 25, 26, 27, 28>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<double> LoadVector64(double* address)
Vector64<short> LoadVector64(short* address)
Vector64<int> LoadVector64(int* address)
Vector64<long> LoadVector64(long* address)
Vector64<sbyte> LoadVector64(sbyte* address)
Vector64<float> LoadVector64(float* address)
Vector64<ushort> LoadVector64(ushort* address)
Vector64<uint> LoadVector64(uint* address)
Vector64<ulong> LoadVector64(ulong* address)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:LoadVector64Test(long):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) long -> x0
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
ld1 {v16.8b}, [x0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
13. Max
Vector64<byte> Max(Vector64<byte> left, Vector64<byte> right)
This method compares corresponding elements of the left and right vectors, places the larger of each pair in the result vector, and returns the result vector.
private Vector64<byte> MaxTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.Max(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <21, 22, 23, 24, 25, 26, 27, 28>
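The element type decides whether the comparison is unsigned (umax, as in the listing for this section) or signed, so the same bit pattern can produce different results. MaxModel is a hypothetical scalar model used only to illustrate this.

```csharp
// Hypothetical scalar model of Max for 8-bit lanes.
public static class MaxModel
{
    // byte overload: lanes are compared as unsigned values.
    public static byte[] Max(byte[] left, byte[] right)
    {
        var result = new byte[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            result[i] = left[i] > right[i] ? left[i] : right[i];
        }
        return result;
    }

    // sbyte overload: the same bit patterns are compared as signed values.
    public static sbyte[] Max(sbyte[] left, sbyte[] right)
    {
        var result = new sbyte[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            result[i] = left[i] > right[i] ? left[i] : right[i];
        }
        return result;
    }
}
```

For example, the bit pattern 0xC8 is 200 as a byte but -56 as an sbyte, so it wins against 100 only in the unsigned case.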
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> Max(Vector64<short> left, Vector64<short> right)
Vector64<int> Max(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> Max(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Max(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Max(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Max(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> Max(Vector128<byte> left, Vector128<byte> right)
Vector128<short> Max(Vector128<short> left, Vector128<short> right)
Vector128<int> Max(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> Max(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Max(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Max(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Max(Vector128<uint> left, Vector128<uint> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> Max(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umax v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
14. MaxAcross
Vector64<byte> MaxAcross(Vector64<byte> value)
This method compares all the elements of the value vector and writes the largest one to element 0 of the result vector; the other elements of the result are set to 0.
private Vector64<byte> MaxAcrossTest(Vector64<byte> value)
{
return AdvSimd.Arm64.MaxAcross(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <18, 0, 0, 0, 0, 0, 0, 0>
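A scalar model of this reduction, assuming the behavior described above: MaxAcrossModel is a hypothetical helper that always returns 8 lanes, because the byte overloads return a Vector64<byte> even for a Vector128<byte> input.

```csharp
// Hypothetical scalar model of MaxAcross for byte lanes.
public static class MaxAcrossModel
{
    public static byte[] MaxAcross(byte[] value)
    {
        // Reduce all input lanes (8 or 16) to their unsigned maximum.
        byte max = value[0];
        for (int i = 1; i < value.Length; i++)
        {
            if (value[i] > max)
            {
                max = value[i];
            }
        }

        // The result is a Vector64<byte>: lane 0 holds the maximum, lanes 1..7 are zero.
        var result = new byte[8];
        result[0] = max;
        return result;
    }
}
```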
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<short> MaxAcross(Vector64<short> value)
Vector64<sbyte> MaxAcross(Vector64<sbyte> value)
Vector64<ushort> MaxAcross(Vector64<ushort> value)
Vector64<byte> MaxAcross(Vector128<byte> value)
Vector64<short> MaxAcross(Vector128<short> value)
Vector64<int> MaxAcross(Vector128<int> value)
Vector64<sbyte> MaxAcross(Vector128<sbyte> value)
Vector64<float> MaxAcross(Vector128<float> value)
Vector64<ushort> MaxAcross(Vector128<ushort> value)
Vector64<uint> MaxAcross(Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxAcrossTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umaxv b16, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
15. MaxNumber
Vector64<float> MaxNumber(Vector64<float> left, Vector64<float> right)
This method compares corresponding elements of the left and right vectors, places the larger of each pair in the result vector, and returns the result vector. As per the Arm docs, NaNs are handled according to the IEEE 754-2008 standard: if one element is numeric and the other is a quiet NaN, the numeric value is placed in the result vector.
private Vector64<float> MaxNumberTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.MaxNumber(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <21.5, 22.5>
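The NaN rule is what distinguishes MaxNumber from Max and from Math.Max, both of which return NaN when either operand is NaN. A scalar model for one lane is sketched below; MaxNumberModel is a hypothetical helper, and it only models quiet NaNs (signaling NaNs are not handled).

```csharp
using System;

// Hypothetical scalar model of MaxNumber (FMAXNM) for one float lane.
public static class MaxNumberModel
{
    public static float MaxNumber(float left, float right)
    {
        if (float.IsNaN(left)) return right;  // a NaN operand is ignored; NaN only if both are NaN
        if (float.IsNaN(right)) return left;
        return MathF.Max(left, right);
    }
}
```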
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<float> MaxNumber(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> MaxNumber(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxNumberTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmaxnm v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
16. MaxNumberAcross
Vector64<float> MaxNumberAcross(Vector128<float> value)
This method compares all the elements of the value vector and writes the largest one to element 0 of the result vector; the other elements are set to 0. As per the Arm docs, NaNs are handled according to the IEEE 754-2008 standard: if one element is numeric and the other is a quiet NaN, the result of that comparison is the numeric value; otherwise the result is identical to MaxScalar().
private Vector64<float> MaxNumberAcrossTest(Vector128<float> value)
{
return AdvSimd.Arm64.MaxNumberAcross(value);
}
// value = <11.5, 12.5, 13.5, 14.5>
// Result = <14.5, 0>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxNumberAcrossTest(System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmaxnmv s16, v0.4s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
17. MaxNumberPairwise
Vector64<float> MaxNumberPairwise(Vector64<float> left, Vector64<float> right)
This method concatenates the elements of the left vector followed by those of the right vector, compares each pair of adjacent elements, and writes the larger element of each pair to the result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard: if one vector element is numeric and the other is a quiet NaN, the result is the numerical value.
private Vector64<float> MaxNumberPairwiseTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.Arm64.MaxNumberPairwise(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <12.5, 22.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> MaxNumberPairwise(Vector128<double> left, Vector128<double> right)
Vector128<float> MaxNumberPairwise(Vector128<float> left, Vector128<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxNumberPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmaxnmp v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
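The lane order of the pairwise result is the part that is easiest to get wrong, so here is a minimal scalar sketch of it. MaxNumberPairwiseModel is hypothetical (not a .NET type), assumes left and right have the same lane count, and treats NaNs as quiet NaNs.

```csharp
using System;

// Hypothetical scalar model of AdvSimd.Arm64.MaxNumberPairwise for float lanes.
public static class MaxNumberPairwiseModel
{
    // IEEE 754-2008 maxNum on quiet NaNs.
    static float MaxNum(float a, float b) =>
        float.IsNaN(a) ? b : float.IsNaN(b) ? a : Math.Max(a, b);

    public static float[] Compute(float[] left, float[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("left and right must have the same number of lanes");

        // Concatenate left followed by right.
        var concat = new float[left.Length + right.Length];
        left.CopyTo(concat, 0);
        right.CopyTo(concat, left.Length);

        // Result lane i is maxNum(concat[2i], concat[2i + 1]), so the lower half
        // of the result comes from left and the upper half from right.
        var result = new float[left.Length];
        for (int i = 0; i < result.Length; i++)
            result[i] = MaxNum(concat[2 * i], concat[2 * i + 1]);
        return result;
    }
}
```

With the sample inputs, Compute(new[] { 11.5f, 12.5f }, new[] { 21.5f, 22.5f }) returns { 12.5, 22.5 }, matching the Result line above.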
18. MaxNumberPairwiseScalar
Vector64<float> MaxNumberPairwiseScalar(Vector64<float> value)
This method compares the two elements of the value vector and writes the larger one to the 0th element of the result vector; any remaining element is set to 0. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard: if one vector element is numeric and the other is a quiet NaN, the result is the numerical value.
private Vector64<float> MaxNumberPairwiseScalarTest(Vector64<float> value)
{
return AdvSimd.Arm64.MaxNumberPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <12.5, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<double> MaxNumberPairwiseScalar(Vector128<double> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxNumberPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmaxnmp s16, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
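One place MaxNumberPairwiseScalar can show up is the final step of a horizontal maximum. The sketch below is a hypothetical helper (HorizontalMaxNumber is not a .NET API) that accumulates lane-wise with AdvSimd.MaxNumber, folds four lanes to two with MaxNumberPairwise, and finishes with MaxNumberPairwiseScalar. It guards the vector path with AdvSimd.Arm64.IsSupported and falls back to a scalar maxNum loop, so it also runs on non-Arm64 machines; it assumes quiet NaNs only.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical helper: IEEE 754-2008 maxNum over a whole float array.
public static class HorizontalMaxNumber
{
    // Scalar maxNum: a quiet NaN loses against a number.
    public static float MaxNum(float a, float b) =>
        float.IsNaN(a) ? b : float.IsNaN(b) ? a : Math.Max(a, b);

    public static float Compute(float[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("values must not be empty");

        int i = 0;
        float result = float.NaN; // maxNum(NaN, x) == x, so NaN acts as the identity

        if (AdvSimd.Arm64.IsSupported && values.Length >= 4)
        {
            // Lane-wise maxNum over blocks of four floats.
            Vector128<float> acc = Vector128.Create(values[0], values[1], values[2], values[3]);
            for (i = 4; i + 4 <= values.Length; i += 4)
                acc = AdvSimd.MaxNumber(acc, Vector128.Create(values[i], values[i + 1], values[i + 2], values[i + 3]));

            // 4 lanes -> 2 lanes: lower half is <maxNum(a0, a1), maxNum(a2, a3)>.
            Vector128<float> pairs = AdvSimd.Arm64.MaxNumberPairwise(acc, acc);

            // 2 lanes -> 1 lane.
            result = AdvSimd.Arm64.MaxNumberPairwiseScalar(pairs.GetLower()).ToScalar();
        }

        // Scalar path: the leftover tail on Arm64, or every element elsewhere.
        for (; i < values.Length; i++)
            result = MaxNum(result, values[i]);
        return result;
    }
}
```

On Arm64 the last two vector steps could likely be replaced by a single MaxNumberAcross(acc); the pairwise form is used here only to illustrate MaxNumberPairwiseScalar.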
19. MaxNumberScalar
Vector64<double> MaxNumberScalar(Vector64<double> left, Vector64<double> right)
This method compares the 0th elements of the left and right vectors and stores the larger value in the result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard: if one element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value.
private Vector64<double> MaxNumberScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.MaxNumberScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <11.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> MaxNumberScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxNumberScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmaxnm d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
20. MaxPairwise
Vector64<byte> MaxPairwise(Vector64<byte> left, Vector64<byte> right)
This method concatenates the elements of the left vector followed by those of the right vector, takes each pair of adjacent elements, and writes the larger element of each pair to the result vector. As the example shows, the lower half of the result comes from left and the upper half from right.
private Vector64<byte> MaxPairwiseTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.MaxPairwise(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <12, 14, 16, 18, 22, 24, 26, 28>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> MaxPairwise(Vector64<short> left, Vector64<short> right)
Vector64<int> MaxPairwise(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> MaxPairwise(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> MaxPairwise(Vector64<float> left, Vector64<float> right)
Vector64<ushort> MaxPairwise(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MaxPairwise(Vector64<uint> left, Vector64<uint> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<byte> MaxPairwise(Vector128<byte> left, Vector128<byte> right)
Vector128<double> MaxPairwise(Vector128<double> left, Vector128<double> right)
Vector128<short> MaxPairwise(Vector128<short> left, Vector128<short> right)
Vector128<int> MaxPairwise(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> MaxPairwise(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> MaxPairwise(Vector128<float> left, Vector128<float> right)
Vector128<ushort> MaxPairwise(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> MaxPairwise(Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umaxp v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
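A minimal generic scalar model of the integer MaxPairwise overloads, written for this post, makes the half-and-half layout of the result explicit. MaxPairwiseModel is hypothetical (not a .NET type) and assumes left and right have the same, even lane count.

```csharp
using System;

// Hypothetical scalar model of AdvSimd.MaxPairwise / AdvSimd.Arm64.MaxPairwise
// for integer lanes.
public static class MaxPairwiseModel
{
    public static T[] Compute<T>(T[] left, T[] right) where T : IComparable<T>
    {
        if (left.Length != right.Length)
            throw new ArgumentException("left and right must have the same number of lanes");

        int half = left.Length / 2;
        var result = new T[left.Length];
        for (int i = 0; i < half; i++)
        {
            // Lower half: adjacent pairs of left; upper half: adjacent pairs of right.
            result[i] = Larger(left[2 * i], left[2 * i + 1]);
            result[half + i] = Larger(right[2 * i], right[2 * i + 1]);
        }
        return result;
    }

    static T Larger<T>(T a, T b) where T : IComparable<T> => a.CompareTo(b) >= 0 ? a : b;
}
```

Feeding it the byte sample above reproduces <12, 14, 16, 18, 22, 24, 26, 28>.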
21. MaxPairwiseScalar
Vector64<float> MaxPairwiseScalar(Vector64<float> value)
This method compares the two elements of the value vector and writes the larger floating-point value as a scalar to the result vector.
private Vector64<float> MaxPairwiseScalarTest(Vector64<float> value)
{
return AdvSimd.Arm64.MaxPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <12.5, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<double> MaxPairwiseScalar(Vector128<double> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmaxp s16, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
22. MaxScalar
Vector64<double> MaxScalar(Vector64<double> left, Vector64<double> right)
This method compares the 0th elements of the left and right vectors and writes the larger of the two floating-point values to the result vector.
private Vector64<double> MaxScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.MaxScalar(left, right);
}
// left = <11.5>
// right = <10.5>
// Result = <11.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<float> MaxScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MaxScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmax d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
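MaxScalar (fmax) and MaxNumberScalar (fmaxnm) agree on ordinary numbers and differ only on NaN inputs: fmax returns NaN if either operand is NaN, while fmaxnm returns the numeric operand when the other is a quiet NaN. The two hypothetical scalar models below (ScalarMaxModels is not a .NET type; quiet NaNs only) contrast them.

```csharp
using System;

// Hypothetical scalar models contrasting the two scalar maximum APIs.
public static class ScalarMaxModels
{
    // MaxScalar (FMAX): a NaN in either operand makes the result NaN.
    public static double MaxScalar(double left, double right) =>
        (double.IsNaN(left) || double.IsNaN(right)) ? double.NaN : Math.Max(left, right);

    // MaxNumberScalar (FMAXNM): a quiet NaN loses against a number.
    public static double MaxNumberScalar(double left, double right) =>
        double.IsNaN(left) ? right : double.IsNaN(right) ? left : Math.Max(left, right);
}
```

For the sample inputs (11.5 and 10.5) both return 11.5; with left = NaN and right = 10.5, MaxScalar returns NaN while MaxNumberScalar returns 10.5.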
23. Min
Vector64<byte> Min(Vector64<byte> left, Vector64<byte> right)
This method compares corresponding elements in the left and right vectors, places the smaller of each pair in the result vector, and returns the result vector.
private Vector64<byte> MinTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.Min(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <11, 12, 13, 14, 15, 16, 17, 18>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> Min(Vector64<short> left, Vector64<short> right)
Vector64<int> Min(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> Min(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Min(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Min(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Min(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> Min(Vector128<byte> left, Vector128<byte> right)
Vector128<short> Min(Vector128<short> left, Vector128<short> right)
Vector128<int> Min(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> Min(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Min(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Min(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Min(Vector128<uint> left, Vector128<uint> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> Min(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
umin v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
24. MinAcross
Vector64<byte> MinAcross(Vector64<byte> value)
This method compares all the elements of the value vector and writes the smallest one to the 0th element of the result vector, while the remaining elements are set to 0.
private Vector64<byte> MinAcrossTest(Vector64<byte> value)
{
return AdvSimd.Arm64.MinAcross(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <11, 0, 0, 0, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<short> MinAcross(Vector64<short> value)
Vector64<sbyte> MinAcross(Vector64<sbyte> value)
Vector64<ushort> MinAcross(Vector64<ushort> value)
Vector64<byte> MinAcross(Vector128<byte> value)
Vector64<short> MinAcross(Vector128<short> value)
Vector64<int> MinAcross(Vector128<int> value)
Vector64<sbyte> MinAcross(Vector128<sbyte> value)
Vector64<float> MinAcross(Vector128<float> value)
Vector64<ushort> MinAcross(Vector128<ushort> value)
Vector64<uint> MinAcross(Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinAcrossTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uminv b16, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
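Note from the overload list that the result is always a Vector64, even for a Vector128 input. This hypothetical scalar model of the byte overloads (MinAcrossModel is not a .NET type) reflects that by always returning eight lanes.

```csharp
using System;

// Hypothetical scalar model of AdvSimd.Arm64.MinAcross for byte lanes.
public static class MinAcrossModel
{
    // The result is always a Vector64<byte> (8 lanes), whether value is a
    // Vector64<byte> (8 lanes) or a Vector128<byte> (16 lanes).
    public static byte[] Compute(byte[] value)
    {
        if (value.Length != 8 && value.Length != 16)
            throw new ArgumentException("expected 8 or 16 byte lanes");

        byte min = byte.MaxValue;
        foreach (byte b in value)
            min = Math.Min(min, b);

        var result = new byte[8]; // lanes 1..7 stay 0
        result[0] = min;
        return result;
    }
}
```

With the sample value <11, 12, ..., 18> it returns <11, 0, 0, 0, 0, 0, 0, 0>, as in the Result line above.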
25. MinNumber
Vector64<float> MinNumber(Vector64<float> left, Vector64<float> right)
This method compares corresponding elements in the left and right vectors, places the smaller of each pair in the result vector, and returns the result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard: if one vector element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value.
private Vector64<float> MinNumberTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.MinNumber(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <11.5, 12.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<float> MinNumber(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> MinNumber(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinNumberTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fminnm v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
26. MinNumberAcross
Vector64<float> MinNumberAcross(Vector128<float> value)
This method compares all the elements of the value vector and writes the smallest one to the 0th element of the result vector, while the remaining element is set to 0. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard: if one vector element is numeric and the other is a quiet NaN, the result of the comparison is the numerical value; otherwise the result is identical to MinScalar().
private Vector64<float> MinNumberAcrossTest(Vector128<float> value)
{
return AdvSimd.Arm64.MinNumberAcross(value);
}
// value = <11.5, 12.5, 13.5, 14.5>
// Result = <11.5, 0>
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinNumberAcrossTest(System.Runtime.Intrinsics.Vector128`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fminnmv s16, v0.4s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
27. MinNumberPairwise
Vector64<float> MinNumberPairwise(Vector64<float> left, Vector64<float> right)
This method concatenates the elements of the left vector followed by those of the right vector, compares each pair of adjacent elements, and writes the smaller element of each pair to the result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard: if one vector element is numeric and the other is a quiet NaN, the result is the numerical value.
private Vector64<float> MinNumberPairwiseTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.Arm64.MinNumberPairwise(left, right);
}
// left = <11.5, 12.5>
// right = <21.5, 22.5>
// Result = <11.5, 21.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> MinNumberPairwise(Vector128<double> left, Vector128<double> right)
Vector128<float> MinNumberPairwise(Vector128<float> left, Vector128<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinNumberPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fminnmp v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
28. MinNumberPairwiseScalar
Vector64<float> MinNumberPairwiseScalar(Vector64<float> value)
This method compares the two elements of the value vector and writes the smaller one to the 0th element of the result vector; any remaining element is set to 0. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard: if one vector element is numeric and the other is a quiet NaN, the result is the numerical value.
private Vector64<float> MinNumberPairwiseScalarTest(Vector64<float> value)
{
return AdvSimd.Arm64.MinNumberPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <11.5, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<double> MinNumberPairwiseScalar(Vector128<double> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinNumberPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fminnmp s16, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
29. MinNumberScalar
Vector64<double> MinNumberScalar(Vector64<double> left, Vector64<double> right)
This method compares the 0th elements of the left and right vectors and stores the smaller value in the result vector. As per ARM docs, NaNs are handled according to the IEEE 754-2008 standard: if one element is numeric and the other is a quiet NaN, the result placed in the vector is the numerical value.
private Vector64<double> MinNumberScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.MinNumberScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <11.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<float> MinNumberScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinNumberScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fminnm d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
30. MinPairwise
Vector64<byte> MinPairwise(Vector64<byte> left, Vector64<byte> right)
This method concatenates the elements of the left vector followed by those of the right vector, takes each pair of adjacent elements, and writes the smaller element of each pair to the result vector. As the example shows, the lower half of the result comes from left and the upper half from right.
private Vector64<byte> MinPairwiseTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.MinPairwise(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <11, 13, 15, 17, 21, 23, 25, 27>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> MinPairwise(Vector64<short> left, Vector64<short> right)
Vector64<int> MinPairwise(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> MinPairwise(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> MinPairwise(Vector64<float> left, Vector64<float> right)
Vector64<ushort> MinPairwise(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MinPairwise(Vector64<uint> left, Vector64<uint> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<byte> MinPairwise(Vector128<byte> left, Vector128<byte> right)
Vector128<double> MinPairwise(Vector128<double> left, Vector128<double> right)
Vector128<short> MinPairwise(Vector128<short> left, Vector128<short> right)
Vector128<int> MinPairwise(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> MinPairwise(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> MinPairwise(Vector128<float> left, Vector128<float> right)
Vector128<ushort> MinPairwise(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> MinPairwise(Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uminp v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
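Because the comparison depends on the element type, the same bit pattern can give different results for byte and sbyte lanes (0xFF is 255 as a byte but -1 as an sbyte); Arm accordingly has separate unsigned and signed instructions (UMINP and SMINP), and the listing above shows uminp for byte. The hypothetical generic model below (MinPairwiseModel is not a .NET type) illustrates this, assuming left and right have the same, even lane count.

```csharp
using System;

// Hypothetical generic scalar model of AdvSimd.MinPairwise / AdvSimd.Arm64.MinPairwise.
public static class MinPairwiseModel
{
    public static T[] Compute<T>(T[] left, T[] right) where T : IComparable<T>
    {
        if (left.Length != right.Length)
            throw new ArgumentException("left and right must have the same number of lanes");

        int half = left.Length / 2;
        var result = new T[left.Length];
        for (int i = 0; i < half; i++)
        {
            // Lower half: adjacent pairs of left; upper half: adjacent pairs of right.
            result[i] = Smaller(left[2 * i], left[2 * i + 1]);
            result[half + i] = Smaller(right[2 * i], right[2 * i + 1]);
        }
        return result;
    }

    // The comparison uses T's own ordering, so byte compares unsigned and sbyte signed.
    static T Smaller<T>(T a, T b) where T : IComparable<T> => a.CompareTo(b) <= 0 ? a : b;
}
```

With left starting <255, 1, ...> as bytes the first result lane is 1, whereas the same bits as sbytes (<-1, 1, ...>) give -1.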
31. MinPairwiseScalar
Vector64<float> MinPairwiseScalar(Vector64<float> value)
This method compares the two elements of the value vector and writes the smaller floating-point value as a scalar to the result vector.
private Vector64<float> MinPairwiseScalarTest(Vector64<float> value)
{
return AdvSimd.Arm64.MinPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <11.5, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<double> MinPairwiseScalar(Vector128<double> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fminp s16, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
32. MinScalar
Vector64<double> MinScalar(Vector64<double> left, Vector64<double> right)
This method compares the 0th elements of the left and right vectors and writes the smaller of the two floating-point values to the result vector.
private Vector64<double> MinScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.MinScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <11.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector64<float> MinScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MinScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fmin d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
33. Multiply
Vector64<byte> Multiply(Vector64<byte> left, Vector64<byte> right)
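This method multiplies corresponding elements of the left and right vectors and writes the products to the result vector. Each product is truncated to the element size, so in the byte example 12 * 22 = 264 wraps around to 8 (264 - 256).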
private Vector64<byte> MultiplyTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.Multiply(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <231, 8, 43, 80, 119, 160, 203, 248>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> Multiply(Vector64<short> left, Vector64<short> right)
Vector64<int> Multiply(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> Multiply(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Multiply(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Multiply(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Multiply(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> Multiply(Vector128<byte> left, Vector128<byte> right)
Vector128<short> Multiply(Vector128<short> left, Vector128<short> right)
Vector128<int> Multiply(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> Multiply(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Multiply(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Multiply(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Multiply(Vector128<uint> left, Vector128<uint> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> Multiply(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mul v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
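The Result line above can be checked with a small scalar model that keeps only the low 8 bits of each product. MultiplyModel is a hypothetical helper written for this post, not a .NET type.

```csharp
using System;

// Hypothetical scalar model of AdvSimd.Multiply for byte lanes.
public static class MultiplyModel
{
    public static byte[] Compute(byte[] left, byte[] right)
    {
        if (left.Length != right.Length)
            throw new ArgumentException("left and right must have the same number of lanes");

        var result = new byte[left.Length];
        for (int i = 0; i < left.Length; i++)
            // C# widens byte * byte to int; the unchecked cast keeps the low 8 bits,
            // which matches the truncating behavior of the vector mul.
            result[i] = unchecked((byte)(left[i] * right[i]));
        return result;
    }
}
```

For the sample inputs this returns <231, 8, 43, 80, 119, 160, 203, 248>.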
34. MultiplyAdd
Vector64<byte> MultiplyAdd(Vector64<byte> addend, Vector64<byte> left, Vector64<byte> right)
This method multiplies corresponding elements of the left and right vectors, adds each product to the corresponding element of the addend vector, and returns the accumulated result. As with Multiply, each result is truncated to the element size.
private Vector64<byte> MultiplyAddTest(Vector64<byte> addend, Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.MultiplyAdd(addend, left, right);
}
// addend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <2, 22, 23, 24, 25, 26, 27, 28>
// right = <3, 32, 33, 34, 35, 36, 37, 38>
// Result = <17, 204, 4, 62, 122, 184, 248, 58>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> MultiplyAdd(Vector64<short> addend, Vector64<short> left, Vector64<short> right)
Vector64<int> MultiplyAdd(Vector64<int> addend, Vector64<int> left, Vector64<int> right)
Vector64<sbyte> MultiplyAdd(Vector64<sbyte> addend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> MultiplyAdd(Vector64<ushort> addend, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MultiplyAdd(Vector64<uint> addend, Vector64<uint> left, Vector64<uint> right)
Vector128<byte> MultiplyAdd(Vector128<byte> addend, Vector128<byte> left, Vector128<byte> right)
Vector128<short> MultiplyAdd(Vector128<short> addend, Vector128<short> left, Vector128<short> right)
Vector128<int> MultiplyAdd(Vector128<int> addend, Vector128<int> left, Vector128<int> right)
Vector128<sbyte> MultiplyAdd(Vector128<sbyte> addend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> MultiplyAdd(Vector128<ushort> addend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> MultiplyAdd(Vector128<uint> addend, Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyAddTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mla v0.8b, v1.8b, v2.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
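The same wrap-around explains the sample result: for lane 1, 12 + 22 * 32 = 716, and 716 - 512 = 204. A hypothetical scalar model (MultiplyAddModel is not a .NET type) reproduces all eight lanes.

```csharp
using System;

// Hypothetical scalar model of AdvSimd.MultiplyAdd for byte lanes.
public static class MultiplyAddModel
{
    public static byte[] Compute(byte[] addend, byte[] left, byte[] right)
    {
        if (addend.Length != left.Length || left.Length != right.Length)
            throw new ArgumentException("all vectors must have the same number of lanes");

        var result = new byte[addend.Length];
        for (int i = 0; i < addend.Length; i++)
            // addend + left * right, truncated to 8 bits like the mla instruction.
            result[i] = unchecked((byte)(addend[i] + left[i] * right[i]));
        return result;
    }
}
```

For the sample inputs this returns <17, 204, 4, 62, 122, 184, 248, 58>.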
35. MultiplyAddByScalar
Vector64<short> MultiplyAddByScalar(Vector64<short> addend, Vector64<short> left, Vector64<short> right)
This method multiplies the elements of the left vector by the 0th element of the right vector, adds each product to the corresponding element of the addend vector, and returns the result vector.
private Vector64<short> MultiplyAddByScalarTest(Vector64<short> addend, Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyAddByScalar(addend, left, right);
}
// addend = <11, 12, 13, 14>
// left = <21, 22, 23, 24>
// right = <31, 32, 33, 34>
// Result = <662, 694, 726, 758>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> MultiplyAddByScalar(Vector64<int> addend, Vector64<int> left, Vector64<int> right)
Vector64<ushort> MultiplyAddByScalar(Vector64<ushort> addend, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MultiplyAddByScalar(Vector64<uint> addend, Vector64<uint> left, Vector64<uint> right)
Vector128<short> MultiplyAddByScalar(Vector128<short> addend, Vector128<short> left, Vector64<short> right)
Vector128<int> MultiplyAddByScalar(Vector128<int> addend, Vector128<int> left, Vector64<int> right)
Vector128<ushort> MultiplyAddByScalar(Vector128<ushort> addend, Vector128<ushort> left, Vector64<ushort> right)
Vector128<uint> MultiplyAddByScalar(Vector128<uint> addend, Vector128<uint> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyAddByScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mla v0.4h, v1.4h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
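Only lane 0 of right takes part, which is also why the Vector128 overloads accept a Vector64 right. A hypothetical scalar model for the short overloads (MultiplyAddByScalarModel is not a .NET type) shows this, including 16-bit wrap-around.

```csharp
using System;

// Hypothetical scalar model of AdvSimd.MultiplyAddByScalar for short lanes.
public static class MultiplyAddByScalarModel
{
    public static short[] Compute(short[] addend, short[] left, short[] right)
    {
        if (addend.Length != left.Length)
            throw new ArgumentException("addend and left must have the same number of lanes");

        short scalar = right[0]; // only lane 0 of right is used
        var result = new short[left.Length];
        for (int i = 0; i < left.Length; i++)
            // addend + left * scalar, truncated to 16 bits.
            result[i] = unchecked((short)(addend[i] + left[i] * scalar));
        return result;
    }
}
```

For the sample inputs this returns <662, 694, 726, 758>; with a single lane of 0 + 200 * 200 it returns -25536, because 40000 does not fit in a short.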
36. MultiplyAddBySelectedScalar
Vector64<short> MultiplyAddBySelectedScalar(Vector64<short> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies the elements of the left vector by the element of the right vector at index rightIndex, adds each product to the corresponding element of the addend vector, and returns the result vector.
private Vector64<short> MultiplyAddBySelectedScalarTest(Vector64<short> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyAddBySelectedScalar(addend, left, right, 3);
}
// addend = <100, 100, 100, 100>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 3
// Result = <364, 388, 412, 436>
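The example above can be reproduced with a hypothetical scalar model (MultiplyAddBySelectedScalarModel is not a .NET type): with rightIndex = 3 the selected value is 24, so the first lane is 100 + 11 * 24 = 364.

```csharp
using System;

// Hypothetical scalar model of AdvSimd.MultiplyAddBySelectedScalar for short lanes.
public static class MultiplyAddBySelectedScalarModel
{
    public static short[] Compute(short[] addend, short[] left, short[] right, int rightIndex)
    {
        if (addend.Length != left.Length)
            throw new ArgumentException("addend and left must have the same number of lanes");
        if ((uint)rightIndex >= (uint)right.Length)
            throw new ArgumentOutOfRangeException(nameof(rightIndex));

        short scalar = right[rightIndex]; // the selected element of right
        var result = new short[left.Length];
        for (int i = 0; i < left.Length; i++)
            // addend + left * scalar, truncated to 16 bits.
            result[i] = unchecked((short)(addend[i] + left[i] * scalar));
        return result;
    }
}
```

Calling Compute with the sample vectors and rightIndex 3 returns <364, 388, 412, 436>.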
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> MultiplyAddBySelectedScalar(Vector64<short> addend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<int> MultiplyAddBySelectedScalar(Vector64<int> addend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<int> MultiplyAddBySelectedScalar(Vector64<int> addend, Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector64<ushort> MultiplyAddBySelectedScalar(Vector64<ushort> addend, Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector64<ushort> MultiplyAddBySelectedScalar(Vector64<ushort> addend, Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector64<uint> MultiplyAddBySelectedScalar(Vector64<uint> addend, Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector64<uint> MultiplyAddBySelectedScalar(Vector64<uint> addend, Vector64<uint> left, Vector128<uint> right, byte rightIndex)
Vector128<short> MultiplyAddBySelectedScalar(Vector128<short> addend, Vector128<short> left, Vector64<short> right, byte rightIndex)
Vector128<short> MultiplyAddBySelectedScalar(Vector128<short> addend, Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<int> MultiplyAddBySelectedScalar(Vector128<int> addend, Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<int> MultiplyAddBySelectedScalar(Vector128<int> addend, Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<ushort> MultiplyAddBySelectedScalar(Vector128<ushort> addend, Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<ushort> MultiplyAddBySelectedScalar(Vector128<ushort> addend, Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<uint> MultiplyAddBySelectedScalar(Vector128<uint> addend, Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<uint> MultiplyAddBySelectedScalar(Vector128<uint> addend, Vector128<uint> left, Vector128<uint> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyAddBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mla v0.4h, v1.4h, v2.h[3]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
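To check the lane arithmetic without Arm64 hardware, here is a minimal sketch assuming .NET 5 or later. MultiplyAddBySelectedScalarModel, Compute and MatchesHardware are hypothetical helpers, not .NET APIs: Compute models the Vector64<short> overload in plain C#, and MatchesHardware calls the real AdvSimd.MultiplyAddBySelectedScalar only when AdvSimd.IsSupported is true, so the class also runs on x64.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyAddBySelectedScalar (Vector64<short> overload).
static class MultiplyAddBySelectedScalarModel
{
    // Each lane: addend[i] + left[i] * right[rightIndex], truncated to 16 bits (mla does not saturate).
    public static short[] Compute(short[] addend, short[] left, short[] right, int rightIndex)
    {
        var result = new short[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            result[i] = unchecked((short)(addend[i] + left[i] * right[rightIndex]));
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (expects four-element arrays);
    // on other hardware there is nothing to compare, so it returns true.
    public static bool MatchesHardware(short[] addend, short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        // A constant index lets the JIT encode it in the instruction, as in v2.h[3] above.
        Vector64<short> actual = AdvSimd.MultiplyAddBySelectedScalar(
            Vector64.Create(addend[0], addend[1], addend[2], addend[3]),
            Vector64.Create(left[0], left[1], left[2], left[3]),
            Vector64.Create(right[0], right[1], right[2], right[3]),
            3);

        short[] expected = Compute(addend, left, right, 3);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

With the example inputs and rightIndex 3, Compute returns <364, 388, 412, 436>. Because mla does not saturate, Compute(new short[] { 0 }, new short[] { 300 }, new short[] { 300 }, 0) returns 24464, the low 16 bits of 90000.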
37. MultiplyByScalar
Vector64<short> MultiplyByScalar(Vector64<short> left, Vector64<short> right)
This method multiplies each vector element in the left vector by element 0 of the right vector and returns the result vector.
private Vector64<short> MultiplyByScalarTest(Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyByScalar(left, right);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// Result = <231, 252, 273, 294>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> MultiplyByScalar(Vector64<int> left, Vector64<int> right)
Vector64<float> MultiplyByScalar(Vector64<float> left, Vector64<float> right)
Vector64<ushort> MultiplyByScalar(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> MultiplyByScalar(Vector64<uint> left, Vector64<uint> right)
Vector128<short> MultiplyByScalar(Vector128<short> left, Vector64<short> right)
Vector128<int> MultiplyByScalar(Vector128<int> left, Vector64<int> right)
Vector128<float> MultiplyByScalar(Vector128<float> left, Vector64<float> right)
Vector128<ushort> MultiplyByScalar(Vector128<ushort> left, Vector64<ushort> right)
Vector128<uint> MultiplyByScalar(Vector128<uint> left, Vector64<uint> right)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> MultiplyByScalar(Vector128<double> left, Vector64<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyByScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mul v16.4h, v0.4h, v1.h[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
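The sketch below, assuming .NET 5 or later, makes it explicit that MultiplyByScalar reads only element 0 of right. MultiplyByScalarModel and its methods are hypothetical names; Compute is plain C#, and MatchesHardware calls the real AdvSimd.MultiplyByScalar only when AdvSimd.IsSupported is true.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyByScalar (Vector64<short> overload).
static class MultiplyByScalarModel
{
    // Each lane: left[i] * right[0], truncated to 16 bits. Elements 1..3 of right are never read.
    public static short[] Compute(short[] left, short[] right)
    {
        var result = new short[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            result[i] = unchecked((short)(left[i] * right[0]));
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (expects four-element arrays);
    // on other hardware there is nothing to compare, so it returns true.
    public static bool MatchesHardware(short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector64<short> actual = AdvSimd.MultiplyByScalar(
            Vector64.Create(left[0], left[1], left[2], left[3]),
            Vector64.Create(right[0], right[1], right[2], right[3]));

        short[] expected = Compute(left, right);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Compute(new short[] { 11, 12, 13, 14 }, new short[] { 21, 0, 0, 0 }) still returns <231, 252, 273, 294>, which matches the v1.h[0] operand in the generated code.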
38. MultiplyBySelectedScalar
Vector64<short> MultiplyBySelectedScalar(Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each vector element in the left vector by the rightIndex element of the right vector and returns the result vector.
private Vector64<short> MultiplyBySelectedScalarTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyBySelectedScalar(left, right, 3);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 3
// Result = <264, 288, 312, 336>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<short> MultiplyBySelectedScalar(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector64<int> MultiplyBySelectedScalar(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector64<int> MultiplyBySelectedScalar(Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector64<float> MultiplyBySelectedScalar(Vector64<float> left, Vector64<float> right, byte rightIndex)
Vector64<float> MultiplyBySelectedScalar(Vector64<float> left, Vector128<float> right, byte rightIndex)
Vector64<ushort> MultiplyBySelectedScalar(Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector64<ushort> MultiplyBySelectedScalar(Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector64<uint> MultiplyBySelectedScalar(Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector64<uint> MultiplyBySelectedScalar(Vector64<uint> left, Vector128<uint> right, byte rightIndex)
Vector128<short> MultiplyBySelectedScalar(Vector128<short> left, Vector64<short> right, byte rightIndex)
Vector128<short> MultiplyBySelectedScalar(Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<int> MultiplyBySelectedScalar(Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<int> MultiplyBySelectedScalar(Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<float> MultiplyBySelectedScalar(Vector128<float> left, Vector64<float> right, byte rightIndex)
Vector128<float> MultiplyBySelectedScalar(Vector128<float> left, Vector128<float> right, byte rightIndex)
Vector128<ushort> MultiplyBySelectedScalar(Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<ushort> MultiplyBySelectedScalar(Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalar(Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalar(Vector128<uint> left, Vector128<uint> right, byte rightIndex)
// class System.Runtime.Intrinsics.Arm.AdvSimd.Arm64
Vector128<double> MultiplyBySelectedScalar(Vector128<double> left, Vector128<double> right, byte rightIndex)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mul v16.4h, v0.4h, v1.h[3]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
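Here is a minimal sketch, assuming .NET 5 or later, that models the Vector64<short> overload of MultiplyBySelectedScalar in plain C#. MultiplyBySelectedScalarModel and its methods are hypothetical names. MatchesHardware passes the constant 3 to the real intrinsic so the JIT can encode the index directly, as in the v1.h[3] operand above, and it only runs the intrinsic when AdvSimd.IsSupported is true.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyBySelectedScalar (Vector64<short> overload).
static class MultiplyBySelectedScalarModel
{
    // Each lane: left[i] * right[rightIndex], truncated to 16 bits.
    public static short[] Compute(short[] left, short[] right, int rightIndex)
    {
        var result = new short[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            result[i] = unchecked((short)(left[i] * right[rightIndex]));
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (expects four-element arrays);
    // on other hardware there is nothing to compare, so it returns true.
    public static bool MatchesHardware(short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector64<short> actual = AdvSimd.MultiplyBySelectedScalar(
            Vector64.Create(left[0], left[1], left[2], left[3]),
            Vector64.Create(right[0], right[1], right[2], right[3]),
            3);

        short[] expected = Compute(left, right, 3);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

With the example inputs and rightIndex 3, Compute returns <264, 288, 312, 336>.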
39. MultiplyBySelectedScalarWideningLower
Vector128<int> MultiplyBySelectedScalarWideningLower(Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each vector element in the left vector by the rightIndex element of the right vector, places the products in a result vector, and returns it. As the example below shows, each result element (int) is twice as wide as the elements being multiplied (short).
private Vector128<int> MultiplyBySelectedScalarWideningLowerTest(Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyBySelectedScalarWideningLower(left, right, 3);
}
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 3
// Result = <264, 288, 312, 336>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningLower(Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLower(Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLower(Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLower(Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLower(Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLower(Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLower(Vector64<uint> left, Vector128<uint> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
smull v16.4s, v0.4h, v1.h[3]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
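A minimal scalar sketch of the widening behavior, assuming .NET 5 or later, is shown below; MultiplyBySelectedScalarWideningLowerModel and its methods are hypothetical names. Each product is computed as an int, so nothing is truncated to 16 bits, and MatchesHardware compares against the real smull-based intrinsic only when AdvSimd.IsSupported is true.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyBySelectedScalarWideningLower
// for the (Vector64<short>, Vector64<short>, byte) -> Vector128<int> overload.
static class MultiplyBySelectedScalarWideningLowerModel
{
    // Each lane: left[i] * right[rightIndex] computed in 32 bits.
    // A 16 x 16-bit product always fits in an int, so nothing is truncated.
    public static int[] Compute(short[] left, short[] right, int rightIndex)
    {
        var result = new int[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            result[i] = left[i] * right[rightIndex];
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (expects four-element arrays);
    // on other hardware there is nothing to compare, so it returns true.
    public static bool MatchesHardware(short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector128<int> actual = AdvSimd.MultiplyBySelectedScalarWideningLower(
            Vector64.Create(left[0], left[1], left[2], left[3]),
            Vector64.Create(right[0], right[1], right[2], right[3]),
            3);

        int[] expected = Compute(left, right, 3);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Compute(new short[] { 30000 }, new short[] { 30000 }, 0) returns 900000000, a product that the non-widening MultiplyBySelectedScalar would have truncated to 16 bits.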
40. MultiplyBySelectedScalarWideningLowerAndAdd
Vector128<int> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<int> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each vector element in the left vector by the rightIndex element of the right vector, adds the products to the corresponding vector elements of the addend vector, and returns the result vector. As the example below shows, each result element (int) is twice as wide as the input elements (short).
private Vector128<int> MultiplyBySelectedScalarWideningLowerAndAddTest(Vector128<int> addend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyBySelectedScalarWideningLowerAndAdd(addend, left, right, 2);
}
// addend = <1000, 1000, 1000, 1000>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 2
// Result = <1253, 1276, 1299, 1322>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<int> addend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<long> addend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<long> addend, Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<uint> addend, Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<uint> addend, Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<ulong> addend, Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLowerAndAdd(Vector128<ulong> addend, Vector64<uint> left, Vector128<uint> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningLowerAndAddTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
smlal v0.4s, v1.4h, v2.h[2]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
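The following sketch, assuming .NET 5 or later, models the multiply-accumulate in plain C#; MultiplyBySelectedScalarWideningLowerAndAddModel and its methods are hypothetical names. smlal does not saturate, so the model lets the 32-bit sum wrap. The real intrinsic is called only when AdvSimd.IsSupported is true.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyBySelectedScalarWideningLowerAndAdd
// for the (Vector128<int>, Vector64<short>, Vector64<short>, byte) overload.
static class MultiplyBySelectedScalarWideningLowerAndAddModel
{
    // Each lane: addend[i] + left[i] * right[rightIndex] in 32 bits; the add wraps on overflow.
    public static int[] Compute(int[] addend, short[] left, short[] right, int rightIndex)
    {
        var result = new int[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            result[i] = unchecked(addend[i] + left[i] * right[rightIndex]);
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (expects four-element arrays);
    // on other hardware there is nothing to compare, so it returns true.
    public static bool MatchesHardware(int[] addend, short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector128<int> actual = AdvSimd.MultiplyBySelectedScalarWideningLowerAndAdd(
            Vector128.Create(addend[0], addend[1], addend[2], addend[3]),
            Vector64.Create(left[0], left[1], left[2], left[3]),
            Vector64.Create(right[0], right[1], right[2], right[3]),
            2);

        int[] expected = Compute(addend, left, right, 2);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

With the example inputs and rightIndex 2, Compute returns <1253, 1276, 1299, 1322>.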
41. MultiplyBySelectedScalarWideningLowerAndSubtract
Vector128<int> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<int> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each vector element in the left vector by the rightIndex element of the right vector, subtracts the products from the corresponding vector elements of the minuend vector, and returns the result vector. As the example below shows, each result element (int) is twice as wide as the input elements (short).
private Vector128<int> MultiplyBySelectedScalarWideningLowerAndSubtractTest(Vector128<int> minuend, Vector64<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyBySelectedScalarWideningLowerAndSubtract(minuend, left, right, 2);
}
// minuend = <1000, 1000, 1000, 1000>
// left = <11, 12, 13, 14>
// right = <21, 22, 23, 24>
// rightIndex = 2
// Result = <747, 724, 701, 678>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<int> minuend, Vector64<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<long> minuend, Vector64<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<long> minuend, Vector64<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<uint> minuend, Vector64<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<uint> minuend, Vector64<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<ulong> minuend, Vector64<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningLowerAndSubtract(Vector128<ulong> minuend, Vector64<uint> left, Vector128<uint> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningLowerAndSubtractTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
smlsl v0.4s, v1.4h, v2.h[2]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
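A minimal sketch, assuming .NET 5 or later, of the multiply-subtract in plain C#; MultiplyBySelectedScalarWideningLowerAndSubtractModel and its methods are hypothetical names. Like smlal, smlsl does not saturate, so the model lets the 32-bit difference wrap, and the real intrinsic is only called when AdvSimd.IsSupported is true.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyBySelectedScalarWideningLowerAndSubtract
// for the (Vector128<int>, Vector64<short>, Vector64<short>, byte) overload.
static class MultiplyBySelectedScalarWideningLowerAndSubtractModel
{
    // Each lane: minuend[i] - left[i] * right[rightIndex] in 32 bits; wraps on overflow.
    public static int[] Compute(int[] minuend, short[] left, short[] right, int rightIndex)
    {
        var result = new int[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            result[i] = unchecked(minuend[i] - left[i] * right[rightIndex]);
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (expects four-element arrays);
    // on other hardware there is nothing to compare, so it returns true.
    public static bool MatchesHardware(int[] minuend, short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector128<int> actual = AdvSimd.MultiplyBySelectedScalarWideningLowerAndSubtract(
            Vector128.Create(minuend[0], minuend[1], minuend[2], minuend[3]),
            Vector64.Create(left[0], left[1], left[2], left[3]),
            Vector64.Create(right[0], right[1], right[2], right[3]),
            2);

        int[] expected = Compute(minuend, left, right, 2);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

With the example inputs and rightIndex 2, Compute returns <747, 724, 701, 678>.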
42. MultiplyBySelectedScalarWideningUpper
Vector128<int> MultiplyBySelectedScalarWideningUpper(Vector128<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each vector element in the upper half of the left vector by the rightIndex element of the right vector, places the products in a result vector, and returns it. As the example below shows, each result element (int) is twice as wide as the elements being multiplied (short).
private Vector128<int> MultiplyBySelectedScalarWideningUpperTest(Vector128<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyBySelectedScalarWideningUpper(left, right, 2);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// rightIndex = 2
// Result = <195, 208, 221, 234>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningUpper(Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpper(Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpper(Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpper(Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpper(Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpper(Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpper(Vector128<uint> left, Vector128<uint> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;* V02 arg2 [V02 ] ( 0, 0 ) ubyte -> zero-ref
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
smull2 v16.4s, v0.8h, v1.h[2]
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
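The sketch below, assuming .NET 5 or later, shows which lanes the Upper variant reads: result lane i uses left[i + 4], so the lower four elements of left are ignored (this is the "2" in smull2). MultiplyBySelectedScalarWideningUpperModel and its methods are hypothetical names, and the real intrinsic is called only when AdvSimd.IsSupported is true.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyBySelectedScalarWideningUpper
// for the (Vector128<short>, Vector64<short>, byte) -> Vector128<int> overload.
static class MultiplyBySelectedScalarWideningUpperModel
{
    // Result lane i: left[i + half] * right[rightIndex] in 32 bits, where half = left.Length / 2.
    // The lower half of left is never read.
    public static int[] Compute(short[] left, short[] right, int rightIndex)
    {
        int half = left.Length / 2;
        var result = new int[half];
        for (int i = 0; i < half; i++)
        {
            result[i] = left[i + half] * right[rightIndex];
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (expects an eight-element left
    // and a four-element right); on other hardware it returns true.
    public static bool MatchesHardware(short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector128<int> actual = AdvSimd.MultiplyBySelectedScalarWideningUpper(
            Vector128.Create(left[0], left[1], left[2], left[3], left[4], left[5], left[6], left[7]),
            Vector64.Create(right[0], right[1], right[2], right[3]),
            2);

        int[] expected = Compute(left, right, 2);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

Replacing the first four elements of left with zeros does not change what Compute returns, because only left[4] through left[7] take part.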
43. MultiplyBySelectedScalarWideningUpperAndAdd
Vector128<int> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<int> addend, Vector128<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each vector element in the upper half of the left vector by the rightIndex element of the right vector, adds the products to the corresponding vector elements of the addend vector, and returns the result vector. As the example below shows, each result element (int) is twice as wide as the elements being multiplied (short).
private Vector128<int> MultiplyBySelectedScalarWideningUpperAndAddTest(Vector128<int> addend, Vector128<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyBySelectedScalarWideningUpperAndAdd(addend, left, right, 0);
}
// addend = <1000, 1000, 1000, 1000>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// rightIndex = 0
// Result = <1165, 1176, 1187, 1198>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<int> addend, Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<long> addend, Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<long> addend, Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<uint> addend, Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<uint> addend, Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<ulong> addend, Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpperAndAdd(Vector128<ulong> addend, Vector128<uint> left, Vector128<uint> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningUpperAndAddTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
smlal2 v0.4s, v1.8h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
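Here is a minimal sketch, assuming .NET 5 or later, that combines the upper-half selection with a non-saturating 32-bit accumulate, as smlal2 does. MultiplyBySelectedScalarWideningUpperAndAddModel and its methods are hypothetical names; the real intrinsic is called only when AdvSimd.IsSupported is true.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyBySelectedScalarWideningUpperAndAdd
// for the (Vector128<int>, Vector128<short>, Vector64<short>, byte) overload.
static class MultiplyBySelectedScalarWideningUpperAndAddModel
{
    // Result lane i: addend[i] + left[i + half] * right[rightIndex] in 32 bits (wraps on overflow).
    public static int[] Compute(int[] addend, short[] left, short[] right, int rightIndex)
    {
        int half = left.Length / 2;
        var result = new int[half];
        for (int i = 0; i < half; i++)
        {
            result[i] = unchecked(addend[i] + left[i + half] * right[rightIndex]);
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (four-element addend and right,
    // eight-element left); on other hardware it returns true.
    public static bool MatchesHardware(int[] addend, short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector128<int> actual = AdvSimd.MultiplyBySelectedScalarWideningUpperAndAdd(
            Vector128.Create(addend[0], addend[1], addend[2], addend[3]),
            Vector128.Create(left[0], left[1], left[2], left[3], left[4], left[5], left[6], left[7]),
            Vector64.Create(right[0], right[1], right[2], right[3]),
            0);

        int[] expected = Compute(addend, left, right, 0);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

With the example inputs and rightIndex 0, Compute returns <1165, 1176, 1187, 1198>.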
44. MultiplyBySelectedScalarWideningUpperAndSubtract
Vector128<int> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<int> minuend, Vector128<short> left, Vector64<short> right, byte rightIndex)
This method multiplies each vector element in the upper half of the left vector by the rightIndex element of the right vector, subtracts the products from the corresponding vector elements of the minuend vector, and returns the result vector. As the example below shows, each result element (int) is twice as wide as the elements being multiplied (short).
private Vector128<int> MultiplyBySelectedScalarWideningUpperAndSubtractTest(Vector128<int> minuend, Vector128<short> left, Vector64<short> right, byte rightIndex)
{
return AdvSimd.MultiplyBySelectedScalarWideningUpperAndSubtract(minuend, left, right, 0);
}
// minuend = <11, 12, 13, 14>
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 12, 13, 14>
// rightIndex = 0
// Result = <-154, -164, -174, -184>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector128<int> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<int> minuend, Vector128<short> left, Vector128<short> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<long> minuend, Vector128<int> left, Vector64<int> right, byte rightIndex)
Vector128<long> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<long> minuend, Vector128<int> left, Vector128<int> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<uint> minuend, Vector128<ushort> left, Vector64<ushort> right, byte rightIndex)
Vector128<uint> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<uint> minuend, Vector128<ushort> left, Vector128<ushort> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<ulong> minuend, Vector128<uint> left, Vector64<uint> right, byte rightIndex)
Vector128<ulong> MultiplyBySelectedScalarWideningUpperAndSubtract(Vector128<ulong> minuend, Vector128<uint> left, Vector128<uint> right, byte rightIndex)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyBySelectedScalarWideningUpperAndSubtractTest(System.Runtime.Intrinsics.Vector128`1[Int32],System.Runtime.Intrinsics.Vector128`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16],ubyte):System.Runtime.Intrinsics.Vector128`1[Int32]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;* V03 arg3 [V03 ] ( 0, 0 ) ubyte -> zero-ref
;# V04 OutArgs [V04 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
smlsl2 v0.4s, v1.8h, v2.h[0]
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
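A minimal sketch, assuming .NET 5 or later, of the upper-half multiply-subtract performed by smlsl2. MultiplyBySelectedScalarWideningUpperAndSubtractModel and its methods are hypothetical names; the model lets the 32-bit difference wrap, and the real intrinsic is called only when AdvSimd.IsSupported is true.

```csharp
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyBySelectedScalarWideningUpperAndSubtract
// for the (Vector128<int>, Vector128<short>, Vector64<short>, byte) overload.
static class MultiplyBySelectedScalarWideningUpperAndSubtractModel
{
    // Result lane i: minuend[i] - left[i + half] * right[rightIndex] in 32 bits (wraps on overflow).
    public static int[] Compute(int[] minuend, short[] left, short[] right, int rightIndex)
    {
        int half = left.Length / 2;
        var result = new int[half];
        for (int i = 0; i < half; i++)
        {
            result[i] = unchecked(minuend[i] - left[i + half] * right[rightIndex]);
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (four-element minuend and right,
    // eight-element left); on other hardware it returns true.
    public static bool MatchesHardware(int[] minuend, short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector128<int> actual = AdvSimd.MultiplyBySelectedScalarWideningUpperAndSubtract(
            Vector128.Create(minuend[0], minuend[1], minuend[2], minuend[3]),
            Vector128.Create(left[0], left[1], left[2], left[3], left[4], left[5], left[6], left[7]),
            Vector64.Create(right[0], right[1], right[2], right[3]),
            0);

        int[] expected = Compute(minuend, left, right, 0);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

With the example inputs and rightIndex 0, Compute returns <-154, -164, -174, -184>.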
45. MultiplyDoublingByScalarSaturateHigh
Vector64<short> MultiplyDoublingByScalarSaturateHigh(Vector64<short> left, Vector64<short> right)
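This method multiplies each vector element in the left vector by element 0 of the right vector, doubles the products, and returns a vector holding the most significant half of each doubled product. If a doubled product overflows (which happens only when both inputs equal the minimum value of the element type), the result saturates to the maximum value of the element type.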
private Vector64<short> MultiplyDoublingByScalarSaturateHighTest(Vector64<short> left, Vector64<short> right)
{
return AdvSimd.MultiplyDoublingByScalarSaturateHigh(left, right);
}
// left = <1000, 12, 13, 14>
// right = <100, 22, 23, 24>
// Result = <3, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinsics.Arm.AdvSimd
Vector64<int> MultiplyDoublingByScalarSaturateHigh(Vector64<int> left, Vector64<int> right)
Vector128<short> MultiplyDoublingByScalarSaturateHigh(Vector128<short> left, Vector64<short> right)
Vector128<int> MultiplyDoublingByScalarSaturateHigh(Vector128<int> left, Vector64<int> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:MultiplyDoublingByScalarSaturateHighTest(System.Runtime.Intrinsics.Vector64`1[Int16],System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqdmulh v16.4h, v0.4h, v1.h[0]
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
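The doubling-and-high-half step is easiest to see in a scalar sketch, assuming .NET 5 or later; MultiplyDoublingByScalarSaturateHighModel and its methods are hypothetical names. For 16-bit lanes, (2 * a * b) >> 16 equals (a * b) >> 15, which is why this instruction is commonly used for Q15 fixed-point multiplication. The real intrinsic is called only when AdvSimd.IsSupported is true.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.Arm;

// Hypothetical scalar model of AdvSimd.MultiplyDoublingByScalarSaturateHigh (Vector64<short> overload).
static class MultiplyDoublingByScalarSaturateHighModel
{
    // Each lane: (2 * left[i] * right[0]) >> 16, saturated to the short range.
    public static short[] Compute(short[] left, short[] right)
    {
        var result = new short[left.Length];
        for (int i = 0; i < left.Length; i++)
        {
            long doubled = 2L * left[i] * right[0]; // 64-bit, so the doubling itself cannot overflow here
            long high = doubled >> 16;               // arithmetic shift keeps the most significant 16 bits
            result[i] = (short)Math.Clamp(high, short.MinValue, short.MaxValue);
        }
        return result;
    }

    // On Arm64, compares the intrinsic with the model (expects four-element arrays);
    // on other hardware there is nothing to compare, so it returns true.
    public static bool MatchesHardware(short[] left, short[] right)
    {
        if (!AdvSimd.IsSupported)
        {
            return true;
        }

        Vector64<short> actual = AdvSimd.MultiplyDoublingByScalarSaturateHigh(
            Vector64.Create(left[0], left[1], left[2], left[3]),
            Vector64.Create(right[0], right[1], right[2], right[3]));

        short[] expected = Compute(left, right);
        for (int i = 0; i < expected.Length; i++)
        {
            if (actual.GetElement(i) != expected[i])
            {
                return false;
            }
        }
        return true;
    }
}
```

With the example inputs, Compute returns <3, 0, 0, 0>, since 2 × 1000 × 100 = 200000 and 200000 >> 16 = 3. In Q15, 16384 represents 0.5, and Compute(new short[] { 16384 }, new short[] { 16384 }) returns 8192 (0.25). The only saturating case is -32768 × -32768, which returns 32767.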