What is it?
Antigen is a fuzzer that generates random C# programs on the fly to test .NET’s RyuJIT.
How does it work?
Antigen generates random C# programs and execute them in baseline and test mode. In baseline mode there is minimal optimizations enabled, while in test mode, it executes with full optimization, and some other stress switches to turn ON/OFF certain optimizations. It then compares the output of baseline and test execution and reports back the programs if their output didn’t match or if there were any asserts hit during execution.
How does it help uncover issues?
Most of the unit tests are handwritten and might test a specific scenario so it might be hard to catch hidden issues. Antigen, like any other fuzzers, usually generate random C# code, sometimes having long expressions and statements that might not even be written in real world code patterns but does a great stress testing of the compilers and code generation component. By fuzzing, if there is any hidden issue in code generation, either we would hit the asserts in the compiler code or worst would give different output than baseline meaning the compiler didn’t generate the machine code correctly. In either way, it uncovers several compiler issues.
What are the examples of expressions generated?
Antigen currently generates range of expressions:
- Literals: All primitive types like
int
,long
,float
,char
,string
, etc. - Variable references
- Unary and Binary operations
- Assignments
- Struct and nested struct declaration and usage
- Method declaration
- Method calls
Below is a sample expression generated by Antigen.
s_double_10 -= ((double)(((double)(((double)(((double)(((double)(((double)(LeafMethod4() - -2)) + ((double)(s_double_10 %= ((double)((s_double_10) + 58)))))) - ((double)(double_55 += ((double)(s_double_10 + LeafMethod4())))))) % ((double)((((double)(((double)(((double)(double_55 /= ((double)((s_double_10) + 22)))) + double_28)) / ((double)((((double)(s_double_10 *= ((double)(LeafMethod4() + double_28))))) + 96))))) + 2)))) + ((double)(((double)(((double)(((double)(p_double_44 += double_55)) % ((double)((p_double_44) + 44)))) - ((double)(((double)(double_55 - p_double_44)) + ((double)(p_double_44 += double_28)))))) + ((double)(((double)(double_28 += ((double)(s_double_10 / ((double)((double_28) + 7)))))) + ((double)(((double)(double_55 % ((double)((s_double_10) + 30)))) + LeafMethod4())))))))) * ((double)(((double)(((double)(((double)(double_28 + ((double)(double_55 * LeafMethod4())))) * ((double)(double_28 /= ((double)((((double)(LeafMethod4() + double_28))) + 72)))))) * ((double)(((double)(((double)(double_28 * LeafMethod4())) % ((double)((((double)(double_55 % ((double)((LeafMethod4()) + 57))))) + 8)))) + ((double)(((double)(LeafMethod4() % ((double)((s_double_10) + 67)))) * ((double)(double_28 += s_double_10)))))))) + p_double_44))));
Note: Antigen currently cast every right side expression to the left side variable type and there is a TODO item to get rid of them. Likewise, there is a work item to eliminate unwanted parenthesis.
What are the examples of statements generated?
Antigen currently generates range of statements:
- Variable Declaration
- Assignment statements
- if-else statements
- Loops:
for
,while-do
,do-while
- try-catch-finally
- switch-case
Here is a section of program that Antigen generated:
byte_1 &= ((byte)(s_byte_1 &= ((byte)(((byte)(s_byte_1 %= ((byte)((((byte)(((byte)(((byte)(byte_1 * s_byte_1)) % ((byte)((((byte)(s_byte_1 ^= s_byte_1))) + 96)))) % ((byte)((byte_1) + 43))))) + 77)))) + ((byte)(LeafMethod1() + ((byte)(((byte)(((byte)(s_byte_1 - s_byte_1)) / ((byte)((((byte)(LeafMethod1() * LeafMethod1()))) + 65)))) - ((byte)(((byte)(s_byte_1 / ((byte)((LeafMethod1()) + 61)))) | ((byte)(byte_1 * byte_1))))))))))));
int __loopvar9 = s_loopInvariant - 12, __loopSecondaryVar9_0 = s_loopInvariant - 10;
do
{
__loopvar9 += 4;
if (__loopvar9 > s_loopInvariant + 4)
break;
__loopSecondaryVar9_0 += 3;
s_double_4 %= ((double)(double_4 = LeafMethod4()));
if (bool_0)
{
int __loopvar7 = s_loopInvariant + 10;
while ((((bool)(bool_0 = ((bool)(bool_0 = bool_0))))))
{
__loopvar7 -= 4;
if (__loopvar7 <= s_loopInvariant - 4)
break;
LeafMethod10();
long_7 ^= ((long)(((long)(long_7 %= ((long)((((long)(((long)(((long)(long_7 & LeafMethod7())) - ((long)(long_7 = LeafMethod7())))) + ((long)(s_long_7 <<= ((int)(((int)(LeafMethod6() & LeafMethod6())) ^ ((int)(p_int_5 ^= int_6))))))))) + 90)))) * long_7));
LeafMethod15();
}
LeafMethod11();
s_uint_12 <<= s_int_6;
}
else
{
int __loopvar8 = s_loopInvariant, __loopSecondaryVar8_0 = s_loopInvariant - 10;
for (;; __loopSecondaryVar8_0 += 3)
{
__loopvar8 += 4;
if (__loopvar8 > s_loopInvariant + 16)
break;
s_sbyte_8 &= ((sbyte)(((sbyte)(sbyte_8 -= ((sbyte)(s_sbyte_8 = ((sbyte)(sbyte_8 >> ((int)(((int)(s_int_6 - LeafMethod6())) + ((int)(int_6 - LeafMethod6())))))))))) * s_sbyte_8));
sbyte_8 /= ((sbyte)(((sbyte)(((sbyte)(LeafMethod8() << ((int)(((int)(int_6 >> 1)) % ((int)((((int)(int_6 += ((int)(LeafMethod6() & LeafMethod6()))))) + 28)))))) * ((sbyte)(((sbyte)(sbyte_8 <<= s_int_6)) % ((sbyte)((((sbyte)(sbyte_8 &= sbyte_8))) + 31)))))) ^ ((sbyte)(((sbyte)(s_sbyte_8 <<= LeafMethod6())) >> LeafMethod6()))));
s2_15.ulong_1 %= ((ulong)(ulong_13 ^ ((ulong)(((ulong)(((ulong)(ulong_13 = ((ulong)(((ulong)(s_ulong_13 -= s_ulong_13)) * ((ulong)(s2_15.ulong_1 %= ((ulong)((s2_15.ulong_1) + 51)))))))) * ((ulong)(((ulong)(s_ulong_13 % ((ulong)((((ulong)(ulong_13 &= s2_15.ulong_1))) + 10)))) / ((ulong)((LeafMethod13()) + 71)))))) * ulong_13))));
s_int_6 &= ((int)(LeafMethod6() % ((int)((((int)(s_int_6 -= ((int)(((int)(((int)(4 | ((int)(s_int_6 |= s_int_6)))) - s_int_6)) | ((int)(((int)(((int)(s_int_6 % ((int)((1) + 26)))) ^ ((int)(int_6 ^= s_int_6)))) / ((int)((((int)(s_int_6 = ((int)(int_6 - int_6))))) + 11))))))))) + 12))));
long_7 ^= ((long)(((long)(((long)(((long)(s_long_7 <<= ((int)(p_int_5 >>= s_int_6)))) << ((int)(int_6 -= ((int)(((int)(p_int_5 -= int_6)) % ((int)((((int)(LeafMethod6() - s_int_6))) + 22)))))))) ^ ((long)(long_7 * LeafMethod7())))) | ((long)(((long)(((long)(long_7 | long_7)) >> ((int)(s_int_6 -= LeafMethod6())))) ^ s_long_7))));
LeafMethod10();
uint_12 = ((uint)(uint_12 - ((uint)(((uint)(p_uint_0 |= ((uint)(p_uint_0 >>= ((int)(((int)(s_int_6 |= int_6)) >> ((int)(s_int_6 << 94)))))))) * ((uint)(((uint)(((uint)(((uint)(s_uint_12 + LeafMethod12())) / ((uint)((uint_12) + 22)))) % ((uint)((((uint)(((uint)(s_uint_12 * LeafMethod12())) % ((uint)((((uint)(s_uint_12 + LeafMethod12()))) + 51))))) + 54)))) % ((uint)((((uint)(((uint)(((uint)(uint_12 % ((uint)((LeafMethod12()) + 10)))) + ((uint)(LeafMethod12() / ((uint)((LeafMethod12()) + 66)))))) & ((uint)(s_uint_12 = ((uint)(uint_12 | uint_12))))))) + 86))))))));
s2_15.ulong_1 &= ulong_13;
s_ushort_11 *= ushort_11;
sbyte_8 &= ((sbyte)(sbyte_8 = LeafMethod8()));
}
}
LeafMethod3();
}
while ((((bool)(int_6 == ((int)(((int)(((int)(int_6 * ((int)(((int)(s_int_6 /= ((int)((LeafMethod6()) + 43)))) ^ ((int)(int_6 ^= int_6)))))) & ((int)(int_6 -= ((int)(s_int_6 &= ((int)(int_6 % ((int)((p_int_5) + 21)))))))))) % ((int)((((int)(s_int_6 *= ((int)(((int)(int_6 &= ((int)(p_int_5 >> s_int_6)))) & ((int)(((int)(s_int_6 -= LeafMethod6())) ^ ((int)(LeafMethod6() | s_int_6))))))))) + 48))))))));
There is a long list of other statements, functionality that needs to be added like having more than 1 class, single dimension and multi-dimension arrays, SIMD APIs, etc.
How many test cases are validated / hour?
To recap, Antigen generates C# program, runs it using Corerun in baseline mode and another Corerun
that runs in test mode. There is fair amount of improvement that can be done to this process, but for now, with this model, on a dual-core machine, with 2 threads, Antigen can generate and validate 1000 test cases / hour.
Where is it used?
Antigen is incorporated in dotnet/runtime repository to run weekly and on-demand on PRs that are making changes to RyuJIT.
What are the real issues found?
Below are some of the examples of .NET issues found by Antigen:
Linux/Arm: BBJ_ALWAYS block remains unvisited during dominance computation
Assertion failed '!nodeInfo.IsLclVarWrite() || !unusedLclVarReads.Contains(nodeInfo.LclNum())
Assertion failed '((tree->gtDebugFlags & GTF_DEBUG_NODE_MORPHED) == 0) && "ERROR: Already morphed this node!"
Assertion failed 'ssaNum != SsaConfig::RESERVED_SSA_NUM
block->bbNatLoopNum == BasicBlock::NOT_IN_LOOP
operand->gtUseNum == -1 during 'Generate code'
Assertion failed 'lvaStackPointerVar != 0xCCCCCCCC && compiler->lvaTable[lvaStackPointerVar].lvDoNotEnregister && compiler->lvaTable[lvaStackPointerVar].lvOnFrame'
Can we get a reduce repro code?
Antigen also comes with a component called Trimmer
which would trim the C# code as much as possible while still making sure that the original issue reproduces. Currently, it has very limited capability and is slow, but there are plans to improve it going forward.
Where is the source code?
The source code is in https://github.com/kunalspathak/antigen and contributions are welcome.
What’s up with the name “Antigen”?
“Antigen” name was chosen as a reminder that this tool was developed during Covid era. The name comes from one of the Covid-19 testing methodology “Rapid Antigen test (RAT)”. Just as RAT was used to detect covid symptoms, Antigen tool is used to detect any issues in .NET.
Other tools
Jakob Botsch Nielsen has developed Fuzzlyn which fuzzes C# code and find issues in .NET code.