Antigen

What is it?

Antigen is a fuzzer that generates random C# programs on the fly to test .NET’s RyuJIT.

How does it work?

Antigen generates random C# programs and executes each one in two modes: baseline and test. In baseline mode, only minimal optimizations are enabled. In test mode, the program runs with full optimization plus stress switches that turn certain optimizations on or off. Antigen then compares the output of the two runs and reports the program if the outputs differ or if any assert was hit during execution.
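
The comparison step can be sketched as follows. This is a minimal illustration, not Antigen's actual code: the `Classify` and `FakeRun` names are hypothetical, the two environment variables are only examples of the kind of switch that could separate the modes, and the runner is a stand-in for launching the generated program. It uses C# 9 top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Example settings only; the switches Antigen really uses may differ.
var baselineEnv = new Dictionary<string, string> { ["DOTNET_JITMinOpts"] = "1" };
var testEnv = new Dictionary<string, string> { ["DOTNET_TieredCompilation"] = "0" };

// Run the same program in both modes and classify the result.
static string Classify(
    Func<IReadOnlyDictionary<string, string>, (string Output, bool AssertHit)> run,
    IReadOnlyDictionary<string, string> baseline,
    IReadOnlyDictionary<string, string> test)
{
    var b = run(baseline);
    var t = run(test);
    if (b.AssertHit || t.AssertHit) return "assert";
    return b.Output == t.Output ? "ok" : "mismatch";
}

// Stand-in for executing the program: pretends the optimized run miscompiles.
(string Output, bool AssertHit) FakeRun(IReadOnlyDictionary<string, string> env) =>
    env.ContainsKey("DOTNET_JITMinOpts") ? ("42", false) : ("41", false);

Console.WriteLine(Classify(FakeRun, baselineEnv, testEnv)); // prints "mismatch"
```

In a real harness the runner would start the program in a child process with the given environment and capture its output; only programs classified as "assert" or "mismatch" would be reported.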

How does it help uncover issues?

Most unit tests are handwritten and target a specific scenario, so they can miss hidden issues. Antigen, like other fuzzers, generates random C# code, often with long expressions and statements that would rarely appear in real-world code but that stress the compiler and its code generation component well. If there is a hidden issue in code generation, fuzzing either hits an assert in the compiler code or, worse, produces output that differs from the baseline, meaning the compiler did not generate correct machine code. Either way, a compiler issue is uncovered.

What are the examples of expressions generated?

Antigen currently generates a range of expressions:

Literals: All primitive types like int, long, float, char, string, etc.
Variable references
Unary and Binary operations
Assignments
Struct and nested struct declaration and usage
Method declaration
Method calls

Below is a sample expression generated by Antigen.

s_double_10 -= ((double)(((double)(((double)(((double)(((double)(((double)(LeafMethod4() - -2)) + ((double)(s_double_10 %= ((double)((s_double_10) + 58)))))) - ((double)(double_55 += ((double)(s_double_10 + LeafMethod4())))))) % ((double)((((double)(((double)(((double)(double_55 /= ((double)((s_double_10) + 22)))) + double_28)) / ((double)((((double)(s_double_10 *= ((double)(LeafMethod4() + double_28))))) + 96))))) + 2)))) + ((double)(((double)(((double)(((double)(p_double_44 += double_55)) % ((double)((p_double_44) + 44)))) - ((double)(((double)(double_55 - p_double_44)) + ((double)(p_double_44 += double_28)))))) + ((double)(((double)(double_28 += ((double)(s_double_10 / ((double)((double_28) + 7)))))) + ((double)(((double)(double_55 % ((double)((s_double_10) + 30)))) + LeafMethod4())))))))) * ((double)(((double)(((double)(((double)(double_28 + ((double)(double_55 * LeafMethod4())))) * ((double)(double_28 /= ((double)((((double)(LeafMethod4() + double_28))) + 72)))))) * ((double)(((double)(((double)(double_28 * LeafMethod4())) % ((double)((((double)(double_55 % ((double)((LeafMethod4()) + 57))))) + 8)))) + ((double)(((double)(LeafMethod4() % ((double)((s_double_10) + 67)))) * ((double)(double_28 += s_double_10)))))))) + p_double_44))));

Note: Antigen currently casts every right-hand-side expression to the type of the left-hand-side variable, and there is a TODO item to remove these casts. Likewise, there is a work item to eliminate the unneeded parentheses.
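
To make the shape of such expressions concrete, here is a small sketch of a recursive generator that emits text in the same style. It is an illustration under assumptions, not Antigen's generator: the `Gen` function, the fixed seed, and the depth limit are hypothetical, and the leaf names are borrowed from the sample above. It wraps every sub-expression in a cast and, as in the sample, offsets every divisor by a positive constant (presumably to make division by zero less likely, which is an inference from the sample rather than a documented rule).

```csharp
using System;

var rng = new Random(1234); // fixed seed so the same expression can be regenerated
string[] leaves = { "double_28", "double_55", "s_double_10", "LeafMethod4()" };
string[] ops = { "+", "-", "*", "/", "%" };

// Build a random double-typed expression tree of the given depth as C# text.
string Gen(int depth)
{
    if (depth == 0) return leaves[rng.Next(leaves.Length)];
    string op = ops[rng.Next(ops.Length)];
    string lhs = Gen(depth - 1);
    string rhs = Gen(depth - 1);
    // Offset divisors by a constant between 1 and 99, mirroring the sample.
    if (op is "/" or "%")
        rhs = $"((double)(({rhs}) + {rng.Next(1, 100)}))";
    // Cast every sub-expression to the target type.
    return $"((double)({lhs} {op} {rhs}))";
}

Console.WriteLine($"s_double_10 -= {Gen(3)};");
```

Raising the depth makes the expression grow exponentially, which is how a few lines of generator logic can produce statements as long as the one shown above.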

What are the examples of statements generated?

Antigen currently generates a range of statements:

Variable Declaration
Assignment statements
if-else statements
Loops: for, while-do, do-while
try-catch-finally
switch-case

Here is a section of a program that Antigen generated:

byte_1 &= ((byte)(s_byte_1 &= ((byte)(((byte)(s_byte_1 %= ((byte)((((byte)(((byte)(((byte)(byte_1 * s_byte_1)) % ((byte)((((byte)(s_byte_1 ^= s_byte_1))) + 96)))) % ((byte)((byte_1) + 43))))) + 77)))) + ((byte)(LeafMethod1() + ((byte)(((byte)(((byte)(s_byte_1 - s_byte_1)) / ((byte)((((byte)(LeafMethod1() * LeafMethod1()))) + 65)))) - ((byte)(((byte)(s_byte_1 / ((byte)((LeafMethod1()) + 61)))) | ((byte)(byte_1 * byte_1))))))))))));
int __loopvar9 = s_loopInvariant - 12, __loopSecondaryVar9_0 = s_loopInvariant - 10;
do
{
    __loopvar9 += 4;
    if (__loopvar9 > s_loopInvariant + 4)
        break;
    __loopSecondaryVar9_0 += 3;
    s_double_4 %= ((double)(double_4 = LeafMethod4()));
    if (bool_0)
    {
        int __loopvar7 = s_loopInvariant + 10;
        while ((((bool)(bool_0 = ((bool)(bool_0 = bool_0))))))
        {
            __loopvar7 -= 4;
            if (__loopvar7 <= s_loopInvariant - 4)
                break;
            LeafMethod10();
            long_7 ^= ((long)(((long)(long_7 %= ((long)((((long)(((long)(((long)(long_7 & LeafMethod7())) - ((long)(long_7 = LeafMethod7())))) + ((long)(s_long_7 <<= ((int)(((int)(LeafMethod6() & LeafMethod6())) ^ ((int)(p_int_5 ^= int_6))))))))) + 90)))) * long_7));
            LeafMethod15();
        }

        LeafMethod11();
        s_uint_12 <<= s_int_6;
    }
    else
    {
        int __loopvar8 = s_loopInvariant, __loopSecondaryVar8_0 = s_loopInvariant - 10;
        for (;; __loopSecondaryVar8_0 += 3)
        {
            __loopvar8 += 4;
            if (__loopvar8 > s_loopInvariant + 16)
                break;
            s_sbyte_8 &= ((sbyte)(((sbyte)(sbyte_8 -= ((sbyte)(s_sbyte_8 = ((sbyte)(sbyte_8 >> ((int)(((int)(s_int_6 - LeafMethod6())) + ((int)(int_6 - LeafMethod6())))))))))) * s_sbyte_8));
            sbyte_8 /= ((sbyte)(((sbyte)(((sbyte)(LeafMethod8() << ((int)(((int)(int_6 >> 1)) % ((int)((((int)(int_6 += ((int)(LeafMethod6() & LeafMethod6()))))) + 28)))))) * ((sbyte)(((sbyte)(sbyte_8 <<= s_int_6)) % ((sbyte)((((sbyte)(sbyte_8 &= sbyte_8))) + 31)))))) ^ ((sbyte)(((sbyte)(s_sbyte_8 <<= LeafMethod6())) >> LeafMethod6()))));
            s2_15.ulong_1 %= ((ulong)(ulong_13 ^ ((ulong)(((ulong)(((ulong)(ulong_13 = ((ulong)(((ulong)(s_ulong_13 -= s_ulong_13)) * ((ulong)(s2_15.ulong_1 %= ((ulong)((s2_15.ulong_1) + 51)))))))) * ((ulong)(((ulong)(s_ulong_13 % ((ulong)((((ulong)(ulong_13 &= s2_15.ulong_1))) + 10)))) / ((ulong)((LeafMethod13()) + 71)))))) * ulong_13))));
            s_int_6 &= ((int)(LeafMethod6() % ((int)((((int)(s_int_6 -= ((int)(((int)(((int)(4 | ((int)(s_int_6 |= s_int_6)))) - s_int_6)) | ((int)(((int)(((int)(s_int_6 % ((int)((1) + 26)))) ^ ((int)(int_6 ^= s_int_6)))) / ((int)((((int)(s_int_6 = ((int)(int_6 - int_6))))) + 11))))))))) + 12))));
            long_7 ^= ((long)(((long)(((long)(((long)(s_long_7 <<= ((int)(p_int_5 >>= s_int_6)))) << ((int)(int_6 -= ((int)(((int)(p_int_5 -= int_6)) % ((int)((((int)(LeafMethod6() - s_int_6))) + 22)))))))) ^ ((long)(long_7 * LeafMethod7())))) | ((long)(((long)(((long)(long_7 | long_7)) >> ((int)(s_int_6 -= LeafMethod6())))) ^ s_long_7))));
            LeafMethod10();
            uint_12 = ((uint)(uint_12 - ((uint)(((uint)(p_uint_0 |= ((uint)(p_uint_0 >>= ((int)(((int)(s_int_6 |= int_6)) >> ((int)(s_int_6 << 94)))))))) * ((uint)(((uint)(((uint)(((uint)(s_uint_12 + LeafMethod12())) / ((uint)((uint_12) + 22)))) % ((uint)((((uint)(((uint)(s_uint_12 * LeafMethod12())) % ((uint)((((uint)(s_uint_12 + LeafMethod12()))) + 51))))) + 54)))) % ((uint)((((uint)(((uint)(((uint)(uint_12 % ((uint)((LeafMethod12()) + 10)))) + ((uint)(LeafMethod12() / ((uint)((LeafMethod12()) + 66)))))) & ((uint)(s_uint_12 = ((uint)(uint_12 | uint_12))))))) + 86))))))));
            s2_15.ulong_1 &= ulong_13;
            s_ushort_11 *= ushort_11;
            sbyte_8 &= ((sbyte)(sbyte_8 = LeafMethod8()));
        }
    }

    LeafMethod3();
}
while ((((bool)(int_6 == ((int)(((int)(((int)(int_6 * ((int)(((int)(s_int_6 /= ((int)((LeafMethod6()) + 43)))) ^ ((int)(int_6 ^= int_6)))))) & ((int)(int_6 -= ((int)(s_int_6 &= ((int)(int_6 % ((int)((p_int_5) + 21)))))))))) % ((int)((((int)(s_int_6 *= ((int)(((int)(int_6 &= ((int)(p_int_5 >> s_int_6)))) & ((int)(((int)(s_int_6 -= LeafMethod6())) ^ ((int)(LeafMethod6() | s_int_6))))))))) + 48))))))));
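
One detail worth noticing in the sample is that every loop carries its own induction variable and an early `break`, so the loop ends after a fixed number of iterations no matter what the random loop condition evaluates to. The sketch below isolates that guard from the outer do-while loop, with the random condition replaced by `true` and an arbitrary value assumed for `s_loopInvariant`; the `bodyRuns` counter is added purely for illustration.

```csharp
using System;

int s_loopInvariant = 7; // arbitrary value, assumed for this illustration
int bodyRuns = 0;        // counts how often the loop body would execute

int __loopvar9 = s_loopInvariant - 12;
do
{
    __loopvar9 += 4;
    if (__loopvar9 > s_loopInvariant + 4)
        break;           // guard: leaves the loop even though the condition is always true
    bodyRuns++;
}
while (true);

Console.WriteLine(bodyRuns); // prints 4
```

The guard variable takes the values -8, -4, 0 and +4 relative to `s_loopInvariant` before the fifth increment exceeds the bound, so the body runs four times.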

There is a long list of other statements and functionality still to be added, such as more than one class, single-dimension and multi-dimension arrays, SIMD APIs, etc.

How many test cases are validated / hour?

To recap, Antigen generates a C# program and runs it twice with Corerun: once in baseline mode and once in test mode. There is a fair amount of room to improve this process, but for now, with this model, on a dual-core machine with 2 threads, Antigen can generate and validate 1000 test cases / hour.
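
As a rough back-of-the-envelope reading of that figure, assuming the two threads work independently and split the load evenly (an assumption, not something measured here), the time budget per test case works out as follows:

```csharp
using System;

int threads = 2;         // from the text
int casesPerHour = 1000; // from the text

// Each thread has 3600 seconds per hour and handles half of the cases.
double secondsPerCase = 3600.0 * threads / casesPerHour; // 7.2 seconds per test case per thread

Console.WriteLine(secondsPerCase);
```

So each test case has roughly 7 seconds to cover generation, compilation, and both executions.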

Where is it used?

Antigen is incorporated in the dotnet/runtime repository, where it runs weekly and on demand on PRs that change RyuJIT.

What are the real issues found?

Below are some of the examples of .NET issues found by Antigen:

Can we get a reduced repro?

Antigen also comes with a component called Trimmer, which trims the C# code as much as possible while making sure that the original issue still reproduces. Currently, it has very limited capability and is slow, but there are plans to improve it.
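
The general idea behind this kind of reduction can be sketched with a simple greedy loop. This is not the Trimmer's actual algorithm: `Trim` and `Oracle` are hypothetical names, the program is modeled as a flat list of statements, and the oracle is a stand-in for "compile the candidate, run it in both modes, and check that the same failure appears".

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Try deleting one statement at a time; keep a deletion only if the issue still reproduces.
static List<string> Trim(List<string> statements, Func<IReadOnlyList<string>, bool> stillReproduces)
{
    var current = new List<string>(statements);
    bool changed = true;
    while (changed) // repeat until a full pass removes nothing
    {
        changed = false;
        for (int i = current.Count - 1; i >= 0; i--)
        {
            var candidate = new List<string>(current);
            candidate.RemoveAt(i);
            if (stillReproduces(candidate))
            {
                current = candidate;
                changed = true;
            }
        }
    }
    return current;
}

var program = new List<string> { "int a = 1;", "LeafMethod3();", "a <<= 33;", "LeafMethod10();" };

// Hypothetical oracle: pretend the issue needs the declaration and the shift.
bool Oracle(IReadOnlyList<string> p) => p.Contains("int a = 1;") && p.Contains("a <<= 33;");

var reduced = Trim(program, Oracle);
Console.WriteLine(string.Join(" ", reduced)); // prints "int a = 1; a <<= 33;"
```

Every candidate costs a compile plus two executions, which gives a sense of why a reducer like this is slow on large generated programs.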

Where is the source code?

The source code is at https://github.com/kunalspathak/antigen, and contributions are welcome.

What’s up with the name “Antigen”?

The name “Antigen” was chosen as a reminder that this tool was developed during the Covid era. It comes from one of the Covid-19 testing methods, the “Rapid Antigen Test (RAT)”. Just as RAT was used to detect Covid-19 infections, the Antigen tool is used to detect issues in .NET.

Other tools

Jakob Botsch Nielsen has developed Fuzzlyn, which also fuzzes C# code to find issues in .NET.