Automating Deobfuscation of XorStringsNet

Recently I wanted to learn a bit more about the .NET Common Intermediate Language (CIL). CIL is basically the .NET equivalent of assembly language for managed code: when you compile a .NET assembly, the result is made up of CIL instructions. If you know me, you know I like to learn by doing, so my plan was to write a deobfuscator for a .NET obfuscator. Since I had already worked with dr4k0nia's XorStringsNet and knew how it worked internally, I decided to go for it. There are already deobfuscators for it: one just outputs the strings, and dr4k0nia's own approach deobfuscates the assembly by leveraging de4dot. However, neither approach is fully automatic and returns a clean binary. Her approach requires you to first identify the token of the string decryption method by manually looking at the decompilation, and the other one only prints the identified strings. That's why I wanted to create one that can be used in an automated malware deobfuscation pipeline.

XorStringsNet

Dr4k0nia describes how the tool works in her blog post Encrypting strings in .NET, so I am not going to explain it in-depth here. Simply put, it adds a new module (with a random GUID as the name) to an assembly:

This module implements a string decryption function:

Finally, all strings are replaced with a call to this function and an ID for the string (shown here for "Hello World"):

Internally, the added module has a big encrypted blob of data, where each string is represented as one block. The ID is used to index this blob and return the correct block to decrypt.

The blocks are structured as such:

[ INT32 DataLength ] [ INT32 XOR-Key ] [ BYTE[] Data ]

The decryption routine gets the block, gets the XOR-Key and the length of the data to decrypt, decrypts it and returns the plain text string.

One crucial detail is that the string ID (the index to the data blob pointing to the right block for the string) is encrypted with a global XOR key stored in the first bytes prepended to the encrypted blob. So the whole structure could e.g. look like this:

 Global Key | String Blob 1                | String Blob 2 
 [ INT32 ]  | [INT32] [INT32] [ BYTE[11] ] | [INT32] [INT32] [ BYTE[200] ]

The ID 434752308 XOR'ed with the global key would then e.g. return the index 1, pointing to the first block in the whole blob, where the "Hello World" string would be stored.
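To make the layout concrete, here is a minimal sketch that builds a one-string blob and resolves an ID against it. It assumes that the decoded ID is a byte offset into the blob (which is how the deobfuscator code later in this post treats it) and that integers are little-endian. BlobLayoutSketch, BuildBlob and ReadHeader are hypothetical names used only for illustration, not part of XorStringsNet.

```csharp
using System;
using System.IO;

// Hypothetical helper for illustration only; not part of XorStringsNet.
public static class BlobLayoutSketch
{
    // Builds [ INT32 global key ][ INT32 length ][ INT32 xor key ][ data ].
    // The data bytes are stored as given; this sketch is about layout only.
    public static byte[] BuildBlob(int globalKey, int xorKey, byte[] data)
    {
        using var stream = new MemoryStream();
        using var writer = new BinaryWriter(stream);
        writer.Write(globalKey);
        writer.Write(data.Length);
        writer.Write(xorKey);
        writer.Write(data);
        writer.Flush();
        return stream.ToArray();
    }

    // Decodes the ID with the global key and reads the block header.
    // Assumption: the decoded ID is a byte offset from the start of the blob.
    public static (int Offset, int Length, int Key) ReadHeader(byte[] blob, int stringId)
    {
        int globalKey = BitConverter.ToInt32(blob, 0);
        int offset = stringId ^ globalKey;
        int length = BitConverter.ToInt32(blob, offset);
        int key = BitConverter.ToInt32(blob, offset + sizeof(int));
        return (offset, length, key);
    }
}
```

For a blob built with the global key 0x1234, the ID 0x1234 ^ 4 resolves to offset 4, which is the block directly after the global key.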

Automating the Deobfuscation

All we need to do to automate the decryption is thus:

1. Get the global key
2. Get the encrypted data blob
3. Get all calls to the decryption routine
4. Replace said calls with the decrypted content of the respective blob

Just like XorStringsNet itself, I used the amazing AsmResolver library, a library for the manipulation of .NET PEs.

With the following code we can locate the injected decryption type by looking for a type with a GUID name, and validate it by checking each matching type for the expected method signature (the decryption method, which is also named with a GUID):

foreach (var type in _module.GetAllTypes())
{
    if (IsValidUuid(type.FullName))
    {
        Console.WriteLine($"    Found potential encryption type: {type.FullName}");
        encryptionType = type;
        foreach (var method in type.Methods)
        {
            // method name should be a uuid
            if (IsValidUuid(method.Name))
            {
                // Validate parameter and return type
                if (method.ParameterDefinitions.Count() == 1 && method.Parameters[0].ParameterType.FullName == "System.Int32")
                {
                    if (method.Signature.ReturnType.FullName == "System.String")
                    {
                        Console.WriteLine($"    Found encryption method:\n    {method.FullName}");
                        encryptionMethod = method;
                        break;
                    }
                }
            }
        }
    }
}
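The IsValidUuid helper is not shown in this post. A minimal sketch of what it could look like, assuming the injected names parse as standard GUID strings (NameChecks is a hypothetical class name used for illustration):

```csharp
using System;

// Hypothetical helper for illustration; the real implementation may differ.
public static class NameChecks
{
    // Returns true if the name parses as a GUID in any of the formats
    // accepted by Guid.TryParse (with or without dashes, braces, etc.).
    // Null or empty names are rejected.
    public static bool IsValidUuid(string name)
    {
        return Guid.TryParse(name, out _);
    }
}
```

Guid.TryParse is fairly permissive about formatting. If false positives turn out to be a problem, Guid.TryParseExact with a fixed format such as "D" would be a stricter alternative.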

After that, we can locate the encrypted blob via the only field of the decryption type. We convert the field's relative virtual address (RVA) to a file offset so that we can read the blob from the file, and then extract the global key from its first four bytes:

// the decryption type only has one field
var field = encryptionType.Fields[0];

var rva = field.FieldRva.Rva;
_fileOffset = _pefile.RvaToFileOffset(rva);

Console.WriteLine("    Found encrypted data at RVA 0x" + rva.ToString("X8"));
Console.WriteLine("    File offset:                0x" + _fileOffset.ToString("X8"));

// Global key is stored as an int at the beginning of the encrypted blob
int globalKey = BitConverter.ToInt32(ReadEncryptedData(0, sizeof(int)));
Console.WriteLine($"[*] Found global key: {globalKey}");
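ReadEncryptedData is also not shown here. A rough sketch of such a reader, assuming the whole file is held in memory as a byte array and that offsets are relative to the start of the blob (EncryptedDataReader and its constructor are hypothetical, chosen for illustration):

```csharp
using System;

// Hypothetical reader for illustration; the real helper may differ.
public sealed class EncryptedDataReader
{
    private readonly byte[] _fileBytes;  // raw bytes of the PE file
    private readonly ulong _fileOffset;  // file offset where the blob starts

    public EncryptedDataReader(byte[] fileBytes, ulong fileOffset)
    {
        _fileBytes = fileBytes ?? throw new ArgumentNullException(nameof(fileBytes));
        _fileOffset = fileOffset;
    }

    // Reads `size` bytes starting `offset` bytes into the encrypted blob.
    public byte[] ReadEncryptedData(ulong offset, int size)
    {
        ulong start = _fileOffset + offset;
        ulong fileLength = (ulong)_fileBytes.Length;

        // Offsets are derived from untrusted input, so check the bounds.
        if (size < 0 || start > fileLength || (ulong)size > fileLength - start)
            throw new ArgumentOutOfRangeException(nameof(offset), "Read would run past the end of the file.");

        var buffer = new byte[size];
        Array.Copy(_fileBytes, (long)start, buffer, 0, size);
        return buffer;
    }
}
```

The bounds check matters in a malware pipeline: a corrupted or non-vanilla sample can produce arbitrary offsets and lengths after the XOR.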

Armed with this key, we can just copy the decryption algorithm from the injected method and run it for each call to that method to decrypt the strings. For this, we iterate over all call instructions, check them against our identified method and replace each match with ldstr <decrypted string>, the CIL instruction that loads a string onto the stack.

// Loop through all types in the module
foreach (var type in _module.GetAllTypes())
{
    // Loop through all methods in the type
    foreach (var method in type.Methods)
    {
        // Skip methods without a body and the encryption method itself
        if (method.CilMethodBody == null) continue;
        if (method.FullName == encryptionMethod.FullName) continue;

        // Loop through each instruction in the methods body
        var instructions = method.CilMethodBody.Instructions;
        for (int i = 0; i < instructions.Count; i++)
        {
            var instruction = instructions[i];

            // Check if the instruction is a call to the decryption method
            if (instruction.OpCode == CilOpCodes.Call || instruction.OpCode == CilOpCodes.Callvirt)
            {
                if (instruction.Operand is MethodDefinition calledMethod && calledMethod.Name == encryptionMethod.Name)
                {
                    /*
                     * We could load the malware's method and call it,
                     * but I don't like loading malware modules into my code >:(
                     * So RXOR() is a reimplementation of the decryption routine.
                     */
                    string decrypted;

                    // the operation before the call pushes the string id to the stack
                    int string_id = (int)instructions[i - 1].Operand;

                    // we decrypt the id with the global key to get the offset
                    var offset = globalKey ^ string_id;
                    Console.WriteLine($"  String ID [{string_id}] @ data+{offset}");

                    // decrypt
                    // [ length ] [ key ] [ encrypted_data ]
                    var dataLength = BitConverter.ToInt32(ReadEncryptedData((ulong)offset, sizeof(int)));
                    var xorKey = ReadEncryptedData((ulong)offset + sizeof(int), sizeof(int));
                    var data = ReadEncryptedData((ulong)offset + sizeof(int) * 2, dataLength);

                    // Empty strings have a negative ID
                    if (string_id >> 31 != 0)
                        decrypted = String.Empty;
                    else
                        decrypted = Encoding.UTF8.GetString(RXOR(data, xorKey, dataLength));
                    Console.WriteLine($"  - {decrypted}");

                    instructions.RemoveAt(i - 1); // remove ldc.i4 <ID>
                    instruction.ReplaceWithNop(); // nop call to decryption method
                    instructions.Insert(i, new CilInstruction(CilOpCodes.Ldstr, decrypted));
                }
            }
        }
    }
}

Interestingly, there's a bug in the implementation of XorStringsNet that causes the encryption to only use the first byte of the encryption key. That cost me quite some sanity while debugging the decryption.
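The body of RXOR is not reproduced in this post. As a sketch under assumptions, a reimplementation that mirrors the bug could look like the following. It assumes a plain byte-wise XOR in which only the first key byte is ever applied; the actual routine in XorStringsNet may differ in details, and XorSketch is a hypothetical class name.

```csharp
using System;

// Hypothetical sketch; assumes a plain byte-wise XOR with only key[0] in effect.
public static class XorSketch
{
    // Decrypts `length` bytes of `data` using only the first byte of the
    // 4-byte key, mirroring the behaviour described above.
    public static byte[] RXOR(byte[] data, byte[] key, int length)
    {
        if (key.Length == 0) throw new ArgumentException("Key must not be empty.", nameof(key));
        if (length < 0 || length > data.Length) throw new ArgumentOutOfRangeException(nameof(length));

        var result = new byte[length];
        for (int i = 0; i < length; i++)
        {
            result[i] = (byte)(data[i] ^ key[0]);
        }
        return result;
    }
}
```

Since XOR is its own inverse, running the same function over the output with the same key returns the original bytes, which makes it easy to sanity check against a known string.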

You might have noticed the calls to RemoveAt and ReplaceWithNop. To understand them, we need to look at the CIL opcodes of an obfuscated vs. an unobfuscated assembly:

Patching the Assembly

CIL is heavily stack-based, and if we simply try to replace the calls to the decryption function with ldstr instructions, we mess up the stack.

Comparing the CIL instructions of Console.WriteLine("Hello World") with its obfuscated equivalent in dnSpy, it becomes evident why. The obfuscated version loads the ID of the encrypted string onto the stack before calling the decryption routine, which pops the ID and in turn puts the string onto the stack. If we simply replace the call with an ldstr, the ID is never popped and we end up with one value too many on the stack.

vs. the plain version:

This can be worked around by simply removing the ldc.i4 before our call instruction. If we also nop out the call to the decryption routine and insert an ldstr <DecryptedString> instruction, the number of instructions stays the same: we remove one instruction and we add one.
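The effect of the three list operations can be checked on a toy model that does not need AsmResolver. The sketch below represents each instruction as a mnemonic plus its net stack effect. The instruction sequence in the usage note is a hypothetical stand-in for the real method body, and PatchSketch is a made-up name for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy model for illustration; real code operates on AsmResolver instructions.
public static class PatchSketch
{
    // Applies the same three steps as the deobfuscator:
    // remove the ID push, nop the call, insert the ldstr.
    public static List<(string Op, int StackDelta)> Patch(
        List<(string Op, int StackDelta)> body, int callIndex, string decrypted)
    {
        var patched = new List<(string Op, int StackDelta)>(body);
        patched.RemoveAt(callIndex - 1);        // drop ldc.i4 <ID>
        patched[callIndex - 1] = ("nop", 0);    // the call has shifted down by one
        patched.Insert(callIndex, ("ldstr \"" + decrypted + "\"", 1));
        return patched;
    }

    // Sums the net stack effect of all instructions in the body.
    public static int NetStack(IEnumerable<(string Op, int StackDelta)> body)
    {
        return body.Sum(instruction => instruction.StackDelta);
    }
}
```

Feeding it a five-entry body such as nop, ldc.i4 <ID> (+1), call <decrypt> (0, pops the ID and pushes the string), call Console.WriteLine (-1), ret with callIndex 2 yields nop, nop, ldstr, call Console.WriteLine, ret: still five entries, and the stack is still balanced.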

The resulting CIL looks like this. Notice how the number of instructions (5) is unchanged compared to the screenshot of the obfuscated code above, and how it has the same number of stack pushes as the plain text version:

With this, we successfully restored the strings in the binary and can run it without crashing, while also having clear text strings when decompiling:

Now we can use this code anytime we encounter vanilla XorStringsNet binaries.

For the code see https://github.com/eversinc33/UnXorStringsNet

Happy Hacking!