(Anti-)Anti-Rootkit Techniques - Part II: Stomped Drivers and Hidden Threads

At the end of Part I of this series, we ended up with a small anti-rootkit driver that was able to detect malicious drivers mapped to unbacked memory if they either run as a standard Windows driver (one that registers a device object for IRP communication) or run any thread in unbacked memory at all - unless they employ some other anti-anti-rootkit techniques.

This post will cover some evasions against this specific anti-rootkit and as such build upon Part I - if you have not read it, you might want to do it now. It is a rather short read anyway. Also check out my rootkit Banshee and the anti-rootkit unKover. This post is mainly an aggregation of known anti-rootkit/anti-cheat evasion techniques and me coming up with ways to detect them.

Detection 1: Detecting driver "stomping"

The last part was mainly about detecting rootkits that are mapped to memory, using a mapper such as kdmapper. Generally, these tools map a driver manually to kernel memory, using an arbitrary write primitive in a vulnerable, signed driver - so the premise of the last blog post was that detecting threads originating from unbacked memory is one way to detect these types of rootkits.

I ended the previous post with a short word on driver "stomping", i.e. loading the rootkit over an existing driver in memory. As I mentioned, this can easily be detected by simply comparing a driver's .text section on disk to its .text section in memory (analogous to detecting module stomping).

The implementation is really straightforward (as usual, error handling omitted for brevity):

First, we iterate over the \Driver directory, as known from Part I:

// Get Handle to \Driver directory
InitializeObjectAttributes(&attributes, &directoryName, OBJ_CASE_INSENSITIVE, NULL, NULL);
status = ZwOpenDirectoryObject(&handle, DIRECTORY_ALL_ACCESS, &attributes);
status = ObReferenceObjectByHandle(handle, DIRECTORY_ALL_ACCESS, nullptr, KernelMode, &directory, nullptr);

POBJECT_DIRECTORY directoryObject = (POBJECT_DIRECTORY)directory;
ULONG_PTR hashBucketLock = directoryObject->Lock;

DbgPrint("Scanning DriverObjects...\n");

// Lock the hashbucket
KeEnterCriticalRegion();
ExAcquirePushLockExclusiveEx(&hashBucketLock, 0);

for (POBJECT_DIRECTORY_ENTRY entry : directoryObject->HashBuckets)
{
    while (entry != nullptr && entry->Object)
    {
        PDRIVER_OBJECT driver = (PDRIVER_OBJECT)entry->Object;

Then, we get the driver service name and look up its path in the registry. (This is flawed, as a rootkit can spoof this as well, by setting this value to point to the actual rootkit driver - but then, the attacker has to drop it to disk (or hook the filesystem driver and spoof it, but that requires some additional effort)):

        // Get driver service name to lookup path to binary. Strip \Driver prefix
        UkStripDriverPrefix(&driver->DriverName, &driverServiceName);

        // This queries the registry for the image path
        NTSTATUS status = UkGetDriverImagePath(&driverServiceName, &imagePath);

        // Create an absolute path
        status = UkPrependWindowsPathIfStartsWithSystem32(&imagePath, &imagePathAbsolute);

With this absolute path, we can now compare it to the image in memory:

        // read the image and compare it to the in memory image
        ULONG fileSize = 0;
        status = UkReadFileToMemory(&imagePathAbsolute, &fileBuffer, &fileSize);

        // compare .text sections
        if (!NT_SUCCESS(UkGetPeSection(".text", fileBuffer, textSectionOnDiskBuffer, &sectionSizeOnDisk))
            || !NT_SUCCESS(UkGetPeSection(".text", driver->DriverStart, textSectionInMemBuffer, &sectionSizeInMem))
            || !textSectionOnDiskBuffer || !textSectionInMemBuffer)
        {
            goto Next;
        }

        if (RtlCompareMemory(textSectionOnDiskBuffer, textSectionInMemBuffer, sectionSizeOnDisk) != sectionSizeOnDisk)
        {
            DbgPrint("-- [!] .TEXT SECTION DIFFERS\n");
        }

    Next:
        /* [...] Cleanup */
        entry = entry->ChainLink;
    }
}

ExReleasePushLockExclusiveEx(&hashBucketLock, 0);
KeLeaveCriticalRegion();
ObDereferenceObject(directory);
ZwClose(handle);

I implemented this in unKover and it seems to work quite well - I did not find any self-modifying drivers, i.e. false positives, on my machine at all. (However, for the Ghost Drivers, some path resolution adjustments would need to be made.)
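To make the section comparison a bit more tangible, here is a minimal user-mode sketch of what a helper like UkGetPeSection could look like. It is not the unKover implementation: all Sketch* names are hypothetical, the header layout is hand-rolled from the PE/COFF format, and a little-endian machine is assumed. The one point it tries to illustrate is that a section sits at PointerToRawData in the file on disk but at VirtualAddress in the mapped image, so the lookup has to know which of the two it is looking at.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hand-rolled section header (hypothetical name) so the sketch builds without windows.h.
// The layout follows IMAGE_SECTION_HEADER from the PE/COFF format (40 bytes).
struct SketchSectionHeader {
    char     Name[8];
    uint32_t VirtualSize;
    uint32_t VirtualAddress;
    uint32_t SizeOfRawData;
    uint32_t PointerToRawData;
    uint32_t PointerToRelocations;
    uint32_t PointerToLinenumbers;
    uint16_t NumberOfRelocations;
    uint16_t NumberOfLinenumbers;
    uint32_t Characteristics;
};
static_assert(sizeof(SketchSectionHeader) == 40, "unexpected padding");

struct SketchSection { const uint8_t* data; uint32_t size; };

static uint16_t ReadU16(const uint8_t* p) { uint16_t v; std::memcpy(&v, p, sizeof v); return v; }
static uint32_t ReadU32(const uint8_t* p) { uint32_t v; std::memcpy(&v, p, sizeof v); return v; }

// isMapped == false: buffer is the raw file   -> use PointerToRawData / SizeOfRawData.
// isMapped == true:  buffer is a loaded image -> use VirtualAddress / VirtualSize.
bool SketchGetPeSection(const uint8_t* image, size_t imageSize, const char* name,
                        bool isMapped, SketchSection* out)
{
    assert(image != nullptr && name != nullptr && out != nullptr);
    if (imageSize < 0x40 || ReadU16(image) != 0x5A4D) return false;        // "MZ"
    const uint32_t ntOffset = ReadU32(image + 0x3C);                       // e_lfanew
    if (ntOffset > imageSize || imageSize - ntOffset < 24) return false;
    if (ReadU32(image + ntOffset) != 0x00004550) return false;             // "PE\0\0"

    const uint16_t sectionCount = ReadU16(image + ntOffset + 6);           // NumberOfSections
    const uint16_t optHeaderSize = ReadU16(image + ntOffset + 20);         // SizeOfOptionalHeader
    const size_t sectionTable = size_t(ntOffset) + 24 + optHeaderSize;

    for (uint16_t i = 0; i < sectionCount; ++i) {
        const size_t offset = sectionTable + size_t(i) * sizeof(SketchSectionHeader);
        if (offset + sizeof(SketchSectionHeader) > imageSize) return false;
        SketchSectionHeader header;
        std::memcpy(&header, image + offset, sizeof header);
        if (std::strncmp(header.Name, name, sizeof header.Name) != 0) continue;

        const uint32_t start = isMapped ? header.VirtualAddress : header.PointerToRawData;
        const uint32_t size  = isMapped ? header.VirtualSize    : header.SizeOfRawData;
        if (start > imageSize || size > imageSize - start) return false;
        out->data = image + start;
        out->size = size;
        return true;
    }
    return false;
}

// Compares only the bytes both views have in common. The raw size on disk is
// padded to the file alignment and can therefore be larger than the virtual size.
bool SketchSectionsDiffer(const SketchSection& a, const SketchSection& b)
{
    const uint32_t common = a.size < b.size ? a.size : b.size;
    return std::memcmp(a.data, b.data, common) != 0;
}
```

Comparing only the common prefix is a design choice of this sketch: it avoids reading past the end of the shorter view when the two sizes disagree.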

But what if the mapper does not use the .text section, but rather some other section such as .data or .rdata? This is implemented in tools such as SinMapper or lpmapper - for the latter, VollRagm describes detection vectors in his blog post Abusing LargePageDrivers to copy shellcode into valid kernel modules himself. For SinMapper, we should be able to detect threads with the usual methods from Part I, except that this time we should check not only for unbacked memory, but also for threads originating from non-.text sections. One more for the unKover backlog...
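A rough sketch of how such a check could look, written as plain user-mode C++ so it can be tried without a driver. This is not in unKover (yet); the Sketch* names are hypothetical. Instead of comparing section names against ".text", this sketch assumes that testing the executable flag of the containing section is the more robust criterion, since legitimate drivers can carry several executable sections (for example PAGE or INIT).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Value of IMAGE_SCN_MEM_EXECUTE from the PE/COFF format.
constexpr uint32_t kSketchMemExecute = 0x20000000;

// Hypothetical, simplified view of one section of a loaded driver image.
struct SketchSectionRange {
    uint32_t rva;             // section start relative to the image base
    uint32_t size;            // size of the section in memory
    uint32_t characteristics; // section flags from the section header
};

enum class SketchVerdict { OutsideImage, ExecutableSection, NonExecutableSection };

// Classifies a thread start address against the sections of one image.
SketchVerdict SketchClassifyStartAddress(uint64_t imageBase,
                                         const SketchSectionRange* sections,
                                         size_t sectionCount,
                                         uint64_t startAddress)
{
    assert(sections != nullptr || sectionCount == 0);
    for (size_t i = 0; i < sectionCount; ++i) {
        const uint64_t begin = imageBase + sections[i].rva;
        const uint64_t end = begin + sections[i].size;
        if (startAddress < begin || startAddress >= end) continue;
        return (sections[i].characteristics & kSketchMemExecute)
                   ? SketchVerdict::ExecutableSection
                   : SketchVerdict::NonExecutableSection;
    }
    return SketchVerdict::OutsideImage;
}
```

A start address that lands in NonExecutableSection would be the suspicious case here, while OutsideImage corresponds to the unbacked memory check from Part I.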

Anyway, driver "stomping" seems to not be the silver bullet to get rid of thread detections, so what do we do to evade detection? We attack unKover's flawed implementation of enumerating threads and their start addresses.

Windows Thread Internals: Handle Tables and the PspCidTable

In each implemented technique that checks threads' start addresses, unKover uses PsLookupThreadByThreadId, a routine exported by ntoskrnl.exe. If we look at its implementation by decompiling the function in IDA, we see that it internally calls the private PspReferenceCidTableEntry routine:

If we take a look at that function, we can see a reference to a global symbol named PspCidTable:

This is a pointer to a handle table, which (among others) contains handles to threads (as can be seen from calling ExpLookupHandleTableEntry with the target thread ID). A handle table is simply a page sized block that stores up to 256 handle entries or references to other handle tables (see What are Windows Handles - Windows Internals Explained (guidedhacking.com)). Below are the relevant C structs for handle tables:

typedef struct _HANDLE_TABLE
{
     ULONG TableCode;
     PEPROCESS QuotaProcess;
     PVOID UniqueProcessId;
     EX_PUSH_LOCK HandleLock;
     LIST_ENTRY HandleTableList;
     EX_PUSH_LOCK HandleContentionEvent;
     PHANDLE_TRACE_DEBUG_INFO DebugInfo;
     LONG ExtraInfoPages;
     ULONG Flags;
     ULONG StrictFIFO: 1;
     LONG FirstFreeHandle;
     PHANDLE_TABLE_ENTRY LastFreeHandleEntry;
     LONG HandleCount;
     ULONG NextHandleNeedingPool;
} HANDLE_TABLE, *PHANDLE_TABLE;

typedef struct _HANDLE_TABLE_ENTRY
{
     union
     {
          PVOID Object;
          ULONG ObAttributes;
          PHANDLE_TABLE_ENTRY_INFO InfoTable;
          ULONG Value;
     };
     union
     {
          ULONG GrantedAccess;
          struct
          {
               WORD GrantedAccessIndex;
               WORD CreatorBackTraceIndex;
          };
          LONG NextFreeTableEntry;
     };
} HANDLE_TABLE_ENTRY, *PHANDLE_TABLE_ENTRY;

If we dig a bit deeper, we find out that this specific handle table is also used by PsLookupProcessByProcessId - the PspCidTable is thus the pool which is used for generating unique process and thread (client) IDs (CIDs). This also explains why a process ID and a thread ID can never be the same: the ID pool is shared between process and thread handles.
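To illustrate how an ID turns into a table slot, here is a user-mode model of a multi-level handle table lookup. Treat it as a sketch under assumptions, not as the kernel's code: it assumes 4 KiB pages, 16-byte entries and 8-byte pointers (x64), that the low two bits of TableCode encode the number of indirection levels, and that IDs are multiples of 4. These details are commonly described for recent x64 builds but may differ between Windows versions. All Sketch* names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(void*) == 8, "sketch assumes a 64-bit build");

// Simplified stand-in for a 16-byte x64 handle table entry.
struct SketchHandleTableEntry { uint64_t LowValue; uint64_t HighValue; };
static_assert(sizeof(SketchHandleTableEntry) == 16, "entry must be 16 bytes");

constexpr uint64_t kEntriesPerPage  = 4096 / sizeof(SketchHandleTableEntry); // 256
constexpr uint64_t kPointersPerPage = 4096 / sizeof(void*);                  // 512

// Only the two fields the lookup needs.
struct SketchHandleTable {
    uintptr_t TableCode;             // page address, low 2 bits = indirection levels
    uint64_t  NextHandleNeedingPool; // first handle value that has no backing page yet
};

SketchHandleTableEntry* SketchLookupEntry(const SketchHandleTable& table, uint64_t handle)
{
    handle &= ~uint64_t(3);                                   // strip the two tag bits
    if (handle >= table.NextHandleNeedingPool) return nullptr;

    const uint64_t index = handle / 4;                        // handles advance in steps of 4
    const uintptr_t base = table.TableCode & ~uintptr_t(3);
    assert(base != 0);

    switch (table.TableCode & 3) {
    case 0: // TableCode points directly at one page of entries
        return reinterpret_cast<SketchHandleTableEntry*>(base) + index;
    case 1: { // TableCode points at a page of pointers to entry pages
        auto** mid = reinterpret_cast<SketchHandleTableEntry**>(base);
        return mid[index / kEntriesPerPage] + index % kEntriesPerPage;
    }
    case 2: { // one more level of pointer pages on top
        auto*** top = reinterpret_cast<SketchHandleTableEntry***>(base);
        const uint64_t page = index / kEntriesPerPage;
        return top[page / kPointersPerPage][page % kPointersPerPage] + index % kEntriesPerPage;
    }
    default:
        return nullptr;
    }
}
```

In this model, a lookup for an ID whose slot is missing or out of range simply yields no entry, which is the behaviour the rest of this post builds on.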

The flaw

As we saw in IDA, two lines below the call to ExpLookupHandleTableEntry, if no handle table entry is found for the ID that was queried, the function returns NULL - which is our way to attack the unKover anti-rootkit. If we remove our rootkit's thread IDs from this table, any security solution which relies on calls that leverage the PspCidTable, e.g. PsLookupThreadByThreadId, will not find them, as the internal lookup returns NULL and the call fails instead of returning the actual thread.

With this, we are directly attacking our specific anti-rootkit implementation. The takeaway here is to find a flaw or an oversight in whatever security product you are facing, either through reverse engineering or code auditing, and abuse it.

Locating the PspCidTable

As I described in my blog post Keylogging in the Windows Kernel with undocumented data structures, when locating gafAsyncKeyState, we can usually just use signature scanning to find these kinds of pointers. Our PspReferenceCidTableEntry function seems to be a good fit, as it contains a static reference to our object of interest - which means we can simply scan for the first mov rbp, cs: instruction and extract the displacement to the PspCidTable from the assembly instruction bytes. This signature might of course be different for different versions of ntoskrnl and you might want to either hardcode different target signatures or use something like YASS.

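(If you are wondering about the cs segment selector - the cs base is treated as 0 in 64 bit mode, which does not rely on segmentation anymore, so you can ignore it and treat the instruction like a "regular" mov. Note that on x64 such a reference is encoded RIP-relative: the 32 bit displacement in the instruction bytes is relative to the address of the next instruction, so the address of the pointer is that address plus the displacement.)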
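As a small illustration of the displacement extraction, the following user-mode sketch scans a byte buffer for the first mov rbp, [rip+disp32] (encoded as 48 8B 2D followed by a 32 bit displacement, which is what IDA displays as mov rbp, cs:...) and resolves the referenced address. The function name and parameters are hypothetical, a little-endian machine is assumed, and a plain byte scan like this can in principle match in the middle of another instruction, so a real signature would want more context bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Scans `code` for the first `mov rbp, [rip+disp32]` and computes the address it refers to.
// `codeAddress` is the address the first byte of `code` is loaded at.
bool SketchResolveMovRbpRipRelative(const uint8_t* code, size_t size,
                                    uint64_t codeAddress, uint64_t* target)
{
    assert(code != nullptr && target != nullptr);
    constexpr size_t kInstructionLength = 7; // REX.W + opcode + ModRM + disp32

    for (size_t i = 0; i + kInstructionLength <= size; ++i) {
        if (code[i] != 0x48 || code[i + 1] != 0x8B || code[i + 2] != 0x2D) continue;

        int32_t displacement;
        std::memcpy(&displacement, code + i + 3, sizeof displacement);

        // RIP-relative: the displacement is added to the address of the NEXT instruction.
        const int64_t next = static_cast<int64_t>(codeAddress + i + kInstructionLength);
        *target = static_cast<uint64_t>(next + displacement);
        return true;
    }
    return false;
}
```

For example, with the instruction located at 0x1001 and a displacement of 0x10, the next instruction starts at 0x1008 and the resolved address is 0x1018.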

But there is also another way: KeCapturePersistentThreadState is an undocumented Windows kernel function, which a driver can use to retrieve a small memory dump. Inside this dump will be a decrypted KdDebuggerDataBlock, a structure used by crash dump analysis tools and debuggers, which, among others, contains the location of the PspCidTable. For more info, see e.g. UnknownCheats or here - note that it was not always encrypted, hence why older blog posts locate it differently.

New dog, old tricks

How do we remove our handle from this table? According to vx-underground, there are three sources we can consult:

The third one is in Chinese, which I do not speak, so here are the first two:

In this UnknownCheats post, ExDestroyHandle is used to simply destroy/remove the handle from the table (see [Tutorial] Remove your systemthread from PspCidTable)
In this blogpost from 2006 in Uninformed the Object property of the HANDLE_TABLE_ENTRY struct is set to NULL instead (funnily enough to evade the Blacklight anti-rootkit back then)

Let's implement the first approach as described in the tutorial.

Removing the Handle Table Entry

First, we scan for ExMapHandleToPointer, from which we scan for ExpLookupHandleTableEntry, which is our function that performs a lookup on the PspCidTable and gets our thread's HANDLE_TABLE_ENTRY for us. Simply using ExMapHandleToPointer is not possible, because this will cause a deadlock. We also need to scan for ExDestroyHandle. I am not going to explain this step by step again here: look for xrefs in IDA, extract a signature and scan.

We can remove our handle from the table, to hide it from the OS, with one very simple function:

NTSTATUS
RemoveEntryFromPspCidTable(
    ULONG id
)
{
    auto cidEntry = g_ExpLookupHandleTableEntry(*g_pPspCidTable, ULongToHandle(id));
    if (cidEntry != NULL)
    {
        g_ExDestroyHandle(*g_pPspCidTable, ULongToHandle(id), cidEntry);
        return STATUS_SUCCESS;
    }

    // No entry for this ID, so there is nothing to remove
    return STATUS_NOT_FOUND;
}

With the handle gone from the PspCidTable, any function that relies on it, such as most of the Ps* routines from ntoskrnl, will not find our thread anymore.

Of course though, this is not a perfect cloak hiding us from vigilant anti-rootkit eyes...

Detection 2: Detecting tampering through finding inconsistencies

A general strategy for detecting tampering is to check for inconsistencies - usually, attackers that tamper with data do only as much as they need to and might overlook some other place where that data is still referenced. In other words, there are other methods to list threads from a Windows driver than simply using the provided Ps* API. If a thread shows up in one place, but not in the other, we found our offender.
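Stripped of all kernel specifics, the idea is a cross-view comparison: enumerate the same objects through two independent paths and report whatever only one of them sees. A minimal user-mode sketch, where the function name and the lookupSucceeds callback are hypothetical stand-ins for "IDs found by some enumeration" and "an API based lookup such as PsLookupThreadByThreadId":

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Returns every ID that the enumeration saw but the lookup could not resolve.
std::vector<uint32_t> SketchFindHiddenIds(const std::vector<uint32_t>& enumeratedIds,
                                          const std::function<bool(uint32_t)>& lookupSucceeds)
{
    assert(lookupSucceeds != nullptr);
    std::vector<uint32_t> hidden;
    for (uint32_t id : enumeratedIds) {
        if (!lookupSucceeds(id)) {
            hidden.push_back(id); // visible in one view, missing in the other
        }
    }
    return hidden;
}
```

The comparison only works in this direction if the enumeration is the harder one to tamper with. Objects removed from both views stay invisible to it.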

One method, which I implemented in unKover, is walking a process's thread linked list.

Each process is represented in the Windows kernel as a KPROCESS/EPROCESS object. This object contains a linked list of threads, containing all threads for that process, starting with the ThreadListHead.

(The E* structs contain the corresponding K* struct as the first member, which means you can typecast between them as you like. K* is essentially a subset of E*)

0: kd> dt nt!_EPROCESS
   +0x000 Pcb                : _KPROCESS /* KPROCESS as first member of EPROCESS */
   +0x438 ProcessLock        : _EX_PUSH_LOCK
   +0x440 UniqueProcessId    : Ptr64 Void
   +0x448 ActiveProcessLinks : _LIST_ENTRY
   [ ... ]
   +0x5e0 ThreadListHead     : _LIST_ENTRY /* The thread linked list head */

Also, this linked list can be referenced from a thread that is a member of that linked list, either from KTHREAD or ETHREAD:

1: kd> dt nt!_KTHREAD
   +0x000 Header            : _DISPATCHER_HEADER
   +0x018 SListFaultAddress : Ptr64 Void
   +0x020 QuantumTarget     : Uint8B
   [ ... ]
   +0x2f8 ThreadListEntry   : _LIST_ENTRY /* The list entry reference */

Since all kernel drivers, and as such rootkits, run under the Windows System process with the process ID 4 by default, if we get our current thread from our anti-rootkit driver, we are in the right linked list. We can then walk that list from the ThreadListEntry of our thread onwards, to enumerate all driver threads running under the System process. If we find a thread ID here that cannot be found in the PspCidTable, e.g. via PsLookupThreadByThreadId, or points to a corrupted entry, we found our offender:

The code is just as straightforward as removing the handle is. Unfortunately though, the offset for the ThreadListEntry from KTHREAD is hardcoded for now. At least in the two Windows versions I have running as VMs, this offset is stable ¯\_(ツ)_/¯

#define THREAD_LIST_ENTRY_OFFSET 0x2f8
typedef struct _myKTHREAD
{
    char padding[0x2F8];                // 0x0000
    struct _LIST_ENTRY ThreadListEntry; // 0x02F8 
    // [ ... ]
} myKTHREAD, * myPKTHREAD;

NTSTATUS 
UkWalkSystemProcessThreads()
{
    // Get current thread (an arbitrary thread in system process / PID 4 is ok)
    auto currentThread = KeGetCurrentThread();
    auto threadListEntry = (PLIST_ENTRY)((ULONG_PTR)currentThread + THREAD_LIST_ENTRY_OFFSET);
    auto listEntry = threadListEntry;

    // walk all linked list entries
    while ((listEntry = listEntry->Flink) != threadListEntry)
    {
        auto entry = CONTAINING_RECORD(listEntry, myKTHREAD, ThreadListEntry);
        auto threadId = HandleToULong(PsGetThreadId((PETHREAD)entry));

        if (threadId != 0)
        {
            PETHREAD pThread = NULL;
            NTSTATUS status = PsLookupThreadByThreadId(ULongToHandle(threadId), &pThread);

            // If PsLookupThreadByThreadId fails, we found our offender
            if (!NT_SUCCESS(status))
            {
                LOG_MSG("Found hidden thread: TID: 0x%lx\n", threadId);
            }
            else
            {
                // A successful lookup takes a reference on the thread, release it again
                ObDereferenceObject(pThread);
            }
        }
    }

    return STATUS_SUCCESS;
}
I believe that this is what WinDbg's !thread command also does - if it does not find the thread in the PspCidTable, it walks the list to find it (notice the "free handle"). That is just a guess however.

While we could do this cat and mouse dance for some more time and start removing our thread elsewhere, as well as removing our process or even spoofing threads, this is an endless back and forth that I am not going to exercise (read this post on rohitab if you want an idea). If you are aware of the process hiding trick from rootkits, which unlinks the process from the process linked list - you can essentially do the same here with the thread linked list. For a good explanation, see the readme of ZeroMemoryEx's Chaos-Rootkit.
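Both the list walk from this section and the unlinking idea can be tried out safely in user mode. The following sketch models the kernel's intrusive LIST_ENTRY pattern with hypothetical Sketch* types. The struct layouts are invented and have nothing to do with the real KTHREAD or EPROCESS. Unlike the driver code above, it starts at the list head, so the head itself is never interpreted as a thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same shape as the kernel's LIST_ENTRY: a circular, doubly linked, intrusive list node.
struct SketchListEntry { SketchListEntry* Flink; SketchListEntry* Blink; };

struct SketchThread {
    uint32_t        Id;
    SketchListEntry ThreadListEntry; // embedded node, like the thread list entry in the post
};

struct SketchProcess { SketchListEntry ThreadListHead; };

void SketchInitList(SketchListEntry* head) { head->Flink = head->Blink = head; }

void SketchInsertTail(SketchListEntry* head, SketchListEntry* entry)
{
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

// What an unlinking rootkit would do: neighbours skip the entry from now on.
void SketchUnlink(SketchListEntry* entry)
{
    entry->Blink->Flink = entry->Flink;
    entry->Flink->Blink = entry->Blink;
    entry->Flink = entry->Blink = entry;
}

// Equivalent of CONTAINING_RECORD: step back from the embedded node to its owner.
SketchThread* SketchThreadFromEntry(SketchListEntry* entry)
{
    assert(entry != nullptr);
    return reinterpret_cast<SketchThread*>(
        reinterpret_cast<char*>(entry) - offsetof(SketchThread, ThreadListEntry));
}

std::vector<uint32_t> SketchWalkThreads(SketchProcess* process)
{
    std::vector<uint32_t> ids;
    SketchListEntry* head = &process->ThreadListHead;
    for (SketchListEntry* e = head->Flink; e != head; e = e->Flink) {
        ids.push_back(SketchThreadFromEntry(e)->Id);
    }
    return ids;
}
```

After SketchUnlink, the walk no longer visits the unlinked thread even though the object itself still exists, which is why a list walk on its own is just one more view that can be tampered with.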

Outlook

In the realm of userland malware, when thread callbacks were starting to detect many different code injection techniques, people figured that going threadless was the way to go. Since unKover is so heavily thread based, we will also go that route and implement a threadless rootkit in the conclusion of this series, part III, which will hopefully not take me half a year to write. That being said, my life right now is very busy, which is why part II took me so long ... anyway

Happy Hacking!

EDIT: Part III is live here
