Avoiding direct syscall instructions by using trampolines
Recently, in order to prepare for an internal penetration testing engagement, I wanted to automate my payload generation. In order to do so, I created a packer for executables and shellcodes called MATROJKA. Since I’ve been a fan of Nim for malware development for some time, the choice to write my packer in Nim was an easy one. Nim has a beautiful syntax, transpiles to C, has great C and C++ (yes, real C++) integrations and is overall very fun to write in. (EDIT in 2024: just use C/C++. Nim is great, but it does have its caveats)
There are a few publicly available packers already that are based on Nim - most notably @chvancooten’s NimPackt-v1 packer and @icyguider’s Nimcrypt2. NimPackt-v1 is a shellcode- and dotnet-assembly packer that is actively used by threat actors in the wild. It’s second version, NimPackt-NG improves upon v1, but is, as of now, still private. Nimcrypt2 is a packer for shellcode, dotnet-assemblies and additionally supports regular portable executables.
Both NimPackt-v1 and Nimcrypt2 use SysWhispers (as implemented by NimlineWhispers2) to invoke direct syscalls in order to avoid EDR-hooks. If you don’t know what syscalls in Windows are, and what role they play in malware development and -detection, I can highly recommend this article by @Cn33liz.
In short, syscalls are undocumented functions in NTDLL.DLL that are the closest to the kernel that we can get. In the end, any windows-API call calls one or more of these functions. If you for example call WriteProcessMemory, the syscall that is executed under the hood is NtWriteVirtualMemory. As such, AVs and EDRs like to hook some of these syscalls that are interesting to us malware developers, in order to inspect what is executed by a program. There are several methods to “unhook” these syscalls and make sure that our calls go to the kernel without being monitored - one example would be to reload a copy of NTDLL.DLL from disk, that is not yet hooked by the AV/EDR. Another is to find out the syscall numbers (which are different, depending on the Windows version) in some way and insert them into assembly stubs which we can be sure are not hooked, e.g. like this:
mov r10, rcx
mov eax, ["SYSCALL_NUMBER_TO_INSERT"]
// here would be where a hook would be inserted, that jumps to code that belongs to the AV/EDR
syscall   
ret
This is what SysWhispers does - it generates assembly code that is then filled with the syscall numbers depending on the version of Windows that the code is running on.
Detecting direct syscalls made through SysWhispers
The problem with using SysWhispers1 and SysWhispers2 (not 3 though) is that by now it is heavily signatured by most AVs and EDRs and most binaries generated with tools that leverage SysWhispers are easily flagged as malicious (I assume this is why Nimcrypt2 additionally supports using GetSyscallStub instead of NimlineWhispers2).
One simple way to detect the use of syscalls generated by SysWhispers is to check for direct syscall instructions. Usually, each syscall goes through NTDLL.DLL, which acts as Windows’ interface to kernel mode, so direct syscall instructions should (in theory) never occur and are highly suspicious.

As such, Defender instantly removes a binary that includes NimlineWhispers2 (if no further evasion is applied) upon downloading it onto a Windows host:

That meant that I had to look for a different way to invoke direct syscalls for my packer. I did not want to use GetSyscallStub, since it is used by Nimcrypt2 and I figured that using a different technique would make my packer’s signatures more unique and thus less detectable.
Retrieve syscalls with HellsGate
Another technique that is widely used to retrieve syscall numbers, in order to invoke unhooked syscalls is HellsGate by @smelly__vx and @am0nsec. You basically traverse the PEB structure, until you reach the module list, get NTDLL.DLL’s base address and then traverse its Export Address Table until you find the desired function. API Hashing is used to find the function name. Then, all that is left is to extract the syscall number from that function and you have everything you need to call that syscall directly, by using a syscall stub like above. You can read the paper at the Vx-Underground Github, which explains it more in-depth. Luckily, zimawhit3 already implemented HellsGate in Nim.
However, with HellsGate the same problem arises, since the assembly stubs that are populated with the retrieved syscall numbers also use the direct syscall instruction to invoke the syscall.
Make it bounce!
To make my syscalls seem more legit, I adjusted HellsGate, by simply replacing all syscall instructions with a trampoline jump - in this case a JMP instruction that jumps to the location of a syscall instruction located in NTDLL.DLL. This makes the syscalls seem more legit, as they originate from NTDLL.DLL and also avoids leaving any syscall instructions in the resulting binary. This technique is nothing new though, and was described e.g. in a blog post by @passthehashbrowns. In fact, I later found out that this is what SysWhispers3 and NimlineWhispers3 ended up using as a remediation (well…). However, I still see it as a way to improve HellsGate. EDIT: I found out that this modification to HellsGate has already been done and is known as RecycledGate
Thanks to Nim’s ability to write inline assembly, implementing this was a breeze:
First, I parsed NTDLL.DLL byte by byte until a syscall instruction is found. In binary representation, the syscall instruction and its prologue are 0x75 0x03 0x0F 0x05, as can be seen when inspecting the DLL in x64dbg:

Starting from the NTDLL.DLL module base address it doesn’t take long for one to find such an address. We just take the first one we find and save it to the global variable syscallJumpAddress:
proc getSyscallInstructionAddress(ntdllModuleBaseAddr: PVOID): ByteAddress =
    ## Get The address of a syscall instruction from ntdll to make sure all syscalls go through ntdll
    echo "[*] Resolving syscall..."
    echo "[*] NTDDL Base: " & $cast[int](ntdllModuleBaseAddr).toHex
    var offset: UINT = 0
    while true:
        var currByte = cast[PDWORD](ntdllModuleBaseAddr + offset)[]
        if "050F0375" in $currByte.toHex:
            echo "[*] Found syscall in ntdll addr " & $cast[ByteAddress](ntdllModuleBaseAddr + offset).toHex & ": " & $currByte.toHex
            return cast[ByteAddress](ntdllModuleBaseAddr + offset) + sizeof(WORD)
        offset = offset + 1
Now all that is left is to adjust the assembly code for each syscall and add a JMP to our address from above:
proc NtProtectVirtualMemory(ProcessHandle: Handle, BaseAddress: PVOID, NumberOfBytesToProtect: PULONG, NewAccessProtection: ULONG, OldAccessProtection: PULONG): NTSTATUS {.asmNoStackFrame.} =
    asm """
        mov r10, rcx
        mov eax, `ntProtectSyscall`
        # syscall                     # this is what we want to avoid
        mov r11, `syscallJumpAddress` # move syscall address into r11
        jmp r11			      # jump to syscall address
        ret
    """
When compiling the binary, we do not have any direct syscalls left anymore, which resulted in a much smaller detection rate for my packed payloads. Neat!

Grabbing the first one available worked fine for me. One idea for improvement that is left is to control what syscall instruction we jump to and use this to e.g. fool an analyst that only observes the location of the syscall. EDIT 02/08/23: I just pushed changes that fix this. Now the syscall instruction that is corresponding to the actual syscall is taken instead.
The code for this technique is hosted at https://github.com/eversinc33/BouncyGate. Unfortunately, as opposed to SysWhispers/NimlineWhispers, you will have to add the function definitions for each Syscall that you need yourself (but you can still use those that NimlineWhispers generates).
Happy Hacking!