4 Ways to Locate a Remote Process’s PEB

The Process Environment Block (PEB) is an opaque data structure in Windows that provides a goldmine of information from a malware development standpoint. While obtaining a pointer to this in your current process is trivial, doing so for a remote process can be trickier.

While in reality you only need 1 or 2 reliable methods, it can be somewhat of a fun exercise to dig deeper and get creative. Here are some methods I tried while working on some custom tooling.

GetThreadContext

This function retrieves the context of a specified thread. The thread should be suspended first, otherwise the register values you get back are not reliable. This lets you read register values while the thread is “frozen”. It so happens that when you create a new process on x64, the main thread’s start function is ntdll!RtlUserThreadStart, and its 2nd argument, held in RDX, is a pointer to the PEB:

rdx.png

So, create a process in a suspended state, grab RDX from the main thread, do what you need, and resume the thread.

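STARTUPINFOW si = { sizeof(si) };
PROCESS_INFORMATION pi = { 0 };
CreateProcessW(L"c:/windows/system32/notepad.exe", NULL, NULL, NULL, FALSE, CREATE_SUSPENDED, NULL, NULL, &si, &pi);
CONTEXT ctx = {0};
ctx.ContextFlags = CONTEXT_INTEGER; // RDX belongs to the integer register group
GetThreadContext(pi.hThread, &ctx);
printf("PEB is @ %llX\n", ctx.Rdx);
ResumeThread(pi.hThread); // resume once you have what you need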

GetThreadContext outputs the register values in a CONTEXT structure that the user provides. Within that structure, ContextFlags is a bitmask field that lets us specify which registers we want to retrieve. These definitions can be found within winnt.h:

// CONTEXT_CONTROL specifies SegSs, Rsp, SegCs, Rip, and EFlags.
// CONTEXT_INTEGER specifies Rax, Rcx, Rdx, Rbx, Rbp, Rsi, Rdi, and R8-R15.
// CONTEXT_SEGMENTS specifies SegDs, SegEs, SegFs, and SegGs.
// CONTEXT_FLOATING_POINT specifies Xmm0-Xmm15.
// CONTEXT_DEBUG_REGISTERS specifies Dr0-Dr3 and Dr6-Dr7.
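
One detail that is easy to miss: on x64 every one of these flags also carries a shared architecture bit, so testing a mask with a plain bitwise AND against a group always succeeds. The sketch below mirrors the winnt.h values under a MY_ prefix (the prefix and the context_requests helper are my own names for illustration, not part of the SDK) and shows a check that compares against the full group value instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the x64 winnt.h values. The MY_ prefix marks them as
   stand-ins for this sketch, not the SDK's own names. */
#define MY_CONTEXT_AMD64           0x00100000u
#define MY_CONTEXT_CONTROL         (MY_CONTEXT_AMD64 | 0x01u)
#define MY_CONTEXT_INTEGER         (MY_CONTEXT_AMD64 | 0x02u)
#define MY_CONTEXT_SEGMENTS        (MY_CONTEXT_AMD64 | 0x04u)
#define MY_CONTEXT_FLOATING_POINT  (MY_CONTEXT_AMD64 | 0x08u)
#define MY_CONTEXT_DEBUG_REGISTERS (MY_CONTEXT_AMD64 | 0x10u)

/* True when every bit of `group` is set in `flags`. Comparing against the
   full group value matters: (flags & group) != 0 would match any group,
   because all of them share the architecture bit. */
static bool context_requests(uint32_t flags, uint32_t group)
{
    assert(group & MY_CONTEXT_AMD64);  /* expect one of the groups above */
    return (flags & group) == group;
}
```

With this, context_requests(MY_CONTEXT_INTEGER, MY_CONTEXT_SEGMENTS) is false even though the two values overlap in the architecture bit.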

You might have noticed it’s possible to use CONTEXT_SEGMENTS to receive the value of “SegGs”. While the GS register can indirectly lead to the PEB through the TEB, GetThreadContext returns the segment selectors, not the segment bases required for this.

NtQuerySystemInformation

Note that you can also do this with NtQueryInformationProcess. The difference is as follows:

NtQueryInformationProcess -> Retrieves information about the specified process. Use this if you already know which process you’re targeting.

NtQuerySystemInformation -> Allows you to get extended information about every running process and thread, which could be great for other reasons.

The first argument to NtQuerySystemInformation is a value from the SYSTEM_INFORMATION_CLASS enumeration, which determines what data the function returns. SystemExtendedProcessInformation returns a SYSTEM_PROCESS_INFORMATION structure for each process and a SYSTEM_EXTENDED_THREAD_INFORMATION structure for each of its threads.

__kernel_entry NTSTATUS NtQuerySystemInformation(
  [in]            SYSTEM_INFORMATION_CLASS SystemInformationClass,
  [in, out]       PVOID                    SystemInformation,
  [in]            ULONG                    SystemInformationLength,
  [out, optional] PULONG                   ReturnLength
);

One of the members of SYSTEM_EXTENDED_THREAD_INFORMATION is TebBaseAddress, a pointer to the Thread Environment Block (TEB):

typedef struct _SYSTEM_EXTENDED_THREAD_INFORMATION
{
    SYSTEM_THREAD_INFORMATION ThreadInfo;
    PVOID StackBase;
    PVOID StackLimit;
    PVOID Win32StartAddress;
    PVOID TebBaseAddress;
    ULONG_PTR Reserved2;
    ULONG_PTR Reserved3;
    ULONG_PTR Reserved4;
} SYSTEM_EXTENDED_THREAD_INFORMATION, *PSYSTEM_EXTENDED_THREAD_INFORMATION;

Since the output contains information on all running processes and threads, some parsing is necessary:

// resolve function address
pNtQuerySystemInformation NtQuerySystemInformation = (pNtQuerySystemInformation)GetProcAddress(GetModuleHandleA("ntdll"), "NtQuerySystemInformation");

// find out how much space to allocate for output buffer
// (this call is expected to fail; we only want required_length)
ULONG required_length = 0;
NtQuerySystemInformation(SystemExtendedProcessInformation, NULL, 0, &required_length);

// allocate memory and call function again
// (processes can spawn between the two calls, so this can still come up short)
void* outbuffer = malloc(required_length);
NtQuerySystemInformation(SystemExtendedProcessInformation, outbuffer, required_length, &required_length);

// parse each process / thread
SYSTEM_EXTENDED_PROCESS_INFORMATION* spi = outbuffer;
while (TRUE) {

	// print each TEB
	for (ULONG i = 0; i < spi->NumberOfThreads; i++){
		printf("pid (%p) teb (%p)\n", spi->UniqueProcessId, spi->Threads[i].TebBaseAddress);
	}

	// the last entry has NextEntryOffset == 0, so stop only after printing it
	if (spi->NextEntryOffset == 0) {
		break;
	}

	// move to next process
	spi = (SYSTEM_EXTENDED_PROCESS_INFORMATION*)((char*)spi + spi->NextEntryOffset);
}
free(outbuffer);

enumeratethreads.png
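
The NextEntryOffset chaining is the part that is easiest to get subtly wrong, so here is a rough, portable sketch of the same walk over a mock buffer. MockRecord and walk_records are hypothetical names made up for this example. Only the chaining field mirrors the real structure, and the entry whose offset is 0 still has to be processed before stopping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical variable-length record; only the chaining field mirrors
   the real structure. */
typedef struct {
    uint32_t NextEntryOffset;  /* bytes to the next record, 0 on the last */
    uint32_t Id;
} MockRecord;

/* Visits every record in `buffer`, including the final one, and stores up
   to `max_ids` ids in `ids`. Returns the number of records visited. */
static size_t walk_records(const unsigned char *buffer, size_t length,
                           uint32_t *ids, size_t max_ids)
{
    assert(buffer != NULL && ids != NULL);
    size_t offset = 0, count = 0;

    while (offset + sizeof(MockRecord) <= length) {
        MockRecord rec;
        memcpy(&rec, buffer + offset, sizeof rec);  /* avoids unaligned reads */

        if (count < max_ids)
            ids[count] = rec.Id;
        count++;

        if (rec.NextEntryOffset == 0)
            break;              /* the last record has just been handled */
        offset += rec.NextEntryOffset;
    }
    return count;
}
```

The bounds check against length is an extra guard I added for the mock data. It is not something the API does for you.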

Once you pick a target process, open a handle to it and use ReadProcessMemory (or equivalent) to read the 8 bytes at TEB + 0x60, which hold the PEB address on x64.

peb_pointer.png

NtQueryInformationThread

This function also allows you to obtain the TEB address of threads within a target process. A handle to the thread is required for this to work, which means we must first find the ID of at least 1 running thread in the target process and open a handle to it. Why only 1 thread? A process can only have 1 PEB, so every thread’s TEB is going to point to it.

We can enumerate threads with Thread32First & Thread32Next, which work on snapshots taken by CreateToolhelp32Snapshot. Once we find an entry whose th32OwnerProcessID matches, we can grab its th32ThreadID and return.

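DWORD EnumThread(DWORD pid) {
	THREADENTRY32 te = { 0 };
	te.dwSize = sizeof(THREADENTRY32);
	DWORD tid = 0;

	// snapshot of every thread on the system
	HANDLE hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
	if (hSnap == INVALID_HANDLE_VALUE) {
		return 0;
	}

	if (Thread32First(hSnap, &te)) {
		do {
			// the first thread owned by the target process is enough
			if (te.th32OwnerProcessID == pid) {
				tid = te.th32ThreadID;
				break;
			}
		} while (Thread32Next(hSnap, &te));
	}

	// close the snapshot on every path, not just the "not found" one
	CloseHandle(hSnap);
	return tid;
}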

As before, an enumeration value determines what data is populated in the output buffer. Here it is the second argument to NtQueryInformationThread (the first is the thread handle), and it comes from THREADINFOCLASS.

__kernel_entry NTSTATUS NtQueryInformationThread(
  [in]            HANDLE          ThreadHandle,
  [in]            THREADINFOCLASS ThreadInformationClass,
  [in, out]       PVOID           ThreadInformation,
  [in]            ULONG           ThreadInformationLength,
  [out, optional] PULONG          ReturnLength
);

Using ThreadBasicInformation will fill the output buffer with a THREAD_BASIC_INFORMATION structure, which contains the TebBaseAddress at offset 0x8 on x64.

typedef struct _THREAD_BASIC_INFORMATION {
	NTSTATUS ExitStatus;
	PVOID TebBaseAddress;
	CLIENT_ID ClientId;
	KAFFINITY AffinityMask;
	KPRIORITY Priority;
	KPRIORITY BasePriority;
} THREAD_BASIC_INFORMATION, *PTHREAD_BASIC_INFORMATION;
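
If you are wondering why the TEB pointer sits at 0x8 when ExitStatus is only 4 bytes wide, the answer is alignment padding. Below is a small sketch that mirrors the layout with portable stand-in types. The Mirror* names and teb_field_offset are hypothetical, and the field types are my assumptions about a 64-bit build (int32_t for NTSTATUS and KPRIORITY, uintptr_t for KAFFINITY).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of CLIENT_ID: two handle-sized values. */
typedef struct {
    void *UniqueProcess;
    void *UniqueThread;
} MirrorClientId;

/* Hypothetical mirror of THREAD_BASIC_INFORMATION using stand-in types. */
typedef struct {
    int32_t        ExitStatus;      /* 4 bytes, padded up to pointer size */
    void          *TebBaseAddress;  /* pointer alignment puts this at 0x8 */
    MirrorClientId ClientId;
    uintptr_t      AffinityMask;
    int32_t        Priority;
    int32_t        BasePriority;
} MirrorThreadBasicInformation;

/* The TEB pointer lands one pointer-width into the structure. */
static_assert(offsetof(MirrorThreadBasicInformation, TebBaseAddress)
                  == sizeof(void *),
              "TEB pointer should follow the padded status field");

/* Offset of the TEB pointer inside the mirrored structure. */
static size_t teb_field_offset(void)
{
    return offsetof(MirrorThreadBasicInformation, TebBaseAddress);
}
```

On a 64-bit compiler teb_field_offset() comes out as 8. A 32-bit build would give 4, which is why the offsets in this post are x64-specific.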

Fortunately this is a much simpler function to work with:

// Find thread ID and open handle to it (10528 is the target PID in this example)
DWORD tid = EnumThread(10528);
HANDLE hThread = OpenThread(THREAD_ALL_ACCESS , FALSE, tid);

// resolve NtQueryInformationThread address
pNtQueryInformationThread NtQueryInformationThread = (pNtQueryInformationThread)GetProcAddress(GetModuleHandleA("ntdll"), "NtQueryInformationThread");

// call function (the structure is small, so a stack variable is enough)
THREAD_BASIC_INFORMATION tbi = { 0 };
NTSTATUS status = NtQueryInformationThread(hThread, ThreadBasicInformation, &tbi, sizeof(THREAD_BASIC_INFORMATION), NULL);

// get TEB (a negative NTSTATUS means the call failed)
if (status >= 0) {
	printf("teb : %p\n", tbi.TebBaseAddress);
}
CloseHandle(hThread);

To reach the PEB from here, again, simply open a handle to the process and use ReadProcessMemory (or equivalent) at TEB + 0x60.

Memory scanning

This is not the “cleanest” method, but I had the most fun working on it. While the combination of ASLR and 64-bit addresses makes brute forcing unrealistic, the VirtualQueryEx API gives us much-needed help. Essentially, for a given process, we can walk its address space one region at a time and get each region’s base address, size, State (commit / free / reserve), Protect (R / W / X), Type (mapped / private / image), etc.

An example below shows this API in use, and we can see one of the outputs is the memory region allocated for the PEB & TEB. This significantly reduces the number of “guesses” from a scanning perspective.

peb.png

So we now have the ability to shortlist a few candidate memory regions that could hold the PEB: any committed region (State:0x1000) that is RW (Protect:0x4). If we read each of these memory regions and find some reliable and unique identifier present in all PEBs, we would be able to tell with relative certainty when we’re reading a PEB. While the PEB has some elements that seemed promising, most of them required multiple reads, which I was trying to avoid.

Thankfully, the TEB has such an identifier. As shown below, the NT_TIB structure has a Self member at offset 0x30 (on x64), which is basically a pointer to itself. So, we can read 8 bytes from this offset, and if they contain the base address we started reading from, we’ve found a TEB.

teb1.png

teb2.png

From there, we can read TEB + 0x60 to get the PEB, as shown below:

peb_pointer.png
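
Before pointing this at a live process, the two checks (Self at 0x30, PEB pointer at 0x60) can be tried against a local copy of the bytes. This is only a sketch under the x64 offsets used in this post. peb_from_teb_copy and the two offset macros are names I made up for the example, and the buffer stands in for whatever ReadProcessMemory would return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TEB_SELF_OFFSET 0x30u  /* NT_TIB.Self on x64 */
#define TEB_PEB_OFFSET  0x60u  /* PEB pointer on x64 */

/* `bytes` is a local copy of `length` bytes read from the remote address
   `remote_base`. Returns true and writes the PEB address to `peb_out` when
   the Self field points back at `remote_base`. */
static bool peb_from_teb_copy(const unsigned char *bytes, size_t length,
                              uint64_t remote_base, uint64_t *peb_out)
{
    assert(bytes != NULL && peb_out != NULL);

    /* need enough bytes to cover both fields */
    if (length < TEB_PEB_OFFSET + sizeof(uint64_t))
        return false;

    uint64_t self;
    memcpy(&self, bytes + TEB_SELF_OFFSET, sizeof self);
    if (self != remote_base)
        return false;           /* not self-referencing, so not a TEB */

    memcpy(peb_out, bytes + TEB_PEB_OFFSET, sizeof *peb_out);
    return true;
}
```

Reading 0x68 bytes in one go and checking both fields locally would also cut the scan down to a single remote read per candidate region.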

HANDLE hProc = OpenProcess(PROCESS_QUERY_LIMITED_INFORMATION | PROCESS_VM_READ , FALSE, 10528);
MEMORY_BASIC_INFORMATION mbi = {0};
void* starting_address = 0x0;
void* result = 0x0;
SIZE_T out = 0;

// VirtualQueryEx returns 0 on failure, e.g. once we walk past the last region
while (VirtualQueryEx(hProc, starting_address, &mbi, sizeof(MEMORY_BASIC_INFORMATION)) != 0){
	printf("BaseAddress: %p, RegionSize: %llx, State: %lx, Protect: %lx\n", mbi.BaseAddress, mbi.RegionSize, mbi.State, mbi.Protect);

	// the next query starts right after this region
	starting_address = (char*)mbi.BaseAddress + mbi.RegionSize;

	// check for PAGE_READWRITE
	if (mbi.Protect != 0x4){
		continue;
	} 

	// check for MEM_COMMIT
	if (mbi.State != 0x1000){
		continue;
	}

	// read process memory to check if it's the TEB (assumes the TEB starts at the region base)
	if (!ReadProcessMemory(hProc, (char*)mbi.BaseAddress + 0x30, &result, 0x8, &out)){
		continue;
	}

	printf("read: %p - output: %p\n", (char*)mbi.BaseAddress + 0x30, result);

	// Self points back at the address we read from, so this is a TEB
	if (result == mbi.BaseAddress){
		ReadProcessMemory(hProc, (char*)mbi.BaseAddress + 0x60, &result, 0x8, &out);
		printf("peb : %p\n", result);
		break;
	} 
}
CloseHandle(hProc);

And we can see this works perfectly:

scanresult.png

From testing against multiple processes, this technique typically returns the PEB after anywhere from 1-5 ReadProcessMemory calls, as the PEB & TEB seem to be allocated at relatively low addresses.

References