Preface

As part of the course, we were instructed to analyze a custom malware sample developed for us. Below is a full analysis of that sample, plus an automated script to extract its final payload.

Hi there,

During an ongoing investigation, one of our IR team members managed to locate an unknown sample on an infected machine belonging to one of our clients. We cannot pass that sample onto you currently as we are still analyzing it to determine what data was exfiltrated. However, one of our backend analysts developed a YARA rule based on the malware packer, and we were able to locate a similar binary that seemed to be an earlier version of the sample we’re dealing with. Would you be able to take a look at it? We’re all hands on deck here, dealing with this situation, and so we are unable to take a look at it ourselves. We’re not too sure how much the binary has changed, though developing some automation tools might be a good idea, in case the threat actors behind it start utilizing something like Cutwail to push their samples. I have uploaded the sample alongside this email.

Thanks, and Good Luck!


Basic Static & Dynamic Analysis

Sample information:

SHA-256: [Redacted]

Intezer: [Redacted]

Any.run: [Redacted]

VirusTotal: [Redacted]

We begin by running the sample in a sandboxed environment such as Any.run. We can immediately see that the process launches itself and svchost as subprocesses. This can also be confirmed in Intezer, where one can examine the strings contained within each subprocess. We can assume there is process injection happening.

image

image1

Regarding connections made to external servers, we can see that pastebin.com is being contacted.

image2

There is a suspicious looking resource within the resource section:

624x47

Before the sandbox session on Any.run ended, one can see a strange-looking MessageBox string.

500x187

Finally, the strings found in the first three loaded processes appear to be encrypted, and this can be easily confirmed within Intezer:

239x158 261x160

Summary

We’re going to be looking for process injection. I suspect that the resource located within the resource section will be mapped into memory, unpacked or decrypted, and then injected into the second subprocess. There are four subprocesses in total, so our final payload should be located in the fourth. I assume the payload connects to pastebin and displays a message box. There is definitely string encryption going on, so we’ll have to deal with that as well.


Static and Dynamic Analysis

This binary is compiled with Microsoft Visual C++, which means the actual main function is located somewhere within the start function. I located it by recursively traversing xrefs from one of the imported functions. The main function is located at 0x00401400:

233x350

When viewing the code of the main function one can see gibberish strings being pushed before function calls:

269x272

The function called after each push is always the same, so I immediately assume there is some string encryption going on. The extensive use of LoadLibraryA and GetProcAddress also makes me assume these are API name strings that get resolved at runtime through those two functions.

At loc_401550 and loc_401570 one can locate something that resembles an RC4 KSA routine. It is easily recognized by its two loops, each iterating 256 times.
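For reference, here is a minimal pure-Python sketch of textbook RC4, so the two 256-iteration loops are easy to match against the disassembly. This is the standard algorithm and not code lifted from the sample; the function names `rc4_ksa` and `rc4_crypt` are my own.

```python
def rc4_ksa(key: bytes) -> list:
    """Key-scheduling algorithm: two loops of 256 iterations each."""
    s = list(range(256))              # loop 1: fill the state with 0..255
    j = 0
    for i in range(256):              # loop 2: key-dependent swaps
        j = (j + s[i] + key[i % len(key)]) & 0xFF
        s[i], s[j] = s[j], s[i]
    return s


def rc4_crypt(key: bytes, data: bytes) -> bytes:
    """Generate the keystream and XOR it over data (same call decrypts)."""
    s = rc4_ksa(key)
    i = j = 0
    out = bytearray()
    for byte in data:
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) & 0xFF])
    return bytes(out)
```

Because RC4 is symmetric, calling `rc4_crypt` twice with the same key returns the original data.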

285x456

These are just quick assumptions I’ve made by looking at the binary. The rest of it is filled with obfuscated code, heavy register use and dynamic resolving, so we’ll have to resort to dynamic analysis. I’ve disabled ASLR for this binary with CFF Explorer so it would be easier to debug. I’ve set the relevant breakpoints within the debugger: all the resource-related functions to catch any resource loading, CreateProcessInternalW for process creation, and VirtualProtect, VirtualAllocEx and WriteProcessMemory for injection. In addition, I’ve set breakpoints on OutputDebugString and IsDebuggerPresent to catch easily implemented anti-analysis.

624x99

First, as assumed, sub_401300 seems to be resolving strings:

624x280

The function func_StringDecrypt seems to decrypt strings using some kind of custom base64 decoding. I suspect this because of the full alphanumeric string that is passed into the function. This function will be studied extensively later on in this paper, and we’ll attempt to write an automation script that decodes all strings within this sample.

624x123

Then the sample does something interesting:

490x552

It allocates space for the resource but it skips the first 0x1C bytes within the resource.

Hmm, perhaps this is the decryption key for the RC4 algorithm we saw before?

590x29

Then a call is made to sub_402DB0, which I renamed to func_CopyResourceWithoutKeyToAllocMem because that is exactly what it does: it copies the resource without its key, starting at offset 0x1C.

505x152

This function seems very confusing at first, but if we set a hardware breakpoint on the allocated memory, we break inside this function and can confirm the behavior: at 00403002 it performs the copy and then simply exits the function:

589x170

565x222

Then sub_4025B0 is executed. It receives a stack address, so we simply step over the function while watching that address in the dump, and we can see that it just zeroes it out.

593x167

So thankfully I’m saving a lot of time by skipping these rabbit holes.

Then the assumed RC4 algorithm executes. What I want to do is locate the address where the sample mapped the resource into memory, as I suspect that it’s going to be decrypted.

590x166

So I’m going to skip the RC4 decryption routine and jump straight to address 0x40161D

594x165

Yay! I decide to dump the new PE file out to disk but we’re not done yet. We have to see how it’s going to be injected to memory.

We jump into sub_401000:

445x131

First the sample resolves the ImageBase of the currently executing sample, and then it validates and locates the NT headers of the decrypted payload.

Then a new instance of the current executable, main.bin, is created in suspended mode:

624x49

Seems like there is going to be process injection involved.

566x160

First, memory is allocated within the new process. The sample calls VirtualAllocEx on the new process, attempting to allocate memory at the payload’s default PE ImageBase with the virtual size of the image. This won’t work under our modified execution, though, since we disabled ASLR. Why? Because both processes load at the same ImageBase, the new process already has memory allocated in that region. So we must re-enable ASLR and start again. Let’s do just that, and we’ll see that it works.

Then the sample copies the payload’s sections to their correct virtual addresses:

436x453

Then something very interesting happens: The ImageBase of the payload is written to the PEB of the new process, specifically at offset 0x8.

179x72

458x235

514x101

We can assume that the payload would use this to resolve its own APIs.

Finally, SetThreadContext and ResumeThread are called, and the injected payload executes.

395x91


Second Payload Analysis

Sample information:

SHA-256: [Redacted]

Intezer: [Redacted]

Any.run: [Redacted]

VirusTotal: [Redacted]

Advanced Static and Dynamic Analysis

I’m going to assume process injection again and go straight into analysis. I’ll disable ASLR for this run, as I will be executing the second-stage payload independently, so the problem we encountered earlier from disabling ASLR shouldn’t bother us.

This second stage appears to use API hashing, as indicated by the extensive use of CRC32 constants:

320x171

In addition, the function that was seen in the first stage loader that simply copies the payload into memory from the resource section can be seen:

383x406

I’ve set the relevant breakpoints within the debugger: all the resource-related functions to catch any resource loading, CreateProcessInternalW for process creation, and VirtualProtect, VirtualAllocEx and WriteProcessMemory for injection. In addition, I’ve set breakpoints on OutputDebugString and IsDebuggerPresent to catch easily implemented anti-analysis.

The malware takes the name of the file it is currently executing from and hashes it using CRC32, which can be identified by the CRC32 constants found within this function. The sample then compares the result against a constant value. I’m assuming it uses this as an anti-analysis check to see whether the sample’s name is something like “sample” or “malware”, though it’s really hard to tell. If the hash matches, the sample quits execution.
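To show what those constants look like in practice, here is a small bitwise CRC32 sketch and a file name check built on it. I am assuming the sample uses the standard reflected CRC-32 polynomial 0xEDB88320 and hashes the bare file name as-is; the helper names `crc32` and `name_matches` are mine, and the hardcoded hash from the sample is not reproduced here.

```python
CRC32_POLY = 0xEDB88320  # reflected CRC-32 polynomial (assumed to be the constant in the sample)


def crc32(data: bytes) -> int:
    """Bit-by-bit CRC-32, the form that leaves the polynomial visible in disassembly."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ CRC32_POLY if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


def name_matches(path: str, expected_hash: int) -> bool:
    """Hash only the file name part of a path and compare it to a constant."""
    name = path.replace("\\", "/").rsplit("/", 1)[-1]
    return crc32(name.encode()) == expected_hash
```

With a candidate list of file names, `name_matches` can be used to brute-force which name a hardcoded hash stands for.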

624x73

Then an API resolving routine is called, which uses the CRC32 hashing algorithm we saw earlier:

298x79

It takes the hash ID (EDX) and an ID (ECX) that identifies the library from which to resolve the API:

459x58 624x120

I will not explain the API hashing and resolving in depth. Basically, the export table of each loaded DLL is hashed to find which function name matches the hash passed into the function. If anyone wants to read about how that might be implemented, they can read about it on my GitHub right here.
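The idea can be sketched in a few lines of Python: hash every export name and return the one that matches. This assumes plain CRC32 over the ASCII export name, which I have not verified byte for byte against the sample, and the export list would have to come from a PE parser of your choice. `resolve_by_hash` and `build_lookup` are hypothetical helper names.

```python
import zlib


def resolve_by_hash(export_names, target_hash):
    """Return the first export name whose CRC32 equals target_hash, else None."""
    for name in export_names:
        if zlib.crc32(name.encode("ascii")) == target_hash:
            return name
    return None


def build_lookup(export_names):
    """Precompute hash -> name so many hashes can be resolved in one pass."""
    return {zlib.crc32(name.encode("ascii")): name for name in export_names}
```

A lookup table like this is handy for annotating every hash constant in the disassembly at once.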

The sample resolves IsDebuggerPresent to check whether it is executing under a debugger; this can be easily circumvented. The next anti-analysis method, located within sub_401000, checks whether any blacklisted process is running: the malware hashes the name of each running process and checks whether the hash matches an entry in a precomputed hash array. If one matches, the malware quits execution.

378x109

Then the sample executes sub_401D50, which resolves a lot of APIs that indicate process injection. The function performs the following steps:

130x291

  1. Start svchost with suspended flags
  2. Copy current PE into allocated memory
  3. Allocate Memory in svchost
  4. Rebuild current PE payload relocation table
  5. Write payload into svchost
  6. Use CreateRemoteThread to execute function sub_401DC0

To continue execution, we simply attach a second debugger instance to the svchost process and set a breakpoint at the function’s location; after CreateRemoteThread runs, we should hit it.

611x395

After setting up the breakpoint, let’s resume execution of the svchost instance, step over CreateRemoteThread and see if anything happens.

624x103

Success! We can continue analyzing this function within our documented IDA instance as this function is located in the second stage payload.

It’s important to note that, interestingly enough, this function is also called earlier at loc_402085, when func_CRC32Hash returns a hash of the currently executing process name that matches a hardcoded hash.

624x229

I quickly assumed that this is a method to detect whether the binary is running from svchost, because as we saw in the malware’s process tree, it executes svchost twice. I later confirmed this by renaming the sample to svchost.exe, and it worked:

624x80

Let’s continue with the analysis.

This function first resolves a few Internet-related WinAPI functions:

624x203

Then a strange string located at offset 0x0413C7C is passed into a code block and decrypted by a very simple algorithm:

242x203

So, our encrypted byte sequence:

608x18

Returns:

439x34

Which resolves to [Redacted]

499x191

Hmm... This might indicate steganography is involved. Let’s continue with the analysis.

This pastebin link is passed into sub_401290, which I renamed to func_ReadWebContents. This function returns the image link contained within the pastebin and saves it in memory.

This resolved link is passed into sub_4013A0. First, the function reads the contents of the linked image file, again using func_ReadWebContents:

340x335

Then the function decodes a string located at qword_413CA4:

339x16

Which resolves to output.jpg. It then computes the path to the Temp directory and converts it to a WCHAR string, after which specially picked bytes are extracted from the data section to build this path:

422x21

Then CreateDirectory is invoked to create the cruloader folder, after which the output.jpg file name is appended to this path to create the following string:

502x24

CreateFileW is then invoked to create this file:

367x188

Then the PNG file downloaded from the previous website is written into this file using WriteFile:

624x325

357x220

Afterwards the sample attempts to locate the string “redaolurc” (which is cruloader reversed) within the downloaded image data:

624x193

After locating the string within the image data, its offset is used to access the encrypted data. The data is XORed one xmmword (128 bits) at a time, 40 times. The XOR key is 0x61 (‘a’). One can also notice that there are a lot of ‘a’ characters within the encoded payload. These ‘a’ characters are actually zeroes in the binary, because 0 ^ 0x61 = 0x61, so this payload isn’t heavily obfuscated and one can infer the key pretty quickly.
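That observation generalizes nicely: a PE file is full of zero bytes, so under a single-byte XOR the most frequent byte in the blob is very likely the key itself. Here is a minimal sketch of that guess, not something the sample does; `guess_single_byte_xor_key` and `xor_bytes` are names I made up for illustration.

```python
from collections import Counter


def guess_single_byte_xor_key(blob: bytes) -> int:
    """Assume zero is the most common plaintext byte, so the top byte is the key."""
    if not blob:
        raise ValueError("empty blob")
    return Counter(blob).most_common(1)[0][0]


def xor_bytes(blob: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key."""
    return bytes(b ^ key for b in blob)
```

A quick sanity check after decoding is to see whether the result starts with the `MZ` magic.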

540x14

400x449

And the result is a valid PE file:

598x170

After dumping this PE I opened it in IDA, and it appears that this is the final payload!

624x162

This PE is then mapped, relocated, fixed up with VirtualProtect and finally injected into svchost.exe again within sub_401750, thus executing the final payload.

474x201

And that’s pretty much it!


Automation

Alright, let’s begin automating the process of extracting all payloads and dumping them to disk.

We begin with the resource section; this one is pretty easy.

624x37

First we look for a string at offset 0x60 + 0xC from the beginning of the resource section and load it; the string is 16 bytes in length.

483x104

Then, we load the rest of the payload into another variable
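As a rough sketch of these two steps, the function below slices the key and the encrypted payload out of the raw bytes of the resource section. The offsets come from the analysis above (0xC + 16 key bytes lands exactly on the 0x1C that the loader skips); treating everything up to the end of the section as payload is my assumption, so `payload_size` is a hypothetical parameter for trimming it. How you obtain the section bytes (for example with a PE parsing library) is left out.

```python
RESOURCE_DATA_OFFSET = 0x60   # where the resource data starts inside the section
KEY_OFFSET = 0x0C             # key position inside the resource
KEY_LENGTH = 16
PAYLOAD_OFFSET = KEY_OFFSET + KEY_LENGTH   # 0x1C, the bytes the loader skips


def split_resource(section: bytes, payload_size=None):
    """Return (key, encrypted_payload) sliced from the raw resource section."""
    base = RESOURCE_DATA_OFFSET
    key = section[base + KEY_OFFSET: base + KEY_OFFSET + KEY_LENGTH]
    if len(key) != KEY_LENGTH:
        raise ValueError("section too short to contain the key")
    start = base + PAYLOAD_OFFSET
    end = len(section) if payload_size is None else start + payload_size
    return key, section[start:end]
```

The returned key and payload are what get handed to the RC4 decryption step.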

624x272

Then we use the ARC4 Python module to decrypt this data and dump it to disk:

457x197

Now that we possess the second payload, we must locate the pastebin URL inside the PE file.

624x147

We know our URL is located two XMMWORDs (32 bytes) past the string “cruloader”, so let’s set this up:

276x66

624x247

We calculate the offset by locating the “cruloader” string, adding 3 bytes to skip the null bytes and then jumping past both irrelevant XMMWORDs. We extract our data and set the XOR key to 0xC5.

Then, for each extracted byte, we perform a ROL by four followed by an XOR to match the decryption algorithm:
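Put together, the locate and decrypt steps might look like the sketch below. I am assuming the 3 null bytes are counted from the end of the 9-byte “cruloader” string and that the rotate is a 4-bit ROL on each byte; the length of the encrypted URL is not stated here, so `length` is a hypothetical parameter the caller has to supply.

```python
MARKER = b"cruloader"
XOR_KEY = 0xC5


def rol8(value: int, count: int) -> int:
    """Rotate an 8-bit value left by count bits."""
    count %= 8
    return ((value << count) | (value >> (8 - count))) & 0xFF


def decrypt_bytes(data: bytes) -> bytes:
    """ROL each byte by 4, then XOR with the key."""
    return bytes(rol8(b, 4) ^ XOR_KEY for b in data)


def locate_encrypted_url(pe_bytes: bytes, length: int) -> bytes:
    """Slice the encrypted URL that sits past the marker, padding and 2 XMMWORDs."""
    idx = pe_bytes.find(MARKER)
    if idx < 0:
        raise ValueError("marker not found")
    start = idx + len(MARKER) + 3 + 32   # 3 null bytes + two 16-byte XMMWORDs
    return pe_bytes[start:start + length]
```

Calling `decrypt_bytes(locate_encrypted_url(data, n))` on the dumped second stage should then yield the URL, provided `n` covers the whole encrypted string.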

301x231

526x89

We then use urllib to fetch the pastebin URL, and then use the same library to download the contents of the payload image:
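A minimal sketch of the two-hop download is shown below, assuming the paste body is nothing but the image URL as plain text. The `fetch` parameter is my own addition so the logic can be exercised without touching the network; only run the real thing from an isolated analysis machine.

```python
from urllib.request import urlopen


def fetch_url(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw body of a URL."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_stage(paste_url: str, fetch=fetch_url) -> bytes:
    """Read the image link from the paste, then download the image itself."""
    image_url = fetch(paste_url).decode("utf-8").strip()
    return fetch(image_url)
```

Passing a fake `fetch` (for example a dict lookup) makes it easy to replay previously captured responses offline.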

549x143

Finally, to locate the payload within the PNG file, we find the reversed “cruloader” string, extract the data that follows it and XOR it with 0x61.
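This last step can be sketched as follows. I am assuming the payload is everything after the first occurrence of the marker up to the end of the file; `extract_final_payload` is a name I picked for illustration.

```python
MARKER = b"redaolurc"   # "cruloader" reversed
XOR_KEY = 0x61          # 'a'


def extract_final_payload(image_bytes: bytes) -> bytes:
    """Find the marker in the image and XOR-decode everything after it."""
    idx = image_bytes.find(MARKER)
    if idx < 0:
        raise ValueError("marker not found in image data")
    encrypted = image_bytes[idx + len(MARKER):]
    return bytes(b ^ XOR_KEY for b in encrypted)
```

If the result starts with `MZ`, it can be written straight to disk as the final payload.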

624x153

And that’s it! Easy as that!

I really enjoyed this challenge and I’m looking forward to continuing the course 😊 Hope you enjoyed reading this!