Practical Malware Analysis
Lab 5 — IDA Pro
Solutions for Lab 5 within Practical Malware Analysis.
IDA Pro
IDA Pro, the Interactive Disassembler, generates assembly language source code from an executable or other compiled program. It can disassemble an entire program and performs tasks such as function discovery, stack analysis, and local variable identification, all of which help an analyst understand (or change) the program's functionality.
This lab utilises IDA to explore a malicious .dll and demonstrates various techniques for navigation and analysis. Any useful shortcuts will be identified.
The labs skip from 3 to 5 because there is no Lab 4-x in the book. That chapter covers x86 disassembly, which will be covered in a separate post (coming soon).
________________________________________________________________
Lab 5–1
This lab analyses the malware found in the file Lab05–01.dll, and is a longer lab designed to demonstrate features of IDA Pro and give hands-on experience.
1. What is the address of DllMain?
2. Use the Imports window to browse to gethostbyname. Where is the import located?
3. How many functions call gethostbyname?
4. Focusing on the call to gethostbyname located at 0x10001757, can you figure out which DNS request will be made?
5. How many local variables has IDA Pro recognized for the subroutine at 0x10001656?
6. How many parameters has IDA Pro recognized for the subroutine at 0x10001656?
7. Use the Strings window to locate the string \cmd.exe /c in the disassembly. Where is it located?
8. What is happening in the area of code that references \cmd.exe /c?
9. In the same area, at 0x100101C8, it looks like dword_1008E5C4 is a global variable that helps decide which path to take. How does the malware set dword_1008E5C4? (Hint: Use dword_1008E5C4’s cross-references.)
10. A few hundred lines into the subroutine at 0x1000FF58, a series of comparisons use memcmp to compare strings. What happens if the string comparison to robotwork is successful (when memcmp returns 0)?
11. What does the export PSLIST do?
12. Use the graph mode to graph the cross-references from sub_10004E79. Which API functions could be called by entering this function? Based on the API functions alone, what could you rename this function?
13. How many Windows API functions does DllMain call directly? How many at a depth of 2?
14. At 0x10001358, there is a call to Sleep (an API function that takes one parameter containing the number of milliseconds to sleep). Looking backward through the code, how long will the program sleep if this code executes?
15. At 0x10001701 is a call to socket. What are the three parameters?
16. Using the MSDN page for socket and the named symbolic constants functionality in IDA Pro, can you make the parameters more meaningful? What are the parameters after you apply changes?
17. Search for usage of the in instruction (opcode 0xED). This instruction is used with a magic string VMXh to perform VMware detection. Is that in use in this malware? Using the cross-references to the function that executes the in instruction, is there further evidence of VMware detection?
18. Jump your cursor to 0x1001D988. What do you find?
19. If you have the IDA Python plug-in installed (included with the commercial version of IDA Pro), run Lab05–01.py, an IDA Pro Python script provided with the malware for this book. (Make sure the cursor is at 0x1001D988.) What happens after you run the script?
20. With the cursor in the same location, how do you turn this data into a single ASCII string?
21. Open the script with a text editor. How does it work?
0. Before we get started.
To help with navigation of IDA, some useful settings and windows should be configured. First enable Line Prefixes, set Opcode bytes to 6, and enable Auto Comments. This will provide some clarity to the assembly. The windows will likely be present by default, but can be switched to with the shortcuts.
1. What is the address of DllMain?
The address of DllMain is 0x1000D02E. This can be found within the graph mode, or within the Functions window (figure 2).
2. Where is the import gethostbyname located?
gethostbyname is located at 0x100163CC within .idata (figure 3). This is found by opening the Imports window and double-clicking the function. Here we can also see that gethostbyname takes a single parameter, something like a string.
3. How many functions call gethostbyname?
Searching the xrefs (Ctrl+X) on gethostbyname shows it is referenced 18 times, 9 of which are type (p) for the near call, and the other 9 are read (r) (figure 4). Of these, there are 5 unique calling functions.
4. For gethostbyname at 0x10001757, which DNS request is made?
Pressing G and navigating to 0x10001757, we see a call to the gethostbyname function, which we know takes one parameter. In this case, that is whatever is in eax, which is loaded from off_10019040 (figure 5).
off_10019040 points to a variable aThisIsRdoPicsP, which contains the string [This is RDO]pics.practicalmalwareanalysis.com. This pointer is moved into eax (figure 6).
Importantly, 0Dh is added to eax, which moves the pointer along the string. 0Dh can be converted to decimal in IDA by pressing H, giving 13. This means eax now points 13 characters into the string, skipping past the prefix [This is RDO] and resulting in the DNS request being made for pics.practicalmalwareanalysis.com.
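The pointer adjustment can be mimicked with a string slice. This is a minimal sketch using the string shown above; the variable names are our own and are not taken from the binary.

```python
# String that off_10019040 points to, as seen in the disassembly.
raw = "[This is RDO]pics.practicalmalwareanalysis.com"

# add eax, 0Dh: advance the pointer 13 characters past the prefix.
PREFIX_LEN = 0x0D
hostname = raw[PREFIX_LEN:]

print(PREFIX_LEN)  # decimal value of 0Dh
print(hostname)    # the name handed to gethostbyname
```

Slicing at index 13 drops exactly the [This is RDO] prefix, which is what the add instruction achieves on the pointer.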
5 & 6. How many parameters and local variables are recognized for the subroutine at 0x10001656?
There are a total of 24 variables and parameters for sub_10001656 (figure 7).
Local variables correspond to negative offsets, and there are 23 of them. Many are generated by IDA and prefixed with var_, however some have been resolved to names such as name or commandline. As we work through, we generally rename any of the important ones.
Parameters have positive offsets. Here there is one, currently named lpThreadParameter. This may also be seen as arg_0 if not automagically resolved.
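As a rough illustration of that rule, the sketch below sorts frame offsets into locals and parameters by sign. The offsets are hypothetical examples and not the real layout of sub_10001656.

```python
def classify_frame_offsets(offsets):
    """Split frame offsets into locals (negative) and parameters (positive)."""
    local_vars = [o for o in offsets if o < 0]
    params = [o for o in offsets if o > 0]
    return local_vars, params

# Hypothetical offsets for illustration only.
sample = [-0x10, -0x8, -0x4, 0x8]
local_vars, params = classify_frame_offsets(sample)
print(len(local_vars), "locals,", len(params), "parameter(s)")
```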
7. Where is the string \cmd.exe /c located in the disassembly?
Press Alt+T to perform a string search for \cmd.exe /c, which is stored as aCmdExeC and referenced within sub_1000FF58 at 0x100101D0 (figure 8).
8. What happens around the referencing of \cmd.exe /c?
The command cmd.exe /c opens a new instance of cmd.exe, and the /c parameter instructs it to execute the command that follows and then terminate. This suggests that whatever is to be executed is likely constructed somewhere nearby.
Taking a cursory look around sub_1000FF58, we see several indications of what might be happening. Look for push offset X for quick wins.
Towards the top of the function, we see an offset that is quite telling. The offset aHiMasterDDDDDD, referenced at 0x1001009D, contains a long message which includes several strings relating to system time information (actually initialised just before), but more notably a reference to a Remote Shell (figure 9).
Further on throughout the function, there are more interesting offsets with strings that may provide an indication of activity. Some are likely part of any command-line activity, whereas others may be additional modules. Notable ones include aInject, aIexploreExe, and aCreateProcessG, which could be indicative of process injection into iexplore.exe.
9. At 0x100101C8, dword_1008E5C4 indicates which path to take. How does the malware set dword_1008E5C4?
The comparison of dword_1008E5C4 and ebx determines whether \cmd.exe /c or \command.exe /c is pushed, likely based upon the operating system version so that the correct command prompt is used (figure 11).
Following the xrefs of dword_1008E5C4, we see it written (type w) in sub_10001656 with the value of eax. There is a preceding call to sub_10003695, which takes a look at the system's version information (using the API call GetVersionExA) (figure 12).
There is a comparison between VersionInformation.dwPlatformId and 2, so looking at the Windows platform IDs we see that it is checking whether 'The operating system is Windows NT or later.' If it is, then \cmd.exe /c is pushed. If not, then \command.exe /c is pushed.
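The decision can be summarised in a few lines. This is a sketch of the logic as we read it from the disassembly, not a reconstruction of the malware's code; the function names are our own, and only the constant value 2 and the two strings come from the analysis above.

```python
VER_PLATFORM_WIN32_NT = 2  # dwPlatformId value for Windows NT or later

def is_nt_platform(platform_id: int) -> bool:
    """Model of the check whose result ends up in dword_1008E5C4."""
    return platform_id == VER_PLATFORM_WIN32_NT

def shell_prefix(platform_id: int) -> str:
    """Pick the command prompt string based on the platform check."""
    if is_nt_platform(platform_id):
        return r"\cmd.exe /c"
    return r"\command.exe /c"

print(shell_prefix(2))  # NT-based Windows
print(shell_prefix(1))  # older, non-NT Windows
```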
10. What happens if the string comparison to robotwork is successful?
The robotwork string comparison is completed using the function memcmp, which returns 0 if the two strings are identical. The JNZ branch jumps if the result Is Not Zero. This means that if the robotwork comparison is successful, returning 0, the jump does not execute (the red path). If the memcmp was unsuccessful, some other non-zero value would be returned and the jump (green path) would be followed (figure 13).
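The branch logic can be modelled as follows. This is a simplified Python stand-in for memcmp (the real C function works on raw memory and knows nothing about lengths), and jnz_taken is a hypothetical helper name used only for this sketch.

```python
def memcmp(a: bytes, b: bytes, n: int) -> int:
    """Simplified model: 0 if the first n bytes match, non-zero otherwise."""
    a, b = a[:n], b[:n]
    for x, y in zip(a, b):
        if x != y:
            return -1 if x < y else 1
    # Treat a shorter buffer as a mismatch.
    return (len(a) > len(b)) - (len(a) < len(b))

def jnz_taken(command: bytes) -> bool:
    """True when the JNZ after the robotwork comparison would jump."""
    keyword = b"robotwork"
    return memcmp(command, keyword, len(keyword)) != 0

print(jnz_taken(b"robotwork"))  # match, so the jump is not taken
print(jnz_taken(b"language"))   # mismatch, so the jump is taken
```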
Not jumping (and following the red path) leads to a new function, sub_100052A2, which references the registry key SOFTWARE\Microsoft\Windows\CurrentVersion along with WorkTime and WorkTimes. The function looks up the WorkTime and WorkTimes values (RegQueryValueExA) and, if they are found, they are formatted into the relevant aRobotWorktime strings (via %d) (figure 14).
The start of the function takes in a SOCKET parameter as s, which is then passed through to a new function (sub_100038EE) along with the registry values (ebp) (figure 15).
Therefore, if the string comparison for robotwork is successful, the WorkTime and WorkTimes entries under SOFTWARE\Microsoft\Windows\CurrentVersion are queried and the values are passed (likely) through the remote shell connection.
11. What does the export PSLIST do?
Open the Exports window and find the exported function PSLIST (figure 16).
Navigating here, we see there are three subroutines. One of them queries OS version information (similar to Q9, but this time it also checks whether dwMajorVersion is 5 for more specific OS footprinting), and depending on the outcome, either sub_10006518 or sub_1000664C is called (figure 17).
Both sub_10006518 and sub_1000664C utilise CreateToolhelp32Snapshot to take a snapshot of the specified processes and associated information, and then execute appropriate commands to query the running processes' IDs, names, and number of threads. sub_1000664C also includes the SOCKET (s) to send the output to (figure 18).
12. Which API functions could be called by entering sub_10004E79?
A useful way to quickly see which API functions are called by a certain subroutine is the Proximity Browser view. This transforms the standard Graph or Text view into a much more condensed graph highlighting which API functions or subroutines are called (figure 19).
The functions called from sub_10004E79 (figure 20) indicate that its purpose is to identify the language used on the system, and then pass that information through the SOCKET (via sub_100038EE, which we have seen before). It might make sense to rename sub_10004E79 to something like getSystemLanguage. While we're at it, we might as well rename sub_100038EE to something like sendSocket.
13. How many Windows API functions does DllMain call directly, and how many at a depth of 2?
Another way to view the API functions called from somewhere is through View -> Graphs -> User XRef Chart. Set the start and end addresses to DllMain and the Recursion depth to 1 to see four API functions called (figure 21). At a depth of 2, there are around 32, with some duplicates.
Some of the more notable API calls which may provide an indication of functionality are: Sleep, WinExec, gethostbyname, inet_ntoa, CreateThread, WSAStartup, inet_addr, recv, send, socket, connect, and LoadLibraryA.
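To make the idea of recursion depth concrete, here is a small sketch that walks a call graph and collects the API functions reachable within a given depth. The call graph and the subroutine names in it are entirely hypothetical, and IDA's own chart may count depth slightly differently.

```python
from collections import deque

def apis_within_depth(call_graph, api_names, start, max_depth):
    """Collect API functions reachable from start within max_depth call levels."""
    found = set()
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        func, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not look at callees beyond the requested depth
        for callee in call_graph.get(func, []):
            if callee in api_names:
                found.add(callee)
            elif callee not in seen:
                seen.add(callee)
                queue.append((callee, depth + 1))
    return found

# Hypothetical call graph for illustration only.
toy_graph = {
    "entry": ["Sleep", "helper_a"],
    "helper_a": ["socket", "connect", "helper_b"],
    "helper_b": ["send"],
}
toy_apis = {"Sleep", "socket", "connect", "send"}

print(sorted(apis_within_depth(toy_graph, toy_apis, "entry", 1)))
print(sorted(apis_within_depth(toy_graph, toy_apis, "entry", 2)))
```

Using a set removes the duplicates that show up in the chart when several subroutines call the same API.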
14. How long will the Sleep API function at 0x10001358 execute for?
At first glance, one might think that the value passed to Sleep is 3E8h (1000), equating to 1 second. However, 3E8h is an operand of an imul instruction, which means the value in eax is multiplied by 1000. Looking up, we see that the offset aThisIsCti30 is moved into eax and then the pointer is moved 13 along (similar to what we saw in Q4) (figure 22).
This means that eax points at the string 30 when it is pushed. atoi converts the string to an integer, and the result is multiplied by 1000. Therefore, the Sleep API function sleeps for 30 seconds.
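The arithmetic can be checked with a short sketch. The exact string contents are an assumption inferred from the IDA name aThisIsCti30 and the 13-character skip; int stands in for atoi here.

```python
# Assumed contents of aThisIsCti30 (inferred from the IDA label).
raw = "[This is CTI]30"

digits = raw[13:]           # pointer moved 13 along, past the prefix
seconds = int(digits)       # stand-in for atoi
sleep_ms = seconds * 1000   # imul eax, 3E8h

print(sleep_ms, "ms =", sleep_ms // 1000, "seconds")
```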
15 & 16. What are the three parameters for the call to socket at 0x10001701?
The three values pushed to the stack, labeled protocol, type, and af, are 6, 1, and 2 respectively. These are the three parameters used for the call to socket (figure 23).
These describe what type of socket is created. Using the socket documentation, we can determine that 2 is AF_INET (IPv4), 1 is SOCK_STREAM, and 6 is IPPROTO_TCP, so this is a TCP socket over IPv4. At this point, we might as well apply those symbolic constants to the operands (figure 24).
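A small lookup shows the same translation that IDA's symbolic constants perform. The tables below hold only a handful of Winsock values and are deliberately incomplete; describe_socket is a hypothetical helper name.

```python
# Partial tables of Winsock constants (not exhaustive).
ADDRESS_FAMILIES = {2: "AF_INET", 23: "AF_INET6"}
SOCKET_TYPES = {1: "SOCK_STREAM", 2: "SOCK_DGRAM"}
PROTOCOLS = {6: "IPPROTO_TCP", 17: "IPPROTO_UDP"}

def describe_socket(af: int, sock_type: int, protocol: int):
    """Translate the numeric socket() arguments into symbolic names."""
    return (
        ADDRESS_FAMILIES.get(af, hex(af)),
        SOCKET_TYPES.get(sock_type, hex(sock_type)),
        PROTOCOLS.get(protocol, hex(protocol)),
    )

# Values seen at 0x10001701: af=2, type=1, protocol=6.
print(describe_socket(2, 1, 6))
```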
17. Is there VM detection?
The in instruction (opcode 0xED) is used with the string VMXh to determine whether the malware is running inside VMware. Search for the byte 0xED (Alt+B) and look for the in instruction in the results (figure 25).
From here, we can navigate into the function and see what is going on within sub_10006196.
Directly around the in instruction, we see evidence of the string VMXh (converted from its original hex value) (figure 26), which is potentially indicative of VM detection. If we look at the other xrefs of sub_10006196, we see three occurrences, each of which contains aFoundVirtualMa, indicating the install is canceled if a virtual machine is found (figure 27).
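The hex-to-string conversion is easy to verify outside IDA. The value 0x564D5868 is the well-known VMware magic number; it is stated here from general knowledge and not copied from the figures.

```python
VMWARE_MAGIC = 0x564D5868  # assumed operand value; commonly documented VMware magic

# Interpret the 32-bit value as four ASCII characters, most significant byte first.
magic_text = VMWARE_MAGIC.to_bytes(4, "big").decode("ascii")

print(magic_text)
```

This mirrors pressing R on the operand in IDA to display it as characters.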
18, 19, & 20. What is at 0x1001D988?
The data starting at 0x1001D988 appears illegible. We can convert it to ASCII (by pressing A), although it is still unreadable (figure 28).
We have been provided with a Python script for the lab, Lab05–01.py, which is intended to be run inside IDA. For 0x50 bytes from the current cursor position, the script performs an XOR with 0x55 and prints out the resulting bytes, likely to decode the text (figure 29).
We are unable to do this within the free version of IDA, however we can loosely do it manually ourselves by taking the bytes from 0x1001D988 and XORing them with 0x55.
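The manual decoding can be done with a few lines of standalone Python. This is a minimal sketch and not the lab's script; the sample bytes are a hypothetical stand-in, since the real bytes need to be copied out of IDA.

```python
def xor_decode(data: bytes, key: int = 0x55, length: int = 0x50) -> bytes:
    """XOR up to `length` bytes of `data` with a single-byte key."""
    return bytes(b ^ key for b in data[:length])

# Hypothetical sample: encode a known string, then decode it again.
plaintext = b"hypothetical sample text"
encoded = xor_decode(plaintext)
decoded = xor_decode(encoded)

print(encoded.hex())
print(decoded.decode("ascii"))
```

Because XOR is its own inverse, the same function both encodes and decodes. Paste the bytes exported from 0x1001D988 in place of the sample to reproduce the lab result.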
Evidently, the conversion to ASCII and manual decoding has messed something up with the capitalisation, but we can see some plaintext and determine the completed message (figure 30):
rxdoor is this backdoor, string decoded for Practical Malware Analysis Lab :)1234