Introduction to Malware Reverse Engineering


Introduction:

Malware is a vast topic within security, and it takes different forms across different environments. It often falls to defenders to research and analyze each piece of malware released into the wild in order to understand what it contains and how it behaves. This article covers the relevant tools, techniques, motivations, and mitigations involved in malware research.

Tools:

Wireshark

– A network capture and analysis tool that lets researchers monitor the network activity of an infected machine and identify the Command and Control (C&C) servers or other endpoints the malware communicates with.
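
As a hedged example of how this fits an analysis pipeline: Wireshark ships with a command-line companion, tshark, which a short Python wrapper can drive to tally the DNS names a sample queries. Domains queried repeatedly are candidates for C&C beaconing. The capture filename is a placeholder, and tshark is assumed to be on PATH.

```python
# Tally DNS query names from a capture using tshark (Wireshark's CLI),
# to surface domains a sample queries repeatedly (possible C&C beaconing).
import subprocess
from collections import Counter

out = subprocess.run(
    ["tshark", "-r", "capture.pcap", "-Y", "dns.flags.response == 0",
     "-T", "fields", "-e", "dns.qry.name"],
    capture_output=True, text=True, check=True,
).stdout

counts = Counter(line for line in out.splitlines() if line)
for domain, n in counts.most_common(10):
    print(f"{n:6d}  {domain}")
```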

Volatility

– A memory forensics framework that analyzes memory dumps and other memory sources. Virtual machines are especially convenient sources, since their RAM can be snapshotted and handed straight to the framework. Memory analysis is necessary for a true view of a binary's in-memory program image, because malware may hide itself from tools running inside the infected system.

IDA Pro

– The gold standard for static analysis of binaries. IDA also provides decompilation (via the Hex-Rays add-on) that lifts binaries back into an intermediate C-like representation, and its IDAPython interface allows scripting against the disassembly as well as IDA's built-in debugger.
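
A minimal IDAPython sketch, run from inside IDA (File > Script file), that lists every recognized function and flags names matching common anti-debug APIs. The SUSPICIOUS list is illustrative, not exhaustive, and matches may appear as import thunks rather than real functions.

```python
# List every recognized function and flag likely anti-debug imports by name.
import idautils
import idc

SUSPICIOUS = ("ptrace", "IsDebuggerPresent", "CheckRemoteDebuggerPresent")

for ea in idautils.Functions():
    name = idc.get_func_name(ea)
    tag = "  <-- anti-debug?" if any(s.lower() in name.lower() for s in SUSPICIOUS) else ""
    print("0x%x %s%s" % (ea, name, tag))
```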

Binary Ninja

– A disassembly and analysis platform developed by Vector 35 that provides an easy-to-manage view and analysis of binaries. Through its various interfaces, Binary Ninja also allows scriptable plugins and an integrated CLI-like Python console for other kinds of analysis.
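
A headless sketch of that scripting interface (which requires a headless-capable license). The binaryninja.load() call and the functions list are from the current Python API; treat the exact names as assumptions if you are on an older release, and the path is a placeholder.

```python
# Open a binary headlessly with Binary Ninja's Python API and list functions.
import binaryninja

bv = binaryninja.load("sample.bin")   # path is a placeholder
for func in bv.functions:
    print(hex(func.start), func.name)
```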

Intel Pin

– A dynamic binary instrumentation framework. Combined with automated scripts or other dynamic analysis tools, it can report the total number of instructions a program executes, which helps determine code flow on positive test cases: by running a sample across systems with differing services, one can reasonably infer which systems passed the malware's trigger conditions by looking for the maximum instruction counts. A sketch of this approach follows.
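
A minimal sketch, assuming Pin's bundled inscount0 example tool, which ships with the Pin kit's ManualExamples and writes "Count <N>" to inscount.out when the target exits. The install paths below are assumptions about your layout.

```python
# Run a sample under Pin's inscount0 example tool and read the count back.
import subprocess

PIN = "/opt/pin/pin"  # assumed install prefix
TOOL = "/opt/pin/source/tools/ManualExamples/obj-intel64/inscount0.so"

def instruction_count(target):
    subprocess.run([PIN, "-t", TOOL, "--", target], check=True)
    with open("inscount.out") as f:          # file contents: "Count 1234567"
        return int(f.read().split()[1])

# Run the same sample on differently provisioned systems; the runs with the
# highest counts are the ones whose trigger conditions the malware accepted.
print("instructions executed:", instruction_count("./sample"))
```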

Footprint

– A fingerprinting-style tool that analyzes the file system through a series of hashes and entropy measurements. A “snapshot” of the file system is taken and compared against a new snapshot after a separate execution, letting a researcher determine which key files changed. For instance, a registry entry altered to force a program to run at startup would stand out in such a comparison. A generic sketch of the technique appears below.
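
The following is a generic sketch of the snapshot-and-diff technique, not the Footprint tool itself: hash every file under a root and measure its Shannon entropy, so two snapshots taken before and after detonating a sample can be diffed.

```python
# Snapshot a directory tree as {path: (sha256, entropy)} for later diffing.
import hashlib
import math
import os
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(data).values())

def snapshot(root):
    state = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                data = open(path, "rb").read()
            except OSError:
                continue  # skip unreadable or special files
            state[path] = (hashlib.sha256(data).hexdigest(),
                           shannon_entropy(data))
    return state

before = snapshot("/etc")   # take a second snapshot after execution and diff
# changed = {p for p in before if before[p] != after.get(p)}
```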

Reversing Techniques:

There is a plethora of techniques and tools that can be used to reverse engineer and analyze malware. This portion focuses primarily on static and dynamic analysis with a select set of tools.

Static Analysis:

Analysis of a binary without executing it: the binary is disassembled into assembly, or decompiled, in a form readable by the researcher. A thorough examination of all pass cases, flows of execution, and the ways the malware operates is then required. This methodology is rigorous and time consuming, and it offers little room for automation because automated static tools suffer high failure and false-positive rates. A minimal disassembly pass is sketched below.
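
A minimal static-analysis sketch using the Capstone disassembly library: turn a buffer of x86-64 machine code into readable assembly. A real workflow would first parse the ELF or PE headers to locate the code section; the byte string and load address here are illustrative.

```python
# Disassemble a small x86-64 code buffer with Capstone.
from capstone import Cs, CS_ARCH_X86, CS_MODE_64

# push rbp; mov rbp, rsp; xor eax, eax; pop rbp; ret
code = b"\x55\x48\x89\xe5\x31\xc0\x5d\xc3"
md = Cs(CS_ARCH_X86, CS_MODE_64)

for insn in md.disasm(code, 0x400000):        # 0x400000: assumed load address
    print("0x%x:\t%s\t%s" % (insn.address, insn.mnemonic, insn.op_str))
```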

Dynamic Analysis:

Execution of malware on test systems so that various tools can analyze the results. These tools can examine changes to the file system, network activity, or the execution paths the executable takes. Dynamic analysis lends itself to automation and to expansion as a field of research; it has also proved useful for improving time efficiency and for prioritizing which portions of code should be analyzed by hand. DARPA's Cyber Grand Challenge (CGC) demonstrated that such automated systems are impressively feasible; however, the CGC winner could not hold a candle to the human teams it faced at the DEF CON CTF. Automated dynamic analysis is heading in the right direction, but it is currently infeasible for it to replace human analysts in the reverse engineering process. A minimal dynamic-analysis sketch follows.
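
One simple form of dynamic analysis is to detonate a sample under strace inside a disposable VM and pull out the network-related system calls it makes. "./sample" is a placeholder; never run real malware outside an isolated environment.

```python
# Run a sample under strace, then grep the log for outbound connections.
import subprocess

try:
    subprocess.run(
        ["strace", "-f", "-e", "trace=network", "-o", "syscalls.log", "./sample"],
        timeout=60,
    )
except subprocess.TimeoutExpired:
    pass  # long-running samples get killed after the observation window

with open("syscalls.log") as f:
    for line in f:
        if "connect(" in line:   # outbound connections hint at C&C endpoints
            print(line.rstrip())
```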

Advanced Attacker Methodology:

Anti-Debugging and Anti-Analysis Tactics:

Advances in debugging and reversing have allowed researchers to thwart or prevent many malware agents. Because those techniques are public, however, malware developers have begun adapting to them. Moreover, most malware developers do not want their code reverse engineered or analyzed at all. As a result, various anti-debugging measures have been developed to delay or prevent analysis.

One example of malware that seeks to disrupt analysis is a program that unlinks itself from the doubly linked list of processes stored in kernel memory. Such tactics prevent tools like htop, top, procmon, and Task Manager from alerting on malicious programs that are still running. These techniques were defeated by the Volatility Foundation's tooling, which allows in-depth analysis of memory: a researcher can scan memory for code that is staged or executing but absent from the process list. Malware developers were quick to adapt, however, and developed strategies that defeat in-memory signature techniques to deny analysis of their binaries. The relationship between malware developers and security researchers parallels that of cat and mouse, with each side attempting to thwart the other. A sketch of the list-walk-versus-scan comparison follows.
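
A hedged sketch of that comparison using Volatility 2's CLI: pslist walks the kernel's doubly linked process list, while psscan scans memory for process structures, so a PID found only by psscan is a candidate unlinked process. The dump filename and profile are assumptions about your capture, and the PID extraction is deliberately loose.

```python
# Diff Volatility's pslist (list walk) against psscan (memory scan).
import re
import subprocess

def pids(plugin):
    out = subprocess.run(
        ["vol.py", "-f", "mem.dmp", "--profile=Win7SP1x64", plugin],
        capture_output=True, text=True, check=True,
    ).stdout
    # Both plugins print fixed-width tables; pull the PID column loosely.
    return {int(m.group(1))
            for m in re.finditer(r"^0x\w+\s+\S+\s+(\d+)", out, re.M)}

hidden = pids("psscan") - pids("pslist")
print("possible unlinked processes:", sorted(hidden))
```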

Malware that attempts to inhibit debugging tends to abuse the operating system facilities a debugger relies on: traps, signals, and ptrace. An example anti-debugging technique is malware that periodically calls ptrace on itself; because only one tracer may attach to a process, the call fails if a debugging application is already using it on the malware, and the malware can terminate, possibly running an exit routine that covers its tracks. With access to the binary, this can be mitigated by overwriting the calls to ptrace with No Operation opcodes appropriate to the instruction set of the machine the binary runs on. A minimal self-check of this kind is sketched below.
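
The following is a sketch of that ptrace self-check done from Python via ctypes (Linux only): PTRACE_TRACEME fails if a tracer is already attached, so a -1 return suggests the process is being debugged.

```python
# Detect an attached debugger by requesting PTRACE_TRACEME on ourselves.
import ctypes
import sys

PTRACE_TRACEME = 0

libc = ctypes.CDLL("libc.so.6", use_errno=True)
libc.ptrace.restype = ctypes.c_long
libc.ptrace.argtypes = [ctypes.c_long] * 4

if libc.ptrace(PTRACE_TRACEME, 0, 0, 0) == -1:
    sys.exit("debugger detected")    # real malware might also wipe itself here
print("no tracer attached; continuing")
```

On x86, the NOP-patching mitigation described above amounts to overwriting each such call site with 0x90 bytes.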

Decision-Based Execution:

Decision-based execution describes malware that attempts to verify that all of its preconditions are met before running its exploits. Such malware generally has a failsafe that deletes the original binary in order to cover its tracks. One example is Stuxnet, which determined the type of device it was attacking by correlating it against known equipment used in Iran. Decision-based execution can hide a sample's true objectives from researchers, and it is harder to analyze with automated tools because not every execution path can be triggered on a single operating system. Consequently, some malware variants are detected and classified by antivirus signatures without any true understanding of what the binary does, which has led to several vulnerabilities and zero-days being reused through negligence, lack of time, or inexperience on the researchers' part. One methodology for probing decision-based execution is to use Intel Pin or similar instrumentation to measure binaries as they execute on a variety of systems; it is generally safe to assume that positive test cases execute more paths and more instructions. Building and testing such an analysis system, along the lines of the Intel Pin sketch above, would be a worthwhile experiment in the feasibility of this approach.

Motives of Malware Development:

The motives of malware development can be broken up into financial gain, botnet creation, personal vendettas, targeted military attacks, and accidental deployments.

Financial Gain:

Financial gain is the most likely motive behind most modern malware development. Malware that encrypts important data with asymmetric key technologies such as RSA, holding users' data for ransom, has become very popular and successful and is classified as ransomware. Ransomware is not the only path to financial gain, however; other malware seeks to steal credit card numbers, Social Security numbers, and other sources of Personally Identifiable Information (PII).

Botnet Creation:

Malicious actors seeking to build a botnet generally develop malware with C&C capabilities, which lets them launch distributed denial of service (DDoS) attacks against various targets or perform similar activities on demand. Recently, most malware in this sector has targeted Internet of Things (IoT) devices, as the Mirai botnet did. This has proven quite fruitful and rather easy given the number of IoT devices that are publicly accessible, the relative lack of updates to IoT devices, and the current state of IoT insecurity. In some cases, malicious actors have theorized about or attempted using botnet-like malware for cryptocurrency mining; however, calculations done by Errata Security suggest this is infeasible on IoT devices and far too noticeable on desktop appliances [1].

Personal Vendetta:

An attacker with a personal vendetta may write malware for the sole purpose of harming as many users as possible, whether out of a sense of having been wronged by certain entities or as part of a group seeking to cause havoc against mainstream targets. Malware created under these circumstances is often quite harmful, yet it tends not to spread quickly and is generally mitigated by AV systems because of how noisy it is.

Targeted Military Attack:

Military entities seeking to cause downtime, inflict financial pain, or fulfill other motives may develop malware tailored to the task. A prime example is Stuxnet and its related variants, which sought to halt or impair uranium enrichment in Iran. Military attacks are often aimed at entities that have differing values or views, or that appear dangerous to another nation.

Accidental:

The accidental category is often the hardest to attribute; however, the malware involved is usually benign in nature and relatively easy to disrupt. There have been several incidents, such as the Morris Worm, in which no harm was intended but a great deal of damage was done. Given the modern state of malware protection and analysis, such an occurrence would now be quite rare and relatively easy to mitigate.

Mitigations:

Preventative:

Companies that develop code widely used by the general public should take advantage of preventative measures to ensure their applications are secure, including fuzzing, external security teams, and security training. Preventative measures cannot be the only line of defense, however, as most code shipped against a deadline will inevitably contain bugs.

Fuzzing an application lets an automated analysis tool determine whether vulnerabilities or faults exist. American Fuzzy Lop (AFL), Manticore, and angr are all tools that, in combination with build-integration tests and scripts, can provide automated security analysis while giving developers a better understanding of type manipulation and input handling. The toy sketch below illustrates the core loop that tools like AFL refine with coverage feedback.
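
This is a toy illustration of the concept, not AFL itself: mutate a seed input, run the target, and save any input that crashes it. The seed file and target path are placeholders, and the seed is assumed to be non-empty.

```python
# Minimal mutation fuzzer: random byte flips, crash detection via signal exit.
import random
import subprocess

seed = open("seed.bin", "rb").read()   # assumed non-empty

for i in range(1000):
    data = bytearray(seed)
    for _ in range(random.randint(1, 8)):            # flip a few bytes
        data[random.randrange(len(data))] = random.randrange(256)
    try:
        result = subprocess.run(["./target"], input=bytes(data),
                                capture_output=True, timeout=5)
    except subprocess.TimeoutExpired:
        continue                                     # hangs aren't crashes here
    if result.returncode < 0:                        # killed by a signal
        open(f"crash_{i}.bin", "wb").write(bytes(data))
        print(f"crash (signal {-result.returncode}) saved to crash_{i}.bin")
```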

Network:

The network provides the strongest defense against the overall spread, impact, and damage of any malware agent, and the network layer is the first line of defense for most businesses. Firewall rules, IDS or IPS deployments, and network segmentation offer a strong defense against malware that operates in plaintext. Over the years, however, attackers have increasingly used self-signed SSL certificates and tunneled traffic over HTTPS to disguise commands and malicious software sent over the wire to an infected computer. Public Key Infrastructure (PKI) has been established for websites, paralleling the web-of-trust model used for e-mail but built instead on Certificate Authorities (CAs). These CAs only issue certificates to validated websites and generally revoke them in instances of misuse; note, though, that there have been incidents of compromised CAs issuing certificates to malicious actors without safeguarding their clients.

Due to the nature of PKI, every legitimate HTTPS certificate chains to an authority that vouches for it; this is the basis of most web traffic and of how we use HTTPS. Malicious attackers, by contrast, primarily use self-signed certificates for their communication channels, which is an easy indicator that traffic may not be legitimate. Tracking self-signed certificates and how their use propagates across the network gives insight into how a malicious actor is pivoting. Furthermore, the URLs of sites contacted over SSL can be compared against known-good lists to verify their existence and authenticity; for example, traffic can be vetted automatically by comparing URLs against the Alexa top 500 websites and treating everything else as an outlier worth inspecting. A sketch of a simple self-signed-certificate check follows.
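
A minimal sketch of flagging hosts that present self-signed certificates: fetch the peer certificate without validating it, then apply the classic heuristic that a self-signed certificate has identical issuer and subject. It assumes the third-party "cryptography" package is installed, and the hostnames are hypothetical.

```python
# Heuristic self-signed certificate check: issuer == subject.
import ssl
from cryptography import x509

def is_self_signed(host, port=443):
    pem = ssl.get_server_certificate((host, port))   # no chain validation
    cert = x509.load_pem_x509_certificate(pem.encode())
    return cert.issuer == cert.subject

for host in ["internal-wiki.example", "203.0.113.7"]:  # hypothetical targets
    print(host, "self-signed" if is_self_signed(host) else "CA-issued")
```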

Self-signed certificates are not recommended for anything but test websites, and possibly as an implementation for internal websites over HTTPS. Those internal sites can be tracked, monitored, and whitelisted, since a company should know which of its internal websites use self-signed certificates; beyond that, blocking all self-signed certificates limits the availability of this kind of stealthy exfiltration. A company may also use SSL interception, in the spirit of SSL Strip, to ensure that all SSL traffic can be analyzed. This runs into issues, however, when dealing with PII, and most modern browsers now alert the user when a session is being hijacked or does not present the expected certificate; for a long time, Internet Explorer was the only browser that did not alert users to such attempts.

Bias and Assumptions:

This article makes several assumptions about the infrastructure and workflows that organizations have. Biases appear in the tool selection, the approach to reverse engineering malware, the network analysis, and the emphasis on *nix operating systems, as these subjects are the researcher's primary points of interest.

Conclusion:

The world of malware will continue to expand, and ever more malware agents will be released into the wild. It falls to malware researchers to develop the tools needed to handle that volume. With continued analysis and incremental improvement, several key components of the reverse engineering pipeline could be refined to advance the state of malware reverse engineering, and several of the tools discussed here could benefit the community with further research.

Future Work:

Future work includes: an analysis of the APT29 malware set provided by Deep End Research, to determine the OS breakdown and then examine and classify the various components; development of an OS X based malware agent that pivots through userland and takes control of the host for C&C; and a set of guides for building malware labs, for single hosts or hypervisor environments, in which a suite of tools is installed alongside varied versions of services and operating systems, Intel Pin is run against samples, and the results are compiled automatically to determine which test cases a binary passed. Results may vary, but in aggregate the overall playing field for dynamic analysis could be mapped, saving researchers time.

Sources:

[1] http://blog.erratasec.com/2017/04/mirai-bitcoin-and-numeracy.html

[2] I. You and K. Yim, "Malware Obfuscation Techniques: A Brief Survey," 2010 International Conference on Broadband, Wireless Computing, Communication and Applications, Fukuoka, 2010, pp. 297-300. doi: 10.1109/BWCCA.2010.85. URL: http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=5633410&isnumber=5630027

[3] https://www.sslshopper.com/article-when-are-self-signed-certificates-acceptable.html