Linux Malware Analysis

Malware is malicious code delivered with the intention of harming a computer system. It comes in many forms, such as rootkits, spyware, adware, viruses, and worms, and it typically hides itself and runs in the background while communicating with its command-and-control server on an outside network. Nowadays, much malware is target-specific and specially programmed to bypass the security measures of the target system, which is why advanced malware can be very hard to detect with ordinary security solutions. An important step in deploying malware is its infection vector, i.e., how the malware reaches the target. For example, a nondescript USB stick or a malicious download link (delivered via social engineering or phishing) may be used. Malware must be able to exploit a vulnerability to infect the target system. In most cases, malware can perform more than one function; for example, it could contain code to exploit a certain vulnerability and also carry a payload or program to communicate with the attacking machine.

REMnux

Disassembling a piece of malware to study its behavior and understand what it actually does is called malware reverse engineering. To determine whether an executable file is malicious or just an ordinary executable, or to find out what it really does and what impact it has on the system, there is a special Linux distribution called REMnux. REMnux is a lightweight, Ubuntu-based distro equipped with the tools and scripts needed to perform a detailed malware analysis on a given file or software executable. Its tools are free and open source and can be used to examine many types of files, including executables. Some tools in REMnux can even be used to examine obfuscated JavaScript code and Flash programs.

Installation

REMnux can be installed on top of a compatible Ubuntu system or run as a virtual machine (for example, in VirtualBox). The first step is to download the REMnux installer from the official website, which can be done by entering the following command:

ubuntu@ubuntu:~$ wget https://REMnux.org/remnux-cli

Be sure to check that the file is the one you intended to download by comparing its SHA-256 hash with the value published on the REMnux website. The SHA-256 hash can be produced using the following command:

ubuntu@ubuntu:~$ sha256sum remnux-cli

Then, move it into a new directory named “remnux” and give it executable permissions using “chmod +x.” Now, run the installer with its “install” subcommand to start the installation process:

ubuntu@ubuntu:~$ mkdir remnux
ubuntu@ubuntu:~$ cd remnux
ubuntu@ubuntu:~/remnux$ mv ../remnux-cli ./
ubuntu@ubuntu:~/remnux$ chmod +x remnux-cli
# Install REMnux
ubuntu@ubuntu:~/remnux$ sudo ./remnux-cli install

Restart your system, and you will be able to use the newly installed REMnux distro containing all the tools available for the reverse engineering procedure.

Another useful thing about REMnux is that you can use Docker images of popular REMnux tools to perform a specific task instead of installing the whole distribution. For example, the RetDec tool decompiles machine code and accepts input in various file formats, such as 32-bit/64-bit exe files, ELF files, etc. Rekall is another great tool available as a Docker image; it can be used for tasks such as extracting and examining data from memory. To examine obfuscated JavaScript, a tool called JSDetox can also be used. Docker images of these tools are available in the REMnux repository on Docker Hub.

Malware Analysis

Entropy

Entropy is a measure of the unpredictability of a data stream. A constant stream of bytes, for example, all zeroes or all ones, has an entropy of 0. On the other hand, if the data is encrypted or compressed, it will have a much higher entropy value. A well-encrypted data packet has a higher entropy value than a normal packet of data because the byte values in encrypted data are unpredictable and close to uniformly distributed. Measured in bits per byte, entropy has a minimum value of 0 and a maximum value of 8. The primary use of entropy in malware analysis is to find malware in executable files. If an executable is malicious, it is often packed or encrypted so that antivirus software cannot inspect its contents. The entropy of that kind of file is very high compared to a normal file, which signals to the investigator that something in the file's contents is suspicious. A high entropy value is not proof of malice, since legitimately compressed files also score high, but it is a strong hint that the file deserves a closer look.
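To make the 0-to-8 scale concrete, here is a minimal Python sketch of the Shannon entropy calculation over byte values. The function name shannon_entropy is just an illustrative choice; it is not taken from any REMnux tool.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    import os

    print(shannon_entropy(b"\x00" * 4096))     # constant stream
    print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
    print(shannon_entropy(os.urandom(65536)))  # random data, close to the maximum
```

A constant stream scores 0, a buffer in which every byte value is equally likely scores 8, and ordinary text or machine code usually falls somewhere in between.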

Density Scout

This useful tool was created for a single purpose: to find malware in a system. Attackers usually wrap malware in scrambled data (or encode/encrypt it) so that it cannot be detected by antivirus software. Density Scout scans the specified file system path and prints an entropy-based score for every file it finds, sorted so that the most suspicious files are listed first. A suspicious score prompts the investigator to examine that file further. This tool is available for Linux, Windows, and Mac operating systems. Density Scout also has a help menu showing the options it provides, with the following syntax:

ubuntu@ubuntu:~$ densityscout --h

ByteHist

ByteHist is a very useful tool for generating byte-usage histograms that reflect the data scrambling (entropy) level of different files. It makes the work of an investigator even easier, as it also produces histograms for the individual sections of an executable file. This means that the investigator can focus on the part that looks suspicious just by looking at the histogram. The histogram of a normal file usually looks noticeably different from that of a packed or encrypted one.
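The idea behind a byte histogram can be sketched in a few lines of Python. This is only an illustration of the concept, not how ByteHist itself is implemented; the function names and the text rendering are assumptions made for the example.

```python
from collections import Counter
from typing import List


def byte_histogram(data: bytes) -> List[int]:
    """Return a 256-element list: how often each byte value occurs."""
    counts = Counter(data)
    return [counts.get(value, 0) for value in range(256)]


def render_histogram(hist: List[int], width: int = 50, top: int = 8) -> str:
    """Render the `top` most frequent byte values as text bars."""
    peak = max(hist) or 1
    ranked = sorted(range(256), key=lambda v: hist[v], reverse=True)[:top]
    lines = []
    for value in ranked:
        bar = "#" * round(hist[value] / peak * width)
        lines.append(f"0x{value:02x} {hist[value]:>8} {bar}")
    return "\n".join(lines)


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        print(render_histogram(byte_histogram(handle.read())))
```

For an ordinary executable a few byte values (such as 0x00) tend to dominate, while packed or encrypted data produces bars of nearly equal length.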

Anomaly Detection

Malware is commonly packed using utilities such as UPX. These utilities modify the headers of executable files. When someone tries to open such a file in a debugger, the modified headers can crash the debugger so that investigators cannot look into the file. For these cases, anomaly detection tools are used.

PE (Portable Executables) Scanner

PE Scanner is a useful script written in Python that is used to detect suspicious TLS entries, invalid timestamps, sections with suspicious entropy levels, sections with zero-length raw sizes, and malware packed in exe files, among other things.

Exe Scan

Another great tool for scanning exe or dll files for strange behavior is EXE Scan. This utility checks the header fields of executables for suspicious entropy levels, sections with zero-length raw sizes, checksum differences, and other irregularities. EXE Scan generates a detailed report and automates these checks, which saves a lot of time.

Obfuscated Strings

Attackers can use simple shifting and masking methods to obfuscate the strings in malicious executable files. Several types of encoding can be used for obfuscation. For example, ROT encoding rotates all the letters (lowercase and uppercase) by a certain number of positions in the alphabet. XOR encoding uses a secret key or passphrase (a constant) to XOR the bytes of a file. ROL encodes the bytes of a file by rotating the bits of each byte to the left by a certain number of positions. There are various tools to extract these obfuscated strings from a given file.
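The three encodings described above are easy to express in code. The following Python sketch is illustrative only; the function names rot, xor, and rol are chosen for this example and do not come from any particular tool.

```python
def rot(text: str, shift: int) -> str:
    """Rotate ASCII letters by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def xor(data: bytes, key: bytes) -> bytes:
    """XOR every byte with the key, repeating the key as needed."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def rol(data: bytes, bits: int) -> bytes:
    """Rotate the bits of each byte left by `bits` positions."""
    bits %= 8
    return bytes(((b << bits) | (b >> (8 - bits))) & 0xFF for b in data)


if __name__ == "__main__":
    print(rot("http://example.com", 13))
    print(xor(b"http://example.com", b"\x5a").hex())
    print(rol(b"http://example.com", 3).hex())
```

All three are trivially reversible: ROT with the complementary shift, XOR with the same key, and ROL with a rotation that completes the 8-bit cycle. That is why brute-forcing them is practical.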

XORsearch

XORsearch is used to look for content in a file that is encoded using the ROT, XOR, or ROL algorithms. It brute-forces all one-byte key values. Because trying every possible longer key would take a lot of time, you must specify the string that you are looking for. Some useful strings that are usually found in malware are “http” (most of the time, URLs are concealed in malware code) and “This program” (Windows executables normally contain the message “This program cannot be run in DOS mode,” so finding it in encoded form points to an embedded executable). After finding a key, all the bytes can be decoded using it. The XORsearch syntax is as follows:

ubuntu@ubuntu:~$ xorsearch -s <file name> <string you are looking for>
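The brute-force idea for one-byte XOR keys can be sketched in Python as follows. This is a simplified illustration of the technique, not XORsearch's actual implementation, and the function name xor_search is an assumption made for the example.

```python
from typing import List, Tuple


def xor_search(data: bytes, needle: bytes) -> List[Tuple[int, int]]:
    """Try every one-byte XOR key and return (key, offset) pairs where needle appears."""
    hits = []
    for key in range(256):
        # Encode the needle with the key instead of decoding the whole file:
        # finding the encoded needle in the raw data is equivalent and cheaper.
        encoded = bytes(b ^ key for b in needle)
        offset = data.find(encoded)
        while offset != -1:
            hits.append((key, offset))
            offset = data.find(encoded, offset + 1)
    return hits


if __name__ == "__main__":
    # Hypothetical sample: a URL hidden with the one-byte key 0x41.
    sample = bytes(b ^ 0x41 for b in b"xx http://example.com xx")
    for key, offset in xor_search(sample, b"http"):
        print(f"key=0x{key:02x} offset={offset}")
```

Once a key is known, XORing the whole file with it reveals every string that was encoded with that key.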

brutexor

Whereas programs like XORsearch and XORStrings need to be told which string to look for, a great tool called brutexor can brute-force any file for encoded strings without specifying a given string. A file can be brute-forced first, with the extracted strings saved to another file. Then, after looking at the extracted strings, one can identify the key and pass it with the -k option to extract all the strings encoded with that particular key. With the -f option, the whole file is processed.

ubuntu@ubuntu:~$ brutexor.py <file> >> <file where you want to copy the strings extracted>
ubuntu@ubuntu:~$ brutexor.py -f -k <string> <file>

Extraction of Artifacts and Valuable Data (Deleted)

To analyze disk images and hard drives and extract artifacts and valuable data from them using various tools like Scalpel, Foremost, etc., one must first create a bit-by-bit image of them so that no data is lost. To create these image copies, there are various tools available.

dd

dd is used to make a forensically sound, bit-by-bit image of a drive. The integrity of the copy can then be checked by comparing the hash of the image with the hash of the original disk drive. The dd tool can be used as follows:

ubuntu@ubuntu:~$ dd if=<src> of=<dest> bs=512
if=Source drive (for example, /dev/sda)
of=Destination location
bs=Block size (the number of bytes to copy at a time)
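The hash comparison mentioned above can be sketched in Python. The helper names sha256_of and image_matches_source are assumptions made for this example, as is the choice of SHA-256; reading a raw device such as /dev/sda normally requires root privileges.

```python
import hashlib


def sha256_of(path: str, block_size: int = 1024 * 1024) -> str:
    """Hash a file or block device in fixed-size chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(block_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches_source(source: str, image: str) -> bool:
    """Return True when the image is a byte-for-byte copy of the source."""
    return sha256_of(source) == sha256_of(image)


if __name__ == "__main__":
    import sys

    # Usage: python verify_image.py <source device or file> <image file>
    print(image_matches_source(sys.argv[1], sys.argv[2]))
```

If the two digests differ, the image should not be treated as a faithful copy of the evidence.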

dcfldd

dcfldd is another tool used for disk imaging. This tool is like an upgraded version of the dd utility. It provides more options than dd, such as hashing at the time of imaging. You can explore dcfldd’s options using the following command:

ubuntu@ubuntu:~$ dcfldd -h
Usage: dcfldd [OPTION]...
  bs=BYTES                 force ibs=BYTES and obs=BYTES
  conv=KEYWORDS            convert the file as per the comma separated keyword list
  count=BLOCKS             copy only BLOCKS input blocks
  ibs=BYTES                read BYTES bytes at a time
  if=FILE                  read from FILE instead of stdin
  obs=BYTES                write BYTES bytes at a time
  of=FILE                  write to FILE instead of stdout
                           NOTE: of=FILE may be used several times to write
                           output to multiple files simultaneously
  of:=COMMAND              exec and write output to process COMMAND
  skip=BLOCKS              skip BLOCKS ibs-sized blocks at start of input
  pattern=HEX              use the specified binary pattern as input
  textpattern=TEXT         use repeating TEXT as input
  errlog=FILE              send error messages to FILE as well as stderr
  hash=NAME                either md5, sha1, sha256, sha384 or sha512
                           default algorithm is md5. To select multiple
                           algorithms to run simultaneously enter the names
                           in a comma separated list
  hashlog=FILE             send MD5 hash output to FILE instead of stderr
                           if you are using multiple hash algorithms you
                           can send each to a separate file using the
                           convention ALGORITHMlog=FILE, for example
                           md5log=FILE1, sha1log=FILE2, etc.
  hashlog:=COMMAND         exec and write hashlog to process COMMAND
                           ALGORITHMlog:=COMMAND also works in the same fashion
  hashconv=[before|after]  perform the hashing before or after the conversions
  hashformat=FORMAT        display each hashwindow according to FORMAT
                           the hash format mini-language is described below
  totalhashformat=FORMAT   display the total hash value according to FORMAT
  status=[on|off]          display a continual status message on stderr
                           default state is "on"
  statusinterval=N         update the status message every N blocks
                           default value is 256
  vf=FILE                  verify that FILE matches the specified input
  verifylog=FILE           send verify results to FILE instead of stderr
  verifylog:=COMMAND       exec and write verify results to process COMMAND
  --help                   display this help and exit
  --version                output version information and exit

Foremost

Foremost is used to recover data from an image file using a technique known as file carving. File carving identifies files by their headers and footers rather than by file system metadata. Foremost's configuration file contains several header and footer definitions, which can be edited by the user. Foremost scans the image for these headers and footers, and whenever it finds a match, it extracts the corresponding data as a recovered file.
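Header/footer carving can be illustrated with a short Python sketch. It is a deliberately naive version of the technique, assuming JPEG markers as the example signature; it is not Foremost's implementation, and real carvers handle fragmentation, nested markers, and many more file types.

```python
from typing import List

JPEG_HEADER = b"\xff\xd8\xff"  # start-of-image marker plus first marker byte
JPEG_FOOTER = b"\xff\xd9"      # end-of-image marker


def carve(data: bytes, header: bytes = JPEG_HEADER, footer: bytes = JPEG_FOOTER,
          max_size: int = 10 * 1024 * 1024) -> List[bytes]:
    """Return every byte run that starts with header and ends at the next footer."""
    carved = []
    start = data.find(header)
    while start != -1:
        # Only look for the footer within max_size bytes of the header.
        end = data.find(footer, start + len(header), start + max_size)
        if end != -1:
            carved.append(data[start:end + len(footer)])
        start = data.find(header, start + 1)
    return carved


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for index, blob in enumerate(carve(handle.read())):
            with open(f"carved_{index}.jpg", "wb") as out:
                out.write(blob)
```

Because the search works on raw bytes, it can recover files whose directory entries have been deleted, as long as the data itself has not been overwritten.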

Scalpel

Scalpel is another tool used for data retrieval and data extraction and is comparatively faster than Foremost. Scalpel reads the raw blocks of the storage area and recovers deleted files from them. Before using this tool, the lines for the desired file types must be uncommented in its configuration file (scalpel.conf) by removing the leading # from each of them. Scalpel is available for both Windows and Linux operating systems and is considered very useful in forensic investigations.

Bulk Extractor

Bulk Extractor is used to extract features, such as email addresses, credit card numbers, URLs, etc. It scans the raw data without relying on the file system, which makes it very fast. Bulk Extractor can also decompress partially corrupted files, and it can retrieve files like JPGs, PDFs, Word documents, etc. Another feature of this tool is that it creates histograms of the features it finds, making it a lot easier for investigators to decide which places or documents to look at.
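The notion of scanning raw bytes for features can be sketched in Python with regular expressions. The patterns below are simplified assumptions made for illustration and are far less thorough than Bulk Extractor's own scanners.

```python
import re
from collections import Counter
from typing import Dict

# Simplified patterns, for illustration only.
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(rb"https?://[^\s\"'<>\x00]+")


def extract_features(data: bytes) -> Dict[str, Counter]:
    """Count e-mail addresses and URLs found anywhere in a raw byte buffer."""
    return {
        "email": Counter(m.decode("ascii", "replace") for m in EMAIL_RE.findall(data)),
        "url": Counter(m.decode("ascii", "replace") for m in URL_RE.findall(data)),
    }


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        features = extract_features(handle.read())
    # Print a small histogram: the most frequent features first.
    for kind, counter in features.items():
        for value, count in counter.most_common(10):
            print(f"{kind}\t{count}\t{value}")
```

Sorting by frequency gives a crude version of the histogram idea: addresses and URLs that occur many times tend to be the most relevant to an investigation.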

Analyzing PDFs

Having a fully patched computer system and the latest antivirus does not necessarily mean that the system is secure. Malicious code can get into the system from anywhere, including PDFs and other malicious documents. A PDF file usually consists of a header, objects, a cross-reference table (used to locate objects), and a trailer. The following keywords deserve attention during analysis:

/OpenAction and /AA (Additional Action): Specify a script or action that runs automatically.
/Names, /AcroForm, and /Action: Can likewise specify and launch scripts or actions.
/JavaScript: Specifies JavaScript to run.
/GoTo*: Changes the view to a predefined destination inside the PDF or in another PDF file.
/Launch: Launches a program or opens a document.
/URI: Accesses a resource by its URL.
/SubmitForm and /GoToR: Can send data to a URL.
/RichMedia: Can be used to embed Flash in a PDF.
/ObjStm: Can hide objects inside an object stream.

Be aware of obfuscation with hex codes, for example, “/JavaScript” versus “/J#61vaScript.” PDF files can be investigated using various tools to determine whether they contain malicious JavaScript or shellcode.
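A keyword count that also sees through the hex-code obfuscation mentioned above can be sketched in Python. This is a rough illustration under simplifying assumptions: it decodes every #xx sequence in the file, including any that happen to occur inside binary streams, and it does not decompress object streams. The function names and the keyword list are choices made for this example.

```python
import re
from typing import Dict

SUSPICIOUS_NAMES = ["/OpenAction", "/AA", "/JavaScript", "/JS", "/Launch",
                    "/URI", "/SubmitForm", "/GoToR", "/RichMedia", "/ObjStm",
                    "/AcroForm", "/Names"]


def decode_name_escapes(raw: bytes) -> bytes:
    """Replace #xx hex escapes, so /J#61vaScript becomes /JavaScript."""
    return re.sub(rb"#([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]), raw)


def count_pdf_names(raw: bytes) -> Dict[str, int]:
    """Count suspicious name keywords after undoing hex-escape obfuscation."""
    decoded = decode_name_escapes(raw)
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # The lookahead keeps a short name from matching the start of a longer one.
        pattern = re.escape(name.encode()) + rb"(?![A-Za-z0-9])"
        counts[name] = len(re.findall(pattern, decoded))
    return counts


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        for name, count in count_pdf_names(handle.read()).items():
            print(f"{name:<14}{count}")
```

A nonzero count is only a reason to look closer with a dedicated PDF analysis tool; many benign PDFs contain some of these keywords.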

pdfid.py

pdfid.py is a Python script used to obtain information about a PDF and its headers. Let us take a quick look at analyzing a PDF using pdfid:

ubuntu@ubuntu:~$ python pdfid.py malicious.pdf
PDFiD 0.2.1 /home/ubuntu/Desktop/malicious.pdf
PDF Header: %PDF-1.7
obj 215
endobj 215
stream 12
endstream 12
xref 2
trailer 2
startxref 2
/Page 1
/Encrypt 0
/ObjStm 2
/JS 0
/JavaScript 2
/AA 0
/OpenAction 0
/AcroForm 0
/JBIG2Decode 0
/RichMedia 0
/Launch 0
/EmbeddedFile 0
/XFA 0
/Colors > 2^24 0

Here, you can see that JavaScript is present inside the PDF file (the /JavaScript count is 2). JavaScript is the element most often used to exploit Adobe Reader.

peepdf

peepdf aims to contain everything needed for PDF file analysis. It lets the investigator encode and decode streams, edit metadata, analyze and emulate shellcode, and inspect malicious JavaScript. peepdf has signatures for many vulnerabilities; when run against a malicious PDF file, it will point out any known vulnerability it recognizes. peepdf is a Python script, and it provides a variety of options for analyzing a PDF. Because it can also create and modify PDFs, it is likewise used by malicious coders to pack a PDF with malicious JavaScript that executes when the file is opened. Shellcode analysis, extraction of malicious content, extraction of old document versions, object modification, and filter modification are just some of this tool’s wide range of capabilities.

ubuntu@ubuntu:~$ python peepdf.py malicious.pdf
File: malicious.pdf
MD5: 5b92c62181d238f4e94d98bd9cf0da8d
SHA1: 3c81d17f8c6fc0d5d18a3a1c110700a9c8076e90
SHA256: 2f2f159d1dc119dcf548a4cb94160f8c51372a9385ee60dc29e77ac9b5f34059
Size: 263069 bytes
Version: 1.7
Binary: True
Linearized: False
Encrypted: False
Updates: 1
Objects: 1038
Streams: 12
URIs: 156
Comments: 0
Errors: 2
Streams (12): [4, 204, 705, 1022, 1023, 1027, 1029, 1031, 1032, 1033, 1036, 1038]
Xref streams (1): [1038]
Object streams (2): [204, 705]
Encoded (11): [4, 204, 705, 1022, 1023, 1027, 1029, 1031, 1032, 1033, 1038]
Objects with URIs (156): [11, 12, 13, 14, 15, 16, 24, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53,
54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73,
74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110,
111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126,
127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142,
143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158,
159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175]

Suspicious elements:
/Names (1): [200]

Cuckoo Sandbox

Sandboxing is used to check the behavior of untested or untrusted programs in a safe, realistic environment. After a file is submitted to Cuckoo Sandbox, the tool will, within a few minutes, reveal the relevant information about the file and its behavior. Malware is a main weapon of attackers, and Cuckoo is one of the strongest tools available for studying it. Nowadays, just knowing that malware has entered a system and removing it is not enough; a good security analyst must analyze the behavior of the program to determine its effect on the operating system, its whole context, and its main targets.

Installation

Cuckoo can be installed on Windows, Mac, or Linux operating systems by downloading this tool through the official website: https://cuckoosandbox.org/

For Cuckoo to work smoothly, one must install a few Python modules and libraries. This can be done using the following commands:

ubuntu@ubuntu:~$ sudo apt-get install python python-pip python-dev mongodb postgresql libpq-dev

To show the program’s behavior on the network, Cuckoo requires a packet sniffer like tcpdump, which can be installed using the following command:

ubuntu@ubuntu:~$ sudo apt-get install tcpdump

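M2Crypto gives Python programs SSL functionality for implementing clients and servers. According to the Cuckoo documentation, it is installed through pip once SWIG is available:

ubuntu@ubuntu:~$ sudo apt-get install swig
ubuntu@ubuntu:~$ sudo pip install m2crypto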

Usage

Cuckoo analyzes a variety of file types, including PDFs, Word documents, executables, etc. With the latest version, even websites can be analyzed using this tool. Cuckoo can also drop network traffic or route it through a VPN. It dumps the network traffic, including SSL-encrypted traffic, so that the traffic can be analyzed again later. PHP scripts, URLs, HTML files, Visual Basic scripts, zip files, dll files, and almost any other type of file can be analyzed using Cuckoo Sandbox.

To use Cuckoo, you must submit a sample and then analyze its effect and behavior.

To submit binary files, use the following command:

# cuckoo submit <binary file path>

To submit a URL, use the following command:

# cuckoo submit --url <http://url.com>

To set a timeout (in seconds) for the analysis, use the following command:

# cuckoo submit --timeout 60 <binary file path>

To set a higher priority for a given binary, use the following command:

# cuckoo submit --priority 5 <binary file path>

To choose the analysis package and pass options to it, use the following syntax:

# cuckoo submit --package exe --options arguments=dosometask <binary file path>

Once the analysis is complete, a number of files can be seen in the directory “CWD/storage/analyses,” in a subdirectory for each analysis, containing the results of the analysis of the samples provided. The files present in this directory include the following:

analysis.log: Contains a log of what happened during the analysis, such as runtime errors, creation of files, etc.
memory.dmp: Contains the full memory dump of the analysis machine.
dump.pcap: Contains the network dump created by tcpdump.
files: Contains every file that the malware worked on or affected.
dump_sorted.pcap: Contains a sorted version of dump.pcap that makes it easier to look up TCP streams.
logs: Contains all created logs.
shots: Contains screenshots of the desktop taken while the malware was running on the Cuckoo system.
tlsmaster.txt: Contains TLS master secrets caught during execution of the malware.

Conclusion

There is a general perception that Linux is virus-free, or that the chance of getting malware on this OS is very small. Yet more than half of web servers are Linux- or Unix-based. With so many Linux systems serving websites and other internet traffic, attackers see a large attack surface for Linux malware. So, even daily use of antivirus engines would not be enough. To defend against malware threats, there are many antivirus and endpoint security solutions available. But for analyzing malware manually, REMnux and Cuckoo Sandbox are among the best available options. REMnux provides a wide range of tools in a lightweight, easy-to-install distribution that serves any forensic investigator well when analyzing malicious files of all types. Some very useful tools have been described here in detail, but they are just the tip of the iceberg of what REMnux has to offer.

To understand the behavior of a suspicious, untrusted, or third-party program, the program must be run in a secure, realistic environment, such as Cuckoo Sandbox, so that it cannot damage the host operating system.

Using network controls and system hardening techniques provides an extra layer of security to the system. The incident response or digital forensics investigation techniques must also be upgraded regularly to overcome malware threats to your system.

About the author

Usama Azad