How to write YARA rules for improving your security and malware detection
YARA won’t replace antivirus software, but it can help you detect problems more efficiently and allows more customization. Learn how to write YARA rules to improve security and incident response.
In our first article about YARA, we defined what kind of tool it was and in which context it could be used: detecting malware on the network or on endpoints, helping incident response and monitoring, classifying files or even detecting sensitive data leaks. We also showed how to install it. Now it’s time to write rules to get the best out of it.
Use an empty template to start
YARA rules are text files, which follow a very basic, yet powerful, syntax.
YARA rules typically contain three parts, of which only the condition is mandatory:
- The meta part: This part contains general or specific information that is not processed by YARA but helps the user understand what the rule is about.
- The strings part: This part contains all the strings that need to be searched for in files.
- The condition part: This part defines the condition for matching. It can be just matching one or several strings, but it can also be more complex as we will see later in this article.
In my experience, it is best to create an empty template and use it as the starting point for every new rule. This way, you just need to fill in a few fields and add the desired strings and conditions.
rule samplerule
{
    meta:
        author = "Cedric Pernet"
        version = "0.1"
        date = "2021/05/12"
        reference = "any useful reference"
    strings:
    condition:
}
Using this template, you can quickly edit the metadata and the rule name (in our example it is named samplerule). The metadata can be just anything the user wants to put there. As for me, I always use a version number, a date, a reference which could be a malware hash, or a blog report that mentions what I want to detect, and an author field.
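Note that the empty template is only a skeleton: YARA will not compile a rule whose strings or condition section is left empty. If you want a starting point that compiles before any string has been added, one option is sketched below. The rule name and the `false` placeholder condition are illustrative choices of mine, not part of the original template.

```yara
rule template_placeholder
{
    meta:
        author = "Cedric Pernet"
        version = "0.1"
        date = "2021/05/12"
        reference = "any useful reference"
    // The strings section is omitted until there is at least one string to declare.
    condition:
        false // placeholder that never matches; replace it once strings are defined
}
```

The strings section is optional in YARA, so it can simply be added back when the first string is written.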
Now that the metadata is written, let’s start writing out the first rule.
A first rule
YARA rules are a combination of string elements and conditions. The strings can be text strings, hexadecimal strings or regular expressions.
The conditions are Boolean expressions, just like in programming languages. The best-known operators are and, or and not, which YARA expects in lowercase. Relational, arithmetic and bitwise operators can also be used.
Here is a first rule:
rule netcat_detection
{
    meta:
        author = "Cedric Pernet"
        version = "0.1"
        date = "2021/05/12"
        reference = "netcat is a free tool available freely online"
    strings:
        $str1 = "gethostpoop fuxored" // this is very specific to the netcat tool
        $str2 = "nc -l -p port [options]"
    condition:
        $str1 or $str2
}
So let us explain this rule titled netcat_detection.
After our usual metadata, the strings section contains two variables, $str1 and $str2, which of course could be named any way we like. Also, to illustrate how to add comments, the first variable has a comment at the end of its line.
The condition part contains the following condition: It must match either str1 or str2.
This could have been written in a more convenient way:
condition:
any of ($str*)
This is useful when we have a lot of different variables and want to match on any of them.
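For reference, here is the netcat rule rewritten as a complete rule with the wildcard form of the condition. This is a sketch; the rule name netcat_detection_any is mine, and the strings are the same as in the original rule.

```yara
rule netcat_detection_any
{
    meta:
        author = "Cedric Pernet"
        version = "0.1"
        date = "2021/05/12"
        reference = "netcat is a free tool available freely online"
    strings:
        $str1 = "gethostpoop fuxored" // this is very specific to the netcat tool
        $str2 = "nc -l -p port [options]"
    condition:
        // ($str*) expands to every string whose name starts with $str,
        // so this is equivalent here to: $str1 or $str2
        any of ($str*)
}
```

When every string declared in the rule should be considered, the condition can also be written as any of them.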
Running the first rule
Let’s now run our rule, which we saved as a file named rule1.yar. We want to run it against a folder containing several different files, two of them being the 32- and 64-bit versions of the netcat software (Figure A). Our test system is an Ubuntu Linux distribution, but this does not matter, as YARA can be installed easily on Linux, macOS or Windows.
Figure A
As expected, YARA runs and returns the names of all files matching the rule.
Of course, you can put as many YARA rules as you want in a single file, which is more convenient than managing many separate rule files.
Running YARA with the -s option shows the exact strings that matched in those files (Figure B):
Figure B
On a side note, finding tools like netcat somewhere in your corporate network might indeed be worth investigating: That basic tool should not be found on the average user computer, since it allows computers to connect and exchange data on specific ports and might be used by attackers. It might also, of course, be used by IT people or red team staff, hence the investigation to determine why it was found on a machine from the corporate network.
More complex strings
Matching a basic string can be enough for finding files within systems. Yet strings might be encoded differently on different systems or might have been slightly altered by attackers. One simple alteration, for example, is to mix upper and lower case randomly within a string. Luckily, YARA can handle this easily.
In the following YARA strings part, a string will match no matter what case it uses:
strings:
$str1="thisisit" nocase
The condition $str1 will now match with any case used: “ThisIsIt”, “THISISIT”, “thisisit”, “ThIsIsiT”, etc.
If strings are encoded using two bytes per character, the “wide” modifier can be used, and can of course be combined with another one:
strings:
$str1="thisisit" nocase wide
To search for strings in both the ASCII and wide forms, the “ascii” modifier can be used in conjunction with “wide”:
strings:
$str1="thisisit" ascii wide
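To see the modifiers side by side, the following sketch puts them into one complete rule. The rule name and variable names are illustrative; only the nocase, wide and ascii modifiers discussed above are used.

```yara
rule string_modifiers_sketch
{
    strings:
        $plain = "thisisit"                  // exact case, one byte per character
        $anycase = "thisisit" nocase         // any mix of upper and lower case
        $wide_only = "thisisit" wide         // two bytes per character only
        $both = "thisisit" ascii wide        // one-byte and two-byte forms
        $all = "thisisit" ascii wide nocase  // both forms, in any case
    condition:
        any of them
}
```

Keep in mind that wide on its own stops matching the plain one-byte form, which is why ascii has to be added explicitly when both encodings are wanted.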
Hexadecimal strings
Hexadecimal strings can be used easily:
strings:
$str1={ 75 72 65 6C 6E 20 }
$str2={ 75 72 65 6C ?? 20 }
$str3={ 75 72 [2-4] 65 6C }
Here are three different hexadecimal variables. The first one searches for an exact sequence of bytes. The second one uses a wildcard, written as two ? characters, and matches any byte value where the ?? stands.
The third string searches for the first two bytes, then a jump of two to four arbitrary bytes, then the last two bytes. This is very handy when some sequences vary between files but contain a predictable number of variable bytes between two known sequences.
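Put together in a complete rule, the three hexadecimal strings could look like the sketch below. The rule name and variable names are illustrative, and the comments give the ASCII reading of the bytes purely to help follow the patterns.

```yara
rule hex_strings_sketch
{
    strings:
        $exact = { 75 72 65 6C 6E 20 }     // exactly the bytes for "ureln "
        $wildcard = { 75 72 65 6C ?? 20 }  // "urel", then any single byte, then a space
        $jump = { 75 72 [2-4] 65 6C }      // "ur", then 2 to 4 arbitrary bytes, then "el"
    condition:
        any of them
}
```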
Regular expressions
Regular expressions, just like in any programming language, are very useful to detect particular content that can be written in different ways. In YARA, they are defined by using a string that starts and ends with the slash (/) character.
Let’s take an example that makes sense.
In a malware binary, the developer left debug information, in particular the famous PDB string.
It reads:
D:\workspace\Malware_v42\Release\malw.pdb
Now the idea is to create a rule that matches not only this sample but also other versions of the malware, in case the version number changes. Also, we decided to exclude the “D” drive letter from the rule, since the developer could keep the project on another drive.
We came up with the following regular expression (Figure C):
Figure C
For demonstration purposes, we built a file named newmalwareversion.exe which contains three different PDB strings, each with a different version number. Our rule matches them all.
Please note that the backslash (\) characters in our strings have been doubled, because the backslash is a special character that needs to be escaped, as in the C language.
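Since the figure is not reproduced in text form, here is a sketch of what a regular expression with the behavior described above could look like. It is my reconstruction from the description, not necessarily the exact expression shown in Figure C: the rule name, the variable name and the choice of [0-9]+ for the version number are assumptions.

```yara
rule pdb_path_sketch
{
    strings:
        // No drive letter, so the project may sit on any drive.
        // [0-9]+ accepts any version number (v42, v43, v100, ...).
        // Each literal backslash in the path is written as \\ and the dot as \.
        $pdb = /:\\workspace\\Malware_v[0-9]+\\Release\\malw\.pdb/
    condition:
        $pdb
}
```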
More complex conditions
Conditions can be smarter than just matching a single or several strings. You can use conditions to count strings, to specify an offset at which you want to find a string, to match a file size or even use loops.
Here are a few examples, each one a separate condition, with comments for explanation:
condition:
2 of ($str*) // at least two of the strings whose names start with str must match
($str1 or $str2) and ($text1 or $text2) // example of Boolean operators
#a == 4 and #b > 6 // string a needs to be found exactly four times and string b needs to be found strictly more than six times
$str at 100 // string str needs to be located within the file at offset 100
$str in (500..filesize) // string str needs to be located between offset 500 and the end of the file
filesize > 500KB // only files larger than 500KB will be considered
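These building blocks can be combined in a single rule. The sketch below is illustrative only: the rule name, the string contents and the thresholds are invented for the example, and the final for loop shows the kind of loop mentioned above, which the commented examples do not cover.

```yara
rule combined_conditions_sketch
{
    strings:
        $str1 = "first marker"
        $str2 = "second marker"
        $str3 = "third marker"
        $text1 = "config" nocase
    condition:
        filesize > 500KB and             // only consider files larger than 500KB
        2 of ($str*) and                 // at least two of $str1, $str2, $str3
        #text1 >= 3 and                  // $text1 occurs three times or more
        $str1 in (500..filesize) and     // $str1 appears at offset 500 or later
        for any i in (1..#text1) : ( @text1[i] > 1000 ) // some $text1 hit lies beyond offset 1000
}
```

Here #text1 is the number of occurrences of $text1 and @text1[i] is the offset of its i-th occurrence.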
Conclusion
This article shows the most basic capabilities of YARA. We could not document everything, of course, since YARA is really a kind of programming language in its own right and offers a very wide range of ways to match files. The more comfortable analysts get with YARA, the better their feel for it and the more efficient the rules they write.
Since the language is so easy to write and use, it is more a matter of knowing what one really wants to detect. In recent years, it has become increasingly common to see security researchers publish YARA rules in the appendices of their research papers and blog posts, in order to help everyone match malicious content on their computers or servers. YARA rules also make it possible to match content that is not malicious but needs to be carefully monitored, such as internal documents, which makes YARA a data loss detection tool as well as a malicious content detector. Do not hesitate to consult the YARA documentation to see all the possibilities the tool offers.