\documentclass[a4paper,titlepage,12pt]{article} \usepackage{amssymb} \usepackage{pslatex} \usepackage[dvips]{graphicx} \usepackage{wrapfig} \usepackage{url} \date{} \begin{document} \begin{center} \huge Creating signatures for ClamAV\\ \vspace{2cm} \end{center} \noindent \section{Introduction} CVD (ClamAV Virus Database) is a digitally signed container that includes signature databases in various text formats. The header of the container is a 512 bytes long string with colon separated fields: \begin{verbatim} ClamAV-VDB:build time:version:number of signatures:functionality level required:MD5 checksum:digital signature:builder name:build time (sec) \end{verbatim} \verb+sigtool --info+ displays detailed information about a given CVD file: \begin{verbatim} zolw@localhost:/usr/local/share/clamav$ sigtool -i main.cvd File: main.cvd Build time: 09 Dec 2007 15:50 +0000 Version: 45 Signatures: 169676 Functionality level: 21 Builder: sven MD5: b35429d8d5d60368eea9630062f7c75a Digital signature: dxsusO/HWP3/GAA7VuZpxYwVsE9b+tCk+tPN6OyjVF/U8 JVh4vYmW8mZ62ZHYMlM903TMZFg5hZIxcjQB3SX0TapdF1SFNzoWjsyH53eXvMDY eaPVNe2ccXLfEegoda4xU2TezbGfbSEGoU1qolyQYLX674sNA2Ni6l6/CEKYYh Verification OK. \end{verbatim} The ClamAV project distributes two CVD files: \emph{main.cvd} and \emph{daily.cvd}. \section{Signature formats} \subsection{MD5} The easiest way to create signatures for ClamAV is to use MD5 checksums, however this method can be only used against static malware. To create a signature for \verb+test.exe+ use the \verb+--md5+ option of sigtool: \begin{verbatim} zolw@localhost:/tmp/test$ sigtool --md5 test.exe > test.hdb zolw@localhost:/tmp/test$ cat test.hdb 48c4533230e1ae1c118c741c0db19dfb:17387:test.exe \end{verbatim} That's it! The signature is ready to use: \begin{verbatim} zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe test.exe: test.exe FOUND ----------- SCAN SUMMARY ----------- Known viruses: 1 Scanned directories: 0 Engine version: 0.92.1 Scanned files: 1 Infected files: 1 Data scanned: 0.02 MB Time: 0.024 sec (0 m 0 s) \end{verbatim} You can change the name (by default sigtool uses the name of the file) and place it inside a \verb+*.hdb+ file. A single database file can include any number of signatures. To get them automatically loaded each time clamscan/clamd starts just copy the database file(s) into the local virus database directory (eg. /usr/local/share/clamav). \subsection{MD5, PE section based} You can create a MD5 signature for a specific section in a PE file. Such signatures shall be stored inside \verb+.mdb+ files in the following format: \begin{verbatim} PESectionSize:MD5:MalwareName \end{verbatim} The easiest way to generate MD5 based section signatures is to extract target PE sections into separate files and then run sigtool with the option \verb+--mdb+ \subsection{Hexadecimal signatures} ClamAV stores all signatures in a hexadecimal format. By a hex-signature here we mean a fragment of a malware's body converted into a hexadecimal string which can be additionally extended with various wildcards. \subsubsection{Hexadecimal format} You can use \verb+sigtool --hex-dump+ to convert any data into a hex-string: \begin{verbatim} zolw@localhost:/tmp/test$ sigtool --hex-dump How do I look in hex? 486f7720646f2049206c6f6f6b20696e206865783f0a \end{verbatim} \subsubsection{Wildcards} ClamAV supports the following extensions inside hex signatures: \begin{itemize} \item \verb+??+\\ Match any byte. \item \verb+a?+\\ Match a high nibble (the four high bits).\\ \textbf{IMPORTANT NOTE:} The nibble matching is only available in libclamav with the functionality level 17 and higher therefore please only use it with .ndb signatures followed by ":17" (MinEngineFunctionalityLevel, see \ref{ndb}). \item \verb+?a+\\ Match a low nibble (the four low bits). \item \verb+*+\\ Match any number of bytes. \item \verb+{n}+\\ Match $n$ bytes. \item \verb+{-n}+\\ Match $n$ or less bytes. \item \verb+{n-}+\\ Match $n$ or more bytes. \item \verb+{n-m}+\\ Match between $n$ and $m$ bytes ($m > n$). \item \verb+(aa|bb|cc|..)+\\ Match aa or bb or cc.. \item \verb+HEXSIG[x-y]aa+ or \verb+aa[x-y]HEXSIG+\\ Match aa anchored to a hex-signature, see \url{https://wwws.clamav.net/bugzilla/show_bug.cgi?id=776} for a discussion and examples. \end{itemize} The range signatures \verb+*+ and \verb+{}+ virtually separate a hex-signature into two parts, eg. \verb+aabbcc*bbaacc+ is treated as two sub-signatures \verb+aabbcc+ and \verb+bbaacc+ with any number of bytes between them. It's a requirement that each sub-signature includes a block of two static characters somewhere in its body. \subsubsection{Basic signature format} The simplest (and now deprecated) signature format is: \begin{verbatim} MalwareName=HexSignature \end{verbatim} ClamAV will scan the entire file looking for HexSignature. All signatures of this type must be placed inside \verb+*.db+ files. \subsubsection{Extended signature format}\label{ndb} The extended signature format allows for specification of additional information such as a target file type, virus offset or engine version, making the detection more reliable. The format is: \begin{verbatim} MalwareName:TargetType:Offset:HexSignature[:MinEngineFunctionalityLevel:[Max]] \end{verbatim} where \verb+TargetType+ is one of the following numbers specifying the type of the target file: \begin{itemize} \item 0 = any file \item 1 = Portable Executable, both 32- and 64-bit. \item 2 = file inside OLE2 container (e.g. image, embedded executable, VBA script). The OLE2 format is primarily used by MS Office and MSI installation files. \item 3 = HTML (normalized: whitespace transformed to spaces, tags/tag attributes normalized, all lowercase), Javascript is normalized too: all strings are normalized (hex encoding is decoded), numbers are parsed and normalized, local variables/function names are normalized to 'n001' format, argument to eval() is parsed as JS again, unescape() is handled, some simple JS packers are handled, output is whitespace normalized. \item 4 = Mail file \item 5 = Graphics \item 6 = ELF \item 7 = ASCII text file (normalized) \item 8 = Disassembler data \item 9 = Mach-O files \end{itemize} And \verb+Offset+ is an asterisk or a decimal number \verb+n+ possibly combined with a special modifier: \begin{itemize} \item \verb+*+ = any \item \verb+n+ = absolute offset \item \verb+EOF-n+ = end of file minus \verb+n+ bytes \end{itemize} Signatures for PE, ELF and Mach-O files additionally support: \begin{itemize} \item \verb#EP+n# = entry point plus n bytes (\verb#EP+0# for \verb+EP+) \item \verb#EP-n# = entry point minus n bytes \item \verb#Sx+n# = start of section \verb+x+'s (counted from 0) data plus \verb+n+ bytes \item \verb#Sx-n# = start of section \verb+x+'s data minus \verb+n+ bytes \item \verb#SL+n# = start of last section plus \verb+n+ bytes \item \verb#SL-n# = start of last section minus \verb+n+ bytes \end{itemize} All the above offsets except \verb+*+ can be turned into \textbf{floating offsets} and represented as \verb+Offset,MaxShift+ where \verb+MaxShift+ is an unsigned integer. A floating offset will match every offset between \verb+Offset+ and \verb#Offset+MaxShift#, eg. \verb+10,5+ will match all offsets from 10 to 15 and \verb#EP+n,y# will match all offsets from \verb#EP+n# to \verb#EP+n+y#. Versions of ClamAV older than 0.91 will silently ignore the \verb+MaxShift+ extension and only use \verb+Offset+.\\ \noindent All signatures in the extended format must be placed inside \verb+*.ndb+ files. \subsubsection{Logical signatures}\label{ndb} Logical signatures allow combining of multiple signatures in extended format using logical operators. They can provide both more detailed and flexible pattern matching. The logical sigs are stored inside \verb+*.ldb+ files in the following format: \begin{verbatim} SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0; Subsig1;Subsig2;... \end{verbatim} where: \begin{itemize} \item \verb+TargetDescriptionBlock+ provides information about the engine and target file with comma separated \verb+Arg:Val+ pairs, currently (as of 0.95.1) only \verb+Target:X+ and \verb+Engine:X-Y+ are supported. \item \verb+LogicalExpression+ specifies the logical expression describing the relationship between \verb+Subsig0...SubsigN+.\\ \textbf{Basis clause:} 0,1,...,N decimal indexes are SUB-EXPRESSIONS representing \verb+Subsig0, Subsig1,...,SubsigN+ respectively.\\ \textbf{Inductive clause:} if \verb+A+ and \verb+B+ are SUB-EXPRESSIONS and \verb+X, Y+ are decimal numbers then \verb+(A&B)+, \verb+(A|B)+, \verb+A=X+, \verb+A=X,Y+, \verb+A>X+, \verb+A>X,Y+, \verb+AX+: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches (with any of its sigs). \item \verb+A>X,Y+: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches and at least Y different signatures must be matched. \item \verb+A5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;737 46566616e Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;737 46566616e;deadbeef Sig4;Target:1,Engine:18-20;((0|1)&(2|3))&4;EP+123:33c06834f04100 f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573 (63|64)61706528;S+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58d cf43987e4f519d629b103375;SL+550:6300680065005c0046006900 \end{verbatim} \subsection{Signatures based on archive metadata} Signatures based on metadata inside archive files can provide an effective protection against malware that spreads via encrypted zip or rar archives. The format of a metadata signature is: \begin{verbatim} virname:encrypted:filename:normal size:csize:crc32:cmethod:fileno:max depth \end{verbatim} where the corresponding fields are: \begin{itemize} \item Virus name \item Encryption flag (1 -- encrypted, 0 -- not encrypted) \item File name (this is a regular expression - * to ignore) \item Normal (uncompressed) size (* to ignore) \item Compressed size (* to ignore) \item CRC32 (* to ignore) \item Compression method (* to ignore) \item File position in archive (* to ignore) \item Maximum number of nested archives (* to ignore) \end{itemize} The database file should have the extension of \verb+.zmd+ or \verb+.rmd+ for zip or rar metadata respectively. \subsection{Whitelist databases} To whitelist a specific file use the MD5 signature format and place it inside a database file with the extension of \verb+.fp+.\\ \noindent To whitelist a specific signature inside main.cvd add the following entry into daily.ign or a local file local.ign: \begin{verbatim} db_name:line_number:signature_name \end{verbatim} \subsection{Signature names} ClamAV uses the following prefixes for signature names: \begin{itemize} \item \emph{Worm} for Internet worms \item \emph{Trojan} for backdoor programs \item \emph{Adware} for adware \item \emph{Flooder} for flooders \item \emph{HTML} for HTML files \item \emph{Email} for email messages \item \emph{IRC} for IRC trojans \item \emph{JS} for Java Script malware \item \emph{PHP} for PHP malware \item \emph{ASP} for ASP malware \item \emph{VBS} for VBS malware \item \emph{BAT} for BAT malware \item \emph{W97M}, \emph{W2000M} for Word macro viruses \item \emph{X97M}, \emph{X2000M} for Excel macro viruses \item \emph{O97M}, \emph{O2000M} for generic Office macro viruses \item \emph{DoS} for Denial of Service attack software \item \emph{DOS} for old DOS malware \item \emph{Exploit} for popular exploits \item \emph{VirTool} for virus construction kits \item \emph{Dialer} for dialers \item \emph{Joke} for hoaxes \end{itemize} Important rules of the naming convention: \begin{itemize} \item always use a -zippwd suffix in the malware name for signatures of type zmd, \item always use a -rarpwd suffix in the malware name for signatures of type rmd, \item only use alphanumeric characters, dash (-), dot (.), underscores (\_) in malware names, never use space, apostrophe or quote mark. \end{itemize} \section{Special files} \subsection{HTML} ClamAV contains a special HTML normalisation code which helps to detect HTML exploits. Running \verb+sigtool --html-normalise+ on a HTML file should generate the following files: \begin{itemize} \item nocomment.html - the file is normalized, lower-case, with all comments and superflous white space removed \item notags.html - as above but with all HTML tags removed \end{itemize} The code automatically decodes JScript.encode parts and char ref's (e.g. \verb+f+). You need to create a signature against one of the created files. To eliminate potential false positive alerts the target type should be set to 3. \subsection{Text files} Similarly to HTML all ASCII text files get normalized (converted to lower-case, all superflous white space and control characters removed, etc.) before scanning. Use \verb+clamscan --leave-temps+ to obtain a normalized file then create a signature with the target type 7. \subsection{Compressed Portable Executable files} If the file is compressed with UPX, FSG, Petite or other PE packer supported by libclamav, run \verb+clamscan+ with \verb+--debug --leave-temps+. Example output for a FSG compressed file: \begin{verbatim} LibClamAV debug: UPX/FSG/MEW: empty section found - assuming compression LibClamAV debug: FSG: found old EP @119e0 LibClamAV debug: FSG: Unpacked and rebuilt executable saved in /tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c \end{verbatim} Next create a type 1 signature for \verb+/tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c+ \end{document}