 \documentclass[a4paper,titlepage,12pt]{article}
 \usepackage{amssymb}
 \usepackage{pslatex}
 \usepackage[dvips]{graphicx}
 \usepackage{wrapfig}
 \usepackage{url}
 \date{}
 
 \begin{document}
 
     \begin{center}
 	\huge Creating signatures for ClamAV\\
 	\vspace{2cm}
     \end{center}
 
     \noindent
     \section{Introduction}
     CVD (ClamAV Virus Database) is a digitally signed container that
     includes signature databases in various text formats. The header
     of the container is a 512-byte string with colon-separated fields:
     \begin{verbatim}
 ClamAV-VDB:build time:version:number of signatures:functionality
 level required:MD5 checksum:digital signature:builder name:build
 time (sec)
     \end{verbatim}
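
     As a sketch of this layout, the Python function below splits a header
     into the nine fields listed above. It assumes that the header is ASCII
     text padded at the end with spaces or NUL bytes, and that no field value
     contains a colon (a value such as a time written as 15:50 would break
     this simple split). \verb+parse_cvd_header+ and \verb+read_cvd_header+
     are hypothetical helpers, not part of ClamAV.
     \begin{verbatim}
FIELDS = [
    "magic", "build_time", "version", "signatures", "functionality_level",
    "md5", "digital_signature", "builder", "build_time_sec",
]

def parse_cvd_header(header: bytes) -> dict:
    """Split a 512-byte CVD header into its colon-separated fields."""
    if len(header) != 512:
        raise ValueError("CVD header must be exactly 512 bytes")
    # Assumption: trailing padding consists of spaces and/or NUL bytes.
    text = header.decode("ascii").rstrip(" \x00")
    values = text.split(":")
    if len(values) != len(FIELDS) or values[0] != "ClamAV-VDB":
        raise ValueError("not a CVD header (or a field contains ':')")
    info = dict(zip(FIELDS, values))
    # Numeric fields are converted to integers.
    for key in ("version", "signatures", "functionality_level", "build_time_sec"):
        info[key] = int(info[key])
    return info

def read_cvd_header(path: str) -> dict:
    """Read and parse the header of a .cvd file on disk."""
    with open(path, "rb") as f:
        return parse_cvd_header(f.read(512))
     \end{verbatim}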
     \verb+sigtool --info+ (short form \verb+-i+) displays detailed information about a given CVD file:
     \begin{verbatim}
 zolw@localhost:/usr/local/share/clamav$ sigtool -i main.cvd
 File: main.cvd
 Build time: 09 Dec 2007 15:50 +0000
 Version: 45
 Signatures: 169676
 Functionality level: 21
 Builder: sven
 MD5: b35429d8d5d60368eea9630062f7c75a
 Digital signature: dxsusO/HWP3/GAA7VuZpxYwVsE9b+tCk+tPN6OyjVF/U8
 JVh4vYmW8mZ62ZHYMlM903TMZFg5hZIxcjQB3SX0TapdF1SFNzoWjsyH53eXvMDY
 eaPVNe2ccXLfEegoda4xU2TezbGfbSEGoU1qolyQYLX674sNA2Ni6l6/CEKYYh
 Verification OK.
     \end{verbatim}
     The ClamAV project distributes a number of CVD files, including
     \emph{main.cvd} and \emph{daily.cvd}.
 
     \section{Debug information from libclamav}
     To create efficient signatures for ClamAV, it is important
     to understand how the engine handles input files. The best way
     to see how it works is to look at the debug information from
     libclamav. You can obtain it by running \verb+clamscan+ with the
     \verb+--debug+ and \verb+--leave-temps+ flags. The first flag
     makes clamscan display all the interesting information from
     libclamav, and the second one keeps temporary files from being
     deleted so they can be analyzed further. The relevant part of the
     output is:
     \begin{verbatim}
 $ clamscan --debug --leave-temps attachment.exe
 [...]
 LibClamAV debug: Recognized MS-EXE/DLL file
 LibClamAV debug: Matched signature for file type PE
 LibClamAV debug: File type: Executable
     \end{verbatim}
     The engine recognized a Windows executable.
     \begin{verbatim}
 LibClamAV debug: Machine type: 80386
 LibClamAV debug: NumberOfSections: 3
 LibClamAV debug: TimeDateStamp: Fri Jan 10 04:57:55 2003
 LibClamAV debug: SizeOfOptionalHeader: e0
 LibClamAV debug: File format: PE
 LibClamAV debug: MajorLinkerVersion: 6
 LibClamAV debug: MinorLinkerVersion: 0
 LibClamAV debug: SizeOfCode: 0x9000
 LibClamAV debug: SizeOfInitializedData: 0x1000
 LibClamAV debug: SizeOfUninitializedData: 0x1e000
 LibClamAV debug: AddressOfEntryPoint: 0x27070
 LibClamAV debug: BaseOfCode: 0x1f000
 LibClamAV debug: SectionAlignment: 0x1000
 LibClamAV debug: FileAlignment: 0x200
 LibClamAV debug: MajorSubsystemVersion: 4
 LibClamAV debug: MinorSubsystemVersion: 0
 LibClamAV debug: SizeOfImage: 0x29000
 LibClamAV debug: SizeOfHeaders: 0x400
 LibClamAV debug: NumberOfRvaAndSizes: 16
 LibClamAV debug: Subsystem: Win32 GUI
 LibClamAV debug: ------------------------------------
 LibClamAV debug: Section 0
 LibClamAV debug: Section name: UPX0
 LibClamAV debug: Section data (from headers - in memory)
 LibClamAV debug: VirtualSize: 0x1e000 0x1e000
 LibClamAV debug: VirtualAddress: 0x1000 0x1000
 LibClamAV debug: SizeOfRawData: 0x0 0x0
 LibClamAV debug: PointerToRawData: 0x400 0x400
 LibClamAV debug: Section's memory is executable
 LibClamAV debug: Section's memory is writeable
 LibClamAV debug: ------------------------------------
 LibClamAV debug: Section 1
 LibClamAV debug: Section name: UPX1
 LibClamAV debug: Section data (from headers - in memory)
 LibClamAV debug: VirtualSize: 0x9000 0x9000
 LibClamAV debug: VirtualAddress: 0x1f000 0x1f000
 LibClamAV debug: SizeOfRawData: 0x8200 0x8200
 LibClamAV debug: PointerToRawData: 0x400 0x400
 LibClamAV debug: Section's memory is executable
 LibClamAV debug: Section's memory is writeable
 LibClamAV debug: ------------------------------------
 LibClamAV debug: Section 2
 LibClamAV debug: Section name: UPX2
 LibClamAV debug: Section data (from headers - in memory)
 LibClamAV debug: VirtualSize: 0x1000 0x1000
 LibClamAV debug: VirtualAddress: 0x28000 0x28000
 LibClamAV debug: SizeOfRawData: 0x200 0x1ff
 LibClamAV debug: PointerToRawData: 0x8600 0x8600
 LibClamAV debug: Section's memory is writeable
 LibClamAV debug: ------------------------------------
 LibClamAV debug: EntryPoint offset: 0x8470 (33904)
     \end{verbatim}
     The section structure displayed above suggests the executable is
     packed with UPX.
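
     The section table in such a log can also be inspected with a small
     script. The Python sketch below (the helper names are hypothetical)
     collects \verb+Section name+, \verb+VirtualSize+ and \verb+SizeOfRawData+
     from saved \verb+--debug+ output, using only the first of the two logged
     values. It then lists sections that occupy memory but have no raw data
     in the file, like \verb+UPX0+ above. It is a reading aid for the log,
     not the test that libclamav itself applies.
     \begin{verbatim}
import re

# Matches the per-section lines printed by libclamav in --debug mode.
LINE = re.compile(
    r"LibClamAV debug: (Section name|VirtualSize|SizeOfRawData): (\S+)")

def pe_sections(debug_text: str) -> list:
    """Collect name, VirtualSize and SizeOfRawData for each logged section."""
    sections = []
    for key, value in LINE.findall(debug_text):
        if key == "Section name":
            sections.append({"name": value})
        elif sections:
            # Only the first of the two logged values is used here.
            sections[-1][key] = int(value, 16)
    return sections

def empty_sections(debug_text: str) -> list:
    """Names of sections with a virtual size but no raw data in the file."""
    return [s["name"] for s in pe_sections(debug_text)
            if s.get("SizeOfRawData") == 0 and s.get("VirtualSize", 0) > 0]
     \end{verbatim}
     For example, \verb|empty_sections(open("debug.log").read())| would return
     \verb+['UPX0']+ for the first part of the log above, assuming the debug
     output was saved to a (hypothetical) file \verb+debug.log+.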
     \begin{verbatim}
 LibClamAV debug: ------------------------------------
 LibClamAV debug: EntryPoint offset: 0x8470 (33904)
 LibClamAV debug: UPX/FSG/MEW: empty section found - assuming
                  compression
 LibClamAV debug: UPX: bad magic - scanning for imports
 LibClamAV debug: UPX: PE structure rebuilt from compressed file
 LibClamAV debug: UPX: Successfully decompressed with NRV2B
 LibClamAV debug: UPX/FSG: Decompressed data saved in
                  /tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede
 LibClamAV debug: ***** Scanning decompressed file *****
 LibClamAV debug: Recognized MS-EXE/DLL file
 LibClamAV debug: Matched signature for file type PE
     \end{verbatim}
     Indeed, libclamav recognizes the UPX data and saves the decompressed
     (and rebuilt) executable into \verb+/tmp/clamav-90d2d25c9dca42bae6fa9a764a4bcede+.
     Then it continues by scanning this new file:
     \begin{verbatim}
 LibClamAV debug: File type: Executable
 LibClamAV debug: Machine type: 80386
 LibClamAV debug: NumberOfSections: 3
 LibClamAV debug: TimeDateStamp: Thu Jan 27 11:43:15 2011
 LibClamAV debug: SizeOfOptionalHeader: e0
 LibClamAV debug: File format: PE
 LibClamAV debug: MajorLinkerVersion: 6
 LibClamAV debug: MinorLinkerVersion: 0
 LibClamAV debug: SizeOfCode: 0xc000
 LibClamAV debug: SizeOfInitializedData: 0x19000
 LibClamAV debug: SizeOfUninitializedData: 0x0
 LibClamAV debug: AddressOfEntryPoint: 0x7b9f
 LibClamAV debug: BaseOfCode: 0x1000
 LibClamAV debug: SectionAlignment: 0x1000
 LibClamAV debug: FileAlignment: 0x1000
 LibClamAV debug: MajorSubsystemVersion: 4
 LibClamAV debug: MinorSubsystemVersion: 0
 LibClamAV debug: SizeOfImage: 0x26000
 LibClamAV debug: SizeOfHeaders: 0x1000
 LibClamAV debug: NumberOfRvaAndSizes: 16
 LibClamAV debug: Subsystem: Win32 GUI
 LibClamAV debug: ------------------------------------
 LibClamAV debug: Section 0
 LibClamAV debug: Section name: .text
 LibClamAV debug: Section data (from headers - in memory)
 LibClamAV debug: VirtualSize: 0xc000 0xc000
 LibClamAV debug: VirtualAddress: 0x1000 0x1000
 LibClamAV debug: SizeOfRawData: 0xc000 0xc000
 LibClamAV debug: PointerToRawData: 0x1000 0x1000
 LibClamAV debug: Section contains executable code
 LibClamAV debug: Section's memory is executable
 LibClamAV debug: ------------------------------------
 LibClamAV debug: Section 1
 LibClamAV debug: Section name: .rdata
 LibClamAV debug: Section data (from headers - in memory)
 LibClamAV debug: VirtualSize: 0x2000 0x2000
 LibClamAV debug: VirtualAddress: 0xd000 0xd000
 LibClamAV debug: SizeOfRawData: 0x2000 0x2000
 LibClamAV debug: PointerToRawData: 0xd000 0xd000
 LibClamAV debug: ------------------------------------
 LibClamAV debug: Section 2
 LibClamAV debug: Section name: .data
 LibClamAV debug: Section data (from headers - in memory)
 LibClamAV debug: VirtualSize: 0x17000 0x17000
 LibClamAV debug: VirtualAddress: 0xf000 0xf000
 LibClamAV debug: SizeOfRawData: 0x17000 0x17000
 LibClamAV debug: PointerToRawData: 0xf000 0xf000
 LibClamAV debug: Section's memory is writeable
 LibClamAV debug: ------------------------------------
 LibClamAV debug: EntryPoint offset: 0x7b9f (31647)
 LibClamAV debug: Bytecode executing hook id 257 (0 hooks)
 attachment.exe: OK
 [...]
     \end{verbatim}
     libclamav creates no further files while scanning the decompressed
     executable. Writing a signature for the decompressed file gives you a
     better chance that the engine will detect the target data when it is
     compressed with a different packer.
 
     This method should be applied to all files for which you want
     to create signatures. By analyzing the debug information, you
     can quickly see how the engine recognizes and preprocesses
     the data and what additional files get created. Signatures
     created for bottom-level temporary files are usually more
     generic and should help detect the same malware in
     different forms.
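
     When the debug output is long, the temporary files worth examining can
     be collected automatically. The Python sketch below lists every distinct
     temporary file path mentioned in saved \verb+--debug+ output. The path
     pattern assumes the \verb+/tmp/clamav-+ prefix followed by hexadecimal
     digits seen in the log above; it may need adjusting for another
     temporary directory or ClamAV version. \verb+temp_files+ is a
     hypothetical helper.
     \begin{verbatim}
import re

# Assumption: temporary files are /tmp/clamav-<hex digits>, as in the log above.
TEMP_PATH = re.compile(r"/tmp/clamav-[0-9a-f]+")

def temp_files(debug_text: str) -> list:
    """Return distinct temporary file paths in order of first appearance."""
    seen = []
    for path in TEMP_PATH.findall(debug_text):
        if path not in seen:
            seen.append(path)
    return seen
     \end{verbatim}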
 
     \section{Signature formats}
 
     \subsection{Hash-based signatures}
     The easiest way to create signatures for ClamAV is to use file hash checksums;
     however, this method can only be used against static malware.
     \subsubsection{MD5 hash-based signatures}
     To create an MD5 signature for \verb+test.exe+, use the \verb+--md5+ option of sigtool:
     \begin{verbatim}
 zolw@localhost:/tmp/test$ sigtool --md5 test.exe > test.hdb
 zolw@localhost:/tmp/test$ cat test.hdb
 48c4533230e1ae1c118c741c0db19dfb:17387:test.exe
     \end{verbatim}
     That's it! The signature is ready for use:
     \begin{verbatim}
 zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe
 test.exe: test.exe FOUND
 
 ----------- SCAN SUMMARY -----------
 Known viruses: 1
 Scanned directories: 0
 Engine version: 0.92.1
 Scanned files: 1
 Infected files: 1
 Data scanned: 0.02 MB
 Time: 0.024 sec (0 m 0 s)
     \end{verbatim}
     You can change the signature name (by default, sigtool uses the name of the file)
     and place the signature inside a \verb+*.hdb+ file. A single database file can
     include any number of signatures. To have them loaded automatically
     each time clamscan/clamd starts, copy the database file(s) into
     the local virus database directory (e.g. \verb+/usr/local/share/clamav+).
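
     The line printed by \verb+sigtool --md5+ consists of the MD5 checksum,
     the file size in bytes and the signature name. Assuming exactly that
     layout, the same line can be produced with Python's standard
     \verb+hashlib+ module, which is convenient when many signatures have to
     be generated at once. \verb+hash_line+ and \verb+hdb_line+ are
     hypothetical helpers, and their output should still be checked with
     clamscan as shown above.
     \begin{verbatim}
import hashlib
import os
from typing import Optional

def hash_line(data: bytes, name: str) -> str:
    """Format an MD5:FileSize:MalwareName line for the given bytes."""
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}:{name}"

def hdb_line(path: str, name: Optional[str] = None) -> str:
    """Build the .hdb line for a file; like sigtool, default to its file name."""
    with open(path, "rb") as f:
        data = f.read()
    return hash_line(data, name or os.path.basename(path))
     \end{verbatim}
     For example, \verb|hdb_line("test.exe", "Test.Malware")| returns a line
     in the same format as \verb+test.hdb+ above, with the name replaced.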
 
     \emph{Hash-based signatures should not be used for text files,
     HTML or any other data that is internally preprocessed before
     pattern matching. If you really want to use a hash signature in
     such a case, run clamscan with the \texttt{-{}-debug} and
     \texttt{-{}-leave-temps} flags as described above and create a
     signature for a preprocessed file left in \texttt{/tmp}. Please keep
     in mind that a hash signature stops matching as soon as a single
     byte changes in the target file.}
 
     \subsubsection{SHA1 and SHA256 hash-based signatures}
     ClamAV 0.98 also added support for SHA1 and SHA256 file checksums.
     The format is the same as for MD5 file checksums.
     ClamAV distinguishes between them by the length of the hash string
     in the signature. For best backwards compatibility, these should be
     placed inside a \verb+*.hsb+ file. The format is:
     \begin{verbatim}
 HashString:FileSize:MalwareName
     \end{verbatim}
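
     Since the hash type is recognized by length (32 hexadecimal characters
     for MD5, 40 for SHA1 and 64 for SHA256), generating a \verb+*.hsb+ line
     only requires choosing the digest. The following is a minimal Python
     sketch using \verb+hashlib+; the function names are hypothetical.
     \begin{verbatim}
import hashlib

# Length of the hexadecimal digest produced by each supported algorithm.
HASH_BY_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}

def hash_type(hash_string: str) -> str:
    """Infer the hash algorithm from the length of a hex hash string."""
    try:
        return HASH_BY_LENGTH[len(hash_string)]
    except KeyError:
        raise ValueError(f"unexpected hash length {len(hash_string)}") from None

def hsb_line(data: bytes, name: str, algorithm: str = "sha256") -> str:
    """Build a HashString:FileSize:MalwareName line (sha1 or sha256)."""
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"{digest}:{len(data)}:{name}"
     \end{verbatim}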
 
     \subsubsection{PE section based hash signatures}
     You can create a hash signature for a specific section in a PE file.
     Such signatures must be stored inside \verb+.mdb+ files in the
     following format:
     \begin{verbatim}
 PESectionSize:PESectionHash:MalwareName
     \end{verbatim}
     The easiest way to generate MD5-based section signatures is to extract
     the target PE sections into separate files and then run sigtool with the
     \verb+--mdb+ option.
 
     ClamAV 0.98 also added support for SHA1 and SHA256 section-based
     signatures. The format is the same as for MD5 PE section-based signatures.
     ClamAV distinguishes between them by the length of the hash string
     in the signature. For best backwards compatibility, these should be
     placed inside a \verb+*.msb+ file.
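
     Instead of extracting sections by hand, the raw data of each section can
     be located through the section table. The Python sketch below walks the
     PE headers with the standard \verb+struct+ module and returns one
     \verb+PESectionSize:PESectionHash:MalwareName+ line per section. It
     assumes that size and hash are taken over the section's raw data as
     stored in the file (\verb+SizeOfRawData+ bytes at
     \verb+PointerToRawData+), which is what extracting the section into a
     file would give. ClamAV may treat some sections differently, so compare
     the output with \verb+sigtool --mdb+ before relying on it.
     \verb+section_sigs+ is a hypothetical helper; pass \verb+"sha256"+ as
     \verb+algorithm+ for \verb+*.msb+ lines.
     \begin{verbatim}
import hashlib
import struct

def section_sigs(pe: bytes, malware_name: str, algorithm: str = "md5") -> list:
    """Return (section name, 'Size:Hash:MalwareName') for every PE section."""
    if pe[:2] != b"MZ":
        raise ValueError("missing MZ header")
    (pe_offset,) = struct.unpack_from("<I", pe, 0x3C)      # e_lfanew
    if pe[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # COFF header: NumberOfSections at +2, SizeOfOptionalHeader at +16.
    (num_sections,) = struct.unpack_from("<H", pe, coff + 2)
    (optional_size,) = struct.unpack_from("<H", pe, coff + 16)
    table = coff + 20 + optional_size                       # section table
    result = []
    for i in range(num_sections):
        # Each 40-byte header starts with Name[8], VirtualSize,
        # VirtualAddress, SizeOfRawData and PointerToRawData.
        name, _vsize, _vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", pe, table + 40 * i)
        raw = pe[raw_ptr:raw_ptr + raw_size]
        digest = hashlib.new(algorithm, raw).hexdigest()
        label = name.rstrip(b"\x00").decode("ascii", "replace")
        result.append((label, f"{len(raw)}:{digest}:{malware_name}"))
    return result
     \end{verbatim}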
 
     \subsubsection{Hash signatures with unknown size}
     ClamAV 0.98 also added support for hash signatures where the hash is
     known but the size is not. It is much more performance-efficient to
     use signatures with specific sizes, so be cautious when using this
     feature. For these cases, the \verb+*+ character can be used in the size
     field. To ensure proper backwards compatibility with older versions of
     ClamAV, these signatures must have a minimum functionality level of 73 or
     higher. Signatures that use the wildcard size without this level set
     are rejected as malformed.
     \begin{verbatim}
 Sample .hsb signature matching any size
 HashString:*:MalwareName:73
 
 Sample .msb signature matching any size
 *:PESectionHash:MalwareName:73
     \end{verbatim}
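
     The rule above can be applied as a validation step before a database is
     published. The Python sketch below checks a \verb+.hsb+ line (hash
     first) or a \verb+.msb+ line (size first). It rejects a wildcard size
     unless the optional fourth field sets a functionality level of at least
     73. The function name and its \verb+size_first+ parameter are
     hypothetical.
     \begin{verbatim}
def check_hash_sig(line: str, size_first: bool = False) -> dict:
    """Validate an .hsb line, or an .msb line when size_first is True."""
    fields = line.strip().split(":")
    if len(fields) not in (3, 4):
        raise ValueError("expected 3 or 4 colon-separated fields")
    if size_first:
        size, hash_string = fields[0], fields[1]
    else:
        hash_string, size = fields[0], fields[1]
    min_fl = int(fields[3]) if len(fields) == 4 else None
    if size == "*":
        # A wildcard size is only accepted with functionality level >= 73.
        if min_fl is None or min_fl < 73:
            raise ValueError("wildcard size requires functionality level >= 73")
        size_value = None
    else:
        size_value = int(size)
    return {"hash": hash_string.lower(), "size": size_value,
            "name": fields[2], "min_fl": min_fl}
     \end{verbatim}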
 
     \subsection{Body-based signatures}
     ClamAV stores all body-based signatures in a hexadecimal format. In this
     section, a hex-signature means a fragment of a malware's body converted
     into a hexadecimal string, which can additionally be extended using various
     wildcards.
 
     \subsubsection{Hexadecimal format}
     You can use \verb+sigtool --hex-dump+ to convert any data into a hex-string:
     \begin{verbatim}
 zolw@localhost:/tmp/test$ sigtool --hex-dump
 How do I look in hex?
 486f7720646f2049206c6f6f6b20696e206865783f0a
     \end{verbatim}
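
     The output above is simply each input byte written as two lowercase
     hexadecimal digits, including the trailing newline typed at the prompt.
     Assuming that is all \verb+sigtool --hex-dump+ does (the output shown is
     consistent with it), the same string can be produced and decoded back
     with Python's built-in \verb+bytes.hex()+ and \verb+bytes.fromhex()+.
     The wrapper names are hypothetical.
     \begin{verbatim}
def to_hex_signature(data: bytes) -> str:
    """Encode bytes as a lowercase hex string, two digits per byte."""
    return data.hex()

def from_hex_signature(hex_string: str) -> bytes:
    """Decode a static hex-signature (no wildcards) back into bytes."""
    return bytes.fromhex(hex_string)

# The trailing newline comes from the line typed into sigtool above.
print(to_hex_signature(b"How do I look in hex?\n"))
# prints 486f7720646f2049206c6f6f6b20696e206865783f0a
     \end{verbatim}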
 
     \subsubsection{Wildcards}
     ClamAV supports the following wildcards for hex-signatures:
     \begin{itemize}
 	\item \verb+??+\\
 	Match any byte.
 	\item \verb+a?+\\
 	Match a byte whose high nibble (the four high bits) is \verb+a+; the low
 	nibble can be anything.\\ \textbf{IMPORTANT NOTE:}
 	Nibble matching is only available in libclamav with
 	functionality level 17 and higher; therefore, please only use it with
 	.ndb signatures followed by \verb+:17+ (MinEngineFunctionalityLevel,
 	see \ref{ndb}).
 	\item \verb+?a+\\
 	Match a byte whose low nibble (the four low bits) is \verb+a+; the high
 	nibble can be anything.
 	\item \verb+*+\\
 	Match any number of bytes.
 	\item \verb+{n}+\\
 	Match $n$ bytes.
 	\item \verb+{-n}+\\
 	Match $n$ or fewer bytes.
 	\item \verb+{n-}+\\
 	Match $n$ or more bytes.
 	\item \verb+{n-m}+\\
 	Match between $n$ and $m$ bytes ($m > n$).
 	\item \verb+HEXSIG[x-y]aa+ or \verb+aa[x-y]HEXSIG+\\
 	Match aa anchored to a hex-signature, see
 	\url{https://bugzilla.clamav.net/show_bug.cgi?id=776} for
 	discussion and examples.
     \end{itemize}
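
     ClamAV matches hex-signatures with its own pattern matchers rather than
     with regular expressions. Still, the wildcard semantics listed above can
     be illustrated by translating a hex-signature into an equivalent Python
     \verb+re+ bytes pattern. The sketch below handles only \verb+??+,
     \verb+a?+, \verb+?a+, \verb+*+ and the four \verb+{}+ forms. Anchored
     bytes (\verb+HEXSIG[x-y]aa+), character classes and alternates are
     rejected. \verb+hexsig_to_regex+ is a hypothetical name.
     \begin{verbatim}
import re

HEX_DIGITS = "0123456789abcdef"

def _byte_class(values) -> bytes:
    """Build a bytes-regex character class matching the given byte values."""
    return b"[" + b"".join(re.escape(bytes([v])) for v in values) + b"]"

def hexsig_to_regex(sig: str):
    """Translate a hex-signature using only basic wildcards into a bytes regex."""
    sig = sig.lower()
    out = []
    i = 0
    while i < len(sig):
        if sig[i] == "*":                          # any number of bytes
            out.append(b".*?")
            i += 1
        elif sig[i] == "{":                        # {n}, {-n}, {n-} or {n-m}
            end = sig.index("}", i)
            low, dash, high = sig[i + 1:end].partition("-")
            if not dash:
                out.append(b".{%d}" % int(low))
            elif not low:
                out.append(b".{0,%d}" % int(high))
            elif not high:
                out.append(b".{%d,}" % int(low))
            else:
                out.append(b".{%d,%d}" % (int(low), int(high)))
            i = end + 1
        else:
            pair = sig[i:i + 2]
            if len(pair) != 2 or any(c not in HEX_DIGITS + "?" for c in pair):
                raise ValueError(f"unsupported token at position {i}: {sig[i:]!r}")
            if pair == "??":                       # any byte
                out.append(b".")
            elif pair[1] == "?":                   # a?: fixed high nibble
                base = HEX_DIGITS.index(pair[0]) << 4
                out.append(_byte_class(range(base, base + 16)))
            elif pair[0] == "?":                   # ?a: fixed low nibble
                out.append(_byte_class(range(HEX_DIGITS.index(pair[1]), 256, 16)))
            else:                                  # static byte
                out.append(re.escape(bytes([int(pair, 16)])))
            i += 2
    # DOTALL lets '.' match every byte value, including the newline byte.
    return re.compile(b"".join(out), re.DOTALL)
     \end{verbatim}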
     The range wildcards \verb+*+ and \verb+{}+ virtually split
     a hex-signature into two parts; e.g. \verb+aabbcc*bbaacc+ is treated
     as two sub-signatures \verb+aabbcc+ and \verb+bbaacc+ with any number
     of bytes between them. Each sub-signature is required to include
     a block of two static characters somewhere in its body.
     There is one exception to this restriction: when the range wildcard
     has the form \verb+{n}+ with \verb+n<128+, ClamAV applies an
     optimization and translates \verb+{n}+ into a string of \verb+n+
     \verb+??+ character wildcards. Character wildcards do not split
     hex-signatures into parts, so the two-static-character
     requirement does not apply.
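
     The two-static-character rule and its \verb+{n}+ exception can be checked
     before a signature is submitted. The following Python sketch is an
     interpretation, not ClamAV's own validator. It reads ``two static
     characters'' as two consecutive fully static bytes, which is an
     assumption, and expands \verb+{n}+ with \verb+n<128+ into \verb+??+
     wildcards as described above. It then splits the rest at \verb+*+ and
     \verb+{}+ ranges and reports for each sub-signature whether it contains
     such a block. All names are hypothetical.
     \begin{verbatim}
import re

# Range wildcards that split a hex-signature: '*' and any {...} block.
RANGE = re.compile(r"\*|\{[^}]*\}")

def expand_short_fixed_ranges(sig: str) -> str:
    """Replace {n} with n '??' wildcards when n < 128."""
    def repl(match):
        n = int(match.group(1))
        return "??" * n if n < 128 else match.group(0)
    return re.sub(r"\{(\d+)\}", repl, sig)

def has_static_pair(part: str) -> bool:
    """True if the part contains two consecutive bytes without '?' nibbles."""
    run = 0
    for i in range(0, len(part), 2):
        run = run + 1 if "?" not in part[i:i + 2] else 0
        if run >= 2:
            return True
    return False

def check_subsignatures(sig: str) -> list:
    """Split a hex-signature at range wildcards and check every part."""
    parts = RANGE.split(expand_short_fixed_ranges(sig.lower()))
    return [(part, has_static_pair(part)) for part in parts]
     \end{verbatim}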
 
     \subsubsection{Character classes}
     ClamAV supports the following character classes for hex-signatures:
     \begin{itemize}
 	\item \verb+(B)+\\
 	Match word boundary (including file boundaries).
 	\item \verb+(L)+\\
 	Match CR, CRLF or file boundaries.
 	\item \verb+(W)+\\
 	Match a non-alphanumeric character.
     \end{itemize}
 
     \subsubsection{Alternate strings}
     \begin{itemize}
     \item Single-byte alternates (clamav-0.96)\\
       \verb+(aa|bb|cc|...)+ or \verb+!(aa|bb|cc|...)+\\
       Match a member from a set of bytes [aa, bb, cc, ...].
       \begin{itemize}
         \item Negation can be applied to match any non-member, assumed to be one byte in length.
         \item Signature modifiers and wildcards cannot be applied.
       \end{itemize}
     \item Multi-byte fixed length alternates\\
       \verb+(aaaa|bbbb|cccc|...)+ or \verb+!(aaaa|bbbb|cccc|...)+\\
       Match a member from a set of multi-byte alternates [aaaa, bbbb, cccc, ...], each $n$ bytes long.
       \begin{itemize}
         \item All set members must be the same length.
         \item Negation can be applied to match any non-member, assumed to be $n$ bytes in length (clamav-0.98.2).
         \item Signature modifiers and wildcards cannot be applied.
       \end{itemize}
     \item Generic alternates (clamav-0.99)\\
       \verb+(alt1|alt2|alt3|...)+\\
       Match a member from a set of alternates [alt1, alt2, alt3, ...] that can be of variable length.
       \begin{itemize}
         \item Negation operation cannot be applied.
         \item Signature modifiers and nibble wildcards [\verb+??, a?, ?a+] can be applied.
         \item Ranged wildcards [\verb+{n-m}+] are limited to a fixed range of less than 128 bytes [\verb+{1} -> {127}+].
       \end{itemize}
     \end{itemize}
     Note that using signature modifiers or wildcards causes an alternate to be classified
     as a generic alternate. Thus, single-byte alternates and multi-byte fixed-length alternates
     can use signature modifiers and wildcards, but they will then be classified as generic
     alternates. This means that negation cannot be applied in this situation and that there
     is a slight performance impact.
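
     The reason negation needs members of one fixed length can be
     illustrated with Python \verb+re+ bytes patterns. In the sketch below,
     a hypothetical helper restricted to static hex members without
     modifiers or wildcards, a positive set becomes a plain alternation. A
     negated set becomes a negative lookahead followed by exactly $n$
     arbitrary bytes, which is only well defined when every member is $n$
     bytes long. This is an illustration of the semantics, not how ClamAV
     matches alternates.
     \begin{verbatim}
import re

def alternate_to_regex(alt: str):
    """Translate (m1|m2|...) or !(m1|m2|...) with static hex members."""
    negate = alt.startswith("!")
    body = alt[1:] if negate else alt
    if not (body.startswith("(") and body.endswith(")")):
        raise ValueError("expected (...) or !(...)")
    members = [bytes.fromhex(m) for m in body[1:-1].split("|")]
    options = b"|".join(re.escape(m) for m in members)
    if not negate:
        # Positive sets may mix member lengths (generic alternates).
        return re.compile(b"(?:" + options + b")", re.DOTALL)
    lengths = {len(m) for m in members}
    if len(lengths) != 1:
        raise ValueError("negation needs members of one fixed length")
    n = lengths.pop()
    # Exactly n bytes that are not equal to any member.
    return re.compile(b"(?!(?:" + options + b"))" + b".{%d}" % n, re.DOTALL)
     \end{verbatim}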
 
     \subsubsection{Basic signature format}
     The simplest (and now deprecated) signature format is:
     \begin{verbatim}
 MalwareName=HexSignature
     \end{verbatim}
     ClamAV will scan the entire file looking for HexSignature. All
     signatures of this type must be placed inside \verb+*.db+ files.
 
     \subsubsection{Extended signature format}\label{ndb}
     The extended signature format allows for specification of additional
     information such as a target file type, virus offset or engine version,
     making the detection more reliable. The format is:
     \begin{verbatim}
 MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]
     \end{verbatim}
     where \verb+TargetType+ is one of the following numbers specifying
     the type of the target file:
     \begin{itemize}
 	\item 0 = any file
 	\item 1 = Portable Executable, both 32- and 64-bit.
 	\item 2 = OLE2 containers, including their specific macros. The OLE2
 	format is primarily used by MS Office and MSI installation files.
 	\item 3 = HTML (normalized: whitespace transformed to spaces, tags and tag
 	attributes normalized, all lowercase). JavaScript is normalized too:
 	all strings are normalized (hex encoding is decoded), numbers are
 	parsed and normalized, local variable and function names are normalized
 	to the \verb+n001+ format, the argument to \verb+eval()+ is parsed as
 	JavaScript again, \verb+unescape()+ is handled, some simple JavaScript
 	packers are handled, and the output is whitespace-normalized.
 	\item 4 = Mail file
 	\item 5 = Graphics
 	\item 6 = ELF
 	\item 7 = ASCII text file (normalized)
 	\item 8 = Unused
 	\item 9 = Mach-O files
 	\item 10 = PDF files
 	\item 11 = Flash files
 	\item 12 = Java class files
     \end{itemize}
     \verb+Offset+ is an asterisk or a decimal number \verb+n+, possibly
     combined with a special modifier:
     \begin{itemize}
 	\item \verb+*+ = any
 	\item \verb+n+ = absolute offset
 	\item \verb+EOF-n+ = end of file minus \verb+n+ bytes
     \end{itemize}
     Signatures for PE, ELF and Mach-O files additionally support:
     \begin{itemize}
 	\item \verb#EP+n# = entry point plus n bytes (\verb#EP+0# for \verb+EP+)
 	\item \verb#EP-n# = entry point minus n bytes
 	\item \verb#Sx+n# = start of the data of section \verb+x+ (counted from 0)
 	plus \verb+n+ bytes
 	\item \verb#SEx# = entire section \verb+x+ (offset must lie within section
 	boundaries)
 	\item \verb#SL+n# = start of last section plus \verb+n+ bytes
     \end{itemize}
     All the above offsets except \verb+*+ can be turned into
     \textbf{floating offsets} and represented as \verb+Offset,MaxShift+, where
     \verb+MaxShift+ is an unsigned integer. A floating offset will match every
     offset between \verb+Offset+ and \verb#Offset+MaxShift#; e.g. \verb+10,5+
     will match all offsets from 10 to 15, and \verb#EP+n,y# will match all
     offsets from \verb#EP+n# to \verb#EP+n+y#. Versions of ClamAV older than
     0.91 silently ignore the \verb+MaxShift+ extension and only use
     \verb+Offset+.\\
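
     To make the offset rules concrete, the Python sketch below turns the
     \verb+*+, \verb+n+ and \verb+EOF-n+ forms, optionally followed by
     \verb+,MaxShift+, into the inclusive range of file offsets at which a
     match may start. The executable-specific forms (\verb#EP#, \verb#Sx#,
     \verb#SEx#, \verb#SL#) need the parsed file headers and are rejected
     here. \verb+resolve_offset+ is a hypothetical helper.
     \begin{verbatim}
from typing import Optional, Tuple

def resolve_offset(spec: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Inclusive range of allowed start offsets, or None for '*' (any)."""
    if spec == "*":
        return None
    base, _, shift = spec.partition(",")
    max_shift = int(shift) if shift else 0
    if base.startswith("EOF-"):
        start = file_size - int(base[4:])
    elif base.isdigit():
        start = int(base)
    else:
        # EP, Sx, SEx and SL need the parsed executable headers.
        raise ValueError(f"offset form not handled in this sketch: {base!r}")
    return start, start + max_shift
     \end{verbatim}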
 
     \noindent
     The optional \verb+MinFL+ and \verb+MaxFL+ parameters restrict the signature
     to engines with a minimum and/or maximum functionality level. All signatures
     in the extended format must be placed inside \verb+*.ndb+ files.
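
     Reading an \verb+*.ndb+ line is a matter of splitting it at colons. The
     Python sketch below (hypothetical names) does that. It also shows how
     the optional \verb+MinFL+ and \verb+MaxFL+ fields would gate a signature
     for a given engine functionality level. Treating both bounds as
     inclusive is an assumption of this sketch.
     \begin{verbatim}
from dataclasses import dataclass
from typing import Optional

@dataclass
class NdbSignature:
    name: str
    target_type: int
    offset: str
    hex_signature: str
    min_fl: Optional[int] = None
    max_fl: Optional[int] = None

    def enabled_for(self, engine_fl: int) -> bool:
        """Assumption: MinFL and MaxFL are inclusive bounds."""
        if self.min_fl is not None and engine_fl < self.min_fl:
            return False
        if self.max_fl is not None and engine_fl > self.max_fl:
            return False
        return True

def parse_ndb(line: str) -> NdbSignature:
    """Parse MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]."""
    fields = line.strip().split(":")
    if not 4 <= len(fields) <= 6:
        raise ValueError(f"expected 4 to 6 fields, got {len(fields)}")
    name, target, offset, hexsig = fields[:4]
    min_fl = int(fields[4]) if len(fields) > 4 and fields[4] else None
    max_fl = int(fields[5]) if len(fields) > 5 and fields[5] else None
    return NdbSignature(name, int(target), offset, hexsig, min_fl, max_fl)
     \end{verbatim}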
 
     \subsubsection{Logical signatures}\label{ldb}
     Logical signatures allow combining multiple signatures in extended
     format using logical operators. They can provide both more detailed and
     more flexible pattern matching. Logical signatures are stored inside
     \verb+*.ldb+ files in the following format:
     \begin{verbatim}
 SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;
 Subsig1;Subsig2;...
     \end{verbatim}
     where:
     \begin{itemize}
 	\item \verb+TargetDescriptionBlock+ provides information about the
 	engine and target file with comma separated \verb+Arg:Val+ pairs.
 	For args where \verb+Val+ is a range, the minimum and maximum values
 	should be expressed as \verb+min-max+.
 	\item \verb+LogicalExpression+ specifies the logical expression
 	describing the relationship between \verb+Subsig0...SubsigN+.\\
 	\textbf{Basis clause:} 0,1,...,N decimal indexes are SUB-EXPRESSIONS
 	representing \verb+Subsig0, Subsig1,...,SubsigN+ respectively.\\
 	\textbf{Inductive clause:} if \verb+A+ and \verb+B+ are
 	SUB-EXPRESSIONS and \verb+X, Y+ are decimal numbers then
 	\verb+(A&B)+, \verb+(A|B)+, \verb+A=X+, \verb+A=X,Y+, \verb+A>X+,
 	\verb+A>X,Y+, \verb+A<X+ and \verb+A<X,Y+ are SUB-EXPRESSIONS
 	\item \verb+SubsigN+ is the $n$-th subsignature in extended format, possibly
 	preceded by an offset. Up to 64 subsignatures can be specified.
     \end{itemize}
     Keywords used in \verb+TargetDescriptionBlock+:
     \begin{itemize}
 	\item \verb+Target:X+: Target file type
 	\item \verb+Engine:X-Y+: Required engine functionality (range; 0.96).
 	Note that if the \verb+Engine+ keyword is used, it must be the first
 	one in the \verb+TargetDescriptionBlock+ for backwards compatibility.
 	\item \verb+FileSize:X-Y+: Required file size (range in bytes; 0.96)
 	\item \verb+EntryPoint+: Entry point offset (range in bytes; 0.96)
 	\item \verb+NumberOfSections+: Required number of sections in executable (range; 0.96)
 	\item \verb+Container:CL_TYPE_*+: File type of the container which stores the scanned file.
 	  Specifying \verb+CL_TYPE_ANY+ matches on root objects only.
 	\item \verb+Intermediates:CL_TYPE_*>CL_TYPE_*+: File types of the intermediate containers which store the scanned file.
 	  Specify 1-16 file types separated by `\verb+>+' in top-down order (the `\verb+>+' separator is not needed for a single file type);
 	  the last type should be the immediate container of the malicious content. \verb+CL_TYPE_ANY+ can be used as a wildcard
 	  file type. (expr; 0.100.0)
 	\item \verb+IconGroup1+: Icon group name 1 from an \verb+.idb+ signature (0.96)
 	\item \verb+IconGroup2+: Icon group name 2 from an \verb+.idb+ signature (0.96)
     \end{itemize}
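
     The overall \verb+*.ldb+ layout and the keyword rules above can be
     checked with a short parser. The Python sketch below (hypothetical
     names) splits a logical signature at semicolons, which works because
     semicolons inside PCRE subsignatures must be written as \verb+\x3B+. It
     reads the \verb+Arg:Val+ pairs of the \verb+TargetDescriptionBlock+ and
     enforces two rules stated in this section: \verb+Engine+ must come first
     when present, and at most 64 subsignatures are allowed.
     \begin{verbatim}
def parse_ldb(line: str) -> dict:
    """Split Name;TargetDescriptionBlock;LogicalExpression;Subsig0;... ."""
    parts = line.strip().split(";")
    if len(parts) < 4:
        raise ValueError("expected name, target block, expression and subsigs")
    name, tdb, expression, subsigs = parts[0], parts[1], parts[2], parts[3:]
    if len(subsigs) > 64:
        raise ValueError("at most 64 subsignatures can be specified")
    target = {}
    for item in tdb.split(","):
        key, sep, value = item.partition(":")
        if not sep:
            raise ValueError(f"malformed Arg:Val pair: {item!r}")
        target.setdefault(key, value)
    # Dictionaries keep insertion order, so the first key is the first keyword.
    if "Engine" in target and next(iter(target)) != "Engine":
        raise ValueError("Engine must be the first keyword")
    return {"name": name, "target": target,
            "expression": expression, "subsigs": subsigs}
     \end{verbatim}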
     Modifiers for subexpressions:
     \begin{itemize}
 	\item \verb+A=X+: If the SUB-EXPRESSION A refers to a single signature
 	then this signature must get matched exactly X times; if it refers to
 	a (logical) block of signatures then this block must generate exactly
 	X matches (with any of its sigs).
 	\item \verb+A=0+ specifies negation (signature or block of signatures
 	cannot be matched)
 	\item \verb+A=X,Y+: If the SUB-EXPRESSION A refers to a single signature
 	then this signature must be matched exactly X times; if it refers to
 	a (logical) block of signatures then this block must generate X matches
 	and at least Y different signatures must get matched.
 	\item \verb+A>X+: If the SUB-EXPRESSION A refers to a single signature
 	then this signature must get matched more than X times; if it refers to
 	a (logical) block of signatures then this block must generate more
 	than X matches (with any of its sigs).
 	\item \verb+A>X,Y+: If the SUB-EXPRESSION A refers to a single signature
 	then this signature must get matched more than X times; if it refers to
 	a (logical) block of signatures then this block must generate more than
 	X matches and at least Y different signatures must be matched.
 	\item \verb+A<X+ and \verb+A<X,Y+: as above, with ``more'' changed
 	to ``less''.
     \end{itemize}
     Examples:
     \begin{verbatim}
 Sig1;Target:0;(0&1&2&3)&(4|1);6b6f74656b;616c61;7a6f6c77;7374656
 6616e;deadbeef
 
 Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;737
 46566616e
 
 Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;737
 46566616e;deadbeef
 
 Sig4;Engine:51-255,Target:1;((0|1)&(2|3))&4;EP+123:33c06834f04100
 f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573
 (63|64)61706528;S3+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58
 dcf43987e4f519d629b103375;SL+550:6300680065005c0046006900
     \end{verbatim}
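
     One way to read the modifier rules is as counting over a block. A
     sub-expression refers to a set of subsignatures, its match count is the
     sum of their individual counts, and \verb+Y+ is compared with the number
     of distinct subsignatures that matched. The Python evaluator below is a
     sketch based on that reading of the text, not ClamAV's implementation.
     It takes the per-subsignature match counts and evaluates expressions such
     as those in the examples above. \verb+&+ and \verb+|+ at the same
     nesting level are evaluated left to right, and \verb+eval_logical+ is a
     hypothetical name.
     \begin{verbatim}
import re
from typing import Sequence

TOKEN = re.compile(r"\d+|[()&|=<>,]")

def eval_logical(expr: str, counts: Sequence[int]) -> bool:
    """Evaluate a LogicalExpression; counts[i] = number of matches of Subsig i."""
    toks = TOKEN.findall(expr)
    if "".join(toks) != expr:
        raise ValueError(f"unexpected characters in {expr!r}")
    pos = 0

    def peek():
        return toks[pos] if pos < len(toks) else None

    def take():
        nonlocal pos
        if pos >= len(toks):
            raise ValueError("unexpected end of expression")
        pos += 1
        return toks[pos - 1]

    def primary():
        # Returns (truth value, set of subsignature indexes in this block).
        if peek() == "(":
            take()
            result = expression()
            if take() != ")":
                raise ValueError("expected ')'")
            return result
        index = int(take())
        return counts[index] > 0, {index}

    def term():
        truth, refs = primary()
        if peek() in ("=", ">", "<"):
            op, x = take(), int(take())
            y = None
            if peek() == ",":
                take()
                y = int(take())
            total = sum(counts[i] for i in refs)          # matches of the block
            distinct = sum(1 for i in refs if counts[i] > 0)
            truth = {"=": total == x, ">": total > x, "<": total < x}[op]
            if y is not None and len(refs) > 1:            # Y applies to blocks
                truth = truth and distinct >= y
        return truth, refs

    def expression():
        truth, refs = term()
        while peek() in ("&", "|"):
            op = take()
            rhs_truth, rhs_refs = term()
            truth = (truth and rhs_truth) if op == "&" else (truth or rhs_truth)
            refs = refs | rhs_refs
        return truth, refs

    truth, _ = expression()
    if pos != len(toks):
        raise ValueError("unexpected trailing tokens")
    return truth
     \end{verbatim}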
 
     \subsubsection{Subsignature Modifiers}
     ClamAV (clamav-0.99) supports a number of additional subsignature modifiers
     for logical signatures. They are specified by appending \verb+::+ followed by
     one or more characters representing the desired options. Signatures using
     subsignature modifiers require \verb+Engine:81-255+ for backwards compatibility.
     \begin{itemize}
     \item Case-Insensitive [\verb+i+]\\
       Specifying the \verb+i+ modifier causes ClamAV to match all alphabetic
       hex bytes case-insensitively. All patterns in ClamAV are case-sensitive
       by default.
     \item Wide [\verb+w+]\\
       Specifying the \verb+w+ modifier causes ClamAV to match all hex bytes encoded with
       two bytes per character. Note that this simply interleaves each character with
       NULL characters and does not truly support UTF-16 characters. Wildcards in
       ``wide'' subsignatures are not treated as wide (i.e. there can be an odd number
       of intervening characters). This can be combined with \verb+a+ to search for
       patterns in both wide and ASCII form.
     \item Fullword [\verb+f+]\\
       Match the subsignature as a full word (delimited by non-alphanumeric characters).
     \item ASCII [\verb+a+]\\
       Match the subsignature as ASCII characters. This can be combined with \verb+w+
       to search for patterns in both ASCII and wide form.
     \end{itemize}
     Examples:
     \begin{verbatim}
 clamav-nocase-A;Engine:81-255,Target:0;0&1;41414141::i;424242424242::i
     -matches 'AAAA'(nocase) and 'BBBBBB'(nocase)
 
 clamav-fullword-A;Engine:81-255,Target:0;0&1;414141;68656c6c6f::f
     -matches 'AAA' and 'hello'(fullword)
 clamav-fullword-B;Engine:81-255,Target:0;0&1;414141;68656c6c6f::fi
     -matches 'AAA' and 'hello'(fullword nocase)
 
 clamav-wide-B2;Engine:81-255,Target:0;0&1;414141;68656c6c6f::wa
     -matches 'AAA' and 'hello'(wide ascii)
 clamav-wide-C0;Engine:81-255,Target:0;0&1;414141;68656c6c6f::iwfa
     -matches 'AAA' and 'hello'(nocase wide fullword ascii)
     \end{verbatim}
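
     The effect of the modifiers can be approximated with Python \verb+re+
     bytes patterns. The following sketch, a hypothetical helper limited to
     static hex subsignatures, rests on three assumptions. \verb+w+ places a
     NUL byte after every character (as in UTF-16LE encoding of ASCII text),
     \verb+i+ means ASCII case folding, and \verb+f+ means ``not adjacent to
     an alphanumeric byte''. ClamAV's own implementation may differ in
     details such as the treatment of wildcards.
     \begin{verbatim}
import re

def subsig_regex(hexsig: str, modifiers: str = ""):
    """Approximate a static hex subsignature plus ::modifiers as a bytes regex."""
    data = bytes.fromhex(hexsig)
    variants = []
    if "w" in modifiers:
        # Assumption: a NUL byte follows every character (UTF-16LE style).
        variants.append(b"".join(bytes([c, 0]) for c in data))
    if "a" in modifiers or "w" not in modifiers:
        variants.append(data)  # plain form; the default when 'w' is absent
    pattern = b"(?:" + b"|".join(re.escape(v) for v in variants) + b")"
    if "f" in modifiers:
        # Full word: no alphanumeric byte directly before or after the match.
        pattern = b"(?<![A-Za-z0-9])" + pattern + b"(?![A-Za-z0-9])"
    flags = re.IGNORECASE if "i" in modifiers else 0
    return re.compile(pattern, flags)
     \end{verbatim}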
 
     \subsection{Special Subsignature Types}
     \subsubsection{Macro subsignatures (clamav-0.96) : \textnormal{\texttt{\$\{min-max\}MACROID\$}}}
     Macro subsignatures are used to combine a number of existing extended
     signatures (\verb+.ndb+) into an on-the-fly generated alternate string logical
     signature (\verb+.ldb+). Signatures using macro subsignatures require \verb+Engine:51-255+
     for backwards compatibility.\\\\
     Example:
     \begin{verbatim}
       test.ldb:
         TestMacro;Engine:51-255,Target:0;0&1;616161;${6-7}12$
 
       test.ndb:
         D1:0:$12:626262
         D2:0:$12:636363
         D3:0:$30:626264
     \end{verbatim}
     The example logical signature \verb+TestMacro+ is functionally equivalent to:\\
     \verb+TestMacro;Engine:51-255,Target:0;0;616161{3-4}(626262|636363)+
     \begin{itemize}
 	\item \verb+MACROID+ points to a group of signatures; there can be at most 32 macro groups.
       \begin{itemize}
       \item In the example, \verb+MACROID+ is \verb+12+ and both \verb+D1+ and \verb+D2+ are members 
         of macro group \verb+12+. \verb+D3+ is a member of separate macro group \verb+30+.
       \end{itemize}
     \item \verb+{min-max}+ specifies the offset range at which one of the group signatures should match;
       the offset range is relative to the starting offset of the preceding subsignature. This means a
       macro subsignature cannot be the first subsignature.
       \begin{itemize}
       \item In the example, \verb+{min-max}+ is \verb+{6-7}+ and it is relative to the start of a \verb+616161+ match.
       \end{itemize}
 	\item For more information and examples please see \url{https://wwws.clamav.net/bugzilla/show_bug.cgi?id=164}.
     \end{itemize}
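
     The equivalence shown above follows from simple arithmetic. Because
     \verb+{min-max}+ is measured from the start of the preceding match, and
     \verb+616161+ is three bytes long, the gap after the end of that match
     is $6-3$ to $7-3$ bytes, i.e. \verb+{3-4}+. The Python sketch below
     (hypothetical helpers) collects macro groups from \verb+*.ndb+ lines
     whose offset is a macro ID and rebuilds that equivalent form. It assumes
     the preceding subsignature is a static hex string of fixed length and
     only reproduces the equivalence described in the text.
     \begin{verbatim}
def macro_groups(ndb_lines) -> dict:
    """Map macro group IDs to the hex signatures of .ndb lines with a $ID offset."""
    groups = {}
    for line in ndb_lines:
        _name, _target, offset, hexsig = line.strip().split(":")[:4]
        if offset.startswith("$"):
            groups.setdefault(int(offset[1:]), []).append(hexsig)
    return groups

def expand_macro(prev_hex: str, min_off: int, max_off: int, members) -> str:
    """Rewrite prev_hex followed by a {min-max} macro as an alternate string."""
    prev_len = len(prev_hex) // 2                 # bytes in the preceding subsig
    gap_min, gap_max = min_off - prev_len, max_off - prev_len
    if gap_min < 0:
        raise ValueError("macro range starts inside the preceding subsignature")
    return f"{prev_hex}{{{gap_min}-{gap_max}}}({'|'.join(members)})"
     \end{verbatim}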
     \subsubsection{PCRE subsignatures (clamav-0.99) : \textnormal{\texttt{Trigger/PCRE/[Flags]}}}
     PCRE subsignatures are used within a logical signature (\verb+.ldb+) to specify regex matches
     that execute once triggered by a conditional based on preceding subsignatures. Signatures using
     PCRE subsignatures require \verb+Engine:81-255+ for backwards-compatibility.
050f1036
     \begin{itemize}
     \item \verb+Trigger+ is a required field that is a valid \verb+LogicalExpression+ and
c3ba529d
     may refer to any subsignatures that precede this subsignature. Triggers cannot be
     self-referential and cannot refer to subsequent subsignatures.
050f1036
     \item \verb+PCRE+ is the expression representing the regex to execute. \verb+PCRE+
a8474d2a
     must be delimited by '/' and usage of '/' within the expression need to be escaped.
ca3d8d59
     For backward compatibility, ';' within the expression must be expressed as '\verb+\x3B+'.
c3ba529d
     \verb+PCRE+ cannot be empty and (?UTF*) control sequence is not allowed. If debug is specified,
     named capture groups are displayed in a post-execution report.
050f1036
     \item \verb+Flags+ are a series of characters which affect the compilation and execution
     of \verb+PCRE+ within the PCRE compiler and the ClamAV engine. This field is optional.
 	\begin{itemize}
 	\item \verb+g [CLAMAV_GLOBAL]+ specifies to search for ALL matches of PCRE (default is to
c3ba529d
         search for first match). NOTE: INCREASES the time needed to run the PCRE.
         \item \verb+r [CLAMAV_ROLLING]+ specifies to use the given offset as the starting location
         to search for a match as opposed to the only location; applies to subsigs without maxshifts.
         By default, in order to facilatate normal ClamAV offset behavior, PCREs are auto-anchored
         (only attempt match on first offset); using the rolling option disables the auto-anchoring.
050f1036
 	\item \verb+e [CLAMAV_ENCOMPASS]+ specifies to CONFINE matching between the specified offset
c3ba529d
 	and maxshift; applies only when maxshift is specified. Note: DECREASES time needed to run the PCRE.
050f1036
 	\item \verb+i [PCRE_CASELESS]+
 	\item \verb+s [PCRE_DOTALL]+
 	\item \verb+m [PCRE_MULTILINE]+
 	\item \verb+x [PCRE_EXTENDED]+
 	\item \verb+A [PCRE_ANCHORED]+
 	\item \verb+E [PCRE_DOLLAR_ENODNLY]+
4a7a77ba
 	\item \verb+U [PCRE_UNGREEDY]+
050f1036
 	\end{itemize}
     \end{itemize}
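
    The layout above can be checked mechanically before a signature is loaded. The following
    Python function is a minimal sketch and not part of ClamAV: the name \verb+parse_pcre_subsig+
    is hypothetical, and the optional \verb+Offset:+ prefix it accepts (as in \verb+299:0&1/clamav/+)
    is inferred from the examples below rather than from a formal grammar. It performs only the
    syntactic checks listed above; it does not validate the logical expression itself.
    \begin{verbatim}
import re

# Flag letters listed above: g, r and e are ClamAV-specific,
# the others map onto PCRE compile options.
KNOWN_FLAGS = set("greimsxAEU")

# A '/' not preceded by a backslash (simple check: the rare
# escaped-backslash-then-slash sequence is not handled).
UNESCAPED_SLASH = re.compile(r"(?<!\\)/")


def parse_pcre_subsig(subsig):
    """Split '[Offset:]Trigger/PCRE/[Flags]' (hypothetical helper)."""
    head, sep, rest = subsig.partition("/")
    if not sep:
        raise ValueError("PCRE must be delimited by '/'")
    # Flags never contain '/', so the last '/' closes the expression.
    pattern, sep, flags = rest.rpartition("/")
    if not sep:
        raise ValueError("missing closing '/'")
    if not pattern:
        raise ValueError("PCRE cannot be empty")
    if UNESCAPED_SLASH.search(pattern):
        raise ValueError("'/' inside the expression must be escaped")
    if ";" in pattern:
        raise ValueError("write ';' as \\x3B inside the expression")
    unknown = set(flags) - KNOWN_FLAGS
    if unknown:
        raise ValueError("unknown flags: " + "".join(sorted(unknown)))
    # Everything before the last ':' of the head is the optional offset.
    offset, _, trigger = head.rpartition(":")
    if not trigger:
        raise ValueError("Trigger is a required field")
    return {"offset": offset or None, "trigger": trigger,
            "pcre": pattern, "flags": flags}
    \end{verbatim}
    Applied to the subsignature \verb+200,300:0&1&2/clamav/ge+ of the
    \verb+Find.All.Encompassed.ClamAV+ example below, it returns offset \verb+200,300+,
    trigger \verb+0&1&2+, expression \verb+clamav+ and flags \verb+ge+.
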
    Examples of logical signatures using PCRE subsignatures:
     \begin{verbatim}
 Find.All.ClamAV;Engine:81-255,Target:0;1;6265676c6164697427736e6
 f7462797465636f6465;0/clamav/g

 Find.ClamAV.OnlyAt.299;Engine:81-255,Target:0;2;7374756c747a6765
 7473;7063726572656765786c6f6c;299:0&1/clamav/

 Find.ClamAV.StartAt.300;Engine:81-255,Target:0;3;616c61696e;6275
 6731393238;636c6f736564;300:0&1&2/clamav/r

 Find.All.Encompassed.ClamAV;Engine:81-255,Target:0;3;77687961726
 56e2774;796f757573696e67;79617261;200,300:0&1&2/clamav/ge

 Named.CapGroup.Pcre;Engine:81-255,Target:0;3;636f75727479617264;
 616c62756d;74657272696572;50:0&1&2/variable=(?<nilshell>.{16})en
 d/gr

 Firefox.TreeRange.UseAfterFree;Engine:81-255,Target:0,Engine:81-
 255;0&1&2;2e766965772e73656c656374696f6e;2e696e76616c69646174655
 3656c656374696f6e;0&1/\x2Eview\x2Eselection.*?\x2Etree\s*\x3D\s*
 null.*?\x2Einvalidate/smi

 Firefox.IDB.UseAfterFree;Engine:81-255,Target:0;0&1;4944424b6579
 52616e6765;0/^\x2e(only|lowerBound|upperBound|bound)\x28.*?\x29.
 *?\x2e(lower|upper|lowerOpen|upperOpen)/smi

 Firefox.boundElements;Engine:81-255,Target:0;0&1&2;6576656e742e6
 26f756e64456c656d656e7473;77696e646f772e636c6f7365;0&1/on(load|c
 lick)\s*=\s*\x22?window\.close\s*\x28/si
     \end{verbatim}
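
    Writing the hex-encoded body subsignatures by hand is error-prone. A minimal Python sketch
    for assembling such a line follows; the helper names (\verb+to_hex+, \verb+escape_pcre+,
    \verb+build_ldb+) are hypothetical, and it assumes the body subsignatures are plain ASCII
    strings, as in the examples above. It also applies the rule that ';' inside a PCRE must be
    written as \verb+\x3B+.
    \begin{verbatim}
def to_hex(text):
    """Hex-encode an ASCII string as a body subsignature."""
    return text.encode("ascii").hex()


def escape_pcre(subsig):
    """Write ';' (the .ldb field separator) as the \\x3B escape."""
    return subsig.replace(";", r"\x3B")


def build_ldb(name, tdb, logic, strings, pcre_subsigs):
    """Join the fields of a logical signature with ';'.

    strings      -- literal ASCII strings, hex-encoded in order
    pcre_subsigs -- complete '[Offset:]Trigger/PCRE/[Flags]' strings
    """
    fields = [name, tdb, logic]
    fields += [to_hex(s) for s in strings]
    fields += [escape_pcre(p) for p in pcre_subsigs]
    return ";".join(fields)


# Rebuild the Find.ClamAV.OnlyAt.299 example from its plain strings.
print(build_ldb("Find.ClamAV.OnlyAt.299", "Engine:81-255,Target:0", "2",
                ["stultzgets", "pcreregexlol"], ["299:0&1/clamav/"]))
    \end{verbatim}
    The call at the end prints the \verb+Find.ClamAV.OnlyAt.299+ example above as a single
    unwrapped line.
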

    \subsection{Icon signatures for PE files}
    ClamAV 0.96 includes an approximate (fuzzy) icon matcher to help
    detect malicious executables disguised as innocent-looking
    image files, office documents and the like.

    Icon matching is only triggered via \verb+.ldb+ signatures using the special
    attribute tokens \verb+IconGroup1+ or \verb+IconGroup2+. These identify
    two (optional) groups of icons defined in an \verb+.idb+ database file. The
    format of the \verb+.idb+ file is:
     \begin{verbatim}
 ICON_NAME:GROUP1:GROUP2:ICON_HASH
     \end{verbatim}
     where:
     \begin{itemize}
	\item \verb+ICON_NAME+ is a unique string identifier for a specific
	icon,
	\item \verb+GROUP1+ is a string identifier for the first group of
	icons (\verb+IconGroup1+),
	\item \verb+GROUP2+ is a string identifier for the second group of
	icons (\verb+IconGroup2+),
	\item \verb+ICON_HASH+ is a fuzzy hash of the icon image.
     \end{itemize}
     The \verb+ICON_HASH+ field can be obtained from the debug output of
     libclamav. For example:
     \begin{verbatim}
 LibClamAV debug: ICO SIGNATURE:
 ICON_NAME:GROUP1:GROUP2:18e2e0304ce60a0cc3a09053a30000414100057e
 000afe0000e80006e510078b0a08910d11ad04105e0811510f084e01040c080
 a1d0b0021000a39002a41
     \end{verbatim}
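
    The debug line is already in \verb+.idb+ layout, only with placeholder names. A minimal
    Python sketch (the helper \verb+make_idb_entry+ is hypothetical) that re-joins the wrapped
    debug output and substitutes real names might look like this; it assumes that names must
    not contain ':', since that character separates the fields.
    \begin{verbatim}
import re

HEX_DIGITS = re.compile(r"[0-9a-fA-F]+")


def make_idb_entry(debug_line, icon_name, group1, group2):
    """Build an .idb entry from the 'ICON_NAME:GROUP1:GROUP2:<hash>'
    line printed by libclamav (hypothetical helper)."""
    # Re-join wrapped output and drop any whitespace inside the hash.
    joined = "".join(debug_line.split())
    icon_hash = joined.rsplit(":", 1)[-1]
    if not HEX_DIGITS.fullmatch(icon_hash):
        raise ValueError("ICON_HASH must be hexadecimal")
    if not icon_name:
        raise ValueError("ICON_NAME must not be empty")
    for field in (icon_name, group1, group2):
        if ":" in field:
            raise ValueError("names must not contain ':'")
    return ":".join((icon_name, group1, group2, icon_hash))
    \end{verbatim}
    The names used for the two group fields are the ones an \verb+.ldb+ signature later
    refers to through \verb+IconGroup1+ and \verb+IconGroup2+.
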
 
     \subsection{Signatures for Version Information metadata in PE files}
    Starting with ClamAV 0.96 it is possible to easily match certain
    information built into PE files (executables and dynamic link libraries).
    Whenever you look up the properties of a PE executable file in Windows,
    you are presented with a number of details about the file itself.

    This information is stored in a special area of the file resources which goes
    under the name of \verb+VS_VERSIONINFO+ (or versioninfo for short).
    It is divided into two parts. The first part (which is rather uninteresting)
    is really a bunch of numbers and flags indicating the product and file
    version. It was originally intended for use with installers which, after
    parsing it, should be able to determine whether a certain executable or
    library is to be upgraded/overwritten or is already up to date. Suffice it
    to say, this approach never really worked and is generally not used.

    The second block is much more interesting: it is a simple list of key/value
    strings, intended for user information and completely ignored by the OS.
    For example, if you look at ping.exe you can see the company being \emph{``Microsoft
    Corporation''}, the description \emph{``TCP/IP Ping command''}, the internal name
    \emph{``ping.exe''} and so on. Depending on the OS version, some keys may be given
    special visibility in the file properties dialog; internally, however, they are
    all the same.

    To match a versioninfo key/value pair, the special file offset anchor \verb+VI+ was
    introduced. This is similar to the other anchors (like \verb+EP+ and \verb+SL+)
    except that, instead of matching the hex pattern against a single offset, it checks
    it against each and every key/value pair in the file. The \verb+VI+ token neither
    needs nor accepts a \verb#+/-# offset such as \verb#EP+1#. As for the hex signature
    itself, it is just the UTF-16 (little-endian) dump of the key and value. Only the \verb+??+ and
    \verb+(aa|bb)+ wildcards are allowed in the signature. Usually, you don't need to
    bother figuring it out: each key/value pair together with the corresponding VI-based
    signature is printed by \verb+clamscan+ when the \verb+--debug+ option is given.
 
    For example, \verb+clamscan --debug freecell.exe+ produces:
     \begin{verbatim}
 [...]
 Recognized MS-EXE/DLL file
 in cli_peheader
 versioninfo_cb: type: 10, name: 1, lang: 410, rva: 9608
 cli_peheader: parsing version info @ rva 9608 (1/1)
 VersionInfo (d2de): 'CompanyName'='Microsoft Corporation' -
 VI:43006f006d00700061006e0079004e0061006d006500000000004d006900
 630072006f0073006f0066007400200043006f00720070006f0072006100740
 069006f006e000000
 VersionInfo (d32a): 'FileDescription'='Entertainment Pack
 FreeCell Game' - VI:460069006c006500440065007300630072006900700
 0740069006f006e000000000045006e007400650072007400610069006e006d
 0065006e00740020005000610063006b0020004600720065006500430065006
 c006c002000470061006d0065000000
 VersionInfo (d396): 'FileVersion'='5.1.2600.0 (xpclient.010817
 -1148)' - VI:460069006c006500560065007200730069006f006e00000000
 0035002e0031002e0032003600300030002e003000200028007800700063006
 c00690065006e0074002e003000310030003800310037002d00310031003400
 380029000000
 VersionInfo (d3fa): 'InternalName'='freecell' - VI:49006e007400
 650072006e0061006c004e0061006d006500000066007200650065006300650
 06c006c000000
 VersionInfo (d4ba): 'OriginalFilename'='freecell' - VI:4f007200
 6900670069006e0061006c00460069006c0065006e0061006d0065000000660
 0720065006500630065006c006c000000
 VersionInfo (d4f6): 'ProductName'='Sistema operativo Microsoft
 Windows' - VI:500072006f0064007500630074004e0061006d00650000000
 000530069007300740065006d00610020006f00700065007200610074006900
 76006f0020004d006900630072006f0073006f0066007400ae0020005700690
 06e0064006f0077007300ae000000
 VersionInfo (d562): 'ProductVersion'='5.1.2600.0' - VI:50007200
 6f006400750063007400560065007200730069006f006e00000035002e00310
 02e0032003600300030002e0030000000
 [...]
     \end{verbatim}
Although VI-based signatures are intended for use in logical signatures, you can test them
using ordinary \verb+.ndb+ files. For example:
     \begin{verbatim}
 my_test_vi_sig:1:VI:paste_your_hex_sig_here
     \end{verbatim}
Finally, to decode a VI-based signature into human-readable form, you can use:
     \begin{verbatim}
 echo hex_string | xxd -r -p | strings -el
     \end{verbatim}
 For example:
     \begin{verbatim}
 $ echo 460069006c0065004400650073006300720069007000740069006f006e
 000000000045006e007400650072007400610069006e006d0065006e007400200
 05000610063006b0020004600720065006500430065006c006c00200047006100
 6d0065000000 | xxd -r -p | strings -el
 FileDescription
 Entertainment Pack FreeCell Game
     \end{verbatim}
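
    The same decoding can be done in Python, which avoids depending on \verb+xxd+ and
    \verb+strings+ and also separates the key from the value. The sketch below assumes only
    what the dumps above show: the text is UTF-16LE, each string ends with a NUL character,
    and the key may be followed by extra NUL padding (as after \verb+CompanyName+ but not
    after \verb+InternalName+). The function name \verb+decode_vi+ is hypothetical.
    \begin{verbatim}
def decode_vi(hex_sig):
    """Decode a VI-based signature into (key, value) (hypothetical helper)."""
    hex_sig = "".join(hex_sig.split())  # tolerate wrapped input
    if hex_sig.startswith("VI:"):
        hex_sig = hex_sig[3:]
    # UTF-16LE text; terminators and padding decode to '\x00' characters.
    text = bytes.fromhex(hex_sig).decode("utf-16-le")
    parts = [p for p in text.split("\x00") if p]
    if not parts or len(parts) > 2:
        raise ValueError("expected one key and at most one value")
    key = parts[0]
    value = parts[1] if len(parts) == 2 else ""
    return key, value
    \end{verbatim}
    For example, applied to the \verb+InternalName+ signature above, \verb+decode_vi+
    returns the pair \verb+("InternalName", "freecell")+.
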
 
    \subsection{Trusted and Revoked Certificates}
    ClamAV 0.98 checks signed PE files for certificates and verifies each
    certificate in the chain against a database of trusted and revoked
    certificates. The signature format is
 \begin{verbatim}
 Name;Trusted;Subject;Serial;Pubkey;Exponent;CodeSign;TimeSign;CertSign;
 NotBefore;Comment[;minFL[;maxFL]]
 \end{verbatim}
    where the corresponding fields are:
    \begin{itemize}
        \item \verb+Name:+ name of the entry
        \item \verb+Trusted:+ bit field, specifying whether the cert is
            trusted: 1 for trusted, 0 for revoked
        \item \verb+Subject:+ SHA1 of the Subject field in hex
        \item \verb+Serial:+ the serial number as reported by
            \verb+clamscan --debug --verbose+
        \item \verb+Pubkey:+ the public key in hex
        \item \verb+Exponent:+ the exponent in hex; currently ignored and
            hardcoded to 010001 (in hex)
        \item \verb+CodeSign:+ bit field, specifying whether this cert
            can sign code: 1 for true, 0 for false
        \item \verb+TimeSign:+ bit field, specifying whether this cert
            can sign timestamps: 1 for true, 0 for false
        \item \verb+CertSign:+ bit field, specifying whether this cert
            can sign other certs: 1 for true, 0 for false
        \item \verb+NotBefore:+ integer; the cert should not be added before
            this value. Defaults to 0 if left empty
        \item \verb+Comment:+ comments for this entry
        \item \verb+minFL+, \verb+maxFL+: optional minimum and maximum
            engine functionality levels for this entry
    \end{itemize}
     The signatures for certs are stored inside \verb+.crb+ files.
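
    Since a \verb+.crb+ entry is just the fields above joined by ';', it can be generated from
    values gathered with \verb+clamscan --debug --verbose+. The following Python function is a
    minimal sketch with a hypothetical name (\verb+make_crb_entry+); it hard-codes the Exponent
    to 010001 as described above, encodes bit fields as 1 or 0 and passes all other values
    through unchanged, so obtaining the correct Subject hash, serial and public key remains the
    caller's job.
    \begin{verbatim}
def make_crb_entry(name, trusted, subject_sha1, serial, pubkey,
                   code_sign, time_sign, cert_sign,
                   not_before="", comment="", min_fl=None, max_fl=None):
    """Assemble one .crb line (hypothetical helper)."""
    def bit(flag):
        return "1" if flag else "0"

    fields = [name, bit(trusted), subject_sha1, serial, pubkey,
              "010001",  # Exponent: currently ignored, hardcoded value
              bit(code_sign), bit(time_sign), bit(cert_sign),
              str(not_before), comment]
    # Per the [;minFL[;maxFL]] syntax, maxFL requires minFL.
    if max_fl is not None and min_fl is None:
        raise ValueError("maxFL requires minFL")
    if min_fl is not None:
        fields.append(str(min_fl))
    if max_fl is not None:
        fields.append(str(max_fl))
    if any(";" in f for f in fields):
        raise ValueError("fields must not contain ';'")
    return ";".join(fields)
    \end{verbatim}
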

    \subsection{Signatures based on container metadata}
    ClamAV 0.96 allows creating generic signatures that match files stored
    inside various container types and meet specific conditions.
    The signature format is
 \begin{verbatim}
 VirusName:ContainerType:ContainerSize:FileNameREGEX:
 FileSizeInContainer:FileSizeReal:IsEncrypted:FilePos:
 Res1:Res2[:MinFL[:MaxFL]]
 \end{verbatim}
    where the corresponding fields are:
    \begin{itemize}
	\item \verb+VirusName:+ virus name to be displayed when the signature matches
	\item \verb+ContainerType:+ one of \verb+CL_TYPE_ZIP+, \verb+CL_TYPE_RAR+,
	\verb+CL_TYPE_ARJ+,\\
	\verb+CL_TYPE_MSCAB+, \verb+CL_TYPE_7Z+, \verb+CL_TYPE_MAIL+, \verb+CL_TYPE_(POSIX|OLD)_TAR+,\\
	\verb+CL_TYPE_CPIO_(OLD|ODC|NEWC|CRC)+ or \verb+*+ to match
	any of the container types listed here
	\item \verb+ContainerSize:+ size of the container file itself (e.g. the size of
	the ZIP archive) specified in bytes as an absolute value or a range \verb+x-y+
	\item \verb+FileNameREGEX:+ regular expression describing the name of the target file
	\item \verb+FileSizeInContainer:+ usually the compressed size; for MAIL, TAR and CPIO it is
	equal to \verb+FileSizeReal+; specified in bytes as an absolute value or a range
	\item \verb+FileSizeReal:+ usually the uncompressed size; for MAIL, TAR and CPIO it is
	equal to \verb+FileSizeInContainer+; absolute value or range
	\item \verb+IsEncrypted+: 1 if the target file is encrypted, 0 if it is not, and
	\verb+*+ to ignore
	\item \verb+FilePos+: file position in the container (counting from 1); absolute value
	or range
	\item \verb+Res1+: when \verb+ContainerType+ is \verb+CL_TYPE_ZIP+ or
	\verb+CL_TYPE_RAR+ this field is treated as a CRC sum of the target file
	specified in hexadecimal format; for other container types it is ignored
	\item \verb+Res2+: not used as of ClamAV 0.96
    \end{itemize}
     The signatures for container files are stored inside \verb+.cdb+ files.
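
    Most fields accept either an absolute value or a range, so a small helper can take care of
    the formatting. The Python sketch below is hypothetical (\verb+make_cdb_entry+ is not a
    ClamAV tool); it assumes that \verb+*+ is accepted as ``ignore'' in every field, as the
    description above states for \verb+ContainerType+ and \verb+IsEncrypted+, that no field may
    contain ':', and it leaves out the optional MinFL/MaxFL fields for brevity.
    \begin{verbatim}
def _cdb_field(value):
    """None -> '*', bool -> '1'/'0', (lo, hi) -> 'lo-hi', else str()."""
    if value is None:
        return "*"
    if isinstance(value, bool):
        return "1" if value else "0"
    if isinstance(value, tuple):
        lo, hi = value
        return f"{lo}-{hi}"
    return str(value)


def make_cdb_entry(virus_name, container_type="*", container_size=None,
                   filename_regex=None, size_in_container=None,
                   size_real=None, is_encrypted=None, file_pos=None,
                   res1=None, res2=None):
    """Assemble one .cdb line (hypothetical helper).

    Sizes and positions may be ints or (lo, hi) ranges; res1 is the
    hex CRC of the target file for CL_TYPE_ZIP and CL_TYPE_RAR.
    """
    fields = [virus_name, container_type, container_size, filename_regex,
              size_in_container, size_real, is_encrypted, file_pos,
              res1, res2]
    parts = [_cdb_field(f) for f in fields]
    if any(":" in p for p in parts):
        raise ValueError("fields must not contain ':'")
    return ":".join(parts)
    \end{verbatim}
    With only the virus name given, every other field becomes \verb+*+.
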

    \subsection{Signatures based on ZIP/RAR metadata (obsolete)}
    The (now obsolete) archive metadata signatures can only be applied
    to ZIP and RAR files and have the following format:
 \begin{verbatim}
 virname:encrypted:filename:normal size:csize:crc32:cmethod:
 fileno:max depth
 \end{verbatim}
    where the corresponding fields are:
    \begin{itemize}
	\item Virus name
	\item Encryption flag (1 -- encrypted, 0 -- not encrypted)
	\item File name (a regular expression; * to ignore)
	\item Normal (uncompressed) size (* to ignore)
	\item Compressed size (* to ignore)
	\item CRC32 (* to ignore)
	\item Compression method (* to ignore)
	\item File position in archive (* to ignore)
	\item Maximum number of nested archives (* to ignore)
    \end{itemize}
    The database file should have the extension \verb+.zmd+ or
    \verb+.rmd+ for ZIP or RAR metadata, respectively.

    \subsection{Whitelist databases}
    To whitelist a specific file, use the MD5 signature format and place
     it inside a database file with the extension of \verb+.fp+.
     To whitelist a specific file with the SHA1 or SHA256 file hash signature
     format, place the signature inside a database file with the extension
     of \verb+.sfp+.\\
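
    Assuming the usual hash signature layout \verb+HashString:FileSize:MalwareName+ (MD5 for
    \verb+.fp+, SHA1 or SHA256 for \verb+.sfp+), a whitelist entry can be computed with Python's
    standard \verb+hashlib+ module. This is a minimal sketch: \verb+whitelist_entry+ is a
    hypothetical helper name, and the name field is treated here as a free-form label for the
    entry.
    \begin{verbatim}
import hashlib
import os


def whitelist_entry(path, name, algorithm="md5"):
    """Return 'hash:size:name' for the file at path (hypothetical helper).

    Use algorithm="md5" for .fp files, "sha1" or "sha256" for .sfp files.
    """
    if algorithm not in ("md5", "sha1", "sha256"):
        raise ValueError("use md5 (.fp) or sha1/sha256 (.sfp)")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return f"{digest.hexdigest()}:{os.path.getsize(path)}:{name}"
    \end{verbatim}
    For a five-byte file containing \verb+hello+ and the label \verb+Clean.File+, the MD5
    entry is \verb+5d41402abc4b2a76b9719d911017c592:5:Clean.File+.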
 
     \noindent
    To whitelist a specific signature from the database, add
    its name to a local file called \verb+local.ign2+ stored inside the
    database directory. You can additionally follow the signature name
    with the MD5 of the entire database entry for this signature, e.g.:
 \begin{verbatim}
 Eicar-Test-Signature:bc356bae4c42f19a3de16e333ba3569c
 \end{verbatim}
    In such a case, the signature will no longer be whitelisted when
    its entry in the database gets modified (e.g. the signature gets
     updated to avoid false alerts).
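
    The MD5 suffix can be computed from the database entry itself. The Python sketch below
    assumes that the hash is taken over the entry exactly as it appears in the database,
    without the trailing newline; this assumption should be verified against your ClamAV
    version before relying on it. The helper name \verb+ign2_entry+ is hypothetical, and the
    signature name is passed separately because its position within the entry depends on the
    database format.
    \begin{verbatim}
import hashlib


def ign2_entry(signame, db_entry):
    """Return a local.ign2 line 'SigName:md5' (hypothetical helper).

    ASSUMPTION: the MD5 covers the database entry without its
    trailing newline.
    """
    entry = db_entry.rstrip("\r\n")
    return f"{signame}:{hashlib.md5(entry.encode('utf-8')).hexdigest()}"
    \end{verbatim}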
 
     \subsection{Signature names}
    ClamAV uses the following prefixes for signature names:
    \begin{itemize}
	\item \emph{Worm} for Internet worms
	\item \emph{Trojan} for backdoor programs
	\item \emph{Adware} for adware
	\item \emph{Flooder} for flooders
	\item \emph{HTML} for HTML files
	\item \emph{Email} for email messages
	\item \emph{IRC} for IRC trojans
	\item \emph{JS} for JavaScript malware
	\item \emph{PHP} for PHP malware
	\item \emph{ASP} for ASP malware
	\item \emph{VBS} for VBS malware
	\item \emph{BAT} for BAT malware
	\item \emph{W97M}, \emph{W2000M} for Word macro viruses
	\item \emph{X97M}, \emph{X2000M} for Excel macro viruses
	\item \emph{O97M}, \emph{O2000M} for generic Office macro viruses
	\item \emph{DoS} for Denial of Service attack software
	\item \emph{DOS} for old DOS malware
	\item \emph{Exploit} for popular exploits
	\item \emph{VirTool} for virus construction kits
	\item \emph{Dialer} for dialers
	\item \emph{Joke} for hoaxes
    \end{itemize}
    Important rules of the naming convention:
    \begin{itemize}
	\item always use a -zippwd suffix in the malware name for signatures
	      of type zmd,
	\item always use a -rarpwd suffix in the malware name for signatures
	      of type rmd,
	\item only use alphanumeric characters, dash (-), dot (.) and underscore
	      (\_) in malware names; never use spaces, apostrophes or quotation marks.
     \end{itemize}
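
    These rules are easy to check automatically. A minimal Python sketch follows;
    \verb+check_signature_name+ is a hypothetical helper, it treats ``alphanumeric'' as ASCII
    letters and digits, and it takes the prefix to be the part of the name before the first dot,
    which is an assumption about how the prefixes above are used.
    \begin{verbatim}
import re

# Prefixes from the list above.
KNOWN_PREFIXES = {
    "Worm", "Trojan", "Adware", "Flooder", "HTML", "Email", "IRC",
    "JS", "PHP", "ASP", "VBS", "BAT", "W97M", "W2000M", "X97M",
    "X2000M", "O97M", "O2000M", "DoS", "DOS", "Exploit", "VirTool",
    "Dialer", "Joke",
}

# ASCII letters and digits, dash, dot and underscore only.
VALID_CHARS = re.compile(r"[A-Za-z0-9._-]+")

# Suffixes required for the (obsolete) archive metadata databases.
REQUIRED_SUFFIX = {"zmd": "-zippwd", "rmd": "-rarpwd"}


def check_signature_name(name, db_type=None):
    """Return a list of naming-convention problems (empty if none)."""
    problems = []
    if not VALID_CHARS.fullmatch(name):
        problems.append("only letters, digits, '-', '.' and '_' are allowed")
    if name.split(".", 1)[0] not in KNOWN_PREFIXES:
        problems.append("name does not start with a known prefix")
    suffix = REQUIRED_SUFFIX.get(db_type)
    if suffix and not name.endswith(suffix):
        problems.append(f"{db_type} signatures need the {suffix} suffix")
    return problems
    \end{verbatim}
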

    \subsection{Using YARA rules in ClamAV}
    ClamAV version 0.99 and above can process YARA rules. ClamAV virus database files with names
    ending in ``.yar'' or ``.yara'' are parsed as YARA rule files. The YARA rule grammar
    documentation may be found at \url{http://plusvic.github.io/yara/}. There are currently a few
    limitations on using YARA rules within ClamAV:
    \begin{itemize}
        \item YARA modules are not yet supported by ClamAV. This includes the ``import''
        keyword and any YARA module-specific keywords.
        \item Global rules (``global'' keyword) are not supported by ClamAV.
        \item External variables (``contains'' and ``matches'' keywords) are not supported.
        \item YARA rules pre-compiled with the \emph{yarac} command are not supported.
        \item As in the ClamAV logical and extended signature formats, YARA strings and segments
        of strings separated by wildcards must represent at least two octets of data.
        \item There is a maximum of 64 strings per YARA rule.
        \item YARA rules in ClamAV must contain at least one literal, hexadecimal, or
        regular expression string.
    \end{itemize}
    In addition, there are a few more ClamAV processing modes that may affect the outcome of YARA rules.
    \begin{itemize}
        \item \emph{File decomposition and decompression}: since ClamAV uses file decomposition and decompression
               to find viruses within de-archived and uncompressed inner files, YARA rules executed by ClamAV
               will match against these files as well.
        \item \emph{Normalization}: by default, ClamAV normalizes HTML, JavaScript, and ASCII text files.
              YARA rules in ClamAV will match against the normalized result. The effects of normalization
              of these file types may be captured using \verb+clamscan --leave-temps --tempdir=mytempdir+.
              YARA rules may then be written using the normalized file(s) found in \verb+mytempdir+.
              Alternatively, starting with ClamAV 0.100.0, \verb+clamscan --normalize=no+ will prevent
              normalization and only scan the raw file. To obtain similar behavior prior to 0.100.0, use
              \verb+clamscan --scan-html=no+. The corresponding parameters for clamd.conf are \verb+Normalize+
              and \verb+ScanHTML+.
        \item \emph{YARA conditions driven by string matches}: all YARA conditions are driven by string matches in
              ClamAV. This avoids executing every YARA rule on every file. Any YARA condition may be augmented
              with a string match clause which is always true, such as:
        \begin{verbatim}
          rule CheckFileSize
          {
            strings:
              $abc = "abc"
            condition:
              ($abc or not $abc) and filesize < 200KB
          }
        \end{verbatim}
        This ensures that the YARA condition always performs the desired action (checking the file size in this example).
    \end{itemize}

    \subsection{Passwords for archive files [experimental]}
    ClamAV 0.99 allows users to specify passwords to try on certain password-protected archives.
    Passwords are tried in the order in which they appear in the password signature file, which uses
    the extension \verb+.pwdb+. If no passwords apply or none are provided, ClamAV falls back to the
    original behavior of parsing the file.
    Currently, as of ClamAV 0.99 [flevel 81], only \verb+.zip+ archives using the traditional PKWARE encryption
    are supported.
    The signature format is
 \begin{verbatim}
 SignatureName;TargetDescriptionBlock;PWStorageType;Password
 \end{verbatim}
    where:
    \begin{itemize}
        \item \verb+SignatureName+: name to be displayed during debug when a password is successful
        \item \verb+TargetDescriptionBlock+: provides information about the engine and target file as comma-separated \verb+Arg:Val+ pairs
        \begin{itemize}
             \item \verb+Engine:X-Y+: required engine functionality
             \item \verb+Container:CL_TYPE_*+: file type of applicable containers
        \end{itemize}
        \item \verb+PWStorageType+: determines how the password field is parsed
        \begin{itemize}
            \item 0 = cleartext
            \item 1 = hex
        \end{itemize}
        \item \verb+Password+: value used in the password attempt
    \end{itemize}
     The signatures for password attempts are stored inside \verb+.pwdb+ files.
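
    A minimal Python sketch for generating \verb+.pwdb+ entries follows. The helper
    \verb+make_pwdb_entry+ is hypothetical; the default \verb+Engine:81-255+ reflects the flevel 81
    requirement mentioned above, and refusing cleartext passwords that contain ';' (use the hex
    storage type instead) is a precaution assumed here, since ';' is the field separator.
    \begin{verbatim}
def make_pwdb_entry(name, password, container="CL_TYPE_ZIP",
                    engine="81-255", as_hex=False):
    """Build one .pwdb line (hypothetical helper)."""
    tdb = f"Engine:{engine},Container:{container}"
    if as_hex:
        # PWStorageType 1: the password is stored hex-encoded.
        return f"{name};{tdb};1;{password.encode('utf-8').hex()}"
    if ";" in password:
        # ';' separates fields, so such passwords are safer in hex.
        raise ValueError("use as_hex=True for passwords containing ';'")
    # PWStorageType 0: cleartext.
    return f"{name};{tdb};0;{password}"
    \end{verbatim}
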
 
    \section{Special files}

    \subsection{HTML}
    ClamAV contains special HTML normalization code which helps to detect
    HTML exploits. Running \verb+sigtool --html-normalise+ on an HTML file
    should generate the following files:
    \begin{itemize}
	\item nocomment.html - the file is normalized, lower-case, with all
	comments and superfluous white space removed
	\item notags.html - as above but with all HTML tags removed
    \end{itemize}
    The code automatically decodes JScript.encode parts and character references (e.g.
    \verb+&#102;+). You need to create a signature against one of the created
    files. To eliminate potential false positive alerts, the target type should
    be set to 3.

    \subsection{Text files}
    Similarly to HTML, all ASCII text files get normalized (converted
    to lower-case, with all superfluous white space and control characters removed,
    etc.) before scanning. Use \verb+clamscan --leave-temps+ to obtain
    a normalized file, then create a signature with the target type 7.

    \subsection{Compressed Portable Executable files}
    If the file is compressed with UPX, FSG, Petite or another PE packer
    supported by libclamav, run \verb+clamscan+ with
    \verb+--debug --leave-temps+. Example output for an FSG-compressed file:
    \begin{verbatim}
 LibClamAV debug: UPX/FSG/MEW: empty section found - assuming compression
 LibClamAV debug: FSG: found old EP @119e0
 LibClamAV debug: FSG: Unpacked and rebuilt executable saved in
 /tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c
    \end{verbatim}
    Next, create a signature with target type 1 (PE) for \verb+/tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c+.
 
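    Choosing which bytes of the unpacked file to use is the analyst's decision; once an offset
    and length are chosen, the remaining mechanics are simple. The Python sketch below
    (hypothetical helper \verb+ndb_from_file+) reads the chosen bytes and emits a line in the
    \verb+MalwareName:TargetType:Offset:HexSignature+ layout of \verb+.ndb+ files, with target
    type 1 and \verb+*+ (any offset); that layout is assumed here from the extended signature
    format, as in the \verb+my_test_vi_sig+ example above.
    \begin{verbatim}
def ndb_from_file(path, name, offset, length=32, target_type=1):
    """Return 'name:target_type:*:hex' for `length` bytes at `offset`
    of the unpacked file (hypothetical helper)."""
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read(length)
    if len(chunk) < length:
        raise ValueError("file too short for the requested slice")
    return f"{name}:{target_type}:*:{chunk.hex()}"
    \end{verbatim}
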
 \end{document}