Restructured the signature writing documentation, and supplemented it with dconf documentation, file type magic documentation, and references for ClamAV functionality levels.

Micah Snyder authored on 2018/11/08 06:24:28
Showing 22 changed files
... ...
@@ -15,7 +15,6 @@ Table Of Contents
15 15
 5. [ClamAV Developer Tips and Tricks](UserManual/development.md)
16 16
 6. [Build \[lib\]ClamAV Into Your Programs](UserManual/libclamav.md)
17 17
 7. [Writing ClamAV Signatures](UserManual/Signatures.md)
18
-8. [Writing ClamAV Phishing Signatures](UserManual/PhishSigs.md)
19 18
 
20 19
 -----
21 20
 
22 21
deleted file mode 100644
... ...
@@ -1,89 +0,0 @@
1
-# ClamAV File Types
2
-
3
-ClamAV maintains its own file typing format and assigns these types using one of the following:
4
-
5
-- Evaluation of a unique sequence of bytes at the start of a file (file type magic).
6
-- File type indicators when parsing container files.
7
-  - For example:
8
-    CL_TYPE_SCRIPT may be assigned to data contained in a PDF when the PDF indicates that a stream of bytes is "Javascript"
9
-- File type determination based on the names or characteristics contained within the file.
10
-  - For example:
11
-    CL_TYPE_OOXML_WORD may be assigned to a Zip file containing files with specific names.
12
-
13
-## CL_TYPEs
14
-
15
-ClamAV Types are prefixed with `CL_TYPE_`.  The following is an exhaustive list of all current CL_TYPEs.
16
-
17
-| CL_TYPE                | Description                                                                       |
18
-|------------------------|-----------------------------------------------------------------------------------|
19
-| `CL_TYPE_7Z`           | 7-Zip Archive                                                                     |
20
-| `CL_TYPE_7ZSFX`        | Self-Extracting 7-Zip Archive                                                     |
21
-| `CL_TYPE_APM`          | Disk Image - Apple Partition Map                                                  |
22
-| `CL_TYPE_ARJ`          | ARJ Archive                                                                       |
23
-| `CL_TYPE_ARJSFX`       | Self-Extracting ARJ Archive                                                       |
24
-| `CL_TYPE_AUTOIT`       | AutoIt Automation Executable                                                      |
25
-| `CL_TYPE_BINARY_DATA`  | binary data                                                                       |
26
-| `CL_TYPE_BINHEX`       | BinHex Macintosh 7-bit ASCII email attachment encoding                            |
27
-| `CL_TYPE_BZ`           | BZip Compressed File                                                              |
28
-| `CL_TYPE_CABSFX`       | Self-Extracting Microsoft CAB Archive                                             |
29
-| `CL_TYPE_CPIO_CRC`     | CPIO Archive (CRC)                                                                |
30
-| `CL_TYPE_CPIO_NEWC`    | CPIO Archive (NEWC)                                                               |
31
-| `CL_TYPE_CPIO_ODC`     | CPIO Archive (ODC)                                                                |
32
-| `CL_TYPE_CPIO_OLD`     | CPIO Archive (OLD, Little Endian or Big Endian)                                   |
33
-| `CL_TYPE_CRYPTFF`      | Files encrypted by CryptFF malware                                                |
34
-| `CL_TYPE_DMG`          | Apple DMG Archive                                                                 |
35
-| `CL_TYPE_ELF`          | ELF Executable (Linux/Unix program or library)                                    |
36
-| `CL_TYPE_GPT`          | Disk Image - GUID Partition Table                                                 |
37
-| `CL_TYPE_GRAPHICS`     | TIFF (Little Endian or Big Endian)                                                |
38
-| `CL_TYPE_GZ`           | GZip Compressed File                                                              |
39
-| `CL_TYPE_HTML_UTF16`   | Wide-Character / UTF16 encoded HTML                                               |
40
-| `CL_TYPE_HTML`         | HTML data                                                                         |
41
-| `CL_TYPE_HWP3`         | Hangul Word Processor (3.X)                                                       |
42
-| `CL_TYPE_HWPOLE2`      | Hangul Word Processor embedded OLE2                                               |
43
-| `CL_TYPE_INTERNAL`     | Internal properties                                                               |
44
-| `CL_TYPE_ISHIELD_MSI`  | Windows Install Shield MSI installer                                              |
45
-| `CL_TYPE_ISO9660`      | ISO 9660 file system for optical disc media                                       |
46
-| `CL_TYPE_JAVA`         | Java Class File                                                                   |
47
-| `CL_TYPE_LNK`          | Microsoft Windows Shortcut File                                                   |
48
-| `CL_TYPE_MACHO_UNIBIN` | Universal Binary/Java Bytecode                                                    |
49
-| `CL_TYPE_MACHO`        | Apple/NeXTSTEP Mach-O Executable file format                                      |
50
-| `CL_TYPE_MAIL`         | Email file                                                                        |
51
-| `CL_TYPE_MBR`          | Disk Image - Master Boot Record                                                   |
52
-| `CL_TYPE_MHTML`        | MHTML Saved Web Page                                                              |
53
-| `CL_TYPE_MSCAB`        | Microsoft CAB Archive                                                             |
54
-| `CL_TYPE_MSCHM`        | Microsoft CHM help archive                                                        |
55
-| `CL_TYPE_MSEXE`        | Microsoft EXE / DLL Executable file                                               |
56
-| `CL_TYPE_MSOLE2`       | Microsoft OLE2 Container file                                                     |
57
-| `CL_TYPE_MSSZDD`       | Microsoft Compressed EXE                                                          |
58
-| `CL_TYPE_NULSFT`       | NullSoft Scripted Installer program                                               |
59
-| `CL_TYPE_OLD_TAR`      | TAR archive (old)                                                                 |
60
-| `CL_TYPE_OOXML_HWP`    | Hangul Office Open Word Processor (5.X)                                           |
61
-| `CL_TYPE_OOXML_PPT`    | Microsoft Office Open XML PowerPoint                                              |
62
-| `CL_TYPE_OOXML_WORD`   | Microsoft Office Open Word 2007+                                                  |
63
-| `CL_TYPE_OOXML_XL`     | Microsoft Office Open Excel 2007+                                                 |
64
-| `CL_TYPE_PART_HFSPLUS` | Apple HFS+ partition                                                              |
65
-| `CL_TYPE_PDF`          | Adobe PDF document                                                                |
66
-| `CL_TYPE_POSIX_TAR`    | TAR archive                                                                       |
67
-| `CL_TYPE_PS`           | Postscript                                                                        |
68
-| `CL_TYPE_RAR`          | RAR Archive                                                                       |
69
-| `CL_TYPE_RARSFX`       | Self-Extracting RAR Archive                                                       |
70
-| `CL_TYPE_RIFF`         | Resource Interchange File Format container formatted file                         |
71
-| `CL_TYPE_RTF`          | Rich Text Format document                                                         |
72
-| `CL_TYPE_SCRENC`       | Files encrypted by ScrEnc malware                                                 |
73
-| `CL_TYPE_SCRIPT`       | Generic type for scripts that don't have their own type (Javascript, Python, etc) |
74
-| `CL_TYPE_SIS`          | Symbian OS Software Installation Script Archive                                   |
75
-| `CL_TYPE_SWF`          | Adobe Flash File (LZMA, Zlib, or uncompressed)                                    |
76
-| `CL_TYPE_TEXT_ASCII`   | ASCII text                                                                        |
77
-| `CL_TYPE_TEXT_UTF16BE` | UTF-16BE text                                                                     |
78
-| `CL_TYPE_TEXT_UTF16LE` | UTF-16LE text                                                                     |
79
-| `CL_TYPE_TEXT_UTF8`    | UTF-8 text                                                                        |
80
-| `CL_TYPE_TNEF`         | Microsoft Outlook & Exchange email attachment format                              |
81
-| `CL_TYPE_UUENCODED`    | UUEncoded (Unix-to-Unix) binary file (Unix email attachment format)               |
82
-| `CL_TYPE_XAR`          | XAR Archive                                                                       |
83
-| `CL_TYPE_XDP`          | Adobe XDP - Embedded PDF                                                          |
84
-| `CL_TYPE_XML_HWP`      | Hangul Word Processor XML (HWPML) Document                                        |
85
-| `CL_TYPE_XML_WORD`     | Microsoft Word 2003 XML Document                                                  |
86
-| `CL_TYPE_XML_XL`       | Microsoft Excel 2003 XML Document                                                 |
87
-| `CL_TYPE_XZ`           | XZ Archive                                                                        |
88
-| `CL_TYPE_ZIP`          | Zip Archive                                                                       |
89
-| `CL_TYPE_ZIPSFX`       | Self-Extracting Zip Archive                                                       |
... ...
@@ -2,7 +2,7 @@
2 2
 
3 3
 Below are the steps for installing ClamAV from source on Debian and Ubuntu Linux.
4 4
 
5
-## Install prerequisitesaa
5
+## Install prerequisites
6 6
 
7 7
 1. Install ClamAV dependencies
8 8
     1. Install the developer tools
... ...
@@ -2,7 +2,7 @@
2 2
 
3 3
 Below are the steps for installing ClamAV from source on Debian and Ubuntu Linux.
4 4
 
5
-## Install prerequisitesaa
5
+## Install prerequisites
6 6
 
7 7
 1. Install ClamAV dependencies
8 8
     1. Install the developer tools
9 9
deleted file mode 100644
... ...
@@ -1,682 +0,0 @@
1
-# PhishSigs
2
-
3
-Table of Contents
4
-- [PhishSigs](#phishsigs)
5
-- [Database file format](#database-file-format)
6
-    - [PDB format](#pdb-format)
7
-    - [GDB format](#gdb-format)
8
-    - [WDB format](#wdb-format)
9
-    - [Hints](#hints)
10
-    - [Examples of PDB signatures](#examples-of-pdb-signatures)
11
-    - [Examples of WDB signatures](#examples-of-wdb-signatures)
12
-    - [Example for how the URL extractor works](#example-for-how-the-url-extractor-works)
13
-    - [How matching works](#how-matching-works)
14
-        - [RealURL, displayedURL concatenation](#realurl-displayedurl-concatenation)
15
-        - [What happens when a match is found](#what-happens-when-a-match-is-found)
16
-        - [Extraction of realURL, displayedURL from HTML tags](#extraction-of-realurl-displayedurl-from-html-tags)
17
-        - [Example](#example)
18
-    - [Simple patterns](#simple-patterns)
19
-    - [Regular expressions](#regular-expressions)
20
-    - [Flags](#flags)
21
-- [Introduction to regular expressions](#introduction-to-regular-expressions)
22
-    - [Special characters](#special-characters)
23
-    - [Character classes](#character-classes)
24
-    - [Escaping](#escaping)
25
-    - [Alternation](#alternation)
26
-    - [Optional matching, and repetition](#optional-matching-and-repetition)
27
-    - [Groups](#groups)
28
-- [How to create database files](#how-to-create-database-files)
29
-    - [How to create and maintain the whitelist (daily.wdb)](#how-to-create-and-maintain-the-whitelist-dailywdb)
30
-    - [How to create and maintain the domainlist (daily.pdb)](#how-to-create-and-maintain-the-domainlist-dailypdb)
31
-    - [Dealing with false positives, and undetected phishing mails](#dealing-with-false-positives-and-undetected-phishing-mails)
32
-        - [False positives](#false-positives)
33
-        - [Undetected phish mails](#undetected-phish-mails)
34
-
35
-# Database file format
36
-
37
-## PDB format
38
-
39
-This file contains URLs/hosts that are targets of phishing attempts. It
40
-contains lines in the following format:
41
-
42
-```
43
-    R[Filter]:RealURL:DisplayedURL[:FuncLevelSpec]
44
-    H[Filter]:DisplayedHostname[:FuncLevelSpec]
45
-```
46
-
47
-- `R`
48
-
49
-  regular expression, for the concatenated URL
50
-
51
-- `H`
52
-
53
-  matches the `DisplayedHostname` as a simple pattern (literally, no regular expression)
54
-
55
-  - the pattern can match either the full hostname
56
-
57
-  - or a subdomain of the specified hostname
58
-
59
-  - to avoid false matches in case of subdomain matches, the engine checks that there is a dot(`.`) or a space(` `) before the matched portion
60
-
61
-- `Filter`
62
-
63
-  is ignored for R and H for compatibility reasons
64
-
65
-- `RealURL`
66
-
67
-  is the URL the user is sent to, example: *href* attribute of an html anchor (*\<a\> tag*)
68
-
69
-- `DisplayedURL`
70
-
71
-  is the URL description displayed to the user, where it is *claimed* they are being sent; for example, the contents of an HTML anchor (*\<a\> tag*)
72
-
73
-- `DisplayedHostname`
74
-
75
-  is the hostname portion of the DisplayedURL
76
-
77
-- `FuncLevelSpec`
78
-
79
-  an (optional) functionality level, 2 formats are possible:
80
-
81
-  - `minlevel` all engines having functionality level \>= `minlevel` will load this line
82
-
83
-  - `minlevel-maxlevel` engines with functionality level \>= `minlevel` and \< `maxlevel` will load this line
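The `H` matching rule above (full hostname, or a subdomain preceded by a dot or a space) could be sketched as follows. This is only an illustration in Python, not the engine's code: the function name `h_pattern_matches` is hypothetical, and treating the pattern as a suffix of the displayed hostname is an assumption.

```python
def h_pattern_matches(pattern, displayed_hostname):
    """Literal (non-regex) match of an H pattern against a displayed hostname.

    Assumption: a subdomain match means the hostname ends with the pattern.
    """
    # Case 1: the pattern is the full hostname.
    if displayed_hostname == pattern:
        return True
    # Case 2: the pattern is the tail of the hostname; require a dot or a
    # space right before the matched portion to avoid false matches.
    if displayed_hostname.endswith(pattern):
        preceding = displayed_hostname[-len(pattern) - 1]
        return preceding in ". "
    return False
```

With the pattern `amazon.com`, this accepts `amazon.com` and `www.amazon.com` but rejects `notamazon.com`, which is the false match the dot/space check is meant to prevent.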
84
-
85
-## GDB format
86
-
87
-This file contains URL hashes in the following format:
88
-
89
-    S:P:HostPrefix[:FuncLevelSpec]
90
-    S:F:Sha256hash[:FuncLevelSpec]
91
-    S1:P:HostPrefix[:FuncLevelSpec]
92
-    S1:F:Sha256hash[:FuncLevelSpec]
93
-    S2:P:HostPrefix[:FuncLevelSpec]
94
-    S2:F:Sha256hash[:FuncLevelSpec]
95
-    S:W:Sha256hash[:FuncLevelSpec]
96
-
97
-- `S:`
98
-
99
-  These are hashes for Google Safe Browsing - malware sites, and should not be used for other purposes.
100
-
101
-- `S2:`
102
-
103
-  These are hashes for Google Safe Browsing - phishing sites, and should not be used for other purposes.
104
-
105
-- `S1:`
106
-
107
-  Hashes for blacklisting phishing sites. Virus name: Phishing.URL.Blacklisted
108
-
109
-- `S:W:`
110
-
111
-  Locally whitelisted hashes.
112
-
113
-- `HostPrefix`
114
-
115
-  4-byte prefix of the sha256 hash of the last 2 or 3 components of the hostname. If prefix doesn’t match, no further lookups are performed.
116
-
117
-- `Sha256hash`
118
-
119
-  sha256 hash of the canonicalized URL, or a sha256 hash of its prefix/suffix according to the Google Safe Browsing “Performing Lookups” rules. There should be a corresponding `:P:HostPrefix` entry for the hash to be taken into consideration.
120
-
121
-To see which hash/URL matched, look at the `clamscan --debug` output, and look for the following strings: `Looking up hash`, `prefix matched`, and `Hash matched`. Local whitelisting of .gdb entries can be done by creating a local.gdb file, and adding a line `S:W:<HASH>`.
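As a rough illustration of the `HostPrefix` field, the sketch below takes the last 2 and 3 components of a hostname and computes the first 4 bytes of their SHA-256 hash. The helper names are hypothetical, and the exact string that gets hashed (for example whether a trailing slash or other canonicalization is applied) is not stated here, so hashing the bare components is an assumption.

```python
import hashlib


def host_key_candidates(hostname):
    """Return the last 2 and last 3 dot-separated components of a hostname."""
    parts = hostname.strip(".").split(".")
    return [".".join(parts[-n:]) for n in (2, 3) if len(parts) >= n]


def host_prefix(host_key):
    """First 4 bytes of the SHA-256 digest of the host key (assumed uncanonicalized)."""
    return hashlib.sha256(host_key.encode("ascii")).digest()[:4]


if __name__ == "__main__":
    for key in host_key_candidates("a.b.example.com"):
        print(key, host_prefix(key).hex())
```

If none of the computed prefixes appears in a `:P:` entry, no further lookups would be performed for that URL.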
122
-
123
-## WDB format
124
-
125
-This file contains whitelisted URL pairs. It contains lines in the following format:
126
-
127
-```
128
-    X:RealURL:DisplayedURL[:FuncLevelSpec]
129
-    M:RealHostname:DisplayedHostname[:FuncLevelSpec]
130
-```
131
-
132
-- `X`
133
-
134
-  regular expression, for the *entire URL*, not just the hostname
135
-
136
-  - The regular expression is by default anchored to start-of-line and end-of-line, as if you have used `^RegularExpression$`
137
-
138
-  - A trailing `/` is automatically added both to the regex, and the input string to avoid false matches
139
-
140
-  - The regular expression matches the *concatenation* of the RealURL, a colon(`:`), and the DisplayedURL as a single string. It doesn’t separately match RealURL and DisplayedURL!
141
-
142
-- `M`
143
-
144
-  matches hostname, or subdomain of it, see notes for H above
145
-
146
-## Hints
147
-
148
-- empty lines are ignored
149
-
150
-- the colons are mandatory
151
-
152
-- Don’t leave extra spaces at the end of a line!
153
-
154
-- if any of the lines don’t conform to this format, clamav will abort with a Malformed Database Error
155
-
156
-- see section [Extraction of realURL, displayedURL from HTML tags](#extraction-of-realurl-displayedurl-from-html-tags) for more details on realURL/displayedURL
157
-
158
-## Examples of PDB signatures
159
-
160
-To check for phishing mails that target amazon.com, or subdomains of
161
-amazon.com:
162
-
163
-```
164
-    H:amazon.com
165
-```
166
-
167
-To do the same, but for amazon.co.uk:
168
-
169
-```
170
-    H:amazon.co.uk
171
-```
172
-
173
-To limit the signatures to certain engine versions:
174
-
175
-```
176
-    H:amazon.co.uk:20-30
177
-    H:amazon.co.uk:20-
178
-    H:amazon.co.uk:0-20
179
-```
180
-
181
-First line: engine versions 20, 21, ..., 29 can load it
182
-
183
-Second line: engine versions \>= 20 can load it
184
-
185
-Third line: engine versions \< 20 can load it
186
-
187
-In a real situation, you’d probably use the second form. A situation like that would be if you are using a feature of the signatures not available in earlier versions, or if earlier versions have bugs with your signature. Neither is the case here; the above examples are for illustrative purposes only.
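The `FuncLevelSpec` rules (a lower bound that is inclusive, an optional upper bound that is exclusive) can be sketched like this. It is a minimal Python illustration with hypothetical function names, not the parser used by libclamav.

```python
def parse_func_level_spec(spec):
    """Parse 'min', 'min-' or 'min-max' into (minlevel, maxlevel or None)."""
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low), (int(high) if high else None)
    return int(spec), None


def engine_loads(level, spec):
    """True if an engine at functionality level `level` would load the line."""
    minlevel, maxlevel = parse_func_level_spec(spec)
    # minlevel is inclusive, maxlevel (when present) is exclusive.
    return level >= minlevel and (maxlevel is None or level < maxlevel)
```

For the three example lines, `engine_loads(29, "20-30")` holds while `engine_loads(30, "20-30")` does not, `"20-"` admits every level from 20 upward, and `"0-20"` admits only levels below 20.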
188
-
189
-## Examples of WDB signatures
190
-
191
-To allow amazon’s country specific domains and amazon.com, to mix domain names in DisplayedURL, and RealURL:
192
-
193
-    X:.+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?:17-
194
-
195
-Explanation of this signature:
196
-
197
-- `X:`
198
-
199
-  this is a regular expression
200
-
201
-- `:17-`
202
-
203
-  load signature only for engines with functionality level \>= 17 (recommended for type X)
204
-
205
-The regular expression is the following (X:, :17- stripped, and a / appended)
206
-
207
-```
208
-    .+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?/
209
-```
210
-
211
-Explanation of this regular expression (note that it is a single regular expression, and not 2 regular expressions split at the :).
212
-
213
-- `.+`
214
-
215
-  any subdomain of
216
-
217
-- `\.amazon\.`
218
-
219
-  domain we are whitelisting (RealURL part)
220
-
221
-- `(at|ca|co\.uk|co\.jp|de|fr)`
222
-
223
-  country-domains: at, ca, co.uk, co.jp, de, fr
224
-
225
-- `([/?].*)?`
226
-
227
-  recommended way to end the real URL part of a whitelist entry; this protects against embedded URLs (evilurl.example.com/amazon.co.uk/)
228
-
229
-- `:`
230
-
231
-  RealURL and DisplayedURL are concatenated via a :, so match a literal : here
232
-
233
-- `.+`
234
-
235
-  any subdomain of
236
-
237
-- `\.amazon\.com`
238
-
239
-  whitelisted DisplayedURL
240
-
241
-- `([/?].*)?`
242
-
243
-  recommended way to end displayed url part, to protect against embedded URLs
244
-
245
-- `/`
246
-
247
-  automatically added to further protect against embedded URLs
248
-
249
-When you whitelist an entry, make sure you check that both domains are owned by the same entity. What this whitelist entry allows is: links claiming to point to amazon.com (DisplayedURL), but that really go to a country-specific domain of amazon (RealURL).
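To see how the example `X` entry behaves, the sketch below applies the stripped regular expression (with the `/` appended) to the concatenation `RealURL:DisplayedURL/`, anchored at both ends. It uses Python's `re` module as a stand-in for the POSIX regex core, which is an assumption, and the function name `is_whitelisted` is hypothetical.

```python
import re

# The regular expression from the example, after removing "X:" and ":17-"
# and appending the automatic trailing "/".
PATTERN = re.compile(
    r".+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?/"
)


def is_whitelisted(real_url, displayed_url):
    """Match the whole 'RealURL:DisplayedURL/' string, as if wrapped in ^...$."""
    candidate = f"{real_url}:{displayed_url}/"
    return PATTERN.fullmatch(candidate) is not None
```

Under these assumptions a link displayed as `http://www.amazon.com` that really goes to `http://www.amazon.de` is whitelisted, while `http://evilurl.example.com/amazon.co.uk/` is not, because the country domain there is only part of the path.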
250
-
251
-## Example for how the URL extractor works
252
-
253
-Consider the following HTML file:
254
-
255
-```html
256
-    <html>
257
-    <a href="http://1.realurl.example.com/">
258
-      1.displayedurl.example.com
259
-    </a>
260
-    <a href="http://2.realurl.example.com">
261
-      2 d<b>i<p>splayedurl.e</b>xa<i>mple.com
262
-    </a>
263
-    <a href="http://3.realurl.example.com"> 
264
-      3.nested.example.com
265
-      <a href="http://4.realurl.example.com">
266
-        4.displayedurl.example.com
267
-      </a>
268
-    </a>
269
-    <form action="http://5.realurl.example.com">
270
-      sometext
271
-      <img src="http://5.displayedurl.example.com/img0.gif"/>
272
-      <a href="http://5.form.nested.displayedurl.example.com">
273
-        5.form.nested.link-displayedurl.example.com
274
-      </a>
275
-    </form>
276
-    <a href="http://6.realurl.example.com">
277
-      6.displ
278
-      <img src="6.displayedurl.example.com/img1.gif"/>
279
-      ayedurl.example.com
280
-    </a>
281
-    <a href="http://7.realurl.example.com">
282
-      <iframe src="http://7.displayedurl.example.com">
283
-    </a>
284
-```
285
-
286
-The phishing engine extracts the following
287
-RealURL/DisplayedURL pairs from it:
288
-
289
-```
290
-    http://1.realurl.example.com/
291
-    1.displayedurl.example.com
292
-
293
-    http://2.realurl.example.com
294
-    2displayedurl.example.com
295
-
296
-    http://3.realurl.example.com
297
-    3.nested.example.com
298
-
299
-    http://4.realurl.example.com
300
-    4.displayedurl.example.com
301
-
302
-    http://5.realurl.example.com
303
-    http://5.displayedurl.example.com/img0.gif
304
-
305
-    http://5.realurl.example.com
306
-    http://5.form.nested.displayedurl.example.com
307
-
308
-    http://5.form.nested.displayedurl.example.com
309
-    5.form.nested.link-displayedurl.example.com
310
-
311
-    http://6.realurl.example.com
312
-    6.displayedurl.example.com
313
-
314
-    http://6.realurl.example.com
315
-    6.displayedurl.example.com/img1.gif
316
-```
317
-
318
-## How matching works
319
-
320
-### RealURL, displayedURL concatenation
321
-
322
-The phishing detection module processes pairs of RealURL/DisplayedURL. Matching against daily.wdb is done as follows: the realURL is concatenated with a `:`, and with the DisplayedURL, then that *line* is matched against the lines in daily.wdb/daily.pdb
323
-
324
-So if you have this line in daily.wdb:
325
-
326
-    M:www.google.ro:www.google.com
327
-
328
-and this href: `<a href='http://www.google.ro'>www.google.com</a>` then it will be whitelisted, but: `<a href='http://images.google.com'>www.google.com</a>` will not.
329
-
330
-### What happens when a match is found
331
-
332
-In the case of the whitelist, a match means that the RealURL/DisplayedURL combination is considered clean, and no further checks are performed on it.
333
-
334
-In the case of the domainlist, a match means that the RealURL/displayedURL is going to be checked for phishing attempts.
335
-
336
-Furthermore you can restrict what checks are to be performed by specifying the 3-digit hexnumber.
337
-
338
-### Extraction of realURL, displayedURL from HTML tags
339
-
340
-The html parser extracts pairs of realURL/displayedURL based on the following rules.
341
-
342
-In version 0.93: After URLs have been extracted, they are normalized, and cut after the hostname. `http://test.example.com/path/somecgi?queryparameters` becomes `http://test.example.com/`
343
-
344
-- `a`
345
-
346
-  (anchor) the *href* is the realURL, its *contents* is the displayedURL
347
-
348
-  - contents
349
-    is the tag-stripped contents of the \<a\> tags, so for example \<b\> tags are stripped (but not their contents)
350
-
351
-  nesting another \<a\> tag within an \<a\> tag (besides being invalid html) is treated as a \</a\>\<a..
352
-
353
-- `form`
354
-
355
-  the *action* attribute is the realURL, and a nested \<a\> tag is the displayedURL
356
-
357
-- `img/area`
358
-
359
-  if nested within an *\<a\>* tag, the realURL is the *href* of the a tag, and the *src/dynsrc/area* is the displayedURL of the img
360
-
361
-  if nested within a *form* tag, then the action attribute of the *form* tag is the realURL
362
-
363
-- `iframe`
364
-
365
-  if nested within an *\<a\>* tag the *src* attribute is the displayedURL, and the *href* of its parent *a* tag is the realURL
366
-
367
-  if nested within a *form* tag, then the action attribute of the *form* tag is the realURL
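The 0.93 normalization step mentioned above (cutting an extracted URL after its hostname) might look roughly like this. The sketch relies on Python's standard `urllib.parse`; the function name is hypothetical and the engine's real normalization may do more than this.

```python
from urllib.parse import urlsplit


def cut_after_hostname(url):
    """Keep only scheme and host, dropping path, query and fragment."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


if __name__ == "__main__":
    print(cut_after_hostname("http://test.example.com/path/somecgi?queryparameters"))
```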
368
-
369
-### Example
370
-
371
-Consider this html file:
372
-
373
-```html
374
-<a href="evilurl">www.paypal.com</a>
375
-
376
-<a href="evilurl2" title="www.ebay.com">click here to sign
377
-in</a>
378
-
379
-<form action="evilurl_form">
380
-
381
-Please sign in to <a href="cgi.ebay.com">Ebay</a> using this
382
-form
383
-
384
-<input type='text' name='username'>Username</input>
385
-
386
-....
387
-
388
-</form>
389
-
390
-<a href="evilurl"><img src="images.paypal.com/secure.jpg"></a>
391
-```
392
-
393
-The resulting realURL/displayedURL pairs will be (note that one tag can generate multiple pairs):
394
-
395
-- evilurl / www.paypal.com
396
-
397
-- evilurl2 / click here to sign in
398
-
399
-- evilurl2 / www.ebay.com
400
-
401
-- evilurl_form / cgi.ebay.com
402
-
403
-- cgi.ebay.com / Ebay
404
-
405
-- evilurl / images.paypal.com/secure.jpg
406
-
407
-## Simple patterns
408
-
409
-Simple patterns are matched literally, i.e. if you say:
410
-
411
-```
412
-www.google.com
413
-```
414
-
415
-it is going to match *www.google.com*, and only that. The *. (dot)* character has no special meaning (see the section on [regular expressions](#regular-expressions) for how the *. (dot)* character behaves there).
416
-
417
-## Regular expressions
418
-
419
-POSIX regular expressions are supported, and you can consider each expression to be wrapped internally by `^` and `$`. In other words, the regular expression has to match the entire concatenated URL (see section [RealURL, displayedURL concatenation](#realurl-displayedurl-concatenation) for details on concatenation).
420
-
421
-It is recommended that you read section [Introduction to regular expressions](#introduction-to-regular-expressions) to learn how to write regular expressions, and then come back and read this for hints.
422
-
423
-Be advised that clamav contains an internal, very basic regex matcher to reduce the load on the regex matching core. Thus it is recommended that you avoid using regex syntax not supported by it at the very beginning of regexes (at least the first few characters).
424
-
425
-Currently the clamav regex matcher supports:
426
-
427
-- `.` (dot) character
428
-
429
-- `\` (escaping special characters)
430
-
431
-- `|` (pipe) alternatives
432
-
433
-- `[]` (character classes)
434
-
435
-- `()` (parenthesis for grouping, but no group extraction is performed)
436
-
437
-- other non-special characters
438
-
439
-Thus the following are not supported:
440
-
441
-- `+` repetition
442
-
443
-- `*` repetition
444
-
445
-- `{}` repetition
446
-
447
-- backreferences
448
-
449
-- lookaround
450
-
451
-- other “advanced” features not listed in the supported list ;)
452
-
453
-This however shouldn’t discourage you from using the “not directly supported” features, because if the internal engine encounters unsupported syntax, it passes it on to the POSIX regex core (beginning from the first unsupported token; everything before that is still processed by the internal matcher). An example might make this more clear:
454
-
455
-`www\.google\.(com|ro|it) ([a-zA-Z])+\.google\.(com|ro|it)`
456
-
457
-Everything up to `([a-zA-Z])+` is processed internally; that parenthesis (and everything beyond) is processed by the POSIX core.
458
-
459
-Examples of url pairs that match:
460
-
461
-- *www.google.ro images.google.ro*
462
-
463
-- www.google.com images.google.ro
464
-
465
-Example of url pairs that don’t match:
466
-
467
-- www.google.ro images1.google.ro
468
-
469
-- images.google.com image.google.com
470
-
471
-## Flags
472
-
473
-Flags are a binary OR of the following numbers:
474
-
475
-- HOST_SUFFICIENT
476
-
477
-  1
478
-
479
-- DOMAIN_SUFFICIENT
480
-
481
-  2
482
-
483
-- DO_REVERSE_LOOKUP
484
-
485
-  4
486
-
487
-- CHECK_REDIR
488
-
489
-  8
490
-
491
-- CHECK_SSL
492
-
493
-  16
494
-
495
-- CHECK_CLOAKING
496
-
497
-  32
498
-
499
-- CLEANUP_URL
500
-
501
-  64
502
-
503
-- CHECK_DOMAIN_REVERSE
504
-
505
-  128
506
-
507
-- CHECK_IMG_URL
508
-
509
-  256
510
-
511
-- DOMAINLIST_REQUIRED
512
-
513
-  512
514
-
515
-The names of the constants are self-explanatory.
516
-
517
-These constants are defined in libclamav/phishcheck.h, you can check there for the latest flags.
518
-
519
-There is a default set of flags that are enabled, these are currently:
520
-
521
-    ( CLEANUP_URL | CHECK_SSL | CHECK_CLOAKING | CHECK_IMG_URL )
522
-
523
-ssl checking is performed only for a tags currently.
524
-
525
-You must decide for each line in the domainlist whether you want to filter any flags (that is, you don’t want certain checks to be done), then calculate the binary OR of those constants and convert it into a 3-digit hexadecimal number. For example, if you decide that DOMAIN_SUFFICIENT shouldn’t be used for ebay.com, and you don’t want to check images either, you come up with this flag number: 2|256 = 258 (decimal) = 102 (hexadecimal)
526
-
527
-So you add this line to daily.wdb:
528
-
529
-- R102 www.ebay.com .+
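The flag arithmetic can be checked with a few lines of Python. The constant values are the ones listed in this section; the dictionary and the helper `filter_hex` are hypothetical names used only for this sketch, and `libclamav/phishcheck.h` remains the authoritative source for the values.

```python
# Flag values as listed in the Flags section above.
PHISH_FLAGS = {
    "HOST_SUFFICIENT": 1,
    "DOMAIN_SUFFICIENT": 2,
    "DO_REVERSE_LOOKUP": 4,
    "CHECK_REDIR": 8,
    "CHECK_SSL": 16,
    "CHECK_CLOAKING": 32,
    "CLEANUP_URL": 64,
    "CHECK_DOMAIN_REVERSE": 128,
    "CHECK_IMG_URL": 256,
    "DOMAINLIST_REQUIRED": 512,
}

# Default set quoted in the text: CLEANUP_URL | CHECK_SSL | CHECK_CLOAKING | CHECK_IMG_URL
DEFAULT_FLAGS = (
    PHISH_FLAGS["CLEANUP_URL"]
    | PHISH_FLAGS["CHECK_SSL"]
    | PHISH_FLAGS["CHECK_CLOAKING"]
    | PHISH_FLAGS["CHECK_IMG_URL"]
)


def filter_hex(*names):
    """Binary OR of the named flags, rendered as a 3-digit hexadecimal string."""
    value = 0
    for name in names:
        value |= PHISH_FLAGS[name]
    return format(value, "03x")


if __name__ == "__main__":
    print(filter_hex("DOMAIN_SUFFICIENT", "CHECK_IMG_URL"))
```

Filtering `DOMAIN_SUFFICIENT` and `CHECK_IMG_URL` gives 2|256 = 258, which is the `102` used in the `R102` line.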
530
-
531
-# Introduction to regular expressions
532
-
533
-Recommended reading:
534
-
535
-- http://www.regular-expressions.info/quickstart.html
536
-
537
-- http://www.regular-expressions.info/tutorial.html
538
-
539
-- regex(7) man-page: http://www.tin.org/bin/man.cgi?section=7\&topic=regex
540
-
541
-## Special characters
542
-
543
-- \[
544
-
545
-  the opening square bracket - it marks the beginning of a character class, see section [Character classes](#character-classes)
546
-
547
-- \\
548
-
549
-  the backslash - escapes special characters, see section [Escaping](#escaping)
550
-
551
-- ^
552
-
553
-  the caret - matches the beginning of a line (not needed in clamav regexes, this is implied)
554
-
555
-- $
556
-
557
-  the dollar sign - matches the end of a line (not needed in clamav regexes, this is implied)
558
-
559
-- .
560
-
561
-  the period or dot - matches *any* character
562
-
563
-- |
564
-
565
-  the vertical bar or pipe symbol - matches either of the tokens on its left and right side, see section [Alternation](#alternation)
566
-
567
-- ?
568
-
569
-  the question mark - matches optionally the left-side token, see section [Optional matching, and repetition](#optional-matching-and-repetition)
570
-
571
-- \*
572
-
573
-  the asterisk or star - matches 0 or more occurrences of the left-side token, see section [Optional matching, and repetition](#optional-matching-and-repetition)
574
-
575
-- +
576
-
577
-  the plus sign - matches 1 or more occurrences of the left-side token, see section [Optional matching, and repetition](#optional-matching-and-repetition)
578
-
579
-- (
580
-
581
-  the opening round bracket - marks beginning of a group, see section [Groups](#groups)
582
-
583
-- )
584
-
585
-  the closing round bracket - marks end of a group, see section [Groups](#groups)
586
-
587
-## Character classes
588
-
589
-## Escaping
590
-
591
-Escaping has two purposes:
592
-
593
-- it allows you to actually match the special characters themselves, for example to match the literal `+`, you would write `\+`
594
-
595
-- it also allows you to match non-printable characters, such as the tab (`\t`), newline (`\n`), etc.
596
-
597
-However since non-printable characters are not valid inside an url, you won’t have a reason to use them.
598
-
599
-## Alternation
600
-
601
-## Optional matching, and repetition
602
-
603
-## Groups
604
-
605
-Groups are usually used together with repetition, or alternation. For example: *(com|it)+* means: match 1 or more repetitions of *com* or *it,* that is it matches: com, it, comcom, comcomcom, comit, itit, ititcom,... you get the idea.
606
-
607
-Groups can also be used to extract substring, but this is not supported by the clam engine, and not needed either in this case.
608
-
609
-# How to create database files
610
-
611
-## How to create and maintain the whitelist (daily.wdb)
612
-
613
-If the phishing code claims that a certain mail is phishing, but it’s not, you have 2 choices:
614
-
615
-- examine your rules in daily.pdb, and fix them if necessary (see section [How-to-create](How-to-create))
616
-
617
-- add it to the whitelist (discussed here)
618
-
619
-Let's assume you are having problems because of links like this in a mail:
620
-
621
-```html
622
-    <a href="http://69.0.241.57/bCentral/L.asp?L=XXXXXXXX">
623
-      http://www.bcentral.it/
624
-    </a>
625
-```
626
-
627
-After investigating those sites further, you decide they are no threat, and create a line like this in daily.wdb:
628
-
629
-```
630
-R http://www\.bcentral\.it/.+
631
-http://69\.0\.241\.57/bCentral/L\.asp?L=.+
632
-```
633
-
634
-Note: URLs like the above can be used to track unique mail recipients, and thus to learn whether somebody actually reads the mails (so the sender can send more spam). However, since this site required no authentication information, it is safe from a phishing point of view.
635
-
636
-## How to create and maintain the domainlist (daily.pdb)
637
-
638
-When not using `--phish-scan-alldomains` (production environments for example), you need to decide which URLs you are going to check.
639
-
640
-Although at first glance it might seem a good idea to check everything, it would produce false positives. Newsletters, ads, etc. in particular are likely to use URLs that look like phishing attempts.
641
-
642
-Let's assume that you’ve recently seen many phishing attempts claiming they come from PayPal. Thus you need to add PayPal to daily.pdb:
643
-
644
-```
645
-R .+ .+\.paypal\.com
646
-```
647
-
648
-The above line will block (detect as phishing) mails that contain URLs that claim to lead to PayPal but in fact do not.
649
-
650
-Be careful not to create regexes that match too broad a range of URLs, though.
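To get a feel for how narrow or broad a displayed-URL expression is, it can help to try it against a few host names first. The following is a rough sketch using Python's `re` module rather than ClamAV itself. It assumes the expression is applied to the whole host name, which may not be how the engine anchors it, and the sample host names and the over-broad `.+` rule are invented for the illustration.

```python
import re

# Displayed-URL expression taken from the daily.pdb line above.
paypal_rule = re.compile(r".+\.paypal\.com")
# Hypothetical over-broad rule, shown only for contrast.
broad_rule = re.compile(r".+")

# Made-up host names for the comparison.
hosts = [
    "www.paypal.com",
    "paypal.com",
    "www.paypal.com.example.net",
    "newsletter.example.org",
]

for host in hosts:
    narrow = paypal_rule.fullmatch(host) is not None
    broad = broad_rule.fullmatch(host) is not None
    print(f"{host}: paypal_rule={narrow} broad_rule={broad}")
```

Under this whole-string assumption the PayPal expression only selects names ending in `.paypal.com`, while the `.+` rule selects every host, which is the kind of rule the warning above is about.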
651
-
652
-## Dealing with false positives, and undetected phishing mails
653
-
654
-### False positives
655
-
656
-Whenever you see a false positive (mail that is detected as phishing, but it's not), you need to examine *why* ClamAV decided that it's phishing. You can do this easily by building ClamAV with debugging (`./configure --enable-experimental --enable-debug`), and then running a tool:
657
-
658
-```bash
659
-$ contrib/phishing/why.py phishing.eml
660
-```
661
-
662
-This will show the URL that triggers the phish verdict, and the reason why that URL is considered a phishing attempt.
663
-
664
-Once you know the reason, you might need to modify daily.pdb (if one of your rules in there is too broad), or you need to add the URL to daily.wdb. If you think the algorithm is incorrect, please file a bug report on bugzilla.clamav.net, including the output of *why.py*.
665
-
666
-### Undetected phish mails
667
-
668
-Using why.py doesn’t help here unfortunately (it will say: clean), so all you can do is:
669
-
670
-```bash
671
-$ clamscan/clamscan --phish-scan-alldomains undetected.eml
672
-```
673
-
674
-Then see if the mail is detected. If it is, you need to add an appropriate line to daily.pdb (see section [How-to-create](How-to-create)).
675
-
676
-If the mail is not detected, then try using:
677
-
678
-```bash
679
-$ clamscan/clamscan --debug undetected.eml | less
680
-```
681
-
682
-Then see which URLs are being checked, whether any of them is in a whitelist, whether all URLs are detected, etc.
... ...
@@ -3,849 +3,131 @@
3 3
 Table of Contents
4 4
 
5 5
 - [Creating signatures for ClamAV](#creating-signatures-for-clamav)
6
-- [Introduction](#introduction)
7
-- [Signature formats](#signature-formats)
8
-    - [Hash-based signatures](#hash-based-signatures)
9
-        - [MD5 hash-based signatures](#md5-hash-based-signatures)
10
-        - [SHA1 and SHA256 hash-based signatures](#sha1-and-sha256-hash-based-signatures)
11
-        - [PE section based hash signatures](#pe-section-based-hash-signatures)
12
-        - [Hash signatures with unknown size](#hash-signatures-with-unknown-size)
13
-    - [Body-based signatures](#body-based-signatures)
14
-        - [Hexadecimal format](#hexadecimal-format)
15
-        - [Wildcards](#wildcards)
16
-        - [Character classes](#character-classes)
17
-        - [Alternate strings](#alternate-strings)
18
-        - [Basic signature format](#basic-signature-format)
19
-        - [Extended signature format](#extended-signature-format)
20
-        - [Logical signatures](#logical-signatures)
21
-        - [Subsignature Modifiers](#subsignature-modifiers)
22
-    - [Special Subsignature Types](#special-subsignature-types)
23
-        - [Macro subsignatures](#macro-subsignatures)
24
-        - [Byte Compare Subsignatures](#byte-compare-subsignatures)
25
-        - [PCRE subsignatures](#pcre-subsignatures)
26
-    - [Icon signatures for PE files](#icon-signatures-for-pe-files)
27
-    - [Signatures for Version Information metadata in PE files](#signatures-for-version-information-metadata-in-pe-files)
28
-    - [Trusted and Revoked Certificates](#trusted-and-revoked-certificates)
29
-    - [Signatures based on container metadata](#signatures-based-on-container-metadata)
30
-    - [Whitelist databases](#whitelist-databases)
31
-    - [Signature names](#signature-names)
32
-    - [Using YARA rules in ClamAV](#using-yara-rules-in-clamav)
33
-    - [Passwords for archive files \[experimental\]](#passwords-for-archive-files-experimental)
34
-- [Signature writing tips and tricks](#signature-writing-tips-and-tricks)
35
-    - [Testing rules with clamscan](#testing-rules-with-clamscan)
36
-    - [Debug information from libclamav](#debug-information-from-libclamav)
37
-    - [Writing signatures for special files](#writing-signatures-for-special-files)
38
-        - [HTML](#html)
39
-        - [Text files](#text-files)
40
-        - [Compressed Portable Executable files](#compressed-portable-executable-files)
41
-    - [Using sigtool](#using-sigtool)
42
-    - [Inspecting signatures inside a CVD file](#inspecting-signatures-inside-a-CVD-file)
43
-    - [External tools](#external-tools)
44
-
45
-# Introduction
6
+    - [Introduction](#introduction)
7
+    - [Database formats](#database-formats)
8
+        - [Settings databases](#settings-databases)
9
+        - [Signature databases](#signature-databases)
10
+            - [Body-based Signatures](#body-based-signatures)
11
+            - [Hash-based Signatures](#hash-based-signatures)
12
+            - [Alternative signature support](#alternative-signature-support)
13
+        - [Other database files](#other-database-files)
14
+        - [Signature names](#signature-names)
15
+    - [Signature Writing Tips and Tricks](#signature-writing-tips-and-tricks)
16
+        - [Testing rules with `clamscan`](#testing-rules-with-clamscan)
17
+        - [Debug information from libclamav](#debug-information-from-libclamav)
18
+        - [Writing signatures for special files](#writing-signatures-for-special-files)
19
+            - [HTML](#html)
20
+            - [Text files](#text-files)
21
+            - [Compressed Portable Executable files](#compressed-portable-executable-files)
22
+        - [Using `sigtool`](#using-sigtool)
23
+        - [Inspecting signatures inside a CVD file](#inspecting-signatures-inside-a-cvd-file)
24
+        - [External tools](#external-tools)
25
+
26
+## Introduction
46 27
 
47 28
 In order to detect malware and other file-based threats, ClamAV relies on signatures to differentiate clean and malicious/unwanted files.  ClamAV signatures are primarily text-based and conform to one of the ClamAV-specific signature formats associated with a given method of detection.  These formats are explained in the [Signature formats](#signature-formats) section below.  In addition, ClamAV 0.99 and above support signatures written in the YARA format.  More information on this can be found in the [Using YARA rules in ClamAV](#using-yara-rules-in-clamav) section.
48 29
 
49 30
 The ClamAV project distributes a collection of signatures in the form of CVD (ClamAV Virus Database) files.  The CVD file format provides a digitally-signed container that encapsulates the signatures and ensures that they can't be modified by a malicious third-party.  This signature set is actively maintained by [Cisco Talos](https://www.talosintelligence.com/) and can be downloaded using the `freshclam` application that ships with ClamAV.  For more details on this, see the [CVD file](#inspecting-signatures-inside-a-CVD-file) section.
50 31
 
51
-# Signature formats
32
+## Database formats
52 33
 
53
-## Hash-based signatures
34
+ClamAV CVD and CLD database archives may be unpacked to the current directory using `sigtool -u <database name>`. For more details on inspecting CVD and CLD files, see [Inspecting signatures inside a CVD file](#inspecting-signatures-inside-a-cvd-file). Once unpacked, you'll observe a large collection of database files with various extensions described below.
54 35
 
55
-The easiest way to create signatures for ClamAV is to use file hash checksums; however, this method can only be used against static malware.
36
+The CVD and CLD database archives may be supplemented with custom database files in the formats described to gain additional detection functionality. This is done simply by adding files of the following formats to the database directory, typically `/usr/local/share/clamav` or `"C:\Program Files\ClamAV\database"`. Alternatively, `clamd` and `clamscan` can be instructed to load the database from an alternative database file or database directory manually using the `clamd` `DatabaseDirectory` config option or the `clamscan -d` command line option.
56 37
 
57
-### MD5 hash-based signatures
38
+### Settings databases
58 39
 
59
-To create an MD5 signature for `test.exe` use the `--md5` option of
60
-sigtool:
40
+ClamAV provides a handful of configuration-related databases alongside the signature definitions.
61 41
 
62
-```bash
63
-zolw@localhost:/tmp/test$ sigtool --md5 test.exe > test.hdb
64
-zolw@localhost:/tmp/test$ cat test.hdb
65
-48c4533230e1ae1c118c741c0db19dfb:17387:test.exe
66
-```
67
-
68
-That’s it\! The signature is ready for use:
69
-
70
-```bash
71
-zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe
72
-test.exe: test.exe FOUND
73
-
74
-Known viruses: 1
75
-Scanned directories: 0
76
-Engine version: 0.92.1
77
-Scanned files: 1
78
-Infected files: 1
79
-Data scanned: 0.02 MB
80
-Time: 0.024 sec (0 m 0 s)
81
-```
82
-
83
-You can change the name (by default sigtool uses the name of the file) and place it inside a `*.hdb` file. A single database file can include any number of signatures. To get them automatically loaded each time clamscan/clamd starts just copy the database file(s) into the local virus database directory (eg. /usr/local/share/clamav).
84
-
85
-*Hash-based signatures should not be used for text files, HTML and any other data that gets internally preprocessed before pattern matching. If you really want to use a hash signature in such a case, run clamscan with the `--debug` and `--leave-temps` flags as described above and create a signature for a preprocessed file left in /tmp. Please keep in mind that a hash signature will stop matching as soon as a single byte changes in the target file.*
86
-
87
-### SHA1 and SHA256 hash-based signatures
88
-
89
-ClamAV 0.98 has also added support for SHA1 and SHA256 file checksums. The format is the same as for MD5 file checksum. It can differentiate between them based on the length of the hash string in the signature. For best backwards compatibility, these should be placed inside a `*.hsb` file. The format is:
90
-
91
-```
92
-HashString:FileSize:MalwareName
93
-```
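For illustration, the `HashString:FileSize:MalwareName` layout can be reproduced with a few lines of Python. This is a minimal sketch and not a replacement for `sigtool`; the function name `hash_signature`, the sample file and the name `Demo.Sample` are all invented for the example.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def hash_signature(path: str, name: Optional[str] = None, algorithm: str = "md5") -> str:
    """Build a HashString:FileSize:MalwareName line for one file."""
    data = Path(path).read_bytes()
    digest = hashlib.new(algorithm, data).hexdigest()
    # Like the sigtool example above, default the name to the file name.
    return f"{digest}:{len(data)}:{name or Path(path).name}"


# Demo on a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    print(hash_signature(str(sample)))                           # MD5 line
    print(hash_signature(str(sample), "Demo.Sample", "sha256"))  # SHA256 line
```

The hash length is the only thing that differs between the MD5 and SHA variants, which matches the note above that the engine tells them apart by the length of the hash string.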
94
-
95
-### PE section based hash signatures
96
-
97
-You can create a hash signature for a specific section in a PE file. Such signatures shall be stored inside `.mdb` files in the following format:
98
-
99
-```
100
-PESectionSize:PESectionHash:MalwareName
101
-```
102
-
103
-The easiest way to generate MD5 based section signatures is to extract target PE sections into separate files and then run sigtool with the option `--mdb`
104
-
105
-ClamAV 0.98 has also added support for SHA1 and SHA256 section based signatures. The format is the same as for MD5 PE section based signatures. It can differentiate between them based on the length of the hash string in the signature. For best backwards compatibility, these should be placed inside a `*.msb` file.
106
-
107
-### Hash signatures with unknown size
108
-
109
-ClamAV 0.98 has also added support for hash signatures where the size is not known but the hash is. It is much more performance-efficient to use signatures with specific sizes, so be cautious when using this feature. For these cases, the ’\*’ character can be used in the size field. To ensure proper backwards compatibility with older versions of ClamAV, these signatures must have a minimum functional level of 73 or higher. Signatures that use the wildcard size without this level set will be rejected as malformed.
110
-
111
-```
112
-Sample .hsb signature matching any size
113
-HashString:*:MalwareName:73
114
-
115
-Sample .msb signature matching any size
116
-*:PESectionHash:MalwareName:73
117
-```
118
-
119
-## Body-based signatures
120
-
121
-ClamAV stores all body-based signatures in a hexadecimal format. In this section by a hex-signature we mean a fragment of malware’s body converted into a hexadecimal string which can be additionally extended using various wildcards.
122
-
123
-### Hexadecimal format
124
-
125
-You can use `sigtool --hex-dump` to convert any data into a hex-string:
126
-
127
-```bash
128
-zolw@localhost:/tmp/test$ sigtool --hex-dump
129
-How do I look in hex?
130
-486f7720646f2049206c6f6f6b20696e206865783f0a
131
-```
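The same conversion is easy to reproduce outside of `sigtool`, which can be handy when building signatures from a script. The sketch below uses Python's built-in `bytes.hex()`; the function name `hex_dump` is only a label chosen for this example.

```python
def hex_dump(data: bytes) -> str:
    """Return the lowercase hex string for the given bytes."""
    return data.hex()


# Same input as the sigtool session above, including the trailing newline.
print(hex_dump(b"How do I look in hex?\n"))
# prints 486f7720646f2049206c6f6f6b20696e206865783f0a
```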
132
-
133
-### Wildcards
134
-
135
-ClamAV supports the following wildcards for hex-signatures:
136
-
137
-- `??`
138
-
139
-  Match any byte.
140
-
141
-- `a?`
142
-
143
-  Match a high nibble (the four high bits).
144
-  **IMPORTANT NOTE:** The nibble matching is only available in
145
-  libclamav with the functionality level 17 and higher therefore
146
-  please only use it with .ndb signatures followed by ":17"
147
-  (MinEngineFunctionalityLevel, see [3.2.7](#ndb)).
148
-
149
-- `?a`
150
-
151
-  Match a low nibble (the four low bits).
152
-
153
-- `*`
154
-
155
-  Match any number of bytes.
156
-
157
-- `{n}`
158
-
159
-  Match `n` bytes.
160
-
161
-- `{-n}`
162
-
163
-  Match `n` or fewer bytes.
164
-
165
-- `{n-}`
166
-
167
-  Match `n` or more bytes.
168
-
169
-- `{n-m}`
170
-
171
-  Match between `n` and `m` bytes (`m > n`).
172
-
173
-- `HEXSIG[x-y]aa` or `aa[x-y]HEXSIG`
174
-
175
-  Match aa anchored to a hex-signature, see
176
-  <https://bugzilla.clamav.net/show_bug.cgi?id=776> for discussion and
177
-  examples.
178
-
179
-The range signatures `*` and `{}` virtually separate a hex-signature into two parts, eg. `aabbcc*bbaacc` is treated as two sub-signatures `aabbcc` and `bbaacc` with any number of bytes between them. It’s a requirement that each sub-signature includes a block of two static characters somewhere in its body. Note that there is one exception to this restriction; that is when the range wildcard is of the form `{n}` with `n<128`. In this case, ClamAV uses an optimization and translates `{n}` to the string consisting of `n ??` character wildcards. Character wildcards do not divide hex signatures into two parts and so the two static character requirement does not apply.
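One way to build intuition for these wildcards is to translate a hex-signature into an ordinary byte-level regular expression. The sketch below does this in Python for `??`, nibble wildcards, `*` and the `{...}` ranges only. It is an approximation for experimenting, not a description of how libclamav matches: it ignores the sub-signature splitting and static-byte requirement described above, and it does not handle anchored `[x-y]` ranges or alternates. The function name `hexsig_to_regex` is invented here.

```python
import re


def hexsig_to_regex(sig: str) -> "re.Pattern[bytes]":
    """Translate a subset of hex-signature syntax into a bytes regex."""
    parts = []
    i = 0
    while i < len(sig):
        ch = sig[i]
        if ch == "*":                       # any number of bytes
            parts.append(b".*")
            i += 1
        elif ch == "{":                     # {n}, {-n}, {n-}, {n-m}
            end = sig.index("}", i)
            low, dash, high = sig[i + 1:end].partition("-")
            quant = "{%s}" % low if not dash else "{%s,%s}" % (low or "0", high)
            parts.append(b"." + quant.encode("ascii"))
            i = end + 1
        else:
            pair = sig[i:i + 2].lower()
            if pair == "??":                # any byte
                parts.append(b".")
            elif pair.endswith("?"):        # a? : high nibble fixed
                parts.append(("[\\x%s0-\\x%sf]" % (pair[0], pair[0])).encode("ascii"))
            elif pair.startswith("?"):      # ?a : low nibble fixed
                members = "".join("\\x%x%s" % (h, pair[1]) for h in range(16))
                parts.append(("[" + members + "]").encode("ascii"))
            else:                           # literal byte
                parts.append(re.escape(bytes.fromhex(pair)))
            i += 2
    # DOTALL so that "." also matches the newline byte.
    return re.compile(b"".join(parts), re.DOTALL)


# aabbcc*bbaacc: two fixed parts with any number of bytes in between.
rx = hexsig_to_regex("aabbcc*bbaacc")
print(rx.search(bytes.fromhex("00aabbcc1122bbaacc")) is not None)
```

Anything outside the supported subset, such as an alternate in parentheses, makes `bytes.fromhex` raise `ValueError`, which is intentional for a sketch this small.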
180
-
181
-### Character classes
182
-
183
-ClamAV supports the following character classes for hex-signatures:
184
-
185
-- `(B)`
186
-
187
-  Match word boundary (including file boundaries).
188
-
189
-- `(L)`
190
-
191
-  Match CR, CRLF or file boundaries.
192
-
193
-- `(W)`
194
-
195
-  Match a non-alphanumeric character.
196
-
197
-### Alternate strings
198
-
199
-- Single-byte alternates (clamav-0.96) `(aa|bb|cc|...)` or `!(aa|bb|cc|...)` Match a member from a set of bytes \[aa, bb, cc, ...\].
200
-  - Negation operation can be applied to match any non-member, assumed to be one-byte in length.
201
-  - Signature modifiers and wildcards cannot be applied.
202
-
203
-- Multi-byte fixed length alternates `(aaaa|bbbb|cccc|...)` or `!(aaaa|bbbb|cccc|...)` Match a member from a set of multi-byte alternates \[aaaa, bbbb, cccc, ...\] of n-length.
204
-  - All set members must be the same length.
205
-  - Negation operation can be applied to match any non-member, assumed to be n-bytes in length (clamav-0.98.2).
206
-  - Signature modifiers and wildcards cannot be applied.
207
-
208
-- Generic alternates (clamav-0.99) `(alt1|alt2|alt3|...)` Match a member from a set of alternates \[alt1, alt2, alt3, ...\] that can be of variable lengths.
209
-  - Negation operation cannot be applied.
210
-  - Signature modifiers and nibble wildcards \[`??, a?, ?a`\] can be applied.
211
-  - Ranged wildcards \[`{n-m}`\] are limited to a fixed range of less than 128 bytes \[`{1} -> {127}`\].
212
-
213
-Note that using signature modifiers and wildcards classifies the alternate type to be a generic alternate. Thus single-byte alternates and multi-byte fixed length alternates can use signature modifiers and wildcards but will be classified as generic alternate. This means that negation cannot be applied in this situation and there is a slight performance impact.
214
-
215
-### Basic signature format
216
-
217
-The simplest (and now deprecated) signature format is:
218
-
219
-```
220
-MalwareName=HexSignature
221
-```
222
-
223
-ClamAV will scan the entire file looking for HexSignature. All signatures of this type must be placed inside `*.db` files.
224
-
225
-### Extended signature format
226
-
227
-The extended signature format allows for specification of additional information such as a target file type, virus offset or engine version, making the detection more reliable. The format is:
228
-
229
-```
230
-MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]
231
-```
232
-
233
-where `TargetType` is one of the following numbers specifying the type of the target file:
234
-
235
-- 0 = any file
236
-
237
-- 1 = Portable Executable, both 32- and 64-bit.
238
-
239
-- 2 = OLE2 containers, including their specific macros. The OLE2 format is primarily used by MS Office and MSI installation files.
240
-
241
-- 3 = HTML (normalized: whitespace transformed to spaces, tags/tag attributes normalized, all lowercase), Javascript is normalized too: all strings are normalized (hex encoding is decoded), numbers are parsed and normalized, local variables/function names are normalized to ’n001’ format, argument to eval() is parsed as JS again, unescape() is handled, some simple JS packers are handled, output is whitespace normalized.
242
-
243
-- 4 = Mail file
244
-
245
-- 5 = Graphics
246
-
247
-- 6 = ELF
248
-
249
-- 7 = ASCII text file (normalized)
250
-
251
-- 8 = Unused
252
-
253
-- 9 = Mach-O files
254
-
255
-- 10 = PDF files
256
-
257
-- 11 = Flash files
258
-
259
-- 12 = Java class files
260
-
261
-And `Offset` is an asterisk or a decimal number `n` possibly combined with a special modifier:
262
-
263
-- `*` = any
264
-
265
-- `n` = absolute offset
266
-
267
-- `EOF-n` = end of file minus `n` bytes
268
-
269
-Signatures for PE, ELF and Mach-O files additionally support:
270
-
271
-- `EP+n` = entry point plus n bytes (`EP+0` for `EP`)
272
-
273
-- `EP-n` = entry point minus n bytes
274
-
275
-- `Sx+n` = start of section `x`’s (counted from 0) data plus `n` bytes
276
-
277
-- `SEx` = entire section `x` (offset must lie within section boundaries)
278
-
279
-- `SL+n` = start of last section plus `n` bytes
280
-
281
-All the above offsets except `*` can be turned into **floating offsets** and represented as `Offset,MaxShift` where `MaxShift` is an unsigned integer. A floating offset will match every offset between `Offset` and `Offset+MaxShift`, eg. `10,5` will match all offsets from 10 to 15 and `EP+n,y` will match all offsets from `EP+n` to `EP+n+y`. Versions of ClamAV older than 0.91 will silently ignore the `MaxShift` extension and only use `Offset`. Optional `MinFL` and `MaxFL` parameters can restrict the signature to specific engine releases. All signatures in the extended format must be placed inside `*.ndb` files.
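To make the field layout concrete, here is a small Python sketch that splits an extended-format line into its parts, including the optional `MaxShift`, `MinFL` and `MaxFL` values. It performs no validation of the hex signature or offset keywords, and the class name, function name and sample signature line are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ExtendedSignature:
    name: str
    target_type: int
    offset: str                 # e.g. "*", "10", "EOF-12", "EP+0"
    max_shift: Optional[int]    # set only for floating offsets
    hex_signature: str
    min_fl: Optional[int]
    max_fl: Optional[int]


def parse_ndb_line(line: str) -> ExtendedSignature:
    """Split MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]."""
    fields = line.strip().split(":")
    if not 4 <= len(fields) <= 6:
        raise ValueError("expected 4 to 6 colon-separated fields")
    name, target, offset, hexsig = fields[:4]
    min_fl = int(fields[4]) if len(fields) > 4 and fields[4] else None
    max_fl = int(fields[5]) if len(fields) > 5 and fields[5] else None
    # A floating offset is written as Offset,MaxShift.
    base, comma, shift = offset.partition(",")
    return ExtendedSignature(name, int(target), base,
                             int(shift) if comma else None,
                             hexsig, min_fl, max_fl)


# Hypothetical signature line, not taken from any real database.
print(parse_ndb_line("Example.Sig:1:EP+0,15:aabbcc??dd:17"))
```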
282
-
283
-### Logical signatures
284
-
285
-Logical signatures allow combining of multiple signatures in extended format using logical operators. They can provide both more detailed and flexible pattern matching. The logical sigs are stored inside `*.ldb` files in the following format:
286
-
287
-```
288
-SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;
289
-Subsig1;Subsig2;...
290
-```
291
-
292
-where:
293
-
294
-- `TargetDescriptionBlock` provides information about the engine and target file with comma separated `Arg:Val` pairs. For args where `Val` is a range, the minimum and maximum values should be expressed as `min-max`.
295
-
296
-- `LogicalExpression` specifies the logical expression describing the relationship between `Subsig0...SubsigN`. **Basis clause:** 0,1,...,N decimal indexes are SUB-EXPRESSIONS representing `Subsig0, Subsig1,...,SubsigN` respectively. **Inductive clause:** if `A` and `B` are SUB-EXPRESSIONS and `X, Y` are decimal numbers then `(A&B)`, `(A|B)`, `A=X`, `A=X,Y`, `A>X`, `A>X,Y`, `A<X` and `A<X,Y` are SUB-EXPRESSIONS
297
-
298
-- `SubsigN` is the n-th subsignature in extended format, possibly preceded by an offset. Up to 64 subsigs can be specified.
299
-
300
-Keywords used in `TargetDescriptionBlock`:
301
-
302
-- `Target:X`: Target file type
303
-
304
-- `Engine:X-Y`: Required engine functionality (range; 0.96). Note that if the `Engine` keyword is used, it must be the first one in the `TargetDescriptionBlock` for backwards compatibility
305
-
306
-- `FileSize:X-Y`: Required file size (range in bytes; 0.96)
307
-
308
-- `EntryPoint`: Entry point offset (range in bytes; 0.96)
309
-
310
-- `NumberOfSections`: Required number of sections in executable (range; 0.96)
311
-
312
-- `Container:CL_TYPE_*`: File type of the container which stores the scanned file.
313
-
314
-  Specifying `CL_TYPE_ANY` matches on root objects only (i.e. the target file is explicitly _not_ in a container). Chances are slim that you would want to use `CL_TYPE_ANY` in a signature, because placing the malicious file in an archive will then prevent it from alerting.
315
-
316
-  Every ClamAV file type has the potential to be a container for additional files, although some are more likely than others. When a file is parsed and data in the file is identified to be scanned as a unique type, that parent file becomes a container the moment the embedded content is scanned. For a list of possible CL_TYPEs, refer to the [File Types Reference](ClamAV-File-Types.md).
317
-
318
-- `Intermediates:CL_TYPE_*>CL_TYPE_*`: Specify one or more layers of file types containing the scanned file. _This is an alternative to using `Container`._
319
-
320
-  You may specify up to 16 layers of file types separated by ’`>`’ in top-down order. Note that the ’`>`’ separator is not needed if you only specify a single container. The last type should be the immediate container containing the malicious file. Unlike with the `Container` option, `CL_TYPE_ANY` can be used as a wildcard file type. (expr; 0.100.0)
321
-
322
-  For a list of possible CL_TYPEs, refer to the [File Types Reference](ClamAV-File-Types.md).
323
-
324
-- `IconGroup1`: Icon group name 1 from .idb signature Required engine functionality (range; 0.96)
325
-
326
-- `IconGroup2`: Icon group name 2 from .idb signature Required engine functionality (range; 0.96)
327
-
328
-Modifiers for subexpressions:
329
-
330
-- `A=X`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched exactly X times; if it refers to a (logical) block of signatures then this block must generate exactly X matches (with any of its sigs).
331
-
332
-- `A=0` specifies negation (signature or block of signatures cannot be matched)
333
-
334
-- `A=X,Y`: If the SUB-EXPRESSION A refers to a single signature then this signature must be matched exactly X times; if it refers to a (logical) block of signatures then this block must generate X matches and at least Y different signatures must get matched.
335
-
336
-- `A>X`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches (with any of its sigs).
337
-
338
-- `A>X,Y`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches _and_ at least Y different signatures must be matched.
339
-
340
-- `A<X`: Just like `A>X` above with the change of "more" to "less".
341
-
342
-  If the SUB-EXPRESSION A refers to a single signature then this signature must get matched less than X times; if it refers to a (logical) block of signatures then this block must generate less than X matches (with any of its sigs).
343
-
344
-- `A<X,Y`: Similar to `A>X,Y`. If the SUB-EXPRESSION A refers to a single signature then this signature must get matched less than X times; if it refers to a (logical) block of signatures then this block must generate less than X matches _and_ at least Y different signatures must be matched.
345
-
346
-Examples:
347
-
348
-```
349
-Sig1;Target:0;(0&1&2&3)&(4|1);6b6f74656b;616c61;7a6f6c77;7374656
350
-6616e;deadbeef
351
-
352
-Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;737
353
-46566616e
354
-
355
-Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;737
356
-46566616e;deadbeef
357
-
358
-Sig4;Engine:51-255,Target:1;((0|1)&(2|3))&4;EP+123:33c06834f04100
359
-f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573
360
-(63|64)61706528;S3+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58
361
-dcf43987e4f519d629b103375;SL+550:6300680065005c0046006900
362
-```
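The structure of a logical signature is easier to see once a line is split into its parts. The Python sketch below breaks a line into the name, the `TargetDescriptionBlock` pairs, the logical expression and the subsignatures. It does not evaluate the expression, and the function name `parse_ldb_line` is made up for the illustration. The input is `Sig2` from the examples above, joined back onto one line.

```python
def parse_ldb_line(line: str) -> dict:
    """Split SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;..."""
    name, tdb, expression, *subsigs = line.strip().split(";")
    # The target description block is a comma-separated list of Arg:Val pairs.
    target = dict(pair.split(":", 1) for pair in tdb.split(",") if pair)
    return {
        "name": name,
        "target": target,
        "expression": expression,
        "subsigs": [s for s in subsigs if s],
    }


sig2 = "Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;73746566616e"
parsed = parse_ldb_line(sig2)
print(parsed["target"], parsed["expression"], len(parsed["subsigs"]))
```

Splitting only on `;` keeps subsignature offsets such as `EP+123:` attached to their subsignature, since those use `:` as the separator.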
363
-
364
-### Subsignature Modifiers
365
-
366
-ClamAV (clamav-0.99) supports a number of additional subsignature
367
-modifiers for logical signatures. This is done by specifying `::`
368
-followed by a number of characters representing the desired options.
369
-Signatures using subsignature modifiers require `Engine:81-255` for
370
-backwards-compatibility.
371
-
372
-- Case-Insensitive \[`i`\]
373
-
374
-  Specifying the `i` modifier causes ClamAV to match all alphabetic hex bytes as case-insensitive. All patterns in ClamAV are case-sensitive by default.
375
-
376
-- Wide \[`w`\]
377
-
378
-  Specifying the `w` modifier causes ClamAV to match all hex bytes encoded with two bytes per character. Note this simply interweaves each character with NULL characters and does not truly support UTF-16 characters. Wildcards for ’wide’ subsignatures are not treated as wide (i.e. there can be an odd number of intermittent characters). This can be combined with `a` to search for patterns in both wide and ascii.
379
-
380
-- Fullword \[`f`\]
381
-
382
-  Match subsignature as a fullword (delimited by non-alphanumeric characters).
383
-
384
-- Ascii \[`a`\]
385
-
386
-  Match subsignature as ascii characters. This can be combined with `w` to search for patterns in both ascii and wide.
387
-
388
-Examples:
389
-
390
-```
391
-clamav-nocase-A;Engine:81-255,Target:0;0&1;41414141::i;424242424242::i
392
-    -matches 'AAAA'(nocase) and 'BBBBBB'(nocase)
393
-
394
-clamav-fullword-A;Engine:81-255,Target:0;0&1;414141;68656c6c6f::f
395
-    -matches 'AAA' and 'hello'(fullword)
396
-clamav-fullword-B;Engine:81-255,Target:0;0&1;414141;68656c6c6f::fi
397
-    -matches 'AAA' and 'hello'(fullword nocase)
398
-
399
-clamav-wide-B2;Engine:81-255,Target:0;0&1;414141;68656c6c6f::wa
400
-    -matches 'AAA' and 'hello'(wide ascii)
401
-clamav-wide-C0;Engine:81-255,Target:0;0&1;414141;68656c6c6f::iwfa
402
-    -matches 'AAA' and 'hello'(nocase wide fullword ascii)
403
-```
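As a rough illustration of what "wide" means here, the following Python sketch interleaves each byte of a hex pattern with a NULL byte. It assumes the NULL follows each character, which is consistent with the `6300680065005c00...` subsignature in the logical signature examples but is not confirmed by the text above; the function name `widen` is invented.

```python
def widen(hex_pattern: str) -> str:
    """Interleave every byte of the pattern with a NULL byte."""
    raw = bytes.fromhex(hex_pattern)
    # Assumption: the NULL comes after each character.
    wide = b"".join(bytes([value, 0]) for value in raw)
    return wide.hex()


# 68656c6c6f is 'hello', as used in the examples above.
print(widen("68656c6c6f"))
# prints 680065006c006c006f00
```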
404
-
405
-## Special Subsignature Types
406
-
407
-### Macro subsignatures
408
-
409
-Introduced in ClamAV 0.96
410
-
411
-Format: `${min-max}MACROID$`
412
-
413
-Macro subsignatures are used to combine a number of existing extended
414
-signatures (`.ndb`) into an on-the-fly generated alternate string logical
415
-signature (`.ldb`). Signatures using macro subsignatures require
416
-`Engine:51-255` for backwards-compatibility.
42
+`*.cfg`: [Dynamic config settings](Signatures/DynamicConfig.md)
417 43
 
418
-Example:
44
+`*.cat` `*.crb`: [Trusted and revoked PE certs](Signatures/AuthenticodeRules.md)
419 45
 
420
-```
421
-      test.ldb:
422
-        TestMacro;Engine:51-255,Target:0;0&1;616161;${6-7}12$
423
-
424
-      test.ndb:
425
-        D1:0:$12:626262
426
-        D2:0:$12:636363
427
-        D3:0:$30:626264
428
-```
429
-
430
-The example logical signature `TestMacro` is functionally equivalent
431
-to:
432
-
433
-```
434
-TestMacro;Engine:51-255,Target:0;0;616161{3-4}(626262|636363)
435
-```
436
-
437
-- `MACROID` points to a group of signatures; there can be at most 32 macro groups.
438
-
439
-  - In the example, `MACROID` is `12` and both `D1` and `D2` are members of macro group `12`. `D3` is a member of separate macro group `30`.
440
-
441
-- `{min-max}` specifies the offset range at which one of the group signatures should match; the offset range is relative to the starting offset of the preceding subsignature. This means a macro subsignature cannot be the first subsignature.
442
-
443
-  - In the example, `{min-max}` is `{6-7}` and it is relative to the start of a `616161` match.
444
-
445
-- For more information and examples please see <https://bugzilla.clamav.net/show_bug.cgi?id=164>.
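The step from `${6-7}12$` to `{3-4}` in the equivalent signature follows from the offsets being relative to the start of the preceding `616161` match, which is three bytes long. A small Python sketch of that arithmetic, with a made-up function name:

```python
def macro_range_to_gap(preceding_hex: str, min_offset: int, max_offset: int) -> tuple:
    """Convert a start-relative macro range into the gap after the preceding match."""
    length = len(bytes.fromhex(preceding_hex))
    # Offsets count from the start of the preceding match,
    # so subtracting its length gives the bytes left in between.
    return (min_offset - length, max_offset - length)


print(macro_range_to_gap("616161", 6, 7))
# prints (3, 4)
```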
446
-
447
-### Byte Compare Subsignatures
448
-
449
-Introduced in ClamAV 0.101
450
-
451
-Format: `subsigid_trigger(offset#byte_options#comparisons)`
452
-
453
-Byte compare subsignatures can be used to evaluate a numeric value at a given offset from the start of another (matched) subsignature within the same logical signature. These are executed after all other subsignatures within the logical subsignature are fired, with the exception of PCRE subsignatures. They can evaluate offsets only from a single referenced subsignature, and that subsignature must give a valid match for the evaluation to occur.
454
-
455
-- `subsigid_trigger` is a required field and may refer to any single non-PCRE, non-Byte Compare subsignature within the lsig. The byte compare subsig will evaluate if `subsigid_trigger` matches. Triggering on multiple subsigs or logic based triggering is not currently supported.
456
-
457
-- `offset` is a required field that consists of an `offset_modifier` and a numeric `offset` (hex or decimal offsets are okay).
458
-
459
-  - `offset_modifier` can be either `>>` or `<<` where the former denotes a positive offset and the latter denotes a negative offset. The offset is calculated from the start of `subsigid_trigger`, which allows for byte extraction before the specified match, after the match, and within the match itself.
460
-
461
-  - `offset` must be a positive hex or decimal value. This will be the number of bytes from the start of the referenced `subsigid_trigger` match within the file buffer to begin the comparison.
462
-
463
-- `byte_options` are used to specify the numeric type and endianness of the extracted byte sequence, in that order, as well as the number of bytes to be read. By default ClamAV will attempt to match up to the number of bytes specified, unless the `e` (exact) option is specified or the numeric type is `b` (binary).  This field follows the form `[h|d|a|i][l|b][e]num_bytes`
464
-
465
-  - `h|d|a|i` where `h` specifies the byte sequence will be in hex, `d` decimal, `a` automatic detection of hex or decimal at runtime, and `i` signifies raw binary data.
466
-
467
-  - `l|b` where `l` specifies the byte sequence will be in little endian order and `b` big endian. If decimal `d` is specified, big-endian is implied and using `l` will result in a malformed database error.
468
-
469
-  - `e` specifies that ClamAV will only evaluate the comparison if it can extract the exact number of bytes specified. This option is implicitly declared when using the `i` flag.
470
-
471
-  - `num_bytes` specifies the number of bytes to extract. This can be a hex or decimal value. If `i` is specified only 1, 2, 4, and 8 are valid options.
472
-
473
-- `comparisons` is a required field that denotes how to evaluate the extracted byte sequence. Each Byte Compare signature can have one or two `comparison_sets` separated by a comma. Each `comparison_set` consists of a `Comparison_symbol` and a `Comparison_value` and takes the form `Comparison_symbolComparison_value`. Thus, `comparisons` takes the form `comparison_set[,comparison_set]`
474
-
475
-  - `Comparison_symbol` denotes the type of comparison to be done. The supported comparison symbols are `<`, `>`, `=`.
476
-
477
-  - `Comparison_value` is a required field which must be a numeric hex or decimal value. If all other conditions are met, the byte compare subsig will evaluate the extracted byte sequence against this number based on the provided `Comparison_symbol`.
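
The evaluation described in the list above can be sketched in Python. This is a minimal illustration, not ClamAV's implementation: the function name `byte_compare` and its parameters are hypothetical, it always requires the exact number of bytes (as if `e` were set), it treats text values (`h`, `d`) as big-endian, it does not implement `a`, and it assumes that two comparison sets must both hold.

```python
import operator

# Comparison symbols supported by byte compare subsignatures.
COMPARATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def byte_compare(buf, match_start, modifier, offset,
                 num_type, endian, num_bytes, comparisons):
    """Hypothetical sketch: evaluate one byte compare against `buf`.

    `match_start` is where the referenced subsigid_trigger matched.
    `comparisons` is a list of one or two (symbol, value) pairs.
    """
    if modifier not in (">>", "<<"):
        raise ValueError("offset_modifier must be '>>' or '<<'")
    # '>>' is a positive offset from the start of the match, '<<' a negative one.
    start = match_start + offset if modifier == ">>" else match_start - offset
    if start < 0:
        return False
    chunk = buf[start:start + num_bytes]
    if len(chunk) != num_bytes:
        return False  # assumption: behave as if 'e' (exact) were always set

    if num_type == "i":  # raw binary data
        if num_bytes not in (1, 2, 4, 8):
            raise ValueError("raw binary reads must be 1, 2, 4, or 8 bytes")
        value = int.from_bytes(chunk, "little" if endian == "l" else "big")
    elif num_type in ("h", "d"):  # number written as hex or decimal text
        try:
            value = int(chunk.decode("ascii"), 16 if num_type == "h" else 10)
        except ValueError:
            return False  # bytes at that position are not a valid number
    else:
        raise ValueError("this sketch handles only 'i', 'h', and 'd'")

    # Assumption: when two comparison sets are given, both must be true.
    return all(COMPARATORS[sym](value, ref) for sym, ref in comparisons)


# A 2-byte little-endian value of 513 stored 5 bytes after the match start.
buf = b"MAGIC" + (513).to_bytes(2, "little") + b"tail"
print(byte_compare(buf, 0, ">>", 5, "i", "l", 2, [(">", 512)]))
```

Taking the fields as already-parsed arguments keeps the sketch independent of how the subsignature string itself is tokenized.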
478
-
479
-### PCRE subsignatures
480
-
481
-Introduced in ClamAV 0.99
482
-
483
-Format: `Trigger/PCRE/[Flags]`
484
-
485
-PCRE subsignatures are used within a logical signature (`.ldb`) to specify regex matches that execute once triggered by a conditional based on preceding subsignatures. Signatures using PCRE subsignatures require `Engine:81-255` for backwards-compatibility.
486
-
487
-- `Trigger` is a required field that is a valid `LogicalExpression` and may refer to any subsignatures that precede this subsignature. Triggers cannot be self-referential and cannot refer to subsequent subsignatures.
488
-
489
-- `PCRE` is the expression representing the regex to execute. `PCRE` must be delimited by ’/’, and any ’/’ within the expression needs to be escaped. For backward compatibility, ’;’ within the expression must be expressed as ’`\x3B`’. `PCRE` cannot be empty, and the (?UTF\*) control sequence is not allowed. If debug is specified, named capture groups are displayed in a post-execution report.
490
-
491
-- `Flags` are a series of characters which affect the compilation and execution of `PCRE` within the PCRE compiler and the ClamAV engine. This field is optional.
492
-
493
-  - `g [CLAMAV_GLOBAL]` specifies to search for ALL matches of PCRE (default is to search for first match). NOTE: INCREASES the time needed to run the PCRE.
494
-
495
-  - `r [CLAMAV_ROLLING]` specifies to use the given offset as the starting location to search for a match as opposed to the only location; applies to subsigs without maxshifts. By default, in order to facilitate normal ClamAV offset behavior, PCREs are auto-anchored (only attempt match on first offset); using the rolling option disables the auto-anchoring.
496
-
497
-  - `e [CLAMAV_ENCOMPASS]` specifies to CONFINE matching between the specified offset and maxshift; applies only when maxshift is specified. Note: DECREASES time needed to run the PCRE.
498
-
499
-  - `i [PCRE_CASELESS]`
500
-
501
-  - `s [PCRE_DOTALL]`
502
-
503
-  - `m [PCRE_MULTILINE]`
504
-
505
-  - `x [PCRE_EXTENDED]`
46
+`*.ftm`: [File Type Magic (FTM)](Signatures/FileTypeMagic.md)
506 47
 
507
-  - `A [PCRE_ANCHORED]`
48
+### Signature databases
508 49
 
509
-  - `E [PCRE_DOLLAR_ENDONLY]`
50
+_Note_: Signature databases with an extension ending in `u` are only loaded when Potentially Unwanted Application (PUA) signatures are enabled (default: off).
510 51
 
511
-  - `U [PCRE_UNGREEDY]`
52
+#### Body-based Signatures
512 53
 
513
-Examples:
54
+Body-based signature content is a definition that matches on specific sequences of bytes in the target file rather than on a hash.
514 55
 
515
-```
516
-Find.All.ClamAV;Engine:81-255,Target:0;1;6265676c6164697427736e6f7462797465636f6465;0/clamav/g
517
-
518
-Find.ClamAV.OnlyAt.299;Engine:81-255,Target:0;2;7374756c747a67657473;7063726572656765786c6f6c;299:0&1/clamav/
519
-
520
-Find.ClamAV.StartAt.300;Engine:81-255,Target:0;3;616c61696e;62756731393238;636c6f736564;300:0&1&2/clamav/r
521
-
522
-Find.All.Encompassed.ClamAV;Engine:81-255,Target:0;3;7768796172656e2774;796f757573696e67;79617261;200,300:0&1&2/clamav/ge
523
-
524
-Named.CapGroup.Pcre;Engine:81-255,Target:0;3;636f75727479617264;616c62756d;74657272696572;50:0&1&2/variable=(?<nilshell>.{16})end/gr
525
-
526
-Firefox.TreeRange.UseAfterFree;Engine:81-255,Target:0,Engine:81-255;0&1&2;2e766965772e73656c656374696f6e;2e696e76616c696461746553656c656374696f6e;0&1/\x2Eview\x2Eselection.*?\x2Etree\s*\x3D\s*null.*?\x2Einvalidate/smi
527
-
528
-Firefox.IDB.UseAfterFree;Engine:81-255,Target:0;0&1;4944424b657952616e6765;0/^\x2e(only|lowerBound|upperBound|bound)\x28.*?\x29.*?\x2e(lower|upper|lowerOpen|upperOpen)/smi
529
-
530
-Firefox.boundElements;Engine:81-255,Target:0;0&1&2;6576656e742e6
531
-26f756e64456c656d656e7473;77696e646f772e636c6f7365;0&1/on(load|click)\s*=\s*\x22?window\.close\s*\x28/si
532
-```
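
To read examples like these, it can help to split a signature into its fields and decode the plain hex subsignatures. The sketch below assumes the usual logical signature layout (`SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;...`) and relies on the rule above that ’;’ never appears literally inside a PCRE. The helper names `split_ldb`, `describe_subsig`, and `split_pcre` are hypothetical, not part of ClamAV.

```python
def split_ldb(signature):
    """Split a logical signature line into its semicolon-separated fields."""
    name, target_block, expression, *subsigs = signature.split(";")
    return name, target_block, expression, subsigs


def describe_subsig(subsig):
    """Decode a plain hex subsignature to ASCII text.

    Anything that is not pure hex (for example a PCRE subsignature or a
    hex pattern with wildcards) is returned unchanged.
    """
    try:
        return bytes.fromhex(subsig).decode("ascii")
    except ValueError:
        return subsig


def split_pcre(subsig):
    """Split a `Trigger/PCRE/[Flags]` subsignature into its three parts."""
    trigger, _, rest = subsig.partition("/")   # the trigger contains no '/'
    pattern, _, flags = rest.rpartition("/")   # the flags contain no '/'
    return trigger, pattern, flags


sig = ("Find.All.ClamAV;Engine:81-255,Target:0;1;"
       "6265676c6164697427736e6f7462797465636f6465;0/clamav/g")
name, target_block, expression, subsigs = split_ldb(sig)
print(name, target_block, expression)
print([describe_subsig(s) for s in subsigs])
print(split_pcre(subsigs[1]))
```

In this example the first subsignature decodes to a plain ASCII string, and the PCRE subsignature is triggered by subsignature `0` with the `g` flag set.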
533
-
534
-## Icon signatures for PE files
535
-
536
-ClamAV 0.96 includes an approximate/fuzzy icon matcher to help detect malicious executables disguising themselves as innocent-looking image files, office documents and the like.
537
-
538
-Icon matching is only triggered via .ldb signatures using the special attribute tokens `IconGroup1` or `IconGroup2`. These identify two (optional) groups of icons defined in a .idb database file. The format of the .idb file is:
539
-
540
-```
541
-ICONNAME:GROUP1:GROUP2:ICON_HASH
542
-```
543
-
544
-where:
545
-
546