
Restructured the signature writing documentation, and supplemented it with dconf documentation, file type magic documentation, and references for ClamAV functionality levels.

Micah Snyder authored on 2018/11/08 06:24:28
Showing 22 changed files
... ...
@@ -15,7 +15,6 @@ Table Of Contents
15 15
 5. [ClamAV Developer Tips and Tricks](UserManual/development.md)
16 16
 6. [Build \[lib\]ClamAV Into Your Programs](UserManual/libclamav.md)
17 17
 7. [Writing ClamAV Signatures](UserManual/Signatures.md)
18
-8. [Writing ClamAV Phishing Signatures](UserManual/PhishSigs.md)
19 18
 
20 19
 -----
21 20
 
22 21
deleted file mode 100644
... ...
@@ -1,89 +0,0 @@
1
-# ClamAV File Types
2
-
3
-ClamAV maintains its own file typing format and assigns these types using one of the following methods:
4
-
5
-- Evaluation of a unique sequence of bytes at the start of a file (file type magic).
6
-- File type indicators when parsing container files.
7
-  - For example:
8
-    CL_TYPE_SCRIPT may be assigned to data contained in a PDF when the PDF indicates that a stream of bytes is "Javascript"
9
-- File type determination based on the names or characteristics contained within the file.
10
-  - For example:
11
-    CL_TYPE_OOXML_WORD may be assigned to a Zip file containing files with specific names.
12
-
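File type magic, the first method above, amounts to comparing the leading bytes of a file against a table of known signatures. The Python sketch below illustrates the idea with a handful of well-known magic numbers mapped to `CL_TYPE` names. The table, the `magic_type` helper and the `CL_TYPE_BINARY_DATA` fallback are assumptions for illustration only; ClamAV's own file type definitions are far more extensive and are combined with the container and name-based rules listed above.

```python
# Illustrative magic-number table (NOT ClamAV's real file type definitions).
MAGIC_TABLE = [
    (b"%PDF-", "CL_TYPE_PDF"),
    (b"PK\x03\x04", "CL_TYPE_ZIP"),
    (b"\x7fELF", "CL_TYPE_ELF"),
    (b"\x1f\x8b", "CL_TYPE_GZ"),
    (b"7z\xbc\xaf\x27\x1c", "CL_TYPE_7Z"),
    (b"Rar!\x1a\x07", "CL_TYPE_RAR"),
    (b"BZh", "CL_TYPE_BZ"),
    (b"MZ", "CL_TYPE_MSEXE"),
]

# Fallback when no magic matches (an assumption made for this sketch).
DEFAULT_TYPE = "CL_TYPE_BINARY_DATA"


def magic_type(data):
    """Return the CL_TYPE name whose magic bytes appear at the start of `data`."""
    for magic, cl_type in MAGIC_TABLE:
        if data.startswith(magic):
            return cl_type
    return DEFAULT_TYPE
```

For example, `magic_type(b"%PDF-1.7\n")` returns `"CL_TYPE_PDF"`. Order matters in a table like this when one magic is a prefix of another; none of the entries here overlap.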
13
-## CL_TYPEs
14
-
15
-ClamAV types are prefixed with `CL_TYPE_`. The following is an exhaustive list of all current CL_TYPEs.
16
-
17
-| CL_TYPE                | Description                                                                       |
18
-|------------------------|-----------------------------------------------------------------------------------|
19
-| `CL_TYPE_7Z`           | 7-Zip Archive                                                                     |
20
-| `CL_TYPE_7ZSFX`        | Self-Extracting 7-Zip Archive                                                     |
21
-| `CL_TYPE_APM`          | Disk Image - Apple Partition Map                                                  |
22
-| `CL_TYPE_ARJ`          | ARJ Archive                                                                       |
23
-| `CL_TYPE_ARJSFX`       | Self-Extracting ARJ Archive                                                       |
24
-| `CL_TYPE_AUTOIT`       | AutoIt Automation Executable                                                      |
25
-| `CL_TYPE_BINARY_DATA`  | Binary data                                                                       |
26
-| `CL_TYPE_BINHEX`       | BinHex Macintosh 7-bit ASCII email attachment encoding                            |
27
-| `CL_TYPE_BZ`           | BZip Compressed File                                                              |
28
-| `CL_TYPE_CABSFX`       | Self-Extracting Microsoft CAB Archive                                             |
29
-| `CL_TYPE_CPIO_CRC`     | CPIO Archive (CRC)                                                                |
30
-| `CL_TYPE_CPIO_NEWC`    | CPIO Archive (NEWC)                                                               |
31
-| `CL_TYPE_CPIO_ODC`     | CPIO Archive (ODC)                                                                |
32
-| `CL_TYPE_CPIO_OLD`     | CPIO Archive (OLD, Little Endian or Big Endian)                                   |
33
-| `CL_TYPE_CRYPTFF`      | Files encrypted by CryptFF malware                                                |
34
-| `CL_TYPE_DMG`          | Apple DMG Archive                                                                 |
35
-| `CL_TYPE_ELF`          | ELF Executable (Linux/Unix program or library)                                    |
36
-| `CL_TYPE_GPT`          | Disk Image - GUID Partition Table                                                 |
37
-| `CL_TYPE_GRAPHICS`     | TIFF (Little Endian or Big Endian)                                                |
38
-| `CL_TYPE_GZ`           | GZip Compressed File                                                              |
39
-| `CL_TYPE_HTML_UTF16`   | Wide-Character / UTF16 encoded HTML                                               |
40
-| `CL_TYPE_HTML`         | HTML data                                                                         |
41
-| `CL_TYPE_HWP3`         | Hangul Word Processor (3.X)                                                       |
42
-| `CL_TYPE_HWPOLE2`      | Hangul Word Processor embedded OLE2                                               |
43
-| `CL_TYPE_INTERNAL`     | Internal properties                                                               |
44
-| `CL_TYPE_ISHIELD_MSI`  | Windows Install Shield MSI installer                                              |
45
-| `CL_TYPE_ISO9660`      | ISO 9660 file system for optical disc media                                       |
46
-| `CL_TYPE_JAVA`         | Java Class File                                                                   |
47
-| `CL_TYPE_LNK`          | Microsoft Windows Shortcut File                                                   |
48
-| `CL_TYPE_MACHO_UNIBIN` | Universal Binary/Java Bytecode                                                    |
49
-| `CL_TYPE_MACHO`        | Apple/NeXTSTEP Mach-O Executable file format                                      |
50
-| `CL_TYPE_MAIL`         | Email file                                                                        |
51
-| `CL_TYPE_MBR`          | Disk Image - Master Boot Record                                                   |
52
-| `CL_TYPE_MHTML`        | MHTML Saved Web Page                                                              |
53
-| `CL_TYPE_MSCAB`        | Microsoft CAB Archive                                                             |
54
-| `CL_TYPE_MSCHM`        | Microsoft CHM help archive                                                        |
55
-| `CL_TYPE_MSEXE`        | Microsoft EXE / DLL Executable file                                               |
56
-| `CL_TYPE_MSOLE2`       | Microsoft OLE2 Container file                                                     |
57
-| `CL_TYPE_MSSZDD`       | Microsoft Compressed EXE                                                          |
58
-| `CL_TYPE_NULSFT`       | NullSoft Scripted Installer program                                               |
59
-| `CL_TYPE_OLD_TAR`      | TAR archive (old)                                                                 |
60
-| `CL_TYPE_OOXML_HWP`    | Hangul Office Open Word Processor (5.X)                                           |
61
-| `CL_TYPE_OOXML_PPT`    | Microsoft Office Open XML PowerPoint                                              |
62
-| `CL_TYPE_OOXML_WORD`   | Microsoft Office Open Word 2007+                                                  |
63
-| `CL_TYPE_OOXML_XL`     | Microsoft Office Open Excel 2007+                                                 |
64
-| `CL_TYPE_PART_HFSPLUS` | Apple HFS+ partition                                                              |
65
-| `CL_TYPE_PDF`          | Adobe PDF document                                                                |
66
-| `CL_TYPE_POSIX_TAR`    | TAR archive                                                                       |
67
-| `CL_TYPE_PS`           | PostScript                                                                        |
68
-| `CL_TYPE_RAR`          | RAR Archive                                                                       |
69
-| `CL_TYPE_RARSFX`       | Self-Extracting RAR Archive                                                       |
70
-| `CL_TYPE_RIFF`         | Resource Interchange File Format container formatted file                         |
71
-| `CL_TYPE_RTF`          | Rich Text Format document                                                         |
72
-| `CL_TYPE_SCRENC`       | Files encrypted by ScrEnc malware                                                 |
73
-| `CL_TYPE_SCRIPT`       | Generic type for scripts that don't have their own type (Javascript, Python, etc) |
74
-| `CL_TYPE_SIS`          | Symbian OS Software Installation Script Archive                                   |
75
-| `CL_TYPE_SWF`          | Adobe Flash File (LZMA, Zlib, or uncompressed)                                    |
76
-| `CL_TYPE_TEXT_ASCII`   | ASCII text                                                                        |
77
-| `CL_TYPE_TEXT_UTF16BE` | UTF-16BE text                                                                     |
78
-| `CL_TYPE_TEXT_UTF16LE` | UTF-16LE text                                                                     |
79
-| `CL_TYPE_TEXT_UTF8`    | UTF-8 text                                                                        |
80
-| `CL_TYPE_TNEF`         | Microsoft Outlook & Exchange email attachment format                              |
81
-| `CL_TYPE_UUENCODED`    | UUEncoded (Unix-to-Unix) binary file (Unix email attachment format)               |
82
-| `CL_TYPE_XAR`          | XAR Archive                                                                       |
83
-| `CL_TYPE_XDP`          | Adobe XDP - Embedded PDF                                                          |
84
-| `CL_TYPE_XML_HWP`      | Hangul Word Processor XML (HWPML) Document                                        |
85
-| `CL_TYPE_XML_WORD`     | Microsoft Word 2003 XML Document                                                  |
86
-| `CL_TYPE_XML_XL`       | Microsoft Excel 2003 XML Document                                                 |
87
-| `CL_TYPE_XZ`           | XZ Archive                                                                        |
88
-| `CL_TYPE_ZIP`          | Zip Archive                                                                       |
89
-| `CL_TYPE_ZIPSFX`       | Self-Extracting Zip Archive                                                       |
... ...
@@ -2,7 +2,7 @@
2 2
 
3 3
 Below are the steps for installing ClamAV from source on Debian and Ubuntu Linux.
4 4
 
5
-## Install prerequisitesaa
5
+## Install prerequisites
6 6
 
7 7
 1. Install ClamAV dependencies
8 8
     1. Install the developer tools
... ...
@@ -2,7 +2,7 @@
2 2
 
3 3
 Below are the steps for installing ClamAV from source on Debian and Ubuntu Linux.
4 4
 
5
-## Install prerequisitesaa
5
+## Install prerequisites
6 6
 
7 7
 1. Install ClamAV dependencies
8 8
     1. Install the developer tools
9 9
deleted file mode 100644
... ...
@@ -1,682 +0,0 @@
1
-# PhishSigs
2
-
3
-Table of Contents
4
-- [PhishSigs](#phishsigs)
5
-- [Database file format](#database-file-format)
6
-    - [PDB format](#pdb-format)
7
-    - [GDB format](#gdb-format)
8
-    - [WDB format](#wdb-format)
9
-    - [Hints](#hints)
10
-    - [Examples of PDB signatures](#examples-of-pdb-signatures)
11
-    - [Examples of WDB signatures](#examples-of-wdb-signatures)
12
-    - [Example for how the URL extractor works](#example-for-how-the-url-extractor-works)
13
-    - [How matching works](#how-matching-works)
14
-        - [RealURL, displayedURL concatenation](#realurl-displayedurl-concatenation)
15
-        - [What happens when a match is found](#what-happens-when-a-match-is-found)
16
-        - [Extraction of realURL, displayedURL from HTML tags](#extraction-of-realurl-displayedurl-from-html-tags)
17
-        - [Example](#example)
18
-    - [Simple patterns](#simple-patterns)
19
-    - [Regular expressions](#regular-expressions)
20
-    - [Flags](#flags)
21
-- [Introduction to regular expressions](#introduction-to-regular-expressions)
22
-    - [Special characters](#special-characters)
23
-    - [Character classes](#character-classes)
24
-    - [Escaping](#escaping)
25
-    - [Alternation](#alternation)
26
-    - [Optional matching, and repetition](#optional-matching-and-repetition)
27
-    - [Groups](#groups)
28
-- [How to create database files](#how-to-create-database-files)
29
-    - [How to create and maintain the whitelist (daily.wdb)](#how-to-create-and-maintain-the-whitelist-dailywdb)
30
-    - [How to create and maintain the domainlist (daily.pdb)](#how-to-create-and-maintain-the-domainlist-dailypdb)
31
-    - [Dealing with false positives, and undetected phishing mails](#dealing-with-false-positives-and-undetected-phishing-mails)
32
-        - [False positives](#false-positives)
33
-        - [Undetected phish mails](#undetected-phish-mails)
34
-
35
-# Database file format
36
-
37
-## PDB format
38
-
39
-This file contains URLs/hosts that are targets of phishing attempts. It
40
-contains lines in the following format:
41
-
42
-```
43
-    R[Filter]:RealURL:DisplayedURL[:FuncLevelSpec]
44
-    H[Filter]:DisplayedHostname[:FuncLevelSpec]
45
-```
46
-
47
-- `R`
48
-
49
-  regular expression for the concatenated URL
50
-
51
-- `H`
52
-
53
-  matches the `DisplayedHostname` as a simple pattern (literally, no regular expression)
54
-
55
-  - the pattern can match either the full hostname
56
-
57
-  - or a subdomain of the specified hostname
58
-
59
-  - to avoid false matches in case of subdomain matches, the engine checks that there is a dot (`.`) or a space (` `) before the matched portion
60
-
61
-- `Filter`
62
-
63
-  is ignored for R and H for compatibility reasons
64
-
65
-- `RealURL`
66
-
67
-  is the URL the user is sent to, for example the *href* attribute of an HTML anchor (*\<a\> tag*)
68
-
69
-- `DisplayedURL`
70
-
71
-  is the URL text displayed to the user, i.e. where they are *claimed* to be sent, for example the contents of an HTML anchor (*\<a\> tag*)
72
-
73
-- `DisplayedHostname`
74
-
75
-  is the hostname portion of the DisplayedURL
76
-
77
-- `FuncLevelSpec`
78
-
79
-  an optional functionality level; two formats are possible:
80
-
81
-  - `minlevel`: all engines with functionality level \>= `minlevel` will load this line
82
-
83
-  - `minlevel-maxlevel`: engines with functionality level \>= `minlevel` and \< `maxlevel` will load this line
84
-
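A minimal sketch of how an `H` line and its optional `FuncLevelSpec` could be interpreted, following the rules above: the hostname is matched literally, either in full or as a subdomain with a dot or space right before the matched portion, and `minlevel` / `minlevel-maxlevel` bound the functionality levels that load the line. The helper names (`parse_pdb_h`, `engine_loads`, `host_matches`) are hypothetical; this is not ClamAV's loader code.

```python
def parse_func_level(spec):
    """Parse "20", "20-30" or "20-" into (minlevel, maxlevel or None)."""
    if "-" in spec:
        low, high = spec.split("-", 1)
        return int(low), (int(high) if high else None)
    return int(spec), None


def engine_loads(spec, flevel):
    """True if an engine at functionality level `flevel` loads the line."""
    if spec is None:  # no FuncLevelSpec: every engine loads the line
        return True
    minlevel, maxlevel = parse_func_level(spec)
    return flevel >= minlevel and (maxlevel is None or flevel < maxlevel)


def parse_pdb_h(line):
    """Split 'H[Filter]:DisplayedHostname[:FuncLevelSpec]' into (host, spec)."""
    kind, _, rest = line.partition(":")
    if not kind.startswith("H"):
        raise ValueError("not an H line: " + line)
    host, _, spec = rest.partition(":")
    return host, (spec or None)


def host_matches(pattern, displayed_host):
    """Literal match on the full hostname, or on a subdomain of it."""
    if displayed_host == pattern:
        return True
    if displayed_host.endswith(pattern):
        # Require a dot or space right before the matched portion, so that
        # "notamazon.com" does not match "amazon.com".
        return displayed_host[-len(pattern) - 1] in ". "
    return False
```

With these rules, `H:amazon.co.uk:20-30` is loaded by levels 20 through 29 only, matching the examples further down.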
85
-## GDB format
86
-
87
-This file contains URL hashes in the following format:
88
-
89
-    S:P:HostPrefix[:FuncLevelSpec]
90
-    S:F:Sha256hash[:FuncLevelSpec]
91
-    S1:P:HostPrefix[:FuncLevelSpec]
92
-    S1:F:Sha256hash[:FuncLevelSpec]
93
-    S2:P:HostPrefix[:FuncLevelSpec]
94
-    S2:F:Sha256hash[:FuncLevelSpec]
95
-    S:W:Sha256hash[:FuncLevelSpec]
96
-
97
-- `S:`
98
-
99
-  These are hashes for Google Safe Browsing - malware sites, and should not be used for other purposes.
100
-
101
-- `S2:`
102
-
103
-  These are hashes for Google Safe Browsing - phishing sites, and should not be used for other purposes.
104
-
105
-- `S1:`
106
-
107
-  Hashes for blacklisting phishing sites. Virus name: Phishing.URL.Blacklisted
108
-
109
-- `S:W:`
110
-
111
-  Locally whitelisted hashes.
112
-
113
-- `HostPrefix`
114
-
115
-  4-byte prefix of the sha256 hash of the last 2 or 3 components of the hostname. If the prefix doesn’t match, no further lookups are performed.
116
-
117
-- `Sha256hash`
118
-
119
-  sha256 hash of the canonicalized URL, or a sha256 hash of its prefix/suffix according to the Google Safe Browsing “Performing Lookups” rules. There should be a corresponding `:P:HostPrefix` entry for the hash to be taken into consideration.
120
-
121
-To see which hash/URL matched, run `clamscan --debug` and look for the following strings in the output: `Looking up hash`, `prefix matched`, and `Hash matched`. Local whitelisting of .gdb entries can be done by creating a local.gdb file and adding a line `S:W:<HASH>`.
122
-
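The `HostPrefix` computation described above can be sketched with Python's `hashlib`. This assumes the hashed input is the bare dot-joined suffix of the hostname; whether ClamAV or the Safe Browsing rules add anything to that string (such as a trailing `/`) is not stated here, so treat the exact bytes as an assumption. The helper names are hypothetical.

```python
import hashlib


def host_suffixes(hostname):
    """The last 3 and last 2 dot-separated components of `hostname`."""
    parts = hostname.split(".")
    return [".".join(parts[-n:]) for n in (3, 2) if len(parts) >= n]


def host_prefix(suffix):
    """First 4 bytes of SHA-256(suffix) as 8 hex digits (input bytes assumed)."""
    return hashlib.sha256(suffix.encode("ascii")).digest()[:4].hex()


def local_whitelist_line(sha256_hex):
    """A local.gdb line that whitelists one full-URL hash."""
    return "S:W:" + sha256_hex
```

For `www.paypal.com`, the candidates are `www.paypal.com` and `paypal.com`; only if one of their prefixes appears in an `:P:` entry would the full-URL `:F:` hashes be consulted.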
123
-## WDB format
124
-
125
-This file contains whitelisted URL pairs. It contains lines in the following format:
126
-
127
-```
128
-    X:RealURL:DisplayedURL[:FuncLevelSpec]
129
-    M:RealHostname:DisplayedHostname[:FuncLevelSpec]
130
-```
131
-
132
-- `X`
133
-
134
-  regular expression, for the *entire URL*, not just the hostname
135
-
136
-  - The regular expression is by default anchored to start-of-line and end-of-line, as if you have used `^RegularExpression$`
137
-
138
-  - A trailing `/` is automatically added to both the regex and the input string to avoid false matches
139
-
140
-  - The regular expression matches the *concatenation* of the RealURL, a colon(`:`), and the DisplayedURL as a single string. It doesn’t separately match RealURL and DisplayedURL\!
141
-
142
-- `M`
143
-
144
-  matches the hostname or a subdomain of it; see the notes for `H` above
145
-
146
-## Hints
147
-
148
-- empty lines are ignored
149
-
150
-- the colons are mandatory
151
-
152
-- Don’t leave extra spaces at the end of a line\!
153
-
154
-- if any of the lines don’t conform to this format, ClamAV will abort with a Malformed Database Error
155
-
156
-- see [Extraction of realURL, displayedURL from HTML tags](#extraction-of-realurl-displayedurl-from-html-tags) for more details on realURL/displayedURL
157
-
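The hints above can be checked mechanically before a database is handed to ClamAV. The following hypothetical lint sketch flags trailing whitespace, lines with no `:` separator, and unknown line-type letters, assuming only the `R`, `H`, `X`, `M` and `S` types described in this document. It does not reproduce ClamAV's own validation.

```python
KNOWN_TYPES = "RHXMS"  # line types covered in this document (assumption)


def lint_phish_db(text):
    """Return (line_number, problem) tuples for lines that break the hints."""
    problems = []
    for n, line in enumerate(text.splitlines(), start=1):
        if line == "":
            continue  # empty lines are ignored
        if line != line.rstrip():
            problems.append((n, "trailing whitespace"))
        if ":" not in line:
            problems.append((n, "missing ':' separator"))
        elif line[0] not in KNOWN_TYPES:
            problems.append((n, "unknown line type"))
    return problems
```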
158
-## Examples of PDB signatures
159
-
160
-To check for phishing mails that target amazon.com, or subdomains of
161
-amazon.com:
162
-
163
-```
164
-    H:amazon.com
165
-```
166
-
167
-To do the same, but for amazon.co.uk:
168
-
169
-```
170
-    H:amazon.co.uk
171
-```
172
-
173
-To limit the signatures to certain engine versions:
174
-
175
-```
176
-    H:amazon.co.uk:20-30
177
-    H:amazon.co.uk:20-
178
-    H:amazon.co.uk:0-20
179
-```
180
-
181
-First line: engine versions 20, 21, ..., 29 can load it
182
-
183
-Second line: engine versions \>= 20 can load it
184
-
185
-Third line: engine versions \< 20 can load it
186
-
187
-In a real situation, you’d probably use the second form, for example if you are using a feature of the signatures not available in earlier versions, or if earlier versions have bugs with your signature. Neither is the case here; the above examples are for illustrative purposes only.
188
-
189
-## Examples of WDB signatures
190
-
191
-To allow Amazon’s country-specific domains and amazon.com to be mixed in DisplayedURL and RealURL:
192
-
193
-    X:.+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?:17-
194
-
195
-Explanation of this signature:
196
-
197
-- `X:`
198
-
199
-  this is a regular expression
200
-
201
-- `:17-`
202
-
203
-  load signature only for engines with functionality level \>= 17 (recommended for type X)
204
-
205
-The regular expression is the following (with `X:` and `:17-` stripped, and a `/` appended):
206
-
207
-```
208
-    .+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?/
209
-```
210
-
211
-Explanation of this regular expression (note that it is a single regular expression, not 2 regular expressions split at the `:`):
212
-
213
-- `.+`
214
-
215
-  any subdomain of
216
-
217
-- `\.amazon\.`
218
-
219
-  domain we are whitelisting (RealURL part)
220
-
221
-- `(at|ca|co\.uk|co\.jp|de|fr)`
222
-
223
-  country domains: at, ca, co.uk, co.jp, de, fr
224
-
225
-- `([/?].*)?`
226
-
227
-  recommended way to end the RealURL part of the whitelist; this protects against embedded URLs (evilurl.example.com/amazon.co.uk/)
228
-
229
-- `:`
230
-
231
-  RealURL and DisplayedURL are concatenated via a `:`, so match a literal `:` here
232
-
233
-- `.+`
234
-
235
-  any subdomain of
236
-
237
-- `\.amazon\.com`
238
-
239
-  whitelisted DisplayedURL
240
-
241
-- `([/?].*)?`
242
-
243
-  recommended way to end the DisplayedURL part, to protect against embedded URLs
244
-
245
-- `/`
246
-
247
-  automatically added to further protect against embedded URLs
248
-
249
-When you whitelist an entry, make sure that both domains are owned by the same entity. This whitelist entry allows links that claim to point to amazon.com (DisplayedURL) but really go to a country-specific Amazon domain (RealURL).
250
-
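The matching rules for `X` entries (concatenate RealURL, `:` and DisplayedURL; anchor at both ends; append `/` to both the regex and the input) can be approximated in Python using the amazon signature above. Python's `re` is not POSIX ERE, but for a pattern like this one, without backreferences, whether a full match exists does not depend on that difference. `wdb_x_matches` is a hypothetical helper, and the URLs are assumed to be bare hostnames, as if already normalized.

```python
import re

# The amazon X: signature from above, with "X:" and ":17-" stripped.
AMAZON = r".+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?"


def wdb_x_matches(regex, real_url, displayed_url):
    """Approximate an X: entry check: the regex must match the whole of
    RealURL + ':' + DisplayedURL, with '/' appended to regex and input."""
    subject = real_url + ":" + displayed_url + "/"
    return re.fullmatch(regex + "/", subject) is not None
```

A link displayed as `www.amazon.com` that really goes to `www.amazon.de` is whitelisted, while `www.amazon.de.evil.example` is not, because `([/?].*)?` only lets the RealURL part continue with a path or query string.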
251
-## Example for how the URL extractor works
252
-
253
-Consider the following HTML file:
254
-
255
-```html
256
-    <html>
257
-    <a href="http://1.realurl.example.com/">
258
-      1.displayedurl.example.com
259
-    </a>
260
-    <a href="http://2.realurl.example.com">
261
-      2 d<b>i<p>splayedurl.e</b>xa<i>mple.com
262
-    </a>
263
-    <a href="http://3.realurl.example.com"> 
264
-      3.nested.example.com
265
-      <a href="http://4.realurl.example.com">
266
-        4.displayedurl.example.com
267
-      </a>
268
-    </a>
269
-    <form action="http://5.realurl.example.com">
270
-      sometext
271
-      <img src="http://5.displayedurl.example.com/img0.gif"/>
272
-      <a href="http://5.form.nested.displayedurl.example.com">
273
-        5.form.nested.link-displayedurl.example.com
274
-      </a>
275
-    </form>
276
-    <a href="http://6.realurl.example.com">
277
-      6.displ
278
-      <img src="6.displayedurl.example.com/img1.gif"/>
279
-      ayedurl.example.com
280
-    </a>
281
-    <a href="http://7.realurl.example.com">
282
-      <iframe src="http://7.displayedurl.example.com">
283
-    </a>
284
-```
285
-
286
-The phishing engine extracts the following
287
-RealURL/DisplayedURL pairs from it:
288
-
289
-```
290
-    http://1.realurl.example.com/
291
-    1.displayedurl.example.com
292
-
293
-    http://2.realurl.example.com
294
-    2displayedurl.example.com
295
-
296
-    http://3.realurl.example.com
297
-    3.nested.example.com
298
-
299
-    http://4.realurl.example.com
300
-    4.displayedurl.example.com
301
-
302
-    http://5.realurl.example.com
303
-    http://5.displayedurl.example.com/img0.gif
304
-
305
-    http://5.realurl.example.com
306
-    http://5.form.nested.displayedurl.example.com
307
-
308
-    http://5.form.nested.displayedurl.example.com
309
-    5.form.nested.link-displayedurl.example.com
310
-
311
-    http://6.realurl.example.com
312
-    6.displayedurl.example.com
313
-
314
-    http://6.realurl.example.com
315
-    6.displayedurl.example.com/img1.gif
316
-```
317
-
318
-## How matching works
319
-
320
-### RealURL, displayedURL concatenation
321
-
322
-The phishing detection module processes RealURL/DisplayedURL pairs. Matching against daily.wdb is done as follows: the RealURL is concatenated with a `:` and the DisplayedURL, and then that *line* is matched against the lines in daily.wdb/daily.pdb.
323
-
324
-So if you have this line in daily.wdb:
325
-
326
-    M:www.google.ro:www.google.com
327
-
328
-and this link: `<a href='http://www.google.ro'>www.google.com</a>`, then it will be whitelisted, but `<a href='http://images.google.com'>www.google.com</a>` will not.
329
-
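The `M` example above can be sketched as a per-host check: the RealURL host must be the entry's RealHostname or a subdomain of it, and likewise for the DisplayedURL. This is a hypothetical illustration (`wdb_m_matches` and `hostname_of` are made-up names); it simplifies the boundary rule to a preceding dot and does not show how ClamAV applies `M` entries to the concatenated string internally.

```python
from urllib.parse import urlsplit


def hostname_of(url):
    """Extract the hostname, tolerating a missing scheme ("www.google.com")."""
    if "//" not in url:
        url = "//" + url
    return urlsplit(url).hostname or ""


def host_or_subdomain(pattern, host):
    return host == pattern or host.endswith("." + pattern)


def wdb_m_matches(entry, real_url, displayed_url):
    """Check an 'M:RealHostname:DisplayedHostname[:FuncLevelSpec]' entry."""
    fields = entry.split(":")
    if fields[0] != "M" or len(fields) < 3:
        raise ValueError("not an M entry: " + entry)
    real_pat, disp_pat = fields[1], fields[2]
    return (host_or_subdomain(real_pat, hostname_of(real_url))
            and host_or_subdomain(disp_pat, hostname_of(displayed_url)))
```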
330
-### What happens when a match is found
331
-
332
-In the case of the whitelist, a match means that the RealURL/DisplayedURL combination is considered clean, and no further checks are performed on it.
333
-
334
-In the case of the domainlist, a match means that the RealURL/displayedURL is going to be checked for phishing attempts.
335
-
336
-Furthermore, you can restrict which checks are performed by specifying a 3-digit hexadecimal number (see [Flags](#flags)).
337
-
338
-### Extraction of realURL, displayedURL from HTML tags
339
-
340
-The HTML parser extracts realURL/displayedURL pairs based on the following rules.
341
-
342
-In version 0.93: after URLs have been extracted, they are normalized and cut after the hostname. `http://test.example.com/path/somecgi?queryparameters` becomes `http://test.example.com/`
343
-
344
-- `a`
345
-
346
-  (anchor) the *href* is the realURL, and its *contents* are the displayedURL
347
-
348
-  - contents
349
-    is the tag-stripped contents of the \<a\> tags, so for example \<b\> tags are stripped (but not their contents)
350
-
351
-  nesting another \<a\> tag within an \<a\> tag (besides being invalid HTML) is treated as a \</a\>\<a...
352
-
353
-- `form`
354
-
355
-  the *action* attribute is the realURL, and a nested \<a\> tag is the displayedURL
356
-
357
-- `img/area`
358
-
359
-  if nested within an *\<a\>* tag, the realURL is the *href* of the a tag, and the *src/dynsrc/area* is the displayedURL of the img
360
-
361
-  if nested within a *form* tag, then the action attribute of the *form* tag is the realURL
362
-
363
-- `iframe`
364
-
365
-  if nested within an *\<a\>* tag, the *src* attribute is the displayedURL, and the *href* of its parent *a* tag is the realURL
366
-
367
-  if nested within a *form* tag, then the action attribute of the *form* tag is the realURL
368
-
369
-### Example
370
-
371
-Consider this HTML file:
372
-
373
-```html
374
-<a href="evilurl">www.paypal.com</a>
375
-
376
-<a href="evilurl2" title="www.ebay.com">click here to sign
377
-in</a>
378
-
379
-<form action="evilurl_form">
380
-
381
-Please sign in to <a href="cgi.ebay.com">Ebay</a> using this
382
-form
383
-
384
-<input type='text' name='username'>Username</input>
385
-
386
-....
387
-
388
-</form>
389
-
390
-<a href="evilurl"><img src="images.paypal.com/secure.jpg"></a>
391
-```
392
-
393
-The resulting realURL/displayedURL pairs will be (note that one tag can generate multiple pairs):
394
-
395
-- evilurl / www.paypal.com
396
-
397
-- evilurl2 / click here to sign in
398
-
399
-- evilurl2 / www.ebay.com
400
-
401
-- evilurl_form / cgi.ebay.com
402
-
403
-- cgi.ebay.com / Ebay
404
-
405
-- evilurl / images.paypal.com/secure.jpg
406
-
407
-## Simple patterns
408
-
409
-Simple patterns are matched literally, i.e. if you say:
410
-
411
-```
412
-www.google.com
413
-```
414
-
415
-it is going to match *www.google.com*, and only that. The `.` (dot) character has no special meaning here (see [Regular expressions](#regular-expressions) for how the `.` character behaves there).
416
-
417
-## Regular expressions
418
-
419
-POSIX regular expressions are supported, and you can consider each regular expression to be internally wrapped in `^` and `$`. In other words, the regular expression has to match the entire concatenated URL (see [RealURL, displayedURL concatenation](#realurl-displayedurl-concatenation) for details on concatenation).
420
-
421
-It is recommended that you read [Introduction to regular expressions](#introduction-to-regular-expressions) to learn how to write regular expressions, and then come back and read this section for hints.
422
-
423
-Be advised that ClamAV contains an internal, very basic regex matcher to reduce the load on the regex matching core. Thus it is recommended that you avoid syntax it does not support at the very beginning of a regex (at least in the first few characters).
424
-
425
-Currently the ClamAV regex matcher supports:
426
-
427
-- `.` (dot) character
428
-
429
-- `\` (backslash, for escaping special characters)
430
-
431
-- `|` (pipe) alternatives
432
-
433
-- `[]` (character classes)
434
-
435
-- `()` (parentheses for grouping, but no group extraction is performed)
436
-
437
-- other non-special characters
438
-
439
-Thus the following are not supported:
440
-
441
-- `+` repetition
442
-
443
-- `*` repetition
444
-
445
-- `{}` repetition
446
-
447
-- backreferences
448
-
449
-- lookaround
450
-
451
-- other “advanced” features not listed in the supported list ;)
452
-
453
-This, however, shouldn’t discourage you from using features that are not directly supported: if the internal engine encounters unsupported syntax, it passes the regex on to the POSIX regex core, beginning from the first unsupported token (everything before that is still processed by the internal matcher). An example might make this clearer:
454
-
455
-`www\.google\.(com|ro|it) ([a-zA-Z])+\.google\.(com|ro|it)`
456
-
457
-Everything up to `([a-zA-Z])+` is processed internally; that parenthesis (and everything beyond it) is processed by the POSIX core.
458
-
459
-Examples of URL pairs that match:
460
-
461
-- www.google.ro images.google.ro
462
-
463
-- www.google.com images.google.ro
464
-
465
-Examples of URL pairs that don’t match:
466
-
467
-- www.google.ro images1.google.ro
468
-
469
-- images.google.com image.google.com
470
-
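The split between the internal matcher and the POSIX core can be sketched as below. It follows the behaviour described for the example: the POSIX part starts at the atom (character, bracket class or group) to which the first unsupported quantifier (`+`, `*`, `?`, `{`) applies. Treating `?` as unsupported, and starting at the outermost open group when the quantifier sits inside a group, are assumptions of this sketch rather than documented ClamAV behaviour. `matches_whole` uses `re.fullmatch` to stand in for the implicit `^...$` anchoring.

```python
import re

QUANTIFIERS = "+*?{"  # tokens treated as unsupported by the internal matcher


def split_for_internal_matcher(regex):
    """Return (internal_prefix, posix_rest); assumes a well-formed regex."""
    open_groups = []   # positions of currently open '('
    atom_start = None  # start of the most recently completed atom
    i = 0
    while i < len(regex):
        c = regex[i]
        if c == "\\":                  # escaped character: one atom
            atom_start = i
            i += 2
        elif c == "[":                 # bracket class: one atom
            atom_start = i
            j = i + 1
            if j < len(regex) and regex[j] == "^":
                j += 1
            if j < len(regex) and regex[j] == "]":
                j += 1                 # literal ']' right after '[' or '[^'
            i = regex.index("]", j) + 1
        elif c == "(":
            open_groups.append(i)
            atom_start = None
            i += 1
        elif c == ")":
            atom_start = open_groups.pop()  # the whole group is one atom
            i += 1
        elif c == "|":
            atom_start = None
            i += 1
        elif c in QUANTIFIERS:
            if open_groups:            # keep the internal prefix balanced
                start = open_groups[0]
            elif atom_start is not None:
                start = atom_start
            else:
                start = i
            return regex[:start], regex[start:]
        else:                          # '.' or an ordinary character
            atom_start = i
            i += 1
    return regex, ""


def matches_whole(regex, subject):
    """Implicit ^...$ anchoring: the regex must match the entire subject."""
    return re.fullmatch(regex, subject) is not None


EXAMPLE = r"www\.google\.(com|ro|it) ([a-zA-Z])+\.google\.(com|ro|it)"
```

For `EXAMPLE`, the prefix handled internally is `www\.google\.(com|ro|it) ` and the POSIX core receives `([a-zA-Z])+\.google\.(com|ro|it)`, as described above.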
471
-## Flags
472
-
473
-Flags are a binary OR of the following numbers:
474
-
475
-- HOST_SUFFICIENT
476
-
477
-  1
478
-
479
-- DOMAIN_SUFFICIENT
480
-
481
-  2
482
-
483
-- DO_REVERSE_LOOKUP
484
-
485
-  4
486
-
487
-- CHECK_REDIR
488
-
489
-  8
490
-
491
-- CHECK_SSL
492
-
493
-  16
494
-
495
-- CHECK_CLOAKING
496
-
497
-  32
498
-
499
-- CLEANUP_URL
500
-
501
-  64
502
-
503
-- CHECK_DOMAIN_REVERSE
504
-
505
-  128
506
-
507
-- CHECK_IMG_URL
508
-
509
-  256
510
-
511
-- DOMAINLIST_REQUIRED
512
-
513
-  512
514
-
515
-The names of the constants are self-explanatory.
516
-
517
-These constants are defined in libclamav/phishcheck.h; check there for the latest flags.
518
-
519
-A default set of flags is enabled; currently these are:
520
-
521
-    ( CLEANUP_URL | CHECK_SSL | CHECK_CLOAKING | CHECK_IMG_URL )
522
-
523
-SSL checking is currently performed only for `a` tags.
524
-
525
-For each line in the domainlist, you must decide whether you want to filter any flags (that is, whether there are checks you don’t want done), then calculate the binary OR of those constants and convert it into a 3-digit hexadecimal number. For example, suppose you decide that DOMAIN_SUFFICIENT shouldn’t be used for ebay.com, and you don’t want to check images either; you come up with this flag number: 2 | 256 = 258 (decimal) = 102 (hexadecimal).
526
-
527
-So you add this line to daily.pdb:
528
-
529
-- R102 www.ebay.com .+
530
-
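The flag arithmetic above can be sketched with Python's `enum.IntFlag`, using the constant values listed in this section (check libclamav/phishcheck.h for the authoritative list). `filter_field` is a hypothetical helper, and emitting uppercase hex digits is an arbitrary choice here.

```python
from enum import IntFlag


class PhishFlag(IntFlag):
    HOST_SUFFICIENT = 1
    DOMAIN_SUFFICIENT = 2
    DO_REVERSE_LOOKUP = 4
    CHECK_REDIR = 8
    CHECK_SSL = 16
    CHECK_CLOAKING = 32
    CLEANUP_URL = 64
    CHECK_DOMAIN_REVERSE = 128
    CHECK_IMG_URL = 256
    DOMAINLIST_REQUIRED = 512


DEFAULT_FLAGS = (PhishFlag.CLEANUP_URL | PhishFlag.CHECK_SSL
                 | PhishFlag.CHECK_CLOAKING | PhishFlag.CHECK_IMG_URL)


def filter_field(flags):
    """Render an OR of flags as the 3-digit hex number written after 'R'."""
    value = int(flags)
    if not 0 <= value <= 0xFFF:
        raise ValueError("flags do not fit in 3 hex digits")
    return format(value, "03X")
```

`filter_field(PhishFlag.DOMAIN_SUFFICIENT | PhishFlag.CHECK_IMG_URL)` gives `"102"`, the value used in the ebay.com line.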
531
-# Introduction to regular expressions
532
-
533
-Recommended reading:
534
-
535
-- http://www.regular-expressions.info/quickstart.html
536
-
537
-- http://www.regular-expressions.info/tutorial.html
538
-
539
-- regex(7) man-page: http://www.tin.org/bin/man.cgi?section=7\&topic=regex
540
-
541
-## Special characters
542
-
543
-- \[
544
-
545
-  the opening square bracket - it marks the beginning of a character class; see [Character classes](#character-classes)
546
-
547
-- \\
548
-
549
-  the backslash - escapes special characters; see [Escaping](#escaping)
550
-
551
-- ^
552
-
553
-  the caret - matches the beginning of a line (not needed in ClamAV regexes; it is implied)
554
-
555
-- $
556
-
557
-  the dollar sign - matches the end of a line (not needed in ClamAV regexes; it is implied)
558
-
559
-- .
560
-
561
-  the period or dot - matches *any* character
562
-
563
-- |
564
-
565
-  the vertical bar or pipe symbol - matches either the token on its left or the token on its right; see [Alternation](#alternation)
566
-
567
-- ?
568
-
569
-  the question mark - optionally matches the left-side token; see [Optional matching, and repetition](#optional-matching-and-repetition)
570
-
571
-- \*
572
-
573
-  the asterisk or star - matches 0 or more occurrences of the left-side token; see [Optional matching, and repetition](#optional-matching-and-repetition)
574
-
575
-- \+
576
-
577
-  the plus sign - matches 1 or more occurrences of the left-side token; see [Optional matching, and repetition](#optional-matching-and-repetition)
578
-
579
-- (
580
-
581
-  the opening round bracket - marks the beginning of a group; see [Groups](#groups)
582
-
583
-- )
584
-
585
-  the closing round bracket - marks the end of a group; see [Groups](#groups)
586
-
587
-## Character classes
588
-
589
-## Escaping
590
-
591
-Escaping has two purposes:
592
-
593
-- it allows you to actually match the special characters themselves; for example, to match a literal `+` you would write `\+`
594
-
595
-- it also allows you to match non-printable characters, such as the tab (`\t`), newline (`\n`), ...
596
-
597
-However, since non-printable characters are not valid inside a URL, you won’t have a reason to use them.
598
-
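When a hostname is written into an `R` or `X` regex, every ERE special character in it needs a backslash, as the PayPal domainlist example does with `.paypal.com`. A minimal sketch of such an escaper follows; the special-character set is the POSIX ERE one (`.[\()*+?{|^$`), and `ere_escape` is a hypothetical helper. Python's `re.escape` is avoided because it also escapes characters such as `-`, and POSIX leaves a backslash before an ordinary character undefined in EREs.

```python
# Characters that are special outside bracket expressions in POSIX EREs.
ERE_SPECIALS = set(".[\\()*+?{|^$")


def ere_escape(literal):
    """Backslash-escape ERE special characters so `literal` matches itself."""
    return "".join("\\" + ch if ch in ERE_SPECIALS else ch for ch in literal)
```

For example, `".+" + ere_escape(".paypal.com")` yields `.+\.paypal\.com`.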
599
-## Alternation
600
-
601
-## Optional matching, and repetition
602
-
603
-## Groups
604
-
605
-Groups are usually used together with repetition or alternation. For example, `(com|it)+` means: match 1 or more repetitions of *com* or *it*; that is, it matches com, it, comcom, comcomcom, comit, itit, ititcom, ... you get the idea.
606
-
607
-Groups can also be used to extract substrings, but this is not supported by the ClamAV engine, and it is not needed in this case either.
608
-
609
-# How to create database files
610
-
611
-## How to create and maintain the whitelist (daily.wdb)
612
-
613
-If the phishing code claims that a certain mail is phishing, but it’s not, you have 2 choices:
614
-
615
-- examine your rules in daily.pdb, and fix them if necessary (see section [How to create and maintain the domainlist](#how-to-create-and-maintain-the-domainlist-dailypdb))
616
-
617
-- add it to the whitelist (discussed here)
618
-
619
-Let's assume you are having problems because of links like this one in a mail:
620
-
621
-```html
622
-    <a href="http://69.0.241.57/bCentral/L.asp?L=XXXXXXXX">
623
-      http://www.bcentral.it/
624
-    </a>
625
-```
626
-
627
-After investigating those sites further, you decide they are no threat, and create a line like this in daily.wdb:
628
-
629
-```
630
-R http://www\.bcentral\.it/.+ http://69\.0\.241\.57/bCentral/L\.asp\?L=.+
632
-```
633
-
634
-Note: URLs like the above can be used to track unique mail recipients, and thus to learn whether somebody actually reads the mails (so that more spam can be sent). However, since this site requires no authentication information, it is safe from a phishing point of view.
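
Before adding a whitelist entry, check that each regex really matches the URL it is meant to cover. Here is a minimal sketch using `grep -E`. It assumes the regexes are POSIX extended regular expressions matched against the complete URL; that anchoring is an assumption, approximated with `grep -x`. The sketch also shows why a literal `?` in a URL has to be written as `\?`: left unescaped, `?` makes the preceding `p` optional instead.

```bash
url='http://69.0.241.57/bCentral/L.asp?L=XXXXXXXX'

# Escaped "\?": matches the literal question mark in the URL.
if printf '%s\n' "$url" | grep -Eqx 'http://69\.0\.241\.57/bCentral/L\.asp\?L=.+'; then
    echo 'escaped \? : match'
fi

# Unescaped "?": means "optional p", so "L.asp?L=" is no longer matched.
if ! printf '%s\n' "$url" | grep -Eqx 'http://69\.0\.241\.57/bCentral/L\.asp?L=.+'; then
    echo 'unescaped ? : no match'
fi
```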
635
-
636
-## How to create and maintain the domainlist (daily.pdb)
637
-
638
-When not using `--phish-scan-alldomains` (in production environments, for example), you need to decide which URLs you are going to check.
639
-
640
-Although at first glance it might seem a good idea to check everything, doing so would produce false positives. Newsletters, ads, and similar mail in particular are likely to use URLs that look like phishing attempts.
641
-
642
-Let's assume that you've recently seen many phishing attempts claiming to come from PayPal. You would then add PayPal to daily.pdb:
643
-
644
-```
645
-R .+ .+\.paypal\.com
646
-```
647
-
648
-The above line will block (detect as phishing) mails containing URLs that claim to lead to PayPal but in fact do not.
649
-
650
-Be careful, though, not to create regexes that match too broad a range of URLs.
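
To judge how broad a domainlist pattern is, try it against a few host names. This sketch compares the escaped pattern with a careless unescaped variant. It assumes POSIX extended regex semantics and a match against the whole host name, approximated with `grep -Ex`. How ClamAV itself anchors `.pdb` regexes is not covered here.

```bash
# Compare the escaped pattern with a variant whose dots are left unescaped.
for host in www.paypal.com paypal.com www.paypalxcom www.paypal.com.example.net; do
    escaped=no; unescaped=no
    printf '%s\n' "$host" | grep -Eqx '.+\.paypal\.com' && escaped=yes
    printf '%s\n' "$host" | grep -Eqx '.+.paypal.com'   && unescaped=yes
    echo "$host: escaped=$escaped unescaped=$unescaped"
done
```

The unescaped variant also accepts `www.paypalxcom`, because each bare `.` matches any character. Neither pattern matches the bare `paypal.com`, since `.+\.` requires something before the dot.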
651
-
652
-## Dealing with false positives, and undetected phishing mails
653
-
654
-### False positives
655
-
656
-Whenever you see a false positive (mail that is detected as phishing, but is not), you need to examine *why* ClamAV decided that it is phishing. You can do this easily by building ClamAV with debugging enabled (`./configure --enable-experimental --enable-debug`) and then running a tool:
657
-
658
-```bash
659
-$ contrib/phishing/why.py phishing.eml
660
-```
661
-
662
-This will show the URL that triggers the phishing verdict, and the reason why that URL is considered a phishing attempt.
663
-
664
-Once you know the reason, you might need to modify daily.pdb (if one of your rules in there is too broad), or you need to add the URL to daily.wdb. If you think the algorithm is incorrect, please file a bug report on bugzilla.clamav.net, including the output of *why.py*.
665
-
666
-### Undetected phish mails
667
-
668
-Unfortunately, why.py doesn't help here (it will say: clean), so all you can do is run:
669
-
670
-```bash
671
-$ clamscan/clamscan --phish-scan-alldomains undetected.eml
672
-```
673
-
674
-Then check whether the mail is detected. If it is, you need to add an appropriate line to daily.pdb (see section [How to create and maintain the domainlist](#how-to-create-and-maintain-the-domainlist-dailypdb)).
675
-
676
-If the mail is not detected, then try using:
677
-
678
-```bash
679
-$ clamscan/clamscan --debug undetected.eml 2>&1 | less
680
-```
681
-
682
-Then see which URLs are being checked, whether any of them is in a whitelist, whether all URLs are detected, etc.
... ...
@@ -3,849 +3,131 @@
3 3
 Table of Contents
4 4
 
5 5
 - [Creating signatures for ClamAV](#creating-signatures-for-clamav)
6
-- [Introduction](#introduction)
7
-- [Signature formats](#signature-formats)
8
-    - [Hash-based signatures](#hash-based-signatures)
9
-        - [MD5 hash-based signatures](#md5-hash-based-signatures)
10
-        - [SHA1 and SHA256 hash-based signatures](#sha1-and-sha256-hash-based-signatures)
11
-        - [PE section based hash signatures](#pe-section-based-hash-signatures)
12
-        - [Hash signatures with unknown size](#hash-signatures-with-unknown-size)
13
-    - [Body-based signatures](#body-based-signatures)
14
-        - [Hexadecimal format](#hexadecimal-format)
15
-        - [Wildcards](#wildcards)
16
-        - [Character classes](#character-classes)
17
-        - [Alternate strings](#alternate-strings)
18
-        - [Basic signature format](#basic-signature-format)
19
-        - [Extended signature format](#extended-signature-format)
20
-        - [Logical signatures](#logical-signatures)
21
-        - [Subsignature Modifiers](#subsignature-modifiers)
22
-    - [Special Subsignature Types](#special-subsignature-types)
23
-        - [Macro subsignatures](#macro-subsignatures)
24
-        - [Byte Compare Subsignatures](#byte-compare-subsignatures)
25
-        - [PCRE subsignatures](#pcre-subsignatures)
26
-    - [Icon signatures for PE files](#icon-signatures-for-pe-files)
27
-    - [Signatures for Version Information metadata in PE files](#signatures-for-version-information-metadata-in-pe-files)
28
-    - [Trusted and Revoked Certificates](#trusted-and-revoked-certificates)
29
-    - [Signatures based on container metadata](#signatures-based-on-container-metadata)
30
-    - [Whitelist databases](#whitelist-databases)
31
-    - [Signature names](#signature-names)
32
-    - [Using YARA rules in ClamAV](#using-yara-rules-in-clamav)
33
-    - [Passwords for archive files \[experimental\]](#passwords-for-archive-files-experimental)
34
-- [Signature writing tips and tricks](#signature-writing-tips-and-tricks)
35
-    - [Testing rules with clamscan](#testing-rules-with-clamscan)
36
-    - [Debug information from libclamav](#debug-information-from-libclamav)
37
-    - [Writing signatures for special files](#writing-signatures-for-special-files)
38
-        - [HTML](#html)
39
-        - [Text files](#text-files)
40
-        - [Compressed Portable Executable files](#compressed-portable-executable-files)
41
-    - [Using sigtool](#using-sigtool)
42
-    - [Inspecting signatures inside a CVD file](#inspecting-signatures-inside-a-CVD-file)
43
-    - [External tools](#external-tools)
44
-
45
-# Introduction
6
+    - [Introduction](#introduction)
7
+    - [Database formats](#database-formats)
8
+        - [Settings databases](#settings-databases)
9
+        - [Signature databases](#signature-databases)
10
+            - [Body-based Signatures](#body-based-signatures)
11
+            - [Hash-based Signatures](#hash-based-signatures)
12
+            - [Alternative signature support](#alternative-signature-support)
13
+        - [Other database files](#other-database-files)
14
+        - [Signature names](#signature-names)
15
+    - [Signature Writing Tips and Tricks](#signature-writing-tips-and-tricks)
16
+        - [Testing rules with `clamscan`](#testing-rules-with-clamscan)
17
+        - [Debug information from libclamav](#debug-information-from-libclamav)
18
+        - [Writing signatures for special files](#writing-signatures-for-special-files)
19
+            - [HTML](#html)
20
+            - [Text files](#text-files)
21
+            - [Compressed Portable Executable files](#compressed-portable-executable-files)
22
+        - [Using `sigtool`](#using-sigtool)
23
+        - [Inspecting signatures inside a CVD file](#inspecting-signatures-inside-a-cvd-file)
24
+        - [External tools](#external-tools)
25
+
26
+## Introduction
46 27
 
47 28
 In order to detect malware and other file-based threats, ClamAV relies on signatures to differentiate clean and malicious/unwanted files.  ClamAV signatures are primarily text-based and conform to one of the ClamAV-specific signature formats associated with a given method of detection.  These formats are explained in the [Signature formats](#signature-formats) section below.  In addition, ClamAV 0.99 and above support signatures written in the YARA format.  More information on this can be found in the [Using YARA rules in ClamAV](#using-yara-rules-in-clamav) section.
48 29
 
49 30
 The ClamAV project distributes a collection of signatures in the form of CVD (ClamAV Virus Database) files.  The CVD file format provides a digitally-signed container that encapsulates the signatures and ensures that they can't be modified by a malicious third-party.  This signature set is actively maintained by [Cisco Talos](https://www.talosintelligence.com/) and can be downloaded using the `freshclam` application that ships with ClamAV.  For more details on this, see the [CVD file](#inspecting-signatures-inside-a-CVD-file) section.
50 31
 
51
-# Signature formats
32
+## Database formats
52 33
 
53
-## Hash-based signatures
34
+ClamAV CVD and CLD database archives may be unpacked to the current directory using `sigtool -u <database name>`. For more details on inspecting CVD and CLD files, see [Inspecting signatures inside a CVD file](#inspecting-signatures-inside-a-cvd-file). Once unpacked, you'll observe a large collection of database files with various extensions described below.
54 35
 
55
-The easiest way to create signatures for ClamAV is to use file hash checksums; however, this method can only be used against static malware.
36
+The CVD and CLD database archives may be supplemented with custom database files in the formats described below to gain additional detection functionality. This is done simply by adding files of the following formats to the database directory, typically `/usr/local/share/clamav` or `"C:\Program Files\ClamAV\database"`. Alternatively, `clamd` and `clamscan` can be instructed to load the database from an alternative database file or database directory using the `clamd` `DatabaseDirectory` config option or the `clamscan -d` command line option.
56 37
 
57
-### MD5 hash-based signatures
38
+### Settings databases
58 39
 
59
-To create an MD5 signature for `test.exe`, use the `--md5` option of
60
-sigtool:
40
+ClamAV provides a handful of configuration-related databases alongside the signature definitions.
61 41
 
62
-```bash
63
-zolw@localhost:/tmp/test$ sigtool --md5 test.exe > test.hdb
64
-zolw@localhost:/tmp/test$ cat test.hdb
65
-48c4533230e1ae1c118c741c0db19dfb:17387:test.exe
66
-```
67
-
68
-That's it! The signature is ready for use:
69
-
70
-```bash
71
-zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe
72
-test.exe: test.exe FOUND
73
-
74
-Known viruses: 1
75
-Scanned directories: 0
76
-Engine version: 0.92.1
77
-Scanned files: 1
78
-Infected files: 1
79
-Data scanned: 0.02 MB
80
-Time: 0.024 sec (0 m 0 s)
81
-```
82
-
83
-You can change the name (by default sigtool uses the name of the file) and place it inside a `*.hdb` file. A single database file can include any number of signatures. To have them loaded automatically each time clamscan/clamd starts, just copy the database file(s) into the local virus database directory (e.g. `/usr/local/share/clamav`).
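
If `sigtool` isn't at hand, the same `MD5:FileSize:MalwareName` line can be assembled with standard tools. This also makes it easy to choose your own malware name instead of the file name. This is a minimal sketch assuming GNU coreutils' `md5sum`; `make_hdb_line` and the name `Test.Sample` are hypothetical.

```bash
# make_hdb_line FILE NAME: print an .hdb entry in MD5:FileSize:MalwareName form.
# Hypothetical helper; sigtool --md5 is the tool the ClamAV docs describe.
make_hdb_line() {
    local md5 size
    md5=$(md5sum < "$1" | cut -d' ' -f1)
    size=$(wc -c < "$1" | tr -d ' ')   # strip padding some wc versions add
    printf '%s:%s:%s\n' "$md5" "$size" "$2"
}

# Usage: make_hdb_line test.exe Test.Sample >> test.hdb
```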
84
-
85
-*Hash-based signatures shall not be used for text files, HTML, or any other data that gets internally preprocessed before pattern matching. If you really want to use a hash signature in such a case, run clamscan with the `--debug` and `--leave-temps` flags as described in [Debug information from libclamav](#debug-information-from-libclamav), and create a signature for the preprocessed file left in /tmp. Please keep in mind that a hash signature will stop matching as soon as a single byte changes in the target file.*
86
-
87
-### SHA1 and SHA256 hash-based signatures
88
-
89
-ClamAV 0.98 also added support for SHA1 and SHA256 file checksums. The format is the same as for MD5 file checksums; ClamAV differentiates between them based on the length of the hash string in the signature. For best backwards compatibility, these should be placed inside a `*.hsb` file. The format is:
90
-
91
-```
92
-HashString:FileSize:MalwareName
93
-```
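
Because the hash type is recognized only by its length (32 hex characters for MD5, 40 for SHA1, 64 for SHA256), an `.hsb` line can be built from `sha1sum` or `sha256sum` output in the same way. This sketch assumes GNU coreutils; `make_hsb_line` is a hypothetical helper.

```bash
# make_hsb_line FILE NAME [ALGO]: print HashString:FileSize:MalwareName,
# where ALGO is sha1 or sha256 (the default). Hypothetical helper.
make_hsb_line() {
    local algo=${3:-sha256} hash size
    hash=$("${algo}sum" < "$1" | cut -d' ' -f1)   # runs sha1sum or sha256sum
    size=$(wc -c < "$1" | tr -d ' ')
    printf '%s:%s:%s\n' "$hash" "$size" "$2"
}

# Usage: make_hsb_line test.exe Test.Sample >> test.hsb
```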
94
-
95
-### PE section based hash signatures
96
-
97
-You can create a hash signature for a specific section in a PE file. Such signatures shall be stored inside `.mdb` files in the following format:
98
-
99
-```
100
-PESectionSize:PESectionHash:MalwareName
101
-```
102
-
103
-The easiest way to generate MD5-based section signatures is to extract the target PE sections into separate files and then run sigtool with the `--mdb` option.
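
Once you know where a section's raw data starts in the file and how large it is (from a PE parsing tool of your choice), you can cut it out with `dd` and hash it directly. This is a sketch: the offset and size are assumed inputs, the helper does not parse PE headers itself, and `make_mdb_line` is hypothetical.

```bash
# make_mdb_line FILE OFFSET SIZE NAME: hash SIZE bytes starting at OFFSET and
# print PESectionSize:PESectionHash:MalwareName. OFFSET and SIZE are assumed
# to come from the PE section table; no PE parsing is done here.
make_mdb_line() {
    local hash
    hash=$(dd if="$1" bs=1 skip="$2" count="$3" 2>/dev/null | md5sum | cut -d' ' -f1)
    printf '%s:%s:%s\n' "$3" "$hash" "$4"
}

# Usage: make_mdb_line test.exe 1024 4096 Test.Section >> test.mdb
```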
104
-
105
-ClamAV 0.98 also added support for SHA1 and SHA256 section-based signatures. The format is the same as for MD5 PE section based signatures; ClamAV differentiates between them based on the length of the hash string in the signature. For best backwards compatibility, these should be placed inside a `*.msb` file.
106
-
107
-### Hash signatures with unknown size
108
-
109
-ClamAV 0.98 also added support for hash signatures where the size is not known but the hash is. Signatures with specific sizes are much more performance-efficient, so be cautious when using this feature. For these cases, the `*` character can be used in the size field. To ensure proper backwards compatibility with older versions of ClamAV, these signatures must have a minimum functionality level of 73 or higher. Signatures that use the wildcard size without this level set will be rejected as malformed.
110
-
111
-```
112
-Sample .hsb signature matching any size
113
-HashString:*:MalwareName:73
114
-
115
-Sample .msb signature matching any size
116
-*:PESectionHash:MalwareName:73
117
-```
118
-
119
-## Body-based signatures
120
-
121
-ClamAV stores all body-based signatures in a hexadecimal format. In this section, a hex-signature means a fragment of the malware's body converted into a hexadecimal string, which can additionally be extended using various wildcards.
122
-
123
-### Hexadecimal format
124
-
125
-You can use `sigtool --hex-dump` to convert any data into a hex-string:
126
-
127
-```bash
128
-zolw@localhost:/tmp/test$ sigtool --hex-dump
129
-How do I look in hex?
130
-486f7720646f2049206c6f6f6b20696e206865783f0a
131
-```
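
Where `sigtool` is not available, a POSIX `od` pipeline produces the same lowercase hex-string. This is a sketch; `to_hex` is a hypothetical helper name. The `-v` flag keeps `od` from collapsing repeated lines into `*`, which would corrupt the output for repetitive data.

```bash
# to_hex: read stdin and print it as a single lowercase hex-string.
to_hex() {
    od -An -v -tx1 | tr -d ' \n'
    echo
}

printf 'How do I look in hex?\n' | to_hex
# prints 486f7720646f2049206c6f6f6b20696e206865783f0a
```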
132
-
133
-### Wildcards
134
-
135
-ClamAV supports the following wildcards for hex-signatures:
136
-
137
-- `??`
138
-
139
-  Match any byte.
140
-
141
-- `a?`
142
-
143
-  Match a high nibble (the four high bits).
144
-  **IMPORTANT NOTE:** The nibble matching is only available in
145
-  libclamav with the functionality level 17 and higher therefore
146
-  please only use it with .ndb signatures followed by ":17"
147
-  (MinEngineFunctionalityLevel, see [Extended signature format](#extended-signature-format)).
148
-
149
-- `?a`
150
-
151
-  Match a low nibble (the four low bits).
152
-
153
-- `*`
154
-
155
-  Match any number of bytes.
156
-
157
-- `{n}`
158
-
159
-  Match `n` bytes.
160
-
161
-- `{-n}`
162
-
163
-  Match `n` or fewer bytes.
164
-
165
-- `{n-}`
166
-
167
-  Match `n` or more bytes.
168
-
169
-- `{n-m}`
170
-
171
-  Match between `n` and `m` bytes (`m > n`).
172
-
173
-- `HEXSIG[x-y]aa` or `aa[x-y]HEXSIG`
174
-
175
-  Match aa anchored to a hex-signature, see
176
-  <https://bugzilla.clamav.net/show_bug.cgi?id=776> for discussion and
177
-  examples.
178
-
179
-The range signatures `*` and `{}` virtually separate a hex-signature into two parts; e.g. `aabbcc*bbaacc` is treated as two sub-signatures `aabbcc` and `bbaacc` with any number of bytes between them. Each sub-signature is required to include a block of two static characters somewhere in its body. There is one exception to this restriction: when the range wildcard is of the form `{n}` with `n<128`. In this case, ClamAV uses an optimization and translates `{n}` into a string of `n` `??` character wildcards. Character wildcards do not divide hex signatures into two parts, so the two-static-character requirement does not apply.
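
The `{n}` optimization can be pictured as a plain text rewrite of the signature. The following bash sketch performs that rewrite. `expand_fixed_ranges` is a hypothetical illustration, not ClamAV code. It leaves `{n}` with `n` of 128 or more, and every other range form (`{-n}`, `{n-}`, `{n-m}`), untouched.

```bash
# expand_fixed_ranges SIG: replace each {n} with n < 128 by n "??" wildcards.
expand_fixed_ranges() {
    local rest=$1 out='' n pad
    while [[ $rest =~ \{([0-9]+)\} ]]; do
        out+=${rest%%"${BASH_REMATCH[0]}"*}   # text before this {n}
        rest=${rest#*"${BASH_REMATCH[0]}"}    # text after this {n}
        n=$((10#${BASH_REMATCH[1]}))
        if (( n < 128 )); then
            printf -v pad '%*s' "$n" ''       # n spaces...
            out+=${pad// /??}                 # ...turned into n "??" wildcards
        else
            out+=${BASH_REMATCH[0]}           # large ranges stay real ranges
        fi
    done
    printf '%s\n' "$out$rest"
}

expand_fixed_ranges 'aabb{3}ccdd'   # prints aabb??????ccdd
```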
180
-
181
-### Character classes
182
-
183
-ClamAV supports the following character classes for hex-signatures:
184
-
185
-- `(B)`
186
-
187
-  Match word boundary (including file boundaries).
188
-
189
-- `(L)`
190
-
191
-  Match CR, CRLF or file boundaries.
192
-
193
-- `(W)`
194
-
195
-  Match a non-alphanumeric character.
196
-
197
-### Alternate strings
198
-
199
-- Single-byte alternates (clamav-0.96) `(aa|bb|cc|...)` or `!(aa|bb|cc|...)`: match a member from a set of bytes \[aa, bb, cc, ...\].
200
-  - Negation operation can be applied to match any non-member, assumed to be one-byte in length.
201
-  - Signature modifiers and wildcards cannot be applied.
202
-
203
-- Multi-byte fixed length alternates `(aaaa|bbbb|cccc|...)` or `!(aaaa|bbbb|cccc|...)`: match a member from a set of multi-byte alternates \[aaaa, bbbb, cccc, ...\] of length n.
204
-  - All set members must be the same length.
205
-  - Negation operation can be applied to match any non-member, assumed to be n-bytes in length (clamav-0.98.2).
206
-  - Signature modifiers and wildcards cannot be applied.
207
-
208
-- Generic alternates (clamav-0.99) `(alt1|alt2|alt3|...)`: match a member from a set of alternates \[alt1, alt2, alt3, ...\] that can be of variable lengths.
209
-  - Negation operation cannot be applied.
210
-  - Signature modifiers and nibble wildcards \[`??, a?, ?a`\] can be applied.
211
-  - Ranged wildcards \[`{n-m}`\] are limited to a fixed range of less than 128 bytes \[`{1} -> {127}`\].
212
-
213
-Note that using signature modifiers or wildcards causes an alternate to be classified as a generic alternate. Thus, single-byte alternates and multi-byte fixed length alternates can use signature modifiers and wildcards, but they will then be classified as generic alternates. This means that negation cannot be applied in this situation, and there is a slight performance impact.
214
-
215
-### Basic signature format
216
-
217
-The simplest (and now deprecated) signature format is:
218
-
219
-```
220
-MalwareName=HexSignature
221
-```
222
-
223
-ClamAV will scan the entire file looking for HexSignature. All signatures of this type must be placed inside `*.db` files.
224
-
225
-### Extended signature format
226
-
227
-The extended signature format allows for specification of additional information such as a target file type, virus offset or engine version, making the detection more reliable. The format is:
228
-
229
-```
230
-MalwareName:TargetType:Offset:HexSignature[:MinFL:[MaxFL]]
231
-```
232
-
233
-where `TargetType` is one of the following numbers specifying the type of the target file:
234
-
235
-- 0 = any file
236
-
237
-- 1 = Portable Executable, both 32- and 64-bit.
238
-
239
-- 2 = OLE2 containers, including their specific macros. The OLE2 format is primarily used by MS Office and MSI installation files.
240
-
241
-- 3 = HTML (normalized: whitespace transformed to spaces, tags/tag attributes normalized, all lowercase), Javascript is normalized too: all strings are normalized (hex encoding is decoded), numbers are parsed and normalized, local variables/function names are normalized to ’n001’ format, argument to eval() is parsed as JS again, unescape() is handled, some simple JS packers are handled, output is whitespace normalized.
242
-
243
-- 4 = Mail file
244
-
245
-- 5 = Graphics
246
-
247
-- 6 = ELF
248
-
249
-- 7 = ASCII text file (normalized)
250
-
251
-- 8 = Unused
252
-
253
-- 9 = Mach-O files
254
-
255
-- 10 = PDF files
256
-
257
-- 11 = Flash files
258
-
259
-- 12 = Java class files
260
-
261
-And `Offset` is an asterisk or a decimal number `n` possibly combined with a special modifier:
262
-
263
-- `*` = any
264
-
265
-- `n` = absolute offset
266
-
267
-- `EOF-n` = end of file minus `n` bytes
268
-
269
-Signatures for PE, ELF and Mach-O files additionally support:
270
-
271
-- `EP+n` = entry point plus n bytes (`EP+0` for `EP`)
272
-
273
-- `EP-n` = entry point minus n bytes
274
-
275
-- `Sx+n` = start of section `x`’s (counted from 0) data plus `n` bytes
276
-
277
-- `SEx` = entire section `x` (offset must lie within section boundaries)
278
-
279
-- `SL+n` = start of last section plus `n` bytes
280
-
281
-All the above offsets except `*` can be turned into **floating offsets** and represented as `Offset,MaxShift`, where `MaxShift` is an unsigned integer. A floating offset will match every offset between `Offset` and `Offset+MaxShift`; e.g. `10,5` will match all offsets from 10 to 15, and `EP+n,y` will match all offsets from `EP+n` to `EP+n+y`. Versions of ClamAV older than 0.91 will silently ignore the `MaxShift` extension and only use `Offset`. Optional `MinFL` and `MaxFL` parameters can restrict the signature to specific engine releases. All signatures in the extended format must be placed inside `*.ndb` files.
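
When choosing an absolute offset `n` or an `EOF-n` offset, it helps to look at the exact bytes a signature anchored there would start comparing. This is a sketch using the POSIX `od` options `-j` (skip) and `-N` (count). `bytes_at` and `bytes_at_eof` are hypothetical helpers. `EOF-n` is computed here simply as the file size minus `n`.

```bash
# bytes_at FILE OFFSET COUNT: print COUNT bytes at OFFSET as a hex-string.
bytes_at() {
    od -An -v -tx1 -j "$2" -N "$3" "$1" | tr -d ' \n'
    echo
}

# bytes_at_eof FILE N COUNT: same, for an EOF-N offset (file size minus N).
bytes_at_eof() {
    local size
    size=$(wc -c < "$1" | tr -d ' ')
    bytes_at "$1" "$((size - $2))" "$3"
}

# Usage: bytes_at test.exe 0 2      -> hex of the first two bytes
#        bytes_at_eof test.exe 16 4 -> hex of 4 bytes at EOF-16
```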
282
-
283
-### Logical signatures
284
-
285
-Logical signatures allow combining of multiple signatures in extended format using logical operators. They can provide both more detailed and flexible pattern matching. The logical sigs are stored inside `*.ldb` files in the following format:
286
-
287
-```
288
-SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;
289
-Subsig1;Subsig2;...
290
-```
291
-
292
-where:
293
-
294
-- `TargetDescriptionBlock` provides information about the engine and target file with comma separated `Arg:Val` pairs. For args where `Val` is a range, the minimum and maximum values should be expressed as `min-max`.
295
-
296
-- `LogicalExpression` specifies the logical expression describing the relationship between `Subsig0...SubsigN`. **Basis clause:** 0,1,...,N decimal indexes are SUB-EXPRESSIONS representing `Subsig0, Subsig1,...,SubsigN` respectively. **Inductive clause:** if `A` and `B` are SUB-EXPRESSIONS and `X, Y` are decimal numbers then `(A&B)`, `(A|B)`, `A=X`, `A=X,Y`, `A>X`, `A>X,Y`, `A<X` and `A<X,Y` are SUB-EXPRESSIONS
297
-
298
-- `SubsigN` is n-th subsignature in extended format possibly preceded with an offset. There can be specified up to 64 subsigs.
299
-
300
-Keywords used in `TargetDescriptionBlock`:
301
-
302
-- `Target:X`: Target file type
303
-
304
-- `Engine:X-Y`: Required engine functionality (range; 0.96). Note that if the `Engine` keyword is used, it must be the first one in the `TargetDescriptionBlock` for backwards compatibility
305
-
306
-- `FileSize:X-Y`: Required file size (range in bytes; 0.96)
307
-
308
-- `EntryPoint`: Entry point offset (range in bytes; 0.96)
309
-
310
-- `NumberOfSections`: Required number of sections in executable (range; 0.96)
311
-
312
-- `Container:CL_TYPE_*`: File type of the container which stores the scanned file.
313
-
314
-  Specifying `CL_TYPE_ANY` matches on root objects only (i.e. the target file is explicitly _not_ in a container). Chances are slim that you would want to use `CL_TYPE_ANY` in a signature, because placing the malicious file in an archive would then prevent it from alerting.
315
-
316
-  Every ClamAV file type has the potential to be a container for additional files, although some are more likely than others. When a file is parsed and data in the file is identified to be scanned as a unique type, that parent file becomes a container the moment the embedded content is scanned. For a list of possible CL_TYPEs, refer to the [File Types Reference](ClamAV-File-Types.md).
317
-
318
-- `Intermediates:CL_TYPE_*>CL_TYPE_*`: Specify one or more layers of file types containing the scanned file. _This is an alternative to using `Container`._
319
-
320
-  You may specify up to 16 layers of file types separated by ’`>`’ in top-down order. Note that the ’`>`’ separator is not needed if you only specify a single container. The last type should be the immediate container containing the malicious file. Unlike with the `Container` option, `CL_TYPE_ANY` can be used as a wildcard file type. (expr; 0.100.0)
321
-
322
-  For a list of possible CL_TYPEs, refer to the [File Types Reference](ClamAV-File-Types.md).
323
-
324
-- `IconGroup1`: Icon group name 1 from an .idb signature (0.96)
325
-
326
-- `IconGroup2`: Icon group name 2 from an .idb signature (0.96)
327
-
328
-Modifiers for subexpressions:
329
-
330
-- `A=X`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched exactly X times; if it refers to a (logical) block of signatures then this block must generate exactly X matches (with any of its sigs).
331
-
332
-- `A=0` specifies negation (signature or block of signatures cannot be matched)
333
-
334
-- `A=X,Y`: If the SUB-EXPRESSION A refers to a single signature then this signature must be matched exactly X times; if it refers to a (logical) block of signatures then this block must generate X matches and at least Y different signatures must get matched.
335
-
336
-- `A>X`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches (with any of its sigs).
337
-
338
-- `A>X,Y`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches _and_ at least Y different signatures must be matched.
339
-
340
-- `A<X`: Just like `A>X` above, with "more" changed to "less".
341
-
342
-  If the SUB-EXPRESSION A refers to a single signature then this signature must get matched less than X times; if it refers to a (logical) block of signatures then this block must generate less than X matches (with any of its sigs).
343
-
344
-- `A<X,Y`: Similar to `A>X,Y`. If the SUB-EXPRESSION A refers to a single signature then this signature must get matched less than X times; if it refers to a (logical) block of signatures then this block must generate less than X matches _and_ at least Y different signatures must be matched.
345
-
346
-Examples:
347
-
348
-```
349
-Sig1;Target:0;(0&1&2&3)&(4|1);6b6f74656b;616c61;7a6f6c77;7374656
350
-6616e;deadbeef
351
-
352
-Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;737
353
-46566616e
354
-
355
-Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;737
356
-46566616e;deadbeef
357
-
358
-Sig4;Engine:51-255,Target:1;((0|1)&(2|3))&4;EP+123:33c06834f04100
359
-f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573
360
-(63|64)61706528;S3+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58
361
-dcf43987e4f519d629b103375;SL+550:6300680065005c0046006900
362
-```
363
-
364
-### Subsignature Modifiers
365
-
366
-ClamAV (clamav-0.99) supports a number of additional subsignature
367
-modifiers for logical signatures. This is done by specifying `::`
368
-followed by a number of characters representing the desired options.
369
-Signatures using subsignature modifiers require `Engine:81-255` for
370
-backwards-compatibility.
371
-
372
-- Case-Insensitive \[`i`\]
373
-
374
-  Specifying the `i` modifier causes ClamAV to match all alphabetic hex bytes as case-insensitive. All patterns in ClamAV are case-sensitive by default.
375
-
376
-- Wide \[`w`\]
377
-
378
-  Specifying `w` causes ClamAV to match all hex bytes encoded with two bytes per character. Note that this simply interleaves each character with NULL characters and does not truly support UTF-16 characters. Wildcards for 'wide' subsignatures are not treated as wide (i.e. there can be an odd number of intervening characters). This can be combined with `a` to search for patterns in both wide and ascii.
379
-
380
-- Fullword \[`f`\]
381
-
382
-  Match subsignature as a fullword (delimited by non-alphanumeric characters).
383
-
384
-- Ascii \[`a`\]
385
-
386
-  Match subsignature as ascii characters. This can be combined with `w` to search for patterns in both ascii and wide.
387
-
388
-Examples:
389
-
390
-```
391
-clamav-nocase-A;Engine:81-255,Target:0;0&1;41414141::i;424242424242::i
392
-    -matches 'AAAA'(nocase) and 'BBBBBB'(nocase)
393
-
394
-clamav-fullword-A;Engine:81-255,Target:0;0&1;414141;68656c6c6f::f
395
-    -matches 'AAA' and 'hello'(fullword)
396
-clamav-fullword-B;Engine:81-255,Target:0;0&1;414141;68656c6c6f::fi
397
-    -matches 'AAA' and 'hello'(fullword nocase)
398
-
399
-clamav-wide-B2;Engine:81-255,Target:0;0&1;414141;68656c6c6f::wa
400
-    -matches 'AAA' and 'hello'(wide ascii)
401
-clamav-wide-C0;Engine:81-255,Target:0;0&1;414141;68656c6c6f::iwfa
402
-    -matches 'AAA' and 'hello'(nocase wide fullword ascii)
403
-```
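
Because `w` simply interleaves NULL bytes, you can preview the hex that a `::w` subsignature effectively looks for with `sed`. This sketch assumes the NULL byte follows each character, as in little-endian UTF-16. `to_wide` is a hypothetical helper and, like the modifier itself, does not perform real UTF-16 encoding.

```bash
# to_wide HEXSTRING: append 00 after every byte, e.g. 68 -> 6800.
to_wide() {
    printf '%s\n' "$1" | sed 's/../&00/g'
}

to_wide 68656c6c6f   # "hello" -> 680065006c006c006f00
```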
404
-
405
-## Special Subsignature Types
406
-
407
-### Macro subsignatures
408
-
409
-Introduced in ClamAV 0.96
410
-
411
-Format: `${min-max}MACROID$`
412
-
413
-Macro subsignatures are used to combine a number of existing extended
414
-signatures (`.ndb`) into an on-the-fly generated alternate string logical
415
-signature (`.ldb`). Signatures using macro subsignatures require
416
-`Engine:51-255` for backwards-compatibility.
42
+`*.cfg`: [Dynamic config settings](Signatures/DynamicConfig.md)
417 43
 
418
-Example:
44
+`*.cat` `*.crb`: [Trusted and revoked PE certs](Signatures/AuthenticodeRules.md)
419 45
 
420
-```
421
-      test.ldb:
422
-        TestMacro;Engine:51-255,Target:0;0&1;616161;${6-7}12$
423
-
424
-      test.ndb:
425
-        D1:0:$12:626262
426
-        D2:0:$12:636363
427
-        D3:0:$30:626264
428
-```
429
-
430
-The example logical signature `TestMacro` is functionally equivalent
431
-to:
432
-
433
-```
434
-TestMacro;Engine:51-255,Target:0;0;616161{3-4}(626262|636363)
435
-```
436
-
437
-- `MACROID` points to a group of signatures; there can be at most 32 macro groups.
438
-
439
-  - In the example, `MACROID` is `12` and both `D1` and `D2` are members of macro group `12`. `D3` is a member of separate macro group `30`.
440
-
441
-- `{min-max}` specifies the offset range at which one of the group signatures should match; the offset range is relative to the starting offset of the preceding subsignature. This means a macro subsignature cannot be the first subsignature.
442
-
443
-  - In the example, `{min-max}` is `{6-7}` and it is relative to the start of a `616161` match.
444
-
445
-- For more information and examples please see <https://bugzilla.clamav.net/show_bug.cgi?id=164>.
446
-
447
-### Byte Compare Subsignatures
448
-
449
-Introduced in ClamAV 0.101
450
-
451
-Format: `subsigid_trigger(offset#byte_options#comparisons)`
452
-
453
-Byte compare subsignatures can be used to evaluate a numeric value at a given offset from the start of another (matched) subsignature within the same logical signature. These are executed after all other subsignatures within the logical signature are fired, with the exception of PCRE subsignatures. They can evaluate offsets only from a single referenced subsignature, and that subsignature must give a valid match for the evaluation to occur.
454
-
455
-- `subsigid_trigger` is a required field and may refer to any single non-PCRE, non-Byte Compare subsignature within the lsig. The byte compare subsig will evaluate if `subsigid_trigger` matches. Triggering on multiple subsigs or logic based triggering is not currently supported.
456
-
457
-- `offset` is a required field that consists of an `offset_modifier` and a numeric `offset` (hex or decimal offsets are okay).
458
-
459
-  - `offset_modifier` can be either `>>` or `<<` where the former denotes a positive offset and the latter denotes a negative offset. The offset is calculated from the start of `subsigid_trigger`, which allows for byte extraction before the specified match, after the match, and within the match itself.
460
-
461
-  - `offset` must be a positive hex or decimal value. This will be the number of bytes from the start of the referenced `subsigid_trigger` match within the file buffer to begin the comparison.
462
-
463
-- `byte_options` are used to specify, in that order, the numeric type and endianness of the extracted byte sequence, as well as the number of bytes to be read. By default, ClamAV will attempt to match up to the number of bytes specified, unless the `e` (exact) option is specified or the numeric type is `i` (binary). This field follows the form `[h|d|a|i][l|b][e]num_bytes`
464
-
465
-  - `h|d|a|i` where `h` specifies the byte sequence will be in hex, `d` decimal, `a` automatic detection of hex or decimal at runtime, and `i` signifies raw binary data.
466
-
467
-  - `l|b` where `l` specifies the byte sequence will be in little endian order and `b` big endian. If decimal `d` is specified, big-endian is implied and using `l` will result in a malformed database error.
468
-
469
-  - `e` specifies that ClamAV will only evaluate the comparison if it can extract the exact number of bytes specified. This option is implicitly declared when using the `i` flag.
470
-
471
-  - `num_bytes` specifies the number of bytes to extract. This can be a hex or decimal value. If `i` is specified only 1, 2, 4, and 8 are valid options.
472
-
473
-- `comparisons` is a required field that denotes how to evaluate the extracted byte sequence. Each Byte Compare signature can have one or two `comparison_sets` separated by a comma. Each `comparison_set` consists of a `Comparison_symbol` and a `Comparison_value` and takes the form `Comparison_symbolComparison_value`. Thus, `comparisons` takes the form `comparison_set[,comparison_set]`
474
-
475
-  - `Comparison_symbol` denotes the type of comparison to be done. The supported comparison symbols are `<`, `>`, `=`.
476
-
477
-  - `Comparison_value` is a required field which must be a numeric hex or decimal value. If all other conditions are met, the byte compare subsig will evaluate the extracted byte sequence against this number based on the provided `Comparison_symbol`.
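The field descriptions above can be made concrete with a small evaluator for the raw binary (`i`) case. This is a minimal Python sketch, not libclamav code. It makes three assumptions: hex values carry a `0x` prefix, when two comparison sets are given both must hold, and `trigger_start` is the file offset where `subsigid_trigger` matched. The text types (`h`, `d`, `a`) and the non-exact mode are not modeled, and all function names are hypothetical.

```python
import operator

# Comparison symbols supported by byte compare subsignatures.
_OPS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}


def _parse_number(text: str) -> int:
    """Accept a hex or decimal value (assumption: hex is written with a 0x prefix)."""
    return int(text, 16) if text.lower().startswith("0x") else int(text, 10)


def parse_comparisons(spec: str) -> list[tuple[str, int]]:
    """Parse 'comparison_set[,comparison_set]', e.g. '>0x10,<200'."""
    sets = []
    for part in spec.split(","):
        symbol, value = part[:1], part[1:]
        if symbol not in _OPS:
            raise ValueError(f"unsupported comparison symbol {symbol!r}")
        sets.append((symbol, _parse_number(value)))
    if len(sets) > 2:
        raise ValueError("at most two comparison sets are allowed")
    return sets


def byte_compare_binary(buf: bytes, trigger_start: int, modifier: str, offset: int,
                        little_endian: bool, num_bytes: int, comparisons: str) -> bool:
    """Evaluate an 'i' (raw binary) byte compare relative to a trigger match."""
    if modifier not in (">>", "<<"):
        raise ValueError("offset_modifier must be '>>' or '<<'")
    if num_bytes not in (1, 2, 4, 8):
        raise ValueError("'i' only supports 1, 2, 4 or 8 bytes")
    # '>>' is a positive offset, '<<' a negative one, both from the trigger's start.
    start = trigger_start + offset if modifier == ">>" else trigger_start - offset
    # 'e' (exact) is implied for 'i': skip the comparison unless every byte is available.
    if start < 0 or start + num_bytes > len(buf):
        return False
    value = int.from_bytes(buf[start:start + num_bytes], "little" if little_endian else "big")
    # Assumption: every comparison set must be satisfied.
    return all(_OPS[sym](value, ref) for sym, ref in parse_comparisons(comparisons))
```

For example, take a buffer where the trigger matched `MAGIC` at offset 0 and is followed by the little-endian value 0x1234. Calling `byte_compare_binary(buf, 0, ">>", 5, True, 2, ">0x1000,<0x2000")` evaluates to true.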
478
-
479
-### PCRE subsignatures
480
-
481
-Introduced in ClamAV 0.99
482
-
483
-Format: `Trigger/PCRE/[Flags]`
484
-
485
-PCRE subsignatures are used within a logical signature (`.ldb`) to specify regex matches that execute once triggered by a conditional based on preceding subsignatures. Signatures using PCRE subsignatures require `Engine:81-255` for backwards-compatibility.
486
-
487
-- `Trigger` is a required field that is a valid `LogicalExpression` and may refer to any subsignatures that precede this subsignature. Triggers cannot be self-referential and cannot refer to subsequent subsignatures.
488
-
489
-- `PCRE` is the expression representing the regex to execute. `PCRE` must be delimited by `/`, and any `/` used within the expression needs to be escaped. For backward compatibility, `;` within the expression must be expressed as `\x3B`. `PCRE` cannot be empty, and the (?UTF\*) control sequence is not allowed. If debug is specified, named capture groups are displayed in a post-execution report.
490
-
491
-- `Flags` are a series of characters which affect the compilation and execution of `PCRE` within the PCRE compiler and the ClamAV engine. This field is optional.
492
-
493
-  - `g [CLAMAV_GLOBAL]` specifies to search for ALL matches of PCRE (default is to search for first match). NOTE: INCREASES the time needed to run the PCRE.
494
-
495
-  - `r [CLAMAV_ROLLING]` specifies to use the given offset as the starting location to search for a match as opposed to the only location; applies to subsigs without maxshifts. By default, in order to facilitate normal ClamAV offset behavior, PCREs are auto-anchored (only attempt match on first offset); using the rolling option disables the auto-anchoring.
496
-
497
-  - `e [CLAMAV_ENCOMPASS]` specifies to CONFINE matching between the specified offset and maxshift; applies only when maxshift is specified. Note: DECREASES time needed to run the PCRE.
498
-
499
-  - `i [PCRE_CASELESS]`
500
-
501
-  - `s [PCRE_DOTALL]`
502
-
503
-  - `m [PCRE_MULTILINE]`
504
-
505
-  - `x [PCRE_EXTENDED]`
46
+`*.ftm`: [File Type Magic (FTM)](Signatures/FileTypeMagic.md)
506 47
 
507
-  - `A [PCRE_ANCHORED]`
48
+### Signature databases
508 49
 
509
-  - `E [PCRE_DOLLAR_ENDONLY]`
50
+_Note_: Signature databases with an extension ending in `u` are only loaded when Potentially Unwanted Application (PUA) signatures are enabled (default: off).
510 51
 
511
-  - `U [PCRE_UNGREEDY]`
52
+#### Body-based Signatures
512 53
 
513
-Examples:
54
+Body-based signature content is a definition that matches not on a hash but on the specific sequences of bytes exhibited by the target file.
514 55
 
515
-```
516
-Find.All.ClamAV;Engine:81-255,Target:0;1;6265676c6164697427736e6f7462797465636f6465;0/clamav/g
517
-
518
-Find.ClamAV.OnlyAt.299;Engine:81-255,Target:0;2;7374756c747a67657473;7063726572656765786c6f6c;299:0&1/clamav/
519
-
520
-Find.ClamAV.StartAt.300;Engine:81-255,Target:0;3;616c61696e;62756731393238;636c6f736564;300:0&1&2/clamav/r
521
-
522
-Find.All.Encompassed.ClamAV;Engine:81-255,Target:0;3;7768796172656e2774;796f757573696e67;79617261;200,300:0&1&2/clamav/ge
523
-
524
-Named.CapGroup.Pcre;Engine:81-255,Target:0;3;636f75727479617264;616c62756d;74657272696572;50:0&1&2/variable=(?<nilshell>.{16})end/gr
525
-
526
-Firefox.TreeRange.UseAfterFree;Engine:81-255,Target:0,Engine:81-255;0&1&2;2e766965772e73656c656374696f6e;2e696e76616c696461746553656c656374696f6e;0&1/\x2Eview\x2Eselection.*?\x2Etree\s*\x3D\s*null.*?\x2Einvalidate/smi
527
-
528
-Firefox.IDB.UseAfterFree;Engine:81-255,Target:0;0&1;4944424b657952616e6765;0/^\x2e(only|lowerBound|upperBound|bound)\x28.*?\x29.*?\x2e(lower|upper|lowerOpen|upperOpen)/smi
529
-
530
-Firefox.boundElements;Engine:81-255,Target:0;0&1&2;6576656e742e626f756e64456c656d656e7473;77696e646f772e636c6f7365;0&1/on(load|click)\s*=\s*\x22?window\.close\s*\x28/si
532
-```
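The `Trigger/PCRE/[Flags]` layout and the escaping rules can be checked before loading a signature. The sketch below splits a PCRE subsignature into its three parts. It assumes the trigger and flags never contain `/`, so the first and last `/` delimit the expression. The escaped-slash check is deliberately simple. `parse_pcre_subsig` is a hypothetical helper, not part of ClamAV.

```python
# Flag letters listed in the PCRE subsignature documentation.
VALID_FLAGS = set("greismxAEU")


def parse_pcre_subsig(subsig: str) -> tuple[str, str, str]:
    """Split 'Trigger/PCRE/[Flags]' into (trigger, pattern, flags)."""
    first, last = subsig.find("/"), subsig.rfind("/")
    if first <= 0 or last == first:
        raise ValueError("expected Trigger/PCRE/[Flags] with a non-empty trigger")
    trigger, pattern, flags = subsig[:first], subsig[first + 1:last], subsig[last + 1:]
    if not pattern:
        raise ValueError("PCRE cannot be empty")
    if ";" in pattern:
        raise ValueError(r"';' must be written as \x3B")
    # Simple check: every '/' inside the expression must be preceded by a backslash.
    for i, ch in enumerate(pattern):
        if ch == "/" and (i == 0 or pattern[i - 1] != "\\"):
            raise ValueError(f"unescaped '/' at position {i} of the expression")
    unknown = set(flags) - VALID_FLAGS
    if unknown:
        raise ValueError(f"unknown flag(s): {''.join(sorted(unknown))}")
    return trigger, pattern, flags
```

Applied to the last subsignature of `Find.All.Encompassed.ClamAV`, `parse_pcre_subsig("200,300:0&1&2/clamav/ge")` returns the trigger `200,300:0&1&2`, the expression `clamav`, and the flags `ge`.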
533
-
534
-## Icon signatures for PE files
535
-
536
-ClamAV 0.96 includes an approximate/fuzzy icon matcher to help detect malicious executables disguising themselves as innocent-looking image files, office documents and the like.
537
-
538
-Icon matching is only triggered via .ldb signatures using the special attribute tokens `IconGroup1` or `IconGroup2`. These identify two (optional) groups of icons defined in a .idb database file. The format of the .idb file is:
539
-
540
-```
541
-ICON_NAME:GROUP1:GROUP2:ICON_HASH
542
-```
543
-
544
-where:
545
-
546
-- `ICON_NAME` is a unique string identifier for a specific icon,
547
-
548
-- `GROUP1` is a string identifier for the first group of icons (`IconGroup1`)
549
-
550
-- `GROUP2` is a string identifier for the second group of icons (`IconGroup2`),
551
-
552
-- `ICON_HASH` is a fuzzy hash of the icon image
553
-
554
-The `ICON_HASH` field can be obtained from the debug output of libclamav. For example:
56
+ClamAV body-based signature content has a [special format](BodySignatureFormat.md) to allow regex-like matching of data that is not entirely known. This format is used extensively in both Extended Signatures and Logical Signatures.
555 57
 
556
-```bash
557
-LibClamAV debug: ICO SIGNATURE:
558
-ICON_NAME:GROUP1:GROUP2:18e2e0304ce60a0cc3a09053a30000414100057e000afe0000e80006e510078b0a08910d11ad04105e0811510f084e01040c080a1d0b0021000a39002a41
559
-```
58
+`*.ndb` `*.ndu`: [Extended signatures](Signatures/ExtendedSignatures.md)
560 59
 
561
-## Signatures for Version Information metadata in PE files
60
+`*.ldb` `*.ldu`; `*.idb`: [Logical Signatures](Signatures/LogicalSignatures.md)
562 61
 
563
-Starting with ClamAV 0.96 it is possible to easily match certain information built into PE files (executables and dynamic link libraries). Whenever you look up the properties of a PE executable file in Windows, you are presented with a bunch of details about the file itself.
62
+`*.cdb`: [Container Metadata Signatures](Signatures/ContainerMetadata.md)
564 63
 
565
-This information is stored in a special area of the file resources which goes under the name of `VS_VERSIONINFO` (or versioninfo for short). It is divided into 2 parts. The first part (which is rather uninteresting) is really a bunch of numbers and flags indicating the product and file version. It was originally intended for use with installers which, after parsing it, should be able to determine whether a certain executable or library is to be upgraded/overwritten or is already up to date. Suffice it to say, this approach never really worked and is generally never used.
64
+`*.cbc`: [Bytecode Signatures](Signatures/BytecodeSignatures.md)
566 65
 
567
-The second block is much more interesting: it is a simple list of key/value strings, intended for user information and completely ignored by the OS. For example, if you look at ping.exe you can see the company being *"Microsoft Corporation"*, the description *"TCP/IP Ping command"*, the internal name *"ping.exe"* and so on... Depending on the OS version, some keys may be given peculiar visibility in the file properties dialog, however they are internally all the same.
66
+`*.pdb` `*.gdb` `*.wdb`: [Phishing URL Signatures](Signatures/PhishSigs.md)
568 67
 
569
-To match a versioninfo key/value pair, the special file offset anchor `VI` was introduced. This is similar to the other anchors (like `EP` and `SL`) except that, instead of matching the hex pattern against a single offset, it checks it against each and every key/value pair in the file. The `VI` token neither needs nor accepts a `+/-` offset such as `EP+1`. As for the hex signature itself, it’s just the UTF-16 dump of the key and value. Only the `??` and `(aa|bb)` wildcards are allowed in the signature. Usually, you don’t need to bother figuring it out: each key/value pair together with the corresponding VI-based signature is printed by `clamscan` when the `--debug` option is given.
68
+#### Hash-based Signatures
570 69
 
571
-For example `clamscan --debug freecell.exe` produces:
70
+`*.hdb` `*.hsb` `*.hdu` `*.hsu`: File hash signatures
572 71
 
573
-```bash
574
-[...]
575
-Recognized MS-EXE/DLL file
576
-in cli_peheader
577
-versioninfo_cb: type: 10, name: 1, lang: 410, rva: 9608
578
-cli_peheader: parsing version info @ rva 9608 (1/1)
579
-VersionInfo (d2de): 'CompanyName'='Microsoft Corporation' -
580
-VI:43006f006d00700061006e0079004e0061006d006500000000004d006900
581
-630072006f0073006f0066007400200043006f00720070006f0072006100740
582
-069006f006e000000
583
-VersionInfo (d32a): 'FileDescription'='Entertainment Pack
584
-FreeCell Game' - VI:460069006c006500440065007300630072006900700
585
-0740069006f006e000000000045006e007400650072007400610069006e006d
586
-0065006e00740020005000610063006b0020004600720065006500430065006
587
-c006c002000470061006d0065000000
588
-VersionInfo (d396): 'FileVersion'='5.1.2600.0 (xpclient.010817
589
--1148)' - VI:460069006c006500560065007200730069006f006e00000000
590
-0035002e0031002e0032003600300030002e003000200028007800700063006
591
-c00690065006e0074002e003000310030003800310037002d00310031003400
592
-380029000000
593
-VersionInfo (d3fa): 'InternalName'='freecell' - VI:49006e007400
594
-650072006e0061006c004e0061006d006500000066007200650065006300650
595
-06c006c000000
596
-VersionInfo (d4ba): 'OriginalFilename'='freecell' - VI:4f007200
597
-6900670069006e0061006c00460069006c0065006e0061006d0065000000660
598
-0720065006500630065006c006c000000
599
-VersionInfo (d4f6): 'ProductName'='Sistema operativo Microsoft
600
-Windows' - VI:500072006f0064007500630074004e0061006d00650000000
601
-000530069007300740065006d00610020006f00700065007200610074006900
602
-76006f0020004d006900630072006f0073006f0066007400ae0020005700690
603
-06e0064006f0077007300ae000000
604
-VersionInfo (d562): 'ProductVersion'='5.1.2600.0' - VI:50007200
605
-6f006400750063007400560065007200730069006f006e00000035002e00310
606
-02e0032003600300030002e0030000000
607
-[...]
608
-```
72
+`*.mdb` `*.msb` `*.mdu` `*.msu`: PE section hash signatures
609 73
 
610
-Although VI-based signatures are intended for use in logical signatures you can test them using ordinary `.ndb` files. For example:
74
+[Hash-based Signature format](Signatures/HashSignatures.md)
611 75
 
612
-```
613
-my_test_vi_sig:1:VI:paste_your_hex_sig_here
614
-```
615
-
616
-Final note. If you want to decode a VI-based signature into a human readable form you can use:
617
-
618
-```bash
619
-echo hex_string | xxd -r -p | strings -el
620
-```
621
-
622
-For example:
623
-
624
-```bash
625
-$ echo 460069006c0065004400650073006300720069007000740069006f006e
626
-000000000045006e007400650072007400610069006e006d0065006e007400200
627
-05000610063006b0020004600720065006500430065006c006c00200047006100
628
-6d0065000000 | xxd -r -p | strings -el
629
-FileDescription
630
-Entertainment Pack FreeCell Game
631
-```
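The debug output above suggests how a VI signature is laid out. It is the key in UTF-16LE, a two-byte NUL terminator, padding, the value in UTF-16LE, and a final NUL terminator.

The padding rule used below is an assumption based on the Windows version-info `String` structure: a 6-byte header precedes the key, and the value is aligned to a 32-bit boundary. With that rule, the sketch reproduces the `CompanyName` and `InternalName` signatures shown above. `vi_hex` and `vi_decode` are hypothetical helpers. `vi_decode` is a Python alternative to the `xxd -r -p | strings -el` pipeline.

```python
def vi_hex(key: str, value: str) -> str:
    """Build the literal hex for a VI signature from a key/value pair (padding rule assumed)."""
    key_bytes = key.encode("utf-16-le") + b"\x00\x00"      # NUL-terminated UTF-16LE key
    # Assumption: 6-byte header + key, then pad so the value starts on a 4-byte boundary.
    padding = b"\x00" * ((-(6 + len(key_bytes))) % 4)
    value_bytes = value.encode("utf-16-le") + b"\x00\x00"  # NUL-terminated UTF-16LE value
    return (key_bytes + padding + value_bytes).hex()


def vi_decode(hex_sig: str) -> list[str]:
    """Decode a literal VI hex string back into its non-empty UTF-16LE strings."""
    text = bytes.fromhex(hex_sig).decode("utf-16-le")
    return [part for part in text.split("\x00") if part]
```

The output of `vi_hex` contains no wildcards. If you need `??` or `(aa|bb)`, edit the hex by hand afterwards.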
632
-
633
-## Trusted and Revoked Certificates
634
-
635
-ClamAV 0.98 checks signed PE files for certificates and verifies each certificate in the chain against a database of trusted and revoked certificates. The signature format is
636
-
637
-```
638
-    Name;Trusted;Subject;Serial;Pubkey;Exponent;CodeSign;TimeSign;CertSign;
639
-    NotBefore;Comment[;minFL[;maxFL]]
640
-```
76
+#### Alternative signature support
641 77
 
642
-where the corresponding fields are:
78
+`*.yar` `*.yara`: [Yara rules](Signatures/YaraRules.md)
643 79
 
644
-- `Name:` name of the entry
80
+### Other database files
645 81
 
646
-- `Trusted:` bit field, specifying whether the cert is trusted. 1 for trusted. 0 for revoked
82
+`*.fp` `*.sfp` `*.ign` `*.ign2`: [Whitelisted files, signatures](Signatures/Whitelists.md)
647 83
 
648
-- `Subject:` sha1 of the Subject field in hex
84
+`*.pwdb`: [Encrypted archive passwords](Signatures/EncryptedArchives.md)
649 85
 
650
-- `Serial:` the serial number as reported by `clamscan --debug --verbose`
86
+`*.info`: [Database information](Signatures/DatabaseInfo.md)
651 87
 
652
-- `Pubkey:` the public key in hex
653
-
654
-- `Exponent:` the exponent in hex. Currently ignored and hardcoded to 010001 (in hex)
655
-
656
-- `CodeSign:` bit field, specifying whether this cert can sign code. 1 for true, 0 for false
657
-
658
-- `TimeSign:` bit field. 1 for true, 0 for false
659
-
660
-- `CertSign:` bit field, specifying whether this cert can sign other certs. 1 for true, 0 for false
661
-
662
-- `NotBefore:` integer, cert should not be added before this variable. Defaults to 0 if left empty
663
-
664
-- `Comment:` comments for this entry
665
-
666
-The signatures for certs are stored inside `.crb` files.
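Reading a `.crb` line into named fields makes the format easier to audit. The Python sketch below follows the field order documented above. It assumes that `;` never appears inside a field and that an empty optional functionality-level field means "not set". `CrbEntry` and `parse_crb_line` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrbEntry:
    """One trusted/revoked certificate entry (field order from the format above)."""
    name: str
    trusted: bool          # True = trusted, False = revoked
    subject_sha1: str      # SHA1 of the Subject field, in hex
    serial: str
    pubkey: str
    exponent: str          # documented as ignored (hardcoded to 010001)
    code_sign: bool
    time_sign: bool
    cert_sign: bool
    not_before: int        # an empty field defaults to 0
    comment: str
    min_flevel: Optional[int] = None
    max_flevel: Optional[int] = None


def _bit(field: str) -> bool:
    """Bit fields are documented as 1 (true) or 0 (false)."""
    if field not in ("0", "1"):
        raise ValueError(f"expected 0 or 1, got {field!r}")
    return field == "1"


def _optional_int(fields: list[str], index: int) -> Optional[int]:
    """Return fields[index] as an int, or None if it is absent or empty."""
    return int(fields[index]) if len(fields) > index and fields[index] else None


def parse_crb_line(line: str) -> CrbEntry:
    fields = line.rstrip("\r\n").split(";")
    if not 11 <= len(fields) <= 13:
        raise ValueError(f"expected 11 to 13 ';'-separated fields, got {len(fields)}")
    (name, trusted, subject, serial, pubkey, exponent,
     code_sign, time_sign, cert_sign, not_before, comment) = fields[:11]
    return CrbEntry(
        name=name,
        trusted=_bit(trusted),
        subject_sha1=subject,
        serial=serial,
        pubkey=pubkey,
        exponent=exponent,
        code_sign=_bit(code_sign),
        time_sign=_bit(time_sign),
        cert_sign=_bit(cert_sign),
        not_before=int(not_before) if not_before else 0,
        comment=comment,
        min_flevel=_optional_int(fields, 11),
        max_flevel=_optional_int(fields, 12),
    )
```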
667
-
668
-## Signatures based on container metadata
669
-
670
-ClamAV 0.96 allows creating generic signatures matching files stored inside different container types which meet specific conditions. The signature format is
671
-
672
-```
673
-    VirusName:ContainerType:ContainerSize:FileNameREGEX:
674
-    FileSizeInContainer:FileSizeReal:IsEncrypted:FilePos:
675
-    Res1:Res2[:MinFL[:MaxFL]]
676
-```
677
-
678
-where the corresponding fields are:
679
-
680
-- `VirusName:` Virus name to be displayed when signature matches
681
-
682
-- `ContainerType:` The file type containing the target file.  For example:
683
-  - `CL_TYPE_ZIP`,
684
-  - `CL_TYPE_RAR`,
685
-  - `CL_TYPE_ARJ`,
686
-  - `CL_TYPE_MSCAB`,
687
-  - `CL_TYPE_7Z`,
688
-  - `CL_TYPE_MAIL`,
689
-  - `CL_TYPE_(POSIX|OLD)_TAR`,
690
-  - `CL_TYPE_CPIO_(OLD|ODC|NEWC|CRC)`
691
-
692
-  Use `*` as a wild card to indicate that container type may be any file type.
693
-  For a full list of ClamAV file types, see the [ClamAV File Types Reference](ClamAV-File-Types.md)
694
-
695
-- `ContainerSize:` size of the container file itself (e.g. the size of the zip archive), specified in bytes as an absolute value or a range `x-y`
696
-
697
-- `FileNameREGEX:` regular expression describing name of the target file
698
-
699
-- `FileSizeInContainer:` usually compressed size; for MAIL, TAR and CPIO == `FileSizeReal`; specified in bytes as absolute value or range
700
-
701
-- `FileSizeReal:` usually uncompressed size; for MAIL, TAR and CPIO == `FileSizeInContainer`; absolute value or range
702
-
703
-- `IsEncrypted:` 1 if the target file is encrypted, 0 if it’s not and `*` to ignore
704
-
705
-- `FilePos:` file position in container (counting from 1); absolute value or range
706
-
707
-- `Res1:` when `ContainerType` is `CL_TYPE_ZIP` or `CL_TYPE_RAR` this field is treated as a CRC sum of the target file specified in hexadecimal format; for other container types it’s ignored
708
-
709
-- `Res2:` not used as of ClamAV 0.96
710
-
711
-The signatures for container files are stored inside `.cdb` files.
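The field semantics above can be illustrated by matching one `.cdb` line against metadata for a single inner file. This is a Python sketch, not ClamAV's matcher, and it rests on several assumptions:

- `*` ignores any field.
- `x-y` ranges are inclusive.
- `ContainerType` is compared as an exact name.
- `FileNameREGEX` is searched unanchored.
- The regex contains no `:`.
- `Res1`, `Res2`, and the functionality-level fields are not checked.

`InnerFile` and `cdb_match` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class InnerFile:
    """Hypothetical metadata for one file stored inside a container."""
    container_type: str
    container_size: int
    name: str
    size_in_container: int
    size_real: int
    encrypted: bool
    position: int  # counting from 1


def _numeric_field_matches(spec: str, value: int) -> bool:
    """'*' ignores the field, 'x-y' is a range (assumed inclusive), else an absolute value."""
    if spec == "*":
        return True
    if "-" in spec:
        low, high = (int(part) for part in spec.split("-", 1))
        return low <= value <= high
    return value == int(spec)


def cdb_match(sig_line: str, f: InnerFile) -> Optional[str]:
    """Return VirusName if the .cdb line matches f, otherwise None."""
    fields = sig_line.rstrip("\r\n").split(":")
    if len(fields) < 10:
        raise ValueError("expected at least 10 ':'-separated fields")
    virus_name, ctype, csize, name_re, size_in, size_real, is_enc, file_pos = fields[:8]
    if ctype != "*" and ctype != f.container_type:
        return None
    if name_re != "*" and not re.search(name_re, f.name):
        return None
    if is_enc != "*" and (is_enc == "1") != f.encrypted:
        return None
    checks = [(csize, f.container_size), (size_in, f.size_in_container),
              (size_real, f.size_real), (file_pos, f.position)]
    if all(_numeric_field_matches(spec, value) for spec, value in checks):
        return virus_name
    return None
```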
712
-
713
-## Whitelist databases
714
-
715
-To whitelist a specific file, use the MD5 signature format and place it inside a database file with the extension `.fp`. To whitelist a specific file with the SHA1 or SHA256 file hash signature format, place the signature inside a database file with the extension `.sfp`. To whitelist a specific signature from the database, add its name to a local file called `local.ign2` stored inside the database directory. You can additionally follow the signature name with the MD5 of the entire database entry for this signature, e.g.:
716
-
717
-```
718
-    Eicar-Test-Signature:bc356bae4c42f19a3de16e333ba3569c
719
-```
720
-
721
-In such a case, the signature will no longer be whitelisted when its entry in the database gets modified (e.g. the signature gets updated to avoid false alerts).
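The whitelist entries described above can be generated with Python's `hashlib`. The sketch emits `.fp`/`.sfp` lines in the `HashString:FileSize:Name` layout used by hash signatures. It also emits a pinned `local.ign2` line. For the pinned line, it assumes the MD5 is computed over the database entry's text exactly as stored, without the line terminator. The helper names and the sample entry are hypothetical.

```python
import hashlib


def fp_entry(data: bytes, name: str) -> str:
    """MD5 whitelist line for a .fp file: HashString:FileSize:Name."""
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}:{name}"


def sfp_entry(data: bytes, name: str, algorithm: str = "sha256") -> str:
    """SHA1/SHA256 whitelist line for a .sfp file."""
    if algorithm not in ("sha1", "sha256"):
        raise ValueError("only sha1 and sha256 are used for .sfp")
    return f"{hashlib.new(algorithm, data).hexdigest()}:{len(data)}:{name}"


def ign2_entry(sig_name: str, db_entry: str) -> str:
    """local.ign2 line pinned to the MD5 of the signature's database entry (assumed format)."""
    digest = hashlib.md5(db_entry.rstrip("\r\n").encode("utf-8")).hexdigest()
    return f"{sig_name}:{digest}"
```

For an empty file, `fp_entry(b"", "empty.bin")` gives `d41d8cd98f00b204e9800998ecf8427e:0:empty.bin`.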
722
-
723
-## Signature names
88
+### Signature names
724 89
 
725 90
 ClamAV uses the following prefixes for signature names:
726 91
 
727 92
 - *Worm* for Internet worms
728
-
729 93
 - *Trojan* for backdoor programs
730
-
731 94
 - *Adware* for adware
732
-
733 95
 - *Flooder* for flooders
734
-
735 96
 - *HTML* for HTML files
736
-
737 97
 - *Email* for email messages
738
-
739 98
 - *IRC* for IRC trojans
740
-
741 99
 - *JS* for JavaScript malware
742
-
743 100
 - *PHP* for PHP malware
744
-
745 101
 - *ASP* for ASP malware
746
-
747 102
 - *VBS* for VBS malware
748
-
749 103
 - *BAT* for BAT malware
750
-
751 104
 - *W97M*, *W2000M* for Word macro viruses
752
-
753 105
 - *X97M*, *X2000M* for Excel macro viruses
754
-
755 106
 - *O97M*, *O2000M* for generic Office macro viruses
756
-
757 107
 - *DoS* for Denial of Service attack software
758
-
759 108
 - *DOS* for old DOS malware
760
-
761 109
 - *Exploit* for popular exploits
762
-
763 110
 - *VirTool* for virus construction kits
764
-
765 111
 - *Dialer* for dialers
766
-
767 112
 - *Joke* for hoaxes
768 113
 
769 114
 Important rules of the naming convention:
770 115
 
771 116
 - always use a -zippwd suffix in the malware name for signatures of type zmd,
772
-
773 117
 - always use a -rarpwd suffix in the malware name for signatures of type rmd,
774
-
775 118
 - only use alphanumeric characters, dash (-), dot (.), and underscore (_) in malware names, never use space, apostrophe or quote mark.
776 119
 
777
-## Using YARA rules in ClamAV
778
-
779
-ClamAV version 0.99 and above can process YARA rules. ClamAV virus database file names ending with “.yar” or “.yara” are parsed as yara rule files. The link to the YARA rule grammar documentation may be found at http://plusvic.github.io/yara/. There are currently a few limitations on using YARA rules within ClamAV:
780
-
781
-- YARA modules are not yet supported by ClamAV. This includes the “import” keyword and any YARA module-specific keywords.
782
-
783
-- Global rules (“global” keyword) are not supported by ClamAV.
784
-
785
-- External variables (“contains” and “matches” keywords) are not supported.
120
+## Signature Writing Tips and Tricks
786 121
 
787
-- YARA rules pre-compiled with the *yarac* command are not supported.
122
+### Testing rules with `clamscan`
788 123
 
789
-- As in the ClamAV logical and extended signature formats, YARA strings and segments of strings separated by wild cards must represent at least two octets of data.
124
+To test a new signature, first create a text file with the extension corresponding to the signature type (e.g. `.ldb` for logical signatures).  Then, add the signature as its own line within the file. This file can be passed to `clamscan` via the `-d` option, which tells ClamAV to load signatures from the file specified.  If the signature is not formatted correctly, ClamAV will display an error - run `clamscan` with `--debug --verbose` to see additional information about the error message.  Some common causes of errors include:
790 125
 
791
-- There is a maximum of 64 strings per YARA rule.
126
+- The signature file has the incorrect extension type for the signatures contained within
127
+- The file has one or more blank lines
128
+- For logical signatures, a semicolon exists at the end of the file
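The common mistakes listed above can be caught before running `clamscan -d`. This Python sketch flags unrecognized extensions, blank lines, and logical signature lines ending in `;`. It interprets the semicolon item as a trailing `;` on a signature line, which is an assumption. The extension list is taken from the database types named in this document. `lint_signature_text` is a hypothetical helper.

```python
from pathlib import PurePath

# Database extensions named in this document.
KNOWN_EXTENSIONS = {
    ".ftm", ".ndb", ".ndu", ".ldb", ".ldu", ".idb", ".cdb", ".cbc", ".crb",
    ".pdb", ".gdb", ".wdb", ".hdb", ".hsb", ".hdu", ".hsu", ".mdb", ".msb",
    ".mdu", ".msu", ".yar", ".yara", ".fp", ".sfp", ".ign", ".ign2", ".pwdb", ".info",
}


def lint_signature_text(filename: str, text: str) -> list[str]:
    """Return human-readable problems found in a signature file's contents."""
    suffix = PurePath(filename).suffix.lower()
    problems = []
    if suffix not in KNOWN_EXTENSIONS:
        problems.append(f"unrecognized signature database extension {suffix!r}")
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append(f"line {number}: blank line")
        elif suffix in {".ldb", ".ldu"} and line.rstrip().endswith(";"):
            problems.append(f"line {number}: logical signature ends with ';'")
    return problems
```

A typical use is `lint_signature_text(path.name, path.read_text())` on the test file before passing it to `clamscan -d`.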
792 129
 
793
-- YARA rules in ClamAV must contain at least one literal, hexadecimal, or regular expression string.
794
-
795
-In addition, there are a few more ClamAV processing modes that may affect the outcome of YARA rules.
796
-
797
-- *File decomposition and decompression* - Since ClamAV uses file decomposition and decompression to find viruses within de-archived and uncompressed inner files, YARA rules executed by ClamAV will match against these files as well.
798
-
799
-- *Normalization* - By default, ClamAV normalizes HTML, JavaScript, and ASCII text files. YARA rules in ClamAV will match against the normalized result. The effects of normalization of these file types may be captured using `clamscan --leave-temps --tempdir=mytempdir`. YARA rules may then be written using the normalized file(s) found in `mytempdir`. Alternatively, starting with ClamAV 0.100.0, `clamscan --normalize=no` will prevent normalization and only scan the raw file. To obtain similar behavior prior to 0.99.2, use `clamscan --scan-html=no`. The corresponding parameters for clamd.conf are `Normalize` and `ScanHTML`.
800
-
801
-- *YARA conditions driven by string matches* - All YARA conditions are driven by string matches in ClamAV. This avoids executing every YARA rule on every file. Any YARA condition may be augmented with a string match clause which is always true, such as:
802
-
803
-```yara
804
-  rule CheckFileSize
805
-  {
806
-    strings:
807
-      $abc = "abc"
808
-    condition:
809
-      ($abc or not $abc) and filesize < 200KB
810
-  }
811
-```
812
-
813
-This will ensure that the YARA condition always performs the desired action (checking the file size in this example).
814
-
815
-## Passwords for archive files \[experimental\]
816
-
817
-ClamAV 0.99 allows users to specify passwords to attempt on certain password-compatible archives. Passwords will be attempted in order of appearance in the password signature file, which uses the extension `.pwdb`. If no passwords apply or none are provided, ClamAV will default to the original behavior of parsing the file. Currently, as of ClamAV 0.99 \[flevel 81\], only `.zip` archives using the traditional PKWARE encryption are supported. The signature format is
818
-
819
-```
820
-    SignatureName;TargetDescriptionBlock;PWStorageType;Password
821
-```
822
-
823
-where:
824
-
825
-- `SignatureName`: name to be displayed during debug when a password is successful
826
-
827
-- `TargetDescriptionBlock`: provides information about the engine and target file with comma separated Arg:Val pairs
828
-  - `Engine:X-Y`: Required engine functionality
829
-  - `Container:CL_TYPE_*`: File type of applicable containers
830
-
831
-- `PWStorageType`: determines how the password field is parsed
832
-  - 0 = cleartext
833
-  - 1 = hex
834
-
835
-- `Password`: value used in password attempt
836
-
837
-The signatures for password attempts are stored inside `.pwdb` files.
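A `.pwdb` line can be assembled from the four fields described above. This Python sketch defaults to `Engine:81-255` and `Container:CL_TYPE_ZIP`, matching the ZIP-only support noted above. It encodes hex passwords as the hex of their UTF-8 bytes, which is an assumption. It switches to hex storage for any password containing the `;` field separator. `pwdb_entry` is a hypothetical helper.

```python
def pwdb_entry(sig_name: str, password: str, hex_storage: bool = False,
               engine: str = "81-255", container: str = "CL_TYPE_ZIP") -> str:
    """Build SignatureName;TargetDescriptionBlock;PWStorageType;Password."""
    if ";" in sig_name:
        raise ValueError("signature name must not contain ';'")
    target = f"Engine:{engine},Container:{container}"
    if hex_storage or ";" in password:
        # PWStorageType 1: hex (assumption: hex of the UTF-8 bytes).
        return f"{sig_name};{target};1;{password.encode('utf-8').hex()}"
    # PWStorageType 0: cleartext.
    return f"{sig_name};{target};0;{password}"
```

Storing a password as hex avoids any ambiguity with the `;` delimiter, which is why the sketch falls back to it automatically.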
838
-
839
-# Signature writing tips and tricks
840
-## Testing rules with clamscan
841
-
842
-To test a new signature, first create a text file with the extension corresponding to the signature type (Ex: '.lsb' for logical signatures).  Then, add the signature as it's own line within the file. This file can be passed to `clamscan` via the `-d` option, which tells ClamAV to load signatures from the file specified.  If the signature is not formatted correctly, ClamAV will display an error - run `clamscan` with `--debug --verbose` to see additional information about the error message.  Some common causes of errors include:
843
- - The signature file has the incorrect extension type for the signatures contained within
844
- - The file has one or more blank lines
845
- - For logical signatures, a semicolon exists at the end of the file
846
-
847
-If the rule is formatted correctly, clamscan will load the signature(s) in and scan any files specified via the command line invocation (or all files in the current directory if none are specified).  A successful detection will look like the following:
130
+If the rule is formatted correctly, `clamscan` will load the signature(s) in and scan any files specified via the command line invocation (or all files in the current directory if none are specified).  A successful detection will look like the following:
848 131
 
849 132
 ```bash
850 133
 clamscan -d test.ldb text.exe
... ...
@@ -863,18 +145,21 @@ Time: 0.400 sec (0 m 0 s)
863 863
 ```
864 864
 
865 865
 If the rule did not match as intended:
866
- - The file may have exceeded one or more of the default scanning limits built-in to ClamAV.  Try running clamscan with the following options to see if raising the limits addresses the issue: `--max-filesize=2000M --max-scansize=2000M --max-files=2000000 --max-recursion=2000000 --max-embeddedpe=2000M --max-htmlnormalize=2000000 --max-htmlnotags=2000000 --max-scriptnormalize=2000000 --max-ziptypercg=2000000 --max-partitions=2000000 --max-iconspe=2000000 --max-rechwp3=2000000 --pcre-match-limit=2000000 --pcre-recmatch-limit=2000000 --pcre-max-filesize=2000M`.
867
- - If matching on HTML or text files, ClamAV might be performing normalization that causes the content of the scanned file to change.  See the [HTML](#html) and [Text file](#text-file) sections for more details.
868
- - libclamav may have been unable to unpack or otherwise process the file.  See [Debug information from libclamav](#debug-information-from-libclamav) for more details.
866
+
867
+- The file may have exceeded one or more of the default scanning limits built-in to ClamAV.  Try running `clamscan` with the following options to see if raising the limits addresses the issue: `--max-filesize=2000M --max-scansize=2000M --max-files=2000000 --max-recursion=2000000 --max-embeddedpe=2000M --max-htmlnormalize=2000000 --max-htmlnotags=2000000 --max-scriptnormalize=2000000 --max-ziptypercg=2000000 --max-partitions=2000000 --max-iconspe=2000000 --max-rechwp3=2000000 --pcre-match-limit=2000000 --pcre-recmatch-limit=2000000 --pcre-max-filesize=2000M`.
868
+- If matching on HTML or text files, ClamAV might be performing normalization that causes the content of the scanned file to change.  See the [HTML](#html) and [Text file](#text-file) sections for more details.
869
+- libclamav may have been unable to unpack or otherwise process the file.  See [Debug information from libclamav](#debug-information-from-libclamav) for more details.
869 870
 
870 871
 NOTE: If you run `clamscan` with a `-d` flag, ClamAV will not load in the signatures downloaded via `freshclam`.  This means that:
871
- - some of ClamAV's unpacking support might be disabled, since some unpackers are implemented as bytecode signatures
872
- - PE whitelisting based on Authenticode signatures won't work, since this functionality relies on .crb rules
872
+
873
+- some of ClamAV's unpacking support might be disabled, since some unpackers are implemented as bytecode signatures
874
+- PE whitelisting based on Authenticode signatures won't work, since this functionality relies on `.crb` rules
875
+
873 876
 If any of this functionality is needed, load in the CVD files manually with additional `-d` flags.
874 877
 
875 878
 ### Debug information from libclamav
876 879
 
877
-In order to create efficient signatures for ClamAV it’s important to understand how the engine handles input files. The best way to see how it works is having a look at the debug information from libclamav. You can do it by calling `clamscan` with the `--debug` and `--leave-temps` flags. The first switch makes clamscan display all the interesting information from libclamav and the second one avoids deleting temporary files so they can be analyzed further.
880
+In order to create efficient signatures for ClamAV it’s important to understand how the engine handles input files. The best way to see how it works is having a look at the debug information from libclamav. You can do it by calling `clamscan` with the `--debug` and `--leave-temps` flags. The first switch makes `clamscan` display all the interesting information from libclamav and the second one avoids deleting temporary files so they can be analyzed further.
878 881
 
879 882
 The important part of the output is:
880 883
 
... ...
@@ -1025,9 +310,9 @@ No additional files get created by libclamav. By writing a signature for the dec
1025 1025
 
1026 1026
 This method should be applied to all files for which you want to create signatures. By analyzing the debug information you can quickly see how the engine recognizes and preprocesses the data and what additional files get created. Signatures created for bottom-level temporary files are usually more generic and should help detecting the same malware in different forms.
1027 1027
 
1028
-## Writing signatures for special files
1028
+### Writing signatures for special files
1029 1029
 
1030
-### HTML
1030
+#### HTML
1031 1031
 
1032 1032
 ClamAV contains HTML normalization code which makes it easier to write signatures for HTML data that might differ based on white space, capitalization, and other insignificant differences. Running `sigtool --html-normalise` on an HTML file can be used to see what a file's contents will look like after normalization.  This command should generate the following files:
1033 1033
 
... ...
@@ -1037,13 +322,13 @@ ClamAV contains HTML normalization code which makes it easier to write signature
1037 1037
 
1038 1038
 - javascript - any script contents are normalized and the results appended to this file
1039 1039
 
1040
-The code automatically decodes JScript.encode parts and char ref’s (e.g. `&#102;`). To create a successful signature for the input file type, the rule must match on the contents of one of the created files.  Signatures matching on normalized HTML should have a target type of 3.
1040
+The code automatically decodes JScript.encode parts and character references (e.g. `&#102;`). To create a successful signature for the input file type, the rule must match on the contents of one of the created files.  Signatures matching on normalized HTML should have a target type of 3.  For reference, see [Target Types](Signatures/FileTypes.md#Target-Types).
1041 1041
 
1042
-### Text files
1042
+#### Text files
1043 1043
 
1044
-Similarly to HTML all ASCII text files get normalized (converted to lower-case, all superfluous white space and control characters removed, etc.) before scanning. Running `sigtool --ascii-normalise` on a text file will result in a normalized version being written to the file named 'normalised\_text'.  Rules matching on normalized ASCII text should have a target type of 7.
1044
+Similarly to HTML, all ASCII text files get normalized (converted to lower-case, all superfluous white space and control characters removed, etc.) before scanning. Running `sigtool --ascii-normalise` on a text file will result in a normalized version being written to the file named 'normalised\_text'.  Rules matching on normalized ASCII text should have a target type of 7.  For reference, see [Target Types](Signatures/FileTypes.md#Target-Types).
1045 1045
 
1046
-### Compressed Portable Executable files
1046
+#### Compressed Portable Executable files
1047 1047
 
1048 1048
 If the file is compressed with UPX, FSG, Petite or another PE packer supported by libclamav, ClamAV will attempt to automatically unpack the executable and evaluate signatures against the unpacked executable.  To inspect the executable that results from ClamAV's unpacking process, run `clamscan` with `--debug --leave-temps`. Example output for an FSG-compressed file:
1049 1049
 
... ...
@@ -1057,31 +342,32 @@ LibClamAV debug: FSG: Unpacked and rebuilt executable saved in
1057 1057
 
1058 1058
 In the example above, `/tmp/clamav-f592b20f9329ac1c91f0e12137bcce6c` is the unpacked executable, and a signature can be written based off of this file.
1059 1059
 
1060
-## Using sigtool
1061
-sigtool pulls in libclamav and provides shortcuts to doing tasks that clamscan does behind the scenes.  These can be really useful when writing a signature or trying to get information about a signature that might be causing FPs or performance problems.
1060
+### Using `sigtool`
1061
+
1062
+`sigtool` pulls in libclamav and provides shortcuts for tasks that `clamscan` performs behind the scenes.  These can be really useful when writing a signature or trying to get information about a signature that might be causing false positives (FPs) or performance problems.
1062 1063
 
1063
-The following sigtool flags can be especially useful for signature writing:
1064
+The following `sigtool` flags can be especially useful for signature writing:
1064 1065
 
1065
-- `--md5` / `--sha1` / `--sha256`: Generate the MD5/SHA1/SHA256 hash and calculate the file size, outputting both as a properly-formatted .hdb/.hsb signature
1066
+- `--md5` / `--sha1` / `--sha256`: Generate the MD5/SHA1/SHA256 hash and calculate the file size, outputting both as a properly-formatted `.hdb`/`.hsb` signature
1066 1067
 
1067
-- `--mdb`: Generate section hashes of the specified file.  This is useful when generating .mdb signatures.
1068
+- `--mdb`: Generate section hashes of the specified file.  This is useful when generating `.mdb` signatures.
1068 1069
 
1069
-- `--decode`: Given a ClamAV signature from STDIN, show a more user-friendly representation of it.  An example usage of this flag is `cat test.lsb | sigtool --decode`.
1070
+- `--decode`: Given a ClamAV signature from STDIN, show a more user-friendly representation of it.  An example usage of this flag is `cat test.ldb | sigtool --decode`.
1070 1071
 
1071 1072
 - `--hex-dump`: Given a sequence of bytes from STDIN, print the hex equivalent. An example usage of this flag is `echo -n "Match on this" | sigtool --hex-dump`.
1072 1073
 
1073
-- `--html-normalise`: Normalize the specified HTML file in the way that clamscan will before looking for rule matches.  Writing signatures off of these files makes it easier to write rules for target type HTML (you'll know what white space, capitalization, etc. to expect). See the [HTML](#html) section for more details.
1074
+- `--html-normalise`: Normalize the specified HTML file in the way that `clamscan` will before looking for rule matches.  Writing signatures off of these files makes it easier to write rules for target type HTML (you'll know what white space, capitalization, etc. to expect). See the [HTML](#html) section for more details.
1074 1075
 
1075
-- `--ascii-normalise`: Normalize the specified ASCII text file in the way that clamscan will before looking for rule matches. Writing signatures off of this normalized file data makes it easier to write rules for target type Txt (you'll know what white space, capitalization, etc. to expect). See the [Text files](#text-files) sectino for more details.
1076
+- `--ascii-normalise`: Normalize the specified ASCII text file in the way that `clamscan` will before looking for rule matches. Writing signatures off of this normalized file data makes it easier to write rules for target type Txt (you'll know what white space, capitalization, etc. to expect). See the [Text files](#text-files) section for more details.
1076 1077
 
1077 1078
 - `--print-certs`: Print the Authenticode signatures of any PE files specified.
1078
-  This is useful when writing signature-based .crb rule files.
1079
+  This is useful when writing signature-based `.crb` rule files.
1079 1080
 
1080 1081
 - `--vba`: Extract VBA/Word6 macro code
1081 1082
 
1082 1083
 - `--test-sigs`: Given a signature and a sample, determine whether the signature matches and, if so, display the offset into the file where the match occurred.  This can be useful for investigating false positive matches in clean files.
1083 1084
 
1084
-## Inspecting signatures inside a CVD file
1085
+### Inspecting signatures inside a CVD file
1085 1086
 
1086 1087
 CVD (ClamAV Virus Database) is a digitally signed container that includes signature databases in various text formats. The header of the container is a 512 bytes long string with colon separated fields:
1087 1088
 
... ...
@@ -1106,9 +392,9 @@ eaPVNe2ccXLfEegoda4xU2TezbGfbSEGoU1qolyQYLX674sNA2Ni6l6/CEKYYh
1106 1106
 Verification OK.
1107 1107
 ```
1108 1108
 
1109
-The ClamAV project distributes a number of CVD files, including *main.cvd* and *daily.cvd*.
1109
+The ClamAV project distributes a number of CVD files, including `main.cvd` and `daily.cvd`.
1110 1110
 
1111
-To view the signature associated with a given detection name, the CVD files can be unpacked and the underlying text files searched for a rule definition using a tool like `grep`.  To do this, use sigtool's `--unpack` flag as follows:
1111
+To view the signature associated with a given detection name, the CVD files can be unpacked and the underlying text files searched for a rule definition using a tool like `grep`.  To do this, use `sigtool`'s `--unpack` flag as follows:
1112 1112
 
1113 1113
 ```bash
1114 1114
 $ mkdir /tmp/clamav-sigs
... ...
@@ -1119,8 +405,8 @@ COPYING   main.fp   main.hsb   main.mdb  main.ndb
1119 1119
 main.crb  main.hdb  main.info  main.msb  main.sfp
1120 1120
 ```
1121 1121
 
1122
-## External tools
1122
+### External tools
1123 1123
 
1124 1124
 Below are tools that can be helpful when writing ClamAV signatures:
1125 1125
 
1126
- - [CASC](https://github.com/Cisco-Talos/CASC) - CASC is a plugin for IDA Pro that allows the user to highlight sections of code and create a signature based on the underlying instructions (with options to ignore bytes associated with registers, addresses, and offsets).  It also contains SigAlyzer, a tool to take an existing signature and locate the regions within the binary that match the subsignatures.
1126
+- [CASC](https://github.com/Cisco-Talos/CASC) - CASC is a plugin for IDA Pro that allows the user to highlight sections of code and create a signature based on the underlying instructions (with options to ignore bytes associated with registers, addresses, and offsets).  It also contains SigAlyzer, a tool to take an existing signature and locate the regions within the binary that match the subsignatures.
1127 1127
new file mode 100644
... ...
@@ -0,0 +1,34 @@
0
+# Trusted and Revoked Certificates
1
+
2
+ClamAV 0.98 checks signed PE files for certificates and verifies each certificate in the chain against a database of trusted and revoked certificates. The signature format is:
3
+
4
+```
5
+    Name;Trusted;Subject;Serial;Pubkey;Exponent;CodeSign;TimeSign;CertSign;
6
+    NotBefore;Comment[;minFL[;maxFL]]
7
+```
8
+
9
+where the corresponding fields are:
10
+
11
+- `Name:` name of the entry
12
+
13
+- `Trusted:` bit field, specifying whether the cert is trusted: 1 for trusted, 0 for revoked
14
+
15
+- `Subject:` sha1 of the Subject field in hex
16
+
17
+- `Serial:` the serial number as reported by `clamscan --debug --verbose`
18
+
19
+- `Pubkey:` the public key in hex
20
+
21
+- `Exponent:` the exponent in hex. Currently ignored and hardcoded to 010001 (in hex)
22
+
23
+- `CodeSign:` bit field, specifying whether this cert can sign code. 1 for true, 0 for false
24
+
25
+- `TimeSign:` bit field, specifying whether this cert can sign timestamps. 1 for true, 0 for false
26
+
27
+- `CertSign:` bit field, specifying whether this cert can sign other certs. 1 for true, 0 for false
28
+
29
+- `NotBefore:` integer, cert should not be added before this variable. Defaults to 0 if left empty
30
+
31
+- `Comment:` comments for this entry
32
+
33
+The signatures for certs are stored inside `.crb` files.
0 34
new file mode 100644
... ...
@@ -0,0 +1,90 @@
0
+# Body-based Signature Content Format
1
+
2
+ClamAV stores all body-based signatures in a hexadecimal format. In this section, a hex-signature means a fragment of malware’s body converted into a hexadecimal string, which can additionally be extended using various wildcards.
3
+
4
+## Hexadecimal format
5
+
6
+You can use `sigtool --hex-dump` to convert any data into a hex-string:
7
+
8
+```bash
9
+zolw@localhost:/tmp/test$ sigtool --hex-dump
10
+How do I look in hex?
11
+486f7720646f2049206c6f6f6b20696e206865783f0a
12
+```
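To see what this conversion does, here is a minimal C sketch of an equivalent hex encoder. `hex_encode` is a hypothetical helper, not a libclamav or `sigtool` API. It emits two lowercase hex digits per input byte, which is the form shown in the output above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write the lowercase hex form of `len` bytes of
 * `data` into `out`, which must hold at least 2 * len + 1 characters. */
void hex_encode(const unsigned char *data, size_t len, char *out)
{
    static const char digits[] = "0123456789abcdef";
    assert(out != NULL && (data != NULL || len == 0));
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = digits[data[i] >> 4];       /* high nibble */
        out[2 * i + 1] = digits[data[i] & 0x0f]; /* low nibble */
    }
    out[2 * len] = '\0';
}
```

Passing the 22 bytes of `How do I look in hex?` plus the trailing newline (added when Enter is pressed) should yield the same `486f...0a` string that `sigtool` printed.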
13
+
14
+## Wildcards
15
+
16
+ClamAV supports the following wildcards for hex-signatures:
17
+
18
+- `??`
19
+
20
+  Match any byte.
21
+
22
+- `a?`
23
+
24
+  Match a high nibble (the four high bits).
25
+
26
+- `?a`
27
+
28
+  Match a low nibble (the four low bits).
29
+
30
+- `*`
31
+
32
+  Match any number of bytes.
33
+
34
+- `{n}`
35
+
36
+  Match `n` bytes.
37
+
38
+- `{-n}`
39
+
40
+  Match `n` or fewer bytes.
41
+
42
+- `{n-}`
43
+
44
+  Match `n` or more bytes.
45
+
46
+- `{n-m}`
47
+
48
+  Match between `n` and `m` bytes (where `m > n`).
49
+
50
+- `HEXSIG[x-y]aa` or `aa[x-y]HEXSIG`
51
+
52
+  Match `aa` anchored to a hex-signature, see [Bugzilla ticket 776](https://bugzilla.clamav.net/show_bug.cgi?id=776) for discussion and
53
+  examples.
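The byte and nibble wildcards can be illustrated with a short C sketch. It assumes a pattern built only from hex pairs, `??`, `a?` and `?a`, and checks that pattern against the start of a buffer. `hex_match_at` is a hypothetical name. ClamAV's real matcher is far more elaborate and supports the full syntax listed above.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one pattern character: 0-15 for a hex digit, -1 for '?'. */
static int nibble_value(char c)
{
    if (c == '?')
        return -1;
    if (isdigit((unsigned char)c))
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Sketch: return 1 if `pattern` (hex pairs, optionally using ??, a? or ?a)
 * matches the first bytes of `data`, else 0. Assumes a well-formed pattern
 * of hex digits and '?'; other wildcards are not supported. */
int hex_match_at(const char *pattern, const unsigned char *data, size_t len)
{
    size_t plen = strlen(pattern);
    assert(plen % 2 == 0);
    if (plen / 2 > len)
        return 0; /* not enough data left to match the whole pattern */
    for (size_t i = 0; i < plen / 2; i++) {
        int hi = nibble_value(pattern[2 * i]);
        int lo = nibble_value(pattern[2 * i + 1]);
        if (hi >= 0 && (data[i] >> 4) != hi)
            return 0; /* fixed high nibble differs */
        if (lo >= 0 && (data[i] & 0x0f) != lo)
            return 0; /* fixed low nibble differs */
    }
    return 1;
}
```

For example, given the bytes `4d 5a 90 00`, the patterns `4d??9?00` and `?d?a` both match, but `?d?b` does not.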
54
+
55
+The range signatures `*` and `{}` virtually separate a hex-signature into two parts, e.g. `aabbcc*bbaacc` is treated as two sub-signatures `aabbcc` and `bbaacc` with any number of bytes between them. Each sub-signature must include a block of two static characters somewhere in its body. There is one exception to this restriction: when the range wildcard is of the form `{n}` with `n<128`, ClamAV uses an optimization and translates `{n}` to the string consisting of `n ??` character wildcards. Character wildcards do not divide hex signatures into two parts, so the two static character requirement does not apply to each side separately.
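The splitting rule and the `{n}` exception can be sketched as a validity check. This is an illustration under stated assumptions, not ClamAV's loader:

- A "block of two static characters" is taken to mean two consecutive fully specified bytes.
- Only hex pairs, nibble wildcards, `*` and `{...}` ranges are handled.
- `subsigs_have_static_block` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: return 1 if every sub-signature of `sig` contains at least two
 * consecutive static (non-wildcard) bytes, else 0. `{n}` with n < 128 is
 * treated as n `??` wildcards instead of a split, as described above. */
int subsigs_have_static_block(const char *sig)
{
    int run = 0, has_block = 0;
    const char *p = sig;
    assert(sig != NULL);
    while (*p) {
        if (*p == '*' || *p == '{') {
            int split = 1;
            if (*p == '{') {
                const char *end = strchr(p, '}');
                if (end == NULL)
                    return 0; /* malformed range */
                size_t n_len = (size_t)(end - p - 1);
                /* `{n}`: only digits inside the braces, and n < 128 */
                if (n_len > 0 && strspn(p + 1, "0123456789") == n_len &&
                    strtol(p + 1, NULL, 10) < 128)
                    split = 0;
                p = end + 1;
            } else {
                p++;
            }
            if (split) {
                if (!has_block)
                    return 0; /* sub-signature before the split is too weak */
                has_block = 0; /* start checking the next sub-signature */
            }
            run = 0; /* wildcard bytes break a run of static bytes */
            continue;
        }
        if (p[1] == '\0')
            return 0; /* odd number of hex characters */
        if (isxdigit((unsigned char)p[0]) && isxdigit((unsigned char)p[1])) {
            if (++run >= 2)
                has_block = 1;
        } else {
            run = 0; /* a nibble or byte wildcard breaks the run */
        }
        p += 2;
    }
    return has_block;
}
```

In this sketch, `aabb{10}cc` passes because `{10}` is expanded into wildcards and does not split the signature. By contrast, `aabb{200}cc` fails because the trailing `cc` becomes a sub-signature of its own without two static bytes.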
56
+
57
+## Character classes
58
+
59
+ClamAV supports the following character classes for hex-signatures:
60
+
61
+- `(B)`
62
+
63
+  Match word boundary (including file boundaries).
64
+
65
+- `(L)`
66
+
67
+  Match CR, CRLF or file boundaries.
68
+
69
+- `(W)`
70
+
71
+  Match a non-alphanumeric character.
72
+
73
+## Alternate strings
74
+
75
+- Single-byte alternates (clamav-0.96) `(aa|bb|cc|...)` or `!(aa|bb|cc|...)` Match a member from a set of bytes (eg: `aa`, `bb`, `cc`, ...).
76
+  - Negation operation can be applied to match any non-member, assumed to be one-byte in length.
77
+  - Signature modifiers and wildcards cannot be applied.
78
+
79
+- Multi-byte fixed length alternates `(aaaa|bbbb|cccc|...)` or `!(aaaa|bbbb|cccc|...)` Match a member from a set of multi-byte alternates (eg: aaaa, bbbb, cccc, ...) of length n.
80
+  - All set members must be the same length.
81
+  - Negation operation can be applied to match any non-member, assumed to be n-bytes in length (clamav-0.98.2).
82
+  - Signature modifiers and wildcards cannot be applied.
83
+
84
+- Generic alternates (clamav-0.99) `(alt1|alt2|alt3|...)` Match a member from a set of alternates (eg: alt1, alt2, alt3, ...) that can be of variable lengths.
85
+  - Negation operation cannot be applied.
86
+  - Signature modifiers and nibble wildcards (eg: `??, a?, ?a`) can be applied.
87
+  - Ranged wildcards (eg: `{n-m}`) are limited to a fixed range of less than 128 bytes (eg: `{1} -> {127}`).
88
+
89
+Note that using signature modifiers and wildcards causes an alternate to be classified as a generic alternate. Thus single-byte alternates and multi-byte fixed length alternates can use signature modifiers and wildcards, but they will be classified as generic alternates. This means that negation cannot be applied in this situation and there is a slight performance impact.
0 90
new file mode 100644
... ...
@@ -0,0 +1,11 @@
0
+# Bytecode Signatures
1
+
2
+Bytecode Signatures are the means by which more complex matching can be performed by writing C code to parse sample content at various stages in file extraction.
3
+
4
+It is less complicated than it sounds. Essentially, the signature author writes a function in C that is compiled down to an intermediate language called "bytecode". This bytecode is encoded in an ASCII `.cbc` file and distributed in `bytecode.[cvd|cld]`. When the database is loaded, ClamAV can interpret this bytecode to execute the function.
5
+
6
+Bytecode functions are provided with a set of APIs that may be used to access the sample data, and to access what metadata ClamAV already has concerning the sample.
7
+
8
+The function may at any time call an API to flag the sample as malicious, and may provide the signature/virus name at that time. This means a single bytecode signature (function) can be written to handle a given file type and may trigger different alerts with different signature names as additional malicious characteristics for the file type are identified. That isn't to say that only one bytecode signature may be assigned to a given file type, but that a single author may find it to be more efficient to use a bytecode signature to identify more than one type of malware.
9
+
10
+The specifics on how to write and compile bytecode signatures are outside of the scope of this documentation. Extensive documentation on ClamAV Bytecode Signatures is provided with the [ClamAV Bytecode Compiler](https://github.com/vrtadmin/clamav-bytecode-compiler).
0 11
new file mode 100644
... ...
@@ -0,0 +1,44 @@
0
+# Signatures based on container metadata
1
+
2
+ClamAV 0.96 allows creating generic signatures matching files stored inside different container types which meet specific conditions. The signature format is:
3
+
4
+```
5
+    VirusName:ContainerType:ContainerSize:FileNameREGEX:
6
+    FileSizeInContainer:FileSizeReal:IsEncrypted:FilePos:
7
+    Res1:Res2[:MinFL[:MaxFL]]
8
+```
9
+
10
+where the corresponding fields are:
11
+
12
+- `VirusName:` Virus name to be displayed when signature matches.
13
+
14
+- `ContainerType:` The file type containing the target file.  For example:
15
+  - `CL_TYPE_ZIP`,
16
+  - `CL_TYPE_RAR`,
17
+  - `CL_TYPE_ARJ`,
18
+  - `CL_TYPE_MSCAB`,
19
+  - `CL_TYPE_7Z`,
20
+  - `CL_TYPE_MAIL`,
21
+  - `CL_TYPE_(POSIX|OLD)_TAR`,
22
+  - `CL_TYPE_CPIO_(OLD|ODC|NEWC|CRC)`
23
+
24
+  Use `*` as a wildcard to indicate that the container type may be any file type.
25
+  For a full list of ClamAV file types, see the [ClamAV File Types Reference](ClamAVFileTypes.md).
26
+
27
+- `ContainerSize:` size of the container file itself (eg. size of the zip archive) specified in bytes as absolute value or range `x-y`.
28
+
29
+- `FileNameREGEX:` regular expression describing name of the target file
30
+
31
+- `FileSizeInContainer:` usually compressed size; for MAIL, TAR and CPIO == `FileSizeReal`; specified in bytes as absolute value or range.
32
+
33
+- `FileSizeReal:` usually uncompressed size; for MAIL, TAR and CPIO == `FileSizeInContainer`; absolute value or range.
34
+
35
+- `IsEncrypted:` 1 if the target file is encrypted, 0 if it’s not and `*` to ignore
36
+
37
+- `FilePos:` file position in container (counting from 1); absolute value or range.
38
+
39
+- `Res1:` when `ContainerType` is `CL_TYPE_ZIP` or `CL_TYPE_RAR` this field is treated as a CRC sum of the target file specified in hexadecimal format; for other container types it’s ignored.
40
+
41
+- `Res2:` not used as of ClamAV 0.96.
42
+
43
+The signatures for container files are stored inside `.cdb` files.
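Several of the numeric fields above accept an absolute value or a range `x-y`, and some also accept `*`. A minimal C sketch of that comparison follows. It assumes well-formed input and inclusive range bounds. `cdb_field_matches` is a hypothetical helper rather than a libclamav function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: does `value` satisfy a numeric .cdb field written as `*`, an
 * absolute value `n`, or a range `x-y` (bounds assumed inclusive)?
 * Assumes a well-formed field; error handling is omitted. */
int cdb_field_matches(const char *field, unsigned long value)
{
    char *end;
    unsigned long lo, hi;
    assert(field != NULL);
    if (strcmp(field, "*") == 0)
        return 1; /* wildcard: any value */
    lo = strtoul(field, &end, 10);
    hi = lo; /* absolute value: lower and upper bound coincide */
    if (*end == '-')
        hi = strtoul(end + 1, NULL, 10); /* range x-y */
    return value >= lo && value <= hi;
}
```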
0 44
new file mode 100644
... ...
@@ -0,0 +1,15 @@
0
+# Database Info
1
+
2
+The `.info` file format specifies information about the other database files unpacked from a CVD or CLD database archive. This file exists for the purposes of validating the correctness of the official ClamAV database container files and cannot be loaded a la carte.
3
+
4
+The format is simply:
5
+
6
+```
7
+name:size:sha256
8
+```
9
+
10
+`name`: The database file name.
11
+
12
+`size`: The size in bytes of the database.
13
+
14
+`sha256`: A SHA256 hash of the database.
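Because each entry is a simple colon-separated triple, a parser is short. The C sketch below assumes lines exactly of the form `name:size:sha256` with a 64-digit hex digest. The struct, its field sizes, and `parse_info_line` are all hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of one .info entry. */
struct info_entry {
    char name[64];
    unsigned long size;
    char sha256[65]; /* 64 hex digits plus the terminating NUL */
};

/* Sketch: parse one `name:size:sha256` line into `e`.
 * Returns 1 on success, 0 if the line does not have that shape. */
int parse_info_line(const char *line, struct info_entry *e)
{
    assert(line != NULL && e != NULL);
    if (sscanf(line, "%63[^:]:%lu:%64[0-9a-fA-F]",
               e->name, &e->size, e->sha256) != 3)
        return 0;
    return strlen(e->sha256) == 64; /* a SHA256 digest has 64 hex digits */
}
```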
0 15
new file mode 100644
... ...
@@ -0,0 +1,81 @@
0
+# Dynamic Configuration (DCONF)
1
+
2
+ClamAV supports a limited set of configuration options that may be enabled or disabled via settings in the `*.cfg` database. At this time, these settings are distributed in `daily.cfg`.
3
+
4
+The goal of DCONF is to enable the ClamAV team to rapidly disable new or experimental features for specific ClamAV versions if a significant defect is discovered after release.
5
+
6
+This database is small, and the settings are largely vestigial. The team has not had a need to disable many features in a long time, and so the ClamAV versions in the settings at this time should no longer be in use.
7
+
8
+The strings and values referenced in `daily.cfg` are best cross-referenced with the macros and structures defined here:
9
+
10
+* https://github.com/Cisco-Talos/clamav-devel/blob/dev/0.101/libclamav/dconf.h#L49
11
+* https://github.com/Cisco-Talos/clamav-devel/blob/dev/0.101/libclamav/dconf.c#L54
12
+
13
+The format for a DCONF signature is:
14
+
15
+```
16
+Category:Flags:StartFlevel:EndFlevel
17
+```
18
+
19
+`Category` may be one of:
20
+
21
+* PE
22
+* ELF
23
+* MACHO
24
+* ARCHIVE
25
+* DOCUMENT
26
+* MAIL
27
+* OTHER
28
+* PHISHING
29
+* BYTECODE
30
+* STATS
31
+* PCRE
32
+
33
+`Flags`:
34
+
35
+Every feature that may be configured via DCONF is listed in `struct dconf_module modules` in `libclamav/dconf.c`. Any given feature may be default-on or default-off. Default-on features have the 4th field set to a `1` and default off are set to `0`. The `Flags` field for a given `Category` overrides the defaults for all of the options listed under that category. 
36
+
37
+A setting of `0x0`, for example, means that all options in the category are disabled.
38
+
39
+The macros listed in `libclamav/dconf.h` will help you identify which bits to set to get the desired results.
40
+
41
+`StartFlevel`:
42
+
43
+This is the [FLEVEL](FunctionalityLevels.md) of the minimum ClamAV engine for which you want the settings to be in effect.
44
+
45
+`EndFlevel`:
46
+
47
+This is the [FLEVEL](FunctionalityLevels.md) of the maximum ClamAV engine for which you want the settings to be in effect.  You may wish to select `255` to override the defaults of future releases.
48
+
49
+## Example
50
+
51
+Consider the `OTHER_CONF_PDFNAMEOBJ` option in the `OTHER` category.
52
+
53
+```c
54
+#define OTHER_CONF_UUENC        0x1     // Default: 1
55
+#define OTHER_CONF_SCRENC       0x2     // Default: 1
56
+#define OTHER_CONF_RIFF         0x4     // Default: 1
57
+#define OTHER_CONF_JPEG         0x8     // Default: 1
58
+#define OTHER_CONF_CRYPTFF      0x10    // Default: 1
59
+#define OTHER_CONF_DLP          0x20    // Default: 1
60
+#define OTHER_CONF_MYDOOMLOG    0x40    // Default: 1
61
+#define OTHER_CONF_PREFILTERING 0x80    // Default: 1
62
+#define OTHER_CONF_PDFNAMEOBJ   0x100   // Default: 1
63
+#define OTHER_CONF_PRTNINTXN    0x200   // Default: 1
64
+#define OTHER_CONF_LZW          0x400   // Default: 1
65
+```
66
+
67
+All of the `OTHER` options, including `OTHER_CONF_PDFNAMEOBJ`, are default-on. To disable the option for ClamAV v0.100.X but leave the other options in their default settings, we would need to set the flags to:
68
+
69
+```binary
70
+0110 1111 1111
71
+   ^pdfnameobj off
72
+```
73
+
74
+Or in hex: `0x6FF`
75
+
76
+The example setting to place in `daily.cfg` then would be:
77
+
78
+```
79
+OTHER:0x6FF:90:99
80
+```
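The same arithmetic can be done in C: start from the default mask for the category and clear the bits of the options to disable. The `OTHER_CONF_*` values are copied from the listing above. `format_dconf_line` is a hypothetical helper for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* OTHER category flags, copied from the listing above (all default-on). */
#define OTHER_CONF_UUENC        0x1
#define OTHER_CONF_SCRENC       0x2
#define OTHER_CONF_RIFF         0x4
#define OTHER_CONF_JPEG         0x8
#define OTHER_CONF_CRYPTFF      0x10
#define OTHER_CONF_DLP          0x20
#define OTHER_CONF_MYDOOMLOG    0x40
#define OTHER_CONF_PREFILTERING 0x80
#define OTHER_CONF_PDFNAMEOBJ   0x100
#define OTHER_CONF_PRTNINTXN    0x200
#define OTHER_CONF_LZW          0x400

/* Default mask: every default-on option OR-ed together (0x7FF). */
#define OTHER_CONF_DEFAULTS                                   \
    (OTHER_CONF_UUENC | OTHER_CONF_SCRENC | OTHER_CONF_RIFF | \
     OTHER_CONF_JPEG | OTHER_CONF_CRYPTFF | OTHER_CONF_DLP |  \
     OTHER_CONF_MYDOOMLOG | OTHER_CONF_PREFILTERING |         \
     OTHER_CONF_PDFNAMEOBJ | OTHER_CONF_PRTNINTXN | OTHER_CONF_LZW)

/* Hypothetical helper: write "Category:Flags:StartFlevel:EndFlevel" with
 * the `disable` bits cleared from `defaults`. Returns snprintf's result. */
int format_dconf_line(char *buf, size_t size, const char *category,
                      unsigned int defaults, unsigned int disable,
                      unsigned int start_flevel, unsigned int end_flevel)
{
    unsigned int flags = defaults & ~disable; /* keep defaults, drop `disable` */
    assert(buf != NULL && category != NULL);
    return snprintf(buf, size, "%s:0x%X:%u:%u", category, flags,
                    start_flevel, end_flevel);
}
```

Clearing `OTHER_CONF_PDFNAMEOBJ` from the default `0x7FF` mask yields `OTHER:0x6FF:90:99`, which matches the setting shown above.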
0 81
new file mode 100644
... ...
@@ -0,0 +1,23 @@
0
+# Passwords for archive files \[experimental\]
1
+
2
+ClamAV 0.99 allows users to specify password attempts for certain password-compatible archives. Passwords will be attempted in order of appearance in the password signature file, which uses the extension `.pwdb`. If no passwords apply or none are provided, ClamAV will default to the original behavior of parsing the file. Currently, as of ClamAV 0.99 \[flevel 81\], only `.zip` archives using the traditional PKWARE encryption are supported. The signature format is:
3
+
4
+```
5
+SignatureName;TargetDescriptionBlock;PWStorageType;Password
6
+```
7
+
8
+where:
9
+
10
+- `SignatureName`: name to be displayed during debug when a password is successful
11
+
12
+- `TargetDescriptionBlock`: provides information about the engine and target file with comma separated Arg:Val pairs
13
+  - `Engine:X-Y`: Required engine functionality level. See the [FLEVEL reference](FunctionalityLevels.md) for details.
14
+  - `Container:CL_TYPE_*`: File type of applicable containers
15
+
16
+- `PWStorageType`: determines how the password field is parsed
17
+  - 0 = cleartext
18
+  - 1 = hex
19
+
20
+- `Password`: value used in password attempt
21
+
22
+The signatures for password attempts are stored inside `.pwdb` files.
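To illustrate how `PWStorageType` changes the interpretation of `Password`, here is a C sketch that turns the field into the raw bytes to try. The function name and its error convention are hypothetical. The sketch assumes the hex form uses exactly two digits per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Sketch: decode a .pwdb Password field according to PWStorageType
 * (0 = cleartext, 1 = hex). Returns the number of bytes written to `out`,
 * or -1 for an unknown type, malformed hex, or a too-small buffer. */
long decode_pwdb_password(int storage_type, const char *field,
                          unsigned char *out, size_t out_size)
{
    size_t len;
    assert(field != NULL && out != NULL);
    len = strlen(field);
    if (storage_type == 0) {
        if (len > out_size)
            return -1;
        memcpy(out, field, len); /* cleartext: use the bytes as-is */
        return (long)len;
    }
    if (storage_type != 1 || len % 2 != 0 || len / 2 > out_size)
        return -1;
    for (size_t i = 0; i < len / 2; i++) {
        int hi = hex_digit(field[2 * i]);
        int lo = hex_digit(field[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (unsigned char)((hi << 4) | lo); /* two digits per byte */
    }
    return (long)(len / 2);
}
```

For example, the hex field `696e666563746564` and the cleartext field `infected` decode to the same 8 bytes.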
0 23
new file mode 100644
... ...
@@ -0,0 +1,37 @@
0
+# Extended signature format
1
+
2
+The extended signature format is ClamAV's most basic type of body-based signature since the deprecation of the original `.db` database format.
3
+
4
+Extended signatures allow for specification of additional information beyond just hexadecimal content, such as a file "target type", virus offset, or engine functionality level (FLEVEL), making the detection more reliable.
5
+
6
+The format is:
7
+
8
+```
9
+    MalwareName:TargetType:Offset:HexSignature[:min_flevel[:max_flevel]]
10
+```
11
+
12
+`MalwareName`: The virus name. Should conform to the standards defined [here](../Signatures.md#Signature-names).
13
+
14
+`TargetType`: A number specifying the type of the target file: [Target Types](FileTypes.md#Target-Types)
15
+
16
+`Offset`: An asterisk or a decimal number `n` possibly combined with a special modifier:
17
+
18
+- `*` = any
19
+- `n` = absolute offset
20
+- `EOF-n` = end of file minus `n` bytes
21
+
22
+Signatures for PE, ELF and Mach-O files additionally support:
23
+
24
+- `EP+n` = entry point plus n bytes (`EP+0` for `EP`)
25
+- `EP-n` = entry point minus n bytes
26
+- `Sx+n` = start of section `x`’s (counted from 0) data plus `n` bytes
27
+- `SEx` = entire section `x` (offset must lie within section boundaries)
28
+- `SL+n` = start of last section plus `n` bytes
29
+
30
+All the above offsets except `*` can be turned into **floating offsets** and represented as `Offset,MaxShift` where `MaxShift` is an unsigned integer. A floating offset will match every offset between `Offset` and `Offset+MaxShift`, eg. `10,5` will match all offsets from 10 to 15 and `EP+n,y` will match all offsets from `EP+n` to `EP+n+y`. Versions of ClamAV older than 0.91 will silently ignore the `MaxShift` extension and only use `Offset`. Optional `MinFL` and `MaxFL` parameters can restrict the signature to specific engine releases. All signatures in the extended format must be placed inside `*.ndb` files.
31
+
32
+`HexSignature`: The body-based content matching [format](BodySignatureFormat.md).
33
+
34
+`min_flevel`: (optional) The minimum ClamAV engine that the signature works with. See the [FLEVEL reference](FunctionalityLevels.md) for details. To be used in the event that the signature relies on recently added engine features.
35
+
36
+`max_flevel`: (optional, requires `min_flevel`) The maximum ClamAV engine that the signature works with. To be used in the event that engine support the signature relies on has been recently removed.
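The offset rules for non-executable targets can be summarized in a C sketch that decides whether a match starting at a given position is allowed. It handles `*`, `n` and `EOF-n`, each with an optional `,MaxShift`, and treats the floating range as inclusive, as in the `10,5` example above. The `EP` and section forms need executable parsing and are omitted. `ndb_offset_allows` is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: may a match start at `pos` in a file of `fsize` bytes, given an
 * .ndb Offset field? Handles `*`, `n` and `EOF-n`, each optionally followed
 * by `,MaxShift`. EP/section offsets need PE/ELF/Mach-O parsing and are
 * not handled. Assumes a well-formed field. */
int ndb_offset_allows(const char *spec, unsigned long fsize, unsigned long pos)
{
    char *end;
    unsigned long base, shift = 0;
    assert(spec != NULL);
    if (strcmp(spec, "*") == 0)
        return 1; /* any offset */
    if (strncmp(spec, "EOF-", 4) == 0) {
        unsigned long n = strtoul(spec + 4, &end, 10);
        if (n > fsize)
            return 0; /* would point before the start of the file */
        base = fsize - n;
    } else {
        base = strtoul(spec, &end, 10); /* absolute offset */
    }
    if (*end == ',')
        shift = strtoul(end + 1, NULL, 10); /* floating offset */
    return pos >= base && pos <= base + shift;
}
```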
0 37
new file mode 100644
... ...
@@ -0,0 +1,33 @@
0
+# File Type Magic
1
+
2
+ClamAV's primary mechanism for determining file types is to match the file with a File Type Magic signature. These file type signatures are compiled into ClamAV, and may also be overridden dynamically using the definitions found in a `*.ftm` file.
3
+
4
+The ClamAV standard signature database includes these definitions in `daily.ftm`.
5
+
6
+The signature format is not too dissimilar from NDB body-based signatures.
7
+
8
+The format is:
9
+
10
+```
11
+    magictype:offset:magicbytes:name:rtype:type[:min_flevel[:max_flevel]]
12
+```
13
+
14
+Where:
15
+
16
+`magictype`: Supported magic types include:
17
+
18
+* 0 - direct memory comparison of `magicbytes` for file types
19
+* 1 - The `magicbytes` use the body-based content matching [format](BodySignatureFormat.md).
20
+* 4 - direct memory comparison of `magicbytes` for partition types (HFS+, HFSX)
21
+
22
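+`offset`: The offset from the start of the file to match against.  May be `*` if `magictype` is 1.
+
+`magicbytes`: The bytes to match, written in hexadecimal (using the body-based content matching format if `magictype` is 1).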
23
+
24
+`name`: A descriptive name for the file type.
25
+
26
+`rtype`: Usually `CL_TYPE_ANY`.
27
+
28
+`type`: The CL_TYPE corresponding with the file type signature. See the [CL_TYPE reference](ClamAVFileTypes.md) for details.
29
+
30
+`min_flevel`: (optional) The minimum ClamAV engine that the file type signature works with. See the [FLEVEL reference](FunctionalityLevels.md) for details. To be used in the event that file type support has been recently added.
31
+
32
+`max_flevel`: (optional, requires `min_flevel`) The maximum ClamAV engine that the file type signature works with. To be used in the event that file type support has been recently removed.
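A `magictype` 0 entry is a direct byte comparison at a fixed offset. The C sketch below shows that check. It assumes `magicbytes` is written as hex pairs and that `offset` is numeric. `ftm_direct_match` is a hypothetical name, not ClamAV's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if `c` is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Sketch of a magictype 0 check: decode the hex `magic` and compare it
 * byte-for-byte with `data` starting at `offset`. Returns 1 on a match. */
int ftm_direct_match(const unsigned char *data, size_t len, size_t offset,
                     const char *magic)
{
    size_t n;
    assert(data != NULL && magic != NULL && strlen(magic) % 2 == 0);
    n = strlen(magic) / 2;
    if (offset > len || n > len - offset)
        return 0; /* magic would run past the end of the data */
    for (size_t i = 0; i < n; i++) {
        int hi = hex_digit(magic[2 * i]);
        int lo = hex_digit(magic[2 * i + 1]);
        if (hi < 0 || lo < 0 || data[offset + i] != ((hi << 4) | lo))
            return 0; /* malformed hex or mismatching byte */
    }
    return 1;
}
```

For instance, the Zip local-file header bytes `504b0304` match at offset 0 of a buffer that begins with `PK` followed by bytes `03 04`.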
0 33
new file mode 100644
... ...
@@ -0,0 +1,120 @@
0
+# ClamAV File Types
1
+
2
+ClamAV maintains its own file typing format and assigns these types using one of the following:
3
+
4
+- Evaluation of a unique sequence of bytes at the start of a file ([File Type Magic](Signatures/FileTypeMagic.md)).
5
+- File type indicators when parsing container files.
6
+  - For example:
7
+    CL_TYPE_SCRIPT may be assigned to data contained in a PDF when the PDF indicates that a stream of bytes is "Javascript"
8
+- File type determination based on the names or characteristics contained within the file.
9
+  - For example:
10
+    CL_TYPE_OOXML_WORD may be assigned to a Zip file containing files with specific names.
11
+
12
+## Target Types
13
+
14
+A Target Type is an integer that indicates which kind of file the signature will match against. Target Type notation was first created for the purpose of writing efficient signatures. A signature with a target type of `0` will be run against every file type, and thus is not ideal. However, the Target Type notation is limited, so using target type `0` may sometimes be unavoidable.
15
+
16
+Although the newer CL_TYPE string name notation has replaced the Target Type for some signature formats, many signature formats require a target type number.
17
+
18
+This is the current list of available Target Types:
19
+
20
+- 0 = any file
21
+- 1 = Portable Executable, both 32- and 64-bit.
22
+- 2 = OLE2 containers, including their specific macros. The OLE2 format is primarily used by MS Office and MSI installation files.
23
+- 3 = HTML (normalized)
24
+- 4 = Mail file
25
+- 5 = Graphics
26
+- 6 = ELF
27
+- 7 = ASCII text file (normalized)
28
+- 8 = Unused
29
+- 9 = Mach-O files
30
+- 10 = PDF files
31
+- 11 = Flash files
32
+- 12 = Java class files
33
+
34
+**_Important_**: HTML, ASCII, and Javascript are all normalized.
35
+
36
+- ASCII:
37
+  - All lowercase.
38
+- HTML:
39
+  - Whitespace transformed to spaces, tags/tag attributes normalized, all lowercase.
40
+- Javascript:
41
+  - All strings are normalized (hex encoding is decoded), numbers are parsed and normalized, local variables/function names are normalized to `n001` format, the argument to `eval()` is parsed as JS again, `unescape()` is handled, some simple JS packers are handled, and the output is whitespace normalized.
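To get a feel for the ASCII normalization, here is a rough C approximation. It lowercases text, drops control characters, and collapses whitespace runs into a single space. It is an assumption-laden sketch, not ClamAV's normalizer. Run `sigtool --ascii-normalise` to see the exact output that signatures are matched against. `approx_ascii_normalize` is a hypothetical name.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Rough approximation of ASCII text normalization (not ClamAV's code):
 * lowercase letters, drop control characters, and collapse whitespace runs
 * into one space, trimming leading and trailing whitespace. `out` must hold
 * at least strlen(in) + 1 bytes. Returns the length of the result. */
size_t approx_ascii_normalize(const char *in, char *out)
{
    size_t o = 0;
    int pending_space = 0;
    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;
        if (isspace(c)) {
            pending_space = (o > 0); /* emit later, and never at the start */
        } else if (!iscntrl(c)) {
            if (pending_space)
                out[o++] = ' ';
            pending_space = 0;
            out[o++] = (char)tolower(c);
        }
        /* any other control character is simply dropped */
    }
    out[o] = '\0';
    return o;
}
```

Under this approximation, an input of two spaces, `Hello,`, two tabs, `WORLD`, a space and a newline becomes `hello, world`.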
42
+
43
+## CL_TYPEs
44
+
45
+ClamAV Types are prefixed with `CL_TYPE_`.  The following is an exhaustive list of all current CL_TYPEs.
46
+
47
+| CL_TYPE                | Description                                                  |
48
+|------------------------|--------------------------------------------------------------|
49
+| `CL_TYPE_7Z`           | 7-Zip Archive                                                |
50
+| `CL_TYPE_7ZSFX`        | Self-Extracting 7-Zip Archive                                |
51
+| `CL_TYPE_APM`          | Disk Image - Apple Partition Map                             |
52
+| `CL_TYPE_ARJ`          | ARJ Archive                                                  |
53
+| `CL_TYPE_ARJSFX`       | Self-Extracting ARJ Archive                                  |
54
+| `CL_TYPE_AUTOIT`       | AutoIt Automation Executable                                 |
55
+| `CL_TYPE_BINARY_DATA`  | binary data                                                  |
56
+| `CL_TYPE_BINHEX`       | BinHex Macintosh 7-bit ASCII email attachment encoding       |
57
+| `CL_TYPE_BZ`           | BZip Compressed File                                         |
58
+| `CL_TYPE_CABSFX`       | Self-Extracting Microsoft CAB Archive                        |
59
+| `CL_TYPE_CPIO_CRC`     | CPIO Archive (CRC)                                           |
60
+| `CL_TYPE_CPIO_NEWC`    | CPIO Archive (NEWC)                                          |
61
+| `CL_TYPE_CPIO_ODC`     | CPIO Archive (ODC)                                           |
62
+| `CL_TYPE_CPIO_OLD`     | CPIO Archive (OLD, Little Endian or Big Endian)              |
63
+| `CL_TYPE_CRYPTFF`      | Files encrypted by CryptFF malware                           |
64
+| `CL_TYPE_DMG`          | Apple DMG Archive                                            |
65
+| `CL_TYPE_ELF`          | ELF Executable (Linux/Unix program or library)               |
66
+| `CL_TYPE_GPT`          | Disk Image - GUID Partition Table                            |
67
+| `CL_TYPE_GRAPHICS`     | TIFF (Little Endian or Big Endian)                           |
68
+| `CL_TYPE_GZ`           | GZip Compressed File                                         |
69
+| `CL_TYPE_HTML_UTF16`   | Wide-Character / UTF16 encoded HTML                          |
70
+| `CL_TYPE_HTML`         | HTML data                                                    |
71
+| `CL_TYPE_HWP3`         | Hangul Word Processor (3.X)                                  |
72
+| `CL_TYPE_HWPOLE2`      | Hangul Word Processor embedded OLE2                          |
73
+| `CL_TYPE_INTERNAL`     | Internal properties                                          |
74
+| `CL_TYPE_ISHIELD_MSI`  | Windows Install Shield MSI installer                         |
75
+| `CL_TYPE_ISO9660`      | ISO 9660 file system for optical disc media                  |
76
+| `CL_TYPE_JAVA`         | Java Class File                                              |
77
+| `CL_TYPE_LNK`          | Microsoft Windows Shortcut File                              |
78
+| `CL_TYPE_MACHO_UNIBIN` | Universal Binary/Java Bytecode                               |
79
+| `CL_TYPE_MACHO`        | Apple/NeXTSTEP Mach-O Executable file format                 |
80
+| `CL_TYPE_MAIL`         | Email file                                                   |
81
+| `CL_TYPE_MBR`          | Disk Image - Master Boot Record                              |
82
+| `CL_TYPE_MHTML`        | MHTML Saved Web Page                                         |
83
+| `CL_TYPE_MSCAB`        | Microsoft CAB Archive                                        |
84
+| `CL_TYPE_MSCHM`        | Microsoft CHM help archive                                   |
85
+| `CL_TYPE_MSEXE`        | Microsoft EXE / DLL Executable file                          |
86
+| `CL_TYPE_MSOLE2`       | Microsoft OLE2 Container file                                |
87
+| `CL_TYPE_MSSZDD`       | Microsoft Compressed EXE                                     |
88
+| `CL_TYPE_NULSFT`       | NullSoft Scripted Installer program                          |
89
+| `CL_TYPE_OLD_TAR`      | TAR archive (old)                                            |
90
+| `CL_TYPE_OOXML_HWP`    | Hangul Office Open Word Processor (5.X)                      |
91
+| `CL_TYPE_OOXML_PPT`    | Microsoft Office Open XML PowerPoint                         |
92
+| `CL_TYPE_OOXML_WORD`   | Microsoft Office Open Word 2007+                             |
93
+| `CL_TYPE_OOXML_XL`     | Microsoft Office Open Excel 2007+                            |
94
+| `CL_TYPE_PART_HFSPLUS` | Apple HFS+ partition                                         |
95
+| `CL_TYPE_PDF`          | Adobe PDF document                                           |
96
+| `CL_TYPE_POSIX_TAR`    | TAR archive                                                  |
97
+| `CL_TYPE_PS`           | Postscript                                                   |
98
+| `CL_TYPE_RAR`          | RAR Archive                                                  |
99
+| `CL_TYPE_RARSFX`       | Self-Extracting RAR Archive                                  |
100
+| `CL_TYPE_RIFF`         | Resource Interchange File Format container formatted file    |
101
+| `CL_TYPE_RTF`          | Rich Text Format document                                    |
102
+| `CL_TYPE_SCRENC`       | Files encoded by Microsoft Script Encoder (ScrEnc)           |
103
+| `CL_TYPE_SCRIPT`       | Generic type for scripts (JavaScript, Python, etc.)          |
104
+| `CL_TYPE_SIS`          | Symbian OS Software Installation Script Archive              |
105
+| `CL_TYPE_SWF`          | Adobe Flash File (LZMA, Zlib, or uncompressed)               |
106
+| `CL_TYPE_TEXT_ASCII`   | ASCII text                                                   |
107
+| `CL_TYPE_TEXT_UTF16BE` | UTF-16BE text                                                |
108
+| `CL_TYPE_TEXT_UTF16LE` | UTF-16LE text                                                |
109
+| `CL_TYPE_TEXT_UTF8`    | UTF-8 text                                                   |
110
+| `CL_TYPE_TNEF`         | Microsoft Outlook & Exchange email attachment format         |
111
+| `CL_TYPE_UUENCODED`    | UUEncoded (Unix-to-Unix) binary file (Unix email attachment) |
112
+| `CL_TYPE_XAR`          | XAR Archive                                                  |
113
+| `CL_TYPE_XDP`          | Adobe XDP - Embedded PDF                                     |
114
+| `CL_TYPE_XML_HWP`      | Hangul Word Processor XML (HWPML) Document                   |
115
+| `CL_TYPE_XML_WORD`     | Microsoft Word 2003 XML Document                             |
116
+| `CL_TYPE_XML_XL`       | Microsoft Excel 2003 XML Document                            |
117
+| `CL_TYPE_XZ`           | XZ Archive                                                   |
118
+| `CL_TYPE_ZIP`          | Zip Archive                                                  |
119
+| `CL_TYPE_ZIPSFX`       | Self-Extracting Zip Archive                                  |
0 120
new file mode 100644
... ...
@@ -0,0 +1,30 @@
0
+# Functionality Levels (FLEVELs)
1
+
2
+The Functionality Level (or FLEVEL) is an integer that signatures may use to specify which versions of ClamAV support the signature features they use. It is up to signature writers to select the correct FLEVEL or range of FLEVELs when writing a signature, so that it does not cause failures in older versions of ClamAV.
3
+
4
+Setting appropriate FLEVELs in signatures is particularly crucial when using features added in the last 3-4 major release versions.
5
+
6
+## ClamAV Version to FLEVEL chart
7
+
8
+| flevel | version | release | new signature features                                                 |
9
+|--------|---------|---------|------------------------------------------------------------------------|
10
+| 41     | 0.95.0  | 3/2009  | Ignores use ign format (including line number).                        |
11
+| 51     | 0.96.0  | 3/2010  | Bytecode & CDB sigs. Start using ign2.                                 |
12
+| 56     | 0.96.4  | 10/2010 | Min level for bytecode sigs.                                           |
13
+| 60     | 0.97.0  | 2/2011  |                                                                        |
14
+| 74     | 0.98.0  | 9/2013  | ISO9660 scanning support. All-match feature.                           |
15
+|        |         |         | Wildcard bracket notation `{}` for body-based signatures.              |
16
+|        |         |         | "SE" offset modifier.                                                  |
17
+|        |         |         | Target types 10 - 13: (PDF, (SWF) Flash, Java, Internal).              |
18
+| 76     | 0.98.1  | 1/2014  | XZ support and ForceToDisk scan option.                                |
19
+|        |         |         | Libxml2, XAR, DMG, HFS+/HFSX.                                          |
20
+|        |         |         | FTM type 4 (in-buffer partition magic, analogous to type 0 for files). |
21
+| 79     | 0.98.5  | 11/2014 | File properties (preclass). Target type 13: for preclass feature.      |
22
+| 81     | 0.99.0  | 11/2015 | Yara and PCRE support. Target type 14: non-listed types ("other").     |
23
+| 82     | 0.99.1  | 2/2016  | Hangul Word Processor (HWP) type file parser.                          |
24
+| 90     | 0.100   | 4/2018  | "Intermediates" logical sig expression option.                         |
25
+|        |         |         | MHTML and PostScript types.                                            |
26
+|        |         |         | Substring wildcard (*) fix: order matters, substrings can't overlap.   |
27
+| 100    | 0.101   | 12/2018 | "Byte-Compare" Logical subsignature. Windows Shortcut (LNK) type.      |
28
+
29
+For more information on ClamAV file type support, see the [File Types Reference](FileTypes.md).
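
As a rough illustration, the chart can be turned into a lookup table for sanity-checking a signature's `Engine:X-Y` clause. This is a minimal Python sketch, not part of ClamAV: the names `FLEVEL_BY_VERSION`, `parse_engine` and `engine_range_is_safe` are hypothetical, and "safe" here only means that the lower bound is at least the FLEVEL of the release that introduced the feature, so older engines will skip the signature.

```python
# Hypothetical helper; FLEVELs transcribed from the chart above.
FLEVEL_BY_VERSION = {
    "0.95.0": 41, "0.96.0": 51, "0.96.4": 56, "0.97.0": 60,
    "0.98.0": 74, "0.98.1": 76, "0.98.5": 79, "0.99.0": 81,
    "0.99.1": 82, "0.100": 90, "0.101": 100,
}


def parse_engine(clause: str) -> tuple:
    """Split an 'Engine:X-Y' clause into (min_flevel, max_flevel)."""
    key, colon, value = clause.partition(":")
    low, dash, high = value.partition("-")
    if key != "Engine" or not colon or not dash:
        raise ValueError(f"expected Engine:X-Y, got {clause!r}")
    return int(low), int(high)


def engine_range_is_safe(clause: str, feature_version: str) -> bool:
    """True if the range excludes every engine older than the feature's release."""
    low, high = parse_engine(clause)
    return FLEVEL_BY_VERSION[feature_version] <= low <= high
```

For example, a signature using PCRE (introduced in 0.99.0) passes with `Engine:81-255`, while a Byte Compare signature (0.101) declared as `Engine:51-255` does not.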
0 30
new file mode 100644
... ...
@@ -0,0 +1,67 @@
0
+# File hash signatures
1
+
2
+The easiest way to create signatures for ClamAV is to use file hash checksums; however, this method can only be used against static malware.
3
+
4
+## MD5 hash-based signatures
5
+
6
+To create an MD5 signature for `test.exe`, use the `--md5` option of
7
+sigtool:
8
+
9
+```bash
10
+zolw@localhost:/tmp/test$ sigtool --md5 test.exe > test.hdb
11
+zolw@localhost:/tmp/test$ cat test.hdb
12
+48c4533230e1ae1c118c741c0db19dfb:17387:test.exe
13
+```
14
+
15
+That’s it! The signature is ready for use:
16
+
17
+```bash
18
+zolw@localhost:/tmp/test$ clamscan -d test.hdb test.exe
19
+test.exe: test.exe FOUND
20
+
21
+----------- SCAN SUMMARY -----------
22
+Known viruses: 1
23
+Scanned directories: 0
24
+Engine version: 0.92.1
25
+Scanned files: 1
26
+Infected files: 1
27
+Data scanned: 0.02 MB
28
+Time: 0.024 sec (0 m 0 s)
29
+```
30
+
31
+You can change the name (by default, sigtool uses the name of the file) and place the signature inside a `*.hdb` file. A single database file can include any number of signatures. To have them loaded automatically each time `clamscan`/`clamd` starts, copy the database file(s) into the local virus database directory (e.g. `/usr/local/share/clamav`).
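
The `.hdb` line printed by `sigtool --md5` has the form `MD5:FileSize:Name`. A minimal Python sketch that produces the same kind of line with the standard `hashlib` module is shown below; the function name `hdb_line` is hypothetical, and `sigtool` remains the supported tool.

```python
import hashlib
from pathlib import Path
from typing import Optional


def hdb_line(path: str, name: Optional[str] = None) -> str:
    """Return 'md5:size:name', defaulting the name to the file name as sigtool does."""
    data = Path(path).read_bytes()
    digest = hashlib.md5(data).hexdigest()
    return f"{digest}:{len(data)}:{name or Path(path).name}"
```

Appending `hdb_line("test.exe", "My.Custom.Name")` (the name is only an example) to a `*.hdb` file gives a renamed signature.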
32
+
33
+*Hash-based signatures should not be used for text files, HTML, or any other data that is internally preprocessed before pattern matching. If you really want to use a hash signature in such a case, run `clamscan` with the `--debug` and `--leave-temps` flags and create a signature for the preprocessed file left in `/tmp`. Keep in mind that a hash signature stops matching as soon as a single byte changes in the target file.*
34
+
35
+## SHA1 and SHA256 hash-based signatures
36
+
37
+ClamAV 0.98 also added support for SHA1 and SHA256 file checksums. The format is the same as for MD5 file checksums; ClamAV differentiates between them based on the length of the hash string in the signature. For best backwards compatibility, these should be placed inside a `*.hsb` file. The format is:
38
+
39
+```
40
+HashString:FileSize:MalwareName
41
+```
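
Because the three hash types share one format and differ only in the length of `HashString`, telling them apart amounts to a length check. The Python sketch below (hypothetical names `hash_kind` and `hsb_line`) shows that check and builds a SHA1 or SHA256 `.hsb` line with `hashlib`; restricting `hsb_line` to those two algorithms is a choice made for this sketch.

```python
import hashlib

# Hex digest length -> algorithm, mirroring how the hash type is inferred.
HASH_KIND_BY_LENGTH = {32: "md5", 40: "sha1", 64: "sha256"}


def hash_kind(hash_string: str) -> str:
    """Infer the hash algorithm from the length of a hex hash string."""
    if any(c not in "0123456789abcdefABCDEF" for c in hash_string):
        raise ValueError("hash must be hexadecimal")
    try:
        return HASH_KIND_BY_LENGTH[len(hash_string)]
    except KeyError:
        raise ValueError(f"unexpected hash length {len(hash_string)}") from None


def hsb_line(data: bytes, malware_name: str, algorithm: str = "sha256") -> str:
    """Build a 'HashString:FileSize:MalwareName' line (this sketch emits SHA1/SHA256 only)."""
    if algorithm not in ("sha1", "sha256"):
        raise ValueError("this sketch only emits sha1 or sha256 lines")
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"{digest}:{len(data)}:{malware_name}"
```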
42
+
43
+## Hash signatures with unknown size
44
+
45
+ClamAV 0.98 also added support for hash signatures where the hash is known but the size is not. Signatures with specific sizes are much more efficient, so be cautious when using this feature. For these cases, the `*` character can be used in the size field. To ensure proper backwards compatibility with older versions of ClamAV, these signatures must have a minimum functionality level of 73 or higher. Signatures that use the wildcard size without this level set will be rejected as malformed.
46
+
47
+Sample .hsb signature matching any size:
48
+```
49
+HashString:*:MalwareName:73
50
+```
51
+Sample .msb signature matching any size:
52
+```
53
+*:PESectionHash:MalwareName:73
54
+```
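
The FLEVEL rule for wildcard sizes can be checked before a database is shipped. The following Python sketch validates both sample layouts above. `check_wildcard_sig` is a hypothetical name, and it only enforces the rule stated here (a `*` size needs a trailing FLEVEL of at least 73) plus basic field shapes, not everything ClamAV validates when loading a database.

```python
def check_wildcard_sig(line: str, section_sig: bool = False) -> dict:
    """Validate a hash signature, enforcing FLEVEL >= 73 for a '*' size.

    .hdb/.hsb lines: HashString:Size:Name[:FLEVEL]
    .mdb/.msb lines: Size:PESectionHash:Name[:FLEVEL]
    """
    fields = line.strip().split(":")
    if len(fields) not in (3, 4):
        raise ValueError("expected 3 or 4 colon-separated fields")
    if section_sig:
        size, hash_string = fields[0], fields[1]
    else:
        hash_string, size = fields[0], fields[1]
    flevel = int(fields[3]) if len(fields) == 4 else None
    if size == "*":
        if flevel is None or flevel < 73:
            raise ValueError("wildcard size requires a minimum FLEVEL of 73")
    elif not size.isdigit():
        raise ValueError(f"bad size field {size!r}")
    if len(hash_string) not in (32, 40, 64):
        raise ValueError("hash must be MD5, SHA1 or SHA256 hex")
    return {"hash": hash_string, "size": size, "name": fields[2], "flevel": flevel}
```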
55
+
56
+## PE section based hash signatures
57
+
58
+You can create a hash signature for a specific section in a PE file. Such signatures should be stored inside `.mdb` (MD5) or `.msb` (SHA1/SHA256) files in the following format:
59
+
60
+```
61
+PESectionSize:PESectionHash:MalwareName
62
+```
63
+
64
+The easiest way to generate MD5-based section signatures is to extract the target PE sections into separate files and then run sigtool with the `--mdb` option.
65
+
66
+ClamAV 0.98 also added support for SHA1 and SHA256 section-based signatures. The format is the same as for MD5 PE section-based signatures; ClamAV differentiates between them based on the length of the hash string in the signature. For best backwards compatibility, these should be placed inside a `*.msb` file.
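
Once a section's raw bytes are available, the signature line is just its size, hash and name. A minimal Python sketch follows, assuming the section bytes have already been extracted by other means (PE parsing is out of scope here). `section_sig_line` is a hypothetical name, and the `.mdb`/`.msb` choice follows the text above.

```python
import hashlib


def section_sig_line(section_bytes: bytes, malware_name: str, algorithm: str = "md5") -> tuple:
    """Return (database_extension, 'PESectionSize:PESectionHash:MalwareName')."""
    if algorithm not in ("md5", "sha1", "sha256"):
        raise ValueError(f"unsupported algorithm {algorithm!r}")
    digest = hashlib.new(algorithm, section_bytes).hexdigest()
    extension = "mdb" if algorithm == "md5" else "msb"
    return extension, f"{len(section_bytes)}:{digest}:{malware_name}"
```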
0 67
new file mode 100644
... ...
@@ -0,0 +1,351 @@
0
+# Logical signatures
1
+
2
+Logical signatures allow combining multiple signatures in extended format using logical operators, which provides more detailed and flexible pattern matching. Logical signatures are stored inside `*.ldb` files in the following format:
3
+
4
+```
5
+SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;Subsig1;Subsig2;...
7
+```
8
+
9
+where:
10
+
11
+- `TargetDescriptionBlock` provides information about the engine and target file with comma-separated `Arg:Val` pairs. For args where `Val` is a range, the minimum and maximum values should be expressed as `min-max`.
12
+
13
+- `LogicalExpression` specifies the logical expression describing the relationship between `Subsig0...SubsigN`. **Basis clause:** the decimal indexes 0,1,...,N are SUB-EXPRESSIONS representing `Subsig0, Subsig1,...,SubsigN` respectively. **Inductive clause:** if `A` and `B` are SUB-EXPRESSIONS and `X, Y` are decimal numbers, then `(A&B)`, `(A|B)`, `A=X`, `A=X,Y`, `A>X`, `A>X,Y`, `A<X` and `A<X,Y` are SUB-EXPRESSIONS.
14
+
15
+- `SubsigN` is the n-th subsignature in extended format, possibly preceded by an offset. Up to 64 subsignatures can be specified.
16
+
17
+Keywords used in `TargetDescriptionBlock`:
18
+
19
+- `Target:X`: A number specifying the type of the target file: [Target Types](FileTypes.md#Target-Types).
20
+
21
+- `Engine:X-Y`: Required engine functionality level (range; 0.96). Note that if the `Engine` keyword is used, it must be the first one in the `TargetDescriptionBlock` for backwards compatibility. See the [FLEVEL reference](FunctionalityLevels.md) for details.
22
+
23
+- `FileSize:X-Y`: Required file size (range in bytes; 0.96)
24
+
25
+- `EntryPoint`: Entry point offset (range in bytes; 0.96)
26
+
27
+- `NumberOfSections`: Required number of sections in executable (range; 0.96)
28
+
29
+- `Container:CL_TYPE_*`: File type of the container which stores the scanned file.
30
+
31
+  Specifying `CL_TYPE_ANY` matches on root objects only (i.e. the target file is explicitly _not_ in a container). You are unlikely to want `CL_TYPE_ANY` in a signature, because placing the malicious file in an archive would then prevent it from alerting.
32
+
33
+  Every ClamAV file type has the potential to be a container for additional files, although some are more likely to be than others. When a file is parsed and data in the file is identified to be scanned as a unique type, that parent file becomes a container the moment the embedded content is scanned. For a list of possible CL_TYPEs, refer to the [File Types Reference](FileTypes.md).
34
+
35
+- `Intermediates:CL_TYPE_*>CL_TYPE_*`: Specify one or more layers of file types containing the scanned file. _This is an alternative to using `Container`._
36
+
37
+  You may specify up to 16 layers of file types separated by `>` in top-down order. Note that the `>` separator is not needed if you only specify a single container. The last type should be the immediate container of the malicious file. Unlike with the `Container` option, `CL_TYPE_ANY` can be used as a wildcard file type. (expr; 0.100.0)
38
+
39
+  For a list of possible CL_TYPEs, refer to the [File Types Reference](FileTypes.md).
40
+
41
+- `IconGroup1`: Icon group name 1 from a `.idb` signature (0.96)
42
+
43
+- `IconGroup2`: Icon group name 2 from a `.idb` signature (0.96)
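
Putting the pieces together, an `.ldb` line can be split on `;` (PCRE subsignatures must write `;` as `\x3B`, so a literal `;` only separates fields), and the target block on `,` and `:`. This Python sketch is hypothetical (`parse_ldb`, `RANGE_KEYS`, `parse_range`); treating a value without `-` as a single-value range is an assumption, and it only enforces the limits described above.

```python
RANGE_KEYS = {"Engine", "FileSize", "EntryPoint", "NumberOfSections"}


def parse_range(value: str) -> tuple:
    """'51-255' -> (51, 255); a bare number is treated as (n, n) in this sketch."""
    low, dash, high = value.partition("-")
    return (int(low), int(high)) if dash else (int(low), int(low))


def parse_ldb(line: str) -> dict:
    """Split SignatureName;TargetDescriptionBlock;LogicalExpression;Subsig0;..."""
    fields = line.strip().split(";")
    if len(fields) < 4:
        raise ValueError("need a name, target block, expression and >= 1 subsig")
    name, target_block, expression, *subsigs = fields
    if len(subsigs) > 64:
        raise ValueError("at most 64 subsignatures are allowed")
    target = {}
    for position, pair in enumerate(target_block.split(",")):
        key, colon, value = pair.partition(":")
        if not colon:
            raise ValueError(f"expected Arg:Val, got {pair!r}")
        if key == "Engine" and position != 0:
            raise ValueError("Engine must be the first keyword")
        target[key] = parse_range(value) if key in RANGE_KEYS else value
    return {"name": name, "target": target, "expression": expression, "subsigs": subsigs}
```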
44
+
45
+Modifiers for subexpressions:
46
+
47
+- `A=X`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched exactly X times; if it refers to a (logical) block of signatures then this block must generate exactly X matches (with any of its sigs).
48
+
49
+- `A=0` specifies negation (signature or block of signatures cannot be matched)
50
+
51
+- `A=X,Y`: If the SUB-EXPRESSION A refers to a single signature then this signature must be matched exactly X times; if it refers to a (logical) block of signatures then this block must generate X matches and at least Y different signatures must get matched.
52
+
53
+- `A>X`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches (with any of its sigs).
54
+
55
+- `A>X,Y`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches _and_ at least Y different signatures must be matched.
56
+
57
+- `A<X`: Just like `A>X` above with the change of "more" to "less".
58
+
59
+  If the SUB-EXPRESSION A refers to a single signature then this signature must get matched less than X times; if it refers to a (logical) block of signatures then this block must generate less than X matches (with any of its sigs).
60
+
61
+- `A<X,Y`: Similar to `A>X,Y`. If the SUB-EXPRESSION A refers to a single signature then this signature must get matched less than X times; if it refers to a (logical) block of signatures then this block must generate less than X matches _and_ at least Y different signatures must be matched.
62
+
63
+Examples:
64
+
65
+```
66
+Sig1;Target:0;(0&1&2&3)&(4|1);6b6f74656b;616c61;7a6f6c77;73746566616e;deadbeef
68
+
69
+Sig2;Target:0;((0|1|2)>5,2)&(3|1);6b6f74656b;616c61;7a6f6c77;73746566616e
71
+
72
+Sig3;Target:0;((0|1|2|3)=2)&(4|1);6b6f74656b;616c61;7a6f6c77;73746566616e;deadbeef
74
+
75
+Sig4;Engine:51-255,Target:1;((0|1)&(2|3))&4;EP+123:33c06834f04100f2aef7d14951684cf04100e8110a00;S2+78:22??232c2d252229{-15}6e6573(63|64)61706528;S3+50:68efa311c3b9963cb1ee8e586d32aeb9043e;f9c58dcf43987e4f519d629b103375;SL+550:6300680065005c0046006900
79
+```
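
To make the modifiers concrete, here is a small Python sketch that evaluates a `LogicalExpression` against per-subsignature match counts. It is an approximation written for illustration (`evaluate` and `Node` are hypothetical names, not a ClamAV API): a block's match count is taken as the sum of the counts of the distinct subsignatures it mentions, and mixing `&` and `|` at one level without parentheses is rejected rather than guessing ClamAV's precedence.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    ok: bool          # is this sub-expression satisfied?
    ids: frozenset    # indexes of the subsignatures it refers to


def evaluate(expression: str, counts: list) -> bool:
    """Evaluate a LogicalExpression given the match count of each subsignature."""
    tokens = re.findall(r"\d+|[()&|=<>,]", expression)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        token = peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        pos += 1
        return token

    def atom() -> Node:
        if peek() == "(":
            take("(")
            node = block()
            take(")")
            return node
        index = int(take())
        return Node(counts[index] > 0, frozenset([index]))

    def modified() -> Node:
        node = atom()
        if peek() in ("=", ">", "<"):
            op = take()
            x = int(take())
            y = None
            if peek() == ",":
                take(",")
                y = int(take())
            total = sum(counts[i] for i in node.ids)          # matches produced
            distinct = sum(1 for i in node.ids if counts[i])  # different sigs matched
            ok = {"=": total == x, ">": total > x, "<": total < x}[op]
            if y is not None:
                ok = ok and distinct >= y
            node = Node(ok, node.ids)
        return node

    def block() -> Node:
        node = modified()
        op = None
        while peek() in ("&", "|"):
            if op is not None and peek() != op:
                raise ValueError("mixing & and | needs parentheses in this sketch")
            op = take()
            rhs = modified()
            ok = (node.ok and rhs.ok) if op == "&" else (node.ok or rhs.ok)
            node = Node(ok, node.ids | rhs.ids)
        return node

    result = block()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result.ok
```

For `Sig2`, counts of `[3, 3, 0, 1]` satisfy `((0|1|2)>5,2)` (6 matches from 2 different subsignatures), whereas `[6, 0, 0, 1]` does not, because only one subsignature matched.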
80
+
81
+## Subsignature Modifiers
82
+
83
+ClamAV (starting with version 0.99) supports a number of additional subsignature
84
+modifiers for logical signatures. This is done by specifying `::`
85
+followed by a number of characters representing the desired options.
86
+Signatures using subsignature modifiers require `Engine:81-255` for
87
+backwards-compatibility.
88
+
89
+- Case-Insensitive \[`i`\]
90
+
91
+  Specifying the `i` modifier causes ClamAV to match all alphabetic hex bytes as case-insensitive. All patterns in ClamAV are case-sensitive by default.
92
+
93
+- Wide \[`w`\]
94
+
95
+  Specifying the `w` modifier causes ClamAV to match all hex bytes encoded with two bytes per character. Note that this simply interleaves each character with NUL characters and does not truly support UTF-16 characters. Wildcards for 'wide' subsignatures are not treated as wide (i.e. there can be an odd number of intervening characters). This can be combined with `a` to search for patterns in both wide and ASCII.
96
+
97
+- Fullword \[`f`\]
98
+
99
+  Match subsignature as a fullword (delimited by non-alphanumeric characters).
100
+
101
+- Ascii \[`a`\]
102
+
103
+  Match subsignature as ASCII characters. This can be combined with `w` to search for patterns in both ASCII and wide.
104
+
105
+Examples:
106
+
107
+```
108
+clamav-nocase-A;Engine:81-255,Target:0;0&1;41414141::i;424242424242::i
109
+    -matches 'AAAA'(nocase) and 'BBBBBB'(nocase)
110
+
111
+clamav-fullword-A;Engine:81-255,Target:0;0&1;414141;68656c6c6f::f
112
+    -matches 'AAA' and 'hello'(fullword)
113
+clamav-fullword-B;Engine:81-255,Target:0;0&1;414141;68656c6c6f::fi
114
+    -matches 'AAA' and 'hello'(fullword nocase)
115
+
116
+clamav-wide-B2;Engine:81-255,Target:0;0&1;414141;68656c6c6f::wa
117
+    -matches 'AAA' and 'hello'(wide ascii)
118
+clamav-wide-C0;Engine:81-255,Target:0;0&1;414141;68656c6c6f::iwfa
119
+    -matches 'AAA' and 'hello'(nocase wide fullword ascii)
120
+```
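
One rough way to see what each modifier does to a pattern is to translate it into a Python bytes regex. This sketch (`build_subsig_regex` is a hypothetical name) only handles plain hex strings without wildcards. Its fullword check looks at the neighbouring byte, which is an approximation that becomes loose in wide mode, where the neighbouring byte of a wide match is often a NUL.

```python
import re


def build_subsig_regex(hex_pattern: str, modifiers: str = "") -> "re.Pattern[bytes]":
    """Approximate the i/w/f/a subsignature modifiers with a bytes regex."""
    literal = bytes.fromhex(hex_pattern)
    alternatives = []
    # ASCII form is used when 'a' is given or when 'w' is absent; 'wa' searches both.
    if "a" in modifiers or "w" not in modifiers:
        alternatives.append(re.escape(literal))
    if "w" in modifiers:
        # Wide: interleave each byte with a NUL byte (not true UTF-16 handling).
        wide = b"".join(bytes([byte, 0]) for byte in literal)
        alternatives.append(re.escape(wide))
    body = b"(?:" + b"|".join(alternatives) + b")"
    if "f" in modifiers:
        # Fullword: no alphanumeric byte directly before or after the match.
        body = rb"(?<![A-Za-z0-9])" + body + rb"(?![A-Za-z0-9])"
    flags = re.IGNORECASE if "i" in modifiers else 0
    return re.compile(body, flags)
```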
121
+
122
+## Special Subsignature Types
123
+
124
+### Macro subsignatures
125
+
126
+Introduced in ClamAV 0.96
127
+
128
+Format: `${min-max}MACROID$`
129
+
130
+Macro subsignatures are used to combine a number of existing extended
131
+signatures (`.ndb`) into an on-the-fly generated alternate-string logical
132
+signature (`.ldb`). Signatures using macro subsignatures require
133
+`Engine:51-255` for backwards-compatibility.
134
+
135
+Example:
136
+
137
+```
138
+      test.ldb:
139
+        TestMacro;Engine:51-255,Target:0;0&1;616161;${6-7}12$
140
+
141
+      test.ndb:
142
+        D1:0:$12:626262
143
+        D2:0:$12:636363
144
+        D3:0:$30:626264
145
+```
146
+
147
+The example logical signature `TestMacro` is functionally equivalent
148
+to:
149
+
150
+```
151
+TestMacro;Engine:51-255,Target:0;0;616161{3-4}(626262|636363)
152
+```
153
+
154
+- `MACROID` points to a group of signatures; there can be at most 32 macro groups.
155
+
156
+  - In the example, `MACROID` is `12` and both `D1` and `D2` are members of macro group `12`. `D3` is a member of separate macro group `30`.
157
+
158
+- `{min-max}` specifies the offset range at which one of the group signatures should match; the offset range is relative to the starting offset of the preceding subsignature. This means a macro subsignature cannot be the first subsignature.
159
+
160
+  - In the example, `{min-max}` is `{6-7}` and it is relative to the start of a `616161` match.
161
+
162
+- For more information and examples please see <https://bugzilla.clamav.net/show_bug.cgi?id=164>.
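
The equivalence between `TestMacro` and its expanded form comes down to offset arithmetic: a group member must start 6-7 bytes after the start of the 3-byte `616161` match, which leaves a gap of 3-4 bytes after that match ends. This hypothetical Python helper (`expand_macro`) reproduces that rewrite for plain hex subsignatures; it illustrates the arithmetic and is not how ClamAV implements macros internally.

```python
import re


def expand_macro(previous_hex: str, macro: str, ndb_lines: list) -> str:
    """Rewrite previous_hex followed by '${min-max}ID$' as an equivalent body signature."""
    match = re.fullmatch(r"\$\{(\d+)-(\d+)\}(\d+)\$", macro)
    if match is None:
        raise ValueError(f"not a macro subsignature: {macro!r}")
    low, high, macro_id = int(match[1]), int(match[2]), match[3]
    # Collect the .ndb bodies whose offset field is '$<macro_id>'.
    members = []
    for line in ndb_lines:
        _name, _target, offset, body = line.split(":", 3)
        if offset == f"${macro_id}":
            members.append(body)
    if not members:
        raise ValueError(f"macro group {macro_id} is empty")
    # Macro offsets count from the START of the previous match; {n-m} counts from its END.
    prev_len = len(previous_hex) // 2
    gap_low, gap_high = low - prev_len, high - prev_len
    if gap_low < 0:
        raise ValueError("macro range starts inside the previous subsignature")
    return f"{previous_hex}{{{gap_low}-{gap_high}}}({'|'.join(members)})"
```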
163
+
164
+### Byte Compare Subsignatures
165
+
166
+Introduced in ClamAV 0.101
167
+
168
+Format: `subsigid_trigger(offset#byte_options#comparisons)`
169
+
170
+Byte compare subsignatures can be used to evaluate a numeric value at a given offset from the start of another (matched) subsignature within the same logical signature. They are executed after all other subsignatures within the logical signature have fired, with the exception of PCRE subsignatures. They can evaluate offsets only from a single referenced subsignature, and that subsignature must give a valid match for the evaluation to occur.
171
+
172
+- `subsigid_trigger` is a required field and may refer to any single non-PCRE, non-Byte Compare subsignature within the lsig. The byte compare subsig will evaluate if `subsigid_trigger` matches. Triggering on multiple subsigs or logic based triggering is not currently supported.
173
+
174
+- `offset` is a required field that consists of an `offset_modifier` and a numeric `offset` (hex or decimal offsets are okay).
175
+
176
+  - `offset_modifier` can be either `>>` or `<<` where the former denotes a positive offset and the latter denotes a negative offset. The offset is calculated from the start of `subsigid_trigger`, which allows for byte extraction before the specified match, after the match, and within the match itself.
177
+
178
+  - `offset` must be a positive hex or decimal value. This will be the number of bytes from the start of the referenced `subsigid_trigger` match within the file buffer to begin the comparison.
179
+
180
+- `byte_options` are used to specify the numeric type and endianness of the extracted byte sequence, in that order, as well as the number of bytes to be read. By default, ClamAV will attempt to match up to the number of bytes specified, unless the `e` (exact) option is specified or the numeric type is `i` (binary). This field follows the form `[h|d|a|i][l|b][e]num_bytes`
181
+
182
+  - `h|d|a|i` where `h` specifies the byte sequence will be in hex, `d` decimal, `a` automatic detection of hex or decimal at runtime, and `i` signifies raw binary data.
183
+
184
+  - `l|b` where `l` specifies the byte sequence will be in little endian order and `b` big endian. If decimal `d` is specified, big-endian is implied and using `l` will result in a malformed database error.
185
+
186
+  - `e` specifies that ClamAV will only evaluate the comparison if it can extract the exact number of bytes specified. This option is implicitly declared when using the `i` flag.
187
+
188
+  - `num_bytes` specifies the number of bytes to extract. This can be a hex or decimal value. If `i` is specified only 1, 2, 4, and 8 are valid options.
189
+
190
+- `comparisons` is a required field which denotes how to evaluate the extracted byte sequence. Each Byte Compare signature can have one or two `comparison_sets` separated by a comma. Each `comparison_set` consists of a `Comparison_symbol` and a `Comparison_value` and takes the form `Comparison_symbolComparison_value`. Thus, `comparisons` takes the form `comparison_set[,comparison_set]`.
191
+
192
+  - `Comparison_symbol` denotes the type of comparison to be done. The supported comparison symbols are `<`, `>`, `=`.
193
+
194
+  - `Comparison_value` is a required field which must be a numeric hex or decimal value. If all other conditions are met, the byte compare subsig will evaluate the extracted byte sequence against this number based on the provided `comparison_symbol`.
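
The extraction and comparison steps can be sketched in Python. `byte_compare` is a hypothetical name; it takes the parenthesised part of the subsignature plus the start offset of the trigger's match. It supports only the `i` (binary) form and big-endian `h`/`d` forms, and leaves `a` and little-endian hex unimplemented. It also assumes that when two comparison sets are given both must hold, and that comparison values are written as decimal or `0x`-prefixed hex.

```python
import re

# (>>|<<)offset # [h|d|a|i][l|b][e]num_bytes # comparison_set[,comparison_set]
SPEC = re.compile(
    r"\((>>|<<)(\w+)#([hdai])([lb]?)(e?)(\w+)#([<>=]\w+(?:,[<>=]\w+)?)\)"
)


def byte_compare(data: bytes, match_start: int, spec: str) -> bool:
    """Evaluate '(offset#byte_options#comparisons)' relative to a trigger match."""
    m = SPEC.fullmatch(spec)
    if m is None:
        raise ValueError(f"malformed byte compare spec {spec!r}")
    direction, offset, kind, endian, exact, num_bytes, comparisons = m.groups()
    shift = int(offset, 0)
    start = match_start + shift if direction == ">>" else match_start - shift
    n = int(num_bytes, 0)
    if start < 0:
        return False
    chunk = data[start:start + n]

    if kind == "i":
        # Raw binary: exact length is implied; only 1, 2, 4 or 8 bytes are valid.
        if n not in (1, 2, 4, 8) or not endian:
            raise ValueError("binary extraction needs l/b and 1, 2, 4 or 8 bytes")
        if len(chunk) != n:
            return False
        value = int.from_bytes(chunk, "little" if endian == "l" else "big")
    elif kind in ("h", "d"):
        if endian == "l":
            # Malformed for decimal per the text above; little-endian hex is not handled here.
            raise ValueError("little-endian text extraction is not handled here")
        digits = b"0123456789abcdefABCDEF" if kind == "h" else b"0123456789"
        k = 0
        while k < len(chunk) and chunk[k] in digits:
            k += 1  # read up to n digit characters
        if k == 0 or (exact and k != n):
            return False
        value = int(chunk[:k], 16 if kind == "h" else 10)
    else:
        raise NotImplementedError("'a' (auto-detect) is not handled in this sketch")

    # Assumption: with two comparison sets, both must hold.
    for comparison in comparisons.split(","):
        symbol, target = comparison[0], int(comparison[1:], 0)
        if not {"<": value < target, ">": value > target, "=": value == target}[symbol]:
            return False
    return True
```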
195
+
196
+### PCRE subsignatures
197
+
198
+Introduced in ClamAV 0.99
199
+
200
+Format: `Trigger/PCRE/[Flags]`
201
+
202
+PCRE subsignatures are used within a logical signature (`.ldb`) to specify regex matches that execute once triggered by a conditional based on preceding subsignatures. Signatures using PCRE subsignatures require `Engine:81-255` for backwards-compatibility.
203
+
204
+- `Trigger` is a required field that is a valid `LogicalExpression` and may refer to any subsignatures that precede this subsignature. Triggers cannot be self-referential and cannot refer to subsequent subsignatures.
205
+
206
+- `PCRE` is the expression representing the regex to execute. `PCRE` must be delimited by `/`, and any `/` within the expression must be escaped. For backward compatibility, `;` within the expression must be expressed as `\x3B`. `PCRE` cannot be empty, and the `(?UTF*)` control sequence is not allowed. If debug is specified, named capture groups are displayed in a post-execution report.
207
+
208
+- `Flags` are a series of characters which affect the compilation and execution of `PCRE` within the PCRE compiler and the ClamAV engine. This field is optional.
209
+
210
+  - `g [CLAMAV_GLOBAL]` specifies to search for ALL matches of PCRE (default is to search for first match). NOTE: INCREASES the time needed to run the PCRE.
211
+
212
+  - `r [CLAMAV_ROLLING]` specifies to use the given offset as the starting location to search for a match as opposed to the only location; applies to subsigs without maxshifts. By default, in order to facilitate normal ClamAV offset behavior, PCREs are auto-anchored (only attempt match on first offset); using the rolling option disables the auto-anchoring.
213
+
214
+  - `e [CLAMAV_ENCOMPASS]` specifies to CONFINE matching between the specified offset and maxshift; applies only when maxshift is specified. Note: DECREASES time needed to run the PCRE.
215
+
216
+  - `i [PCRE_CASELESS]`
217
+
218
+  - `s [PCRE_DOTALL]`
219
+
220
+  - `m [PCRE_MULTILINE]`
221
+
222
+  - `x [PCRE_EXTENDED]`
223
+
224
+  - `A [PCRE_ANCHORED]`
225
+
226
+  - `E [PCRE_DOLLAR_ENDONLY]`
227
+
228
+  - `U [PCRE_UNGREEDY]`
229
+
230
+Examples:
231
+
232
+```
233
+Find.All.ClamAV;Engine:81-255,Target:0;1;6265676c6164697427736e6f7462797465636f6465;0/clamav/g
234
+
235
+Find.ClamAV.OnlyAt.299;Engine:81-255,Target:0;2;7374756c747a67657473;7063726572656765786c6f6c;299:0&1/clamav/
236
+
237
+Find.ClamAV.StartAt.300;Engine:81-255,Target:0;3;616c61696e;62756731393238;636c6f736564;300:0&1&2/clamav/r
238
+
239
+Find.All.Encompassed.ClamAV;Engine:81-255,Target:0;3;7768796172656e2774;796f757573696e67;79617261;200,300:0&1&2/clamav/ge
240
+
241
+Named.CapGroup.Pcre;Engine:81-255,Target:0;3;636f75727479617264;616c62756d;74657272696572;50:0&1&2/variable=(?<nilshell>.{16})end/gr
242
+
243
+Firefox.TreeRange.UseAfterFree;Engine:81-255,Target:0;0&1&2;2e766965772e73656c656374696f6e;2e696e76616c696461746553656c656374696f6e;0&1/\x2Eview\x2Eselection.*?\x2Etree\s*\x3D\s*null.*?\x2Einvalidate/smi
244
+
245
+Firefox.IDB.UseAfterFree;Engine:81-255,Target:0;0&1;4944424b657952616e6765;0/^\x2e(only|lowerBound|upperBound|bound)\x28.*?\x29.*?\x2e(lower|upper|lowerOpen|upperOpen)/smi
246
+
247
+Firefox.boundElements;Engine:81-255,Target:0;0&1&2;6576656e742e626f756e64456c656d656e7473;77696e646f772e636c6f7365;0&1/on(load|click)\s*=\s*\x22?window\.close\s*\x28/si
249
+```
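
Splitting a PCRE subsignature into its parts is mostly string handling. The following hypothetical Python sketch (`parse_pcre_subsig`) separates the optional offset (and maxshift), the trigger, the pattern and the flags, and enforces the `/` and `;` rules above. It also maps the flags that have a direct Python `re` counterpart. It assumes anything before the last `:` of the trigger part is the offset, and it does not compile the pattern, because PCRE and Python regex syntax are not identical.

```python
import re

# Flags with a direct Python `re` counterpart; the others are ClamAV-specific
# (g, r, e) or have no single equivalent `re` flag (A, E, U).
PYTHON_FLAGS = {"i": re.IGNORECASE, "s": re.DOTALL, "m": re.MULTILINE, "x": re.VERBOSE}
OTHER_FLAGS = set("greAEU")


def parse_pcre_subsig(subsig: str) -> dict:
    """Split '[offset[,maxshift]:]Trigger/PCRE/[Flags]' into its parts."""
    head, slash, rest = subsig.partition("/")
    if not slash:
        raise ValueError("PCRE must be delimited by '/'")
    body, slash, flags = rest.rpartition("/")  # flags never contain '/'
    if not slash or not body:
        raise ValueError("missing closing '/' or empty PCRE")
    if ";" in body:
        raise ValueError(r"';' must be written as \x3B")
    unknown = set(flags) - set(PYTHON_FLAGS) - OTHER_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {''.join(sorted(unknown))}")
    offset_part, _, trigger = head.rpartition(":")
    offset, _, maxshift = offset_part.partition(",")
    python_flags = 0
    for flag in flags:
        python_flags |= PYTHON_FLAGS.get(flag, 0)
    return {
        "offset": offset or None,
        "maxshift": maxshift or None,
        "trigger": trigger,
        "pattern": body.replace("\\/", "/"),  # unescape '\/'
        "flags": flags,
        "python_flags": python_flags,
    }
```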
250
+
251
+## Signatures for Version Information (VI) metadata in PE files
252
+
253
+Starting with ClamAV 0.96, it is possible to easily match certain information built into PE files (executables and dynamic link libraries). Whenever you look up the properties of a PE executable file in Windows, you are presented with a bunch of details about the file itself.
254
+
255
+This information is stored in a special area of the file resources which goes by the name of `VS_VERSIONINFO` (or versioninfo for short). It is divided into 2 parts. The first part (which is rather uninteresting) is really a bunch of numbers and flags indicating the product and file version. It was originally intended for use with installers which, after parsing it, should be able to determine whether a certain executable or library is to be upgraded/overwritten or is already up to date. Suffice it to say, this approach never really worked and is generally never used.
256
+
257
+The second block is much more interesting: it is a simple list of key/value strings, intended for user information and completely ignored by the OS. For example, if you look at ping.exe you can see the company being *"Microsoft Corporation"*, the description *"TCP/IP Ping command"*, the internal name *"ping.exe"* and so on. Depending on the OS version, some keys may be given special visibility in the file properties dialog; internally, however, they are all the same.
258
+
259
+To match a versioninfo key/value pair, the special file offset anchor `VI` was introduced. This is similar to the other anchors (like `EP` and `SL`) except that, instead of matching the hex pattern against a single offset, it checks it against each and every key/value pair in the file. The `VI` token neither needs nor accepts a `+/-` offset like e.g. `EP+1`. As for the hex signature itself, it is just the UTF-16 dump of the key and value. Only the `??` and `(aa|bb)` wildcards are allowed in the signature. Usually, you don't need to bother figuring it out: each key/value pair together with the corresponding VI-based signature is printed by `clamscan` when the `--debug` option is given.
260
+
261
+For example `clamscan --debug freecell.exe` produces:
262
+
263
+```bash
264
+[...]
265
+Recognized MS-EXE/DLL file
266
+in cli_peheader
267
+versioninfo_cb: type: 10, name: 1, lang: 410, rva: 9608
268
+cli_peheader: parsing version info @ rva 9608 (1/1)
269
+VersionInfo (d2de): 'CompanyName'='Microsoft Corporation' - VI:43006f006d00700061006e0079004e0061006d006500000000004d006900630072006f0073006f0066007400200043006f00720070006f0072006100740069006f006e000000
273
+VersionInfo (d32a): 'FileDescription'='Entertainment Pack FreeCell Game' - VI:460069006c0065004400650073006300720069007000740069006f006e000000000045006e007400650072007400610069006e006d0065006e00740020005000610063006b0020004600720065006500430065006c006c002000470061006d0065000000
278
+VersionInfo (d396): 'FileVersion'='5.1.2600.0 (xpclient.010817-1148)' - VI:460069006c006500560065007200730069006f006e000000000035002e0031002e0032003600300030002e003000200028007800700063006c00690065006e0074002e003000310030003800310037002d00310031003400380029000000
283
+VersionInfo (d3fa): 'InternalName'='freecell' - VI:49006e007400650072006e0061006c004e0061006d00650000006600720065006500630065006c006c000000
286
+VersionInfo (d4ba): 'OriginalFilename'='freecell' - VI:4f0072006900670069006e0061006c00460069006c0065006e0061006d00650000006600720065006500630065006c006c000000
289
+VersionInfo (d4f6): 'ProductName'='Sistema operativo Microsoft Windows' - VI:500072006f0064007500630074004e0061006d00650000000000530069007300740065006d00610020006f0070006500720061007400690076006f0020004d006900630072006f0073006f0066007400ae002000570069006e0064006f0077007300ae000000
294
+VersionInfo (d562): 'ProductVersion'='5.1.2600.0' - VI:500072006f006400750063007400560065007200730069006f006e00000035002e0031002e0032003600300030002e0030000000
297
+[...]
298
+```
299
+
300
+Although VI-based signatures are intended for use in logical signatures you can test them using ordinary `.ndb` files. For example:
301
+
302
+```
303
+my_test_vi_sig:1:VI:paste_your_hex_sig_here
304
+```
305
+
306
+Finally, if you want to decode a VI-based signature into a human-readable form, you can use:
307
+
308
+```bash
309
+echo hex_string | xxd -r -p | strings -el
310
+```
311
+
312
+For example:
313
+
314
+```bash
315
+$ echo 460069006c0065004400650073006300720069007000740069006f006e000000000045006e007400650072007400610069006e006d0065006e00740020005000610063006b0020004600720065006500430065006c006c002000470061006d0065000000 | xxd -r -p | strings -el
319
+FileDescription
320
+Entertainment Pack FreeCell Game
321
+```
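
Going the other way, a VI signature can be built from a key/value pair. Judging from the `--debug` output above, the key is UTF-16LE with a NUL terminator, followed by zero or two bytes of padding, then the NUL-terminated UTF-16LE value. The padding appears to follow the 32-bit alignment of the Windows version-information `String` structure, whose 6-byte header precedes the key. That alignment rule is an inference, so treat this Python sketch (`vi_signature` and `decode_vi` are hypothetical names) as an aid and confirm its output against `clamscan --debug`.

```python
def vi_signature(key: str, value: str) -> str:
    """Build the hex body of a VI signature for one key/value pair."""
    key_bytes = key.encode("utf-16-le") + b"\x00\x00"
    # Pad so the value starts on a 4-byte boundary, counting the 6-byte String header.
    padding = b"\x00" * (-(6 + len(key_bytes)) % 4)
    value_bytes = value.encode("utf-16-le") + b"\x00\x00"
    return (key_bytes + padding + value_bytes).hex()


def decode_vi(hex_signature: str) -> list:
    """Decode a VI signature back into its non-empty UTF-16LE strings."""
    text = bytes.fromhex(hex_signature).decode("utf-16-le")
    return [part for part in text.split("\x00") if part]
```

`decode_vi` is the Python counterpart of the `xxd -r -p | strings -el` pipeline above.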
322
+
323
+## Icon Signatures for PE files
324
+
325
+While Icon Signatures are stored in a `.idb` file, they are a feature of Logical Signatures.
326
+
327
+ClamAV 0.96 includes an approximate/fuzzy icon matcher to help detect malicious executables disguising themselves as innocent-looking image files, office documents and the like.
328
+
329
+Icon matching is only triggered by Logical Signatures (`.ldb`) using the special attribute tokens `IconGroup1` or `IconGroup2`. These identify two (optional) groups of icons defined in a `.idb` database file. The format of the `.idb` file is:
330
+
331
+```
332
+ICON_NAME:GROUP1:GROUP2:ICON_HASH
333
+```
334
+
335
+where:
336
+
337
+- `ICON_NAME` is a unique string identifier for a specific icon,
338
+
339
+- `GROUP1` is a string identifier for the first group of icons (`IconGroup1`)
340
+
341
+- `GROUP2` is a string identifier for the second group of icons (`IconGroup2`),
342
+
343
+- `ICON_HASH` is a fuzzy hash of the icon image
344
+
345
+The `ICON_HASH` field can be obtained from the debug output of libclamav. For example:
346
+
347
+```bash
348
+LibClamAV debug: ICO SIGNATURE:
349
+ICON_NAME:GROUP1:GROUP2:18e2e0304ce60a0cc3a09053a30000414100057e000afe0000e80006e510078b0a08910d11ad04105e0811510f084e01040c080a1d0b0021000a39002a41
350
+```
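
Because the `.idb` fields are colon-separated, loading the file and resolving which icons an `IconGroup1` or `IconGroup2` name refers to is straightforward. A hypothetical Python sketch (`load_idb`) that does only that is shown below; the fuzzy comparison of `ICON_HASH` values is performed by libclamav and is not reproduced here, and the icon and group names in the usage are made up.

```python
from collections import defaultdict


def load_idb(lines):
    """Parse ICON_NAME:GROUP1:GROUP2:ICON_HASH lines into lookup tables."""
    hashes = {}
    groups = {"IconGroup1": defaultdict(list), "IconGroup2": defaultdict(list)}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        icon_name, group1, group2, icon_hash = line.split(":")
        if icon_name in hashes:
            raise ValueError(f"duplicate ICON_NAME {icon_name!r}")
        hashes[icon_name] = icon_hash
        groups["IconGroup1"][group1].append(icon_name)
        groups["IconGroup2"][group2].append(icon_name)
    return hashes, groups
```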
0 351
new file mode 100644
... ...
@@ -0,0 +1,682 @@
0
+# PhishSigs
1
+
2
+Table of Contents
3
+- [PhishSigs](#phishsigs)
4
+- [Database file format](#database-file-format)
5
+    - [PDB format](#pdb-format)
6
+    - [GDB format](#gdb-format)
7
+    - [WDB format](#wdb-format)
8
+    - [Hints](#hints)
9
+    - [Examples of PDB signatures](#examples-of-pdb-signatures)
10
+    - [Examples of WDB signatures](#examples-of-wdb-signatures)
11
+    - [Example for how the URL extractor works](#example-for-how-the-url-extractor-works)
12
+    - [How matching works](#how-matching-works)
13
+        - [RealURL, displayedURL concatenation](#realurl-displayedurl-concatenation)
14
+        - [What happens when a match is found](#what-happens-when-a-match-is-found)
15
+        - [Extraction of realURL, displayedURL from HTML tags](#extraction-of-realurl-displayedurl-from-html-tags)
16
+        - [Example](#example)
17
+    - [Simple patterns](#simple-patterns)
18
+    - [Regular expressions](#regular-expressions)
19
+    - [Flags](#flags)
20
+- [Introduction to regular expressions](#introduction-to-regular-expressions)
21
+    - [Special characters](#special-characters)
22
+    - [Character classes](#character-classes)
23
+    - [Escaping](#escaping)
24
+    - [Alternation](#alternation)
25
+    - [Optional matching, and repetition](#optional-matching-and-repetition)
26
+    - [Groups](#groups)
27
+- [How to create database files](#how-to-create-database-files)
28
+    - [How to create and maintain the whitelist (daily.wdb)](#how-to-create-and-maintain-the-whitelist-dailywdb)
29
+    - [How to create and maintain the domainlist (daily.pdb)](#how-to-create-and-maintain-the-domainlist-dailypdb)
30
+    - [Dealing with false positives, and undetected phishing mails](#dealing-with-false-positives-and-undetected-phishing-mails)
31
+        - [False positives](#false-positives)
32
+        - [Undetected phish mails](#undetected-phish-mails)
33
+
34
+# Database file format
35
+
36
+## PDB format
37
+
38
+This file contains URLs/hosts that are targets of phishing attempts. It contains lines in the following format:
40
+
41
+```
42
+    R[Filter]:RealURL:DisplayedURL[:FuncLevelSpec]
43
+    H[Filter]:DisplayedHostname[:FuncLevelSpec]
44
+```
45
+
46
+- `R`
47
+
48
+  regular expression, for the concatenated URL
49
+
50
+- `H`
51
+
52
+  matches the `DisplayedHostname` as a simple pattern (literally, no regular expression)
53
+
54
+  - the pattern can match either the full hostname
55
+
56
+  - or a subdomain of the specified hostname
57
+
58
+  - to avoid false matches in case of subdomain matches, the engine checks that there is a dot (`.`) or a space (` `) before the matched portion
59
+
60
+- `Filter`
61
+
62
+  is ignored for R and H for compatibility reasons
63
+
64
+- `RealURL`
65
+
66
+  is the URL the user is sent to, for example the *href* attribute of an HTML anchor (*\<a\> tag*)
67
+
68
+- `DisplayedURL`
69
+
70
+  is the URL description displayed to the user, i.e. where it is *claimed* they are sent, for example the contents of an HTML anchor (*\<a\> tag*)
71
+
72
+- `DisplayedHostname`
73
+
74
+  is the hostname portion of the DisplayedURL
75
+
76
+- `FuncLevelSpec`
77
+
78
+  an (optional) functionality level, 2 formats are possible:
79
+
80
+  - `minlevel`: all engines having functionality level >= `minlevel` will load this line
+
+  - `minlevel-maxlevel`: engines with functionality level >= `minlevel` and < `maxlevel` will load this line
83
+
84
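The subdomain rule for `H` patterns can be modeled in a few lines of Python. This is a simplified sketch of the behavior described above, not ClamAV's implementation; the function name `host_pattern_matches` is hypothetical.

```python
def host_pattern_matches(pattern: str, hostname: str) -> bool:
    """Model of an H-style simple pattern: match the full hostname, or a
    subdomain of it, but only if the character just before the matched
    portion is a dot or a space."""
    if not hostname.endswith(pattern):
        return False
    if len(hostname) == len(pattern):
        return True  # exact hostname match
    # Boundary check: keeps "evilamazon.com" from matching "amazon.com".
    return hostname[-len(pattern) - 1] in ". "


if __name__ == "__main__":
    for host in ("amazon.com", "www.amazon.com", "evilamazon.com"):
        print(host, host_pattern_matches("amazon.com", host))
```

The boundary check is what stops a lookalike host such as `evilamazon.com` from matching `H:amazon.com`, while `www.amazon.com` still matches as a subdomain.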
+## GDB format
85
+
86
+This file contains URL hashes in the following format:
87
+
88
+    S:P:HostPrefix[:FuncLevelSpec]
89
+    S:F:Sha256hash[:FuncLevelSpec]
90
+    S1:P:HostPrefix[:FuncLevelSpec]
91
+    S1:F:Sha256hash[:FuncLevelSpec]
92
+    S2:P:HostPrefix[:FuncLevelSpec]
93
+    S2:F:Sha256hash[:FuncLevelSpec]
94
+    S:W:Sha256hash[:FuncLevelSpec]
95
+
96
+- `S:`
97
+
98
+  These are hashes for Google Safe Browsing - malware sites, and should not be used for other purposes.
99
+
100
+- `S2:`
101
+
102
+  These are hashes for Google Safe Browsing - phishing sites, and should not be used for other purposes.
103
+
104
+- `S1:`
105
+
106
+  Hashes for blacklisting phishing sites. Virus name: Phishing.URL.Blacklisted
107
+
108
+- `S:W:`
109
+
110
+  Locally whitelisted hashes.
111
+
112
+- `HostPrefix`
113
+
114
+  4-byte prefix of the sha256 hash of the last 2 or 3 components of the hostname. If the prefix doesn’t match, no further lookups are performed.
115
+
116
+- `Sha256hash`
117
+
118
+  sha256 hash of the canonicalized URL, or a sha256 hash of its prefix/suffix according to the Google Safe Browsing “Performing Lookups” rules. There should be a corresponding `:P:HostPrefix` entry for the hash to be taken into consideration.
119
+
120
+To see which hash/URL matched, look at the `clamscan --debug` output, and look for the following strings: `Looking up hash`, `prefix matched`, and `Hash matched`. Local whitelisting of .gdb entries can be done by creating a local.gdb file, and adding a line `S:W:<HASH>`.
121
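As a rough illustration of how a `HostPrefix` value relates to a hostname, the Python sketch below computes 4-byte SHA-256 prefixes for the last 3 and last 2 hostname components. It assumes, following the Google Safe Browsing host-key convention, that a trailing `/` is appended before hashing, and that prefixes are written as 8 hexadecimal characters. Both are assumptions; compare against real `.gdb` entries and the `clamscan --debug` output before relying on them. The helper names are hypothetical.

```python
import hashlib
from typing import List


def host_suffixes(hostname: str) -> List[str]:
    """Return the last 3 and the last 2 dot-separated components of a host."""
    parts = hostname.lower().rstrip(".").split(".")
    return [".".join(parts[-n:]) for n in (3, 2) if len(parts) >= n]


def host_prefix(suffix: str, trailing_slash: bool = True) -> str:
    """Hex-encode the first 4 bytes of SHA-256(suffix [+ "/"]).

    The trailing slash is an assumption borrowed from Google Safe Browsing
    host keys; pass trailing_slash=False to hash the bare suffix instead.
    """
    data = suffix + ("/" if trailing_slash else "")
    return hashlib.sha256(data.encode("ascii")).digest()[:4].hex()


if __name__ == "__main__":
    for suffix in host_suffixes("www.mail.example.com"):
        print(suffix, host_prefix(suffix))
```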
+
122
+## WDB format
123
+
124
+This file contains whitelisted URL pairs. It contains lines in the following format:
125
+
126
+```
127
+    X:RealURL:DisplayedURL[:FuncLevelSpec]
128
+    M:RealHostname:DisplayedHostname[:FuncLevelSpec]
129
+```
130
+
131
+- `X`
132
+
133
+  regular expression, for the *entire URL*, not just the hostname
134
+
135
+  - The regular expression is by default anchored to start-of-line and end-of-line, as if you have used `^RegularExpression$`
136
+
137
+  - A trailing `/` is automatically added both to the regex and to the input string to avoid false matches
+
+  - The regular expression matches the *concatenation* of the RealURL, a colon (`:`), and the DisplayedURL as a single string. It doesn’t match RealURL and DisplayedURL separately!
140
+
141
+- `M`
142
+
143
+  matches hostname, or subdomain of it, see notes for H above
144
+
145
+## Hints
146
+
147
+- empty lines are ignored
148
+
149
+- the colons are mandatory
150
+
151
+- Don’t leave extra spaces at the end of a line!
+
+- if any of the lines doesn’t conform to this format, ClamAV will abort with a Malformed Database error
+
+- see section [Extraction of realURL, displayedURL from HTML tags](#extraction-of-realurl-displayedurl-from-html-tags) for more details on realURL/displayedURL
156
+
157
+## Examples of PDB signatures
158
+
159
+To check for phishing mails that target amazon.com, or subdomains of
160
+amazon.com:
161
+
162
+```
163
+    H:amazon.com
164
+```
165
+
166
+To do the same, but for amazon.co.uk:
167
+
168
+```
169
+    H:amazon.co.uk
170
+```
171
+
172
+To limit the signatures to certain engine versions:
173
+
174
+```
175
+    H:amazon.co.uk:20-30
176
+    H:amazon.co.uk:20-
177
+    H:amazon.co.uk:0-20
178
+```
179
+
180
+First line: engines with functionality level 20, 21, ..., 29 can load it
+
+Second line: engines with functionality level >= 20 can load it
+
+Third line: engines with functionality level < 20 can load it
+
+In a real situation, you’d probably use the second form, for example if your signature uses a feature not available in earlier versions, or if earlier versions have bugs with your signature. Neither is the case here; the above examples are for illustrative purposes only.
187
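To make the three `FuncLevelSpec` forms concrete, the hypothetical helper below decides whether an engine at a given functionality level would load a line. It follows this document's wording, including the exclusive upper bound in `minlevel-maxlevel`; the authoritative check lives in libclamav and should be consulted if the exact boundary matters.

```python
def flevel_allows(spec: str, level: int) -> bool:
    """Return True if an engine at functionality `level` should load a line
    carrying this FuncLevelSpec, per the rules described in this document:
      "20"     -> level >= 20
      "20-"    -> level >= 20
      "20-30"  -> 20 <= level < 30  (upper bound exclusive)
    """
    low, sep, high = spec.partition("-")
    minlevel = int(low)
    maxlevel = int(high) if sep and high else None
    if level < minlevel:
        return False
    return maxlevel is None or level < maxlevel


if __name__ == "__main__":
    for spec in ("20-30", "20-", "0-20"):
        print(spec, [lvl for lvl in (19, 20, 29, 30) if flevel_allows(spec, lvl)])
```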
+
188
+## Examples of WDB signatures
189
+
190
+To allow Amazon’s country-specific domains and amazon.com to be mixed in DisplayedURL and RealURL:
191
+
192
+    X:.+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?:17-
193
+
194
+Explanation of this signature:
195
+
196
+- `X:`
197
+
198
+  this is a regular expression
199
+
200
+- `:17-`
201
+
202
+  load signature only for engines with functionality level >= 17 (recommended for type X)
+
+The regular expression is the following (`X:` and `:17-` stripped, and a `/` appended):
205
+
206
+```
207
+    .+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?/
208
+```
209
+
210
+Explanation of this regular expression (note that it is a single regular expression, not 2 regular expressions split at the `:`):
211
+
212
+- `.+`
213
+
214
+  any subdomain of
215
+
216
+- `\.amazon\.`
217
+
218
+  domain we are whitelisting (RealURL part)
219
+
220
+- `(at|ca|co\.uk|co\.jp|de|fr)`
221
+
222
+  country-domains: at, ca, co.uk, co.jp, de, fr
223
+
224
+- `([/?].*)?`
225
+
226
+  recommended way to end the RealURL part of the whitelist; this protects against embedded URLs (evilurl.example.com/amazon.co.uk/)
227
+
228
+- `:`
229
+
230
+  RealURL and DisplayedURL are concatenated via a :, so match a literal : here
231
+
232
+- `.+`
233
+
234
+  any subdomain of
235
+
236
+- `\.amazon\.com`
237
+
238
+  whitelisted DisplayedURL
239
+
240
+- `([/?].*)?`
241
+
242
+  recommended way to end displayed url part, to protect against embedded URLs
243
+
244
+- `/`
245
+
246
+  automatically added to further protect against embedded URLs
247
+
248
+When you whitelist an entry, make sure that both domains are owned by the same entity. What this whitelist entry allows is: links that claim to point to amazon.com (DisplayedURL), but really go to a country-specific Amazon domain (RealURL).
249
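The matching rules for `X` lines can be checked against the Amazon example with a short Python sketch. It models the description above (one regex over `RealURL:DisplayedURL`, implicit `^...$` anchoring, `/` appended to both sides) using Python's `re` module instead of ClamAV's POSIX regex engine; for this pattern the yes/no result should be the same, but that is an assumption, not a guarantee for every regex. The names `wdb_x_matches` and `AMAZON_X` are hypothetical.

```python
import re

# The example X-line regex from above, without "X:" and ":17-".
AMAZON_X = r".+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?"


def wdb_x_matches(regex: str, real_url: str, displayed_url: str) -> bool:
    """Model of X-line matching: a single regex against "RealURL:DisplayedURL",
    anchored at both ends, with "/" appended to the regex and to the input."""
    subject = f"{real_url}:{displayed_url}/"
    return re.fullmatch(regex + "/", subject) is not None


if __name__ == "__main__":
    print(wdb_x_matches(AMAZON_X, "www.amazon.co.uk", "www.amazon.com"))
    print(wdb_x_matches(AMAZON_X, "evilurl.example.com/amazon.co.uk/", "www.amazon.com"))
```

The second call shows the embedded-URL protection at work: `amazon.co.uk` appears in the RealURL, but not preceded by `.amazon.`, so the pair is not whitelisted.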
+
250
+## Example for how the URL extractor works
251
+
252
+Consider the following HTML file:
253
+
254
+```html
255
+    <html>
256
+    <a href="http://1.realurl.example.com/">
257
+      1.displayedurl.example.com
258
+    </a>
259
+    <a href="http://2.realurl.example.com">
260
+      2 d<b>i<p>splayedurl.e</b>xa<i>mple.com
261
+    </a>
262
+    <a href="http://3.realurl.example.com">
263
+      3.nested.example.com
264
+      <a href="http://4.realurl.example.com">
265
+        4.displayedurl.example.com
266
+      </a>
267
+    </a>
268
+    <form action="http://5.realurl.example.com">
269
+      sometext
270
+      <img src="http://5.displayedurl.example.com/img0.gif"/>
271
+      <a href="http://5.form.nested.displayedurl.example.com">
272
+        5.form.nested.link-displayedurl.example.com
273
+      </a>
274
+    </form>
275
+    <a href="http://6.realurl.example.com">
276
+      6.displ
277
+      <img src="6.displayedurl.example.com/img1.gif"/>
278
+      ayedurl.example.com
279
+    </a>
280
+    <a href="http://7.realurl.example.com">
281
+      <iframe src="http://7.displayedurl.example.com">
282
+    </a>
283
+```
284
+
285
+The phishing engine extracts the following RealURL/DisplayedURL pairs from it:
287
+
288
+```
289
+    http://1.realurl.example.com/
290
+    1.displayedurl.example.com
291
+
292
+    http://2.realurl.example.com
293
+    2displayedurl.example.com
294
+
295
+    http://3.realurl.example.com
296
+    3.nested.example.com
297
+
298
+    http://4.realurl.example.com
299
+    4.displayedurl.example.com
300
+
301
+    http://5.realurl.example.com
302
+    http://5.displayedurl.example.com/img0.gif
303
+
304
+    http://5.realurl.example.com
305
+    http://5.form.nested.displayedurl.example.com
306
+
307
+    http://5.form.nested.displayedurl.example.com
308
+    5.form.nested.link-displayedurl.example.com
309
+
310
+    http://6.realurl.example.com
311
+    6.displayedurl.example.com
312
+
313
+    http://6.realurl.example.com
314
+    6.displayedurl.example.com/img1.gif
315
+```
316
+
317
+## How matching works
318
+
319
+### RealURL, displayedURL concatenation
320
+
321
+The phishing detection module processes pairs of RealURL/DisplayedURL. Matching against daily.wdb and daily.pdb is done as follows: the RealURL is concatenated with a `:` and then with the DisplayedURL, and the resulting *line* is matched against the lines in daily.wdb/daily.pdb.
322
+
323
+So if you have this line in daily.wdb:
324
+
325
+    M:www.google.ro:www.google.com
326
+
327
+and this href: `<a href='http://www.google.ro'>www.google.com</a>` then it will be whitelisted, but: `<a href='http://images.google.com'>www.google.com</a>` will not.
328
+
329
+### What happens when a match is found
330
+
331
+In the case of the whitelist, a match means that the RealURL/DisplayedURL combination is considered clean, and no further checks are performed on it.
332
+
333
+In the case of the domainlist, a match means that the RealURL/displayedURL is going to be checked for phishing attempts.
334
+
335
+Furthermore, you can restrict which checks are performed by specifying a 3-digit hexadecimal number (see [Flags](#flags)).
336
+
337
+### Extraction of realURL, displayedURL from HTML tags
338
+
339
+The html parser extracts pairs of realURL/displayedURL based on the following rules.
340
+
341
+In version 0.93: after URLs have been extracted, they are normalized and cut after the hostname, so `http://test.example.com/path/somecgi?queryparameters` becomes `http://test.example.com/`.
342
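The "cut after the hostname" step can be illustrated with Python's standard `urllib.parse`. This sketch covers only that step, not ClamAV's full URL normalization; note that `urlsplit().hostname` also lowercases the host and drops any port or user info, which is a simplification. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def cut_after_hostname(url: str) -> str:
    """Keep only the scheme and hostname: "http://host/path?q" -> "http://host/".
    Strings without a recognizable scheme and host are returned unchanged."""
    parts = urlsplit(url)
    if not parts.scheme or not parts.hostname:
        return url
    # .hostname is lowercased and excludes port/userinfo (a simplification).
    return f"{parts.scheme}://{parts.hostname}/"


if __name__ == "__main__":
    print(cut_after_hostname("http://test.example.com/path/somecgi?queryparameters"))
```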
+
343
+- `a`
344
+
345
+  (anchor) the *href* is the realURL, its *contents* are the displayedURL
+
+  - *contents* is the tag-stripped content of the \<a\> tag, so for example \<b\> tags are stripped (but not their contents)
+
+  nesting another \<a\> tag within an \<a\> tag (besides being invalid HTML) is treated as `</a><a...`
351
+
352
+- `form`
353
+
354
+  the *action* attribute is the realURL, and a nested \<a\> tag is the displayedURL
355
+
356
+- `img/area`
357
+
358
+  if nested within an *\<a\>* tag, the realURL is the *href* of the a tag, and the *src/dynsrc/area* is the displayedURL of the img
359
+
360
+  if nested within a *form* tag, then the action attribute of the *form* tag is the realURL
+
+- `iframe`
+
+  if nested within an *\<a\>* tag, the *src* attribute is the displayedURL, and the *href* of its parent *a* tag is the realURL
+
+  if nested within a *form* tag, then the action attribute of the *form* tag is the realURL
367
+
368
+### Example
369
+
370
+Consider this html file:
371
+
372
+```html
373
+<a href="evilurl">www.paypal.com</a>
+
+<a href="evilurl2" title="www.ebay.com">click here to sign in</a>
+
+<form action="evilurl_form">
+  Please sign in to <a href="cgi.ebay.com">Ebay</a> using this form
+  <input type='text' name='username'>Username</input>
+  ....
+</form>
+
+<a href="evilurl"><img src="images.paypal.com/secure.jpg"></a>
390
+```
391
+
392
+The resulting realURL/displayedURL pairs will be (note that one tag can generate multiple pairs):
393
+
394
+- evilurl / www.paypal.com
395
+
396
+- evilurl2 / click here to sign in
397
+
398
+- evilurl2 / www.ebay.com
399
+
400
+- evilurl_form / cgi.ebay.com
401
+
402
+- cgi.ebay.com / Ebay
403
+
404
+- evilurl / images.paypal.com/secure.jpg
405
+
406
+## Simple patterns
407
+
408
+Simple patterns are matched literally, i.e. if you say:
409
+
410
+```
411
+www.google.com
412
+```
413
+
414
+it is going to match *www.google.com*, and only that. The `.` (dot) character has no special meaning (see [Regular expressions](#regular-expressions) for how the `.` (dot) character behaves there).
415
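The difference between a simple pattern and a regular expression shows up clearly with a lookalike string. The sketch below models a simple pattern as plain string equality (leaving out the subdomain handling of `H` lines) and a ClamAV regex as an anchored Python `re.fullmatch`; both helper names are hypothetical.

```python
import re


def simple_pattern_matches(pattern: str, text: str) -> bool:
    # Simple pattern: compared literally, so "." only matches a dot.
    return text == pattern


def regex_matches(pattern: str, text: str) -> bool:
    # Regular expression: "." matches any character; the implicit ^...$
    # anchoring is modeled with re.fullmatch.
    return re.fullmatch(pattern, text) is not None


if __name__ == "__main__":
    lookalike = "wwwXgoogleYcom"
    print(simple_pattern_matches("www.google.com", lookalike))
    print(regex_matches("www.google.com", lookalike))
    print(regex_matches(r"www\.google\.com", lookalike))
```

The unescaped regex accepts the lookalike, which is why dots in hostnames should be written as `\.` in regular expressions.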
+
416
+## Regular expressions
417
+
418
+POSIX regular expressions are supported, and you can consider them to be internally wrapped in `^` and `$`. In other words, the regular expression has to match the entire concatenated URL (see [RealURL, displayedURL concatenation](#realurl-displayedurl-concatenation) for details on concatenation).
+
+It is recommended that you read [Introduction to regular expressions](#introduction-to-regular-expressions) to learn how to write regular expressions, and then come back and read this section for hints.
+
+Be advised that ClamAV contains an internal, very basic regex matcher to reduce the load on the regex matching core. Thus it is recommended that you avoid regex syntax it does not support at the very beginning of regexes (at least in the first few characters).
423
+
424
+Currently the clamav regex matcher supports:
425
+
426
+- `.` (dot) character
427
+
428
+- `\` (backslash, for escaping special characters)
+
+- `|` (pipe) alternatives
+
+- `[]` (character classes)
+
+- `()` (parentheses for grouping, but no group extraction is performed)
435
+
436
+- other non-special characters
437
+
438
+Thus the following are not supported:
439
+
440
+- `+` repetition
+
+- `*` repetition
443
+
444
+- `{}` repetition
445
+
446
+- backreferences
447
+
448
+- lookaround
449
+
450
+- other “advanced” features not listed in the supported list ;)
451
+
452
+This, however, shouldn’t discourage you from using the “not directly supported” features, because if the internal engine encounters unsupported syntax, it passes it on to the POSIX regex core (beginning from the first unsupported token; everything before that is still processed by the internal matcher). An example might make this clearer:
453
+
454
+`www\.google\.(com|ro|it) ([a-zA-Z])+\.google\.(com|ro|it)`
+
+Everything up to `([a-zA-Z])+` is processed internally; that parenthesis (and everything beyond it) is processed by the POSIX core.
457
+
458
+Examples of URL pairs that match:
+
+- `www.google.ro images.google.ro`
+
+- `www.google.com images.google.ro`
+
+Examples of URL pairs that don’t match:
+
+- `www.google.ro images1.google.ro`
+
+- `images.google.com image.google.com`
469
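The four example pairs can be checked with Python's `re` module. This reproduces only the overall match result; it does not model how ClamAV splits the work between its internal matcher and the POSIX core. `GOOGLE_REGEX` and `pair_matches` are hypothetical names.

```python
import re

# The example regex from this section; a URL pair is written as
# "<first URL> <second URL>", separated by a single space.
GOOGLE_REGEX = r"www\.google\.(com|ro|it) ([a-zA-Z])+\.google\.(com|ro|it)"


def pair_matches(pair: str) -> bool:
    # fullmatch models the implicit ^...$ anchoring.
    return re.fullmatch(GOOGLE_REGEX, pair) is not None


if __name__ == "__main__":
    for pair in ("www.google.ro images.google.ro",
                 "www.google.com images.google.ro",
                 "www.google.ro images1.google.ro",
                 "images.google.com image.google.com"):
        print(pair, pair_matches(pair))
```

`images1` fails because the character class `[a-zA-Z]` excludes digits, and the last pair fails because the first URL must start with `www.google.`.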
+
470
+## Flags
471
+
472
+Flags are a binary OR of the following numbers:
473
+
474
+| Flag                   | Value |
+|------------------------|-------|
+| `HOST_SUFFICIENT`      | 1     |
+| `DOMAIN_SUFFICIENT`    | 2     |
+| `DO_REVERSE_LOOKUP`    | 4     |
+| `CHECK_REDIR`          | 8     |
+| `CHECK_SSL`            | 16    |
+| `CHECK_CLOAKING`       | 32    |
+| `CLEANUP_URL`          | 64    |
+| `CHECK_DOMAIN_REVERSE` | 128   |
+| `CHECK_IMG_URL`        | 256   |
+| `DOMAINLIST_REQUIRED`  | 512   |
513
+
514
+The names of the constants are self-explanatory.
515
+
516
+These constants are defined in libclamav/phishcheck.h; check there for the latest flags.
517
+
518
+There is a default set of flags that are enabled, these are currently:
519
+
520
+    ( CLEANUP_URL | CHECK_SSL | CHECK_CLOAKING | CHECK_IMG_URL )
521
+
522
+SSL checking is currently performed only for `a` tags.
+
+You must decide for each line in the domainlist whether you want to filter any flags (that is, you don’t want certain checks to be done), then calculate the binary OR of those constants, and convert it into a 3-digit hexadecimal number. For example, you decide that DOMAIN_SUFFICIENT shouldn’t be used for ebay.com, and you don’t want to check images either, so you come up with this flag number: `2 | 256` = 258 (decimal) = 102 (hexadecimal).
525
+
526
+So you add this line to daily.pdb:
+
+    R102 www.ebay.com .+
529
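The flag arithmetic above is easy to get wrong by hand, so here is a small Python sketch that ORs flags together and formats the result as the 3-digit hexadecimal number. The flag values are copied from the table in this section; libclamav/phishcheck.h remains the authoritative source, and `flags_to_hex` is a hypothetical helper.

```python
# Flag values as listed in this document (see libclamav/phishcheck.h).
HOST_SUFFICIENT = 1
DOMAIN_SUFFICIENT = 2
DO_REVERSE_LOOKUP = 4
CHECK_REDIR = 8
CHECK_SSL = 16
CHECK_CLOAKING = 32
CLEANUP_URL = 64
CHECK_DOMAIN_REVERSE = 128
CHECK_IMG_URL = 256
DOMAINLIST_REQUIRED = 512


def flags_to_hex(*flags: int) -> str:
    """Binary-OR the given flags and format the result as 3 hex digits."""
    value = 0
    for flag in flags:
        value |= flag
    return format(value, "03x")


if __name__ == "__main__":
    # The ebay.com example: DOMAIN_SUFFICIENT | CHECK_IMG_URL
    print("R" + flags_to_hex(DOMAIN_SUFFICIENT, CHECK_IMG_URL))  # R102
```

The same helper shows that the default set `CLEANUP_URL | CHECK_SSL | CHECK_CLOAKING | CHECK_IMG_URL` corresponds to `170` in hexadecimal.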
+
530
+# Introduction to regular expressions
531
+
532
+Recommended reading:
533
+
534
+- http://www.regular-expressions.info/quickstart.html
535
+
536
+- http://www.regular-expressions.info/tutorial.html
537
+
538
+- regex(7) man-page: http://www.tin.org/bin/man.cgi?section=7\&topic=regex
539
+
540
+## Special characters
541
+
542
+- `[`
+
+  the opening square bracket - it marks the beginning of a character class, see [Character classes](#character-classes)
+
+- `\`
+
+  the backslash - escapes special characters, see [Escaping](#escaping)
+
+- `^`
+
+  the caret - matches the beginning of a line (not needed in ClamAV regexes, this is implied)
+
+- `$`
+
+  the dollar sign - matches the end of a line (not needed in ClamAV regexes, this is implied)
+
+- `.`
+
+  the period or dot - matches *any* character
+
+- `|`
+
+  the vertical bar or pipe symbol - matches either the token on its left or the token on its right, see [Alternation](#alternation)
+
+- `?`
+
+  the question mark - optionally matches the left-side token, see [Optional matching, and repetition](#optional-matching-and-repetition)
+
+- `*`
+
+  the asterisk or star - matches 0 or more occurrences of the left-side token, see [Optional matching, and repetition](#optional-matching-and-repetition)
+
+- `+`
+
+  the plus sign - matches 1 or more occurrences of the left-side token, see [Optional matching, and repetition](#optional-matching-and-repetition)
+
+- `(`
+
+  the opening round bracket - marks the beginning of a group, see [Groups](#groups)
+
+- `)`
+
+  the closing round bracket - marks the end of a group, see [Groups](#groups)
585
+
586
+## Character classes
587
+
588
+## Escaping
589
+
590
+Escaping has two purposes:
591
+
592
+- it allows you to actually match the special characters themselves, for example to match a literal `+`, you would write `\+`
+
+- it also allows you to match non-printable characters, such as the tab (`\t`), newline (`\n`), ...
+
+However, since non-printable characters are not valid inside a URL, you won’t have a reason to use them.
597
+
598
+## Alternation
599
+
600
+## Optional matching, and repetition
601
+
602
+## Groups
603
+
604
+Groups are usually used together with repetition or alternation. For example, `(com|it)+` means: match 1 or more repetitions of `com` or `it`, that is, it matches com, it, comcom, comcomcom, comit, itit, ititcom, ... you get the idea.
+
+Groups can also be used to extract substrings, but this is not supported by the ClamAV engine, and not needed in this case either.
607
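The `(com|it)+` example can be tried directly with Python's `re` module, using `re.fullmatch` to mimic the implicit `^...$` anchoring of ClamAV regexes. The helper name is hypothetical; only the match/no-match behavior is illustrated, since group extraction is not used by ClamAV.

```python
import re

GROUP_REGEX = r"(com|it)+"


def repeats_com_or_it(text: str) -> bool:
    # The whole string must consist of one or more "com"/"it" repetitions.
    return re.fullmatch(GROUP_REGEX, text) is not None


if __name__ == "__main__":
    for text in ("com", "itit", "ititcom", "co", "comi", ""):
        print(repr(text), repeats_com_or_it(text))
```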
+
608
+# How to create database files
609
+
610
+## How to create and maintain the whitelist (daily.wdb)
611
+
612
+If the phishing code claims that a certain mail is phishing, but it’s not, you have 2 choices:
+
+- examine your rules in daily.pdb, and fix them if necessary (see [How to create and maintain the domainlist (daily.pdb)](#how-to-create-and-maintain-the-domainlist-dailypdb))
+
+- add it to the whitelist (discussed here)
+
+Let’s assume you are having problems because of links like this in a mail:
619
+
620
+```html
621
+    <a href="http://69.0.241.57/bCentral/L.asp?L=XXXXXXXX">
622
+      http://www.bcentral.it/
623
+    </a>
624
+```
625
+
626
+After investigating those sites further, you decide they are no threat, and create a line like this in daily.wdb:
627
+
628
+```
629
+R http://www\.bcentral\.it/.+ http://69\.0\.241\.57/bCentral/L\.asp\?L=.+
631
+```
632
+
633
+Note: urls like the above can be used to track unique mail recipients, and thus know if somebody actually reads mails (so they can send more spam). However since this site required no authentication information, it is safe from a phishing point of view.
634
+
635
+## How to create and maintain the domainlist (daily.pdb)
636
+
637
+When not using `--phish-scan-alldomains` (in production environments, for example), you need to decide which URLs you are going to check.
+
+Although at first glance it might seem a good idea to check everything, it would produce false positives. Newsletters, ads, etc. in particular are likely to use URLs that look like phishing attempts.
+
+Let’s assume that you’ve recently seen many phishing attempts claiming they come from PayPal. Thus you need to add PayPal to daily.pdb:
+
+```
+R .+ .+\.paypal\.com
+```
+
+The above line will block (detect as phishing) mails that contain URLs that claim to lead to PayPal but in fact don’t.
+
+Be careful not to create regexes that match too broad a range of URLs, though.
650
+
651
+## Dealing with false positives, and undetected phishing mails
652
+
653
+### False positives
654
+
655
+Whenever you see a false positive (mail that is detected as phishing, but isn’t), you need to examine *why* ClamAV decided that it’s phishing. You can do this easily by building ClamAV with debugging (`./configure --enable-experimental --enable-debug`), and then running a tool:
+
+```bash
+$ contrib/phishing/why.py phishing.eml
659
+```
660
+
661
+This will show the url that triggers the phish verdict, and a reason why that url is considered phishing attempt.
662
+
663
+Once you know the reason, you might need to modify daily.pdb (if one of your rules in there is too broad), or you need to add the URL to daily.wdb. If you think the algorithm is incorrect, please file a bug report on bugzilla.clamav.net, including the output of *why.py*.
664
+
665
+### Undetected phish mails
666
+
667
+Using why.py doesn’t help here unfortunately (it will say: clean), so all you can do is:
668
+
669
+```bash
670
+$ clamscan/clamscan --phish-scan-alldomains undetected.eml
+```
+
+And see if the mail is detected. If it is, then you need to add an appropriate line to daily.pdb (see [How to create and maintain the domainlist (daily.pdb)](#how-to-create-and-maintain-the-domainlist-dailypdb)).
+
+If the mail is not detected, then try using:
+
+```bash
+$ clamscan/clamscan --debug undetected.eml | less
+```
+
+Then see which URLs are being checked, whether any of them is in a whitelist, whether all URLs are detected, etc.
0 682
new file mode 100644
... ...
@@ -0,0 +1,23 @@
0
+# Whitelist databases
1
+
2
+## File whitelists
3
+
4
+To whitelist a specific file use the MD5 signature format and place it inside a database file with the extension of `.fp`. To whitelist a specific file with the SHA1 or SHA256 file hash signature format, place the signature inside a database file with the extension of `.sfp`.
5
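A `.fp` line can be generated with Python's `hashlib`. This sketch assumes the MD5 file-hash signature format `MD5:FileSize:Name` (hash, file size in bytes, signature name); the helper names and the example name `Local.Whitelist.Example` are hypothetical.

```python
import hashlib
import os


def fp_line_from_bytes(data: bytes, name: str) -> str:
    """Build an MD5 file-hash line: "<md5>:<size in bytes>:<name>"."""
    return f"{hashlib.md5(data).hexdigest()}:{len(data)}:{name}"


def fp_line_for_file(path: str, name: str) -> str:
    """Same format for a file on disk, hashed in 64 KiB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            md5.update(chunk)
    return f"{md5.hexdigest()}:{os.path.getsize(path)}:{name}"


if __name__ == "__main__":
    print(fp_line_from_bytes(b"abc", "Local.Whitelist.Example"))
```

The resulting line would then be appended to a local database file with the `.fp` extension, as described above.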
+
6
+## Signature whitelists
7
+
8
+To whitelist a specific signature from the database you just add the signature name into a local file with the `.ign2` extension and store it inside the database directory.
9
+
10
+For example:
11
+
12
+```
13
+    Eicar-Test-Signature
14
+```
15
+
16
+Additionally, you can follow the signature name with the MD5 of the entire database entry for this signature. In such a case, the signature will no longer be whitelisted when its entry in the database gets modified (e.g. the signature gets updated to avoid false alerts). For example:
17
+
18
+```
19
+    Eicar-Test-Signature:bc356bae4c42f19a3de16e333ba3569c
20
+```
21
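Building such an `.ign2` line can be sketched in Python. The sketch assumes the MD5 is computed over the database entry text exactly as stored, without the trailing newline; verify this against an existing entry before relying on it. `ign2_line` and `Example.Sig` are hypothetical.

```python
import hashlib
from typing import Optional


def ign2_line(signature_name: str, database_entry: Optional[str] = None) -> str:
    """Return the bare signature name, or "name:md5" where md5 is computed
    over the full database entry text (assumed: as stored, no newline)."""
    if database_entry is None:
        return signature_name
    digest = hashlib.md5(database_entry.encode("utf-8")).hexdigest()
    return f"{signature_name}:{digest}"


if __name__ == "__main__":
    print(ign2_line("Eicar-Test-Signature"))
```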
+
22
+Historically, signature whitelists were added to `.ign` files.  This format is still functional, though it has been replaced by the `.ign2` database.
0 23
new file mode 100644
... ...
@@ -0,0 +1,37 @@
0
+# Using YARA rules in ClamAV
1
+
2
+ClamAV version 0.99 and above can process YARA rules. ClamAV virus database file names ending with “.yar” or “.yara” are parsed as YARA rule files. The link to the YARA rule grammar documentation may be found at http://plusvic.github.io/yara/. There are currently a few limitations on using YARA rules within ClamAV:
3
+
4
+- YARA modules are not yet supported by ClamAV. This includes the “import” keyword and any YARA module-specific keywords.
5
+
6
+- Global rules (“global” keyword) are not supported by ClamAV.
+
+- External variables (“contains” and “matches” keywords) are not supported.
9
+
10
+- YARA rules pre-compiled with the *yarac* command are not supported.
11
+
12
+- As in the ClamAV logical and extended signature formats, YARA strings and segments of strings separated by wildcards must represent at least two octets of data.
13
+
14
+- There is a maximum of 64 strings per YARA rule.
15
+
16
+- YARA rules in ClamAV must contain at least one literal, hexadecimal, or regular expression string.
17
+
18
+In addition, there are a few more ClamAV processing modes that may affect the outcome of YARA rules.
19
+
20
+- *File decomposition and decompression* - Since ClamAV uses file decomposition and decompression to find viruses within de-archived and uncompressed inner files, YARA rules executed by ClamAV will match against these files as well.
21
+
22
+- *Normalization* - By default, ClamAV normalizes HTML, JavaScript, and ASCII text files. YARA rules in ClamAV will match against the normalized result. The effects of normalization of these file types may be captured using `clamscan --leave-temps --tempdir=mytempdir`. YARA rules may then be written using the normalized file(s) found in `mytempdir`. Alternatively, starting with ClamAV 0.100.0, `clamscan --normalize=no` will prevent normalization and only scan the raw file. To obtain similar behavior prior to 0.99.2, use `clamscan --scan-html=no`. The corresponding parameters for clamd.conf are `Normalize` and `ScanHTML`.
23
+
24
+- *YARA conditions driven by string matches* - All YARA conditions are driven by string matches in ClamAV. This avoids executing every YARA rule on every file. Any YARA condition may be augmented with a string match clause which is always true, such as:
25
+
26
+```yara
27
+  rule CheckFileSize
28
+  {
29
+    strings:
30
+      $abc = "abc"
31
+    condition:
32
+      ($abc or not $abc) and filesize < 200KB
33
+  }
34
+```
35
+
36
+This will ensure that the YARA condition always performs the desired action (checking the file size in this example),