Improvements to signature writing documentation. Notably the inclusion of a comprehensive CL_TYPE file type reference, requested by in bb11408.

Micah Snyder authored on 2018/11/03 07:48:24
Showing 3 changed files
1 1
new file mode 100644
... ...
@@ -0,0 +1,89 @@
0
+# ClamAV File Types
1
+
2
+ClamAV maintains its own file typing format and assigns these types using any of the following:
3
+
4
+- Evaluation of a unique sequence of bytes at the start of a file (file type magic).
5
+- File type indicators when parsing container files.
6
+  - For example:
7
+    CL_TYPE_SCRIPT may be assigned to data contained in a PDF when the PDF indicates that a stream of bytes is "JavaScript".
8
+- File type determination based on the names or characteristics contained within the file.
9
+  - For example:
10
+    CL_TYPE_OOXML_WORD may be assigned to a Zip file containing files with specific names.
11
+
12
+## CL_TYPEs
13
+
14
+ClamAV types are prefixed with `CL_TYPE_`. The following is an exhaustive list of all current CL_TYPEs.
15
+
16
+| CL_TYPE                | Description                                                                       |
17
+|------------------------|-----------------------------------------------------------------------------------|
18
+| `CL_TYPE_7Z`           | 7-Zip Archive                                                                     |
19
+| `CL_TYPE_7ZSFX`        | Self-Extracting 7-Zip Archive                                                     |
20
+| `CL_TYPE_APM`          | Disk Image - Apple Partition Map                                                  |
21
+| `CL_TYPE_ARJ`          | ARJ Archive                                                                       |
22
+| `CL_TYPE_ARJSFX`       | Self-Extracting ARJ Archive                                                       |
23
+| `CL_TYPE_AUTOIT`       | AutoIt Automation Executable                                                      |
24
+| `CL_TYPE_BINARY_DATA`  | binary data                                                                       |
25
+| `CL_TYPE_BINHEX`       | BinHex Macintosh 7-bit ASCII email attachment encoding                            |
26
+| `CL_TYPE_BZ`           | BZip Compressed File                                                              |
27
+| `CL_TYPE_CABSFX`       | Self-Extracting Microsoft CAB Archive                                             |
28
+| `CL_TYPE_CPIO_CRC`     | CPIO Archive (CRC)                                                                |
29
+| `CL_TYPE_CPIO_NEWC`    | CPIO Archive (NEWC)                                                               |
30
+| `CL_TYPE_CPIO_ODC`     | CPIO Archive (ODC)                                                                |
31
+| `CL_TYPE_CPIO_OLD`     | CPIO Archive (OLD, Little Endian or Big Endian)                                   |
32
+| `CL_TYPE_CRYPTFF`      | Files encrypted by CryptFF malware                                                |
33
+| `CL_TYPE_DMG`          | Apple DMG Archive                                                                 |
34
+| `CL_TYPE_ELF`          | ELF Executable (Linux/Unix program or library)                                    |
35
+| `CL_TYPE_GPT`          | Disk Image - GUID Partition Table                                                 |
36
+| `CL_TYPE_GRAPHICS`     | TIFF (Little Endian or Big Endian)                                                |
37
+| `CL_TYPE_GZ`           | GZip Compressed File                                                              |
38
+| `CL_TYPE_HTML_UTF16`   | Wide-Character / UTF16 encoded HTML                                               |
39
+| `CL_TYPE_HTML`         | HTML data                                                                         |
40
+| `CL_TYPE_HWP3`         | Hangul Word Processor (3.X)                                                       |
41
+| `CL_TYPE_HWPOLE2`      | Hangul Word Processor embedded OLE2                                               |
42
+| `CL_TYPE_INTERNAL`     | Internal properties                                                               |
43
+| `CL_TYPE_ISHIELD_MSI`  | Windows Install Shield MSI installer                                              |
44
+| `CL_TYPE_ISO9660`      | ISO 9660 file system for optical disc media                                       |
45
+| `CL_TYPE_JAVA`         | Java Class File                                                                   |
46
+| `CL_TYPE_LNK`          | Microsoft Windows Shortcut File                                                   |
47
+| `CL_TYPE_MACHO_UNIBIN` | Universal Binary/Java Bytecode                                                    |
48
+| `CL_TYPE_MACHO`        | Apple/NeXTSTEP Mach-O Executable file format                                      |
49
+| `CL_TYPE_MAIL`         | Email file                                                                        |
50
+| `CL_TYPE_MBR`          | Disk Image - Master Boot Record                                                   |
51
+| `CL_TYPE_MHTML`        | MHTML Saved Web Page                                                              |
52
+| `CL_TYPE_MSCAB`        | Microsoft CAB Archive                                                             |
53
+| `CL_TYPE_MSCHM`        | Microsoft CHM help archive                                                        |
54
+| `CL_TYPE_MSEXE`        | Microsoft EXE / DLL Executable file                                               |
55
+| `CL_TYPE_MSOLE2`       | Microsoft OLE2 Container file                                                     |
56
+| `CL_TYPE_MSSZDD`       | Microsoft Compressed EXE                                                          |
57
+| `CL_TYPE_NULSFT`       | NullSoft Scripted Installer program                                               |
58
+| `CL_TYPE_OLD_TAR`      | TAR archive (old)                                                                 |
59
+| `CL_TYPE_OOXML_HWP`    | Hangul Office Open Word Processor (5.X)                                           |
60
+| `CL_TYPE_OOXML_PPT`    | Microsoft Office Open XML PowerPoint                                              |
61
+| `CL_TYPE_OOXML_WORD`   | Microsoft Office Open Word 2007+                                                  |
62
+| `CL_TYPE_OOXML_XL`     | Microsoft Office Open Excel 2007+                                                 |
63
+| `CL_TYPE_PART_HFSPLUS` | Apple HFS+ partition                                                              |
64
+| `CL_TYPE_PDF`          | Adobe PDF document                                                                |
65
+| `CL_TYPE_POSIX_TAR`    | TAR archive                                                                       |
66
+| `CL_TYPE_PS`           | Postscript                                                                        |
67
+| `CL_TYPE_RAR`          | RAR Archive                                                                       |
68
+| `CL_TYPE_RARSFX`       | Self-Extracting RAR Archive                                                       |
69
+| `CL_TYPE_RIFF`         | Resource Interchange File Format container formatted file                         |
70
+| `CL_TYPE_RTF`          | Rich Text Format document                                                         |
71
+| `CL_TYPE_SCRENC`       | Files encrypted by ScrEnc malware                                                 |
72
+| `CL_TYPE_SCRIPT`       | Generic type for scripts that don't have their own type (Javascript, Python, etc) |
73
+| `CL_TYPE_SIS`          | Symbian OS Software Installation Script Archive                                   |
74
+| `CL_TYPE_SWF`          | Adobe Flash File (LZMA, Zlib, or uncompressed)                                    |
75
+| `CL_TYPE_TEXT_ASCII`   | ASCII text                                                                        |
76
+| `CL_TYPE_TEXT_UTF16BE` | UTF-16BE text                                                                     |
77
+| `CL_TYPE_TEXT_UTF16LE` | UTF-16LE text                                                                     |
78
+| `CL_TYPE_TEXT_UTF8`    | UTF-8 text                                                                        |
79
+| `CL_TYPE_TNEF`         | Microsoft Outlook & Exchange email attachment format                              |
80
+| `CL_TYPE_UUENCODED`    | UUEncoded (Unix-to-Unix) binary file (Unix email attachment format)               |
81
+| `CL_TYPE_XAR`          | XAR Archive                                                                       |
82
+| `CL_TYPE_XDP`          | Adobe XDP - Embedded PDF                                                          |
83
+| `CL_TYPE_XML_HWP`      | Hangul Word Processor XML (HWPML) Document                                        |
84
+| `CL_TYPE_XML_WORD`     | Microsoft Word 2003 XML Document                                                  |
85
+| `CL_TYPE_XML_XL`       | Microsoft Excel 2003 XML Document                                                 |
86
+| `CL_TYPE_XZ`           | XZ Archive                                                                        |
87
+| `CL_TYPE_ZIP`          | Zip Archive                                                                       |
88
+| `CL_TYPE_ZIPSFX`       | Self-Extracting Zip Archive                                                       |
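The first typing method described at the top of this file (file type magic) can be illustrated with a minimal Python sketch. The `MAGIC_TABLE` entries and the `guess_cl_type` helper are hypothetical and exist only for illustration: the byte sequences are well-known format signatures, but this is not ClamAV's actual magic database, and the fallback to `CL_TYPE_BINARY_DATA` is an assumption.

```python
# Hypothetical magic table: well-known leading byte sequences mapped to
# CL_TYPE names from the table above. This is NOT ClamAV's real magic list.
MAGIC_TABLE = [
    (b"%PDF-", "CL_TYPE_PDF"),
    (b"PK\x03\x04", "CL_TYPE_ZIP"),
    (b"\x1f\x8b", "CL_TYPE_GZ"),
    (b"\x7fELF", "CL_TYPE_ELF"),
    (b"7z\xbc\xaf\x27\x1c", "CL_TYPE_7Z"),
]


def guess_cl_type(data: bytes) -> str:
    """Return the first CL_TYPE whose magic bytes start the buffer."""
    for magic, cl_type in MAGIC_TABLE:
        if data.startswith(magic):
            return cl_type
    # Assumed fallback for unrecognized content.
    return "CL_TYPE_BINARY_DATA"
```

For example, `guess_cl_type(b"%PDF-1.7\n")` selects the PDF entry, while a buffer with no recognized prefix falls through to the assumed default.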
... ...
@@ -21,14 +21,13 @@ Table of Contents
21 21
         - [Logical signatures](#logical-signatures)
22 22
         - [Subsignature Modifiers](#subsignature-modifiers)
23 23
     - [Special Subsignature Types](#special-subsignature-types)
24
-        - [Macro subsignatures (clamav-0.96) : `${min-max}MACROID$`](#macro-subsignatures-clamav-096--min-maxmacroid)
25
-        - [Byte Compare Subsignatures (clamav-0.101) : `subsigid_trigger(offset#byte_options#comparisons)`](#byte-compare-subsignatures-clamav-0101--subsigid_triggeroffsetbyte_optionscomparisons)
26
-        - [PCRE subsignatures (clamav-0.99) : `Trigger/PCRE/[Flags]`](#pcre-subsignatures-clamav-099--triggerpcreflags)
24
+        - [Macro subsignatures](#macro-subsignatures)
25
+        - [Byte Compare Subsignatures](#byte-compare-subsignatures)
26
+        - [PCRE subsignatures](#pcre-subsignatures)
27 27
     - [Icon signatures for PE files](#icon-signatures-for-pe-files)
28 28
     - [Signatures for Version Information metadata in PE files](#signatures-for-version-information-metadata-in-pe-files)
29 29
     - [Trusted and Revoked Certificates](#trusted-and-revoked-certificates)
30 30
     - [Signatures based on container metadata](#signatures-based-on-container-metadata)
31
-    - [Signatures based on ZIP/RAR metadata (obsolete)](#signatures-based-on-ziprar-metadata-obsolete)
32 31
     - [Whitelist databases](#whitelist-databases)
33 32
     - [Signature names](#signature-names)
34 33
     - [Using YARA rules in ClamAV](#using-yara-rules-in-clamav)
... ...
@@ -480,9 +479,17 @@ Keywords used in `TargetDescriptionBlock`:
480 480
 
481 481
 - `NumberOfSections`: Required number of sections in executable (range; 0.96)
482 482
 
483
-- `Container:CL_TYPE_*`: File type of the container which stores the scanned file. Specifying `CL_TYPE_ANY` matches on root objects only.
483
+- `Container:CL_TYPE_*`: File type of the container which stores the scanned file.
484 484
 
485
-- `Intermediates:CL_TYPE_*>CL_TYPE_*`: File types of intermediate containers which stores the scanned file. Specify 1-16 file types separated by ’`>`’ in top-down order (’`>`’ separator not needed for single file type), last type should be the immediate container for the malicious content. `CL_TYPE_ANY` can be used as a wildcard file type. (expr; 0.100.0)
485
+  Specifying `CL_TYPE_ANY` matches on root objects only (i.e. the target file is explicitly _not_ in a container). Chances are slim that you would want to use `CL_TYPE_ANY` in a signature, because placing the malicious file in an archive will then prevent it from alerting.
486
+
487
+  Every ClamAV file type has the potential to be a container for additional files, although some are more likely than others. When a file is parsed and data in the file is identified to be scanned as a unique type, that parent file becomes a container the moment the embedded content is scanned. For a list of possible CL_TYPEs, refer to the [File Types Reference](ClamAV-File-Types.md).
488
+
489
+- `Intermediates:CL_TYPE_*>CL_TYPE_*`: Specify one or more layers of file types containing the scanned file. _This is an alternative to using `Container`._
490
+
491
+  You may specify up to 16 layers of file types separated by ’`>`’ in top-down order. Note that the ’`>`’ separator is not needed if you only specify a single container. The last type should be the immediate container containing the malicious file. Unlike with the `Container` option, `CL_TYPE_ANY` can be used as a wildcard file type. (expr; 0.100.0)
492
+
493
+  For a list of possible CL_TYPEs, refer to the [File Types Reference](ClamAV-File-Types.md).
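The `Intermediates` rules above (1 to 16 layers, `>` separators, top-down order, `CL_TYPE_ANY` as a wildcard) can be sketched in Python. Both helper names are hypothetical, and comparing the expression against the innermost layers of the actual container chain is an assumption drawn from the sentence "the last type should be the immediate container"; this is not ClamAV's matching code.

```python
from typing import List

MAX_LAYERS = 16  # upper bound stated in the documentation text


def parse_intermediates(value: str) -> List[str]:
    """Split an Intermediates value into its top-down list of file types."""
    layers = value.split(">")
    if not 1 <= len(layers) <= MAX_LAYERS:
        raise ValueError("expected 1 to 16 file types")
    for layer in layers:
        # Every layer must be a CL_TYPE_* name with something after the prefix.
        if not layer.startswith("CL_TYPE_") or layer == "CL_TYPE_":
            raise ValueError("not a CL_TYPE_* name: %r" % layer)
    return layers


def matches_container_chain(layers: List[str], actual_chain: List[str]) -> bool:
    """Compare the expression with the innermost containers (assumed semantics).

    actual_chain lists the real containers top-down; its last element is
    the immediate container of the scanned file.
    """
    if len(layers) > len(actual_chain):
        return False
    tail = actual_chain[-len(layers):]
    return all(want == "CL_TYPE_ANY" or want == have
               for want, have in zip(layers, tail))
```

With these helpers, `parse_intermediates("CL_TYPE_ZIP>CL_TYPE_MSOLE2")` yields two layers, and a single-type value needs no `>` separator.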
486 494
 
487 495
 - `IconGroup1`: Icon group name 1 from .idb signature Required engine functionality (range; 0.96)
488 496
 
... ...
@@ -498,9 +505,13 @@ Modifiers for subexpressions:
498 498
 
499 499
 - `A>X`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches (with any of its sigs).
500 500
 
501
-- `A>X,Y`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches and at least Y different signatures must be matched.
501
+- `A>X,Y`: If the SUB-EXPRESSION A refers to a single signature then this signature must get matched more than X times; if it refers to a (logical) block of signatures then this block must generate more than X matches _and_ at least Y different signatures must be matched.
502 502
 
503
-- `A<X` and `A<X,Y` as above with the change of "more" to "less".
503
+- `A<X`: Just like `A>X` above with the change of "more" to "less".
504
+
505
+  If the SUB-EXPRESSION A refers to a single signature then this signature must get matched less than X times; if it refers to a (logical) block of signatures then this block must generate less than X matches (with any of its sigs).
506
+
507
+- `A<X,Y`: Similar to `A>X,Y`. If the SUB-EXPRESSION A refers to a single signature then this signature must get matched less than X times; if it refers to a (logical) block of signatures then this block must generate less than X matches _and_ at least Y different signatures must be matched.
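One way to read the four modifiers above is as predicates over per-signature match counts. The following Python sketch is an interpretation of the prose, not ClamAV's evaluator; the `eval_modifier` name and the list-of-counts representation are assumptions made for illustration.

```python
from typing import List, Optional


def eval_modifier(counts: List[int], op: str, x: int,
                  y: Optional[int] = None) -> bool:
    """Evaluate A>X, A>X,Y, A<X or A<X,Y for sub-expression A.

    counts holds one match count per signature in A; a single signature
    is simply a one-element list.
    """
    total = sum(counts)                          # matches from any signature in A
    distinct = sum(1 for c in counts if c > 0)   # different signatures matched
    if op == ">":
        ok = total > x
    elif op == "<":
        ok = total < x
    else:
        raise ValueError("op must be '>' or '<'")
    if y is not None:
        # The ",Y" form additionally requires at least Y different signatures.
        ok = ok and distinct >= y
    return ok
```

Under this reading, a block whose signatures matched 2, 0 and 1 times satisfies `A>2,2`, whereas a block with counts 3, 0 and 0 does not, because only one distinct signature matched.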
504 508
 
505 509
 Examples:
506 510
 
... ...
@@ -523,7 +534,7 @@ dcf43987e4f519d629b103375;SL+550:6300680065005c0046006900
523 523
 ### Subsignature Modifiers
524 524
 
525 525
 ClamAV (clamav-0.99) supports a number of additional subsignature
526
-modifiers for logical signatures. This is done by specifying ’::’
526
+modifiers for logical signatures. This is done by specifying `::`
527 527
 followed by a number of characters representing the desired options.
528 528
 Signatures using subsignature modifiers require `Engine:81-255` for
529 529
 backwards-compatibility.
... ...
@@ -563,7 +574,11 @@ clamav-wide-C0;Engine:81-255,Target:0;0&1;414141;68656c6c6f::iwfa
563 563
 
564 564
 ## Special Subsignature Types
565 565
 
566
-### Macro subsignatures (clamav-0.96) : <span class="nodecor">`${min-max}MACROID$`</span>
566
+### Macro subsignatures
567
+
568
+Introduced in ClamAV 0.96
569
+
570
+Format: `${min-max}MACROID$`
567 571
 
568 572
 Macro subsignatures are used to combine a number of existing extended
569 573
 signatures (`.ndb`) into an on-the-fly generated alternate string logical
... ...
@@ -599,7 +614,11 @@ to:
599 599
 
600 600
 - For more information and examples please see <https://bugzilla.clamav.net/show_bug.cgi?id=164>.
601 601
 
602
-### Byte Compare Subsignatures (clamav-0.101) : <span class="nodecor">`subsigid_trigger(offset#byte_options#comparisons)`</span>
602
+### Byte Compare Subsignatures
603
+
604
+Introduced in ClamAV 0.101
605
+
606
+Format: `subsigid_trigger(offset#byte_options#comparisons)`
603 607
 
604 608
 Byte compare subsignatures can be used to evaluate a numeric value at a given offset from the start of another (matched) subsignature within the same logical signature. These are executed after all other subsignatures within the logical subsignature are fired, with the exception of PCRE subsignatures. They can evaluate offsets only from a single referenced subsignature, and that subsignature must give a valid match for the evaluation to occur.
605 609
 
... ...
@@ -627,8 +646,11 @@ Byte compare subsignatures can be used to evaluate a numeric value at a given of
627 627
 
628 628
   - `Comparison_value` is a required field which must be a numeric hex or decimal value. If all other conditions are met, the byte compare subsig will evaluate the extracted byte sequence against this number based on the provided `comparison_symbol`.
629 629
 
630
+### PCRE subsignatures
630 631
 
631
-### PCRE subsignatures (clamav-0.99) : <span class="nodecor">`Trigger/PCRE/[Flags]`</span>
632
+Introduced in ClamAV 0.99
633
+
634
+Format: `Trigger/PCRE/[Flags]`
632 635
 
633 636
 PCRE subsignatures are used within a logical signature (`.ldb`) to specify regex matches that execute once triggered by a conditional based on preceding subsignatures. Signatures using PCRE subsignatures require `Engine:81-255` for backwards-compatibility.
634 637
 
... ...
@@ -827,7 +849,7 @@ where the corresponding fields are:
827 827
 
828 828
 - `VirusName:` Virus name to be displayed when signature matches
829 829
 
830
-- `ContainerType:` one of
830
+- `ContainerType:` The file type containing the target file.  For example:
831 831
   - `CL_TYPE_ZIP`,
832 832
   - `CL_TYPE_RAR`,
833 833
   - `CL_TYPE_ARJ`,
... ...
@@ -835,8 +857,10 @@ where the corresponding fields are:
835 835
   - `CL_TYPE_7Z`,
836 836
   - `CL_TYPE_MAIL`,
837 837
   - `CL_TYPE_(POSIX|OLD)_TAR`,
838
-  - `CL_TYPE_CPIO_(OLD|ODC|NEWC|CRC)` or
839
-  - `*` to match any of the container types listed here
838
+  - `CL_TYPE_CPIO_(OLD|ODC|NEWC|CRC)`
839
+
840
+  Use `*` as a wildcard to indicate that the container type may be any file type.
841
+  For a full list of ClamAV file types, see the [ClamAV File Types Reference](ClamAV-File-Types.md).
840 842
 
841 843
 - `ContainerSize:` size of the container file itself (eg. size of the zip archive) specified in bytes as absolute value or range `x-y`
842 844
 
... ...
@@ -846,48 +870,16 @@ where the corresponding fields are:
846 846
 
847 847
 - `FileSizeReal:` usually uncompressed size; for MAIL, TAR and CPIO == `FileSizeInContainer`; absolute value or range
848 848
 
849
-- `IsEncrypted`: 1 if the target file is encrypted, 0 if it’s not and `*` to ignore
849
+- `IsEncrypted:` 1 if the target file is encrypted, 0 if it’s not and `*` to ignore
850 850
 
851
-- `FilePos`: file position in container (counting from 1); absolute value or range
851
+- `FilePos:` file position in container (counting from 1); absolute value or range
852 852
 
853
-- `Res1`: when `ContainerType` is `CL_TYPE_ZIP` or `CL_TYPE_RAR` this field is treated as a CRC sum of the target file specified in hexadecimal format; for other container types it’s ignored
853
+- `Res1:` when `ContainerType` is `CL_TYPE_ZIP` or `CL_TYPE_RAR` this field is treated as a CRC sum of the target file specified in hexadecimal format; for other container types it’s ignored
854 854
 
855
-- `Res2`: not used as of ClamAV 0.96
855
+- `Res2:` not used as of ClamAV 0.96
856 856
 
857 857
 The signatures for container files are stored inside `.cdb` files.
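Several of the numeric fields above accept an absolute value or a range `x-y`, and `IsEncrypted` accepts `*` to ignore the field. A minimal Python sketch of such a field check follows. The `field_matches` helper is hypothetical; treating ranges as inclusive and accepting `*` for every numeric field are assumptions, since the text states `*` explicitly only for `IsEncrypted`.

```python
def field_matches(spec: str, value: int) -> bool:
    """Check a numeric field value against '*', 'N', or 'x-y'."""
    if spec == "*":
        return True                      # field is ignored
    if "-" in spec:
        low, high = spec.split("-", 1)
        # Assumption: both ends of the range are inclusive.
        return int(low) <= value <= int(high)
    return value == int(spec)            # absolute value
```

For instance, `field_matches("100-200", 150)` holds for a `ContainerSize` of 150 bytes, and `field_matches("1", 1)` matches the first file position in a container.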
858 858
 
859
-## Signatures based on ZIP/RAR metadata (obsolete)
860
-
861
-The (now obsolete) archive metadata signatures can be only applied to
862
-ZIP and RAR files and have the following format:
863
-
864
-```
865
-    virname:encrypted:filename:normal size:csize:crc32:cmethod:
866
-    fileno:max depth
867
-```
868
-
869
-where the corresponding fields are:
870
-
871
-- Virus name
872
-
873
-- Encryption flag (1 – encrypted, 0 – not encrypted)
874
-
875
-- File name (this is a regular expression - \* to ignore)
876
-
877
-- Normal (uncompressed) size (\* to ignore)
878
-
879
-- Compressed size (\* to ignore)
880
-
881
-- CRC32 (\* to ignore)
882
-
883
-- Compression method (\* to ignore)
884
-
885
-- File position in archive (\* to ignore)
886
-
887
-- Maximum number of nested archives (\* to ignore)
888
-
889
-The database file should have the extension of `.zmd` or `.rmd` for zip or rar metadata respectively.
890
-
891 859
 ## Whitelist databases
892 860
 
893 861
 To whitelist a specific file use the MD5 signature format and place it inside a database file with the extension of `.fp`. To whitelist a specific file with the SHA1 or SHA256 file hash signature format, place the signature inside a database file with the extension of `.sfp`. To whitelist a specific signature from the database you just add its name into a local file called local.ign2 stored inside the database directory. You can additionally follow the signature name with the MD5 of the entire database entry for this signature, eg:
... ...
@@ -1263,7 +1263,7 @@ int32_t json_get_int(int32_t objid);
1263 1263
 //double json_get_double(int32_t objid);
1264 1264
 
1265 1265
 /* ----------------- END 0.98.4 APIs ---------------------------------- */
1266
-/* ----------------- BEGIN 0.100.0 APIs ------------------------------- */
1266
+/* ----------------- BEGIN 0.101.0 APIs ------------------------------- */
1267 1267
 /* ----------------- Scan Options APIs -------------------------------- */
1268 1268
 /**
1269 1269
 \group_engine