Add page with use info related to ClamAV dev

Andrew authored on 2018/10/06 12:19:42
Showing 1 changed files
new file mode 100644
@@ -0,0 +1,400 @@
0
+# ClamAV Development
1
+This page aims to provide information useful when developing, debugging, or
2
+profiling ClamAV.
3
+
4
+## Building ClamAV for Development
5
+Below are some recommendations for building ClamAV so that it's easy to debug:
6
+
7
+### Running ./configure
8
+Suggestions:
9
+
10
+- Modify the CFLAGS variable as follows (assuming you're building with gcc):
11
+
12
+    - Include gdb debugging information (`-ggdb`).  This will make it easier
13
+      to debug with gdb
14
+
15
+    - Disable optimizations (`-O0`).  This will ensure the line numbers you
16
+      see in gdb match up with what is actually being executed.
17
+
18
+- Run configure with the following options:
19
+
20
+    - ``--prefix=`pwd`/built``: This will cause `make install` to install
21
+      into the specified directory to avoid potentially tainting a release
22
+      install of ClamAV that you may have.
23
+
24
+    - `--enable-debug`: This will define *CL_DEBUG*, which mostly just enables
25
+      additional print statements that are useful for debugging
26
+
27
+    - `--enable-check`: Enables the unit tests, which can be run with 'make
28
+      check'
29
+
30
+    - `--enable-coverage`: If using gcc, sets `-fprofile-arcs -ftest-coverage`
31
+      so that code coverage metrics will get generated when the program is run.
32
+      Note that the code inserted to store program flow data may show up in
33
+      any generated flame graphs or profiling output, so if you don't care
34
+      about code coverage, omit this
35
+
36
+    - `--enable-libjson`: Enables libjson, which enables the `--gen-json` option.
37
+      The json output contains additional metadata that might be helpful when
38
+      debugging.
39
+
40
+    - `--enable-static --disable-shared`: This will only build libclamav and
41
+      the supporting libraries as static libraries, and will result in the
42
+      clamscan that is built having this code embedded.  This is useful for
43
+      running programs like gprof which don't handle profiling code in shared
44
+      objects.
45
+
46
+    - `--with-systemdsystemunitdir=no`: Don't try to register clamd as a
47
+      systemd service
48
+
49
+    - You might want to include the following flags also so that the optional
50
+      functionality is enabled: `--enable-experimental --enable-clamdtop
51
+      --enable-libjson --enable-milter --enable-xml --enable-pcre`.
52
+      Note that this may require you to install additional development
53
+      libraries.
54
+
55
+    - I ran into problems building with llvm on Ubuntu 18.04, so add
56
+      `--disable-llvm`
57
+
58
+Altogether, the following configure command can be used:
59
+
60
+```
61
+CFLAGS="-ggdb -O0" ./configure --prefix=`pwd`/built --enable-debug --enable-check --enable-coverage --enable-libjson --enable-static --disable-shared --with-systemdsystemunitdir=no --enable-experimental --enable-clamdtop --enable-xml --enable-pcre --disable-llvm
62
+```
63
+To satisfy all library dependencies, something like this should work
64
+(from Ubuntu 18.04):
65
+```
66
+sudo apt-get install git gcc libxml2-dev libssl-dev make libmilter-dev libcurl4-openssl-dev libjson-c-dev check pkgconf libncurses5-dev libpcre3-dev g++ libtool libbz2-dev
67
+```
68
+
69
+### Running make
70
+Run the following to finish building.  `-j2` in the code below is used to
71
+indicate that the build process should use 2 cores.  Increase this if your
72
+machine is more powerful.
73
+```
74
+make -j2
75
+make install
76
+```
77
+Also, you can run 'make check' to run the unit tests
78
+
79
+### Downloading the Official Ruleset
80
+If you plan to use custom rules for testing, you can invoke clamscan via
81
+`./built/bin/clamscan`, specifying your custom rule files via `-d` parameters.
82
+
83
+If you want to download the official ruleset to use with clamscan, do the
84
+following:
85
+1. Run `mkdir -p built/share/clamav`
86
+2. Comment out line 8 of etc/freshclam.conf.sample
87
+3. Run `./built/bin/freshclam --config-file etc/freshclam.conf.sample`
88
+
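Step 2 above can also be scripted. The sketch below assumes that line 8 of the sample config is the bare `Example` directive (check this in your checkout before relying on it); `comment_out_example` is a hypothetical helper name, not something shipped with ClamAV.

```shell
# Assumption: the line to disable is the bare "Example" directive.
comment_out_example() {
    # Prefix the directive with '#' in place; every other line is left alone.
    # Note: 'sed -i' without a suffix argument is the GNU sed form.
    sed -i 's/^Example$/#Example/' "$1"
}

# Usage (from the top of the source tree):
#   comment_out_example etc/freshclam.conf.sample
```

Matching on the directive rather than on the line number keeps the helper working if the sample file gains or loses a comment line.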
89
+## General Debugging
90
+### Useful clamscan Flags
91
+The following are useful flags to include when debugging clamscan:
92
+
93
+- `--debug --verbose`: Print lots of helpful debug information
94
+
95
+- `--gen-json`: Print some additional debug information in a JSON format
96
+
97
+- `--statistics=pcre --statistics=bytecode`:  Print execution statistics on any
98
+  PCRE and bytecode rules that were evaluated
99
+
100
+- `--dev-performance`: Print per-file statistics regarding how long scanning
101
+  took and the times spent in various scanning stages
102
+
103
+- `--detect-broken`: This will attempt to detect broken executable files.  If
104
+  an executable is determined to be broken, some functionality might not get
105
+  invoked for the sample, and this could be an indication of an issue parsing
106
+  the PE header or file.  This causes those binaries to generate an alert instead
107
+  of just continuing on.
108
+
109
+- `--max-filesize=2000M --max-scansize=2000M --max-files=2000000
110
+   --max-recursion=2000000 --max-embeddedpe=2000M --max-htmlnormalize=2000000
111
+   --max-htmlnotags=2000000 --max-scriptnormalize=2000000
112
+   --max-ziptypercg=2000000 --max-partitions=2000000 --max-iconspe=2000000
113
+   --max-rechwp3=2000000 --pcre-match-limit=2000000
114
+   --pcre-recmatch-limit=2000000 --pcre-max-filesize=2000M`:
115
+  Effectively disables all file limits and maximums for scanning.  This is
116
+  useful if you'd like to ensure that all files in a set get scanned, and would
117
+  prefer clam to just run slowly or crash rather than skip a file because it
118
+  encounters one of these thresholds
119
+
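Since the limit-raising flags are long, it can help to keep them in one place. This is a minimal sketch: the flag values are copied from the list above, while the variable name `NO_LIMITS` and the helper `scan_without_limits` are made up for illustration.

```shell
# Limit-raising flags copied from the list above, kept in a single variable.
NO_LIMITS="--max-filesize=2000M --max-scansize=2000M --max-files=2000000 \
--max-recursion=2000000 --max-embeddedpe=2000M --max-htmlnormalize=2000000 \
--max-htmlnotags=2000000 --max-scriptnormalize=2000000 \
--max-ziptypercg=2000000 --max-partitions=2000000 --max-iconspe=2000000 \
--max-rechwp3=2000000 --pcre-match-limit=2000000 \
--pcre-recmatch-limit=2000000 --pcre-max-filesize=2000M"

scan_without_limits() {
    # $NO_LIMITS is deliberately unquoted so it splits into separate flags.
    ./built/bin/clamscan $NO_LIMITS "$@"
}

# Usage:
#   scan_without_limits --debug --verbose -d /tmp/test.ldb /path/to/sample
```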
120
+The following are useful flags to include when debugging rules that you're
121
+writing:
122
+
123
+- `-d`: Allows you to specify a custom ClamAV rule file from the command line
124
+
125
+- `--bytecode-unsigned`: If you are testing custom bytecode rules, you'll need
126
+  this flag so that clamscan actually runs the bytecode signature
127
+
128
+- `--all-match`: Allows multiple signatures to match on a file being scanned
129
+
130
+- `--leave-temps --tempdir=/tmp`: By default, ClamAV will attempt to extract
131
+  embedded files that it finds, normalize certain text files before looking
132
+  for matches, and unpack packed executables that it has unpacking support for.
133
+  These flags tell ClamAV to write these intermediate files out to the
134
+  directory specified.  Usually when a file is written, it will mention the
135
+  file name in the --debug output, so you can have some idea at what stage in
136
+  the scanning process a tmp file was created.
137
+
138
+- `--dumpcerts`: For signed PE files that match a rule, display information
139
+  about the certificates stored within the binary.  Note - sigtool has this
140
+  functionality as well and doesn't require a rule match to view the cert data
141
+
142
+### Useful sigtool Flags
143
+sigtool pulls in libclamav and provides shortcuts to doing tasks that clamscan
144
+does behind the scenes.  These can be really useful when writing a signature or
145
+trying to get information about a signature that might be causing false positives or
146
+performance problems.
147
+
148
+The following sigtool flags can be useful when debugging:
149
+
150
+- `--unpack`: Unpack the specified CVD/CLD file
151
+
152
+- `--decode`: Given a ClamAV signature from STDIN, show a more user-friendly
153
+  representation of it
154
+
155
+- `--hex-dump`: Given a sequence of bytes from STDIN, print the hex equivalent
156
+
157
+- `--mdb`: Generate section hashes of the specified file
158
+
159
+- `--imp`: Generate import hashes of the specified file
160
+
161
+- `--html-normalise`: Normalize the specified HTML file in the way that
162
+  clamscan will before looking for rule matches.  This makes it easier to write
163
+  rules that will actually match.
164
+
165
+- `--ascii-normalise`: Normalize the specified ASCII text file in the way that
166
+  clamscan will before looking for rule matches
167
+
168
+- `--print-certs`: Print the Authenticode signatures of any PE files specified.
169
+  This is useful when writing signature-based .crb rule files.
170
+
171
+- `--vba`: Extract VBA/Word6 macro code
172
+
173
+### Using gdb
174
+Given that you might want to pass a lot of arguments to clamscan when running
175
+it under gdb, consider taking advantage of gdb's `--args` parameter.  For example:
176
+```
177
+gdb --args ./built/bin/clamscan -d /tmp/test.ldb -d /tmp/blacklist.crb --dumpcerts --debug --verbose --max-filesize=2000M --max-scansize=2000M --max-files=2000000 --max-recursion=2000000 --max-embeddedpe=2000M --max-iconspe=2000000 f8f101166fec5785b4e240e4b9e748fb6c14fdc3cd7815d74205fc59ce121515
178
+```
179
+
180
+When using ClamAV without libclamav statically linked, if you set breakpoints
181
+on libclamav functions by name, you'll need to make sure to indicate that
182
+the breakpoints should be resolved after libraries have been loaded.
183
+
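One way to get breakpoints resolved after the libraries load is gdb's pending-breakpoint setting. The wrapper below is a sketch only: `debug_clamscan` is a hypothetical helper name, and `cl_scandesc_callback` is just an example of a libclamav function to break on; substitute whichever function you are interested in.

```shell
debug_clamscan() {
    # 'set breakpoint pending on' tells gdb to keep a breakpoint whose symbol
    # is not known yet and resolve it once the shared library is loaded.
    gdb -ex 'set breakpoint pending on' \
        -ex 'break cl_scandesc_callback' \
        --args ./built/bin/clamscan "$@"
}

# Usage:
#   debug_clamscan -d /tmp/test.ldb --debug --verbose /path/to/sample
```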
184
+For other documentation about how to use gdb, check out the following
185
+resources:
186
+ - [A Guide to gdb](http://www.cabrillo.edu/~shodges/cs19/progs/guide_to_gdb_1.1.pdf)
187
+ - [gdb Quick Reference](http://users.ece.utexas.edu/~adnan/gdb-refcard.pdf)
188
+
189
+## Hunting for Memory Leaks
190
+You can easily hunt for memory leaks with valgrind.  Check out this guide to
191
+get started:
192
+ - [Valgrind Quick Start](http://valgrind.org/docs/manual/quick-start.html)
193
+If checking for leaks, be sure to run clamscan with samples that will hit as
194
+many of the unique code paths in the code you are testing as possible.  An example
195
+invocation is as follows:
196
+```
197
+valgrind --leak-check=full ./built/bin/clamscan -d /tmp/test.ldb --leave-temps --tempdir /tmp/test --debug --verbose /tmp/upx-samples/ > /tmp/upx-results-2.txt 2>&1
198
+```
199
+Alternatively, on Linux, you can use glibc's built-in heap checking, which
200
+catches errors such as double frees and simple overruns rather than leaks:
201
+```
202
+MALLOC_CHECK_=7 ./built/bin/clamscan
203
+```
204
+See the [mallopt man page](http://manpages.ubuntu.com/manpages/trusty/man3/mallopt.3.html) for more details
205
+
206
+## Computing Code Coverage
207
+gcov/lcov can be used to produce a code coverage report indicating which lines
208
+of code were executed on a single run or by multiple runs of clamscan.  NOTE:
209
+for these metrics to be collected, ClamAV needs to have been configured with
210
+the `--enable-coverage` option.
211
+
212
+First, run the following to zero out all of the coverage counters:
213
+```
214
+lcov -z --directory . --output-file coverage.lcov.data
215
+```
216
+Next, run ClamAV through whatever test cases you have.  Then, run lcov again
217
+to collect the coverage data as follows:
218
+```
219
+lcov -c --directory . --output-file coverage.lcov.data
220
+```
221
+Finally, run the genhtml tool that ships with lcov to produce the code coverage
222
+report:
223
+```
224
+genhtml coverage.lcov.data --output-directory report
225
+```
226
+The report directory will have an index.html page which can be loaded into any
227
+web browser.
228
+
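The three coverage steps can be chained in a small wrapper. This is a sketch under the assumption that it is run from the top of a tree configured with `--enable-coverage`; `coverage_report` is a hypothetical helper name and the lcov/genhtml invocations are the ones shown above.

```shell
coverage_report() {
    # 1. Reset the counters left over from any previous run.
    lcov -z --directory . --output-file coverage.lcov.data || return 1
    # 2. Run the test command passed as arguments.  Its exit status is not
    #    checked, since a scan that finds a match exits non-zero.
    "$@"
    # 3. Collect the counters and render the HTML report.
    lcov -c --directory . --output-file coverage.lcov.data || return 1
    genhtml coverage.lcov.data --output-directory report
}

# Usage:
#   coverage_report ./built/bin/clamscan -d /tmp/test.ldb /tmp/samples/
```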
229
+For more information, visit the [lcov webpage](http://ltp.sourceforge.net/coverage/lcov.php)
230
+
231
+## Profiling - Flame Graphs
232
+[FlameGraph](https://github.com/brendangregg/FlameGraph) is a great tool for
233
+generating interactive flame graphs based on collected profiling data.  The github
234
+page has thorough documentation on how to use the tool, but an overview is
235
+presented below:
236
+
237
+First, install perf, which on Linux can be done via:
238
+```
239
+apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
240
+```
241
+
242
+Modify the system settings to allow perf record to be run by a standard user:
243
+```
244
+$ sudo su
245
+# cat /proc/sys/kernel/perf_event_paranoid 
246
+# echo "1" > /proc/sys/kernel/perf_event_paranoid 
247
+# exit
248
+```
249
+
250
+Invoke clamscan via perf record as follows, and run perf script to collect the
251
+profiling data:
252
+```
253
+perf record -F 100 -g -- ./built/bin/clamscan -d /tmp/test.ldb /tmp/2aa6b18d509090c60c3e4ecdd8aeb16e5f149807e3404c86892112710eab576d
254
+perf script > out.perf
255
+```
256
+The '-F' parameter sets the sampling frequency, i.e. how many samples are
256
+collected per second of program execution.  If your scan will take a long
257
+time to run, a lower value should be sufficient.  Otherwise, consider choosing
258
+a higher value (on Ubuntu 18.04, 7250 is the max frequency, but it can be
259
+increased via /proc/sys/kernel/perf_event_max_sample_rate).
261
+
262
+Check out the FlameGraph project and run the following commands to generate
263
+the flame graph:
264
+```
265
+perl stackcollapse-perf.pl ../clamav-devel/out.perf > /tmp/out.folded
266
+perl flamegraph.pl /tmp/out.folded > /tmp/test.svg
267
+```
268
+
269
+The SVG that is generated is interactive, but some viewers don't support this.
270
+Be sure to open it in a web browser like Chrome to be able to take full
271
+advantage of it.
272
+
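The record, collapse, and render steps can be combined into one helper. This is a sketch, not part of FlameGraph or ClamAV: `make_flamegraph` and the `FLAMEGRAPH_DIR` variable are hypothetical names, and the default checkout location is an assumption you will likely need to change.

```shell
# Assumption: the FlameGraph repository is checked out here.
FLAMEGRAPH_DIR=${FLAMEGRAPH_DIR:-$HOME/FlameGraph}

make_flamegraph() {
    svg=$1
    shift
    # Sample the given command at 100 Hz with call graphs; its exit status is
    # not checked, since a scan that finds a match exits non-zero.
    perf record -F 100 -g -- "$@"
    perf script > out.perf || return 1
    # Fold the stacks, then render the interactive SVG.
    perl "$FLAMEGRAPH_DIR/stackcollapse-perf.pl" out.perf > out.folded || return 1
    perl "$FLAMEGRAPH_DIR/flamegraph.pl" out.folded > "$svg"
}

# Usage:
#   make_flamegraph /tmp/test.svg ./built/bin/clamscan -d /tmp/test.ldb /path/to/sample
```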
273
+## Profiling - Callgrind
274
+Callgrind is a profiling tool included with valgrind.  To profile clamscan
274
+with it, prepend `valgrind --tool=callgrind ` to the clamscan command.
276
+[kcachegrind](https://kcachegrind.github.io/html/Home.html)
277
+is a follow-on tool that will graphically present the
278
+profiling data and allow you to explore it visually, although if you don't
279
+already use KDE you'll have to install lots of extra packages to use it.
280
+
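If installing kcachegrind is not an option, valgrind also ships `callgrind_annotate`, which prints a text summary of the same data. The helper below is a sketch; `profile_with_callgrind` and the output path are made-up names for illustration.

```shell
profile_with_callgrind() {
    # Write the profile to a fixed path instead of the default callgrind.out.<pid>.
    out=/tmp/callgrind.out.clamscan
    valgrind --tool=callgrind --callgrind-out-file="$out" "$@"
    # Show the start of the per-function cost summary.
    callgrind_annotate "$out" | head -n 40
}

# Usage:
#   profile_with_callgrind ./built/bin/clamscan -d /tmp/test.ldb /path/to/sample
```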
281
+## System Call Tracing / Fault Injection
282
+strace can be used to track the system calls that are performed and provide the
283
+number of calls / time spent in each system call.  This can be done by
284
+prepending `strace -c ` to a clamscan command.  Results will look something
285
+like this:
286
+```
287
+% time     seconds  usecs/call     calls    errors syscall
288
+------ ----------- ----------- --------- --------- ----------------
289
+ 95.04    0.831430          13     62518           read
290
+  3.22    0.028172          14      2053           munmap
291
+  0.69    0.006005           3      2102           mmap
292
+  0.28    0.002420           7       344           pread64
293
+  0.16    0.001415           5       305         1 openat
294
+  0.13    0.001108           3       405           write
295
+  0.11    0.000932          23        40           mprotect
296
+  0.07    0.000632           2       310           close
297
+  0.07    0.000583           9        67        30 access
298
+  0.05    0.000395           1       444           lseek
299
+  0.04    0.000344           2       162           fstat
300
+  0.04    0.000338           1       253           brk
301
+  0.03    0.000262           1       422           fcntl
302
+  0.02    0.000218          16        14           futex
303
+  0.01    0.000119           1       212           getpid
304
+  0.01    0.000086          14         6           getdents
305
+  0.00    0.000043           7         6           dup
306
+  0.00    0.000040           1        31           unlink
307
+  0.00    0.000038          19         2           rt_sigaction
308
+  0.00    0.000037          19         2           rt_sigprocmask
309
+  0.00    0.000029           1        37           stat
310
+  0.00    0.000022          11         2           prlimit64
311
+  0.00    0.000021          21         1           sysinfo
312
+  0.00    0.000020           1        33           clock_gettime
313
+  0.00    0.000019          19         1           arch_prctl
314
+  0.00    0.000018          18         1           set_tid_address
315
+  0.00    0.000018          18         1           set_robust_list
316
+  0.00    0.000013           0        60           lstat
317
+  0.00    0.000011           0        65           madvise
318
+  0.00    0.000002           0        68           geteuid
319
+  0.00    0.000000           0         1           execve
320
+  0.00    0.000000           0         1           uname
321
+  0.00    0.000000           0         1           getcwd
322
+------ ----------- ----------- --------- --------- ----------------
323
+100.00    0.874790                 69970        31 total
324
+```
325
+
326
+strace can also be used for cool things like system call fault injection.  For
327
+instance, I was curious whether the 'read' bytecode API call was implemented
328
+in such a way that the underlying read system call could handle EINTR being
329
+returned (which can happen periodically).  To test this, I wrote the following
330
+bytecode rule:
331
+```
332
+VIRUSNAME_PREFIX("BC.Heuristic.Test.Read.Passed")
333
+VIRUSNAMES("")
334
+TARGET(0)
335
+
336
+SIGNATURES_DECL_BEGIN
337
+DECLARE_SIGNATURE(zeroes)
338
+SIGNATURES_DECL_END
339
+
340
+SIGNATURES_DEF_BEGIN
341
+DEFINE_SIGNATURE(zeroes, "0:0000")
342
+SIGNATURES_DEF_END
343
+
344
+bool logical_trigger()
345
+{
346
+    return matches(Signatures.zeroes);
347
+}
348
+
349
+#define READ_S(value, size) if (read(value, size) != size) return 0;
350
+
351
+int entrypoint(void)
352
+{
353
+    char buffer[65536];
354
+    int i;
355
+
356
+    for (i = 0; i < 256; i++)
357
+    {
358
+        debug(i);
359
+        debug("\n");
360
+        READ_S(buffer, sizeof(buffer));
361
+    }
362
+
363
+    foundVirus("");
364
+    return 0;
365
+}
366
+```
367
+I compiled the rule, made a test file to match against, and ran it under
368
+strace to determine what underlying read system call was used for the bytecode
369
+read function:
370
+```
371
+clambc-compiler read_test.bc
372
+dd if=/dev/zero of=/tmp/zeroes bs=65536 count=256
373
+strace clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes
374
+```
375
+It uses pread64 under the hood, so the following command could be used for fault
376
+injection:
377
+```
378
+strace -e fault=pread64:error=EINTR:when=20+10 clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes 
379
+```
380
+This command tells strace to leave the first 19 pread64 calls alone (these
380
+appear to be used by the loader, which didn't seem to handle EINTR correctly)
381
+but to inject EINTR on the 20th call and every 10th call afterward.  We can see
382
+the injection in action and that the system call is retried successfully:
384
+```
385
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15007744) = 65536
386
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15073280) = 65536
387
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15138816) = 65536
388
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15204352) = 65536
389
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15269888) = 65536
390
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15335424) = 65536
391
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15400960) = 65536
392
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15466496) = 65536
393
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15532032) = 65536
394
+pread64(3, 0x7f6a7ff43000, 65536, 15597568) = -1 EINTR (Interrupted system call) (INJECTED)
395
+pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15597568) = 65536
396
+```
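To make the `when=20+10` schedule concrete, the sketch below enumerates which invocation numbers would be hit, reading the expression as `first+step`. `injected_calls` is a hypothetical helper written for illustration and does not call strace.

```shell
# Print the invocation numbers selected by when=FIRST+STEP, up to LAST.
injected_calls() {
    first=$1
    step=$2
    last=$3
    n=$first
    while [ "$n" -le "$last" ]; do
        echo "$n"
        n=$((n + step))
    done
}

injected_calls 20 10 60
```

For `20 10 60` this prints 20, 30, 40, 50 and 60, one per line, which matches the trace above: nine successful calls followed by one injected failure.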
397
+More documentation on this feature can be found in
398
+[this presentation](https://archive.fosdem.org/2017/schedule/event/failing_strace/attachments/slides/1630/export/events/attachments/failing_strace/slides/1630/strace_fosdem2017_ta_slides.pdf)
399
+from FOSDEM 2017.