Some formatting adjustments to the development guide. Added development guide to UserManual.md table of contents.

Micah Snyder authored on 2018/11/03 09:58:12
Showing 2 changed files
... ...
@@ -12,9 +12,10 @@ Table Of Contents
12 12
     * [Windows](UserManual/Installation-Windows.md)
13 13
 3. [Configuring ClamAV](UserManual/Configuration.md)
14 14
 4. [Using ClamAV](UserManual/Usage.md)
15
-5. [Build \[lib\]ClamAV Into Your Programs](UserManual/libclamav.md)
16
-6. [Writing ClamAV Signatures](UserManual/Signatures.md)
17
-7. [Writing ClamAV Phishing Signatures](UserManual/PhishSigs.md)
15
+5. [ClamAV Developer Tips and Tricks](UserManual/development.md)
16
+6. [Build \[lib\]ClamAV Into Your Programs](UserManual/libclamav.md)
17
+7. [Writing ClamAV Signatures](UserManual/Signatures.md)
18
+8. [Writing ClamAV Phishing Signatures](UserManual/PhishSigs.md)
18 19
 
19 20
 -----
20 21
 
... ...
@@ -1,27 +1,57 @@
1 1
 # ClamAV Development
2
-This page aims to provide information useful when developing, debugging, or
3
-profiling ClamAV.
2
+
3
+Table of Contents
4
+
5
+- [ClamAV Development](#clamav-development)
6
+    - [Introduction](#introduction)
7
+    - [Building ClamAV for Development](#building-clamav-for-development)
8
+        - [Satisfying Build Dependencies](#satisfying-build-dependencies)
9
+            - [Debian/Ubuntu](#debianubuntu)
10
+            - [CentOS/RHEL/Fedora](#centosrhelfedora)
11
+            - [Solaris (using OpenCSW)](#solaris-using-opencsw)
12
+            - [FreeBSD](#freebsd)
13
+        - [Download the Source](#download-the-source)
14
+        - [Running ./configure](#running-configure)
15
+        - [Running make](#running-make)
16
+        - [Downloading the Official Ruleset](#downloading-the-official-ruleset)
17
+    - [General Debugging](#general-debugging)
18
+        - [Useful clamscan Flags](#useful-clamscan-flags)
19
+        - [Using gdb](#using-gdb)
20
+    - [Hunting for Memory Leaks](#hunting-for-memory-leaks)
21
+    - [Computing Code Coverage](#computing-code-coverage)
22
+    - [Profiling - Flame Graphs](#profiling---flame-graphs)
23
+    - [Profiling - Callgrind](#profiling---callgrind)
24
+    - [System Call Tracing / Fault Injection](#system-call-tracing--fault-injection)
25
+
26
+## Introduction
27
+
28
+This page aims to provide information useful when developing, debugging, or profiling ClamAV.
4 29
 
5 30
 ## Building ClamAV for Development
31
+
6 32
 Below are some recommendations for building ClamAV so that it's easy to debug.
7 33
 
8 34
 ### Satisfying Build Dependencies
35
+
9 36
 To satisfy all build dependencies:
10 37
 
11
-#### Debian/Ubuntu:
12
-```
38
+#### Debian/Ubuntu
39
+
40
+```bash
13 41
 sudo apt-get install libxml2-dev libxml2 libbz2-dev bzip2 check make libssl-dev openssl zlib1g zlib1g-dev gcc gettext autoconf automake libtool cmake autoconf-archive pkg-config g++-multilib libmilter1.0.1 libmilter-dev valgrind libcurl4-openssl-dev libjson-c-dev ncurses-dev libpcre3-dev
14 42
 ```
15 43
 
16 44
 #### CentOS/RHEL/Fedora
17
-```
45
+
46
+```bash
18 47
 sudo yum install libxml2-devel libxml2 bzip2-devel bzip2 check make openssl-devel openssl zlib zlib-devel gcc gettext autoconf automake libtool cmake autoreconf pkg-config g++-multilib sendmail sendmail-devel libtool-ltdl-devel valgrind
19 48
 
20 49
 sudo yum groupinstall "Development Tools"
21 50
 ```
22 51
 
23 52
 #### Solaris (using OpenCSW)
24
-```
53
+
54
+```bash
25 55
 sudo /opt/csw/bin/pkgutil -y -i common coreutils automake autoconf libxml2_2 libxml2_dev bzip2 libbz2_dev libcheck0 libcheck_dev gmake cmake libssl1_0_0 libssl_dev openssl_utils libgcc_s1 libiconv2 zlib1 libstdc++6 libpcre1 libltdl7 lzlib_stub zlib_stub libmilter libtool ggrep gsed pkgconfig ggettext gcc4core gcc4g++ libgcc_s1 libgccpp1
26 56
 
27 57
 sudo pkg install system/header
... ...
@@ -30,293 +60,229 @@ sudo ln -sf /opt/csw/bin/gnm /usr/bin/nm
30 30
 sudo ln -sf /opt/csw/bin/gsed /usr/bin/sed
31 31
 sudo ln -sf /opt/csw/bin/gmake /usr/bin/make
32 32
 ```
33
-If you receive an error message like
34
-`gcc: error: /opt/csw/lib/libstdc++.so: No such file or directory`,
35
-change versions with `/opt/csw/sbin/alternatives --config automake`
33
+
34
+If you receive an error message like `gcc: error: /opt/csw/lib/libstdc++.so: No such file or directory`, change versions with `/opt/csw/sbin/alternatives --config automake`.
36 35
 
37 36
 #### FreeBSD
37
+
38 38
 The easiest way to install dependencies for FreeBSD is to just rely on ports:
39
-```
39
+
40
+```bash
40 41
 cd /usr/ports/security/clamav
41 42
 make
42 43
 ```
43 44
 
44 45
 ### Download the Source
45
-```
46
+
47
+```bash
46 48
 git clone https://github.com/Cisco-Talos/clamav-devel.git
47 49
 cd clamav-devel
48 50
 ```
49 51
 
50
-If you intend to make changes and submit a pull request, fork the clamav-devel
51
-repo first and then clone your fork of the repository.
52
+If you intend to make changes and submit a pull request, fork the clamav-devel repo first and then clone your fork of the repository.
52 53
 
53 54
 ### Running ./configure
55
+
54 56
 Suggestions:
55 57
 
56
-- Modify the CFLAGS variable as follows (assuming you're build with gcc):
58
+- Modify the `CFLAGS` variable as follows (assuming you're building with gcc):
57 59
 
58
-    - Include gdb debugging information (`-ggdb`).  This will make it easier
59
-      to debug with gdb
60
+  - Include `gdb` debugging information (`-ggdb`).  This will make it easier to debug with `gdb`.
60 61
 
61
-    - Disable optimizations (`-O0`).  This will ensure the line numbers you
62
-      see in gdb match up with what is actually being executed.
62
+  - Disable optimizations (`-O0`).  This will ensure the line numbers you see in `gdb` match up with what is actually being executed.
63 63
 
64 64
 - Run configure with the following options:
65 65
 
66
-    - ``--prefix=`pwd`/build``: This will cause `make install` to install
67
-      into the specified directory to avoid potentially tainting a release
68
-      install of ClamAV that you may have.
69
-
70
-    - `--enable-debug`: This will define *CL_DEBUG*, which mostly just enables
71
-      additional print statements that are useful for debugging
72
-
73
-    - `--enable-check`: Enables the unit tests, which can be run with 'make
74
-      check'
75
-
76
-    - `--enable-coverage`: If using gcc, sets `-fprofile-arcs -ftest-coverage`
77
-      so that code coverage metrics will get generated when the program is run.
78
-      Note that the code inserted to store program flow data may show up in
79
-      any generated flame graphs or profiling output, so if you don't care
80
-      about code coverage, omit this
81
-
82
-    - `--enable-libjson`: Enables libjson, which enables the `--gen-json` option.
83
-      The json output contains additional metadata that might be helpful when
84
-      debugging.
85
-
86
-    - `--with-systemdsystemunitdir=no`: Don't try to register clamd as a
87
-      systemd service (on systems that use systemd). You likely don't want this
88
-      development build of clamd to register as a service, and this eliminates
89
-      the need to run `make install` with `sudo`.
90
-
91
-    - You might want to include the following flags also so that the optional
92
-      functionality is enabled: `--enable-experimental --enable-clamdtop
93
-      --enable-libjson --enable-milter --enable-xml --enable-pcre`.
94
-      Note that this may require you to install additional development
95
-      libraries.
96
-
97
-    - `--disable-llvm`: When enabled, LLVM provides the capability to
98
-      just-in-time compile ClamAV bytecode signatures. Without LLVM, ClamAV
99
-      uses a built-in bytecode interpreter to execute bytecode signatures.
100
-      The mechanism is different, but the results are same and the performance
101
-      overall is comparable.  At present only LLVM versions up to LLVM 3.6.2
102
-      are supported by ClamAV, and LLVM 3.6.2 is old enough that newer
103
-      distributions no longer provide it. Therefore, we recommend using
104
-      the `--disable-llvm` configure option.
66
+  - ``--prefix=`pwd`/installed``: This will cause `make install` to install into the specified directory, which avoids tainting any release install of ClamAV that you may have.
67
+
68
+  - `--enable-debug`: This will define *CL_DEBUG*, which mostly just enables additional print statements that are useful for debugging.
69
+
70
+  - `--enable-check`: Enables the unit tests, which can be run with `make check`.
71
+
72
+  - `--enable-coverage`: If using gcc, sets `-fprofile-arcs -ftest-coverage` so that code coverage metrics will get generated when the program is run. Note that the code inserted to store program flow data may show up in any generated flame graphs or profiling output, so if you don't care about code coverage, omit this.
73
+
74
+  - `--enable-libjson`: Enables `libjson`, which enables the `--gen-json` option. The JSON output contains additional metadata that might be helpful when debugging.
75
+
76
+  - `--with-systemdsystemunitdir=no`: Don't try to register `clamd` as a `systemd` service (on systems that use `systemd`). You likely don't want this development build of `clamd` to register as a service, and this eliminates the need to run `make install` with `sudo`.
77
+
78
+  - You might want to include the following flags also so that the optional functionality is enabled: `--enable-experimental --enable-clamdtop --enable-libjson --enable-milter --enable-xml --enable-pcre`. Note that this may require you to install additional development libraries.
79
+
80
+  - `--disable-llvm`: When enabled, LLVM provides the capability to just-in-time compile ClamAV bytecode signatures. Without LLVM, ClamAV uses a built-in bytecode interpreter to execute bytecode signatures. The mechanism is different, but the results are the same and the overall performance is comparable.  At present only LLVM versions up to LLVM 3.6.2 are supported by ClamAV, and LLVM 3.6.2 is old enough that newer distributions no longer provide it. Therefore, we recommend using the `--disable-llvm` configure option.
105 81
 
106 82
 Altogether, the following configure command can be used:
107 83
 
108
-```
84
+```bash
109 85
 CFLAGS="-ggdb -O0" ./configure --prefix=`pwd`/installed --enable-debug --enable-check --enable-coverage --enable-libjson --with-systemdsystemunitdir=no --enable-experimental --enable-clamdtop --enable-xml --enable-pcre --disable-llvm
110 86
 ```
111 87
 
112
-NOTE: It is possible to build libclamav as a static library and have it
113
-statically linked into clamscan/clamd (to do this, run `./configure` with
114
-`--enable-static --disable-shared`).  This is useful for using tools like gprof
115
-that do not support profiling code in shared objects.  However, there are two
116
-drawbacks to doing this:
88
+NOTE: It is possible to build libclamav as a static library and have it statically linked into clamscan/clamd (to do this, run `./configure` with `--enable-static --disable-shared`).  This is useful for using tools like `gprof` that do not support profiling code in shared objects.  However, there are two drawbacks to doing this:
117 89
 
118
- - clamscan/clamd will not be able to extract files from RAR archives.  Based
119
-   on the software license of the unrar library that ClamAV uses, the library
120
-   can only be dynamically loaded.  ClamAV will attempt to dlopen the unrar
121
-   library shared object and will continue on without RAR extraction support
122
-   if the library can't be found (or if it doesn't get built, which is what
123
-   happens if you indicate that shared libraries should not be built).
90
+- `clamscan`/`clamd` will not be able to extract files from RAR archives.  Based on the software license of the unrar library that ClamAV uses, the library can only be dynamically loaded.  ClamAV will attempt to dlopen the unrar library shared object and will continue on without RAR extraction support if the library can't be found (or if it doesn't get built, which is what happens if you indicate that shared libraries should not be built).
124 91
 
125
- - If you make changes to libclamav, you'll need to `make clean`, `make`, and
126
-   `make install` again to have clamscan/clamd rebuilt using the new
127
-   libclamav.a.  The makefiles don't seem to know to rebuild clamscan/clamd
128
-   when libclamav.a changes (TODO, fix this).
92
+- If you make changes to libclamav, you'll need to `make clean`, `make`, and `make install` again to have `clamscan`/`clamd` rebuilt using the new `libclamav.a`.  The makefiles don't seem to know to rebuild `clamscan`/`clamd` when `libclamav.a` changes (TODO, fix this).
129 93
 
130 94
 ### Running make
131
-Run the following to finishing building.  `-j2` in the code below is used to
132
-indicate that the build process should use 2 cores.  Increase this if your
133
-machine is more powerful.
134
-```
95
+
96
+Run the following to finish building.  `-j2` in the code below tells `make` to run 2 jobs in parallel.  Increase this if your machine has more cores.
97
+
98
+```bash
135 99
 make -j2
136 100
 make install
137 101
 ```
138
-Also, you can run 'make check' to run the unit tests
102
+
103
+Also, you can run `make check` to run the unit tests.
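
Rather than hard-coding `-j2`, the job count can be derived from the machine. This is a minimal sketch, assuming `nproc` (GNU coreutils) or `getconf _NPROCESSORS_ONLN` is available; the `njobs` variable name is only illustrative.

```bash
# Detect the number of online cores; fall back to 2 if neither tool works.
njobs=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 2)

# Print the command instead of running it, so it can be inspected first.
echo "make -j${njobs}"
```

Dropping the `echo` and the quotes would run the build with that many parallel jobs.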
139 104
 
140 105
 ### Downloading the Official Ruleset
141
-If you plan to use custom rules for testing, you can invoke clamscan via
142
-`./installed/bin/clamscan`, specifying your custom rule files via `-d` parameters.
143 106
 
144
-If you want to download the official ruleset to use with clamscan, do the
145
-following:
107
+If you plan to use custom rules for testing, you can invoke `clamscan` via `./installed/bin/clamscan`, specifying your custom rule files via `-d` parameters.
108
+
109
+If you want to download the official ruleset to use with `clamscan`, do the following:
110
+
146 111
 1. Run `mkdir -p installed/share/clamav`
147 112
 2. Comment out line 8 of etc/freshclam.conf.sample
148 113
 3. Run `./installed/bin/freshclam --config-file etc/freshclam.conf.sample`
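
Step 2 can also be scripted. The helper below is a sketch, not something shipped with ClamAV: `comment_out_line` is a hypothetical function that prefixes one line of a file with `#` and prints the result, leaving the original file untouched.

```bash
# comment_out_line FILE N: print FILE with line N prefixed by '#'.
# Hypothetical helper for illustration; not part of ClamAV.
comment_out_line() {
    sed "${2}s/^/#/" "$1"
}

# Demonstrate on a throwaway file so nothing in the source tree is modified.
demo=$(mktemp)
printf 'first\nsecond\nthird\n' > "$demo"
comment_out_line "$demo" 2
rm -f "$demo"
```

For the steps above, something like `comment_out_line etc/freshclam.conf.sample 8 > /tmp/freshclam.conf` (the output path is just an example) would produce a modified copy that can be passed to `--config-file` instead of editing the sample in place.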
149 114
 
150 115
 ## General Debugging
151
-NOTE: Some of the debugging/profiling tools mentioned in the sections below are
152
-specific to Linux
116
+
117
+NOTE: Some of the debugging/profiling tools mentioned in the sections below are specific to Linux.
153 118
 
154 119
 ### Useful clamscan Flags
120
+
155 121
 The following are useful flags to include when debugging clamscan:
156 122
 
157 123
 - `--debug --verbose`: Print lots of helpful debug information
158 124
 
159 125
 - `--gen-json`: Print some additional debug information in a JSON format
160 126
 
161
-- `--statistics=pcre --statistics=bytecode`:  Print execution statistics on any
162
-  PCRE and bytecode rules that were evaluated
163
-
164
-- `--dev-performance`: Print per-file statistics regarding how long scanning
165
-  took and the times spent in various scanning stages
166
-
167
-- `--detect-broken`: This will attempt to detect broken executable files.  If
168
-  an executable is determined to be broken, some functionality might not get
169
-  invoked for the sample, and this could be an indication of an issue parsing
170
-  the PE header or file.  This causes those binary to generate an alert instead
171
-  of just continuing on.  NOTE: This will be renamed to `--alert-broken`
172
-  starting in ClamAV 0.101.
173
-
174
-- `--max-filesize=2000M --max-scansize=2000M --max-files=2000000
175
-   --max-recursion=2000000 --max-embeddedpe=2000M --max-htmlnormalize=2000000
176
-   --max-htmlnotags=2000000 --max-scriptnormalize=2000000
177
-   --max-ziptypercg=2000000 --max-partitions=2000000 --max-iconspe=2000000
178
-   --max-rechwp3=2000000 --pcre-match-limit=2000000
179
-   --pcre-recmatch-limit=2000000 --pcre-max-filesize=2000M`:
180
-  Effectively disables all file limits and maximums for scanning.  This is
181
-  useful if you'd like to ensure that all files in a set get scanned, and would
182
-  prefer clam to just run slowly or crash rather than skip a file because it
183
-  encounters one of these thresholds
127
+- `--statistics=pcre --statistics=bytecode`: Print execution statistics on any PCRE and bytecode rules that were evaluated
128
+
129
+- `--dev-performance`: Print per-file statistics regarding how long scanning took and the times spent in various scanning stages
130
+
131
+- `--detect-broken`: This will attempt to detect broken executable files.  If an executable is determined to be broken, some functionality might not get invoked for the sample, and this could be an indication of an issue parsing the PE header or file.  This causes those binaries to generate an alert instead of just continuing on.  NOTE: This will be renamed to `--alert-broken` starting in ClamAV 0.101.
132
+
133
+- `--max-filesize=2000M --max-scansize=2000M --max-files=2000000 --max-recursion=2000000 --max-embeddedpe=2000M --max-htmlnormalize=2000000 --max-htmlnotags=2000000 --max-scriptnormalize=2000000 --max-ziptypercg=2000000 --max-partitions=2000000 --max-iconspe=2000000 --max-rechwp3=2000000 --pcre-match-limit=2000000 --pcre-recmatch-limit=2000000 --pcre-max-filesize=2000M`:
134
+
135
+  Effectively disables all file limits and maximums for scanning.  This is useful if you'd like to ensure that all files in a set get scanned, and would prefer ClamAV to just run slowly or crash rather than skip a file because it encounters one of these thresholds.
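
Because that list of limit flags is long, it can be kept in a shell variable and reused across invocations. This is a small sketch: `CLAM_NO_LIMITS` is a made-up variable name, `sample.bin` is a placeholder file, and the values are simply the ones listed above.

```bash
# Hypothetical variable collecting the limit-raising flags listed above.
CLAM_NO_LIMITS="--max-filesize=2000M --max-scansize=2000M --max-files=2000000 \
--max-recursion=2000000 --max-embeddedpe=2000M --max-htmlnormalize=2000000 \
--max-htmlnotags=2000000 --max-scriptnormalize=2000000 --max-ziptypercg=2000000 \
--max-partitions=2000000 --max-iconspe=2000000 --max-rechwp3=2000000 \
--pcre-match-limit=2000000 --pcre-recmatch-limit=2000000 --pcre-max-filesize=2000M"

# Print the resulting command line; drop the 'echo' to actually run the scan.
# The variable is left unquoted on purpose so it splits into separate flags.
echo ./installed/bin/clamscan $CLAM_NO_LIMITS --debug --verbose sample.bin
```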
184 136
 
185 137
 The following are useful flags to include when debugging rules that you're
186 138
 writing:
187 139
 
188 140
 - `-d`: Allows you to specify a custom ClamAV rule file from the command line
189 141
 
190
-- `--bytecode-unsigned`: If you are testing custom bytecode rules, you'll need
191
-  this flag so that clamscan actually runs the bytecode signature
142
+- `--bytecode-unsigned`: If you are testing custom bytecode rules, you'll need this flag so that `clamscan` actually runs the bytecode signature
192 143
 
193 144
 - `--all-match`: Allows multiple signatures to match on a file being scanned
194 145
 
195
-- `--leave-temps --tmpdir=/tmp`: By default, ClamAV will attempt to extract
196
-  embedded files that it finds, normalize certain text files before looking
197
-  for matches, and unpack packed executables that it has unpacking support for.
198
-  These flags tell ClamAV to write these intermediate files out to the
199
-  directory specified.  Usually when a file is written, it will mention the
200
-  file name in the --debug output, so you can have some idea at what stage in
201
-  the scanning process a tmp file was created.
146
+- `--leave-temps --tempdir=/tmp`: By default, ClamAV will attempt to extract embedded files that it finds, normalize certain text files before looking for matches, and unpack packed executables that it has unpacking support for. These flags tell ClamAV to write these intermediate files out to the directory specified.  Usually when a file is written, it will mention the file name in the `--debug` output, so you can have some idea at what stage in the scanning process a temporary file was created.
202 147
 
203
-- `--dump-certs`: For signed PE files that match a rule, display information
204
-  about the certificates stored within the binary.  Note - sigtool has this
205
-  functionality as well and doesn't require a rule match to view the cert data
148
+- `--dumpcerts`: For signed PE files that match a rule, display information about the certificates stored within the binary.  Note: `sigtool` has this functionality as well and doesn't require a rule match to view the cert data.
206 149
 
207 150
 ### Using gdb
208
-Given that you might want to pass a lot of arguments to gdb, consider taking
209
-advantage of the `--args` parameter.  For example:
210
-```
151
+
152
+Given that you might want to pass a lot of arguments to `gdb`, consider taking advantage of the `--args` parameter.  For example:
153
+
154
+```bash
211 155
 gdb --args ./installed/bin/clamscan -d /tmp/test.ldb -d /tmp/blacklist.crb --dumpcerts --debug --verbose --max-filesize=2000M --max-scansize=2000M --max-files=2000000 --max-recursion=2000000 --max-embeddedpe=2000M --max-iconspe=2000000 f8f101166fec5785b4e240e4b9e748fb6c14fdc3cd7815d74205fc59ce121515
212 156
 ```
213 157
 
214
-When using ClamAV without libclamav statically linked, if you set breakpoints
215
-on libclamav functions by name, you'll need to make sure to indicate that
216
-the breakpoints should be resolved after libraries have been loaded.
158
+When using ClamAV without libclamav statically linked, if you set breakpoints on libclamav functions by name, you'll need to make sure to indicate that the breakpoints should be resolved after libraries have been loaded.
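
One way to do that is gdb's `set breakpoint pending on` setting, which lets a breakpoint on a not-yet-loaded symbol be resolved once the shared library is mapped. The sketch below writes a small gdb command file. The file name `clamscan.gdb` is arbitrary, and `cl_scandesc_callback` is only an example of a libclamav function to break on (it is assumed here to be on the scan path).

```bash
# Write a gdb command file (illustrative name) into the current directory.
cat > clamscan.gdb <<'EOF'
# Allow breakpoints on symbols that live in libraries not loaded yet.
set breakpoint pending on
# Example breakpoint on a libclamav function; pick whichever you need.
break cl_scandesc_callback
run
EOF
```

The file could then be passed to gdb with `-x`, for example `gdb -x clamscan.gdb --args ./installed/bin/clamscan -d /tmp/test.ldb` followed by the sample to scan.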
159
+
160
+For other documentation about how to use `gdb`, check out the following resources:
217 161
 
218
-For other documentation about how to use gdb, check out the following
219
-resources:
220
- - [A Guide to gdb](http://www.cabrillo.edu/~shodges/cs19/progs/guide_to_gdb_1.1.pdf)
221
- - [gdb Quick Reference](http://users.ece.utexas.edu/~adnan/gdb-refcard.pdf)
162
+- [A Guide to gdb](http://www.cabrillo.edu/~shodges/cs19/progs/guide_to_gdb_1.1.pdf)
163
+- [gdb Quick Reference](http://users.ece.utexas.edu/~adnan/gdb-refcard.pdf)
222 164
 
223 165
 ## Hunting for Memory Leaks
224
-You can easily hunt for memory leaks with valgrind.  Check out this guide to
225
-get started:
226
- - [Valgrind Quick Start](http://valgrind.org/docs/manual/quick-start.html)
227
-If checking for leaks, be sure to run clamscan with samples that will hit as
228
-many of the unique code paths in the code you are testing.  An example
229
-invocation is as follows:
230
-```
166
+You can easily hunt for memory leaks with valgrind.  Check out this guide to get started: [Valgrind Quick Start](http://valgrind.org/docs/manual/quick-start.html)
167
+
168
+If checking for leaks, be sure to run `clamscan` with samples that will hit as many of the unique code paths in the code you are testing.  An example invocation is as follows:
169
+
170
+```bash
231 171
 valgrind --leak-check=full ./installed/bin/clamscan -d /tmp/test.ldb --leave-temps --tempdir /tmp/test --debug --verbose /tmp/upx-samples/ > /tmp/upx-results-2.txt 2>&1
232 172
 ```
233
-Alternatively, on Linux, you can use glibc's built-in leak checking
234
-functionality:
235
-```
173
+
174
+Alternatively, on Linux, you can use glibc's built-in heap consistency checking, which catches errors such as double frees and single-byte overruns rather than leaks:
175
+
176
+```bash
236 177
 MALLOC_CHECK_=7 ./installed/bin/clamscan
237 178
 ```
179
+
238 180
 See the [mallopt man page](http://manpages.ubuntu.com/manpages/trusty/man3/mallopt.3.html) for more details.
239 181
 
240 182
 ## Computing Code Coverage
241
-gcov/lcov can be used to produce a code coverage report indicating which lines
242
-of code were executed on a single run or by multiple runs of clamscan.  NOTE:
243
-for these metrics to be collected, ClamAV needs to have been configured with
244
-the `--enable-coverage` option.
183
+
184
+gcov/lcov can be used to produce a code coverage report indicating which lines of code were executed on a single run or by multiple runs of `clamscan`.  NOTE: for these metrics to be collected, ClamAV needs to have been configured with the `--enable-coverage` option.
245 185
 
246 186
 First, run the following to zero out all of the coverage counters:
247
-```
187
+
188
+```bash
248 189
 lcov -z --directory . --output-file coverage.lcov.data
249 190
 ```
250
-Next, run ClamAV through whatever test cases you have.  Then, run lcov again
251
-to collect the coverage data as follows:
252
-```
191
+
192
+Next, run ClamAV through whatever test cases you have.  Then, run lcov again to collect the coverage data as follows:
193
+
194
+```bash
253 195
 lcov -c --directory . --output-file coverage.lcov.data
254 196
 ```
255
-Finally, run the genhtml tool that ships with lcov to produce the code coverage
256
-report:
257
-```
197
+
198
+Finally, run the genhtml tool that ships with lcov to produce the code coverage report:
199
+
200
+```bash
258 201
 genhtml coverage.lcov.data --output-directory report
259 202
 ```
260
-The report directory will have an index.html page which can be loaded into any
261
-web browser.
203
+
204
+The report directory will have an `index.html` page which can be loaded into any web browser.
262 205
 
263 206
 For more information, visit the [lcov webpage](http://ltp.sourceforge.net/coverage/lcov.php).
264 207
 
265 208
 ## Profiling - Flame Graphs
266
-[FlameGraph](https://github.com/brendangregg/FlameGraph) is a great tool for
267
-generating interactive flame graphs based collected profiling data.  The github
268
-page has thorough documentation on how to use the tool, but an overview is
269
-presented below:
270 209
 
271
-First, install perf, which on Linux can be done via:
272
-```
210
+[FlameGraph](https://github.com/brendangregg/FlameGraph) is a great tool for generating interactive flame graphs based on collected profiling data.  The GitHub page has thorough documentation on how to use the tool, but an overview is presented below:
211
+
212
+First, install `perf`, which on Linux can be done via:
213
+
214
+```bash
273 215
 apt-get install linux-tools-common linux-tools-generic linux-tools-`uname -r`
274 216
 ```
275 217
 
276
-Modify the system settings to allow perf record to be run by a standard user:
277
-```
278
-$ sudo su
279
-# cat /proc/sys/kernel/perf_event_paranoid 
280
-# echo "1" > /proc/sys/kernel/perf_event_paranoid 
281
-# exit
282
-```
218
+Modify the system settings to allow `perf record` to be run by a standard user:
283 219
 
284
-Invoke clamscan via perf record as follows, and run perf script to collect the
285
-profiling data:
220
+```bash
221
+sudo su     # Run the following as root
222
+cat /proc/sys/kernel/perf_event_paranoid
223
+echo "1" > /proc/sys/kernel/perf_event_paranoid
224
+exit
286 225
 ```
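
The same change can be made without an interactive root shell. This is a sketch, assuming a Linux system with `sysctl` and `sudo` available; it only prints the command so that nothing is changed by accident, and the variable names are illustrative.

```bash
# Report the current setting when the file is readable (Linux only).
paranoid_file=/proc/sys/kernel/perf_event_paranoid
if [ -r "$paranoid_file" ]; then
    echo "current perf_event_paranoid: $(cat "$paranoid_file")"
fi

# Command that would lower the level to 1; printed here, not executed.
perf_sysctl_cmd="sudo sysctl -w kernel.perf_event_paranoid=1"
echo "$perf_sysctl_cmd"
```

Like the `echo` approach, a value set this way does not persist across reboots.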
226
+
227
+Invoke `clamscan` via `perf record` as follows, and run `perf script` to collect the profiling data:
228
+
229
+```bash
287 230
 perf record -F 100 -g -- ./installed/bin/clamscan -d /tmp/test.ldb /tmp/2aa6b18d509090c60c3e4ecdd8aeb16e5f149807e3404c86892112710eab576d
288 231
 perf script > out.perf
289 232
 ```
290
-The '-F' parameter indicates how many samples should be collected during
291
-program execution.  If your scan will take a long time to run, a lower value
292
-should be sufficient.  Otherwise, consider choosing a higher value (on Ubuntu
293
-18.04, 7250 is the max frequency, but it can be increased via
294
-/proc/sys/kernel/perf_event_max_sample_rate.
295
-
296
-Check out the FlameGraph project and run the following commands to generate
297
-the flame graph:
298
-```
233
+
234
+The `-F` parameter sets the sampling frequency, that is, how many samples are collected per second of program execution.  If your scan will take a long time to run, a lower value should be sufficient.  Otherwise, consider choosing a higher value (on Ubuntu 18.04, 7250 is the max frequency, but it can be increased via `/proc/sys/kernel/perf_event_max_sample_rate`).
235
+
236
+Check out the FlameGraph project and run the following commands to generate the flame graph:
237
+
238
+```bash
299 239
 perl stackcollapse-perf.pl ../clamav-devel/out.perf > /tmp/out.folded
300 240
 perl flamegraph.pl /tmp/out.folded > /tmp/test.svg
301 241
 ```
302 242
 
303 243
 The SVG that is generated is interactive, but some viewers don't support this.
304
-Be sure to open it in a web browser like Chrome to be able to take full
305
-advantage of it.
244
+Be sure to open it in a web browser like Chrome to be able to take full advantage of it.
306 245
 
307 246
 ## Profiling - Callgrind
308
-Callgrind is a profiling tool included with valgrind.  This can be done by
309
-prepending `valgrind --tool=callgrind ` to the clamscan command.
310
-[kcachegrind](https://kcachegrind.github.io/html/Home.html)
311
-is a follow-on tool that will graphically present the
312
-profiling data and allow you to explore it visually, although if you don't
313
-already use KDE you'll have to install lots of extra packages to use it.
247
+
248
+Callgrind is a profiling tool included with `valgrind`.  To use it, prepend `valgrind --tool=callgrind ` to the `clamscan` command.
249
+
250
+[kcachegrind](https://kcachegrind.github.io/html/Home.html) is a follow-on tool that will graphically present the profiling data and allow you to explore it visually, although if you don't already use KDE you'll have to install lots of extra packages to use it.
314 251
 
315 252
 ## System Call Tracing / Fault Injection
316
-strace can be used to track the system calls that are performed and provide the
317
-number of calls / time spent in each system call.  This can be done by
318
-prepending `strace -c ` to a clamscan command.  Results will look something
319
-like this:
253
+
254
+strace can be used to track the system calls that are performed and provide the number of calls / time spent in each system call.  This can be done by prepending `strace -c ` to a `clamscan` command.  Results will look something like this:
255
+
320 256
 ```
321 257
 % time     seconds  usecs/call     calls    errors syscall
322 258
 ------ ----------- ----------- --------- --------- ----------------
... ...
@@ -357,12 +323,9 @@ like this:
357 357
 100.00    0.874790                 69970        31 total
358 358
 ```
359 359
 
360
-strace can also be used for cool things like system call fault injection.  For
361
-instance, I was curious whether the 'read' bytecode API call was implemented
362
-in such a way that the underlying read system call could handle EINTR being
363
-returned (which can happen periodically).  To test this, I wrote the following
364
-bytecode rule:
365
-```
360
+`strace` can also be used for cool things like system call fault injection.  For instance, let's say you are curious whether the `read` bytecode API call is implemented in such a way that the underlying `read` system call could handle `EINTR` being returned (which can happen periodically).  To test this, write the following bytecode rule:
361
+
362
+```c
366 363
 VIRUSNAME_PREFIX("BC.Heuristic.Test.Read.Passed")
367 364
 VIRUSNAMES("")
368 365
 TARGET(0)
... ...
@@ -398,23 +361,23 @@ int entrypoint(void)
398 398
     return 0;
399 399
 }
400 400
 ```
401
-I compiled the rule, made a test file to match against, and ran it under
402
-strace to determine what underlying read system call was used for the bytecode
403
-read function:
404
-```
401
+
402
+Compile the rule and make a test file to match against. Then run it under `strace` to determine which underlying read system call is used for the bytecode `read` function:
403
+
404
+```bash
405 405
 clambc-compiler read_test.bc
406 406
 dd if=/dev/zero of=/tmp/zeroes bs=65535 count=256
407 407
 strace clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes
408 408
 ```
409
-It uses pread64 under the hood, so the following command could be used for fault
410
-injection:
411
-```
412
-strace -e fault=pread64:error=EINTR:when=20+10 clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes 
409
+
410
+It uses `pread64` under the hood, so the following command could be used for fault injection:
411
+
412
+```bash
413
+strace -e fault=pread64:error=EINTR:when=20+10 clamscan -d read_test.cbc --bytecode-unsigned /tmp/zeroes
413 414
 ```
414
-This command tells strace to skip the first 20 pread64 calls (these appear to
415
-be used by the loader, which didn't seem to handle EINTR correctly) but to
416
-inject EINTR for every 10th call afterward.  We can see the injection in action
417
-and that the system call is retried successfully:
415
+
416
+This command tells `strace` to leave the first 19 `pread64` calls untouched (these appear to be used by the loader, which didn't seem to handle `EINTR` correctly) and then to inject `EINTR` into the 20th call and every 10th call after it.  We can see the injection in action and that the system call is retried successfully:
417
+
418 418
 ```
419 419
 pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15007744) = 65536
420 420
 pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15073280) = 65536
... ...
@@ -428,6 +391,5 @@ pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15532032) = 65536
428 428
 pread64(3, 0x7f6a7ff43000, 65536, 15597568) = -1 EINTR (Interrupted system call) (INJECTED)
429 429
 pread64(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 65536, 15597568) = 65536
430 430
 ```
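
As a worked illustration of the `when=20+10` expression: per the strace documentation, `first+step` selects invocation numbers `first`, `first+step`, `first+2*step`, and so on. The loop below just enumerates the first few selected call numbers; the variable names are illustrative.

```bash
first=20   # first invocation that gets the injected error
step=10    # after that, every step-th invocation is selected
n=0
injected=""
while [ "$n" -lt 4 ]; do
    injected="$injected $((first + step * n))"
    n=$((n + 1))
done
echo "injected calls:$injected"
```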
431
-More documentation on this feature can be found in
432
-[this presentation](https://archive.fosdem.org/2017/schedule/event/failing_strace/attachments/slides/1630/export/events/attachments/failing_strace/slides/1630/strace_fosdem2017_ta_slides.pdf)
433
-from FOSDEM 2017.
431
+
432
+For more documentation on using `strace` to perform system call fault injection, see [this presentation](https://archive.fosdem.org/2017/schedule/event/failing_strace/attachments/slides/1630/export/events/attachments/failing_strace/slides/1630/strace_fosdem2017_ta_slides.pdf) from FOSDEM 2017.