<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <!--Converted with LaTeX2HTML 2002-2-1 (1.70) original version by: Nikos Drakos, CBLU, University of Leeds * revised and updated by: Marcus Hennecke, Ross Moore, Herb Swan * with significant contributions from: Jens Lippmann, Marek Rouchal, Martin Wilck and others --> <HTML> <HEAD> <TITLE>Scan engine</TITLE> <META NAME="description" CONTENT="Scan engine"> <META NAME="keywords" CONTENT="clamdoc"> <META NAME="resource-type" CONTENT="document"> <META NAME="distribution" CONTENT="global"> <META NAME="Generator" CONTENT="LaTeX2HTML v2002-2-1"> <META HTTP-EQUIV="Content-Style-Type" CONTENT="text/css"> <LINK REL="STYLESHEET" HREF="clamdoc.css"> <LINK REL="next" HREF="node81.html"> <LINK REL="previous" HREF="node79.html"> <LINK REL="up" HREF="node77.html"> </HEAD> <BODY > <H2><A NAME="SECTION00073000000000000000"></A><A NAME="engine"></A> <BR> Scan engine </H2> New versions of
Clam AntiVirus use a variant of the Aho-Corasick pattern matching algorithm. The algorithm is based on a finite-state pattern matching automaton [<A HREF="node90.html#clr">1</A>] and is a generalization of the well-known Knuth-Morris-Pratt algorithm to multiple patterns. Please take a look at <code>matcher.h</code> for the data type definitions. The automaton is represented by a trie, a rooted tree with some specific properties [<A HREF="node90.html#acwww">2</A>]. Every node of the trie represents a state of the automaton. In our implementation, the node is defined as follows:
<PRE>
struct cl_node {
    short int islast;
    struct cli_patt *list;
    int maxpatlen;
    struct cl_node *next[NUM_CHILDS], *trans[NUM_CHILDS], *fail;
};
</PRE>
[To be continued...]
<P>
<BR><HR>
<ADDRESS>
Tomasz Kojm
2004-07-22
</ADDRESS>
</BODY>
</HTML>