%% LyX 1.5.3 created this file. For more info, see http://www.lyx.org/.
%% Do not edit unless you really know what you are doing.
\documentclass[a4paper,english,10pt]{article}
\usepackage{amssymb}
\usepackage{pslatex}
\usepackage[T1]{fontenc}
\usepackage[dvips]{graphicx}
\usepackage{url}
\usepackage{fancyhdr}
\usepackage{varioref}
\usepackage{prettyref}
\date{}

\begin{document}

\title{{\huge Phishing signatures creation HOWTO}}
\author{T\"or\"ok Edwin}
\maketitle

%TODO: define a LaTeX command, instead of using \textsc{RealURL} each time

\section{Database file format}

\subsection{PDB format}

This file contains URLs/hosts that are targets of phishing attempts. It contains lines in the following format:
\begin{verbatim}
R[Filter]:RealURL:DisplayedURL[:FuncLevelSpec]
H[Filter]:DisplayedHostname[:FuncLevelSpec]
\end{verbatim}
\begin{description}
\item [{R}] regular expression, for the concatenated URL
\item [{H}] matches the \verb+DisplayedHostname+ as a simple pattern (literally, no regular expression)
\begin{itemize}
\item the pattern can match either the full hostname
\item or a subdomain of the specified hostname
\item to avoid false matches in case of subdomain matches, the engine checks that there is a dot (\verb+.+) or a space (\verb+ +) before the matched portion
\end{itemize}
\item [{Filter}] an (optional) 3-digit hexadecimal number representing flags that should be filtered.
\begin{itemize}
\item flag filtering only makes sense in .pdb files.
(however clamav won't complain if you put flags in .wdb files, it will just skip them)
\item for details on how to construct a flag number see section \prettyref{sec:Flags}
\end{itemize}
\item [{\textsc{RealURL}}] is the URL the user is sent to, example: \emph{href} attribute of an html anchor (\emph{<a> tag})
\item [{\textsc{DisplayedURL}}] is the URL description displayed to the user, where it's \emph{claimed} they are sent, example: contents of an html anchor (\emph{<a> tag})
\item [{DisplayedHostname}] is the hostname portion of the \textsc{DisplayedURL}
\item [{FuncLevelSpec}] an (optional) functionality level, 2 formats are possible:
\begin{itemize}
\item \verb+minlevel+ all engines having functionality level $\geq$ \verb+minlevel+ will load this line
\item \verb+minlevel-maxlevel+ engines with functionality level $\geq$ \verb+minlevel+, and $<$ \verb+maxlevel+ will load this line
\end{itemize}
\end{description}

\subsection{WDB format}

This file contains whitelisted URL pairs. It contains lines in the following format:
\begin{verbatim}
X:RealURL:DisplayedURL[:FuncLevelSpec]
M:RealHostname:DisplayedHostname[:FuncLevelSpec]
\end{verbatim}
\begin{description}
\item [{X}] regular expression, for the \emph{entire URL}, not just the hostname
\begin{itemize}
\item The regular expression is by default anchored to start-of-line and end-of-line, as if you had used \verb+^RegularExpression$+
\item A trailing \verb+/+ is automatically added both to the regex and to the input string, to avoid false matches
\item The regular expression matches the \emph{concatenation} of the \textsc{RealURL}, a colon (\verb+:+), and the \textsc{DisplayedURL} as a single string. It doesn't separately match \textsc{RealURL} and \textsc{DisplayedURL}!
\end{itemize}
\item [{M}] matches the hostname, or a subdomain of it, see notes for {H} above
\end{description}

\subsection{Hints}
\begin{itemize}
\item empty lines are ignored
\item the colons are mandatory
\item Don't leave extra spaces at the end of a line!
\item if any line doesn't conform to this format, clamav will abort with a Malformed Database Error
\item see section \vref{sub:Extraction-of-realURL,} for more details on \textsc{realURL/displayedURL}
\end{itemize}

\subsection{Examples of PDB signatures}

To check for phishing mails that target amazon.com, or subdomains of amazon.com:
\begin{verbatim}
H:amazon.com
\end{verbatim}
To do the same, but for amazon.co.uk:
\begin{verbatim}
H:amazon.co.uk
\end{verbatim}
To limit the signatures to certain engine versions:
\begin{verbatim}
H:amazon.co.uk:20-30
H:amazon.co.uk:20-
H:amazon.co.uk:0-20
\end{verbatim}
\begin{itemize}
\item First line: engine versions 20, 21, ..., 29 can load it
\item Second line: engine versions $\geq$ 20 can load it
\item Third line: engine versions $<$ 20 can load it
\end{itemize}
In practice you'd probably use the second form, for example when your signature relies on a feature not available in earlier versions, or when earlier versions have bugs with your signature. Neither is the case here; the above examples are for illustrative purposes only.

\subsection{Examples of WDB signatures}

To allow amazon's country-specific domains and amazon.com to be mixed between \textsc{DisplayedURL} and \textsc{RealURL}:
\begin{verbatim}
X:.+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?:17-
\end{verbatim}
Explanation of this signature:
\begin{description}
\item [{X:}] this is a regular expression
\item [{:17-}] load signature only for engines with functionality level $\geq$ 17 (recommended for type X)
\end{description}
The regular expression is the following (X:, :17- stripped, and a / appended):
\begin{verbatim}
.+\.amazon\.(at|ca|co\.uk|co\.jp|de|fr)([/?].*)?:.+\.amazon\.com([/?].*)?/
\end{verbatim}
Explanation of this regular expression (note that it is a single regular expression, and not 2 regular expressions split at the {:}).
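As a sketch of what matching a single concatenated string means, assume two hypothetical extracted values, \textsc{RealURL} \verb+www.amazon.de+ and \textsc{DisplayedURL} \verb+www.amazon.com+ (both are illustrative and not taken from a real database). Following the WDB rules above, they would be joined with a colon and a trailing \verb+/+ would be appended, so the regular expression would be matched against:
\begin{verbatim}
www.amazon.de:www.amazon.com/
\end{verbatim}
Under these assumptions the whole string matches, so the pair would be whitelisted. Swapping the two values would give \verb+www.amazon.com:www.amazon.de/+, which does not match, because the country-specific domain has to appear in the \textsc{RealURL} part, before the {:}. The parts of the regular expression are: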
\begin{itemize}
\item \verb;.+; any subdomain of
\item \verb;\.amazon\.; domain we are whitelisting (\textsc{RealURL} part)
\item \verb;(at|ca|co\.uk|co\.jp|de|fr); country-domains: at, ca, co.uk, co.jp, de, fr
\item \verb;([/?].*)?; recommended way to end the real URL part of a whitelist entry, this protects against embedded URLs (evilurl.example.com/amazon.co.uk/)
\item \verb;:; \textsc{RealURL} and \textsc{DisplayedURL} are concatenated via a {:}, so match a literal {:} here
\item \verb;.+; any subdomain of
\item \verb;\.amazon\.com; whitelisted \textsc{DisplayedURL}
\item \verb;([/?].*)?; recommended way to end the displayed URL part, to protect against embedded URLs
\item \verb;/; automatically added to further protect against embedded URLs
\end{itemize}
When you whitelist an entry, make sure you check that both domains are owned by the same entity. What this whitelist entry allows is: links that claim to point to amazon.com (\textsc{DisplayedURL}), but really go to a country-specific domain of amazon (\textsc{RealURL}).

\subsection{Example for how the URL extractor works}

Consider the following HTML file:
\begin{verbatim}
1.displayedurl.example.com 2 di

splayedurl.example.com 3.nested.example.com 4.displayedurl.example.com

sometext 5.form.nested.link-displayedurl.example.com 6.displ ayedurl.example.com