ITA NETWORKS, INC Spam Marshall Users’ Guide

Spam Marshall UsersGuide-Final · SpamWall software (if installing on a dedicated server) and your current Email server. About Spam Marshall Spam Masrhall offers a comprehensive Spam

Spam Marshall SpamWall User’s Guide

© Copyright ITA Networks, Inc. 2000-2004. All Rights Reserved.

This guide contains proprietary information, which is protected by copyright. The software described in this guide is furnished under a software license or nondisclosure agreement. This software may be used or copied only in accordance with the terms of the applicable agreement. No part of this guide may be reproduced including photocopying and recording, for any purpose other than the purchaser’s personal use without the written permission of ITA Networks, Inc.

Warranty

The information contained in this document is subject to change without notice. ITA Networks makes no warranty of any kind with respect to this information. ITA Networks SPECIFICALLY DISCLAIMS THE IMPLIED WARRANTY OF THE MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. ITA Networks shall not be liable for any direct, indirect, incidental, consequential, or other damage alleged in connection with the furnishing or use of this information.

Trademarks

Sun, Sun Microsystems, Java, and all Sun-based and Java-based logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. This product includes software developed by the Apache Software Foundation (http://www.apache.org/).

ITA Networks Inc. 315 Forsgate Drive, Monroe, NJ 08831

Phone 732-656-4552 • Fax 928-569-9719

Table of Contents

Introduction
  About Spam Marshall
  Domain Validation
  IP Validation
  Internal White and Black Lists
  Content Filtering
  Custom Filters
Installation Instructions
  Installing Spam Marshall SpamWall Edition
  Hardware Requirements
  Software Requirements
  Installing Spam Marshall on a dedicated machine
  Installing Spam Marshall and your email server on the same machine
  Installation and Setup
  Verify Installation
  Post Installation Steps
  Initial Configuration
  Testing installation
  Troubleshooting
Configuring Spam Marshall for MS Exchange
Getting Started
How to use Spam Marshall Control Panel
  Spam Marshall Administration Console
  Administration Console Options
  Check Status
  View/Modify Filters
  Spam Score Threshold
  Possible Spam Score Threshold
  Domain Validation Weight
  IP Verification / RBL (Real-time Black Hole Lists)
  Content Validation
  Types of content filters
  To allow all Emails from a particular Sender by email address
  Subject Filter
  Body Filters
  Preprocessing vs. Post processing
  Header Filter
  Custom Filters
  Black Listed IP addresses / White Listed IP addresses
  Enabling Challenge/Response
  Pros and Cons of Challenge/Response
  Message Actions
  Filter Operators
  Contains Word
  Contains
  Equals
  Starts With / End With
  Ends With
  Does not contain
  Is blank
  Regular Expressions
Reports
  Type of Reports
  Managing Spam and Possible Spam messages
A short tutorial for Regular Expressions

I N T R O D U C T I O N

Introduction

Because Email is Important to You!

Welcome to Spam Marshall. This document explains the installation and management of Spam Marshall SpamWall. It is intended for system administrators and other technical personnel. Readers of the document should:

• Have knowledge of the operating systems (such as Windows or Linux) and the SMTP Email servers (such as MS Exchange 2000 or Lotus Domino) currently installed in your organization.

• Know the setup of your network, including network hardware and software firewalls.

• Have Administrator access on the computer that will host the Spam Marshall SpamWall software (if installing on a dedicated server) and your current Email server.

About Spam Marshall

Spam Marshall offers a comprehensive Spam detection and elimination system for your mail server. It is designed for ease of installation and management. Installation is straightforward and takes only a few minutes. An intuitive browser-based remote management interface lets you start managing email in the way most suitable for your environment without spending an extensive amount of time learning the product. Server-based Spam detection and elimination allows users to concentrate on their jobs rather than on managing Spam on their desktops.

Spam Marshall Corporate Edition offers complete server-side anti-spam protection to enterprises running any email server that uses the SMTP protocol (this means all internet email servers). It actively identifies and defuses Spam attacks before they inconvenience end users and overwhelm and damage your network bandwidth and other resources. Spam Marshall allows you to remove unwanted emails before they reach your users' inboxes, without violating their privacy.

Spam Marshall uses various types of filtering mechanisms:

• Bayesian filters are widely acclaimed to be the best way to tackle Spam because they use statistical intelligence to analyze the content of the mail. Spam Marshall uniquely implements this technology at the server level in a reliable and effective manner. Our implementation allows Spam Marshall to detect Spam right out of the box, unlike other products that require a steep learning curve before becoming effective. Spam Marshall does not rely on Bayesian filtering alone to detect Spam; it is one of many processes an email goes through before a decision is made to tag it as Spam. Bayesian filtering detects Spam based on message content rather than just checking for keywords, and is therefore less prone to spammer tricks and techniques.

• Spam Marshall Custom Rules Engine (CURE) uses state-of-the-art technologies and strategies to filter and classify emails as they enter your site. Custom Rules can also be created by you to tailor specifically to your needs.

• Content checking and filtering allow you to check incoming email for Spam words. You can also tailor this for your own business needs. This option is capable of blocking an email-borne virus before the latest anti-virus definitions are available to block it.

• Whitelists and Blacklists which can be either customized or predefined RBLs (Real-time Black Hole Lists).

Features in Spam Marshall SpamWall Version 2004

Feature: Description

Bayesian filter: Self-learning implementation that adapts automatically to the latest spamming techniques and, together with the other built-in Spam detection methods, catches a large share of Spam.

Active Directory Integration: If you are running Microsoft Active Directory and Exchange Server on your network, you can integrate Spam Marshall with them to reduce Spam sent to invalid users that is normally accepted by Exchange Servers.

SPAM - Intrusion Detection System (IDS): This feature, unique to Spam Marshall, allows you to detect spammers trying to probe and connect to your mail server. It provides detailed information about location and connection attempts that can be used to track down offenders and block them completely from your network.

Custom Rules Engine: Our proprietary Custom Rules Engine (CURE) was developed through extensive analysis of the email tricks and techniques spammers use to bypass other spam filter programs. These rules are constantly updated on-line for new tricks found in the wild.

Content Filtering: Keyword searches that can be customized to suit your own business requirements. Examples of keyword searches are “Viagra” or “Lose Weight”. Spam Marshall also uses powerful Regular Expressions technology to catch words deliberately misspelled by Spammers, e.g. V1agr@ or L00SE W*E*I*G*H*T.

RBL: You can check mail against popular third-party blacklists such as ORDB, SpamCop, etc. Use the existing services bundled with Spam Marshall or customize the list to use your own.

URL Filtering: URL-based Spam is the most common type of Spam sent nowadays. Spam Marshall provides multiple ways to analyze and filter the most common URL-based Spam tricks, such as obfuscated URLs, decimal or hexadecimal IP addresses, escaping, usernames and passwords, redirection, base-64 encoding, JavaScript, and many other common techniques.

End User Administration Interface: This option empowers end users to check their own Spam and non-Spam messages on the Spam Marshall Server from their desktop. It eliminates the end user's fear of losing an email because of a false positive.

Reporting: Graphical HTML real-time reports provide administrators with a powerful tool to monitor servers.

Management: A user-friendly Web-based interface to manage and monitor the mail server.

Overview of Spam Marshall Corporate Edition

Spam Marshall provides a multi-staged, rules-based approach to managing Spam mail. Spam Marshall allows pre- and post-processing of email for content filtering. This allows administrators to create custom filtering rules that are aligned with a company's policy. The rules engine assigns scores to an incoming mail message based on unique characteristics of the mail, the content of the message, and the message header. At every stage points are assigned to the email, and the points are then added up to produce a final score. When a message's score reaches a defined threshold, the message is flagged as spam and quarantined. The final score is based on multiple criteria:

• Bayesian Filter Analysis

• Domain Validation

• IP Validation

• Content Filtering

• Body filters

• Sender filters

• Subject filters

• Header filters

• Custom filters for Rules Engine (CURE)
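The staged scoring described above can be illustrated with a minimal Python sketch. The stage names, point values, and threshold numbers below are hypothetical and chosen only for illustration; the actual thresholds are whatever you configure in the Admin Console.

```python
# Hypothetical thresholds; real values are set by the administrator.
SPAM_THRESHOLD = 100
POSSIBLE_SPAM_THRESHOLD = 60


def classify(stage_scores):
    """Add up the points from every filtering stage and compare to the thresholds."""
    total = sum(stage_scores.values())
    if total >= SPAM_THRESHOLD:
        verdict = "spam"
    elif total >= POSSIBLE_SPAM_THRESHOLD:
        verdict = "possible spam"
    else:
        verdict = "clean"
    return total, verdict


# Illustrative points assigned by four stages to one message.
example = {"bayesian": 40, "domain_validation": 25, "ip_validation": 20, "content": 30}
print(classify(example))
```

Because every stage only contributes points, no single check decides the outcome on its own.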

Bayesian Filter Analysis

Bayesian filters are widely acclaimed to be the best way to tackle spam because they use statistical intelligence to analyze the content of the mail. Spam Marshall implements this technology at the server level in a reliable and effective manner. Bayesian filtering detects spam based on message content. Rather than just checking for keywords, the Spam Marshall Bayesian filter takes the whole message into consideration. Bayesian filtering is based on the mathematical principle that most events are dependent and that the probability of an event occurring in the future can be inferred from previous occurrences of that event. The same concept is used to identify new spam messages based on the content of past spam messages. In short, Bayesian filtering has the following advantages:

• Looks at the whole message

• Adapts itself over time

• Is sensitive/adapts to the company/user

• Uses artificial intelligence

• Hard to trick
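As a rough illustration of the idea, the sketch below trains a tiny naive Bayes classifier on a few labelled messages and estimates the probability that a new message is spam. It is a textbook simplification with Laplace smoothing, not the Spam Marshall implementation; the class name, tokenizer, and training messages are all invented for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class TinyBayes:
    """Hypothetical minimal naive Bayes spam classifier."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        vocab_size = max(len(vocab), 1)
        spam_total = sum(self.word_counts["spam"].values())
        ham_total = sum(self.word_counts["ham"].values())
        # Start from the (smoothed) prior odds, then add evidence per word.
        log_odds = math.log((self.doc_counts["spam"] + 1) / (self.doc_counts["ham"] + 1))
        for word in tokenize(text):
            p_spam = (self.word_counts["spam"][word] + 1) / (spam_total + vocab_size)
            p_ham = (self.word_counts["ham"][word] + 1) / (ham_total + vocab_size)
            log_odds += math.log(p_spam / p_ham)
        return 1 / (1 + math.exp(-log_odds))


model = TinyBayes()
model.train("cheap pills buy now", "spam")
model.train("buy cheap meds now", "spam")
model.train("meeting agenda for monday", "ham")
model.train("project status report", "ham")
print(round(model.spam_probability("buy cheap pills"), 3))
```

Each newly labelled message shifts the word statistics, which is how a filter of this kind adapts over time.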

SMTP Intrusion Detection System (IDS)

Spam Marshall takes a proactive approach to detecting intrusions into your mail server by malicious spammers and hackers. These users connect to your SMTP server but usually do not send any email, often because they are probing the server for open holes or weaknesses. Their next step is typically a dictionary attack to guess usernames and passwords, or a DoS (Denial of Service) attack. The administrator can review the number of invalid attempts and the IP addresses they originated from, and block those addresses at the network firewall. Detailed information about an IP address, such as its location, is available with a single click on the address itself. This IDS feature is unique to Spam Marshall and not found in any other anti-Spam products currently available.
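The kind of bookkeeping this implies can be sketched as follows: count, per IP address, the SMTP sessions that ended without a single message being sent, and report the addresses that cross a limit. The session format and the limit of three attempts are assumptions made for this example only.

```python
from collections import Counter


def suspicious_ips(sessions, min_attempts=3):
    """Return {ip: count} for IPs with at least min_attempts empty sessions.

    sessions is an iterable of (ip_address, messages_sent) pairs
    (a hypothetical log format).
    """
    empty = Counter(ip for ip, sent in sessions if sent == 0)
    return {ip: n for ip, n in empty.items() if n >= min_attempts}


log = [
    ("203.0.113.5", 0),
    ("198.51.100.7", 1),
    ("203.0.113.5", 0),
    ("203.0.113.5", 0),
    ("198.51.100.7", 0),
]
print(suspicious_ips(log))
```

The resulting addresses are candidates for a block rule at the network firewall.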

Active Directory Integration

If your company runs any version of Microsoft Exchange Server (including NT and Exchange 5.5), you can use the built-in Active Directory Integration option to reduce Spam. Microsoft SMTP Servers do not check for a valid user email address before accepting an email. For example, if an email is sent to xyz@abc.com and Exchange is configured to accept email for the domain abc.com, it will accept the email even if no mailbox has been set up for user xyz. This results in a large number of NDRs generated by Exchange and a bloated badmail folder, and can cause heavy resource utilization by Exchange during a Spam attack. Spam Marshall Active Directory Integration eliminates this problem: email is accepted only for users who legitimately exist with a valid email account. This feature works with all versions of Exchange, including NT and Exchange 5.5 (this version uses LDAP to store user account information even without Active Directory on your network).
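Conceptually, recipient validation is a lookup performed before the message is accepted. The sketch below assumes the set of valid mailboxes has already been read from the directory; the function name and the mailbox list are hypothetical, while 250 and 550 are the standard SMTP reply codes for acceptance and for an unavailable mailbox.

```python
def smtp_reply_for(recipient, valid_mailboxes):
    """Return an SMTP reply line for a RCPT TO address."""
    if recipient.strip().lower() in valid_mailboxes:
        return "250 OK"
    return "550 No such user here"


# Hypothetical directory contents for the domain abc.com.
mailboxes = {"alice@abc.com", "bob@abc.com"}
print(smtp_reply_for("xyz@abc.com", mailboxes))
```

Rejecting the unknown recipient during the SMTP conversation means the mail server never has to generate an NDR for it.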

Domain Validation

This method verifies whether the sending mail server's domain has a valid MX host record. The absence of an MX host record in DNS is a sure sign of an illegitimate or fake domain trying to send Spam.
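A minimal sketch of this check is shown below. Python's standard library has no MX lookup, so the DNS query is passed in as a function; in practice it would wrap a resolver library. The function names, the fake DNS table, and the weight of 25 points are assumptions for illustration.

```python
def sender_domain(address):
    """Return the lowercased domain part of an address, or '' if there is none."""
    _, sep, domain = address.rpartition("@")
    return domain.strip().lower() if sep else ""


def domain_validation_score(address, mx_lookup, weight=25):
    """Return weight points when the sender's domain has no MX record, else 0.

    mx_lookup is any callable that takes a domain and returns a list of
    MX hosts (empty when none exist).
    """
    domain = sender_domain(address)
    if not domain or not mx_lookup(domain):
        return weight
    return 0


# Stand-in for a real DNS query, used only to make the example self-contained.
FAKE_DNS = {"example.com": ["mail.example.com"]}


def fake_mx_lookup(domain):
    return FAKE_DNS.get(domain, [])


print(domain_validation_score("bob@example.com", fake_mx_lookup))
print(domain_validation_score("x@no-such-domain.invalid", fake_mx_lookup))
```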

IP Validation

Spam Marshall uses Real Time Black Hole Lists (RBL). The IP addresses of the mail server and of the sender in the email header are checked against various Black Lists to verify that the message was not sent by a known spammer. A properly secured mail server should refuse to relay email from an external sender who is not part of its domain. Unfortunately, many mail servers are not properly configured and are used by spammers to send mail. Spam Marshall uses a default set of the most reliable RBL services and also allows you to specify which ones to use or to add your own. Multiple RBL databases can be used simultaneously in Spam Marshall. Since relying on RBL alone could produce false positives, Spam Marshall assigns a score to the message based on the RBL results and uses that score as one part of the overall evaluation.
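RBL services are normally queried over DNS: the octets of the IPv4 address are reversed, the list's zone name is appended, and an answer to the resulting lookup means the address is listed. The sketch below shows that convention with the Python standard library. The zone rbl.example.org is a placeholder, not a real service, and the function names are invented for this example.

```python
import ipaddress
import socket


def rbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in an RBL zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True when the RBL zone answers for this address (needs network access)."""
    try:
        socket.gethostbyname(rbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No such record: the address is not on this list.
        return False


print(rbl_query_name("192.0.2.99", "rbl.example.org"))
```

A caller would run is_listed against each configured zone and add points per hit rather than rejecting the message outright, in line with the scoring approach described above.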

Internal White and Black Lists

Besides using RBL servers, you can create your own White/Black list of IP Addresses. This is extremely useful in cases when you get attacked by a virus-infected computer on the internet or your company policy specifies blocking or receiving all emails from a specific source.

Content Filtering

The Rules Engine uses sophisticated algorithms to parse the email content and assigns a score based on the result. Spammers use very sophisticated tricks and techniques to avoid being caught by most common content filtering software. Spam Marshall software knows these tricks well and outsmarts the spammers. Below are some common methods used by Spammers and how Spam Marshall handles these tricks:

Header filter

Extracts different elements from the email header, such as the IP address.

Pre-Post processing

Many spammers use embedded HTML comments to avoid being caught. For instance, the following characters are displayed as Viagra in the email reader but can confuse a computer program.

V<!--abcd -->i<!-- nonsense -->a<!--X-->gr<invalidtag>a

In other words this technique allows the Rules Engine to extract embedded words within HTML comments and invalid HTML tags.
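The pre-processing step can be sketched in Python with two regular expressions, one for HTML comments and one for tags. This is a simplified illustration rather than the Rules Engine's actual parser; the function name is invented for the example.

```python
import re

COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)  # HTML comments, shortest match
TAG = re.compile(r"<[^>]*>")                    # any remaining tag, valid or not


def strip_markup(html):
    """Remove comments first, then tags, leaving only the visible text."""
    return TAG.sub("", COMMENT.sub("", html))


sample = "V<!--abcd -->i<!-- nonsense -->a<!--X-->gr<invalidtag>a"
print(strip_markup(sample))
```

Keyword filters applied after this step see the same word the reader sees.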

Garbage detector

Many spam messages contain meaningless words in order to increase the message size and confuse pattern-matching spam filters. Although this technique does not have any effect on Spam Marshall, it penalizes the email for using such a mechanism.

Regular Expression

The Rules Engine uses powerful regular expressions to search for words. For instance, Viagra is caught in all of the following examples:

• Viagra

• V i a g r a

• V*i*a*g*r*a

• V1agr@

Note that the words written above are interpreted correctly by a human reader but are difficult for a computer program to recognize.
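One way to write such an expression is sketched below in Python: allow any run of non-letter characters between the letters and accept common look-alike substitutions. The pattern is an illustration written for this guide, not one of the rules shipped with the product.

```python
import re

SEP = r"[\W_]*"  # any run of spaces, punctuation, or underscores between letters
VIAGRA = re.compile(
    SEP.join(["v", "[i1!|]", "[a@4]", "g", "r", "[a@4]"]),
    re.IGNORECASE,
)

samples = ["Viagra", "V i a g r a", "V*i*a*g*r*a", "V1agr@"]
for text in samples:
    print(text, bool(VIAGRA.search(text)))
```

A pattern this permissive can also match innocent text (for example the letters in "via grammar"), which is one reason a match adds to the score instead of blocking the message by itself.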

Base 64

Spammers often use a different character set or encoding to elude spam filtering programs. Using Base64 is a common technique. Most email readers, such as Outlook and Netscape, convert these characters into human-readable form before displaying them; however, they can be confusing for other programs.
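The sketch below shows why decoding has to happen before keyword filtering: the encoded form of a phrase shares no visible text with the original. It uses Python's standard base64 module; the helper name and the sample phrase are chosen for illustration.

```python
import base64


def decode_base64_part(payload, charset="utf-8"):
    """Decode a Base64-encoded message part into text."""
    return base64.b64decode(payload).decode(charset, errors="replace")


encoded = base64.b64encode(b"Lose Weight").decode("ascii")
print(encoded)                      # what a naive keyword filter would see
print(decode_base64_part(encoded))  # what the reader sees
```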

Date verification filter

Some spammers use dates which are either very far in the future or past. When users open these messages they always appear either on the top or bottom of all other messages in the INBOX. This custom rule detects these messages and assigns a score
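A date check of this kind can be sketched with Python's standard email utilities. The two-day tolerance and the function name are assumptions; an unparseable Date header is treated as suspicious here, which is a design choice of the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def date_is_suspicious(date_header, now, max_skew=timedelta(days=2)):
    """Return True when the Date header is missing, invalid, or too far from now."""
    try:
        sent = parsedate_to_datetime(date_header)
    except (TypeError, ValueError):
        return True
    if sent.tzinfo is None:
        # Headers without usable zone information are treated as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return abs(now - sent) > max_skew


now = datetime(2004, 6, 1, 12, 0, tzinfo=timezone.utc)
print(date_is_suspicious("Tue, 01 Jun 2004 10:00:00 +0000", now))
print(date_is_suspicious("Fri, 01 Jan 2038 00:00:00 +0000", now))
```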

External Pages

Sometimes spammers don't put any content in the email message itself. Instead, the message body refers to an external HTML page that usually contains the actual message. This custom rule detects these cases and assigns a score

UUEncoded Message

UUEncoded messages were used in the old days when MIME was not supported. Most modern email readers still support this type of message for backward compatibility. Most email messages these days should not be UUEncoded; therefore, the fact that an email is UUEncoded suggests that the message may come from a spammer.

Foreign Characters

This rule checks for non-English characters in the email message. If you only expect your messages to be in English, turning on this rule can eliminate emails that are sent in other languages.
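As a rough approximation, such a rule can measure the share of letters that fall outside the ASCII range. The sketch below does exactly that; treating non-ASCII letters as "non-English" is a simplifying assumption of this example, and where to set the cut-off is left to the caller.

```python
def non_ascii_ratio(text):
    """Return the fraction of alphabetic characters that are not ASCII."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(1 for ch in letters if ord(ch) > 127) / len(letters)


print(non_ascii_ratio("Hello world"))
print(non_ascii_ratio("Привет"))
```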

Content Filtering is applied to:

Subject

Sender

Message Body

Header

Custom Filters

Custom filters allow you to extend the capability of the Rules Engine by writing your own rules tailored to your needs. Out of the box, Spam Marshall has pre-configured Custom Rules incorporated to evaluate emails. While most users will never have to write a custom rule, the capability exists in the software and can greatly enhance customization to suit your own needs. These rules can be used to:

Block specific types of adult content

Block email-borne viruses and attachments

Block large email messages to prevent excessive bandwidth usage.

The purpose of custom filters is to capture emails that would get through normal parsing. Spam messages may contain dynamically downloaded content such as images, text, or an image link. An email may contain only a URL that points to an image on a remote website. When a user opens the mail, the image appears to be in the email itself. These links usually point to porn or other unwanted websites. Spam Marshall recognizes this and assigns a score to this type of mail towards the final evaluation.

End User Access to Quarantined Spam Emails

No Spam filtering mechanism is perfect, no matter how well designed. Spam Marshall addresses the need of end users to make sure they have not lost any good email because of false positives in identifying Spam. Administrators can empower users to check their personal quarantined Spam messages by creating a secure login account. Users can use a browser to connect to their individual email accounts and look at the messages blocked by Spam Marshall. They can also restore any message with a single click of the mouse. The interface also provides a detailed graphical report of email received (Spam and non-Spam) per individual user account.

Installation Instructions

This manual includes installation instructions for Spam Marshall SpamWall. Please refer to the section that applies to your environment.

Installing Spam Marshall SpamWall Edition

Hardware Requirements

Minimum hardware requirements are as follows:

• 500 MHz processor

• 512 MB RAM

• 500 MB drive

Configuration may vary based on the message load per server.

Software Requirements

• One of the following operating systems

o Windows NT, 2K, 2003, XP

o Linux

o Solaris 7 or above

o AIX

o HP-UX

• Any SMTP compliant email server

Tip: SpamWall Edition is an SMTP proxy that filters every email message before it goes to your actual SMTP server.

Two Spam Marshall Installation Scenarios

Regardless of which operating system or email server you are running, Spam Marshall SpamWall can be installed in two ways.

1. Install Spam Marshall on the same server as your existing email server

2. Install Spam Marshall on its own dedicated machine

Both methods work equally well but are implemented slightly differently. Installing Spam Marshall on a separate dedicated server does not require any change to the current SMTP port (25) of your email server. Installing Spam Marshall on the same server as your current email server requires changing the SMTP server's port to something other than 25 (we recommend 2500). This is normally very straightforward, and instructions are provided here for most major email servers, such as various versions of MS Exchange.

In either installation scenario, Spam Marshall SpamWall acts as an SMTP proxy server. It sits between your email server and the client sending email to your server. This watchdog method allows your email server to behave normally for all client requests. Spam Marshall monitors the email for Spam and ends the conversation with the client immediately upon Spam detection, without your server ever receiving the Spam. This frees your mail server from having to process possibly thousands of Spam messages per day. This method also has a subtle but useful advantage over most other anti-Spam solutions, which act as a full SMTP server and forward your email to your mail server. With an anti-spam solution that acts as a full SMTP server, your remote end users may not be able to use your mail server to send email unless the mail server is opened up for relay.

Installing Spam Marshall on same Server as your current Email Server

Installing Spam Marshall on a separate dedicated Server

Installing Spam Marshall on a dedicated machine

Installing Spam Marshall involves the following steps:

1. Run Spam Marshall setup.exe on a machine capable of hosting Spam Marshall.

2. In the Spam Marshall Control Panel, specify the IP address of your Email Server (usually its internal private IP address on your network, not the public IP).

3. Change your network firewall to redirect port 25 to the IP address of the Spam Marshall server instead of the current mail server.

Important Note: Before installing Spam Marshall, make sure no other program on this machine uses port 25, because Spam Marshall will use this port. On Windows, run netstat -an from a command prompt to check before installing Spam Marshall.
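If Python happens to be available on the machine, the same check can be scripted. The sketch below simply tries to bind the port; the function name is invented for this example. On Linux and other Unix systems, binding a port below 1024 requires administrator rights, so a False result there can also mean "not permitted" rather than "in use".

```python
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Already in use, or the current user may not bind this port.
            return False
    return True


print("Port 25 free:", port_is_free(25))
```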

Detailed step-by-step instructions are given below.

Installing Spam Marshall on your current Email Server

Installing Spam Marshall involves the following steps:

1. Change the SMTP port of your current Email Server to something other than 25 (we recommend 2500).

2. Run Spam Marshall setup.exe on your current Email Server.

3. In the Spam Marshall Control Panel, specify the new SMTP port of your Email Server (2500 or whichever port you configured in step 1).

4. In the Spam Marshall Control Panel, specify the IP address of your Email Server (usually its internal private IP address on your network, not the public IP).

Note: Since you are installing Spam Marshall on the same machine as your current Email Server, you do not need to change your network firewall rules for IP addresses or ports.

Detailed step-by-step instructions are given below.

Installation and Setup

Download the installer file from our website, http://www.spammarshall.com, double-click it, and follow the setup wizard:

1. Click on Next after reading through the introduction

2. Specify the folder to install in or leave the default and click Next.

3. Specify where you would like to create Spam Marshall Program icons and click Next.

4. Verify the settings and click on the Install button.

5. Click on Done to complete the installation.

Post Setup Installation Steps

Spam Marshall has two interfaces that allow you to perform management and monitoring functions after installation.

1. Spam Marshall Control Panel

2. Spam Marshall Admin Console

Spam Marshall Control Panel

Control Panel provides the following functions:

Server Status: Allows you to stop or start the Spam Marshall Service manually.

Live Monitor: Allows you to monitor emails as they come in to your network.

Server Config: Allows you to set the IP address and ports of the internal Email Server.

Live Update: Allows you to receive software upgrades and enhancements.

Server Log: Displays error and warning logs.

Spam Marshall Admin Console

All your Spam Marshall management tasks, such as viewing reports, setting filter thresholds, and viewing and restoring messages, are performed using this browser-based interface. After Spam Marshall is set up successfully, you will mostly use this interface to monitor Spam on your network.

Verify Installation

To verify that the Spam Marshall software is installed successfully, check for:

Start > Programs > Spam Marshall > Control Panel

A green light in control panel indicates that Spam Marshall is running

Configuration Steps:

You will need to provide some basic information to Spam Marshall after running setup.exe before it will start to process your emails. These steps outline how to do this. Please follow sections that are appropriate for your type of installation.

Spam Marshall installed on same Server as your Email Server

IMPORTANT NOTE: Since you are installing Spam Marshall on the same server as your current email server, your current SMTP server is listening on port 25. You need to change this so that your SMTP server listens on a port other than 25 (we recommend port 2500). Spam Marshall will then handle traffic coming in on port 25 and forward it to your mail server on the new port. Please refer to the Appendix at the back of this guide, which shows how to change the SMTP port for some of the major Email Servers, such as various versions of MS Exchange. The steps below assume you have already changed the SMTP port of your mail server to 2500.

Bring up Spam Marshall Control Panel if not already running

Click on Start > Programs > Spam Marshall > Control Panel

Click on Server Config Tab

Next to Host name of your corporate SMTP server, enter the internal private IP address of your email server. Next to TCP/IP port of your corporate SMTP server, enter the port number your SMTP server is listening on. In the example above, the SMTP port on your mail server is 2500.

Click on Save and then on OK after the save completion message.

Click on Server Status Tab

Click on Stop and wait for traffic light to change red

Click on Start and wait for traffic light to change green.

Spam Marshall installed on a separate dedicated Server

If you are installing Spam Marshall on a separate dedicated server, you do not need to make any changes to your current mail server settings. However, after completing the following steps you will need to change your network firewall to forward all SMTP traffic (port 25) to the IP address of the Spam Marshall server.

Bring up Spam Marshall Control Panel if not already running

Click on Start – >Programs – >Spam Marshall – >Control Panel

Click on Server Config Tab

Next to Host name of your corporate SMTP server, enter the internal private IP address of your email server.

Click on Save and then on OK after the save completion message.

Click on the Server Status tab.

Click on Stop and wait for the traffic light to change to red.

Click on Start and wait for the traffic light to change to green.

Steps for License Activation

Before Spam Marshall can process your emails, you must activate your copy by providing a serial key, which is usually emailed to the address you specified when you downloaded the software. If you don’t have your serial key, please apply for one in the download section of www.SpamMarshall.com, or call Spam Marshall Sales to obtain one. You need to be connected to the Internet to activate your license.

License activation is normally performed the first time you launch the Spam Marshall Admin Console. The Admin Console is a browser-based interface used for managing Spam Marshall and can be brought up in one of three ways:

1. Select the Admin Console icon from the Start Menu

2. Type the URL in your browser, either on the Spam Marshall server or on a remote machine. Example: http://host_name_or_ip_of_spam_marshall_server:7860

3. Click on the Admin Console icon in the Control Panel

Please note that you need to type http:// in your browser along with the host name.
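
The URL format above can be sketched as a small helper. This is only an illustration: the function name `admin_console_url` is hypothetical, and the default port 7860 is the Admin Console port given in this guide.

```python
DEFAULT_ADMIN_PORT = 7860  # default Admin Console port given in this guide


def admin_console_url(host: str, port: int = DEFAULT_ADMIN_PORT) -> str:
    """Build the Admin Console URL, always including the http:// scheme."""
    if not 1 <= port <= 65535:
        raise ValueError(f"invalid TCP port: {port}")
    return f"http://{host}:{port}"
```

For example, `admin_console_url("192.168.1.10")` gives `http://192.168.1.10:7860`, which can be pasted into a browser on any machine that can reach the Spam Marshall server.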

• Enter your serial key. IMPORTANT: You must be connected to the Internet in order for activation to work.

• If you use a proxy server to reach the Internet, provide the necessary proxy values.

• Click the Proceed button to complete the activation.

Review and Verify Configuration

Once activation is complete, a window displays the current configuration settings for your review. Pay particular attention to the Corporate Email Server Host IP and the Corporate Email Server Port to make sure they are correct. The other options are explained later in this guide, so you don’t need to specify anything else here for now. Click on Save at the bottom to proceed.

Note:

Default Username for Admin Console: Admin

Default Password for Admin Console: letmein

You can review the settings and also use the built-in test tools to make sure Spam Marshall is set up correctly and is ready to process emails.

Check Your Settings

To use the Diagnostic Tool, click on Proceed, or in the Admin Console click on Modify Configuration and select Diagnostic Tools.

Type in the domain name of your company and click on proceed.

If any of the checks failed, refer to the bottom of the window for help in solving your issue.

Once you have made necessary changes to your configuration you are ready to test the installation. Follow the steps below.

• Start Spam Marshall’s Control Panel and click on Live Monitor, which allows you to monitor emails as they come in.

• Open any email client, such as MS Outlook or Netscape Messenger.

• Specify the host name of the machine where Spam Marshall is installed as the SMTP server, and send a test message to yourself or a colleague.

• You should see that message pop up in Spam Marshall’s Control Panel.
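
If no email client is at hand, the same test can be sketched with a short script. This is a minimal sketch, assuming Spam Marshall accepts unauthenticated SMTP on port 25. The function names and the subject text are hypothetical, and the host and addresses must be replaced with your own.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Create a short plain-text test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Spam Marshall installation test"  # illustrative subject
    msg.set_content("Test message sent through Spam Marshall.")
    return msg


def send_test_message(spam_marshall_host: str, msg: EmailMessage,
                      port: int = 25) -> None:
    """Hand the message to Spam Marshall, which should relay it onward."""
    with smtplib.SMTP(spam_marshall_host, port, timeout=30) as smtp:
        smtp.send_message(msg)
```

After `send_test_message(...)` returns, the message should appear in Live Monitor in the same way as one sent from an email client.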

Troubleshooting

If your test fails:

• Make sure you have activated Spam Marshall with the provided serial key. When a serial key is not provided, the Rules Engine will bypass all emails.

• Make sure that the IP address of your real email server is correct and that the server is running. If you are running Spam Marshall on the same server as your email server, make sure you have changed the SMTP port on your mail server and specified it in Spam Marshall.

• Check for error messages in the SpamMarshall.log file under the logs directory.
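
For a large log file, a short script can pull out the relevant lines. The log format is not described in this guide, so the sketch below simply assumes that error entries contain the word "error" in some letter case. The function name is hypothetical.

```python
from typing import Iterable, List


def find_error_lines(lines: Iterable[str], keyword: str = "error") -> List[str]:
    """Return the lines that contain the keyword, ignoring letter case."""
    needle = keyword.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]
```

A typical use is to open the log in text mode and pass the file object in, for example `find_error_lines(open("SpamMarshall.log", errors="replace"))`, with the path adjusted to the logs directory of your installation.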

Optional Spam Marshall Configuration

The above configurations were required in order for Spam Marshall to work properly. There are some additional options you may want to consider for Spam Marshall that will help you in your implementation. Bring up the Modify Configuration link from the Admin Console:

DNS Server: This is the IP address of DNS server used to resolve domain names.

If no value is specified here, Spam Marshall will automatically use the DNS settings of the operating system to resolve names. In most cases leave this area blank.

Archive After: Every email that arrives to be processed by Spam Marshall is stored on the local drive. To free up space and allow quick searches, older emails can be archived by specifying a number here. This is the number of days after an email arrives on the server before Spam Marshall archives it. The default setting is three days. Note: Emails are stored in folders under the SpamMarshall folder labeled SpamEmails, GoodEmails, and PossibleSpams. An archive subfolder is created under each category to store archived emails. Every email that is archived is saved in the archive folder, compressed into a zip file.
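
The archiving rule can be sketched as follows. This is not Spam Marshall's implementation: it assumes one file per email, uses the file modification time as the arrival time, and writes one zip file per email, none of which is confirmed by this guide. The function name is hypothetical.

```python
import os
import time
import zipfile
from typing import List, Optional


def archive_old_emails(category_dir: str, archive_after_days: int = 3,
                       now: Optional[float] = None) -> List[str]:
    """Zip files older than archive_after_days into category_dir/archive."""
    now = time.time() if now is None else now
    cutoff = now - archive_after_days * 86400  # seconds per day
    archive_dir = os.path.join(category_dir, "archive")
    os.makedirs(archive_dir, exist_ok=True)
    archived = []
    for name in sorted(os.listdir(category_dir)):
        path = os.path.join(category_dir, name)
        # Skip subfolders (including "archive") and files that are too recent.
        if not os.path.isfile(path) or os.path.getmtime(path) > cutoff:
            continue
        zip_path = os.path.join(archive_dir, name + ".zip")
        with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.write(path, arcname=name)
        os.remove(path)  # the compressed copy replaces the original
        archived.append(name)
    return archived
```

With the default of three days, a message that arrived five days ago is compressed into the archive subfolder, while one that arrived today stays available for searching.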

Administrator's email: Type the email address that should receive the daily Spam report (you must also check the next option below). This address also receives an email alert whenever an error occurs in the system. You may type multiple addresses here, separated by commas only.

Email Status Report: If selected, the system will send a status report to the administrator with a summary of the emails received in the past 24 hours. Here is an example of the daily Spam Messages Report, covering messages processed by Spam Marshall in the last 24 hours, which is sent to you automatically.

Send SMTP error for Spam Messages: Normally, if an email processed by Spam Marshall is Spam, it is blocked and then quarantined or deleted without notifying the sender or the intended recipient. If this option is enabled, Spam Marshall will send an error message to the sender saying that their message was blocked by your mail server because it was considered Spam. This lets a legitimate sender know that their email was not delivered to the recipient due to company policy.

The exact error message they receive is: “552 your e-mail is considered Spam and does not comply with our company policies.”

This option is OFF by default. Enable it by checking the box here.

Challenge/Response:

Enable this option if you want Spam Marshall to send a challenge email to the sender of a message that was considered Spam and blocked.

If the sender responds to the email sent by Spam Marshall, the quarantined message will automatically be restored. The assumption is that Spammers don’t normally respond to emails, since most Spam is sent by machines and not by individuals. If you get a response back, an individual sent the email, so it is treated as non-Spam (even though it may contain Spam content) and forwarded to the recipient without being blocked. The email is held until a response is received from the sender; otherwise it is not forwarded to the recipient.

This option is off by default. You can enable it by checking the box here.

Challenge/Response Email: This must be a valid email address in your company, which is required by the challenge/response mechanism to route the message back to your server. IMPORTANT: This address will only be used for routing; no emails will be sent to this address. We recommend that you use the postmaster account for this.

Example: [email protected]

Challenge/Response Threshold: A challenge/response email will only be sent if the email’s Spam score is below this number. For example, if an email’s Spam score is close to the Spam threshold you may want to invoke Challenge/Response, but if the Spam score is very large you may be certain that the email was sent by a Spammer and simply block the message without invoking Challenge/Response.
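
The decision described above can be sketched as a single comparison. This is a minimal sketch, assuming that a challenge is only ever sent for messages already classified as Spam (score at or above the Spam threshold). The function name and the example threshold of 200 are hypothetical.

```python
def should_challenge(score: int, spam_threshold: int,
                     challenge_threshold: int) -> bool:
    """Send a challenge only for Spam whose score is below the challenge threshold."""
    return spam_threshold <= score < challenge_threshold
```

With a Spam threshold of 100 and a Challenge/Response Threshold of 200, a message scoring 110 is challenged, while a message scoring 450 is blocked without a challenge.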

Spam Wall SMTP Port:

This is the port Spam Marshall uses to intercept incoming email messages.

It is recommended that you use 25 for this value, which is the standard SMTP port.

Corporate Email Server Host:

This is the host name or IP address of your corporate email server, which is responsible for your company's emails.

Corporate Email Server Port:

This is the TCP/IP port number of your corporate email server.

IMPORTANT: If you are running Spam Marshall on the same machine as your primary email server, you MUST run your primary server on a different port, such as 2500. See the installation instructions above.

Web Server Port:

This is the TCP/IP port on which the Admin Console listens for incoming HTTP connections. The default is 7860, but you may change it here and then click on Save.

Configuring Spam Marshall for MS Exchange

If you decide to use Spam Marshall for MS Exchange, you get the option of integrating Spam Marshall with Microsoft Active Directory for user authentication.

By default, Microsoft Exchange Server accepts emails for valid as well as invalid users as long as the domain name is valid. If an email is sent to a non-existent user on your domain, that email eventually ends up in the badmail directory configured for MS Exchange. This causes high resource utilization on your Exchange server and in some cases can even crash it. Large numbers of NDRs end up in badmail, using up disk space until they are cleaned out manually. Hackers have used this method to crash Exchange servers.

To avoid accepting emails for invalid users, Spam Marshall checks the existence of each user against Microsoft Active Directory. If the user is not found, Spam Marshall automatically considers the message junk and quarantines it. The email is never sent to your Exchange server for processing, which saves valuable resources.
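
The routing rule can be sketched as below. A real deployment would query Active Directory; here the lookup is passed in as a function and a plain set of addresses stands in for the directory, so that the sketch stays self-contained. All names and addresses are hypothetical.

```python
from typing import Callable


def route_message(recipient: str, user_exists: Callable[[str], bool]) -> str:
    """Forward mail for known users; quarantine mail for unknown ones."""
    return "forward" if user_exists(recipient.lower()) else "quarantine"


# Stand-in for the directory lookup (an assumption for illustration only).
known_users = {"alice@example.com", "bob@example.com"}


def in_directory(address: str) -> bool:
    return address in known_users
```

Here `route_message("Alice@example.com", in_directory)` returns `"forward"`, and a message for an address that is not in the directory returns `"quarantine"` and never reaches the Exchange server.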

Use the Modify Configuration link in Admin Console to specify settings for Microsoft Active Directory. The following table defines each field in this category.

Table 1

Field Name: Description

Enable AD Lookup: If checked, Spam Marshall will refer to MS Active Directory to verify a user. If this is not checked, all other fields will be ignored.

AD Domain Controller: This is the IP address or host name of the machine that is running your Active Directory. In a typical installation, this is the same machine where your MS Exchange is running.

AD Domain Name: This is the root name for your Active Directory.

User Name: This is an NT user that is used to query Active Directory. This user MUST have enough privileges to perform such a query. An example is Administrator.

Password: Password of the user.

If you are running MS Exchange 5.5 on Windows NT please make sure to check the appropriate option above which says “Exchange version is 5.5”. Even though NT does not use Active Directory, Exchange 5.5 still uses LDAP to store user information.

Chapter 2

Getting Started

This section describes how to configure and customize Spam Marshall. Spam Marshall starts to work immediately after you install it, and in most cases you will need only minimal configuration for your business environment to eliminate Spam. However, to take full advantage of Spam Marshall’s powerful tools for managing, reporting, and monitoring Spam, you should read this guide.

Spam Marshall Control Panel: To access the Spam Marshall Control Panel:

Click Start > Programs > Spam Marshall > Control Panel

The Control Panel is the Spam Marshall console. Its windows are described below.

Server Status

Displays Spam Marshall Service status. You can also stop or start Spam Marshall Service using this tab.

Live Monitor

Monitor live messages and statistics as the Spam Marshall Rule Engine processes them in real time:

This window provides a visual display of real-time email processing by Spam Marshall. The pie chart shows the status of processed emails in terms of normal, Spam, or possible Spam messages.

This window also displays the Server status such as how long the server is up and running and the total amount of memory utilized by Spam Marshall.

Tip: Click on the check box next to Display Reason with Log to see score of each email

Server Config

This window displays the current Server Configuration information and lets you modify it. Here you may specify the SMTP IP address and port number of your corporate email server, and change the port for the Spam Marshall web interface (Admin Console); the default is 7860 (http://hostname:7860).

Note: Click on the Save button at the bottom. You must then stop and start Spam Marshall using the Server Status tab for the changes to take effect.

Tip: The last option, Number of days to wait before archiving, enables you to specify how many days you would like to have emails available for searching or restoring.

Live Update

This Control Panel Window provides a Live Update of Spam Marshall to keep it up to date with rules and version updates. You may specify proxy settings here in order to access the Internet if your company requires this configuration.

Updating Spam Marshall Successfully

Bring up the Spam Marshall Control Panel. In Windows:

Click on Start > Programs > Spam Marshall > Control Panel

Select the Server Status tab and make sure the Spam Marshall service is running.

Select the Live Update button in the Control Panel.

Click on the Check for updates button. You can also check the box “Automatically check for new updates” to receive updates automatically in the future. We recommend that you perform a manual update by clicking on the Check for updates button after installing Spam Marshall for the first time. After the updates have downloaded successfully, follow these steps in order:

1. Exit Control Panel by selecting Exit Control Panel button

2. Bring up Control Panel again

3. Select the Server Status tab (it should already be selected if you have just brought up the Control Panel)

4. Click on Stop button

5. Wait until traffic light turns Red

6. Click on Start button

7. Wait until traffic light turns green

8. Click on Live Monitor and make sure the bottom left-hand corner of the Control Panel says: “Spam Marshall server is RUNNING”

Check the top of the Spam Marshall Control Panel and you should see the version number and build of Spam Marshall you are currently running.

You should follow the above steps every time a new update is downloaded, so that the update is applied to Spam Marshall successfully.

Spam Marshall Administration Console

The Spam Marshall Administration Console is an easy-to-use, browser-based interface that allows you to manage Spam Marshall locally or remotely.

The Admin Console allows you to perform the following functions:

• Check Status of the Server

• View, create, or modify filter rules

• View/Restore Messages processed by Spam Marshall Server

• View Hourly, Daily, Monthly, or Yearly Graphical Reports

• View/Modify Spam Marshall System level configurations

• Change Admin console password

• Perform Users Administration (Create individual accounts for users to check their own individual Spam emails, Spam reports, and restore Spam messages)

Starting up Spam Marshall Administration Console

Note: Spam Marshall Service must be running in order to successfully start Admin Console.

You can open up the Admin Console in one of three ways:

1. Click on Start > Programs > SpamMarshall > Admin Console

2. Click on the Admin Console button in bottom of Spam Marshall Control Panel

3. Type the URL in a browser, using the IP address or host name of the server where Spam Marshall is installed along with the port number. Example: http://192.168.1.10:7860 or http://mail.myserver.com:7860

The Admin Console allows you to manage Spam Marshall using a browser such as IE or Netscape.

When you connect to the Admin Console using one of the above methods, the first screen will prompt you for the User name and Password:

In order to login for the first time, please specify as follows:

Default User Name and Password

User Name: Admin

Password: letmein

If you are logging in for the very first time, the next screen will prompt you to change the default password. Please change the default password. We recommend using a combination of letters and numbers in the password, with both uppercase and lowercase characters. A password of more than six characters is highly recommended.

Click on Change button after typing in the new password.

After logging in successfully, the next screen will bring up the Administrative and Reporting Tools in your browser.

Administration Console Options

Check Status

This option allows you to find out the current configuration and status of the server. It also displays a log of any Administrative type of activity on the server performed by the Admin account.

View/Modify Filters

This is the heart of the Spam Marshall Rule Engine. Most of the Administrative activities such as viewing, modifying, or creating new filter rules will be done using this option. Please read through this section to fully understand and take advantage of the capabilities of Spam Marshall Server Spam filtering.

Spam Score Threshold:

Every email consists of at least three parts: a domain name, an IP address, and the actual content of the email. Spam Marshall checks and verifies all three parts to help determine whether the email was sent by a normal user or a Spammer. To reduce false positives, instead of relying on just one of the three items to classify an email as valid or Spam, Spam Marshall’s Rule Engine can use the combination of all three. Each part is assigned a score here, and if the combined score reaches the Score Threshold, the email is considered Spam.

As an example, assume the default Score Threshold of 100 shown in the screen above, and suppose the domain name of the sender’s email is determined to be invalid. In this case the Rule Engine assigns a score of 20, the value set for an invalid domain name (no MX record), to this part of the email. Next, IP address verification finds that the IP address belongs to a known Spammer, and the email is assigned a further score of 60 based on the value set here. This adds up to 80, so the email has not yet reached the Score Threshold of 100 and could still be delivered to the Inbox. In the third part of the analysis, the content of the mail is checked. If the mail is found to have Spam content, it is assigned a further score based on the values set in the View/Edit Content Words area. Let’s say the mail contained the phrase “Lose Weight” and the Administrator had set its score to 30. The content part of the email then gets a score of 30, and the Rule Engine assigns a final score of 110 to the mail (20 + 60 + 30). The Rule Engine then compares the final score with the Score Threshold (110 vs. 100). If the final score is equal to or higher than the Threshold value, the mail is considered Spam and is not delivered to the end user’s Inbox. Instead, it is sent to the deleted mail folder on the Spam Marshall Server.

In another example, let’s say that as a company policy you decide that any mail coming from an invalid domain will be discarded immediately as Spam, regardless of the content or the IP address of the mail. In this case you would change the Domain Validation Weight to 100. Each time an email is found to have an invalid domain name, it will be assigned a score of 100, and since this matches the Score Threshold of 100, it will be discarded as Spam.
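
The arithmetic in the two examples above can be sketched in a few lines. The function names are hypothetical, and the sketch assumes the final score is the plain sum of the three part scores, as in the worked example.

```python
SCORE_THRESHOLD = 100  # default Score Threshold used in the examples above


def total_score(domain_score: int, ip_score: int, content_score: int) -> int:
    """Add up the scores assigned to the three parts of an email."""
    return domain_score + ip_score + content_score


def is_spam(score: int, threshold: int = SCORE_THRESHOLD) -> bool:
    """An email is Spam when its final score reaches the threshold."""
    return score >= threshold


# First example: invalid domain (20) + known Spammer IP (60) + "Lose Weight" (30).
first = total_score(20, 60, 30)   # 110, which reaches the threshold of 100

# Second example: Domain Validation Weight raised to 100, nothing else matched.
second = total_score(100, 0, 0)   # 100, which matches the threshold exactly
```

Both `is_spam(first)` and `is_spam(second)` are true, whereas the intermediate score of 80 from the first example is not yet Spam.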

Spam Marshall is designed to allow Administrators full control over how an email is handled. The settings here allow tremendous flexibility and power over every email that enters your network. The next few sections will demonstrate this further. You will learn how to customize and use Spam Marshall to control the flow of Spam emails into your Network.

Possible Spam Score Threshold:

The value here determines how Spam Marshall categorizes an email that has some of the characteristics of Spam but did not reach a final score high enough to make it Spam. For example, suppose that after analyzing the email’s domain name, IP address, and content, a final score of 80 was assigned. Since this did not reach the Score Threshold of 100, the Rule Engine next checks the Possible Spam Score Threshold. If the score falls between the Possible Spam Score Threshold and the Spam Score Threshold, the email is delivered to the user’s Inbox with [POSSIBLE SPAM] added to the Subject line. If you do not wish to tag the Subject with these words, set the Possible Spam Score Threshold to the same number as the Spam Score Threshold.
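
The three-way classification can be sketched as follows. The Possible Spam Score Threshold of 70 is a made-up value for illustration, the lower boundary is assumed to be inclusive, and placing the tag at the front of the Subject is also an assumption. The function names are hypothetical.

```python
def classify(score: int, spam_threshold: int = 100,
             possible_threshold: int = 70) -> str:
    """Return 'spam', 'possible spam', or 'normal' for a final score."""
    if score >= spam_threshold:
        return "spam"
    if score >= possible_threshold:
        return "possible spam"
    return "normal"


def tag_subject(subject: str, category: str) -> str:
    """Add the [POSSIBLE SPAM] tag to the Subject of possible Spam only."""
    if category == "possible spam":
        return f"[POSSIBLE SPAM] {subject}"
    return subject
```

A score of 80 is classified as possible Spam under these values. Setting `possible_threshold` equal to `spam_threshold` leaves no range between the two, so no message is ever tagged.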

Domain Validation Weight:

The value here determines the score assigned to an email if it is found to have an invalid domain name. Spam Marshall checks for the DNS MX record that every properly configured mail server on the Internet should have. If no MX record is found, the value set here is added to the final score for the mail. The higher the value, the more significance it has in determining whether the mail is Spam. Spammers often use Open Relay Servers to send out Spam, which makes it hard to track down the Spammer. If a Spammer is using an Open Relay Server, the domain names they use most likely have no MX record.
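
The check can be sketched as below. The Python standard library has no MX lookup, so the resolver is passed in as a function; in practice it would wrap a DNS library. The function name is hypothetical, and the default weight of 20 is the value used in the Spam Score Threshold example.

```python
from typing import Callable, Sequence


def domain_validation_score(sender: str,
                            mx_lookup: Callable[[str], Sequence[str]],
                            weight: int = 20) -> int:
    """Return the weight if the sender's domain has no MX record, else 0."""
    domain = sender.rsplit("@", 1)[-1].lower()
    return 0 if mx_lookup(domain) else weight
```

The `mx_lookup` argument is any function that returns the list of MX hosts for a domain, or an empty list when none exist. Raising `weight` to the Score Threshold makes a missing MX record sufficient on its own to classify a message as Spam.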

Bypass authenticated Users:

This option prevents Spam Marshall from processing emails for users that successfully authenticate with the SMTP server. It can be used in a scenario where you want to bypass processing of outgoing emails from your SMTP server, as opposed to incoming emails.

Deletion Threshold:

Emails processed by Spam Marshall are quarantined and then archived after the number of days specified in the archive option. The number here tells Spam Marshall to delete an email immediately, rather than keep it and take up valuable disk space, if the Spam score of the email is equal to or larger than the number you specify. For example, by default, if an email is blatant Spam and receives a score of more than 500 (beyond a shadow of doubt, so to speak), you want to delete it immediately rather than keep it and archive it later.
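
The effect of the Deletion Threshold can be sketched as below. The function name is hypothetical. The sketch uses the value 500 from the example above and treats a score equal to the threshold as deletable, following the "equal to or larger" wording.

```python
def spam_disposition(score: int, spam_threshold: int = 100,
                     deletion_threshold: int = 500) -> str:
    """Decide whether a message is delivered, quarantined, or deleted."""
    if score < spam_threshold:
        return "deliver"      # not Spam (it may still be tagged as possible Spam)
    if score >= deletion_threshold:
        return "delete"       # blatant Spam is removed immediately
    return "quarantine"       # ordinary Spam is kept and archived later
```

A message scoring 110 is quarantined and later archived, while one scoring 650 is deleted at once and never takes up archive space.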

IP Verification / RBL (Real-time Black Hole Lists):

Spam Marshall checks the IP address of each incoming email against Real-Time Black List (RBL) databases. These databases contain lists of known Spammers and are continuously updated as information becomes available. The service is provided by organizations on the Internet concerned about Spam. Spam Marshall is pre-configured to use some of the most popular RBL services, such as ORB and SpamCop. Spam Marshall allows you to pick and choose which services you would like to use, and also gives you the ability to add your own services as they become available.

Each of these services can be assigned a different value or score based on how reliable and accurate its database is. The more reliable the RBL, the higher the score that should be assigned to the service.
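
RBL services are commonly queried over DNS: the octets of the IP address are reversed, the name of the list is appended, and the address counts as listed if the resulting name resolves. The sketch below illustrates that general technique, not Spam Marshall's own code. The zone names are placeholders, and summing the weights of all matching lists is an assumption.

```python
import ipaddress
import socket
from typing import Callable, Dict


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up, e.g. 1.2.0.192.<zone> for 192.0.2.1."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """True if the list's DNS zone has an entry for this IP address."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False  # the name did not resolve, so the address is not listed


def rbl_score(ip: str, zone_weights: Dict[str, int],
              listed: Callable[[str, str], bool] = is_listed) -> int:
    """Add up the configured weight of every list that contains the IP."""
    return sum(w for zone, w in zone_weights.items() if listed(ip, zone))
```

A more reliable list is given a larger weight in `zone_weights`, so a match on it moves the message further toward the Spam threshold than a match on a less reliable list.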

Adaptive Filters

Spam Marshall can use two self-learning filters. One is the Bayesian filter and the other is Auto-learn sender.

Bayesian Filter

Bayesian filters are widely acclaimed to be the best way to tackle Spam because they use statistical intelligence to analyze the content of the mail. Spam Marshall implements this technology at server level in a reliable and effective manner. Bayesian filtering detects Spam based on message content. Rather than just checking for keywords, the Spam Marshall Bayesian filter takes the whole message into consideration. Bayesian filtering is based on the mathematical principle that most events are dependent and that the probability of an event occurring in the future can be inferred from the previous occurrences of that event. The same concept is used to identify new Spam messages based on the content of past Spam messages. In short, Bayesian filtering has the following advantages:

• Looks at the whole message

• Adapts itself over time

• Is sensitive/adapts to the company/user

• Uses artificial intelligence

• Hard to trick

Click on View/Edit next to Bayesian Analysis to view settings. In most cases you will not need to change anything here and we recommend that you use the default settings as configured.

A Window opens up that shows you the settings:

Bayesian Score: The Spam score assigned to the email. This score works two ways: an email considered Spam receives the positive number specified here, and an email considered good (ham) receives the negative of the number specified here.

Status: Before a Bayesian filter can correctly separate Spam from non-Spam, it needs to learn the difference between the two, based on the emails received on your server that you consider good and those you consider Spam. Only after it has gathered enough information from analyzing emails can it guess correctly whether an email is Spam or ham. This option allows you to put Bayesian analysis in learning mode, enabled mode, or disabled mode. When the Learning Curve value explained below is reached, Spam Marshall automatically enables the Bayesian filter. You should not need to change anything here unless you want to disable Bayesian analysis.

Interesting Word Count:

A Bayesian filter extracts a few words from every message for analysis. These words usually have a high probability of being either spammy or hammy. Do NOT change this value if you are not sure how Bayesian filtering works.

Repeat Count:

This variable defines the maximum number of times a word is counted if it appears more than once in an email.

Minimum Length:

Minimum length of a word to be considered for analysis.

Learning Curve:

Spam Marshall will analyze this many emails before automatically enabling Bayesian filtering.
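
As a rough illustration of how the Interesting Word Count, Repeat Count, and Minimum Length settings could interact, here is a minimal Python sketch. It is not Spam Marshall's implementation: the function names, the default values, and the per-word spam probabilities are assumptions made up for this example.

```python
import re
from collections import Counter

def extract_tokens(text, min_length=3, repeat_count=2):
    """Count words, skipping short ones and capping repeats (hypothetical defaults)."""
    words = re.findall(r"[A-Za-z0-9'-]+", text.lower())
    counts = Counter(w for w in words if len(w) >= min_length)
    # Repeat Count: a word repeated many times is counted at most repeat_count times.
    return {word: min(n, repeat_count) for word, n in counts.items()}

def most_interesting(spam_probability, count=15):
    """Pick the `count` words whose spam probability is farthest from the neutral 0.5."""
    return sorted(
        spam_probability,
        key=lambda word: abs(spam_probability[word] - 0.5),
        reverse=True,
    )[:count]

tokens = extract_tokens("Free free FREE offer, act now to get it")
interesting = most_interesting(
    {"viagra": 0.99, "meeting": 0.02, "the": 0.5, "offer": 0.8}, count=2
)
```

Here "to" and "it" are dropped for being shorter than the minimum length, and the three occurrences of "free" are counted only twice.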

Auto-learn sender

When an authenticated sender sends an email out to a recipient, that recipient is recognized as a legitimate correspondent. The recipient's email address is automatically put in the White list. Any response or message from that email account is considered non-Spam and is always allowed in.

If you want to use this option, enable it by checking the box:

Enable Auto-Learn sender Whitelist

Content Filtering:

This area of Spam Marshall is the heart of email content filtering and the most powerful tool an Administrator has for filtering out emails based on the content of the message. Once you know how to use it, you can block any emails you consider Spam, or block email-borne viruses even before your anti-virus vendor releases new virus definitions. It is designed to be easy to use yet powerful enough to provide total control over what email content is allowed in or kept out of your network.

Types of content filters

There are six types of Content Filters:

o Attachment

o Sender

o Subject

o Body

o Header

o Custom Filter

Attachment: This type of filter checks for file attachments in email messages, allowing you to block certain extensions or file names.

Click on View/Edit to bring up the Attachment filter window.

Click on “Add new attachment filter” to add a new filter.

Click on Edit/Delete to modify or delete an Attachment filter.

Sender: An email typically contains sender information such as From: MJones@xyz. If you wish to block any email coming from this person, add a Sender entry and assign a value equal to or larger than the Spam Score Threshold (100 in our example).

Click on View/Edit under Action Column next to Sender filter.

Click on Add new sender filter and fill out the information:

To allow all emails from a particular Sender by email address:

If you want to always allow emails in from a particular user (such as [email protected]), assign a score that makes the final email score always less than the Spam Score Threshold Value. In the example below, we assign a value of -1000. This assures that the overall email score never reaches 100.

Note: A negative number decreases the Spam Score. A large negative number such as –2500 assures that the score never reaches the Spam threshold, so the email is always allowed in.
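
The arithmetic behind this can be sketched in a few lines of Python. This is only an illustration of additive scoring against a threshold, assuming that an email is treated as Spam when its total reaches the threshold; the function name and the individual rule scores are hypothetical.

```python
SPAM_THRESHOLD = 100  # the threshold value used in this guide's example

def is_spam(rule_scores, threshold=SPAM_THRESHOLD):
    """Add up the scores of every rule that matched and compare to the threshold."""
    return sum(rule_scores) >= threshold

# Two content rules matched (hypothetical scores of 60 and 50): total 110.
without_allow_rule = is_spam([60, 50])

# The same email from an always-allowed sender: 60 + 50 - 1000 = -890.
with_allow_rule = is_spam([60, 50, -1000])
```

The large negative sender score outweighs anything the other rules are likely to add, which is why the email is always allowed in.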

Subject Filter:

Every email has a subject line that can be used to determine if it is or isn’t Spam. As an example, let’s say you don’t want to allow any emails containing a subject line “loose 40 lbs in 30 days”.

Click on View/Edit under Action Column next to Subject Filter

Click on add new Subject Filter

Fill in the information as required and add the value you would like to assign. Remember that the value you assign counts toward the final score, which is compared against the Spam Score Threshold Value: the closer the value is to the Threshold, the more likely the email will be classified as Spam.

Body Filters

Similar to the way you filter out unwanted email based on the sender or the subject, you can add, edit, or remove words to customize filtering of the mail content itself, known as the body of the email.

Click on View/Edit in Action column next to the Body Filter on the Modify Filter screen.

Click on Add new body filter

Enter the information as required and assign a value.

Preprocessing vs. Postprocessing

Spam Marshall processes the body of an email a bit differently: parsing is done in two passes. Many email programs, such as Outlook and Netscape, use HTML tags to format the actual message. HTML is not very strict as far as tag rules are concerned, so spammers can easily use it to hide the actual content. Let's take an example:

<html> <body><h1> V<asdf>i<asdf>a<asdf>g<asfd>r<asfd>a </h1></body> </html>

The above script is a valid HTML page even though it contains the invalid tag <asdf>. Most HTML readers will simply ignore the tag and display a page similar to the following screen.

Since the word “Viagra” never appears intact in the HTML script, many spam filtering programs cannot detect this as a spam message. Spam Marshall avoids being tricked by filtering the message in two passes:

Pass 1 – Searches for Viagra without any modification to the original message

Pass 2 – Searches for Viagra after removing all HTML tags.

Pass 1 is called the preprocessor stage and Pass 2 is called the postprocessor stage.

Tip: Use the postprocessor stage to find normal words. Use the preprocessor stage to find web URLs.
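
The two passes can be imitated with a short Python sketch, using the example message from above. The tag-stripping function is a deliberately naive assumption for illustration; how Spam Marshall actually removes HTML tags is not documented here.

```python
import re

html = "<html> <body><h1> V<asdf>i<asdf>a<asdf>g<asfd>r<asfd>a </h1></body> </html>"

def strip_tags(markup):
    """Remove anything that looks like an HTML tag (naive, for illustration only)."""
    return re.sub(r"<[^>]*>", "", markup)

# Pass 1 (preprocessor): search the message exactly as received.
pre_hit = "viagra" in html.lower()

# Pass 2 (postprocessor): search again after the tags are removed.
post_hit = "viagra" in strip_tags(html).lower()
```

The first search misses because the bogus tags split the word apart; the second finds it once the tags are gone. A URL, on the other hand, usually lives inside a tag attribute and would disappear in the second pass, which is why URLs belong in the preprocessor stage.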

Header Filter

If you would like to block or allow an email based on its header content, you can configure this just as you did for the Subject, Sender, or Body filters shown above.

Custom Filters:

Custom Filters allow you to extend the capability of Spam Marshall programmatically. Spam Marshall gives you the flexibility to implement email policy suited to your own organization. For example, you can use your own IT resources to develop a custom interface to Spam Marshall that will tag an email with your own company disclaimer message. Custom filters extend the capability of Spam Marshall unlike any other anti-Spam solutions currently available. Please visit our online support area on our website for more information for writing custom filters.

IP Filtering

Email messages can always be blocked or allowed in based on the IP address the message is received from.

White List/ Black List

Black or White listing is an easy way to allow or block an email based on a known IP address found in the mail header. For example, email originating from the mail server of your parent company should never be blocked. Simply add the IP address of your parent company's mail server to the White Listed IP Addresses by clicking on View/Edit and then adding the IP address as in the figure below:

If you wanted to block any email from your competitor coming into your network, you would add the IP address of the mail server of your competitor in the Black Listed IP addresses area of Spam Marshall.

You can also block or allow an entire subnet by typing in the first few octets of the subnet, ending with a period, and leaving the remaining octets blank. For example, if you wanted to block the entire IP subnet 192.168.1.0, you would enter 192.168.1. in the list of Black Listed IP addresses (or in the White Listed list to allow the subnet).
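
One plausible reading of this rule is a simple prefix comparison, sketched below in Python. The function name is hypothetical, and treating a trailing period as "match everything that starts with this" is an assumption based on the description above.

```python
def ip_matches(entry, address):
    """Match a list entry against an address.

    An entry ending in a period (such as "192.168.1.") is treated as a subnet
    prefix; any other entry must equal the address exactly.
    """
    if entry.endswith("."):
        return address.startswith(entry)
    return address == entry

in_subnet = ip_matches("192.168.1.", "192.168.1.77")
other_subnet = ip_matches("192.168.1.", "192.168.10.5")
```

The trailing period matters: without it, the prefix 192.168.1 would also cover 192.168.10.x through 192.168.19.x and 192.168.100.x and above.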

Real-Time Black List or RBL Servers

Spam Marshall can use RBL databases to block Spam emails. RBLs are services run by organizations concerned about Spam. They provide the ability to check whether the IP address of the SMTP server sending an email belongs to a site known for sending Spam. Various RBL lists are available on the Internet for free; Spam Marshall lets you use most of the major ones and also specify your own choice. By default Spam Marshall is configured to use two of these well-known services. We recommend that you begin with these default services. The more RBL services you use, the more time it takes to process emails.

Click on View/Edit next to RBL Server to view options

ORDB and SpamCop RBL Services are used by default.
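
For background, DNS-based RBL services are conventionally queried by reversing the octets of the sending server's IP address and looking that name up under the service's DNS zone. The Python sketch below only builds the query name and performs no network lookup. The function name is hypothetical, the zone shown is assumed to be the one SpamCop publishes, and whether Spam Marshall performs its lookups exactly this way is also an assumption.

```python
def dnsbl_query_name(ip_address, zone):
    """Build the DNS name to look up for an IPv4 address in an RBL zone."""
    octets = ip_address.split(".")
    return ".".join(reversed(octets)) + "." + zone

# 192.0.2.99 is an address reserved for documentation examples.
query = dnsbl_query_name("192.0.2.99", "bl.spamcop.net")
```

If the name resolves, the address is listed; if the lookup fails with "no such domain", it is not. Each configured service costs one such DNS lookup per message, which is why adding more services slows processing.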

Message Actions

Spam Marshall allows you to take different actions based on the category of the email message.

This screen defines how messages in the different categories (Spam, Possible Spam, and Good messages) are handled. You can assign one of the following actions to each of the three categories:

No Action: This literally means “No action”. If you select this action, emails are passed as-is.

Change subject: The subject of the email message is changed. One of the following strings is added to the beginning of the subject:

[Good] - If the email category is good

[PossibleSpam] - If the email category is possible spam

[Spam] - If the email category is spam

This is the default action for possible spam messages.

Change subject and forward: Besides changing the subject as described above, this action forwards the email to one or more recipients, separated by commas. This is a useful feature if an administrator wants to closely watch which emails are being quarantined. The original message is attached to the forwarded message.

Quarantine: The email is quarantined. All quarantined emails are kept on the local hard drive for a specified number of days and then archived into zipped files. This is the default action for Spam messages.

Quarantine and forward: This action quarantines the message and also forwards it.

Specify an email account if you want messages to be forwarded. For example, if you want to send all Spam messages to an email account [email protected], specify the email account here.

Filter Operators

Filter operators allow you to specify precisely what content you are looking for in the email and to assign a value based on the criteria you choose. Spam Marshall provides various operators for filtering content when detecting Spam.

Contains Word

If you choose this operator, you are asking Spam Marshall to find the string you specify as a whole word in the content, as opposed to a run of characters. For example, suppose you want Spam Marshall to find instances of the word sex and assign a value to them. Spam Marshall assigns a value only where the word sex appears by itself, not where the letters sex appear inside another word, as in “essex.”

Contains

If you choose the Contains operator in the example above, Spam Marshall assigns a value to any instance of the letters sex appearing together in sequence. For a sentence in the email such as “a sex study in essex college of arts”, Spam Marshall assigns a value to sex twice, since it appears once as a word and once inside the name essex.

Equals

Use this operator when you want to match the case of the letters exactly as they appear in the message. For example, if you are looking for the string “FREE”, selecting the Equals operator assigns a value only to instances where free appears in capital letters. Other instances of the word, such as “Free” or “free,” are ignored.

Starts With

Use this operator when you want to assign a value to text that appears at the start of the Subject. This operator is useful for catching email-borne viruses, which often carry a signature in the Subject line that can be used to block them. For example, some emails sent by the SoBig virus contained the text “Re: Wicked screensaver”. To look for this in the subject line of the message and assign a value to it, specify the string as Starts With “Re: Wicked screensaver”.

Ends With

Similar to Starts With above except with this operator you want to look for a text in the Subject that ends with the instance of a particular item you need to assign a value to.

Does not contain

As the operator name implies, if you use this operator you are asking Spam Marshall to check that a string is not found in the message. For example, every mail that you allow in might be required to contain a keyword, such as a disclaimer message at the bottom of every email. If company policy requires a disclaimer for all inbound messages, you can use this operator to look for the word “Disclaimer.” If the message does not contain “Disclaimer,” you can assign a value that counts toward the message score.

Is blank

Use this operator for the Subject Filter. If the Subject line is blank you may want to assign a value to reject the message as some spammers send out emails with blank subject lines.
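
The differences between the operators can be made concrete with a small Python sketch. These functions are illustrative stand-ins, not Spam Marshall's code; in particular, treating Contains and Contains Word as case-insensitive is an assumption inferred from Equals being described as the case-matching operator.

```python
import re

def contains_word(text, word):
    """Whole-word match: finds 'sex' in 'a sex study' but not inside 'essex'."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

def count_contains(text, fragment):
    """Number of times the letters appear in sequence, anywhere in the text."""
    return text.lower().count(fragment.lower())

def equals(text, fragment):
    """Case-sensitive match: 'FREE' does not match 'Free' or 'free'."""
    return fragment in text

def starts_with(subject, prefix):
    return subject.startswith(prefix)

def ends_with(subject, suffix):
    return subject.endswith(suffix)

def does_not_contain(text, fragment):
    return count_contains(text, fragment) == 0

def is_blank(subject):
    return subject.strip() == ""

sentence = "a sex study in essex college of arts"
hits = count_contains(sentence, "sex")
```

With the sentence from the Contains example, the count is two: once for the standalone word and once for the letters inside "essex".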

Regular Expressions

Concept

A regular expression is a text pattern consisting of a combination of alphanumeric characters and special characters known as metacharacters. A close relative is the wildcard expression, which is often used in file management. The pattern is matched against text strings. The result of a match is either successful or not; however, when a match is successful, not all of the pattern must match. This provides a way to catch message content when Spammers deliberately misspell words or write them unusually in an attempt to fool filtering software.

Usage

System administrators can use regular expressions to search through text in ways not normally possible with simple word or character matching.

Quantifiers

The contents of an expression are a combination of alphanumeric characters and metacharacters. An alphanumeric character is either a letter from the alphabet:

abc

or a number:

123

In the world of regular expressions any character that is not a metacharacter will match itself (often called literal characters). However many times you're mostly concerned with the alphanumeric characters. A very special character is the backslash \, as this turns any metacharacters into literal characters, and alphanumeric characters into a sort of metacharacter or sequence. The metacharacters are:

\ | ( ) [ { ^ $ * + ? . < >

With that said normal characters don't sound too interesting so let's jump to our very first metacharacters.

The period, or dot, “.” needs explaining first since it often leads to confusion. This character will not, as many might think, match only a period in a line. It is instead a special metacharacter which matches any character. Using it where you wanted to find the end of a sentence, or the decimal point in a floating point number, will lead to strange results. As explained above, you need to put a backslash in front of it to get the literal meaning. For instance this expression:

1.23

will match the number 1.23 in a text as you might have guessed, but it will also match these next lines:

1x23

1 23

1-23

To make the expression only match the floating number we change it to:

1\.23

Remember this, it's very important. Now with that said we can get the show going.
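
The difference is easy to check with any regular expression engine. The sketch below uses Python's re module as a stand-in; Spam Marshall's own engine may differ in details, but the dot behaves this way in essentially every regular expression dialect.

```python
import re

samples = ["1.23", "1x23", "1 23", "1-23"]

# Unescaped dot: matches any single character in the middle position.
unescaped = [s for s in samples if re.fullmatch(r"1.23", s)]

# Escaped dot: matches only a literal period.
escaped = [s for s in samples if re.fullmatch(r"1\.23", s)]
```

The unescaped pattern accepts all four samples; the escaped one accepts only 1.23.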

Two heavily recurring metacharacters are:

* and +

They are called quantifiers and tell the engine to look for several occurrences of a character. The quantifier always follows the character it applies to. The * character matches zero or more occurrences of the character in a row; the + character is similar but matches one or more.

So if you decided to find words which had the character c in it you might be tempted to write:

c*

What might come as a surprise to you is that you will find an enormous amount of matches, even words with no c in them will match. This happens because the * character matches zero or more characters, and that's exactly what you matched, zero characters.

In regular expressions you have the possibility to match what is called the empty string, which is simply a string with zero size. This empty string can actually be found in all texts. For instance the word:

go

contains three empty strings. They are contained at the position right before the g, in between the g and the o and after the o. And an empty string contains exactly one empty string. At first this might seem like a really silly thing to do, but you'll learn later on how this is used in more complex expressions.

So with this knowledge we might want to change our expression to:

c+

and voila we get only words with c in them.
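
Both points, the surprising behavior of c* and the empty strings inside "go", can be verified with a few lines of Python, again using Python's re module as a stand-in for whichever engine you are working with.

```python
import re

words = ["cow", "dog", "accent"]

# c* also succeeds on "dog": it matches zero c's at the very first position.
star = [w for w in words if re.search(r"c*", w)]

# c+ needs at least one c, so "dog" no longer matches.
plus = [w for w in words if re.search(r"c+", w)]

# The empty pattern matches at every position: before g, between g and o, after o.
empty_matches = re.findall(r"", "go")
```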

The next metacharacter you'll learn is:

?

This simply tells the engine to either match the character or not (zero or one). For instance the expression:

cows?

will match any of these lines:

cow

cows

These three metacharacters are simply special cases of the more general quantifier:

{n,m}

where n and m are respectively the minimum and maximum number of repetitions. For instance:

{1,5}

means match at least one and at most five occurrences. You can also omit m to allow an unlimited number of matches:

{1,}

which matches one or more characters. This is exactly what the + character does. So now you see the connection, * is equal to {0,}, + is equal to {1,} and ? is equal to {0,1}.

The last thing you can do with the quantifier is to also skip the comma:

{5}

which means to match 5 characters, no more, no less.
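
The equivalences above can be checked mechanically. This Python sketch compares each shorthand quantifier with its {n,m} form over a handful of sample strings; the helper function and the samples are made up for the illustration.

```python
import re

def agrees(pattern_a, pattern_b, samples):
    """True if both patterns accept exactly the same sample strings."""
    return all(
        bool(re.fullmatch(pattern_a, s)) == bool(re.fullmatch(pattern_b, s))
        for s in samples
    )

samples = ["", "a", "aa", "aaaaa", "aaaaaa", "b"]

star_ok = agrees(r"a*", r"a{0,}", samples)
plus_ok = agrees(r"a+", r"a{1,}", samples)
optional_ok = agrees(r"a?", r"a{0,1}", samples)

one_to_five = [s for s in samples if re.fullmatch(r"a{1,5}", s)]
exactly_five = [s for s in samples if re.fullmatch(r"a{5}", s)]
```

Agreement on a few samples is not a proof, but it shows the quantifiers behaving as described: the string of six a's is rejected by {1,5}, and only the string of five is accepted by {5}.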

Assertions

The next type of metacharacters is assertions. These will match if a given assertion is true. The first pair of assertions are:

^ and $

which match the beginning of the line, and the end of the line, respectively. Note that some regular expression implementations allow you to change their behavior so that they will instead match the beginning of the text and the end of the text. These assertions always match a zero length string, or in other words, they match a position. For instance, if you wrote this expression:

^The

it would match any line which began with the word The.

The next assertion characters match at the beginning and end of a word; they are:

< and >

they come in handy when you want to match a word precisely. For instance:

cow

would match any of the following words:

cow

coward

cowage

cowboy

cowl

A small change to the expression:

<cow>

and you'll only match the word cow in the text.
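
Not every engine spells these assertions the same way. Python, for example, has no < and > word assertions and uses \b for a word boundary at either end, and its ^ matches at the start of every line only when the MULTILINE flag is given. The sketch below shows the Python equivalents; this is a note on dialect differences, not a description of the dialect Spam Marshall uses.

```python
import re

words = ["cow", "coward", "cowage", "cowboy", "cowl"]

# Plain "cow" is found inside every one of the words.
loose = [w for w in words if re.search(r"cow", w)]

# \bcow\b is Python's spelling of <cow>: the whole word only.
exact = [w for w in words if re.search(r"\bcow\b", w)]

text = "The cow\nA cow\nThe end"
line_starts = re.findall(r"^The", text, re.MULTILINE)  # start of each line
text_start = re.findall(r"^The", text)                 # start of the text only
```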

One last thing to be said is that all literal characters are in fact assertions themselves. The difference between them and the ones above is that literal characters have a size. So for cleanliness sake we only use the word "assertions" for those that are zero-width.

Groups and Alternation

One thing you might have noticed when we explained quantifiers is that they only worked on the character to the left. Since this pretty much limits our expressions I'll explain other uses for quantifiers. Quantifiers can also be used on metacharacters; using them on assertions doesn’t work since assertions are zero-width and matching one, two, three or more of them doesn't do any good. However the grouping and sequence metacharacters are perfect for being quantified. Let's first start with grouping.

You can form groups, or subexpressions as they are frequently called, by using the begin and end parenthesis characters:

( and )

The ( starts the subexpression and the ) ends it. It is also possible to have one or more subexpressions inside a subexpression. The subexpression will match if the contents match. So mixing this with quantifiers and assertions you can do:

( ?ho)+

which matches all of the following lines:

ho

ho ho

ho ho ho

hohoho

Another use for subexpressions is to extract a portion of the match if it matches. This is often used in conjunction with sequences, which are discussed later.

You can also use the result of a subexpression for what is called a back reference. A back reference is written as a backslash followed by a single non-zero digit. This leaves you with nine back references (\1 through \9). The back reference matches whatever the corresponding subexpression actually matched (except that \0 matches a null character). To find the number of a subexpression, count the open parentheses from the left.

The uses for back references are somewhat limited, especially since you only have nine of them, but on some rare occasion you might need it. Note some regular expression implementations can use multi-digit numbers as long as they don't start with a 0.
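
A short Python sketch of both ideas follows: a quantified group, and a back reference that finds a word typed twice in a row. Python's \b and \w are used here for word boundaries and word characters, and the sample strings are invented for the example.

```python
import re

# A quantified group: one or more "ho", each optionally preceded by a space.
laugh = re.compile(r"( ?ho)+")
lines = ["ho", "ho ho", "ho ho ho", "hohoho"]
all_match = all(laugh.fullmatch(line) for line in lines)

# \1 must repeat exactly what group 1 matched, so this finds doubled words.
doubled = re.compile(r"\b(\w+) \1\b")
hit = doubled.search("buy buy now")
miss = doubled.search("this is fine")
```

After the search, hit.group(1) holds the repeated word, which also demonstrates the other use of groups mentioned above: extracting a portion of the match.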

Next are alternations, which allow you to match any one of several words. The alternation character is:

|

A sample usage is:

Bill|Linus|Steve|Larry

would match either Bill, Linus, Steve or Larry. Mixing this with subexpressions and quantifiers we can do:

cow(ard|age|boy|l)?

which matches any of the following words but no others:

cow

coward

cowage

cowboy

cowl

I mentioned earlier that not all of the expression must match for the match to be successful. This can happen when you're using subexpressions together with alternations. For example:

((Donald|Dolly) Duck)|(Scrooge McDuck)

As you see only the left or right top subexpression will match, not both. This is sometimes handy when you want to run a complex pattern in one subexpression and if it fails try another one.
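
The following Python sketch shows which subexpressions take part in a match. Groups that did not participate are reported as None, which is how you can tell the left branch from the right one.

```python
import re

cow = re.compile(r"cow(ard|age|boy|l)?")
cow_words = [w for w in ["cow", "coward", "cowl", "cows"] if cow.fullmatch(w)]

duck = re.compile(r"((Donald|Dolly) Duck)|(Scrooge McDuck)")

right = duck.fullmatch("Scrooge McDuck")
left = duck.fullmatch("Dolly Duck")

# Groups are numbered by their opening parenthesis, counted from the left:
# 1 = (Donald|Dolly) Duck, 2 = Donald|Dolly, 3 = Scrooge McDuck
right_groups = right.groups()
left_groups = left.groups()
```

Note that fullmatch requires the whole string to match; "cows" is rejected because nothing in the pattern accounts for the final s.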

Sequences

Last we have sequences, which define sequences of characters which can match. Sometimes you don't want to match a word directly but rather something that resembles one. The sequence characters are:

[ and ]

Any characters put inside the sequence brackets are treated as literal characters, even metacharacters. The only special characters are - which denotes character ranges, and ^ which is used to negate a sequence. The sequence is somewhat similar to alternation; the similarity is that only one of the items listed will match. For instance:

[a-z]

will match any lowercase characters which are in the English alphabet (a to z). Another common sequence is:

[a-zA-Z0-9]

which matches any lowercase or capital letters in the English alphabet as well as digits. Sequences can also be mixed with quantifiers and assertions to produce more elaborate searches. Example:

<[a-zA-Z]+>

matches all whole words. This will match:

cow

Linus

regular

expression

but will not match:

200

x-files

C++

Now if you wanted to find anything but words, the expression:

[^a-zA-Z0-9]+

would find any sequences of characters which do not contain the English alphabet or any numbers.

Some implementations of regular expressions allow you to use shorthand versions for commonly used sequences, they are:

\d, a digit ([0-9])

\D, a non-digit ([^0-9])

\w, a word (alphanumeric) ([a-zA-Z0-9])

\W, a non-word ([^a-zA-Z0-9])

\s, a whitespace ([ \t\n\r\f])

\S, a non-whitespace ([^ \t\n\r\f])
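
Here is how the sequences above behave in Python, used once more as a stand-in engine. One caveat worth knowing: in Python, and in many other implementations, \w also matches the underscore (and, for text strings, non-English letters), so it is slightly wider than [a-zA-Z0-9].

```python
import re

candidates = ["cow", "Linus", "regular", "expression", "200", "x-files", "C++"]

# Letters only, from start to end: Python's equivalent of <[a-zA-Z]+>.
whole_words = [c for c in candidates if re.fullmatch(r"[a-zA-Z]+", c)]

# A negated sequence: runs of characters that are neither letters nor digits.
non_alnum = re.findall(r"[^a-zA-Z0-9]+", "x-files, C++")

# Shorthand sequences.
digits = re.findall(r"\d+", "call 555 0100")
underscore_is_word = re.fullmatch(r"\w", "_") is not None
```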

Wildcards

For people who have some knowledge with wildcards (also known as file globs or file globbing), I'll give a brief explanation on how to convert them to regular expressions.

After reading this article, you probably have seen the similarities with wildcards. For instance:

*.jpg

matches any text which ends with .jpg. You can also specify brackets with characters, as in:

*.[ch]pp

matches any text which ends in .cpp or .hpp. Altogether very similar to regular expressions.

The * in wildcards means match zero or more of anything. As we learned, we do this in regular expressions with the period and the * quantifier. This gives:

.*

Also remember to backslash any periods when converting from wildcards.

The ? in wildcards means match exactly one character, whatever it is. This is exactly what the period does in regular expressions.

Square brackets can be used untouched since they have the same meaning going from wildcards to regular expressions.

This leaves us with:

Replace any * characters with .*

Replace any ? characters with .

Leave square brackets as they are.

Replace any characters which are metacharacters with a backslashed version.

Examples

*.jpg

would be converted to:

.*\.jpg

ez*.[ch]pp

would convert to:

ez.*\.[ch]pp

or alternatively:

ez.*\.(cpp|hpp)
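
The conversion rules can be written as a short Python function. This is a minimal sketch of the four rules listed above, with a hypothetical function name; it does not handle every corner of real wildcard syntax (for example, negated brackets).

```python
import re

def wildcard_to_regex(pattern):
    """Convert a simple wildcard pattern to a regular expression."""
    parts = []
    in_brackets = False
    for ch in pattern:
        if in_brackets:
            parts.append(ch)              # leave bracket contents untouched
            if ch == "]":
                in_brackets = False
        elif ch == "[":
            parts.append(ch)
            in_brackets = True
        elif ch == "*":
            parts.append(".*")            # * becomes .*
        elif ch == "?":
            parts.append(".")             # ? becomes .
        else:
            parts.append(re.escape(ch))   # backslash any other metacharacter
    return "".join(parts)

jpg = wildcard_to_regex("*.jpg")
source = wildcard_to_regex("ez*.[ch]pp")
```

Python's standard library also ships fnmatch.translate for the same purpose; its output is formatted differently but follows the same idea.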

Example Regular Expressions

To really get to know regular expressions here are some commonly used expressions on this page. Study them, experiment and try to understand exactly what they are doing.

Email validity: will only match email addresses which are valid, such as "[email protected]":

[a-z0-9_-]+(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)+

Email validity #2: matches email addresses with a name in front, like "John Doe <[email protected]>":

("?[a-zA-Z]+"?[ \t]*)+\<[a-z0-9_-]+(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)+\>

Protocol validity: matches web based protocols such as "http://", "ftp://" or "https://":

[a-z]+://

C/C++ includes: matches valid include statements in C/C++ files:

^#include[ \t]+[<"][^>"]+[">]

C++ end of line comments:

//.+$

C/C++ span line comments (it has one flaw, can you spot it?):

/\*[^*]*\*/


Floating point numbers: matches simple floating point numbers of the kind 1.2 and 0.5:

-?[0-9]+\.[0-9]+

Hexadecimal numbers: matches C/C++ style hex numbers, e.g. 0xcafebabe:

0x[0-9a-fA-F]+
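One way to experiment with these expressions is to try them against a few sample strings. The sketch below uses Python's `re` module purely for illustration (Spam Marshall's own engine may differ in details), and the sample strings are made up. It also demonstrates the flaw in the span-comment expression: a comment that contains a '*' in its body is not matched.

```python
import re

EMAIL = re.compile(r"[a-z0-9_-]+(\.[a-z0-9_-]+)*@[a-z0-9_-]+(\.[a-z0-9_-]+)+")
FLOAT = re.compile(r"-?[0-9]+\.[0-9]+")
HEX = re.compile(r"0x[0-9a-fA-F]+")
BLOCK_COMMENT = re.compile(r"/\*[^*]*\*/")

# Whole-string checks against made-up samples.
print(bool(EMAIL.fullmatch("jane.doe@example.com")))   # address with a dotted domain
print(bool(EMAIL.fullmatch("jane@localhost")))         # no dot in the domain part
print(bool(FLOAT.fullmatch("-1.2")), bool(FLOAT.fullmatch("5")))
print(bool(HEX.fullmatch("0xcafebabe")))

# The flaw: [^*]* cannot step over a '*' inside the comment body.
print(bool(BLOCK_COMMENT.fullmatch("/* plain */")))
print(bool(BLOCK_COMMENT.fullmatch("/* a * b */")))
```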


Chapter 3

Reports

Spam Marshall provides extensive, easy-to-read, web-based, real-time graphical reports that help administrators and management monitor email for Spam. Reports let administrators see how effective the rules set up in Spam Marshall are, based on company policy, and help them fine-tune these rules. Management reports provide an executive summary of the email entering your network, broken down by category.

Click on the View reports option in Admin Console to look at the real-time reports. The following reports are available:

24-Hour Report

Weekly / Monthly / Yearly Reports

Top 20 Recipients

Top 50 Spammers by IP address

Email Score distribution

Content filtering rules for body

Content filtering rules for subject

Content filtering rules for sender

Custom filters


Click on an individual graph to see a close-up view of the report and additional details.


Types of Reports

24-Hour Email Report

Shows the emails that arrived for your email server and were processed by Spam Marshall, on an hour-by-hour basis.


Weekly/Monthly/Yearly Reports

Weekly Report


Monthly Report


Yearly Report


Top 20 Spam Recipient Reports

Displays the 20 recipients who received the most Spam on your email server in the last 24 hours.


Top 50 Spammers by IP Address

Shows the IP addresses from which the most Spam emails were received. Click on Add in Black list to add an IP address permanently to the Black List, or click on the eyeglass icon on the left to see the Whois lookup of the IP address and trace the Spammer.


Email Score Distribution

Lets Administrators quickly see which of the email score values they have set up got the most hits.


Content Filtering for Body

Provides details of the Spam words that were found in the body of emails received and were assigned a score.


Content Filtering for Subject

This report provides details of the top 10 Spam words found in the subject lines of the emails received.


Content Filtering for Sender

This report shows the top 3 sender filter rules that matched the emails received.


Custom Filters

This report shows the top 4 custom filter rules in Spam Marshall that matched while filtering mail.


View Messages - Manage and Restore Messages

This window immediately tells you the total number of spam, possible spam, and good messages received. Click on View all next to a category to browse through all of its emails, or search for specific messages by email account or by content.

Search is limited by the number of days you have set for archiving in the configuration. For example, if messages are set to be archived after 3 days, messages older than 3 days will not be part of the search because they have already been archived. You can also limit the search to one category of email (good, spam, or possible spam) by selecting it from the drop-down list next to the Message Type option, and narrow it further by specifying an email account or typing a keyword to search for in the email content. Note: you can also search with a regular expression by selecting the option at the bottom.

Spam Marshall allows you to easily view and restore email messages that arrived on the server and were blocked by Spam Marshall for any reason. You do this in the View Messages area of the Spam Marshall Admin console.

Click on View Messages in Spam Marshall Admin console.


To search for messages by email address, type the address in the Email address field. To search for messages by content, type the keyword you would like to find in the Search for box.

View and Restore Messages:


When you are in the View Messages window, you have a few options:

Mark as Spam: Clicking on this tells the Bayesian filter to treat this email as Spam in the future.

Mark as Good: Clicking on this tells the Bayesian filter to treat this email as good in the future.


View: Clicking on this lets you see the message in its raw format, including its header information.

Download: Opens the email message in the default email client installed on your machine.

Reason: Lets you see the score of the message. For a message marked as Spam, you can see the reason in terms of the scores assigned by the various filters in Spam Marshall.

Restore: Clicking on this immediately releases the message from the Spam quarantine area and sends it to the recipient it was originally intended for.


Viewing

Click on the View option to look at the mail in raw format as received by the server.

Downloading

Click on the Download option to open the email in a client, as a user would see it, by clicking on the Open button, or click on Save to save it to another location.


Result of clicking on Open option for Download:

Reason:

Clicking on this option tells you why the mail was considered Spam. It displays the Spam words along with the value assigned to each and the final score at the bottom. It also shows the values assigned by RBL or custom filters. This helps in fine-tuning your Spam Marshall filter rules for future mail.


Restore

To send this email back to the intended recipient, simply click on the Restore option.


Spam Marshall Tools

These convenient, easy-to-use built-in tools help Administrators manage and control their email servers effectively.

1. Spam Simulator

2. Diagnostic Check

3. Whois Lookup

4. Email Validator

5. DNS Lookup

Spam Simulator

Spam Simulator lets you cut and paste an email from an email client into this window and see how Spam Marshall would process it. This helps in fine-tuning Spam Marshall filters to make them more effective. Tip: You can go to the View option of a message and cut and paste the entire message into this window. Click on the Proceed button to process the message and see the results from the Spam Marshall Rule Engine.


Spam Marshall Diagnostic Check

Use this to verify whether your email server and Spam Marshall settings are correct or need adjustment. If the Spam Marshall Diagnostic Tool finds a problem, the help at the bottom of the results window tells you how to fix it.


Whois Lookup Tool

A convenient way to look up Internet domain name information, such as for a domain used by a Spammer.


Email Validator

This lets you verify whether an email account is valid. Type in the email account you want to verify and click on Proceed.


DNS Lookup

This tool lets you query the DNS server of the specified domain name and look up information such as A or MX records. Type in the domain name (for example, SpamMarshall.com) and click on Proceed.


Spam Marshall Intrusion Detection System (IDS)

Spam Marshall takes a proactive approach to detecting intrusions into your mail server by malicious spammers and hackers. These users connect to your SMTP server but usually do not send any email, often because they are probing the server for open holes or weaknesses. Their next step is typically to launch a dictionary attack to guess usernames and passwords, or a Denial of Service (DoS) attack. Administrators can look at the number of invalid attempts and the IP addresses they originated from, and possibly block them at the network firewall. Detailed information about an IP address, such as its location, is available with a single click on the IP address itself. This IDS feature is unique to Spam Marshall and not found in any other anti-Spam products currently available.

To see Spam Marshall IDS in action, click on Check Status in Admin Console, then click on the Log Summary link to view detailed information. Please refer to the help section within each Log Summary window to understand each section.


Appendix A

A short tutorial for Regular Expressions

Regular Expressions

Before we go into what Regular Expressions (which we will refer to as "RE's"

from now on) are, let us clear up a minor problem that can cause a lot of

confusion.

Certain characters in RE's are called metacharacters or special characters.

Examples of such characters are '*' and '?'. Their occurrence means something

other than their literal value. For example, the pattern 'a*' matches any

string consisting of zero or more occurrences of the letter 'a'.

It is important to realize that RE's are _not_ the same as wildcards. Wildcards

are pattern specifiers used by other basic utilities present on Unix systems.

For example, the 'ls' program uses wildcards, but the interpretation of '*' in

a wildcard expression is different. The command

ls aa*

means list all files whose filename begins with the string "aa" and is

followed by zero or more occurrences of _any_ character. Thus, the string

'aardvark' would match the wildcard expression, but 'apple' would not. But

neither 'aardvark' nor 'apple' would be matched by the RE 'aa*' because neither

string consists entirely of the character 'a'.
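The difference between the two readings of 'aa*' can be checked directly. The sketch below is only an illustration: it uses Python's `fnmatch` module for the wildcard reading and `re.fullmatch` for the RE reading, and the word list is made up.

```python
import fnmatch
import re

names = ["aardvark", "apple", "aaa", "a"]

# Wildcard reading: "aa" followed by zero or more of any character.
wildcard_hits = [n for n in names if fnmatch.fnmatchcase(n, "aa*")]

# RE reading: an 'a' followed by zero or more further 'a' characters.
regex_hits = [n for n in names if re.fullmatch("aa*", n)]

print(wildcard_hits)
print(regex_hits)
```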

An even more confusing character is the '$' character. The expression '$a'

would be interpreted by the shell as the shell variable 'a'. The expression

'$a' does not make any sense when used as an RE, since '$' is used to indicate

the end of a line. The Unix utility 'sed' uses the '$' character in different

ways. In an RE, it has the standard RE meaning - end of line, but when used as

a specifier for line numbers, it indicates the last line in a file!

The moral of the story is that you need to examine the context in which a

special character is being used in order to determine the behaviour it will

produce. Typically, a program using regular expressions will mention the fact

in its man-page / documentation.

Alright. Let's get on with how to specify an RE.

The special characters used in RE's are '.', '*', '?', '+', '(', ')', '{',

'}', '[', ']', '^', '$', and '|'. All other characters mean exactly what

they are.

The '.' character will match any single character. The pattern 'a.c' will

match the string 'abc'. It will also match 'adc', or 'aqc', or any string

beginning with 'a', ending with 'c' and having exactly one character

in between. The RE '.abc...' would match the word 'labcoat'.

Repetition characters :

-----------------------

The '?', '*', and the '+' characters are repetition operators. The '?' will

match zero or one instance of the preceding item. '*' will match zero or more

instances of the preceding item. And '+' will match one or more instances of

the preceding item.

Thus, the strings 'ac', and 'aac' will be matched by the pattern 'aa?c'. There

are no other strings which will be matched by this RE. 'aa*c' however, will

match all strings beginning with 'a' and ending with 'c', provided that all

characters (if any) between these two are "a"'s.

It should be obvious that the RE 'a+' is equivalent to the RE 'aa*'.
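The behaviour of '.', '?', '*' and '+' described above can be tried out on a handful of strings. This is a small sketch using Python's `re` module for illustration only; the helper `matches` and the candidate list are made up, and whole-string matching (`re.fullmatch`) is assumed, as in the text.

```python
import re

def matches(pattern, candidates):
    """Return the candidates that the pattern matches as a whole string."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

candidates = ["c", "ac", "aac", "aaac", "abc"]

print(matches("a.c", candidates))    # '.' stands for exactly one character
print(matches("aa?c", candidates))   # '?' : zero or one 'a' after the first
print(matches("aa*c", candidates))   # '*' : zero or more 'a' after the first
print(matches("a+c", candidates))    # '+' : one or more 'a'
print(bool(re.fullmatch(".abc...", "labcoat")))
```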

Example 1 :

The RE '.n+..?.?v*.' matches which of the following strings?

a. knives

b. involve

c. snail

d. anvil

e. supernova

f. innovate

g. inactive

h. nerve

The answer is that strings (a) through (d) match, and strings (e) through

(h) do not. Here, as throughout this tutorial, a match means that the RE

matches the entire string.

Let's discuss the above answer a bit. For example, why would 'snail' match the

RE even though it doesn’t have a 'v' in it? Because the repetition character

for 'v' is '*' and this means that there may be zero or more 'v' characters

occurring in the string. Thus after the 'n', if there are no 'v' characters in

the string, then any characters (2 to 4 in number) following it would match the

RE.

The match for 'anvil' is a bit more complicated. The RE seems to indicate that

there must be at least one character between the 'n' and the 'v' in the string

being matched - and thus 'anvil' should not match. However, 'v' can also be

matched by the '.' in the RE. So the match would take place as shown below :

RE element    Matches in 'anvil'
.             'a'
n+            'n' (one instance)
.             'v' (one instance)
.?            zero instances
.?            'i' (one instance)
v*            zero instances
.             'l'

Note that whether the first '.?' repeats zero times or whether it is the

second one that does this, the result is the same. If an RE _can_ lead to

a match, even though another way of interpreting the pattern would cause a

match failure, then the RE would still be said to match the string.

'involve' would match for the same reasons. The 'vol' in 'involve' would match

the '..?.?' part of the RE. The same for 'knives'. The 'ives' would match the

'..?.?v*.' part of the RE.

The reasons for the rejections are straightforward. (e) and (h) do not match

because of the wrong number of characters that occur before the first 'n' in

the string. (f) does not match because there are too many characters after the

'v' - there is no way for '..?.?v*.' to match 'ovate'. (g) does not match

because there are too many characters between the 'n' and the 'v'.
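The answer to Example 1 can be verified mechanically. The sketch below uses Python's `re` module for illustration and assumes whole-string matching (`fullmatch`), which is what the discussion above relies on. With substring searching instead, some of the rejected words would be reported as hits because a part of them fits the pattern.

```python
import re

PATTERN = re.compile(r".n+..?.?v*.")
WORDS = ["knives", "involve", "snail", "anvil",
         "supernova", "innovate", "inactive", "nerve"]

# Whole-string matching, as assumed by the tutorial.
matched = [w for w in WORDS if PATTERN.fullmatch(w)]
rejected = [w for w in WORDS if not PATTERN.fullmatch(w)]

print(matched)
print(rejected)

# Substring searching is more permissive: part of 'supernova' fits the RE.
print(PATTERN.search("supernova") is not None)
```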


Example 2 :

We are trying to match the string "annotation". What is wrong with the

following RE's?

a. *nnotation

b. an*n

c. an?otation

d. Annotation

(a) is wrong because '*' is a repetition character, and there is no character

occurring before it which it repeats. This is a syntactically wrong usage. The

correct usage would have been '.*nnotation' or '.nnotation'.

(b) and (c) are classic mistakes often made by people who fail to realize that

the '*' and '?' characters as used in RE's differ in functionality from the

wildcard '*' and '?' characters used in programs like 'ls'. In (b), the '*'

means zero or more occurrences of 'n'. Thus, that RE would match 'an', 'ann',

or 'annnnnn' - but it will not match 'annotation' since there are characters

other than 'n' between 'nn' and the final 'n'. In (c), the '?' means zero or

one occurrence of 'n', and not a placeholder for a character (as is the case

in 'dir' command of DOS).

(d) is wrong simply because RE's are case-sensitive.
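Each of the four mistakes in Example 2 can be reproduced. The following sketch uses Python's `re` module for illustration; other engines may report the syntax error in (a) differently. The `re.IGNORECASE` flag at the end is a Python feature, shown only to confirm that case is the sole problem with (d).

```python
import re

target = "annotation"

# (a) A leading '*' has nothing to repeat, so compiling it fails in Python.
try:
    re.compile("*nnotation")
    star_error = None
except re.error as exc:
    star_error = str(exc)
print(star_error)

# (b) and (c): '*' and '?' apply to the preceding 'n', not to "any characters".
print(re.fullmatch("an*n", target))
print(re.fullmatch("an?otation", target))

# (d) RE's are case-sensitive unless the engine is told otherwise.
print(re.fullmatch("Annotation", target))
print(bool(re.fullmatch("Annotation", target, flags=re.IGNORECASE)))

# The corrected forms of (a) do match.
print(bool(re.fullmatch(".*nnotation", target)), bool(re.fullmatch(".nnotation", target)))
```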


Lists and ranges :

----------------------------

A list of characters enclosed by '[' and ']' matches any single character in

that list. For example, the RE 'a[bcd]e' will match the strings 'abe' or 'ace'

or 'ade'. It will not match the string 'abce' because exactly one of the

characters enclosed between '[' and ']' can be used for matching.

Lists can be specified in a simpler manner by specifying ranges. For example,

a shorter way of saying '[0123456789]' is '[0-9]'. '[abcde]' can be specified

as '[a-e]'.

Note that the '-' character is not a special character : it only assumes a

special meaning when it is between two characters in a list. The RE '[-v-x]',

for instance, would mean a match on any one of the characters '-', 'v', 'w',

or 'x'.

Example 3 :

Social Security numbers in the US are written in the form nnn-nn-nnnn - where

'n' stands for a single digit. An RE that can match a Social Security number

would be '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'.


An expression like '[0-9]*-[0-9]*-[0-9]*' would also match a Social Security

number, but this RE would also match a string like '3984723-1-2324'

or even the string '--' which do not constitute valid Social Security

numbers.
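The difference between the strict and the loose expression in Example 3 shows up immediately on the strings mentioned above. This is an illustrative sketch with Python's `re` module; the first sample is a made-up number in the nnn-nn-nnnn form.

```python
import re

STRICT = re.compile(r"[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]")
LOOSE = re.compile(r"[0-9]*-[0-9]*-[0-9]*")

samples = ["123-45-6789", "3984723-1-2324", "--"]

strict_hits = [s for s in samples if STRICT.fullmatch(s)]
loose_hits = [s for s in samples if LOOSE.fullmatch(s)]

print(strict_hits)
print(loose_hits)
```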

Example 4 :

Let's say that you have a program that uses Regular Expressions to specify

search patterns ('egrep' is an example of such a program). One would like to

search for all proper nouns occurring in a document. If one makes the

assumption that all proper nouns begin with an upper-case alphabetic

character, then, an RE to find proper nouns would be :

[A-Z][a-z]+

This assumes that we are not searching for acronyms, and that proper nouns are

more than one character in length. If one needs to search for acronyms (all

upper-cased characters) also, then the RE would be :

[A-Z][A-Za-z]+

If the list of characters begins with '^', then it means that the bracketed

expression will match every character except the ones listed within the

brackets. Thus the RE '[^0-9]+.*' will match any non-empty string that doesn't

begin with a digit.

Note : The technical jargon for the lists described above is 'bracket

expressions'.
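The two proper-noun expressions from Example 4 and the negated list can be compared on a short sentence. The sketch uses Python's `re` module for illustration; the sample sentence is made up, and `re.findall` is used because Example 4 is about searching within a document rather than matching a whole string.

```python
import re

text = "Alice met Bob at NASA in Paris"

# Upper-case letter followed by one or more lower-case letters.
proper_nouns = re.findall(r"[A-Z][a-z]+", text)

# Also allow upper-case letters after the first one, which picks up acronyms.
with_acronyms = re.findall(r"[A-Z][A-Za-z]+", text)

print(proper_nouns)
print(with_acronyms)

# Negated list: the string must start with at least one non-digit.
print(bool(re.fullmatch(r"[^0-9]+.*", "abc123")))
print(bool(re.fullmatch(r"[^0-9]+.*", "123abc")))
```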

Character classes :

-------------------

Lists can sometimes be represented in a standard format called 'character

class'. These are better from a documentation standpoint. They are also better

when considering code that needs to work in different collating sequences.

For example, in EBCDIC, the collating sequence of characters is different from

that of ASCII, and the range '[A-Z]' will include a different choice of

characters than the ASCII set does. This problem can be averted by using the

character class 'upper'. The way to use this is by saying :

[[:upper:]]

The standard character classes are 'alnum', 'alpha', 'blank', 'cntrl', 'digit',

'graph', 'lower', 'print', 'punct', 'space', 'upper', and 'xdigit'. These

are the same as what you would expect from the is... functions defined in the

C standard library (see ctype.h).


Thus, the example 4 solution could be rewritten as :

[[:upper:]][[:upper:][:lower:]]+

A character class may not be used as the end-point of a range.

Bounds :

--------

Bounds can be regarded as a type of repetition character. A bound can be

written as :

{a,b}

where a, and b are numbers. The ',' and 'b' are optional, but if b is

present, then, the ',' must be present. The numbers can take values from 0 to

255, and a <= b.

'{a}' means that the preceding list or character will be matched

exactly 'a' number of times. '{a,}' means 'a' or more occurrences of the

preceding list or character will be matched. And '{a,b}' will match 'a'

through 'b' occurrences of the preceding list or character.


From the above, it must be obvious that :

'{0,1}' is equivalent to '?' .

'{1,}' is equivalent to '+' .

'{0,}' is equivalent to '*' .

Example 5 :

The RE in example 3 can be rewritten as :

[0-9]{3}-[0-9]{2}-[0-9]{4}

or in a more portable way as :

[[:digit:]]{3}-[[:digit:]]{2}-[[:digit:]]{4}
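The bound equivalences and the rewritten Social Security expression can be checked as follows. This sketch uses Python's `re` module for illustration. Python's `re` does not support the '[[:digit:]]' notation, so the '[0-9]' form is used here; the sample strings are made up.

```python
import re

SSN = re.compile(r"[0-9]{3}-[0-9]{2}-[0-9]{4}")
print(bool(SSN.fullmatch("123-45-6789")))
print(bool(SSN.fullmatch("1234-5-6789")))

# Each repetition character paired with the bound it is equivalent to.
pairs = {"?": "{0,1}", "+": "{1,}", "*": "{0,}"}
samples = ["", "a", "aa", "aaa"]

equivalent = all(
    bool(re.fullmatch("a" + op, s)) == bool(re.fullmatch("a" + bound, s))
    for op, bound in pairs.items()
    for s in samples
)
print(equivalent)
```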

Escape characters :

-------------------

Let's say that you want to use one of the special characters in a literal

sense. For example, suppose you want to match strings which contain a '?'

character in them. Certainly, the RE '.*?.*' would not work: in the syntax

described here, one cannot apply a repetition character to a repetition

character. (RE engines that do accept '*?', such as Perl-style ones, read it

as a non-greedy '*', so it still would not match a literal '?'.)

The way to solve this problem is by "escaping" from the normal sense of the

character. This is done by prefixing the character in question with the

backslash ('\') character. Thus, the RE, above should be written as :

.*\?.*

In the context of RE's, the '\' character is referred to as the "escape"

character. It is valid for exactly one character. If you are trying to match

all strings containing the string '?*', the RE

.*\?*.*

will not work because the '\' applies only to the '?' and not to any

characters following it: the '*' remains a repetition character, so '\?*'

means zero or more '?' characters. The correct RE in this case is :

.*\?\*.*

If the escape character precedes a character that is not a special character,

then it will be ignored. Thus '\a' will be interpreted as 'a'. It is also

illegal to end an RE with a '\'.


To represent a literal '\', write '\\'.

All special characters including '\' lose their special sense when enclosed

within '[' and ']' in a list. Thus the RE '[ab*\]' would match any of the

characters 'a', 'b', '*' and '\'. There are some caveats to special characters

appearing in lists - please see the section "Advanced Topics" for more

information on this.

Example 6 :

-----------

Let us examine some of the RE's that can be written to match the string :

Hillary ***scares*** Bill

a. 'Hillary.*Bill' : this is probably the loosest RE that can be

written for this string, since it will match anything between the

words "Hillary" and "Bill", thus also matching "Hillary likes Bill".

b. 'Hillary \**scares\** Bill' : this is a tighter match, but it will

match strings like "Hillary scares Bill" and "Hillary *scares* Bill".

A slightly better RE would be 'Hillary \*+scares\*+ Bill'.

c. 'Hillary \*\*\*scares\*\*\* Bill' is an exact match.

d. The RE above can be rewritten as 'Hillary \*{3}scares\*{3} Bill'.
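The four expressions in Example 6, and the escaping rules before it, can be compared side by side. The sketch below uses Python's `re` module for illustration; the dictionary keys are made-up labels, and `re.escape` is a Python convenience that adds the backslashes for you.

```python
import re

target = "Hillary ***scares*** Bill"
candidates = [target, "Hillary likes Bill", "Hillary scares Bill", "Hillary *scares* Bill"]

patterns = {
    "loose": r"Hillary.*Bill",
    "star_any": r"Hillary \**scares\** Bill",
    "star_some": r"Hillary \*+scares\*+ Bill",
    "exact": r"Hillary \*{3}scares\*{3} Bill",
}

# For each expression, list the candidate strings it matches in full.
results = {name: [s for s in candidates if re.fullmatch(p, s)]
           for name, p in patterns.items()}
for name, hits in results.items():
    print(name, hits)

# '\?*' means "zero or more '?'", so this RE also accepts text without '?*'.
print(bool(re.fullmatch(r".*\?*.*", "plain text")))

# Escaping both characters gives the intended literal match.
escaped = re.escape("?*")
print(escaped, bool(re.search(escaped, "what?*")), bool(re.search(escaped, "what?")))
```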


Grouping and alternation :

--------------------------

Let us suppose that you want to match all occurrences of the string :

ha-ha-ha-ha-ha

One way of doing it is, of course, to write it out just like that. A more

compact way of writing it would be to group the '-ha' substring together and

specify a suitable repetition factor. This is how it could be done :

ha(-ha){4}

The '(' and the ')', when used in this manner will cause the repetition factor

to apply to the enclosed string.

The grouping is essential to specify the OR condition in patterns. If

you wanted to match strings which had "Bill" or "Hillary" in them, you would

write it as

(Bill)|(Hillary)

where the '|' character implements the logical OR.
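
A small Python sketch of both uses of grouping follows. Python's 're' module
is an assumption chosen for illustration, and the sample strings are invented.

```python
import re

# Grouping with a repetition factor: 'ha' followed by exactly four '-ha'.
laugh = re.compile(r'ha(-ha){4}')
laugh_full = laugh.fullmatch('ha-ha-ha-ha-ha') is not None   # True
laugh_short = laugh.fullmatch('ha-ha-ha') is not None        # False

# Alternation: lines that contain "Bill" or "Hillary".
either_name = re.compile(r'(Bill)|(Hillary)')
names = [either_name.search(s) is not None
         for s in ['Dear Bill,', 'From Hillary', 'Hello Al']]

print(laugh_full, laugh_short, names)
```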

Example 7 :

Let's say that you have a document which contains URLs. You want to

match all lines containing the URLs. Here is an RE for it :

http://([[:alnum:]_-]+\.){1,4}[[:alnum:]_-]+(/[[:alnum:].]+/*)*

where we make the assumption that the URL main site can have at most

5 components.
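
The line filter described in Example 7 could be sketched in Python as shown
below. This is an illustration under stated assumptions: Python's 're' module
has no named character classes inside lists, so the alphanumeric class is
spelled out as 'A-Za-z0-9', and the names 'URL_RE' and 'lines_with_urls' as
well as the sample text are hypothetical.

```python
import re

# Assumption: the named alphanumeric class is written out as A-Za-z0-9,
# because Python's re module does not support named classes in lists.
URL_RE = re.compile(
    r'http://([A-Za-z0-9_-]+\.){1,4}[A-Za-z0-9_-]+(/[A-Za-z0-9.]+/*)*'
)

def lines_with_urls(text):
    """Return the lines of text that contain a URL matching URL_RE."""
    return [line for line in text.splitlines() if URL_RE.search(line)]

sample = """See http://www.example.com/docs/index.html for details.
This line has no link.
Mirror: http://mirror.example.org/
http://localhost/ has no dot in the host name."""

for line in lines_with_urls(sample):
    print(line)
```

The last sample line is not reported because the RE requires at least one '.'
in the site name.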

Example 8 :

The RE for the full pathname of a file in Unix is :

/([[:alnum:]._+&*-]+/)*([[:alnum:]._+&*-]+)*

where, for convenience's sake, we assume that some of the special

characters (like '[' for instance) cannot be used in a filename.
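
A hedged Python rendering of the pathname RE is shown below. Two assumptions
are made: the alphanumeric class is spelled out as 'A-Za-z0-9' (Python's 're'
module has no named classes in lists), and the '-' is placed last in each list
so that it is taken literally instead of forming a range. The function name
'is_full_pathname' is made up for illustration.

```python
import re

# '-' is the last character in each list, so it stands for itself.
PATH_RE = re.compile(r'/([A-Za-z0-9._+&*-]+/)*([A-Za-z0-9._+&*-]+)*')

def is_full_pathname(text):
    """Return True if the whole text looks like a full Unix pathname."""
    return PATH_RE.fullmatch(text) is not None

for candidate in ['/usr/local/bin', '/', '/etc/rc.d/', 'usr/bin', '/tmp/a[1]']:
    print(candidate, '->', is_full_pathname(candidate))
```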

Example 9 :

Here is an RE for the name of a person in titular form :

((Mr\.)|(Mrs\.)|(Ms)) [A-Z][a-z]* (([A-Z][a-z]*)[- ]*)+
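
Note that '|' binds more loosely than concatenation, so the alternatives for
the title need a group of their own if the rest of the RE is meant to follow
every title. The Python sketch below (the 're' module and the sample names are
assumptions for illustration) compares the RE with and without that enclosing
group.

```python
import re

# Titles enclosed in one group: the name part follows any of the titles.
grouped = re.compile(
    r'((Mr\.)|(Mrs\.)|(Ms)) [A-Z][a-z]* (([A-Z][a-z]*)[- ]*)+'
)

# Without the enclosing group, the name part belongs to "Ms" only.
ungrouped = re.compile(
    r'(Mr\.)|(Mrs\.)|(Ms) [A-Z][a-z]* (([A-Z][a-z]*)[- ]*)+'
)

def is_titled_name(text, pattern=grouped):
    """Return True if the whole text matches the given pattern."""
    return pattern.fullmatch(text) is not None

for name in ['Mr. John Smith', 'Mrs. Mary Smith-Jones', 'Ms Ann Lee']:
    print(name, '->', is_titled_name(name), is_titled_name(name, ungrouped))
```

With the ungrouped form, "Mr. John Smith" is not matched as a whole, while the
bare title "Mr." is.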

Anchors :

---------

When placed at the beginning of an RE and not in a list, the '^' character

matches the start of line. The '$' character when used at the end of an RE,

will match the end of a line. These characters are called "anchors" because

they "anchor" the RE to either the beginning or end of a line.

Thus the RE '^a.*' will match every line that begins with 'a', and the RE

'.*a$' will match every line ending in 'a'.
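
The two anchored RE's can be checked with a few lines of Python. The 're'
module and the sample lines are assumptions made for illustration.

```python
import re

lines = ['apple', 'banana', 'cherry', 'a']

# '^' anchors the RE to the start of the line, '$' to the end.
starts_with_a = [line for line in lines if re.search(r'^a.*', line)]
ends_with_a = [line for line in lines if re.search(r'.*a$', line)]

print(starts_with_a)  # ['apple', 'a']
print(ends_with_a)    # ['banana', 'a']
```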

Basic vs Extended Regular Expressions :

---------------------------------------

In the infancy of Unix, when regular expressions were introduced, things were

simple : there were not as many special characters as there are now. The '|',

'+', and the '?' characters did not have any special meanings. Also, the '{',

'}', '(', and ')' characters behaved the reverse of how they do now : by

themselves, they were ordinary characters. But when escaped, the '{' and '}'

acted as delimiters for bounds, and '(' and ')' acted as delimiters for

subexpressions. There were also other differences - but this older style of

regular expressions is referred to as "Basic regular expressions".

By contrast, the RE's that are commonly in use today are called "Extended

regular expressions". Newer utilities that are being created conform more to

extended RE's than the basic ones. However, some utilities which date from

the early days of Unix are still around (for example - grep), and these

continue to use basic RE's. The 'egrep' tool was written to provide a version

of 'grep' that used extended regular expressions.

One important feature present in Basic Regular Expressions but not present in

Extended Regular Expressions, is the backreference. This is the '\' character

followed by a non-zero digit 'd'. '\d' would match the same sequence of

characters matched by the d'th parenthesized subexpression. The subexpressions

are numbered by the positions of their opening parentheses, left to right.

Backreferences are ambiguous in nature. For example, in the Basic Regular

Expression 'a\(\(b\)*\2\)*d' the backreference is part of the first

sub-expression which contains the second (which is referred to). Thus it is

not clear whether this RE will match 'abbbd'.
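
A simple, unambiguous use of a backreference is finding a word that is
immediately repeated. The sketch below uses Python's 're' module, which is an
assumption: it is neither a Basic nor an Extended RE engine in the sense used
here, but it accepts '\1' with unescaped parentheses. In Basic RE notation the
same pattern would be written '\([a-z][a-z]*\) \1'. The function name
'is_doubled_word' is hypothetical.

```python
import re

# Subexpression 1 captures a lowercase word; '\1' must then match exactly
# the same sequence of characters again.
DOUBLED_WORD = re.compile(r'([a-z]+) \1')

def is_doubled_word(text):
    """Return True if text is one word, a space, and the same word again."""
    return DOUBLED_WORD.fullmatch(text) is not None

for text in ['now now', 'ha ha', 'now then', 'ha haha']:
    print(text, '->', is_doubled_word(text))
```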

Example 10 :

Here is how the RE for a dollar amount in the million dollar range would be

written using Extended Regular Expressions :

\$[0-9](,[0-9]{3}){2}

when written using Basic Regular Expressions, it would be :

$[0-9]\(,[0-9]\{3\}\)\{2\}

Note that the '$' does not have to be escaped in the Basic RE because '$' is

not a special character unless it appears at the end of an RE or at the end of

a sub-expression.
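
The Extended form of this RE can be tried in Python as sketched below. The
're' module is an assumption used for illustration; its syntax is close to
Extended RE's for this pattern, so the '$' must be escaped. The sample text
and the function name 'find_million_amounts' are invented.

```python
import re

# One digit followed by exactly two groups of ',' and three digits.
MILLIONS_RE = re.compile(r'\$[0-9](,[0-9]{3}){2}')

def find_million_amounts(text):
    """Return every substring of text that matches MILLIONS_RE."""
    return [m.group(0) for m in MILLIONS_RE.finditer(text)]

print(find_million_amounts('Win $1,000,000 today, not $500 or $2,500,000!'))
```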

Advanced Topics :

-----------------

There exist character sets for which multiple byte codes are used to

represent a single character. DBCS is an example of this. For such character

sets, the difference between a collating element and a character is very

important. Since multiple bytes represent one character, there could be

a problem when one tries to match a collating element using an RE, and one

uses lists.

For example, if the character sequence 'qz' corresponds to a collating

element, the RE '[qz]' will not have the intended effect. The correct way to

handle this is to write '[[.qz.]]'. The '[.' and the '.]' bracketing sequence

is intended to enclose a character sequence that represents a single collating

element.

This method can be used to specify the RE when '-' is the start point of a

range. '[[.-.]-<range endpoint char>]' is how this is written.

Also, in order to specify ']' as an element in a list, it has to be the first

character in the list. Thus the RE '[]a]' is valid and will match either ']'

or 'a'. This also means that the RE '[][]' actually matches '[' or ']'. From

this, it is evident that there is no way to distinguish the above case from

when two empty lists are concatenated together, and hence an empty list is

illegal.

On the other hand the empty sub-expression '()' is legal and matches the null

string.

When specifying an exclusion list using '^', if ']' is one of the characters

in the exclusion list, the '^' should be written first, followed by ']'.
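
The rules for ']' in lists, the empty list and the empty sub-expression can be
checked with the Python sketch below. Python's 're' module is an assumption
used for illustration; it happens to follow the same conventions for these
cases, and reports an empty list as an error.

```python
import re

bracket_or_a = re.compile(r'[]a]')       # ']' first in the list is literal
either_bracket = re.compile(r'[][]')     # matches '[' or ']'
not_bracket_or_a = re.compile(r'[^]a]')  # '^' first, then ']'
empty_group = re.compile(r'()')          # matches the null string

def hits(pattern, chars):
    """Return the characters in chars that the pattern matches."""
    return [ch for ch in chars if pattern.fullmatch(ch)]

list_hits = hits(bracket_or_a, [']', 'a', 'b'])
bracket_hits = hits(either_bracket, ['[', ']', 'a'])
exclusion_hits = hits(not_bracket_or_a, [']', 'a', 'b'])
null_match = empty_group.fullmatch('') is not None

# An empty list is rejected.
try:
    re.compile(r'[]')
    empty_list_ok = True
except re.error:
    empty_list_ok = False

print(list_hits, bracket_hits, exclusion_hits, null_match, empty_list_ok)
```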

The RE's '[[:<:]]' and '[[:>:]]' are not character classes but are RE's that

can be used to match the beginning and end of words, respectively. A word is

a string consisting of alphanumeric characters and / or underscores. This is

an extension that is compatible with the standard, but is not defined in it.
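
Python's 're' module does not accept '[[:<:]]' and '[[:>:]]', so the sketch
below only approximates them. It is an illustration under that assumption:
the beginning and end of a word are emulated with lookaround assertions on
'\w' (letters, digits and underscore when the ASCII flag is set). The names
'WORD_START', 'WORD_END' and 'contains_word_free' are hypothetical.

```python
import re

# Assumed stand-ins for '[[:<:]]' and '[[:>:]]' in Python's dialect.
WORD_START = r'(?<!\w)(?=\w)'   # no word character before, one after
WORD_END = r'(?<=\w)(?!\w)'     # a word character before, none after

whole_word_free = re.compile(WORD_START + 'free' + WORD_END, re.ASCII)

def contains_word_free(text):
    """Return True if text contains "free" as a complete word."""
    return whole_word_free.search(text) is not None

for text in ['get it free today', 'free!', 'freedom', 'carefree', 'free_trial']:
    print(text, '->', contains_word_free(text))
```

"free_trial" is not reported because the underscore counts as part of a word.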
