Image-Filtering Technologies Michael Lamont Senior Software Engineer Process Software

Antispam Image Filtering Technologies

DESCRIPTION

Slides from my wildly popular presentation at HP World 2005. Who knew? Grossly over-simplified signal processing methodology and sample photos of models in bikinis was a winning combo, even in San Francisco.

Page 1: Antispam Image Filtering Technologies

Image-Filtering Technologies

Michael Lamont

Senior Software Engineer

Process Software

Page 2: Antispam Image Filtering Technologies

Overview

• Role of image filtering in anti-spam filtering

• Two popular image filtering methods:

– Shape recognition

– Skin detection

• Example image filtering

• Image filtering issues

• Tools you can play with on your own

Page 3: Antispam Image Filtering Technologies

What Isn’t Covered

• Anything requiring advanced math

• Optical character recognition (OCR)

Page 4: Antispam Image Filtering Technologies

Spam Images

• A picture is worth 1000 words…

• …and it’s a lot harder to filter than 1000 words.

• Especially when spamvertizing pornography, photos are essential marketing tools.

Page 5: Antispam Image Filtering Technologies

Spam Images

• Right now, a spam filter can be very effective without looking at images.

• This is going to change when the majority of sites start installing more accurate filters, and spammers are forced to adapt.

Page 6: Antispam Image Filtering Technologies

90-Second Image Review

• To understand how image filtering technologies work, you need a basic understanding of how computers represent images.

• Images are broken into square dots, which correspond to pixels on a monitor.

Page 7: Antispam Image Filtering Technologies

90-Second Image Review

• Example image:

Page 8: Antispam Image Filtering Technologies

90-Second Image Review

• Each dot’s color is represented by 3 components: red, green, and blue.

• Each of the three color components has a value of 0 to 255.

• If all three are 0, then the pixel is black. If all three are 255, then the pixel is white.

Page 9: Antispam Image Filtering Technologies

90-Second Image Review

• The higher the number, the more intense the color component.

• Example: Increasing red value from 0 to 255 while leaving other components at 0:
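The red ramp described on this slide can be reproduced in a few lines. This is a minimal sketch rather than code from the presentation: the function name `red_ramp` is made up for illustration, and a pixel is modelled as a plain (red, green, blue) tuple.

```python
def red_ramp(steps=8):
    """Return `steps` pixels whose red value rises evenly from 0 to 255.

    Green and blue stay at 0, so the colors run from black to pure red.
    """
    if steps < 2:
        raise ValueError("need at least two steps")
    return [(round(255 * i / (steps - 1)), 0, 0) for i in range(steps)]


BLACK = (0, 0, 0)        # all three components at their minimum
WHITE = (255, 255, 255)  # all three components at their maximum

if __name__ == "__main__":
    for pixel in red_ramp(4):
        print(pixel)
```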

Page 10: Antispam Image Filtering Technologies

Shape Recognition

• Identifies objects in an image using posterization and edge finding.

• Extracts interesting objects and searches for similar objects in a database of “bad” objects.

• For our application, the objects are human body parts.

Page 11: Antispam Image Filtering Technologies

Posterization

• Dramatically reduces the number of colors in an image.

• Has the side effect of lumping most of an object’s pixels together.

• Called “posterization” because the same kind of color reduction used to be done for images printed on posters.

Page 12: Antispam Image Filtering Technologies

Posterization - Example

Page 13: Antispam Image Filtering Technologies

Posterization - Example

Page 14: Antispam Image Filtering Technologies

Posterization - Method

• A number of color bins are created.

• The number of bins is a lot less than the ~16.7 million colors that are possible (256 × 256 × 256).

• Each bin holds several hundred colors that are closely related.

• Every color in the bin is represented by the average color.

Page 15: Antispam Image Filtering Technologies

Posterization - Method

• Example: If a bin contained every shade of red from light pink to dark blood, every color in the bin would be represented by plain old red.

• The posterization process itself consists of replacing the color of every pixel in the image with its bin’s representative color.
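The binning step can be sketched as follows. The slides don't say how the bins are chosen, so this sketch assumes the simplest scheme: each color component is cut into `levels` equal ranges, and every value is replaced by the midpoint of its range (approximately the bin's average). The name `posterize` and the nested-list image layout are assumptions for illustration, and these uniform bins are much larger than the "several hundred colors" per bin mentioned earlier.

```python
def posterize(image, levels=4):
    """Replace every pixel with the representative color of its bin.

    `image` is a list of rows; each row is a list of (r, g, b) tuples.
    Each component is split into `levels` equal ranges, giving
    levels ** 3 bins in total (64 bins for the default of 4).
    """
    if levels < 1 or 256 % levels != 0:
        raise ValueError("levels must divide 256 evenly")
    width = 256 // levels  # how many component values share one range

    def representative(value):
        bin_index = value // width
        return bin_index * width + width // 2  # midpoint of the range

    return [
        [tuple(representative(c) for c in pixel) for pixel in row]
        for row in image
    ]


if __name__ == "__main__":
    reds = [[(250, 10, 20), (200, 60, 5)]]  # two different shades of red
    print(posterize(reds))                  # both land in the same bin
```

With four levels per component, both shades of red above end up as the same "plain old red", which is the lumping effect the method relies on.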

Page 16: Antispam Image Filtering Technologies

Posterization - Example 2

Page 17: Antispam Image Filtering Technologies

Posterization - Example 2

Page 18: Antispam Image Filtering Technologies

Posterization - Example 3

Page 19: Antispam Image Filtering Technologies

Posterization - Example 3

Page 20: Antispam Image Filtering Technologies

Edge Finding

• After posterizing the image, edge finding is used to identify individual objects.

• Edge finding determines the boundaries between different patches of color and contrast.

Page 21: Antispam Image Filtering Technologies

Edge Finding - Example

Page 22: Antispam Image Filtering Technologies

Edge Finding - Example

Page 23: Antispam Image Filtering Technologies

Edge Finding - Method

• The edge finding program scans the image looking for pixels that are very different from their neighbors.

• When it finds a radically different pixel, it marks it as part of an edge.

• Good edge finding algorithms look at lots of neighboring pixels to help reduce noise.
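A very small version of this scan might look like the sketch below. It is an illustration under stated assumptions, not the algorithm used in the presentation: "very different" is taken to mean that a pixel's brightness differs from the average brightness of its (up to eight) immediate neighbors by more than a threshold. The names `find_edges` and `brightness` and the default threshold of 40 are hypothetical.

```python
def brightness(pixel):
    """Average of the red, green, and blue components."""
    r, g, b = pixel
    return (r + g + b) / 3


def find_edges(image, threshold=40):
    """Return a grid of booleans: True where a pixel is marked as an edge.

    A pixel is marked when its brightness differs from the mean
    brightness of its surrounding pixels by more than `threshold`.
    """
    if not image or not image[0]:
        return []
    height, width = len(image), len(image[0])
    gray = [[brightness(p) for p in row] for row in image]
    edges = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            neighbours = [
                gray[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if not neighbours:  # a 1x1 image has nothing to compare with
                continue
            mean = sum(neighbours) / len(neighbours)
            edges[y][x] = abs(gray[y][x] - mean) > threshold
    return edges
```

Averaging over all the neighbors, rather than comparing against just one, is a crude form of the noise reduction mentioned in the last bullet.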

Page 24: Antispam Image Filtering Technologies

Edge Finding - Demonstration

Page 25: Antispam Image Filtering Technologies

Edge Finding - Example 2

Page 26: Antispam Image Filtering Technologies

Edge Finding - Example 2

Page 27: Antispam Image Filtering Technologies

Edge Finding - Example 3

Page 28: Antispam Image Filtering Technologies

Edge Finding - Example 3

Page 29: Antispam Image Filtering Technologies

Object Extraction

• Once objects have been identified with posterization and edge finding, they’re easy to extract.
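One way to see why extraction is easy after posterization: neighboring pixels of an object now share exactly the same color, so a flood fill can collect them. The sketch below assumes that simplification (it groups pixels by identical color only and does not use a separate edge map); `extract_objects` is a hypothetical name.

```python
from collections import deque


def extract_objects(image):
    """Group touching pixels of identical color into objects.

    `image` is a list of rows of (r, g, b) tuples, assumed to be
    posterized already. Returns a list of (color, pixels) pairs,
    where `pixels` is a list of (row, column) positions.
    """
    if not image or not image[0]:
        return []
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    objects = []
    for y in range(height):
        for x in range(width):
            if seen[y][x]:
                continue
            color = image[y][x]
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:  # breadth-first flood fill from (y, x)
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and image[ny][nx] == color):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            objects.append((color, pixels))
    return objects
```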

Page 30: Antispam Image Filtering Technologies

Object Extraction

• For people wearing swimsuits, the filter searches for leg, midriff, and upper torso objects.

Page 31: Antispam Image Filtering Technologies

Object Extraction

• A database of known objects is searched for matches to the extracted objects.

• Both object shape and color are used in the search.

• Comparisons are done with a fuzzy logic algorithm, since it’s unlikely two objects will be exactly alike.
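The slides don't describe the fuzzy logic algorithm itself, so the following is only a sketch of the general idea: instead of asking "are these two objects identical?", each property gets a degree of similarity between 0 and 1, and the degrees are combined with a fuzzy AND (the minimum). The object description (an aspect ratio plus an average color), the tolerances, and every name here are assumptions made for illustration.

```python
def similarity(a, b, tolerance):
    """1.0 when the values are equal, falling linearly to 0.0 at `tolerance` apart."""
    return max(0.0, 1.0 - abs(a - b) / tolerance)


def match_score(candidate, known):
    """Degree (0.0 to 1.0) to which two objects are alike in shape AND color.

    Each object is a dict with 'aspect' (width / height) and 'color' (r, g, b).
    """
    shape = similarity(candidate["aspect"], known["aspect"], tolerance=1.0)
    color = min(
        similarity(c, k, tolerance=128)
        for c, k in zip(candidate["color"], known["color"])
    )
    return min(shape, color)  # fuzzy AND: as good as the weakest property


def best_match(candidate, database, threshold=0.7):
    """Return the label of the closest known object, or None if nothing is close."""
    scored = [(match_score(candidate, known), known["label"]) for known in database]
    score, label = max(scored, default=(0.0, None))
    return label if score >= threshold else None


if __name__ == "__main__":
    database = [
        {"label": "leg", "aspect": 0.3, "color": (224, 160, 130)},
        {"label": "torso", "aspect": 0.9, "color": (224, 160, 130)},
    ]
    print(best_match({"aspect": 0.35, "color": (220, 150, 140)}, database))
```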

Page 32: Antispam Image Filtering Technologies

Skin Detection

• Subset of an image classification method called color histogram matching.

• Finds patches of skin tone in an image.

• Calculates the overall percentage of the image that is skin.

• If more than a specified amount of the image is skin, it’s filtered.

Page 33: Antispam Image Filtering Technologies

Skin Tones

• Almost all human skin is the same hue - saturation differences result in different skin colors.

• Human skin tones don’t often appear in other photographed objects, so color alone can be used to identify skin.

• Skin tones are primarily red, without any blue and little if any green.
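The "same hue" point can be checked by converting colors from RGB to hue/saturation/value with Python's standard `colorsys` module. The two sample skin colors and the sky color below are illustrative values chosen for this sketch, not measurements from the presentation.

```python
import colorsys


def hue_degrees(pixel):
    """Hue of an (r, g, b) pixel on the 0-360 degree color wheel."""
    r, g, b = (component / 255 for component in pixel)
    hue, _saturation, _value = colorsys.rgb_to_hsv(r, g, b)
    return hue * 360


def saturation(pixel):
    """Saturation of an (r, g, b) pixel, from 0.0 (gray) to 1.0 (vivid)."""
    r, g, b = (component / 255 for component in pixel)
    return colorsys.rgb_to_hsv(r, g, b)[1]


# Illustrative sample colors (assumptions, not taken from the slides).
LIGHT_SKIN = (255, 224, 189)
DARK_SKIN = (141, 85, 36)
BLUE_SKY = (90, 150, 230)

if __name__ == "__main__":
    for name, color in [("light skin", LIGHT_SKIN), ("dark skin", DARK_SKIN), ("sky", BLUE_SKY)]:
        print(f"{name}: hue {hue_degrees(color):.0f}, saturation {saturation(color):.2f}")
```

For these samples the two skin colors sit within a few degrees of each other on the color wheel while differing widely in saturation, and the sky color lands on the opposite side of the wheel.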

Page 34: Antispam Image Filtering Technologies

Skin Color Model

• To identify skin tones in an image, a filter needs to know what colors are skin tones.

• You could hardcode every skin color, but there are tens of thousands of them.

• Much more accurate to identify skin patches in an image and “train” the filter.

Page 35: Antispam Image Filtering Technologies

Skin Color Training

• Works almost like Bayesian filter training, but with image colors instead of message tokens.

• Filter maintains one database of skin colors, and another database of non-skin colors.

• If a color appears more often in the skin color database, it’s treated as a skin color.

Page 36: Antispam Image Filtering Technologies

Skin Color Training

• This system has the nice side-effect of dropping out most skin colors that also appear in non-skin areas of photos.
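The two-database idea can be sketched with a pair of counters. This is a minimal illustration, assuming the training pixels have already been hand-labelled as skin or non-skin; the class name `SkinColorModel` and its methods are hypothetical, and a real filter would probably group similar colors together instead of counting exact RGB values.

```python
from collections import Counter


class SkinColorModel:
    """Counts how often each color was seen in skin and in non-skin areas."""

    def __init__(self):
        self.skin = Counter()      # "database" of skin colors
        self.non_skin = Counter()  # "database" of non-skin colors

    def train(self, pixels, is_skin):
        """Add labelled (r, g, b) pixels to the matching database."""
        target = self.skin if is_skin else self.non_skin
        target.update(pixels)

    def is_skin_color(self, color):
        """A color counts as skin only if it was seen more often in skin."""
        return self.skin[color] > self.non_skin[color]


if __name__ == "__main__":
    model = SkinColorModel()
    model.train([(224, 160, 130)] * 3 + [(200, 170, 120)], is_skin=True)
    model.train([(200, 170, 120)] * 2 + [(30, 30, 200)], is_skin=False)
    print(model.is_skin_color((224, 160, 130)))  # seen only in skin
    print(model.is_skin_color((200, 170, 120)))  # seen more often in non-skin
```

The second color in the example shows the drop-out effect: it appeared once in a skin patch but twice elsewhere, so the model does not treat it as skin.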

Page 37: Antispam Image Filtering Technologies

Training Sample

Page 38: Antispam Image Filtering Technologies

Skin Identification

• To analyze an image, the filter examines the color of each pixel.

• If the color is a skin tone, the filter marks the pixel as skin.

• When every pixel has been examined, the % of the image that is skin is calculated.

• If the % is over a specified threshold, the image is filtered.
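These four steps translate almost directly into code. In the sketch below the skin test is passed in as a function, so any color model can be plugged in; `crude_skin_rule` is only a placeholder rule made up for this example (it is not the trained model from the slides), and the 30% threshold is an arbitrary assumption.

```python
def skin_fraction(image, is_skin_color):
    """Fraction (0.0 to 1.0) of pixels that `is_skin_color` marks as skin."""
    total = 0
    skin = 0
    for row in image:
        for pixel in row:
            total += 1
            if is_skin_color(pixel):
                skin += 1
    return skin / total if total else 0.0


def should_filter(image, is_skin_color, threshold=0.3):
    """True when more than `threshold` of the image is skin."""
    return skin_fraction(image, is_skin_color) > threshold


def crude_skin_rule(pixel):
    """Placeholder classifier: reddish pixels where red > green > blue."""
    r, g, b = pixel
    return r > 95 and r > g > b


if __name__ == "__main__":
    S = (224, 160, 130)  # passes the placeholder rule
    N = (30, 30, 200)    # does not
    image = [[S, S, N, N], [S, N, N, N]]
    print(skin_fraction(image, crude_skin_rule))
    print(should_filter(image, crude_skin_rule))
```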

Page 39: Antispam Image Filtering Technologies

Skin Detection Example

Page 40: Antispam Image Filtering Technologies

Skin Detection Example

Pages 41-70: Antispam Image Filtering Technologies

Correctly Filtered Images - Shape / Skin

[Image-only slides: alternating shape recognition and skin detection results for images that both methods filter correctly.]

Page 71: Antispam Image Filtering Technologies

Shape Recognition Problems

• Following are examples of images that shape recognition doesn’t handle correctly.

• Skin detection handles them correctly, but only because it’s biased to filter images with a lot of skin.

Pages 72-96: Antispam Image Filtering Technologies

Shape Recognition Problems

[Image-only slides; captions in order:]

• Unusual angle obscures shapes (skin detection works)

• Shapes are too broken up for the filter to work (skin detection works)

• Not enough “swimsuit” objects (skin detection works)

• Not enough “swimsuit” objects, second example (skin detection works)

• Image is so noisy that edge detection goes crazy (amazingly, skin detection still works)

Page 97: Antispam Image Filtering Technologies

Skin Detection Problems

• Following are examples of images that skin detection incorrectly filters.

• Shape recognition works for most of these, mainly because it can’t extract any useful shapes.

Pages 98-114: Antispam Image Filtering Technologies

Skin Detection Problems

[Image-only slides; captions in order:]

• Baby photos tend to show lots of skin. Shape recognition doesn’t filter the image.

• Portraits have the same problem as babies. Shape recognition ignores the image.

• In the right light, sand can be the same color as skin. That’s fairly rare - usually skin color models exclude sand colors.

• Black & white images can’t be filtered. Black & white also makes life rough on shape recognition filters.

Page 115: Antispam Image Filtering Technologies

Wedding Photos

• Wedding photos are guaranteed to make a mess of image filters.

• Skin fades into the background because of soft lighting, soft filters, and retouching.

• Turns out that brides get upset if the image is crystal clear with good contrast - it shows off skin flaws.

Page 116: Antispam Image Filtering Technologies

Wedding Photos

• Skin detection filters start identifying everything as skin (false positive).

• Shape recognition filters give up and don’t filter the message (accurate, but not for the right reasons).

• Porn tends not to be shot with soft lighting - good contrast makes skin “pop” in photos.

Pages 117-126: Antispam Image Filtering Technologies

Example Wedding Photo - Shape / Skin

[Image-only slides: shape recognition results followed by skin detection results for example wedding photos.]

Page 127: Antispam Image Filtering Technologies

“Art Porn”

• Usually shot with the same lighting effects as wedding photos.

• Rarely seen in email.

• In this case, skin detection is accurate for the wrong reasons while shape recognition lets the image pass.

Pages 128-132: Antispam Image Filtering Technologies

“Art Porn” Example - Shape / Skin

[Image-only slides: shape recognition results followed by skin detection results for an example image.]

Page 133: Antispam Image Filtering Technologies

Things I Can’t Show You

• S & M

– Skin tends to be covered with “clothing”

– Shapes are broken up by all of the paraphernalia

• Simpson’s shocker

• Still images from “interesting” videos

– Images are badly pixelated

– Colors are muddy and smudged

Page 134: Antispam Image Filtering Technologies

Image Filtering Issues

• Accuracy:

– Shape recognition misses lots of images it shouldn’t (false negatives)

– Skin detection filters lots of images it shouldn’t (false positives)

– Best skin detection systems are about 80% accurate

– Best shape recognition systems are about 40% accurate

Page 135: Antispam Image Filtering Technologies

Image Filtering Issues

• Performance:

– Image filtering requires huge amounts of memory, CPU time, and disk bandwidth.

– Unacceptably slows down most sites’ email servers/filtering systems.

– DL380 benchmark:

• ~1.2 million messages/hour with no filtering

• ~195,000 messages/hour with skin detection

• ~69,000 messages/hour with shape recognition
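To put the benchmark in perspective, the throughput figures can be turned into a slowdown factor and an average time per message. The arithmetic below uses only the approximate numbers quoted above; treating the time per message as one hour divided by the message count is a simplifying assumption (it ignores any parallelism in the mail server), and the names are made up for this sketch.

```python
# Approximate throughput figures from the benchmark (messages per hour).
BENCHMARK = {
    "no filtering": 1_200_000,
    "skin detection": 195_000,
    "shape recognition": 69_000,
}

MS_PER_HOUR = 3_600_000


def per_message_ms(messages_per_hour):
    """Average milliseconds spent on each message at this throughput."""
    return MS_PER_HOUR / messages_per_hour


def slowdown(messages_per_hour, baseline=BENCHMARK["no filtering"]):
    """How many times slower this configuration is than the baseline."""
    return baseline / messages_per_hour


if __name__ == "__main__":
    for name, rate in BENCHMARK.items():
        print(f"{name}: {per_message_ms(rate):.1f} ms/message, "
              f"{slowdown(rate):.1f}x slower than no filtering")
```

On these numbers skin detection costs roughly a factor of 6 in throughput and shape recognition roughly a factor of 17.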

Page 136: Antispam Image Filtering Technologies

Image Filtering Issues

• Diminishing returns on accuracy - most spam filters won’t see a noticeable increase in accuracy with the addition of image filtering.

• That’s likely to change in the future as spammers discover it’s one of the better options for circumventing current solutions.

Page 137: Antispam Image Filtering Technologies

I Wanna Play!

• Shape recognition:

– UC Berkeley’s Blobworld (open source): http://elib.cs.berkeley.edu/

• Skin detection:

– No good open-source examples

– Trivial to write your own using ImageMagick: http://www.imagemagick.org/

Page 138: Antispam Image Filtering Technologies

Quick Review

• We covered:

– How and why images appear in spam

– Why the use of images in spam is likely to increase

– Two methods for filtering images

– Examples of how the two methods work and don’t work

– Why image filtering isn’t widely used at this point.
