Textual and Visual Content Based Anti-Phishing First Review


  • 8/10/2019 Textual and Visual Content Based Anti-Phishing First Review


    Textual and Visual Content-Based Anti-Phishing: An SVM Approach

    Abstract

    A novel framework using an SVM (Support Vector Machine) approach for content-based
    phishing web page detection is presented. Our model takes into account textual and
    visual contents to measure the similarity between the protected web page and suspicious
    web pages. A text classifier, an image classifier, and an algorithm fusing the results
    from the classifiers are introduced. An outstanding feature of this paper is the
    exploration of an SVM model to estimate the matching threshold. This is required in the
    classifier for determining the class of the web page and identifying whether the web
    page is phishing or not. In the text classifier, the naive SVM rule is used to calculate
    the probability that a web page is phishing. In the image classifier, the earth mover's
    distance is employed to measure the visual similarity, and our SVM model is designed to
    determine the threshold. In the data fusion algorithm, the SVM theory is used to
    synthesize the classification results from textual and visual content. The effectiveness
    of our proposed approach was examined in a large-scale dataset collected from real
    phishing cases. Experimental results demonstrated that the text classifier and the image
    classifier we designed deliver promising results, the fusion algorithm outperforms
    either of the individual classifiers, and our model can be adapted to different phishing
    cases.


    Introduction

    Malicious people, also known as phishers, create phishing web pages, i.e., forgeries of
    real web pages, to steal individuals' personal information such as bank account,
    password, credit card number, and other financial data. Unwary online users can be
    easily deceived by these phishing web pages because of their high similarity to the real
    ones. The Anti-Phishing Working Group reported that there were at least 55,698 phishing
    attacks between January 1, 2009, and June 30, 2009. The latest statistics show that
    phishing remains a major criminal activity involving great losses of money and personal
    data.

    Automatically detecting phishing web pages has attracted much attention from security
    and software providers, financial institutions, and academic researchers. Methods for
    detecting phishing web pages can be classified into industrial toolbar-based
    anti-phishing, user-interface-based anti-phishing, and web page content-based
    anti-phishing. To date, techniques for phishing detection used by the industry mainly
    include authentication, filtering, attack tracing and analyzing, phishing report
    generating, and network law enforcement. These anti-phishing internet services are built
    into e-mail servers and web browsers and are available as web browser toolbars.

    These industrial services, however, do not efficiently detect all phishing attacks.
    Wu et al. conducted a thorough study and analysis of the effectiveness of anti-phishing
    toolbars, covering three security toolbars and other commonly used browser security
    indicators. The study indicates that all examined toolbars were ineffective at
    preventing phishing attacks. Reports show that 20 out of 30 subjects were spoofed by at
    least one phishing attack, 85% of the spoofed subjects indicated that the websites
    looked legitimate or exactly the same as the ones they had visited before, and 40% of
    the spoofed subjects were tricked due to poorly designed web sites. Cranor et al.
    performed another study evaluating 10 anti-phishing tools. They indicated that only one
    tool could consistently detect more than 60% of phishing web sites without a high rate
    of false positives, whilst four tools were not able to recognize 50% of the tested web
    sites. Apart from these studies on the effectiveness of anti-phishing toolbars, Li and


    The method works by first finding the associated web pages of the given webpage and then
    constructing an SLN from all those web pages. A mechanism of reasoning on the SLN is
    exploited to identify the phishing target. Zhang et al. developed a content-based
    approach, i.e., the Carnegie Mellon Anti-phishing and Network Analysis Tool (CANTINA),
    for anti-phishing by employing the idea of robust hyperlinks. Given a web page, this
    method first calculates the TF-IDF of each term, a weighting scheme commonly used in
    information retrieval, generates a lexical signature by selecting a few terms, supplies
    this signature to a search engine, and then matches the domain name of the current web
    page against several top search results to evaluate whether the current web page is
    legitimate or not. Another content-based technique, BAPT, is designed to identify
    phishing websites by using an open-source Bayesian filter on the basis of tokens which
    are extracted by a document object model (DOM) analyzer.
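As a rough illustration of the lexical-signature step described above, the following Python sketch scores the terms of a page by TF-IDF against a small reference corpus and keeps the top few. The function names, the smoothed IDF formula, and the idea of passing the reference corpus as a list of strings are assumptions made for this example and are not taken from CANTINA itself. The search-engine query is omitted; only the final domain comparison is sketched.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_signature(page_text, corpus_texts, k=5):
    """Return the k terms of page_text with the highest TF-IDF weight."""
    docs = [set(tokenize(t)) for t in corpus_texts]
    tf = Counter(tokenize(page_text))
    n_docs = len(docs)
    scores = {}
    for term, count in tf.items():
        df = sum(1 for d in docs if term in d)
        # Smoothed IDF (an assumption); it avoids division by zero for unseen terms.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = count * idf
    # Highest score first; ties are broken alphabetically for a stable result.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def domain_matches(page_domain, result_domains):
    """True if the page's domain appears among the top search-result domains."""
    return page_domain.lower() in {d.lower() for d in result_domains}
```

In such a scheme the signature terms would be sent to a search engine and the returned domains passed to `domain_matches`; a page whose domain is absent from the top results would be treated as suspicious.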

    The concept of a visual approach to phishing detection was first introduced by Liu et
    al. This approach, which is oriented by the DOM-based visual similarity of web pages,
    first decomposes the web pages into salient block regions. The visual similarity between
    two web pages is then evaluated by three metrics, namely, block-level similarity, layout
    similarity, and overall style similarity, which are based on the matching of the salient
    block regions. Fu et al. followed the overall strategy, but proposed another method to
    calculate the visual similarity of web pages. They first converted


    The main objectives of this project are as follows:

    To detect phishing web pages by using the SVM algorithm.

    To classify the web page by using textual and visual SVM classification algorithms.

    To combine the classification results for textual and visual content by using the
    fusion algorithm.

    To compare the fused results for genuine and fake web pages by computing the
    probability, in order to determine whether the given web page is phishing or not.

    Scope of Project

    The main scope of the project is as follows:

    To detect whether the website is a phishing website or not.

    To detect whether the website has been hacked by an attacker or not.

    To compare the genuine and attacked websites by examining their fusion results.

    Project Description

    Existing System

    Phishing detection techniques used by the existing system mainly include
    authentication, filtering, attack tracing and analyzing. Toolbar-based anti-phishing
    guides the user to interact with trusted websites. Toolbars such as security toolbars
    and browser security toolbars are used in the system. Methods for detecting phishing web
    pages can be classified into industrial toolbar-based anti-phishing,
    user-interface-based anti-phishing, and web page content-based anti-phishing. Techniques
    for phishing detection used by the industry mainly include authentication, filtering,
    attack tracing and analyzing, phishing report generating, and network law enforcement.
    These anti-phishing internet services are built into e-mail servers and web browsers and
    are available as web browser toolbars.

    Content-based anti-phishing, which refers to using the features of web pages, consists
    of surface-level characteristics, textual content, and visual content. We clarify that
    the content of a web page we discuss here includes the whole information of a web page,
    such as the domain name, URL, hyperlinks, terms, images, and forms embedded in the web
    page. Surface-level characteristics have been commonly used by industrial toolbars to
    detect phishing. For example, SpoofGuard inspects the age of the domain, well-known
    logos, the URL, and links to acquire the characteristics of phishing web pages. Liu et
    al. proposed the use of a semantic link network (SLN) to automatically identify the
    phishing target of a given webpage.

    The method works by first finding the associated web pages of the given webpage and then
    constructing an SLN from all those web pages. A mechanism of reasoning on the SLN is
    exploited to identify the phishing target. Zhang et al. developed a content-based
    approach, i.e., the Carnegie Mellon Anti-phishing and Network Analysis Tool (CANTINA),
    for anti-phishing by employing the idea of robust hyperlinks. Given a web page, this
    method first calculates the TF-IDF of each term, a weighting scheme commonly used in
    information retrieval, generates a lexical signature by selecting a few terms, supplies
    this signature to a search engine, and then matches the domain name of the current web
    page against several top search results to evaluate whether the current web page is
    legitimate or not. Another content-based technique, BAPT, is designed to identify
    phishing websites by using an open-source Bayesian filter on the basis of tokens which
    are extracted by a document object model (DOM) analyzer.

    The concept of a visual approach to phishing detection was first introduced by Liu et
    al. This approach, which is oriented by the DOM-based visual similarity of web pages,
    first decomposes the web pages into salient block regions. The visual similarity between
    two web pages is then evaluated by three metrics, namely, block-level similarity, layout
    similarity, and overall style similarity, which are based on the matching of the salient
    block regions. Fu et al. followed the overall strategy, but proposed another method to
    calculate the visual similarity of web pages. They first converted


    Not all phishing attacks are detected by the existing detection techniques.

    The toolbar technique is an ineffective way to protect users from phishing attacks.

    Online traffic decreases the quality of web pages and their applications.

    The existing approach only investigates phishing detection at the pixel level of web
    pages, without considering the text level.

    Existing systems such as CANTINA and toolbar-based techniques are very difficult to
    implement.

    Not all phishing web pages are detected by the CANTINA and toolbar-based systems.

    Proposed System

    The content representation of the proposed system is divided into two categories.

    1) Textual content: "Textual content" in this paper is defined as the terms or words
    that appear in a given web page, except for the stop words. We first separate the main
    text content from


    The system includes a training section, which estimates the statistics of historical
    data, and a testing section, which examines the incoming testing web pages. The
    statistics of the web page training set consist of the probabilities that a textual web
    page belongs to the categories, the matching thresholds of the classifiers, and the
    posterior probability of data fusion. Through preprocessing, content representations,
    i.e., textual and visual, are rapidly extracted from a given testing web page. The text
    classifier is used to classify the given web page into the corresponding category based
    on the textual features. The image classifier is used to classify the given web page
    into the corresponding category based on the visual content. Then the fusion algorithm
    is used to combine the detection results delivered by the two classifiers. The detection
    results are eventually transmitted to the online users or the web browsers.

    Preprocessing: the main contents of a given web page are first separated from


    of original words. We store the stemmed words to construct the vocabulary. Given a web
    page, we then form a histogram vector, where each component represents the term
    frequency and n denotes the total number of components in the vector. We explain three
    points here.

    1) We do not extract words from all the web pages in a dataset to construct the
    vocabulary, because phishers usually only use the words from a targeted web page to scam
    unwary users.

    2) For the sake of simplicity, we do not use any feature extraction algorithms in the
    process of vocabulary construction.

    3) We do not take the semantic associations of web pages into account, because the
    sizes of most phishing web pages are small.
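The vocabulary and histogram construction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the stop-word list is a tiny placeholder, `crude_stem` is a deliberately simple stand-in for a real stemmer, and all function names are hypothetical. In line with point 1, the vocabulary is built from the protected (targeted) page only.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "is", "your"}


def crude_stem(word):
    """Very rough suffix stripping, standing in for a proper stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Tokenize, drop stop words, and stem the remaining words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def build_vocabulary(protected_page_text):
    """Vocabulary of stemmed words taken from the protected page only."""
    return sorted(set(terms(protected_page_text)))


def histogram_vector(page_text, vocabulary):
    """Term-frequency vector with one component per vocabulary entry."""
    counts = Counter(terms(page_text))
    return [counts[term] for term in vocabulary]
```

The length of the returned list corresponds to n, the number of components in the histogram vector.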

    In reality, using only text content is insufficient to detect phishing web pages. This
    method will usually result in high false positives, because phishing web pages are
    highly similar to the targeted web pages not only in textual content but also in visual
    content such as famous logos, layout, and overall style. In this system, we use the same
    approach as in prior work, using the SVM to measure the visual similarity between an
    incoming web page and a protected web page.

    First, we retrieve the suspected web pages and protected web pages from the web. Second,
    we generate their signatures, which are used for the calculation of the SVM-based
    similarity between them. To this end, all the web page images are normalized into
    fixed-size square images. We use these normalized images to generate the signature of
    each web page.

    The image classifier is implemented by setting a threshold, which is estimated in a
    subsequent section. If the visual similarity between a suspected web page and the
    protected web page exceeds the threshold, the web page is classified as phishing;
    otherwise, it is classified as normal.

    The overall implementation process of the image classifier is summarized as follows.

    Step 1: Obtain the image of a web page from its URL and perform normalization.

    Step 2: Generate the visual signature of the input image, including the color and
    coordinate features.

    Step 3: Calculate the visual similarity between the input web page image and the
    protected web page image using the SVM approach.

    Step 4: Classify the input web page into the corresponding category according to the
    comparison of the visual similarity and the threshold.
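The four steps above can be sketched in Python as follows. This is an illustration under several assumptions: images are represented as plain 2-D lists of (r, g, b) tuples rather than loaded from a URL, the signature stores a weight and centroid per quantized color, and `overlap_similarity` is only a placeholder measure chosen to keep the example self-contained. It is not the similarity computation the document refers to, and all names are hypothetical.

```python
from collections import defaultdict


def normalize(image, size=4):
    """Step 1: nearest-neighbour resize of a 2-D pixel grid to size x size."""
    h, w = len(image), len(image[0])
    return [[image[r * h // size][c * w // size] for c in range(size)]
            for r in range(size)]


def signature(image, levels=4):
    """Step 2: map each quantized colour to (weight, centroid_x, centroid_y)."""
    step = 256 // levels
    acc = defaultdict(lambda: [0, 0, 0])  # pixel count, sum of x, sum of y
    total = 0
    for y, row in enumerate(image):
        for x, (r, g, b) in enumerate(row):
            entry = acc[(r // step, g // step, b // step)]
            entry[0] += 1
            entry[1] += x
            entry[2] += y
            total += 1
    return {color: (n / total, sx / n, sy / n)
            for color, (n, sx, sy) in acc.items()}


def overlap_similarity(sig_a, sig_b):
    """Step 3 placeholder: colour weight shared by both signatures, in [0, 1]."""
    return sum(min(sig_a[c][0], sig_b[c][0]) for c in sig_a if c in sig_b)


def classify(similarity, threshold):
    """Step 4: similarity above the threshold means the page is flagged."""
    return "phishing" if similarity > threshold else "normal"
```

The placeholder ignores the centroid coordinates; a measure that also accounts for where each color sits on the page would replace `overlap_similarity` without changing the other steps.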

    The overall implementation procedure of the fusion algorithm is summarized as follows.

    Step 1: Input the training set, train a text classifier and an image classifier, and
    then collect similarity measurements from the different classifiers.

    Step 2: Partition the interval of similarity measurements into sub-intervals.

    Step 3: Estimate the posterior probabilities conditioning on all the sub-intervals for
    the text classifier.

    Step 4: Estimate the posterior probabilities conditioning on all the sub-intervals for
    the image classifier.

    Step 5: For a new testing web page, classify it into the corresponding category by
    using the text classifier and the image classifier.

    Step 6: Display the result, i.e., whether the given web page is phishing or not.
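A minimal Python sketch of the fusion procedure above is given below. It assumes that similarity measurements lie in [0, 1], that sub-intervals are of equal width, and that posteriors are smoothed with a Laplace correction. The rule used to combine the two posteriors (a plain average thresholded at 0.5) is an assumption made for illustration, since the steps above do not spell out the combination rule; all function names are hypothetical.

```python
def bin_index(value, n_bins):
    """Index of the equal-width sub-interval of [0, 1] that contains value."""
    return min(max(int(value * n_bins), 0), n_bins - 1)


def estimate_posteriors(similarities, labels, n_bins=4):
    """P(phishing | sub-interval) from training data, with Laplace smoothing."""
    phishing = [0] * n_bins
    total = [0] * n_bins
    for s, is_phishing in zip(similarities, labels):
        i = bin_index(s, n_bins)
        total[i] += 1
        phishing[i] += int(is_phishing)
    return [(p + 1) / (t + 2) for p, t in zip(phishing, total)]


def fuse(text_sim, image_sim, text_post, image_post):
    """Average the two posteriors (assumed rule) and threshold at 0.5."""
    p_text = text_post[bin_index(text_sim, len(text_post))]
    p_image = image_post[bin_index(image_sim, len(image_post))]
    p = (p_text + p_image) / 2
    return ("phishing" if p > 0.5 else "normal"), p
```

`estimate_posteriors` would be called once per classifier on the training measurements (Steps 3 and 4), and `fuse` once per incoming page (Steps 5 and 6).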

    Advantages

    The data fusion framework enables us to directly incorporate the multiple results
    produced by different classifiers.

    The SVM algorithm is used for classifying both the textual and visual content.

    All phishing websites will be detected by using this approach.

    Literature Survey

    Detecting phishing web pages with visual similarity assessment based on earth mover's
    distance

    An effective approach to phishing Web page detection is proposed, which uses Earth
    Mover's Distance (EMD) to measure Web page visual similarity. We first convert the
    involved Web pages into low-resolution images and then use color and coordinate features
    to represent the image signatures. We use EMD to calculate the signature distances of
    the images of the Web pages. We train an EMD threshold vector for classifying a Web page
    as a phishing or a normal one. Large-scale experiments with 10,281 suspected Web pages
    are carried out to show high classification precision, phishing recall, and applicable
    time performance for an online enterprise solution. We also compare our method with two
    others to demonstrate its advantage. We also built a real system which is already used
    online and has caught many real phishing cases.

    Phishing web pages are forged web pages that are created by malicious people to mimic
    web pages of real web sites. Most of these kinds of web pages have high visual
    similarities to scam their victims. Some of these kinds of web pages look exactly like
    the real ones. Unwary Internet users may be easily deceived by this kind of scam.
    Victims of phishing web pages may expose their bank account, password, credit card
    number, or other important information to the phishing Web page owners. Phishing is a
    relatively new Internet crime in comparison with other forms, e.g., virus and hacking.
    More and more phishing Web pages have been found in recent years, at an accelerating
    rate. A report from the Anti-Phishing Working Group shows that the number of phishing
    Web pages is increasing each month by 50 percent and usually 5 percent of the phishing
    e-mail receivers will respond to the scams. Also, there were 15,050 phishing cases
    reported in one month alone, June 2005. This problem has drawn high attention from both
    industry and the academic research domain since it is a severe security and privacy
    problem and has caused huge negative impacts on the Internet world. It is threatening
    people's confidence in using the Web to conduct online finance-related activities.

    In this system, we propose an effective approach for detecting phishing Web pages,
    which employs the Earth Mover's Distance (EMD) to calculate the visual similarity of Web
    pages. The most important reason that Internet users could become phishing victims is
    that phishing Web pages always have high visual similarity with the real Web pages, such
    as visually similar block layouts, dominant colors, images, and fonts. We follow an
    existing anti-phishing strategy to obtain suspected Web pages, which are supposed to be
    collected from URLs in those e-mails containing keywords associated with protected Web
    pages. We first convert them into normalized images and then represent their image
    signatures with features composed of the dominant color category and its corresponding
    centroid coordinate to calculate the visual similarity of two Web pages.

    The linear programming algorithm for EMD is applied to the visual similarity computation
    of the two signatures. An anti-phishing system may be requested to protect many Web
    pages. A threshold is calculated for each protected Web page using supervised training.
    If the EMD-based visual similarity of a Web page exceeds the threshold of a protected
    Web page, we classify the Web page as a phishing one.
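The EMD-plus-threshold idea above can be illustrated with a deliberately simplified Python sketch. The general signature-based EMD is a transportation problem solved by linear programming; the code below instead handles only the one-dimensional special case of two histograms with equal total mass and unit spacing between bins, where EMD reduces to a running sum of carried mass. The mapping from distance to similarity and the accuracy-based threshold search are assumptions for this example, not the training procedure of the surveyed paper.

```python
def emd_1d(p, q):
    """EMD between two equal-length, equal-mass histograms with unit bin spacing."""
    if len(p) != len(q) or abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal length and equal total mass")
    carried, cost = 0.0, 0.0
    for a, b in zip(p, q):
        carried += a - b      # mass that still has to be moved past this bin
        cost += abs(carried)  # moving it one bin further costs |carried|
    return cost


def similarity(p, q):
    """Map EMD to [0, 1] for histograms of total mass 1; 1 means identical."""
    max_emd = max(len(p) - 1, 1)  # all mass moved from the first to the last bin
    return 1.0 - emd_1d(p, q) / max_emd


def train_threshold(similarities, labels):
    """Pick the candidate threshold with the best accuracy on the training data."""
    values = sorted(set(similarities))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values

    def accuracy(t):
        hits = sum((s > t) == y for s, y in zip(similarities, labels))
        return hits / len(labels)

    return max(candidates, key=accuracy)
```

One threshold would be trained per protected page, using similarities between that page and labelled phishing and normal pages.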

    Evolving with the anti-phishing techniques, various phishing techniques and more
    complicated and hard-to-detect methods are used by phishers. The most straightforward
    way for a phisher to scam people is to make the phishing Web pages similar to their
    targets.

    A phishing strategy includes both Web link obfuscation and Web page obfuscation. Web
    link obfuscation can be carried out in four basic ways, including adding a suffix to a
    domain name of the URL, using an actual link different from the visible link, and
    utilizing system bugs in real Web sites to redirect the link to the phishing Web pages.
    Previous research works on duplicated document detection approaches focus on plain text
    documents and use pure text features in the similarity measure, such as collection
    statistics, syntactic analysis, displaying structure, visual-based understanding, vector
    space model, etc.
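Two of the link-obfuscation patterns mentioned above, a suffix added to a real domain name and a visible link that differs from the actual link, lend themselves to simple surface checks. The Python sketch below shows one possible form of such checks using only the standard library; the function names and the heuristics themselves are assumptions for illustration and are not drawn from any of the surveyed systems. A check for IP-only hosts is included as a third example.

```python
import ipaddress
from urllib.parse import urlparse


def host_of(url):
    """Lowercased host part of a URL, or an empty string if there is none."""
    return (urlparse(url).hostname or "").lower()


def is_ip_host(url):
    """True if the URL addresses its host by IP address instead of a name."""
    try:
        ipaddress.ip_address(host_of(url))
        return True
    except ValueError:
        return False


def visible_link_mismatch(visible_text, href):
    """True if the visible text looks like a URL whose host differs from href's."""
    shown = visible_text.strip()
    if "://" not in shown:
        shown = "http://" + shown
    shown_host, real_host = host_of(shown), host_of(href)
    return "." in shown_host and shown_host != real_host


def has_suffixed_domain(href, protected_domain):
    """True if the protected domain is followed by extra labels in the host."""
    host = host_of(href)
    d = protected_domain.lower()
    return host.startswith(d + ".") or ("." + d + ".") in host
```

Such checks only cover surface-level characteristics and would complement, not replace, the content-based comparison.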


    Phishing is considered one of the most serious threats to the Internet and e-commerce.
    Phishing attacks abuse trust with the help of deceptive e-mails, fraudulent web sites,
    and malware. In order to prevent phishing attacks, some organizations have implemented
    Internet browser toolbars for identifying deceptive activities.


    of target name in the URL or only IP address without host name. These ambiguous domain
    names are hazardous for careless consumers.

    Because of careless usability security design, phishers can easily take advantage of
    poor usability design. In order to offer more reliable security, anti-phishing toolbars
    should be easier to use. Moreover, as end-users must be able to use the toolbars and
    make correct choices, usability evaluation of these toolbars is important. Our research
    objective was to find general usability design principles for anti-phishing client-side
    applications. Such information may prove valuable for improving the usability and
    security of anti-phishing applications. Based on this motivation, we conducted a
    heuristic usability evaluation of five toolbars.


    spoofed.


    Web wallet: Preventing phishing attacks by revealing user intentions

    We introduce a new anti-phishing solution, the Web Wallet. The Web Wallet is a browser
    sidebar which users can use to submit their sensitive information online. It detects
    phishing attacks by determining where users intend to submit their information and
    suggests an alternative safe path to their intended site if the current site does not
    match it. It integrates security questions into the user's workflow so that its
    protection cannot be ignored by the user. We conducted a user study on the Web Wallet
    prototype and found that the Web Wallet is a promising approach. In the study, it
    significantly decreased the spoof rate of typical phishing attacks from 63% to 7%, and
    it effectively prevented all phishing attacks as long as it was used. A majority of the
    subjects successfully learned to depend on the Web Wallet to submit their login
    information.


    sensitive data, she presses a dedicated security key on the keyboard to open the Web
    Wallet. Using the Web Wallet, she may type her data or retrieve her stored data. The
    data is then filled into the web form. But before the fill-in, the Web Wallet checks if
    the current site is good enough to receive the sensitive data. If the current site is
    not qualified, the Web Wallet requires the user to explicitly indicate where she wants
    the data to go. If the user's intended site is not the current site, the Web Wallet
    shows a warning to the user about this discrepancy, and gives her a safe path to her
    intended site. There is one simple rule to correctly use the Web Wallet: "Always use the
    Web Wallet to submit sensitive information by pressing the security key first."
    Equivalently, "never submit sensitive information directly through a web form because it
    is not a secure practice."

    We have run a user study to test the Web Wallet interface. The results are promising:

    • The Web Wallet significantly decreased the spoof rate of normal phishing attacks from
    63% to 7%.

    • All the simulated phishing attacks in the study were effectively prevented by the Web
    Wallet as long as it was used.

    • By disabling direct input into web forms and thus making itself the only way to input
    sensitive information, the Web Wallet successfully trained a majority of the subjects to
    use it to protect their sensitive information submission.

    But there are also negative results which we plan to deal with in future research:

    • The subjects totally failed to differentiate the authentic Web Wallet interface from
    a fake Web Wallet presented by a phishing site. This is a new type of phishing attack.
    Instead of mimicking a legitimate site's appearance, the attacker fakes the interface of
    security software that is run by the user.

    • It is not easy to completely stop all subjects from typing sensitive information
    directly into web forms. Users are familiar with web form submission and have a strong
    tendency to use it.

    Phishing attacks exploit the gap between the way a user perceives a communication and
    the actual effect of the communication. The computer system and the human user have two
    different understandings of a web site. The user recognizes a site based on its visual
    appearance and the semantic meaning of its content. But the browser recognizes a site
    based on system properties, e.g., whether the site has an SSL certificate, when and
    where this site was registered, etc. As a result, neither the computer system nor the
    human user alone can effectively prevent phishing attacks.

    On the one hand, it is hard, if not impossible, for the computer to always correctly
    derive the semantic meaning of the content. On the other hand, ordinary users do not
    know how to correctly interpret the system properties. The user interface is thus the
    exact place to bridge the gap between the user's mental model and the system model by
    letting the human user and the system share what they individually know about the
    current site. The Web Wallet helps the users transfer their real intention to the
    browser, especially when they are doing phishing-critical actions, such as submitting
    sensitive data to web sites. When a user uses the Web Wallet, a dedicated interface for
    sensitive information submission, she implicitly indicates that the data being submitted
    is sensitive. The user further indicates the sensitive data type by using the
    appropriate card in the Web Wallet.


    Intelligent phishing website detection system using fuzzy techniques

    Detecting and identifying e-banking phishing websites is really a complex and dynamic
    problem involving many factors and criteria. Because of the subjective considerations
    and the ambiguities involved in the detection, Fuzzy Data Mining Techniques can be an
    effective tool in assessing and identifying e-banking phishing websites, since they
    offer a more natural way of dealing with quality factors rather than exact values. In
    this system, we present a novel approach to overcome the fuzziness in the e-banking
    phishing website assessment and propose an intelligent, resilient, and effective model
    for detecting e-banking phishing websites. The proposed model is based on Fuzzy logic
    combined with Data Mining algorithms to characterize the e-banking phishing website
    factors and to investigate its techniques by classifying the phishing types and defining
    six e-banking phishing website attack criteria with a layer structure. A case study was
    applied to illustrate and simulate the phishing process. Our experimental results showed
    the significance and importance of the e-banking phishing website criteria represented
    by layer one and the varying influence of the phishing characteristic layers on the
    final e-banking phishing website rate.

    E-banking phishing websites are forged websites that are created by malicious people to
    mimic real e-banking websites. Most of these kinds of Web pages have high visual
    similarities to scam their victims. Some of these Web pages look exactly like the real
    ones. Unwary Internet users may be easily deceived by this kind of scam. Victims of
    e-banking phishing websites may expose their bank account, password, credit card number,
    or other important information to the phishing Web page owners. The impact is the breach
    of information security through the compromise of confidential data, and the victims may
    finally suffer losses of money or other kinds. Phishing is a relatively new Internet
    crime in comparison with other forms, e.g., virus and hacking.

    E-banking phishing is a very complex issue to understand and to analyze, since it joins
    technical and social problems with each other, for which there is no known single silver
    bullet. The motivation behind this study is to create a resilient and effective method
    that uses Fuzzy Data Mining algorithms and tools to detect e-banking phishing websites
    in an automated manner. DM approaches such as neural networks, rule induction, and
    decision trees can be a useful addition to the fuzzy logic model. They can deliver
    answers to business questions that traditionally were too time-consuming to resolve,
    such as "Which are the most important e-banking phishing website characteristic
    indicators and why?", by analyzing massive databases and historical data for training
    purposes.

    Fuzzy Data Mining Algorithms & Techniques

    The approach described here is to apply fuzzy logic and data mining algorithms to assess
    e-banking phishing website risk on the 27 characteristics and factors which stamp the
    forged website. The essential advantage offered by fuzzy logic techniques is the use of
    linguistic variables to represent key phishing characteristic indicators and relating
    e-banking phishing website probability.

    1) Fuzzification

    In this step, linguistic descriptors such as


    between classes. The degree of belongingness of the values of the variables to any
    selected class is called the degree of membership. A membership function is designed for
    each phishing characteristic indicator, which is a curve that defines how each point in
    the input space is mapped to a membership value between [0, 1]. Linguistic values are
    assigned for each phishing indicator as Low, Moderate, and


    3) Aggregation of the rule outputs

    This is the process of unifying the outputs of all discovered rules: the
    membership functions of all the rule consequents, previously scaled, are
    combined into a single fuzzy set.

    4) Defuzzification

    This is the process of transforming a fuzzy output of a fuzzy inference
    system into a crisp output. Fuzziness helps to evaluate the rules, but the final
    output has to be a crisp number. The input for the defuzzification process is
    the aggregate output fuzzy set and the output is a number. This step was
    done using the centroid technique since it is a commonly used method.
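    The fuzzification and defuzzification steps above can be illustrated with a
    minimal Java sketch. It assumes a triangular membership function for the
    "Moderate" linguistic value and a sampled centroid; the class name
    FuzzySketch, the break points 0.25, 0.5 and 0.75, and the sampling
    resolution are illustrative assumptions, not values taken from the study.

```java
import java.util.function.DoubleUnaryOperator;

final class FuzzySketch {
    // Triangular membership function with feet at a and c and peak at b (a < b < c).
    // The triangular shape is an assumption; the text only says "a curve".
    static DoubleUnaryOperator triangle(double a, double b, double c) {
        return x -> {
            if (x <= a || x >= c) {
                return 0.0;
            }
            return x <= b ? (x - a) / (b - a) : (c - x) / (c - b);
        };
    }

    // Centroid defuzzification: sum(x * mu(x)) / sum(mu(x)),
    // sampled at steps + 1 evenly spaced points over [lo, hi].
    static double centroid(DoubleUnaryOperator mu, double lo, double hi, int steps) {
        double weighted = 0.0;
        double total = 0.0;
        for (int i = 0; i <= steps; i++) {
            double x = lo + (hi - lo) * i / steps;
            double m = mu.applyAsDouble(x);
            weighted += x * m;
            total += m;
        }
        // An empty fuzzy set has no centroid; fall back to the midpoint.
        return total == 0.0 ? (lo + hi) / 2 : weighted / total;
    }

    public static void main(String[] args) {
        DoubleUnaryOperator moderate = triangle(0.25, 0.5, 0.75);
        System.out.println("membership of 0.4 in Moderate: " + moderate.applyAsDouble(0.4));
        System.out.println("crisp output: " + centroid(moderate, 0.0, 1.0, 1000));
    }
}
```

    In a full system the function passed to centroid would be the aggregated
    output set from step 3 rather than a single triangle.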

    There are a number of challenges posed by doing post-hoc classification of
    e-banking phishing websites. Most of these challenges only apply to the
    e-banking phishing websites data and materialize as a form of information,
    which has the net effect of increasing the false negative rate. The age of the
    dataset is the most significant problem, which is particularly relevant with
    the phishing corpus. E-banking phishing websites are short-lived, often lasting
    only in the order of 48 hours. Some of our features can therefore not be
    extracted from older websites, making our tests difficult. The average phishing
    site stays live for approximately 2.2 days. Furthermore, the process of
    transforming the original e-banking phishing website archives into record
    feature datasets is not without error. It requires the use of heuristics at
    several steps. Thus high accuracy from the data mining algorithms cannot be
    expected.


    CANTINA: A content-based approach to detecting phishing web sites

    Phishing is a significant problem involving fraudulent email and web sites that
    trick unsuspecting users into revealing private information. In this paper, we
    present the design, implementation, and evaluation of CANTINA, a novel,
    content-based approach to detecting phishing web sites, based on the TF-IDF
    information retrieval algorithm. We also discuss the design and evaluation of
    several heuristics we developed to reduce false positives. Our experiments show
    that CANTINA is good at detecting phishing sites, correctly labeling
    approximately 9 % of phishing sites.

    Recently, there has been a dramatic increase in phishing, a kind of attack in
    which victims are tricked by spoofed emails and fraudulent web sites into
    giving up personal information. Phishing is a rapidly growing problem, with
    9,2 unique phishing sites reported in June of 200 alone. It is unknown
    precisely how much phishing costs each year, since impacted industries are
    reluctant to release figures; estimates range from $1 billion to 2.8 billion per
    year. To respond to this threat, software vendors and companies have released a
    variety of anti-phishing toolbars.

    For example, eBay offers a free toolbar that can positively identify eBay-owned
    sites, and Google offers a free toolbar aimed at identifying any fraudulent site.
    As of September 200 , the free software download site download.com listed 84
    anti-phishing toolbars.


    In this system, we present the design, implementation, and evaluation of
    CANTINA, a novel content-based approach for detecting phishing web sites.
    CANTINA examines the content of a web page to determine whether it is
    legitimate or not, in contrast to other approaches that look at surface
    characteristics of a web page, for example the URL and its domain name.
    CANTINA makes use of the well-known TF-IDF algorithm used in information
    retrieval, and more specifically, the Robust
    developed by Phelps and Wilensky for overcoming broken hyperlinks. Our
    results show that CANTINA is quite good at detecting phishing sites, detecting
    94-97% of phishing sites.

    We also show that we can use CANTINA in conjunction with heuristics used
    by other tools to reduce false positives, while lowering phish detection rates
    only slightly. We present a summary evaluation, comparing CANTINA to two
    popular anti-phishing toolbars that are representative of the most effective tools
    for detecting phishing sites currently available. Our experiments show that
    CANTINA has comparable or better performance than SpoofGuard with far
    fewer false positives, and does about as well as Netcraft. Finally, we show that
    CANTINA combined with heuristics is effective at detecting phishing URLs in
    users' actual email, and that its most frequent mistake is labeling spam-related
    URLs as phishing.

    TF-IDF is an algorithm often used in information retrieval and text mining.
    TF-IDF yields a weight that measures how important a word is to a document in
    a corpus. The importance increases proportionally to the number of times a word
    appears in the document, but is offset by the frequency of the word in the
    corpus. The term frequency (TF) is simply the number of times a given term
    appears in a specific document. This count is usually normalized to prevent a
    bias towards longer documents, to give a measure of the importance of the term
    within the particular document. The inverse document frequency (IDF) is a
    measure of the general importance of the term. Roughly speaking, the IDF
    measures how common a term is across an entire collection of documents.
    Thus, a term has a high TF-IDF weight by having a high term frequency in a
    given document and a low document frequency across the collection.
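    The weighting just described can be sketched in a few lines of Java. This
    is a minimal illustration, assuming documents are already tokenized into
    lists of terms, TF is normalized by document length, and IDF is the natural
    logarithm of (number of documents / number of documents containing the
    term). The class name TfIdfSketch and this particular IDF variant are
    assumptions; the text does not say which variant was used.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

final class TfIdfSketch {
    // Term frequency, normalized by document length to avoid favoring long documents.
    static double tf(String term, List<String> doc) {
        if (doc.isEmpty()) {
            return 0.0;
        }
        long count = doc.stream().filter(term::equals).count();
        return (double) count / doc.size();
    }

    // Inverse document frequency: log(N / df); 0 when no document contains the term.
    static double idf(String term, List<List<String>> corpus) {
        long df = corpus.stream().filter(d -> d.contains(term)).count();
        return df == 0 ? 0.0 : Math.log((double) corpus.size() / df);
    }

    // TF-IDF weight of every distinct term in doc, relative to corpus.
    static Map<String, Double> scores(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> out = new HashMap<>();
        for (String term : new HashSet<>(doc)) {
            out.put(term, tf(term, doc) * idf(term, corpus));
        }
        return out;
    }
}
```

    A term that occurs in every document of the corpus gets an IDF of zero under
    this variant, so it can never enter a lexical signature however often it
    appears on the page.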

    CANTINA works as follows:

    • Given a web page, calculate the TF-IDF scores of each term on that web
      page.

    • Generate a lexical signature by taking the five terms with highest
      TF-IDF weights.

    • Feed this lexical signature to a search engine, which in our case is
      Google.

    • If the domain name of the current web page matches the domain name of
      the N top search results, we consider it to be a legitimate web site.
      Otherwise, we consider it a phishing site.
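    The steps above can be sketched as follows. This is a hedged illustration
    only: it assumes the TF-IDF weights have already been computed, and it takes
    the search results as a plain list of URLs instead of calling a search
    engine. The class CantinaSketch and its method names are hypothetical, and
    comparing full host names is a simplification of "domain name" matching.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class CantinaSketch {
    // Lexical signature: the k terms with the highest TF-IDF weight
    // (ties broken alphabetically so the result is deterministic).
    static List<String> lexicalSignature(Map<String, Double> tfIdf, int k) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(tfIdf.entrySet());
        entries.sort((a, b) -> {
            int byWeight = Double.compare(b.getValue(), a.getValue());
            return byWeight != 0 ? byWeight : a.getKey().compareTo(b.getKey());
        });
        List<String> signature = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            signature.add(entries.get(i).getKey());
        }
        return signature;
    }

    // Host part of a URL, lower-cased; empty string when the URL has no host.
    static String host(String url) {
        String h = URI.create(url).getHost();
        return h == null ? "" : h.toLowerCase(Locale.ROOT);
    }

    // True when the page's host appears among the first n search results.
    // topResults stands in for what a search engine would return for the signature.
    static boolean looksLegitimate(String pageUrl, List<String> topResults, int n) {
        String pageHost = host(pageUrl);
        if (pageHost.isEmpty()) {
            return false;
        }
        return topResults.stream().limit(n).anyMatch(r -> host(r).equals(pageHost));
    }
}
```

    A caller would pass lexicalSignature(weights, 5) to a search API of its
    choice and hand the returned URLs to looksLegitimate.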

    Our technique makes the assumption that Google indexes the vast majority
    of legitimate web sites, and that legitimate sites will be ranked higher than
    phishing sites. Combined, these assumptions suggest that a phishing scam will
    rarely, if ever, be highly ranked. At the end of this paper, however, we discuss
    some ways of possibly subverting CANTINA.

    Age of Domain

    This heuristic checks the age of the domain name. Many phishing sites have
    domains that are registered only a few days before phishing emails are sent
    out. We use a W


    measures the number of months from when the domain name was first
    registered. If the page has been registered longer than 12 months, the
    heuristic returns 1, deeming it legitimate; otherwise it returns -1,
    deeming it phishing. If the W


    retrieving the page. Combined with the limited size of the browser address
    bar, this makes it possible to write URLs that appear legitimate within the
    address bar, but actually cause the browser to retrieve a different page. This
    heuristic is used by Mozilla Firefox. Dashes are also rarely used by
    legitimate sites, so we use this as another heuristic. SpoofGuard checks for
    both at symbols and dashes in URLs.

    Suspicious Links

    This heuristic applies the URL check above to all the links on the page. If
    any link on a page fails this URL check, then the page is labeled as a
    possible phishing scam. This heuristic is also used by SpoofGuard.

    IP Address

    This heuristic checks if a page's domain name is an IP address. This
    heuristic is also used in PILFER.

    Dots in URL

    This heuristic checks the number of dots in a page's URL. We found that
    phishing pages tend to use many dots in their URLs but legitimate sites
    usually do not. Currently, this heuristic labels a page as phish if there are
    or more dots. This heuristic is also used in PILFER.
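    The URL-based heuristics described in this section (age of domain, at
    symbols and dashes, IP address, dots in URL) are simple enough to sketch
    directly. The class UrlHeuristicsSketch and its method names are
    hypothetical. The dot threshold is left as a parameter because the value is
    not legible in the text, the registration date is assumed to be supplied by
    the caller, the dash check is assumed to apply to the host name, and only
    IPv4 literals are recognized.

```java
import java.net.URI;
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.regex.Pattern;

final class UrlHeuristicsSketch {
    // Dotted-quad IPv4 literal (no range check on each octet; enough for a sketch).
    private static final Pattern IPV4 = Pattern.compile("\\d{1,3}(\\.\\d{1,3}){3}");

    // Host part of a URL; empty string when the URL has no host.
    static String host(String url) {
        String h = URI.create(url).getHost();
        return h == null ? "" : h;
    }

    // Age of domain: 1 (legitimate) if registered longer than 12 months, otherwise -1.
    static int ageOfDomain(LocalDate registered, LocalDate today) {
        return ChronoUnit.MONTHS.between(registered, today) > 12 ? 1 : -1;
    }

    // Suspicious URL: an at symbol anywhere in the URL, or a dash in the host name.
    static boolean hasAtOrDash(String url) {
        return url.contains("@") || host(url).contains("-");
    }

    // IP address: the host is a numeric address instead of a domain name.
    static boolean isIpAddress(String url) {
        return IPV4.matcher(host(url)).matches();
    }

    // Dots in URL: flags the page when the URL contains at least threshold dots.
    static boolean tooManyDots(String url, int threshold) {
        long dots = url.chars().filter(c -> c == '.').count();
        return dots >= threshold;
    }
}
```

    The Suspicious Links heuristic would apply hasAtOrDash to every link
    extracted from the page and flag the page if any link fails.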

    Forms

    This heuristic checks if a page contains any


    Software Description

    Java

    Java is a programming language originally developed by James Gosling at Sun
    Microsystems (now a subsidiary of Oracle Corporation) and released in 1995 as a
    core component of Sun Microsystems' Java platform. The language derives much
    of its syntax from C and C++ but has a simpler object model and fewer low-level
    facilities. Java applications are typically compiled to bytecode (class file) that can
    run on any Java Virtual Machine (JVM) regardless of computer architecture. Java
    is a general-purpose, concurrent, class-based, object-oriented language that is
    specifically designed to have as few implementation dependencies as possible. It is
    intended to let application developers "write once, run anywhere." Java is currently
    one of the most popular programming languages in use, particularly for
    client-server web applications.

    The original and reference implementation Java compilers, virtual machines, and
    class libraries were developed by Sun from 1995. As of May 2007, in compliance
    with the specifications of the Java Community Process, Sun relicensed most of its
    Java technologies under the GNU General Public License. Others have also
    developed alternative implementations of these Sun technologies, such as the GNU
    Compiler for Java and GNU Classpath.

    Java Platform:

    One characteristic of Java is portability, which means that computer programs
    written in the Java language must run similarly on any hardware/operating-system
    platform. This is achieved by compiling the Java language code to an intermediate
    representation called Java bytecode, instead of directly to platform-specific
    machine code. Java bytecode instructions are analogous to machine code, but are
    intended to be interpreted by a virtual machine (VM) written specifically for the
    host hardware. End-users commonly use a Java Runtime Environment (JRE)
    installed on their own machine for standalone Java applications, or in a Web
    browser for Java applets.

    Standardized libraries provide a generic way to access host-specific features such
    as graphics, threading, and networking.

    A major benefit of using bytecode is porting.


    the NetBeans runtime container is an execution environment that understands
    what a module is, handles its lifecycle, and enables it to interact with other
    modules in the same application.

    Registration of various objects, files and hints in the layer is central to the way
    NetBeans-based applications handle communication between modules. This page
    summarizes the list of such extension points defined by modules with an API.

    Context menu actions are read from the layer folder
    Loaders/text/x-ant+xml/Actions.

    The Keymaps folder contains subfolders for individual keymaps (Emacs, JBuilder,
    NetBeans). The name of a keymap can be localized. Use the
    "SystemFileSystem.localizingBundle" attribute of your folder for this purpose.
    Each individual keymap folder contains shadows to actions. A shortcut is mapped
    to the name of the file. The Emacs shortcut format is used; multikeys are separated
    by space chars ("C-X P" means Ctrl+X followed by P). The "currentKeymap"
    property of the "Keymaps" folder contains the original (not localized) name of the
    current keymap.

    The Shortcuts folder contains registration of shortcuts. It is supported for
    backward compatibility purposes only. All new shortcuts should be registered in
    the "Keymaps/NetBeans" folder. Shortcuts installed in the Shortcuts folder will be
    added to all keymaps, if there is no conflict. This means that if the same shortcut
    is mapped to different actions in the Shortcuts folder and the current keymap
    folder (like Keymap/NetBeans), the Shortcuts folder mapping will be ignored.

    • DatabaseExplorerLayerAPI in Database Explorer

    • Loaders-text-dbschema-Actions in Database Explorer

    • Loaders-text-sql-Actions in Database Explorer

    • PluginRegistration in Java EE Server Registry

    XML layer contract for registration of server plug-ins and instances that
    implement optional capabilities of server plug-ins. Plug-ins with server-specific
    deployment descriptor files should declare the full list in XML layer as specified in
    the document plugin-layer-file.html from the above link.

    "Projects/org-netbeans-modules-java-j2seproject/Customizer" folder's content
    is used to construct the project's customizer. Its content is expected to be
    ProjectCustomizer.CompositeCategoryProvider instances. The lookup passed to
    the panels contains an instance of Project and
    org.netbeans.modules.java.j2seproject.ui.customizer.J2SEProjectProperties. Please
    note that the latter is not part of any public API, so you need an implementation
    dependency to make use of it.

    "Projects/org-netbeans-modules-java-j2seproject/Nodes" folder's content is
    used to construct the project's child nodes. Its content is expected to be
    NodeFactory instances.

    "Projects/org-netbeans-modules-java-j2seproject/Lookup" folder's content is
    used to construct the project's additional lookup. Its content is expected to be
    LookupProvider instances. The J2SE project provides LookupMergers for Sources,
    PrivilegedTemplates and RecommendedTemplates. Implementations added by 3rd
    parties will be merged into a single instance in the project's lookup.

    Use the OptionsDialog folder for registration of custom top level options panels.
    Register your implementation of OptionsCategory there (*.instance file). Standard
    file systems sorting mechanism is used.

    Use the OptionsDialog/Advanced folder for registration of custom panels to
    Miscellaneous Panel. Register your implementation of AdvancedCategory there
    (*.instance file). Standard file systems sorting mechanism is used.

    Use the OptionsExport/<MyCategory> folder for registration of items for
    export/import of options. Registration in layers looks as follows

    Source files must be named after the public class they contain, appending the suffix
    .java, for example,


    The keyword void indicates that the main method does not return any value to the
    caller. If a Java program is to exit with an error code, it must call System.exit()
    explicitly.

    The method name "main" is not a keyword in the Java language. It is simply the
    name of the method the Java launcher calls to pass control to the program. Java
    classes that run in managed environments such as applets and Enterprise
    JavaBeans do not use or need a main() method. A Java program may contain
    multiple classes that have main methods, which means that the VM needs to be
    explicitly told which class to launch from.

    The main method must accept an array of String objects. By convention, it is
    referenced as args although any other legal identifier name can be used. Since Java
    5, the main method can also use variable arguments, in the form of public static
    void main(String... args), allowing the main method to be invoked with an arbitrary
    number of String arguments. The effect of this alternate declaration is semantically
    identical (the args parameter is still an array of String objects), but allows an
    alternative syntax for creating and passing the array.

    The Java launcher launches Java by loading a given class (specified on the
    command line or as an attribute in a JAR) and starting its public static void
    main(String[]) method. Stand-alone programs must declare this method explicitly.
    The String[] args parameter is an array of String objects containing any arguments
    passed to the class. The parameters to main are often passed by means of a
    command line.
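    A small sketch of these rules, assuming a hypothetical class named
    LauncherSketch: main is declared with variable arguments, the real work is
    delegated to a helper that returns a status, and the status reaches the
    operating system only through an explicit System.exit call.

```java
final class LauncherSketch {
    // Returns the exit status: 0 when at least one argument was passed, 1 otherwise.
    // Declared with varargs, so it accepts either a String[] or a list of strings.
    static int run(String... args) {
        if (args.length == 0) {
            System.err.println("usage: LauncherSketch <argument>...");
            return 1;
        }
        for (String arg : args) {
            System.out.println("argument: " + arg);
        }
        return 0;
    }

    // main returns void, so a non-zero status must go through System.exit.
    public static void main(String... args) {
        System.exit(run(args));
    }
}
```

    Calling run("a", "b") and run(new String[] {"a", "b"}) is equivalent, which
    is what "semantically identical" means for the two declarations of main.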

    Printing is part of a Java standard library: the System class defines a public static
    field called out. The out object is an instance of the PrintStream class and provides
    many methods for printing data to standard out, including println(String), which
    also appends a new line to the passed string.

    Java (High-level Language):

    A high-level programming language developed by Sun Microsystems. Java was
    originally called OAK, and was designed for handheld devices and set-top boxes.
    Oak was unsuccessful so in 1995 Sun changed the name to Java and modified the
    language to take advantage of the burgeoning World Wide Web.

    Java is an object-oriented language similar to C++, but simplified to eliminate
    language features that cause common programming errors. Java source code files
    (files with a .java extension) are compiled into a format called bytecode (files with
    a .class extension), which can then be executed by a Java interpreter. Compiled
    Java code can run on most computers because Java interpreters and runtime
    environments, known as Java Virtual Machines (VMs), exist for most operating
    systems, including UNIX, the Macintosh OS, and Windows. Bytecode can also be
    converted directly into machine language instructions by a just-in-time compiler
    (JIT).

    Java is a general purpose programming language with a number of features that
    make the language well suited for use on the World Wide Web. Small Java
    applications are called Java applets and can be downloaded from a Web server and
    run on your computer by a Java-compatible Web browser, such as Netscape
    Navigator or Microsoft Internet Explorer.

    Object-oriented software development matured significantly during the past
    several years. The convergence of object-oriented modeling techniques and
    notations, the development of object-oriented frameworks and design patterns, and
    the evolution of object-oriented programming languages have been essential in the
    progression of this technology.

    Object-Oriented Software Development using Java: Principles, Patterns, and
    Frameworks has a very applied focus that develops skills in designing
    software, particularly in writing well-designed, medium-sized object-oriented
    programs. It provides a broad and coherent coverage of object-oriented technology,
    including object-oriented modeling using the Unified Modeling Language (UML),
    object-oriented design using Design Patterns, and object-oriented programming
    using Java.

    NetBeans

    The NetBeans Platform is a reusable framework for simplifying the development
    of Java Swing desktop applications. The NetBeans IDE bundle for Java SE
    contains what is needed to start developing NetBeans plug-ins and NetBeans
    Platform based applications; no additional SDK is required.

    Applications can install modules dynamically. Any application can include the
    Update Center module to allow users of the application to download
    digitally-signed upgrades and new features directly into the running application.
    Reinstalling an upgrade or a new release does not force users to download the
    entire application again.


    The platform offers reusable services common to desktop applications, allowing
    developers to focus on the logic specific to their application. Among the features of
    the platform are:

    User interface management (e.g. menus and toolbars)

    User settings management

    Storage management (saving and loading any kind of data)

    Window management

    Wizard framework (supports step-by-step dialogs)

    NetBeans Visual Library

    Integrated Development Tools


    Wamp Server

    WAMPs are packages of independently-created programs installed on computers
    that use a Microsoft Windows operating system. WAMP is an acronym formed
    from the initials of the operating system Microsoft Windows and the principal
    components of the package: Apache, MySQL and one of P

    System Architecture

    Modules

    Loading web page training set.

    Textual and visual content feature extraction.

    Text and image classification.

    Fusing of detected results.

    Comparison of detected fusion results.

    Module Description

    Loading web page training set

    Loading the phishing web pages into the database.

    Loading the protected web pages into the database.

    Textual and visual content feature extraction

    Extraction of textual content of web page by using extraction algorithms.

    Extraction of visual content of web page by using extraction algorithms.


    The textual feature extraction is done by using


    The fusion algorithm is used for merging the textual and visual
    classification results.
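    As a rough illustration of what merging two classifier outputs can look
    like, the sketch below combines a text score and an image score with a
    weighted average and compares the result to a threshold. This is a
    deliberately simple stand-in, not the SVM-based fusion this report refers
    to, whose details are not given here. The class FusionSketch, the weight,
    and the threshold are all hypothetical.

```java
final class FusionSketch {
    // Weighted average of two classifier scores, each assumed to lie in [0, 1].
    // textWeight is the share given to the text classifier; the image classifier gets the rest.
    static double fuse(double textScore, double imageScore, double textWeight) {
        if (textWeight < 0.0 || textWeight > 1.0) {
            throw new IllegalArgumentException("textWeight must be in [0, 1]");
        }
        return textWeight * textScore + (1.0 - textWeight) * imageScore;
    }

    // Final decision: the page is labeled phishing when the fused score reaches the threshold.
    static boolean isPhishing(double fusedScore, double threshold) {
        return fusedScore >= threshold;
    }

    public static void main(String[] args) {
        double fused = fuse(0.9, 0.5, 0.5);
        System.out.println("fused score: " + fused);
        System.out.println("phishing: " + isPhishing(fused, 0.6));
    }
}
```

    In practice the weight and threshold would be estimated from the training
    set rather than fixed by hand.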

    Comparison of detected fusion results

    The detected fusion results will be compared with the original web page.

    The posterior probability will be found by using the similarity.

    Using this probability, the fusion results of false and true web pages will be
    compared.

    The false web page is compared with the true web page.

    The detected results will be shown to the user.

    System Requirements

    Software Requirement
    Operating System : Windows XP
    Language         : Core Java
    Version          : JDK 1.
    IDE              : NetBeans .2
    Database         : MySQL

    Hardware Requirements
    Processor        : Pentium IV
    Clock Speed      : 2.7 GHz
    RAM Capacity     : 1 GB

    Conclusion

    A new content-based anti-phishing system has been thoroughly developed. In this
    system, we presented a new framework to solve the anti-phishing problem. The
    new features of this framework can be represented by a text classifier, an image
    classifier, and a fusion algorithm. Based on the textual content, the text classifier is
    able to classify a given web page into corresponding categories as phishing or
    normal. This text classifier was modeled by the SVM rule. Based on the visual
    content, the image classifier, which relies on SVM, is able to calculate the visual
    similarity between the given web page and the protected web page efficiently. The
    matching threshold used in both the text classifier and the image classifier is
    effectively estimated by using a probabilistic model derived from the SVM theory.
    A novel data fusion model using the SVM theory was developed and the
    corresponding fusion algorithm presented. This data fusion framework enables us
    to directly incorporate the multiple results produced by different classifiers. This
    fusion method provides insights for other data fusion applications. More
    importantly, it is worth noting that our content-based model can be easily
    embedded into current industrial anti-phishing systems.


    Future Enhancement

    Our future work will include adding more features to the content
    representations in our current model.

    Investigating incremental learning models to solve the knowledge updating
    problem in the current probabilistic model.

    Adding more data sets with textual and visual content of web pages for both
    true and false web pages.
