Natural Language Processing Lab, National Taiwan University
The Splog Detection Task and a Solution Based on Temporal and Link Properties
Yu-Ru Lin et al.
NEC America
TREC 2006 (Blog session)
Presenter: Chun-Yuan Teng
Splog characteristics
• Machine-generated content
• No value addition
– No unique information for their readers
• Hidden agenda, usually an economic goal
– Commercial intention
Uniqueness of splogs
• Dynamic content
– Unlike web spam, a splog generates fresh content to drive traffic
• Non-endorsement links
– A hyperlink is normally an endorsement of the pages it points to
– Spammers can create hyperlinks in normal blogs, so links in blogs are not necessarily endorsements
Features to detect splogs
• Traditional features
– Tokenized URL, blog and post titles, homepage content, and post content
• Temporal regularity
– Temporal content regularity / temporal structural regularity
• Link regularity
– Consistency in target websites
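
The "traditional features" above are bag-of-words style text features drawn from several fields of a blog. As a rough illustration only, the sketch below tokenizes each field and prefixes every token with its field name, so that "cheap" in the URL and "cheap" in a post count as different features. The tokenization rule (split on non-alphanumeric characters) and the names `tokenize`, `tokenize_url`, and `traditional_features` are hypothetical choices, not the authors' implementation.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens (hypothetical tokenization rule)."""
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def tokenize_url(url: str) -> list[str]:
    """Drop the URL scheme (e.g. 'http://') and tokenize the rest."""
    without_scheme = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://", "", url)
    return tokenize(without_scheme)


def traditional_features(url: str, blog_title: str, post_titles: list[str],
                         homepage: str, posts: list[str]) -> Counter:
    """Field-prefixed token counts for one blog (hypothetical feature layout)."""
    feats: Counter = Counter()
    feats.update("url:" + t for t in tokenize_url(url))
    feats.update("blogtitle:" + t for t in tokenize(blog_title))
    for title in post_titles:
        feats.update("posttitle:" + t for t in tokenize(title))
    feats.update("home:" + t for t in tokenize(homepage))
    for post in posts:
        feats.update("post:" + t for t in tokenize(post))
    return feats


if __name__ == "__main__":
    print(tokenize_url("http://cheap-loans-online.blogspot.com/2006/05/best-rates.html"))
```

The resulting sparse counts could then feed any standard text classifier; which classifier the paper used is not stated on these slides.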
Temporal Content Regularity
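
The slide gives only the heading. Since splogs are machine-generated, their posts tend to resemble one another over time. One hypothetical way to quantify that is a content autocorrelation: represent each post (in time order) as a bag of words, average the cosine similarity between posts that are `lag` positions apart, and average that over a few lags. The word-splitting rule, the cosine measure, and the `max_lag` parameter are all assumptions for illustration, not the paper's exact definition.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def content_autocorrelation(posts: list[str], lag: int) -> float:
    """Mean similarity between posts `lag` positions apart (posts in time order)."""
    vecs = [Counter(p.lower().split()) for p in posts]
    pairs = [(vecs[i], vecs[i + lag]) for i in range(len(vecs) - lag)]
    if not pairs:
        return 0.0
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


def temporal_content_regularity(posts: list[str], max_lag: int = 3) -> float:
    """Hypothetical TCR: content autocorrelation averaged over lags 1..max_lag."""
    lags = [k for k in range(1, max_lag + 1) if k < len(posts)]
    if not lags:
        return 0.0
    return sum(content_autocorrelation(posts, k) for k in lags) / len(lags)
```

Under this sketch a blog that reposts near-identical text scores close to 1, while a blog whose posts share no vocabulary scores 0.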
Temporal Structural Regularity
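
The conclusion lists "content and post time" as the regularities used, so structural regularity plausibly concerns when posts are published. A minimal sketch, assuming post times in seconds: bin the gaps between consecutive posts and measure how concentrated they are with a normalized entropy. Scheduled, machine-driven posting puts most gaps in one bin (score near 1); irregular human posting spreads them out (score near 0). The hourly `bin_seconds` default and the normalization by `log(n)` (the most distinct bins `n` gaps can fill) are hypothetical choices.

```python
import math
from collections import Counter


def temporal_structural_regularity(post_times: list[float],
                                   bin_seconds: float = 3600) -> float:
    """Hypothetical TSR: 1 minus the normalized entropy of binned inter-post gaps."""
    times = sorted(post_times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    if len(gaps) < 2:
        return 0.0  # too few gaps to judge regularity
    bins = Counter(int(g // bin_seconds) for g in gaps)
    n = len(gaps)
    entropy = -sum((c / n) * math.log(c / n) for c in bins.values())
    max_entropy = math.log(n)  # reached when every gap lands in its own bin
    return 1.0 - entropy / max_entropy
```

For example, posts exactly one hour apart give a score of 1.0 under this definition.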
Link Regularity Estimation
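
Link regularity was described earlier as "consistency in target websites": a splog promoting one business keeps linking to the same site. A simple stand-in for that idea, assumed here rather than taken from the paper, is the fraction of all outgoing links whose host is the single most frequent target host. The input layout (one list of URLs per post) and the `www.` stripping are hypothetical; requires Python 3.9+ for `str.removeprefix`.

```python
from collections import Counter
from urllib.parse import urlparse


def link_regularity(post_links: list[list[str]]) -> float:
    """Hypothetical LR: share of outgoing links that go to the most common host."""
    hosts = [urlparse(u).netloc.lower().removeprefix("www.")
             for links in post_links for u in links]
    hosts = [h for h in hosts if h]  # ignore relative or malformed links
    if not hosts:
        return 0.0
    _, top_count = Counter(hosts).most_common(1)[0]
    return top_count / len(hosts)


if __name__ == "__main__":
    links = [["http://shop.example.com/a", "http://shop.example.com/b"],
             ["http://shop.example.com/c", "http://news.example.org/x"]]
    print(link_regularity(links))
```

Here three of four links share a host, so the score is 0.75; a normal blog linking to many unrelated sites would score lower.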
Two kinds of spam detection
• Offline detection
– Traditional measurement
• Online detection
– Detect splogs online, as new posts are indexed
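
The difference between the two settings can be sketched as follows, under the assumption that offline detection scores a blog's full post history while online detection must decide from the posts seen so far and ideally flag a splog early. `score_fn` stands for any blog-level score (such as the regularity measures above); the toy `duplicate_ratio` score, the `threshold`, and `min_posts` are hypothetical.

```python
from typing import Callable, Optional, Sequence

ScoreFn = Callable[[Sequence[str]], float]


def detect_offline(posts: Sequence[str], score_fn: ScoreFn, threshold: float) -> bool:
    """Decide once, using the blog's entire post history."""
    return score_fn(posts) >= threshold


def detect_online(posts: Sequence[str], score_fn: ScoreFn, threshold: float,
                  min_posts: int = 3) -> Optional[int]:
    """Re-score as each post arrives; return how many posts were needed to flag the blog."""
    for n in range(min_posts, len(posts) + 1):
        if score_fn(posts[:n]) >= threshold:
            return n
    return None  # never flagged


def duplicate_ratio(posts: Sequence[str]) -> float:
    """Toy score: fraction of posts that repeat an earlier post verbatim."""
    seen: set[str] = set()
    dups = 0
    for p in posts:
        if p in seen:
            dups += 1
        seen.add(p)
    return dups / len(posts) if posts else 0.0


if __name__ == "__main__":
    stream = ["hello world", "buy pills", "buy pills", "buy pills", "buy pills"]
    print(detect_offline(stream, duplicate_ratio, 0.5))
    print(detect_online(stream, duplicate_ratio, 0.5))
```

With this stream, the online detector flags the blog after its fourth post, before the full history is available.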
Experimental results (Offline)
Experimental results (Offline)
Online indexing in a blog search engine
Online test
Online test in this paper
Experimental results
Conclusion and contributions
• Modeling the splog problem
– The uniqueness of splogs
• Regularity-based detection
– Content and post time
• Evaluation
– Online evaluation