Conversation Level Constraints on Pedophile Detection in Chat Rooms

Conversation Level Constraints on Pedophile Detection in Chat RoomsPAN 2012 — Sexual Predator Identification

Claudia Peersman, Frederik Vaassen, Vincent Van Asch and Walter Daelemans

Overview• Task 1: Sexual Predator Identification• Preprocessing• Experimental Setup and Results• Test Run Results

• Task 2: Identifying Grooming Posts• Grooming Dictionary• Test Run Results

• Discussion

Task 1: Preprocessing of the Data• Data: PAN 2012 competition training set• predator vs. non-predator• info on the conversation, user and post level

• Two splits: training and validation set

• No user was present in more than one cluster

prevent overfitting of user-specific features

Experimental Setup

• Features: token unigrams

• LiBSVM

• Probability output

• Parameter optimization

• Experiments on 3 levels

• data resampling

Level 1: the Post Classifier

• Resample the number of posts

Equal distribution of posts per class

• About 40,000 posts per class in training

• No resampling in the validation sets

Level 1: the Post Classifier (2)• Only output on the post level

• Aggregate the post level predictions to the user level:• LiBSVM’s probability outputs• Predators = average of the 10 highest

predator class probabilities ≥ 0.85

Results for the Predator Class

Scores Post ClassifierRecall 0.93Precision 0.36F-score 0.52

Level 2: the User Classifier• Resampling on the user level

exclude users with no suspicious posts

• Filter: dictionary of grooming vocabulary see Task 2

• Why? • reduce the amount of data • “hard” classification higher precision?

Update Results (1)

Scores Post Classifier User ClassifierRecall 0.93 0.82Precision 0.36 0.88F-score 0.52 0.84

Combine systems?

Data reduction: up to 48.4%

Combining the systems• Weighted voting using LiBSVM’s

probability outputs

• 70% of the weight on the high precision User Classifier

Update Results (2)

Scores Post Classifier

User Classifier

Combined Results

Recall 0.93 0.82 0.85Precision 0.36 0.88 0.84F-score 0.52 0.84 0.84

Level 3: Conversation Level Constraints

• Both users in a conversation labeled as predators

• Our approach: • go back to predator probability output • use the high precision user classifier• Predator probability ≥ 0.75

System Overview

Combined Prediction

Apply Conversatio

n Level Constraints

Final Predator

ID List

Post Classifier

User Classifier

Update Results (3)

Scores Post Classifier

User Classifier

Combined Results

Combined +

ConstraintsRecall 0.93 0.82 0.85 0.85Precision 0.36 0.88 0.84 0.94F-score 0.52 0.84 0.84 0.89

Results on the PAN 2012 Test Set

Scores Combined + Constraints PAN Test Set

Recall 0.85 0.60Precision 0.94 0.89F-score (β = 1) 0.89 0.72

• Future research: • more splits• investigate ensembles

Task 2: Identifying Grooming Posts • From the final predator ID list detect posts

expressing typical grooming behavior

• No gold standard labels What is grooming?

• Predator conversations have predictable stages (e.g. Lanning, 2010; McGhee et al., 2011)

Task 2: Identifying Grooming Posts (2)

• Dictionary containing references to 6 stages:• sexual topic• reframing• approach • data requests• isolation from adult supervision• age (difference)

Task 2: Identifying Grooming Posts (3)

• Resources:• McGhee et al. (2011)• English Urban Dictionary website

http://www.urbandictionary.com/• English Synonyms

http://www.synonym.net/

• cf. user classifier filter

Results on the PAN 2012 Test Set

• Precision = 0.36

• Recall = 0.26

• F-score (β = 1) = 0.30

Discussion

• Use of β-factors to calculate the F-score:• Task 1: focus on precision (β = 0.5)• Task 2: focus on recall (β = 3.0)

• However, in practice:• find all predators (recall in Task 1)• find the most striking posts (precision in

Task 2)

Questions?

Contact: claudia.peersman@ua.ac.be

Conversation Level Constraints on Pedophile Detection in Chat Rooms

Documents

7FAILS REASONS LIVE CHAT - Automotive News Reasons-ebook-final10.pdf · 7 REASONS LIVE CHAT FAILS 1 Table of Contents Why Primitive Chat Failed 2 1. Low Quality Conversation 3 2

· Web view#mHealth Tweet Chat #2: May 9, 2012. To read the chat, scroll to the bottom first. The conversation moves up. dlschermd Thank YOU! #mhealth -9:00 PM May 9th, 2012

Reve Chat - Web Chat Software

AUTISM SPECTRUM DISORDERS - lumsa.it l2 chat m-chat q-chat pddst-ii esat itc fyi cesdd sacs m-chat q-chat fyi chat-23 biscuit stat

Nelson Mandela University · Web viewHow to Post in Chat Channels You can type directly in the conversation Channel (Posts tab), press Enter or Send

LEO LANDRY Pedophile Priest

How to deal with Enemies of the State. “[Your God] is a demon-possessed pedophile.”

Welcome to the December TLC Chat! A Conversation on Qualitative Research Your Hosts: Kelly and Roberta A Conversation on Qualitative Research Your Hosts:

A Conversation with the Executive Director 2020 Conference ... · A Conversation with the Executive Director 2020 Conference Update Friday, June 12th, 2020 Please use the CHAT feature

This Twitter chat is a follow on conversation from the ......This Twitter chat is a follow on conversation from the STEMTLnet and STEM for All Multiplex Theme of the Month Expert Panel,

Introduction to Skype · chat. Tells you about the folks you’re speaking to. H. Conversation window –Displayed when you select a contact or set up a group chat. Shows the instant

ignouindia.inignouindia.in/.../03/bcsl-063-Solved-Lab-Manual.docx · Web viewBy default, the pane that displays your chat partner's conversation uses the background color and font

The Teaching Exchange Pecha Kucha Presentations. What is Pecha Kucha ???? Pecha Kucha is the Japanese term for the sound of conversation (“chit chat”

mppenglishblog.files.wordpress.com · chat-up line (n): conversation meant to impress someone fact-tank (n): a research organisation that surveys people for information wary (adj):

555 chat chat chat

sb64bce9ae655ac16.jimcontent.com€¦ · "Chat" Chat 1 Chatl Chat2Dÿ6V) tŽkb0 Chat 2 ZDäË Chat 1 12 Chat 2 'Chatl

Teen DV Month Twitter Chats Adolescent Dating Abuse ... · Twitter chat—which welcomes advocates, educators, parents and concerned citizens— join our conversation on the connection

Amnesty International Director Colm O'Gorman: Catholic Church Protects Pedophile Priests

CP Link between CP and Mad Mc Cann and Pedophile Networks ?

Paul J Bohrer Asst Fire Chief Troy NY, Arsonist,Pedophile,Boy Scout Leader