Automatic Identification of Age-Appropriate Ratings of Song Lyrics
1. The problem
Media age-appropriateness: the suitability of a song, book, film,
videogame, etc., for consumption by a child of a given age.
2. The corpus
3. The experiment
• Ordinal class labels: classification via regression (Frank
et al., 1998) using M5P classifier (Wang and Witten,
1997)
• SMOTE oversampling (Chawla et al., 2002)
• 4-fold cross validation
• Features:
• Vector space model (tf-idf weight)
• MRC Psycholinguistic Database (Coltheart 1981):
• Age of acquisition
• Familiarity
• Imageability
• Concreteness
• GloVe (Pennington et al. 2014): 50-dimensional pre-trained
vectors (6B tokens: Wikipedia 2014 + Gigaword 5)
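The SMOTE step listed above can be sketched as follows. This is a simplified, pure-Python illustration of the interpolation idea in Chawla et al. (2002), not the implementation used in the experiment: the function name `smote`, its parameters, and the random choice of base sample are assumptions made for this example, and the poster does not say which classes were oversampled or by how much.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority: List[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic samples for one minority class (hypothetical helper).

    Each synthetic sample lies on the line segment between a minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base sample within the same class
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: squared_distance(base, m))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class with three 2-dimensional samples
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_samples = smote(toy_minority, n_new=10, k=2)
```

Because every synthetic point is interpolated between two real samples of the same class, the oversampled class stays inside the region it already occupies in feature space.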
Example:
4. Results
Experiment 1: varying granularity of class labels & instances. VSM
features only:
Experiment 2: focus on per-album granularity. Vary feature
combinations:
[Figure: a 3-year-old and question marks beside three lyric excerpts]

Oh, I love trash!
Anything dirty or dingy or dusty
Anything ragged or rotten or rusty
Yes, I love trash

Do you want to build a snowman?
Come on, let’s go and play
I never see you anymore
Come out the door
It’s like you’ve gone away

Don't you ever say I just walked away
I will always want you
I can't live a lie, running for my life
I will always want you
Age   # Tracks          # Albums         Group               # Tracks          # Albums
2        696    5.7%      119    6.6%    Toddler                826    6.7%      142    7.9%
3        130    1.1%       23    1.3%
4        251    2.1%       46    2.6%    Pre-schooler           455    3.7%       77    4.3%
5        204    1.7%       31    1.7%
6        281    2.3%       41    2.3%    Middle childhood 1   1,293   10.6%      230   12.8%
7        358    2.9%       71    3.9%
8        654    5.3%      118    6.6%
9        237    1.9%       50    2.8%    Middle childhood 2   2,407   19.7%      408   22.7%
10     1,590   13.0%      253   14.1%
11       580    4.7%      105    5.8%
12     1,849   15.1%      253   14.1%    Young teen           5,069   41.4%      672   37.4%
13     1,767   14.4%      242   13.5%
14     1,453   11.9%      177    9.8%
15       653    5.3%      116    6.5%    Teenager             1,354   11.1%      196   10.9%
16       521    4.3%       64    3.6%
17       180    1.5%       16    0.9%
>17      838    6.8%       73    4.1%    Adult                  838    6.8%       73    4.1%
Total 12,242  100.0%    1,798  100.0%                        12,242  100.0%    1,798  100.0%
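The relation between the two label granularities (year and age group) can be made explicit with a small lookup. This is a sketch: the group boundaries and track counts are read off the corpus table, while the names `AGE_GROUPS`, `age_group`, and `TRACKS_PER_AGE` are illustrative, and the key 18 is an assumed stand-in for the ">17" row.

```python
from collections import defaultdict

# (lowest age, highest age, group name), as read from the corpus table
AGE_GROUPS = [
    (2, 3, "Toddler"),
    (4, 5, "Pre-schooler"),
    (6, 8, "Middle childhood 1"),
    (9, 11, "Middle childhood 2"),
    (12, 14, "Young teen"),
    (15, 17, "Teenager"),
]


def age_group(age: int) -> str:
    """Map a per-year rating to its coarser age group."""
    if age < 2:
        raise ValueError("the corpus has no ratings below age 2")
    for low, high, name in AGE_GROUPS:
        if low <= age <= high:
            return name
    return "Adult"  # everything above 17


# Track counts per year label; 18 stands in for the ">17" row (assumption)
TRACKS_PER_AGE = {
    2: 696, 3: 130, 4: 251, 5: 204, 6: 281, 7: 358, 8: 654, 9: 237,
    10: 1590, 11: 580, 12: 1849, 13: 1767, 14: 1453, 15: 653, 16: 521,
    17: 180, 18: 838,
}

# Aggregate the per-year counts into per-group counts
tracks_per_group = defaultdict(int)
for age, n_tracks in TRACKS_PER_AGE.items():
    tracks_per_group[age_group(age)] += n_tracks
```

Summing the per-year track counts this way reproduces the per-group counts in the table, which confirms that the two middle-childhood groups cover ages 6 to 8 and 9 to 11.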
Beyond mere censorship:
behavioral, sociological,
psychological, cultural norms.
Feature vector layout: w1 w2 … wn | AOA FAM IMG CNC | GloVe01 … GloVe50

Words    AOA   FAM   IMG   CNC    GloVe01     …   GloVe50
oh         -   233   563   274    -0.070292   …    0.71087
i          -   483   628   465     0.11891    …    0.92121
love     303   311   619   569    -0.13886    …    0.2898
trash      -   588   541   599    -0.64487    …   -1.0992
(mean)   303   404   588   477    -0.183778   …    0.20567
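The final row of MRC values in this example (303, 404, 588, 477) matches the column-wise mean of the four word rows when missing entries ("-") are skipped, and the final GloVe row matches the plain mean of the word vectors. A minimal sketch of that aggregation, with the helper name `mean_ignoring_missing` chosen for illustration and the column order taken as printed on the poster:

```python
from typing import List, Optional, Sequence


def mean_ignoring_missing(values: Sequence[Optional[float]]) -> Optional[float]:
    """Average the values that are present; None if every value is missing."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


# MRC rows for "oh", "i", "love", "trash" (columns: AOA, FAM, IMG, CNC);
# None marks a "-" entry in the example
mrc_rows = [
    (None, 233, 563, 274),
    (None, 483, 628, 465),
    (303, 311, 619, 569),
    (None, 588, 541, 599),
]

# One mean per column gives the song-level MRC features
song_mrc = [mean_ignoring_missing(column) for column in zip(*mrc_rows)]
song_mrc_rounded: List[Optional[int]] = [
    None if m is None else round(m) for m in song_mrc
]

# First GloVe dimension of the same four words
glove01 = [-0.070292, 0.11891, -0.13886, -0.64487]
song_glove01 = mean_ignoring_missing(glove01)
```

Skipping missing values rather than treating them as zero keeps a sparse column such as AOA from being dragged toward zero by words that have no rating.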
Features used         Target: Age Group    Target: Year
VSM                   70.60%               57.15%
VSM + MRC             71.02%               56.80%
VSM + GloVe           70.58%               57.68%
VSM + GloVe + MRC     70.47%               57.85%
Using human judgments, can we
train a classifier to identify
age-appropriate song lyrics?
Sample granularity    Target: Age Group    Target: Year
Per track             69.77%               58.58%
Per album             70.60%               57.15%
Anggi Maulidyani & Ruli Manurung
Faculty of Computer Science, Universitas Indonesia
AOA FAM IMG CNC
dog 169 610 598 636
sun 181 617 635 639
actuality 586 247 361 213
absolution 608 241 372 256
sex 450 512 617 584
Label for the "oh i love trash" example: 2
Feature vector layout: w1 w2 … wn | AOA FAM IMG CNC | GloVe01 … GloVe50

Words    AOA   FAM   IMG   CNC    GloVe01     …   GloVe50
do         -   276   609   247     0.29605    …    0.96954
you        -   370   632   400    -0.001091   …    1.1316
want       -   302   606   361     0.13627    …    0.51921
to         -   199   613   220     0.68047    …   -0.26044
build      -   402   554   399     1.2426     …   -0.19918
a          -   201   632   217     0.21705    …    0.1796
(mean)     -   292   608   307     0.42855    …    0.390055

Label: 5
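The w1 … wn entries of the feature vector are the tf-idf weights of the vector space model. The poster does not say which tf-idf variant or tokenization was used, so the sketch below assumes one common form (relative term frequency times the natural log of N / document frequency) and whitespace tokens; the function name `tf_idf` and the three-document toy corpus built from the sample lyrics are illustrative only.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document.

    Assumed weighting: (count / document length) * ln(N / document frequency).
    """
    n_docs = len(docs)
    # Number of documents in which each term occurs at least once
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus: one line from each sample lyric, lower-cased
toy_docs = [
    "oh i love trash".split(),
    "do you want to build a snowman".split(),
    "i will always want you".split(),
]
toy_weights = tf_idf(toy_docs)
```

In this toy corpus "oh" occurs in one document and "i" in two, so "oh" receives the larger weight within the first lyric; a term occurring in every document would receive weight zero under this variant.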
• Psycholinguistic features provide very slight accuracy
increase (not statistically significant).
• Novel task, still MUCH to be explored (readability metrics,
acoustic features?)
• What is human competence and agreement on this task?
(All works are copyrighted to their respective owners)