Social Computing for Linguistics & Linguistics for Social Computing
Monojit Choudhury
Microsoft Research India
[email protected]

Linguistics for SoC: Language is shaped by social interaction patterns; emails, blogs, SMS, queries (CMCs)

SoC for Linguistics: SoC platforms for linguistic experiments; SoC models for linguistic theories

Web Search Queries as an Emerging Language

[Figure: Mean length of queries, by year (2000, 2006, 2010); vertical axis 0 to 5; labeled values 2.21 and 3.54]
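The statistic in the figure can be illustrated with a minimal sketch. It assumes that query length is counted in whitespace-separated words, which the slide does not state; the function name and the toy queries are hypothetical and are not data from the talk.

```python
def mean_query_length(queries):
    """Return the mean number of whitespace-separated words per query.

    Assumption: length is measured in words; blank queries are skipped.
    """
    lengths = [len(q.split()) for q in queries if q.strip()]
    if not lengths:
        raise ValueError("no non-empty queries")
    return sum(lengths) / len(lengths)


# Hypothetical toy log, for illustration only.
sample = ["bangla songs", "cheap flights delhi to mumbai", "nlp"]
print(mean_query_length(sample))
```

Applied to query logs from different years, a function like this would yield one mean per year, which is the kind of comparison the figure reports.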

Unsupervised segmentation

Distributional Patterns

Taxonomy of Function units

Word co-occurrence Network

Queries are difficult to interpret

User Behavior Studies

Basic units

Search expertise language acquisition

Web Search Queries are an evolving Protolanguage

Collaborators: Rishiraj Saha Roy & Niloy Ganguly (IIT Kharagpur), Srivatsan Laxman & Kalika Bali (MSR India)

Use of Indian languages on online social media

Code mixing

Transliteration

Spelling Change

Indian English

Collaborators: Kalika Bali & Nimmi Rangaswamy (MSR India)

Sociolinguistic Experiments: OSN Games with a Purpose

CodeGambler is an OSN Game to study the population-level emergence of categorization of a continuous space (colors) into discrete category terms (color names)

Collaborators: Animesh Mukherjee (IIT Kgp), Vittorio Loreto (University of Rome)

Sociolinguistic Experiments: Crowdsourcing for data

Flat segmentation (queries and sentences)

Nested segmentation
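The difference between the two annotation types can be sketched as a data-structure choice. This is an illustrative assumption, not the annotation format used in the crowdsourcing study: a flat segmentation is taken to be a list of word lists, a nested segmentation lets segments contain sub-segments, and the example query is invented.

```python
def flatten(segmentation):
    """Yield the words of a (possibly nested) segmentation, left to right."""
    for item in segmentation:
        if isinstance(item, str):
            yield item
        else:
            yield from flatten(item)


def depth(segmentation):
    """Nesting depth: 0 for a single word, 1 for a list of words, and so on."""
    if isinstance(segmentation, str):
        return 0
    return 1 + max((depth(item) for item in segmentation), default=0)


# Hypothetical query, segmented two ways.
flat = [["bollywood", "song", "lyrics"], ["free", "download"]]
nested = [["bollywood", ["song", "lyrics"]], ["free", "download"]]

# Both annotations cover the same word sequence; only the grouping differs.
print(list(flatten(flat)) == list(flatten(nested)))
print(depth(flat), depth(nested))
```

Under this representation a flat segmentation always has depth 2, while a nested one can go deeper wherever a segment has internal structure.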

SoC-inspired Linguistic Theory: Word Co-occurrence Network

Small world

[Figure: example word co-occurrence network with nodes labeled word, language, in, human, treat, as, is, can, evolving, neighboring, distinct, interacting, web, sentences, such, structure, a, complex, network]

Two-regime power law degree distribution

Kernel-Periphery structure

Low rank

Collaborators: Animesh, Niloy (IIT Kgp), Chris Biemann (University of Darmstadt), Ravi Kannan (MSR India)
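A word co-occurrence network of the kind named on this slide can be sketched in a few lines. The construction below is an assumption: words are nodes, and an undirected edge joins two words that appear next to each other in a sentence. The slide does not give the exact definition used in the work, and the function names and sentences are hypothetical.

```python
from collections import Counter, defaultdict


def build_wcn(sentences):
    """Build an undirected word co-occurrence network as adjacency sets.

    Assumption: an edge links words that are adjacent in a sentence.
    """
    adj = defaultdict(set)
    for sentence in sentences:
        words = sentence.lower().split()
        for w in words:
            adj[w]  # touching the key registers the word as a node
        for a, b in zip(words, words[1:]):
            if a != b:  # ignore self-loops from repeated words
                adj[a].add(b)
                adj[b].add(a)
    return adj


def degree_distribution(adj):
    """Map each degree k to the number of nodes that have degree k."""
    return Counter(len(neighbors) for neighbors in adj.values())


# Hypothetical toy corpus, for illustration only.
corpus = ["language is a complex network", "a network can model language"]
wcn = build_wcn(corpus)
print(sorted(wcn["network"]))
print(degree_distribution(wcn))
```

On a large corpus, plotting the degree distribution on log-log axes is one way to look for the two-regime power law mentioned above; the toy corpus here is far too small to show it.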

WCN of Bollywood Lyrics is a very small world with a tiny kernel.

Learn 1000 Hindi words and you will understand every Bollywood song!
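One simple way to probe a claim like this is to measure how much of a corpus the most frequent words account for. The sketch below uses plain frequency coverage, which is a stand-in and not the kernel analysis of the network described in the talk; the function name and the toy tokens are hypothetical.

```python
from collections import Counter


def top_k_coverage(tokens, k):
    """Fraction of all tokens accounted for by the k most frequent word types."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("empty token list")
    covered = sum(count for _, count in counts.most_common(k))
    return covered / total


# Hypothetical toy tokens, not lyrics data.
tokens = "a a a b b c".split()
print(top_k_coverage(tokens, 1))
print(top_k_coverage(tokens, 2))
```

With a real lyrics corpus, calling the function with k=1000 would give the share of running words covered by the 1000 most frequent types.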

WCN of Web search queries is not (yet) small world!

It has a tiny kernel but very large periphery

Thank You!
http://research.microsoft.com/people/monojitc/
