The FIRST Consortium Scientific and technical achievements FIRST Y3 Review Meeting

Page 1: Scientific and technical  achievements

The FIRST Consortium

Scientific and technical achievements

FIRST Y3 Review Meeting

Page 2: Scientific and technical  achievements

Data acquisition pipeline (DacqPipe)

  • Resembles big data streaming architectures such as Twitter Storm
  • Running continuously since April 2011
  • Several scientific contributions
      • Boilerplate remover & gold standard dataset
      • Ontology & ontology-based information extractor
  • Executable available at http://first.ijs.si/software/DacqPipeJun2013.zip
  • Source code: https://github.com/project-first/dacqpipe

FIRST Y3 Review Meeting, Luxembourg, Nov 2013

[Figure: DacqPipe architecture. Components shown: RSS reader, HTML tokenizer, boilerplate remover & duplicate detector, language detector, filter, NLP pipe, OBIE, DB writer (several components appear more than once), 0MQ channel (Emit), DB. Stage labels: Read & parse, Clean, Syntactic analysis, Semantic preprocessing, Store.]
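The stage chain on this slide (read and parse, clean, filter, store) can be sketched as a chain of Python generators. This is only an illustration under stated assumptions: the document record layout, the five-word boilerplate heuristic, the SHA-1 duplicate check, and all function names are hypothetical and are not taken from DacqPipe, which the slide shows connecting its components through 0MQ channels.

```python
import hashlib
from typing import Callable, Iterable, Iterator

# Hypothetical document record: {"url": ..., "lang": ..., "text": ...}
Doc = dict
Stage = Callable[[Iterable[Doc]], Iterator[Doc]]


def remove_boilerplate(docs: Iterable[Doc]) -> Iterator[Doc]:
    """Toy stand-in for a boilerplate remover: keep lines with >= 5 words."""
    for doc in docs:
        lines = [ln for ln in doc["text"].splitlines() if len(ln.split()) >= 5]
        yield {**doc, "text": "\n".join(lines)}


def drop_duplicates(docs: Iterable[Doc]) -> Iterator[Doc]:
    """Exact-duplicate detection by hashing the cleaned text (assumption)."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha1(doc["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def keep_language(lang: str) -> Stage:
    """Filter stage: pass only documents already labelled with `lang`."""
    def stage(docs: Iterable[Doc]) -> Iterator[Doc]:
        return (d for d in docs if d.get("lang") == lang)
    return stage


def run_pipeline(source: Iterable[Doc], stages: list) -> Iterator[Doc]:
    """Chain the stages lazily, so documents stream through one at a time."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    return stream


def demo() -> list:
    feed = [
        {"url": "a", "lang": "en",
         "text": "Menu\nShares of the company rose sharply on Monday."},
        {"url": "b", "lang": "en",
         "text": "Home\nShares of the company rose sharply on Monday."},
        {"url": "c", "lang": "de",
         "text": "Die Aktien des Unternehmens stiegen am Montag stark."},
    ]
    stages = [remove_boilerplate, drop_duplicates, keep_language("en")]
    return [d["url"] for d in run_pipeline(feed, stages)]
```

In `demo()`, documents "a" and "b" become identical once the short navigation line is stripped, so the second is dropped as a duplicate, and "c" is removed by the language filter. Because every stage is a generator, adding or reordering stages does not change the other stages.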

Page 3: Scientific and technical  achievements

Dataset of news & blogs

  • Since April 2011
  • Data from 219 Web sites; 3,159 RSS feeds
  • Roughly 15 million unique documents collected
  • Actively used by many non-FIRST people, basis for new projects
  • Available at http://first.ijs.si/FIRSTDataset

[Figure: Avg docs per domain daily (top 50). Vertical axis: 0 to 1800. Series: Avg docs (accepted, rev = 1); Avg docs (accepted, rev > 1). Domains: yahoo.com, wsj.com, indiatimes.com, cbsnews.com, nytimes.com, bbc.co.uk, dailymail.co.uk, reuters.com, go.com, seekingalpha.com, telegraph.co.uk, theglobeandmail.com, independent.co.uk, nypost.com, chinadaily.com.cn, cbc.ca, ibtimes.com, businessinsider.com, foxnews.com, boston.com, theaustralian.com.au, kyodonews.jp, marketwatch.com, indystar.com, fool.com, mirror.co.uk, straitstimes.com, cnn.com, ft.com, nationalpost.com, usatoday.com, hollywoodreporter.com, prime-tass.com, washingtonpost.com, latimes.com, ibtimes.co.uk, pr-inside.com, zacks.com, guardian.co.uk, ap.org, hurriyetdailynews.com, google.com, bloomberg.com, trust.org, channelnewsasia.com, businessweek.com, itar-tass.com, wallstcheatsheet.com, allafrica.com, smh.com.au.]

Page 4: Scientific and technical  achievements

Knowledge-based sentiment analysis

  • Sentence-level knowledge-based approach
  • Glass box: detailed drill-down capability
  • Best paper award at CEC 2011
  • Gold standard sentiment corpus (evaluation, hybrid model)
  • First such attempt in the financial domain and on this scale (connected to DacqPipe)
  • Source code & rules: https://github.com/project-first/semanticinformationextraction

Page 5: Scientific and technical  achievements

Quantitative/qualitative models

Quantitative models for document categorization, pump-and-dump detection, and Twitter sentiment classification

  • More or less black boxes
  • Source code:
      • https://github.com/project-first/documentcategorizerdemo
      • https://github.com/project-first/pumpanddumpclassifierdemo

[Figure: Quantitative models (P&D model)]

Page 6: Scientific and technical  achievements

Quantitative/qualitative models

[Figure: Streams, quantitative models (P&D model), and qualitative models. Node labels in the qualitative pump-and-dump model: Country Black List, Industry Black List, Company Black List, Age, Bankrupt, Trading Volume, Number of Trades, Market Capitalization, Market Segment, Sentiment, Content; Black List, History, Market, Trading, News; Company, Financial Instrument; Comp_FinInst; Pump & Dump.]

Qualitative multi-attribute models for reputational risk assessment and pump-and-dump detection

  • Rule-based, developed by domain experts
  • Glass box: drill-down, what-if analysis…
  • Integrated into use-case prototypes
  • Best paper award at Bled eConference 2013
  • Source code: https://github.com/project-first/rimmodel
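To make the "rule-based, glass box" idea concrete, here is a minimal sketch of a qualitative multi-attribute model: leaf attributes take symbolic values, each aggregate node looks up its value in a rule table, and the evaluation trace gives the drill-down. The node names are borrowed from the figure labels, but the tree shape, the value scales, and every rule below are invented for illustration and are not the domain experts' rules from the FIRST models.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    # Rule table: tuple of child values -> value of this node.
    rules: dict = field(default_factory=dict)

    def evaluate(self, inputs: dict, trace: list) -> str:
        """Evaluate bottom-up and record (node, value) pairs for drill-down."""
        if self.children:
            key = tuple(child.evaluate(inputs, trace) for child in self.children)
            value = self.rules[key]
        else:
            value = inputs[self.name]
        trace.append((self.name, value))
        return value


def build_model() -> Node:
    # All rules are hypothetical examples.
    news = Node("News", [Node("Sentiment"), Node("Content")], {
        ("hype", "promotional"): "suspicious",
        ("hype", "neutral"): "suspicious",
        ("normal", "promotional"): "suspicious",
        ("normal", "neutral"): "ok",
    })
    trading = Node("Trading", [Node("Trading Volume"), Node("Number of Trades")], {
        ("spike", "spike"): "unusual",
        ("spike", "normal"): "unusual",
        ("normal", "spike"): "ok",
        ("normal", "normal"): "ok",
    })
    return Node("Pump & Dump", [news, trading], {
        ("suspicious", "unusual"): "alert",
        ("suspicious", "ok"): "watch",
        ("ok", "unusual"): "watch",
        ("ok", "ok"): "clear",
    })


def assess(inputs: dict):
    """Return the root verdict and the full evaluation trace."""
    trace = []
    verdict = build_model().evaluate(inputs, trace)
    return verdict, trace
```

A what-if analysis is then just a second call to `assess` with one input changed, for example setting "Trading Volume" from "spike" to "normal" and comparing the two traces.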

Page 7: Scientific and technical  achievements

Visualization API

  • Comprehensive technical documentation with examples
  • Both data sources
      • News & blogs
      • Twitter
  • Easy to use, used in use-case prototypes
  • Available at http://first.ijs.si/VisApi/indexVis.html

Page 8: Scientific and technical  achievements

b-next: Market manipulation prototype

  • Java-based software prototype for capital market surveillance
  • Two new & unique market abuse scenarios based on unstructured information
  • Alerts based on individual threshold configurations
  • Exploration of suspicious market constellations based on alerting and visualisation components
  • Positive end-user feedback
      • Customers think that these scenarios are real problems and need to be addressed
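As a rough illustration of "alerts based on individual threshold configurations", the sketch below checks one observation against a per-user list of thresholds. It is written in Python for brevity (the prototype itself is Java-based), and the indicator names, limits, and class layout are all hypothetical rather than taken from the b-next prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    indicator: str             # hypothetical indicator name
    limit: float
    direction: str = "above"   # "above" or "below"

    def breached(self, value: float) -> bool:
        if self.direction == "above":
            return value > self.limit
        return value < self.limit


def raise_alerts(observation: dict, thresholds: list) -> list:
    """Return the indicators whose configured threshold is breached."""
    return [
        t.indicator
        for t in thresholds
        if t.indicator in observation and t.breached(observation[t.indicator])
    ]


# One analyst's configuration (illustrative values only).
ANALYST_CONFIG = [
    Threshold("news_volume", 30),
    Threshold("sentiment", -0.5, "below"),
    Threshold("price_change", 0.05),
]
```

Each user would hold a separate configuration list, so the same observation can raise different alerts for different users.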

Page 9: Scientific and technical  achievements

MPS: Reputational risk prototype

  • Sentiment analysis included in reputational risk module on financial counterparts
  • Data sources: a mix of structured (Basel II, Pillar 3) and unstructured information (Web sources)
  • RIM model is fully scalable (by counterpart and by financial product)
  • Visualisation tools to support decisions
  • …fills in a methodological gap in quantitative reputational risk assessment for financial institutions
  • …can also fulfill the needs of non-financial organisations
  • Available at http://first-vm1.ijs.si/mps

Page 10: Scientific and technical  achievements

IDMS: Retail brokerage prototypes

  • Additional indicators based on sentiment and tweet volume
  • Content exploration and drill-down
  • Exploring lagged correlations
      • Trading volume : tweeting volume
      • Price : sentiment polarity
  • Positive feedback from potential customers
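A lagged correlation such as "trading volume : tweeting volume" can be sketched as a Pearson correlation between one series and a time-shifted copy of the other. The code below is a minimal pure-Python version; the function names and the toy series are assumptions for illustration, and the slides do not say which correlation measure the prototypes use.

```python
from math import sqrt
from statistics import mean


def pearson(x: list, y: list) -> float:
    """Pearson correlation of two equal-length series."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / (sx * sy)


def lagged_correlation(leader: list, follower: list, lag: int) -> float:
    """Correlate leader[t] with follower[t + lag], for lag >= 0."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    n = min(len(leader), len(follower) - lag)
    if n < 2:
        raise ValueError("not enough overlapping points for this lag")
    return pearson(leader[:n], follower[lag:lag + n])


def best_lag(leader: list, follower: list, max_lag: int) -> int:
    """Lag in 0..max_lag with the highest correlation."""
    return max(range(max_lag + 1),
               key=lambda k: lagged_correlation(leader, follower, k))


# Toy data: trading volume repeats the tweet volume two steps later.
TWEETS = [1, 3, 2, 5, 4, 6, 8, 7]
TRADES = [0, 0, 1, 3, 2, 5, 4, 6]
```

With the toy series, `best_lag(TWEETS, TRADES, 3)` picks the two-step lag, because the trading series was constructed as a shifted copy of the tweet series. Real series would need detrending and a significance check before reading anything into the peak.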

Page 11: Scientific and technical  achievements

Sovereign debt prototype

  • New§tream: Web-based visual interface for exploratory news analysis
  • When, how much, with which sentiment?
  • Volume and sentiment charts, canyon flows, tag clouds, drill-down
  • Use cases
      • Globalization of local news
      • Effects of news on CDS
  • Available at http://first.ijs.si/Occurrences (http://newstream.ijs.si)

Page 12: Scientific and technical  achievements

Sentify portal

  • End-user oriented GUI as an entry point to showcase the FIRST results
  • Document navigation and sentiment drill-down
  • Exploration of aggregated sentiment data
  • Comparative analysis between fuzzy and crisp sentiments
  • Reputation topics analysis
  • Available at http://sentify.project-first.eu
  • Source code: http://github.com/project-first/sentify-portal

Page 13: Scientific and technical  achievements

Political sentiment on Twitter

  • Slovene presidential elections, November 2012
  • Live sentiment stream shown on POP TV
  • Political leaning based on sentiment well correlated with the election results
  • Polling agencies and newspapers failed to predict the victor

Page 14: Scientific and technical  achievements

Political sentiment on Twitter

  • Bulgarian parliamentary elections, May 2013
  • Big scandal on the day before the elections (illegal ballots)
  • Prevailing negative sentiment
  • Nearly perfect match between Twitter volume and election results