The FIRST Consortium
Scientific and technical achievements
FIRST Y3 Review Meeting
Data acquisition pipeline (DacqPipe)
Resembles big data streaming architectures such as Twitter Storm
Running continuously since April 2011
Several scientific contributions
  Boilerplate remover & gold standard dataset
  Ontology & ontology-based information extractor
Executable available at http://first.ijs.si/software/DacqPipeJun2013.zip
Source code: https://github.com/project-first/dacqpipe
Luxembourg, Nov 2013, FIRST Y3 Review Meeting
[Figure: DacqPipe architecture. Components, shown as two parallel instances: RSS reader, HTML tokenizer, boilerplate remover & duplicate detector, language detector, filter, NLP pipe, OBIE, DB writer. Stage labels: Read & parse, Clean, Syntactic analysis, Semantic preprocessing, Store. Also shown: 0MQ channel, Emit, DB.]
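The stage chain in the figure (read & parse, clean, store) can be illustrated with a small generator pipeline. This is only a sketch under stated assumptions, not DacqPipe's code: the function names, the SHA-1 exact-duplicate check, and the in-memory list standing in for the DB are all hypothetical, and boilerplate removal, language detection, the NLP pipe, and OBIE are omitted.

```python
import hashlib
from html.parser import HTMLParser
from typing import Dict, Iterable, Iterator, List


class _TextExtractor(HTMLParser):
    """Collects text nodes, skipping <script> and <style> content."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def read_and_parse(raw_pages: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Read & parse stage: turn raw HTML into plain-text documents."""
    for html in raw_pages:
        parser = _TextExtractor()
        parser.feed(html)
        parser.close()
        yield {"text": " ".join(parser.parts)}


def drop_duplicates(docs: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Clean stage (assumed): drop exact duplicates by content hash."""
    seen = set()
    for doc in docs:
        digest = hashlib.sha1(doc["text"].encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield doc


def keep_nonempty(docs: Iterable[Dict[str, str]]) -> Iterator[Dict[str, str]]:
    """Filter stage (assumed): discard documents with no text."""
    for doc in docs:
        if doc["text"]:
            yield doc


def store(docs: Iterable[Dict[str, str]], db: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Store stage: a list stands in for the database."""
    for doc in docs:
        db.append(doc)
    return db


def run_pipeline(raw_pages: Iterable[str]) -> List[Dict[str, str]]:
    # Each stage pulls documents lazily from the previous one.
    return store(keep_nonempty(drop_duplicates(read_and_parse(raw_pages))), [])
```

Because every stage is a generator, documents flow through one at a time, which loosely mirrors the streaming design the slide describes.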
Dataset of news & blogs
Since April 2011
Data from 219 Web sites; 3,159 RSS feeds
Roughly 15 million unique documents collected
Actively used by many non-FIRST people, basis for new projects
Available at http://first.ijs.si/FIRSTDataset
[Figure: Avg docs per domain daily (top 50). Bar chart over the 50 most productive domains, vertical axis from 0 to 1800. Series: Avg docs (accepted, rev = 1); Avg docs (accepted, rev > 1).]
Knowledge-based sentiment analysis
Sentence-level knowledge-based approach
Glass box: detailed drill-down capability
Best paper award at CEC 2011
Gold standard sentiment corpus (evaluation, hybrid model)
First such attempt in the financial domain and on this scale (connected to DacqPipe)
Source code & rules: https://github.com/project-first/semanticinformationextraction
Quantitative/qualitative models
Quantitative models
Quantitative models for document categorization, pump-and-dump detection, and Twitter sentiment classification
More or less black boxes
Source code
  https://github.com/project-first/documentcategorizerdemo
  https://github.com/project-first/pumpanddumpclassifierdemo
[Figure: diagram fragment labelled "P&D model".]
Quantitative/qualitative models
[Figure: attribute tree of the qualitative pump-and-dump model. Attribute labels: Country Black List, Industry Black List, Company Black List, Age, Bankrupt, Trading Volume, Number of Trades, Market Capitalization, Market Segment, Sentiment, Content. Aggregate labels: Black List, History, Market, Trading, News, Company, Financial Instrument, Comp_FinInst, Pump & Dump. Also shown: Streams, Quantitative models, P&D model.]
Qualitative models
Qualitative multi-attribute models for reputational risk assessment and pump-and-dump detection
Rule-based, developed by domain experts
Glass box: drill-down, what-if analysis…
Integrated into use-case prototypes
Best paper award at Bled eConference 2013
Source code: https://github.com/project-first/rimmodel
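To illustrate what a rule-based, glass-box multi-attribute model looks like in practice, here is a minimal sketch. The attribute names are borrowed from the slide's pump-and-dump tree, but the value scales and every rule in the tables below are invented for illustration; they are not the experts' rules from the FIRST models.

```python
from typing import Dict

# Hypothetical rule tables. Each aggregate attribute is a lookup over the
# qualitative values of its children; none of these rules come from FIRST.
BLACK_LIST_RULES = {
    ("no", "no"): "clear",
    ("no", "yes"): "listed",
    ("yes", "no"): "listed",
    ("yes", "yes"): "listed",
}
TRADING_RULES = {
    ("normal", "normal"): "normal",
    ("normal", "unusual"): "suspicious",
    ("unusual", "normal"): "suspicious",
    ("unusual", "unusual"): "abnormal",
}
ROOT_RULES = {
    ("clear", "normal"): "low",
    ("clear", "suspicious"): "low",
    ("clear", "abnormal"): "medium",
    ("listed", "normal"): "medium",
    ("listed", "suspicious"): "high",
    ("listed", "abnormal"): "high",
}


def evaluate(inputs: Dict[str, str]) -> Dict[str, str]:
    """Evaluate the tree bottom-up and return every node's value.

    Returning the intermediate values is what enables drill-down: a user
    can see which aggregate attribute drove the final assessment.
    """
    result = dict(inputs)
    result["Black List"] = BLACK_LIST_RULES[
        (inputs["Country Black List"], inputs["Company Black List"])
    ]
    result["Trading"] = TRADING_RULES[
        (inputs["Trading Volume"], inputs["Number of Trades"])
    ]
    result["Pump & Dump"] = ROOT_RULES[(result["Black List"], result["Trading"])]
    return result


def what_if(inputs: Dict[str, str], attribute: str, value: str) -> Dict[str, str]:
    """Re-evaluate the model with one basic attribute changed."""
    changed = dict(inputs)
    changed[attribute] = value
    return evaluate(changed)
```

A what-if query is then a single call, for example `what_if(case, "Country Black List", "no")`, which shows how the assessment would change if the country were not blacklisted.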
Visualization API
Comprehensive technical documentation with examples
Both data sources
  News & blogs
  Twitter
Easy to use, used in use-case prototypes
Available at http://first.ijs.si/VisApi/indexVis.html
b-next: Market manipulation prototype
Java-based software prototype for capital market surveillance
Two new & unique market abuse scenarios based on unstructured information
Alerts based on individual threshold configurations
Exploration of suspicious market constellations based on alerting and visualisation components
Positive end-user feedback
  Customers think that these scenarios are real problems and need to be addressed
MPS: Reputational risk prototype
Sentiment analysis included in reputational risk module on financial counterparts
Data sources: a mix of structured (Basel II, Pillar 3) and unstructured information (Web sources)
RIM model is fully scalable (by counterpart and by financial product)
Visualisation tools to support decisions
…fills in a methodological gap in quantitative reputational risk assessment for financial institutions
…can also fulfill the needs of non-financial organisations
Available at http://first-vm1.ijs.si/mps
IDMS: Retail brokerage prototypes
Additional indicators based on sentiment and tweet volume
Content exploration and drill-down
Exploring lagged correlations
  Trading volume : tweeting volume
  Price : sentiment polarity
Positive feedback from potential customers
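A lagged correlation between two daily series, such as tweeting volume and trading volume, can be sketched as below. This is an illustrative assumption about how such an exploration might be computed (plain Pearson correlation on shifted series); the slides do not say which measure the prototype uses, and the function names are hypothetical.

```python
from math import sqrt
from typing import Dict, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def lagged_correlation(leader: Sequence[float], follower: Sequence[float], lag: int) -> float:
    """Correlate leader[t] with follower[t + lag] for a non-negative lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(leader, follower)
    return pearson(leader[:-lag], follower[lag:])


def best_lag(
    leader: Sequence[float], follower: Sequence[float], max_lag: int
) -> Tuple[int, Dict[int, float]]:
    """Return the lag with the largest absolute correlation, plus all scores."""
    scores = {
        lag: lagged_correlation(leader, follower, lag) for lag in range(max_lag + 1)
    }
    return max(scores, key=lambda lag: abs(scores[lag])), scores
```

Calling `best_lag(tweet_volume, trading_volume, max_lag=5)` would indicate by how many days, if any, one series tends to lead the other; a strong correlation at some lag does not by itself establish a causal link.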
Sovereign debt prototype
New§tream: Web-based visual interface for exploratory news analysis
When, how much, with which sentiment?
Volume and sentiment charts, canyon flows, tag clouds, drill-down
Use cases
  Globalization of local news
  Effects of news on CDS
Available at http://first.ijs.si/Occurrences (http://newstream.ijs.si)
Sentify portal
End-user oriented GUI as an entry point to showcase the FIRST results
Document navigation and sentiment drill-down
Exploration of aggregated sentiment data
Comparative analysis between fuzzy and crisp sentiments
Reputation topics analysis
Available at http://sentify.project-first.eu
Source code: http://github.com/project-first/sentify-portal
Political sentiment on Twitter
Slovene presidential elections, November 2012
Live sentiment stream shown on POP TV
Political leaning based on sentiment well correlated with the election results
Polling agencies and newspapers failed to predict the victor
Political sentiment on Twitter
Bulgarian parliamentary elections, May 2013
Big scandal on the day before the elections (illegal ballots)
Prevailing negative sentiment
Nearly perfect match between Twitter volume and election results