52
Big Data from the Web Georg Gottlob, TU Wien & University of Oxford Reports about joint work with C.Koch, A. Pieris, R. Pichler, R. Baumgartner, M. Herzog, T.Furche, et al.

Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

  • Upload
    others

  • View
    7

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Big Data from the Web

Georg Gottlob, TU Wien & University of Oxford

Reports about joint work with C.Koch, A. Pieris, R. Pichler, R. Baumgartner, M. Herzog, T.Furche, et al.

Page 2: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

The WWW is older than we may think...

Page 3: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

3

This talk is about extracting data from the Web…

…and about our (ad)ventures in:

Theory

Technology

Commercialization

Business

(Kaplan Turbine)

Page 4: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

The Web is the Largest Database

4

Page 5: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

The Web is the Largest Database

5

Page 6: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

The Web is not a Database

6

• No common schema for data.

• No structured queries possible

• Only keyword search

Page 7: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

The Web is not a Database!

7

• No common schema for data.

• No structured queries possible

• Only keyword search

For example, the following queries cannot be answered:

- List all flats for sale in Vienna having a balcony that cost

less than €500,000 and that are in an area with

above-average many Italian restaurants.

- List all used cars for sale in or around Oxford that are not

older than 5 years and that cost no more than £6,000.

- Find translation offices in Europe that translate German to Basque,

charge no more than €20 per page, and accept VISA card payments

Page 8: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Web Data Extraction

ref-code postcode bedrooms bathrooms available price

33453 OX2 6AR 3 2 15/10/2013 £1280 pcm

33433 OX4 7DG 2 1 18/04/2013 £995 pcm

Page 9: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

HTML Content Extractor

Function f: HTML Parse tree Subtrees

Leaves of subtrees are among leaves of orig. tree

f

Page 10: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

html

body

table

tr

td

tr

td td td td td

Ch

risto

ph

Ko

ch

Geo

rg G

ottlo

b

go

ttlob

@d

ba

i.tuw

ien

.ac

.at

ko

ch

@d

ba

i.tuw

ien

.ac

.at

18

44

9

18

42

0

h1

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html> <body>

<h1>People @ DBAI</h1>

<table border="1" cellpadding="3" cellspacing="1">

<tr> <td>Georg Gottlob</td>

<td>[email protected]</td>

<td>18420</td>

</tr>

<tr> <td>Christoph Koch</td>

<td>[email protected]</td>

<td>18449</td>

</tr>

</table>

</body> </html>

A HTML page

Georg Gottlob gottlob@… 18420

Christoph Koch koch@… 18449

People @ DBAI

Page 11: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Predicate employeetable

Georg Gottlob gottlob@… 18420

Christoph Koch koch@… 18449

People @ DBAI

html

body

table

tr

td

tr

td td td td td

Ch

risto

ph

Ko

ch

Geo

rg G

ottlo

b

go

ttlob

@d

ba

i.tuw

ien

.ac

.at

ko

ch

@d

ba

i.tuw

ien

.ac

.at

18

44

9

18

42

0

h1

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html> <body>

<h1>People @ DBAI</h1>

<table border="1" cellpadding="3" cellspacing="1">

<tr> <td>Georg Gottlob</td>

<td>[email protected]</td>

<td>18420</td>

</tr>

<tr> <td>Christoph Koch</td>

<td>[email protected]</td>

<td>18449</td>

</tr>

</table>

</body> </html>

Page 12: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Predicate employee

Georg Gottlob gottlob@… 18420

Christoph Koch koch@… 18449

People @ DBAI

html

body

table

tr

td

tr

td td td td td

Ch

risto

ph

Ko

ch

Ge

org

Go

ttlob

go

ttlob

@d

ba

i.tuw

ien

.ac

.at

ko

ch

@d

ba

i.tuw

ien

.ac

.at

18

44

9

18

42

0

h1

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html> <body>

<h1>People @ DBAI</h1>

<table border="1" cellpadding="3" cellspacing="1">

<tr> <td>Georg Gottlob</td>

<td>[email protected]</td>

<td>18420</td>

</tr>

<tr> <td>Christoph Koch</td>

<td>[email protected]</td>

<td>18449</td>

</tr>

</table>

</body> </html>

Page 13: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html> <body>

<h1>People @ DBAI</h1>

<table border="1" cellpadding="3" cellspacing="1">

<tr> <td>Georg Gottlob</td>

<td>[email protected]</td>

<td>18420</td>

</tr>

<tr> <td>Christoph Koch</td>

<td>[email protected]</td>

<td>18449</td>

</tr>

</table>

</body> </html>

Predicate phone

Georg Gottlob gottlob@… 18420

Christoph Koch koch@… 18449

People @ DBAI

html

body

table

tr

td

tr

td td td td td

Ch

risto

ph

Ko

ch

Geo

rg G

ottlo

b

go

ttlob

@d

ba

i.tuw

ien

.ac

.at

ko

ch

@d

ba

i.tuw

ien

.ac

.at

18

44

9

18

42

0

h1

Page 14: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Logic heaven

DB theory heaven

DB programming heaven

Application design heaven

MSO

Monadic Datalog

Elog

Lixto Visual Wrapper

Suite

Page 15: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Monadic Datalog as a Wrapping Language

html

body

table

tr

td

tr

td td td td td

root

entry(X) :- root(R), firstchild(R,U), label[html](U),

firstchild(U,V), label[body](V),

firstchild(V,W),label[table](W),

firstchild(W,X), label[tr](X).

entry(X):- entry(Y), nextsibling(Y,X).

name(X) :- entry(E), firstchild(E, X), label[td](X).

email(X) :- name(N), nextsibling(N, X), label[td](X).

phone(X) :- email(M), nextsibling(M, X), label[td](X).

Page 16: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

entry(X) :- root(R), firstchild(R,U), label[html](U),

firstchild(U,V), label[body](V),

firstchild(V,W),label[table](W),

firstchild(W,X), label[tr](X).

entry(X):- entry(Y), nextsibling(Y,X).

name(X) :- entry(E), firstchild(E, X), label[td](X).

email(X) :- name(N), nextsibling(N, X), label[td](X).

phone(X) :- email(M), nextsibling(M, X), label[td](X).

Monadic Datalog as a Wrapping Language

html

body

table

tr

td

tr

td td td td td

root

Page 17: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

entry(X) :- root(R), firstchild(R,U), label[html](U),

firstchild(U,V), label[body](V),

firstchild(V,W),label[table](W),

firstchild(W,X), label[tr](X).

entry(X):- entry(Y), nextsibling(Y,X).

name(X) :- entry(E), firstchild(E, X), label[td](X).

email(X) :- name(N), nextsibling(N, X), label[td](X).

phone(X) :- email(M), nextsibling(M, X), label[td](X).

Monadic Datalog as a Wrapping Language

html

body

table

tr

td

tr

td td td td td

root

Page 18: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Monadic Datalog as a Wrapping Language

html

body

table

tr

td

tr

td td td td td

root

entry(X) :- root(R), firstchild(R,U), label[html](U),

firstchild(U,V), label[body](V),

firstchild(V,W),label[table](W),

firstchild(W,X), label[tr](X).

entry(X):- entry(Y), nextsibling(Y,X).

name(X) :- entry(E), firstchild(E, X), label[td](X).

email(X) :- name(N), nextsibling(N, X), label[td](X).

phone(X) :- email(M), nextsibling(M, X), label[td](X).

Page 19: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Monadic Datalog as a Wrapping Language

html

body

table

tr

td

tr

td td td td td

root

entry(X) :- root(R), firstchild(R,U), label[html](U),

firstchild(U,V), label[body](V),

firstchild(V,W),label[table](W),

firstchild(W,X), label[tr](X).

entry(X):- entry(Y), nextsibling(Y,X).

name(X) :- entry(E), firstchild(E, X), label[td](X).

email(X) :- name(N), nextsibling(N, X), label[td](X).

phone(X) :- email(M), nextsibling(M, X), label[td](X).

Page 20: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

entry(X) :- root(R), firstchild(R,U), label[html](U),

firstchild(U,V), label[body](V),

firstchild(V,W),label[table](W),

firstchild(W,X), label[tr](X).

entry(X):- entry(Y), nextsibling(Y,X).

name(X) :- entry(E), firstchild(E, X), label[td](X).

email(X) :- name(N), nextsibling(N, X), label[td](X).

phone(X) :- email(M), nextsibling(M, X), label[td](X).

Monadic Datalog as a Wrapping Language

html

body

table

tr

td

tr

td td td td td

root

Page 21: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

How complex is Monadic Datalog?

Monadic Datalog over trees has

combined complexity:

O(|data tree|*|program|)

Theorem [G. & Koch 2002]:

Page 22: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

ELOG [Baumgartner, Flesca, G. VLDB’01]

Examples of Special predicates:

subelem(S,X,Path,…)

before(X,Y,….d.)

after(X,Y,…)

property(X,Attribute, Op,Value…..)

Additional features: Stratified negation,

string processing

ontological concepts “phonenumber(X)”

ranges: H(S,X) :- body(……..)[1,5]

object hierarchies

distance tolerance,etc.

Xpath-like expression

document(URL,D)

getdocumentFromHref(X,D),

etc. url

Page 23: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

ELOG Program for eBay pages

Page 24: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Lixto Visual Developer (VD)

Navigation

Steps

Mozilla

Web

Browser

Extraction

Configuratio

n

Page 25: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

.

.

Page 26: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Comparison of Prices

Page 27: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

I INA

A

ThyssenKrupp TKP

Page 28: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

August 2013:

Page 29: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

30

This [Lixto] is great technology, but …

… too many people needed for generating and

maintaining the wrappers!

Page 30: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

31

From the tourism, real estate, and retail domains,

we understood independently that automated

wrapper-generation would be useful.

But how can this´be achieved?

Page 31: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Need for Fully Automated Extraction Technology

Example: Real Estate UK > 15000 sites

many not covered by aggregators

list of all agencies easy to get (source discovery)

but: manual or semi-automatic wrapping too expensive

wrapper construction

testing

tracking changes

No existing tool or methodology could do it fully automatically

Page 32: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

33

This talk is about extracting data from the Web…

…and about our (ad)ventures in:

Theory

Technology

Commercialization

Business K

1

Page 33: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

BLACKBOX

Application domain with thousands of websites

URL

Application-relevant Structured data (XML or RDF)

DIADEM a €2.5M project started in April '10

Page 34: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

machine learning e.g. neural networks page classification visual clue recognition text classification small entity identification

rule-based reasoning knowledge-based page exploration strategy navigation planning global decision making plausibility checks

X & Y Z

Taught knowledge

(expert system)

Self-learned

knowledge

Page 35: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Rough Idea: Knowledge via Rules Use “expert” rules that analyze Web pages and interact with them

Ontological rules (how do entities relate to each other)

- a flat is a real-estate property

- a house is a real-estate property

- a real-estate property has a number of rooms

- a price consists of a number and a currency

Phemomenological rules (how do entities manifest themselves on the Web?)

- the text chunk closest to an input field is with high prob. its descriptor.

- each sales item is described in a “convex” (usual. rectangular) region.

Site exploration rules:

- before filling a field try to leave it empty

- rules for handling next-page links

Other types of rules

Page 36: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

DIADEM PROJECT

DIADEM lab at Oxford University

2010 2011 2012 2013 2014 2015

spin-out start-up

Wrapidity

Funding so far > $7M

VADA

Page 37: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

DIADEM Architecture

OPAL

Form filling &

understanding

AMBER

Object identification

& alignment

BERyL

Block analysis &

object enrichment

OXPath

Efficient extraction

in the cloud

DATALOG± (implemented in DLV) Rule, exploration, control and integration language

New knowledge-based technology combining formalised knowledge with machine learning

Page 38: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:
Page 39: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Domains considered so far (since 2014)

4

0

• Real estate UK • Real estate US • Used cars • Consumer electronics • Restaurant chains • Restaurants in the ‘Open Web’ • Jobs (from company Web sites)

• News

Page 40: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Commercial Impact

ERC Advanced Grant DIADEM

+ ERC Proof of Concept Grant

EXTRALYTICS

Founded February 2015

operating initially in Oxford

now in London

2 possibilities:

• Build up company with large client portfolio

• Sell technology, IP & software to strategic partner

Page 41: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:
Page 42: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

43

Theory

Technology

Commercialization

Business

2

Page 43: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

New Application: Personalised Ads

Page 44: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

New Application: Personalised Ads

Full

street

addresses

Reverse Geocoding

Käppelestr. 5, 76131 Karlsruhe

Herrenstraße 21, 76133 Karlsruhe

.....

.....

Page 45: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

New Application: Personalised Ads

Full

street

addresses

Reverse Geocoding

Käppelestr. 5, 76131 Karlsruhe

Herrenstraße 21, 76133 Karlsruhe

.....

.....

Web Data Extraction

Page 46: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

New Application: Personalised Ads

Full

street

addresses

Reverse Geocoding

Käppelestr. 5, 76131 Karlsruhe

Herrenstraße 21, 76133 Karlsruhe

.....

.....

Web Data Extraction

User profile is created and later refined.

Ads related to animals and children will be sent...

Page 47: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

48

CRIME HEATMAPS

Page 48: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

49

Page 49: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Scenario ➀: Electronics retailer

electronics retailer: online market intelligence

comprehensive overview of the market

daily information on price, shipping costs, trends, product mix

by product, geographical region, or competitor

thousands of products

hundreds of competitors

nowadays: specialized companies

mostly manual, sampling

large cost

Page 50: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Scenario ➁: Hotel Agency

online travel agency

best price guarantee

prices of competing agencies

average market price

Page 51: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

Scenario ➂: Hedge Fund

house price index

published in regular intervals by national statistics agency

affects share values of various industries

hedge fund:

online market intelligence to predict the house price index

Page 52: Big Data from the Web · 2019-02-13 · ERC Advanced Grant DIADEM + ERC Proof of Concept Grant EXTRALYTICS Founded February 2015 operating initially in Oxford now in London 2 possibilities:

tenders from all over the world

existing aggregators

expensive, often incomplete

yet need to be published (online) by law in most countries

Scenario ➃: Tenders for

construction company