Using Linked Data to Evaluate the Impact of Research and Development in Europe: A Structural Equation Model

Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon

Outline

• Research Question

• Datasets

• Data Extraction

• Structural Equation Modeling

• Result

• Conclusion & Future Work

Research Question

• Research and Development (R&D) has a direct effect on

• Economic Performance (GDP) - EcoP

• Education Performance (public spending on education) - EduP

• Healthcare Performance (birth rate, death rate) - Hcare

Our Approach

• Identify relevant statistical datasets

• Choose & extract appropriate variables

• Exclude variables with low data quality

• Feed variables into a Structural Equation Model (SEM)

• Exclude variables that do not covary with the others

• Obtain a stable model aligned with the hypothesis
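The two exclusion steps above could be sketched roughly as follows. This is a minimal illustration in Python rather than the authors' R workflow: the slides do not say how "low data quality" or "does not covary" were measured, so the missing-value share, the Pearson correlation check, both thresholds, and all function and variable names here are assumptions.

```python
from math import sqrt


def missing_fraction(values):
    """Share of observations that are missing (represented as None)."""
    return sum(v is None for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def screen_variables(data, max_missing=0.2, min_abs_corr=0.3):
    """Return the variable names that pass both screening steps.

    data maps a variable name to a list of observations (None = missing).
    Both thresholds are illustrative assumptions, not values from the study.
    """
    # Step 1: drop variables with too many missing observations.
    kept = [name for name, vals in data.items()
            if missing_fraction(vals) <= max_missing]
    # Step 2 (single pass): drop variables whose strongest absolute
    # correlation with any other remaining variable is below the threshold.
    result = []
    for name in kept:
        others = [o for o in kept if o != name]
        strongest = max((abs(pearson(data[name], data[o])) for o in others),
                        default=0.0)
        if strongest >= min_abs_corr:
            result.append(name)
    return result
```

Called on a dictionary of indicator series, `screen_variables` returns the names that would be passed on to the SEM step. In practice the thresholds would need to be justified against the data at hand.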

Datasets - World Bank

• International financial institution that collects and processes large amounts of data on the basis of economic models and makes them openly available

• Published as RDF

• http://worldbank.270a.info/

Datasets - World Bank

• Adolescent fertility rate

• Birth rate

• Death rate

• Fertility rate

• GDP

• Health expenditure public

• High technology exports

• Immunization, DPT, measles

• Incidence of Tuberculosis

• Mortality rate, infant

• Public spending on education

• R&D expenditure

• Researchers in R&D

Datasets - World Bank

[Figure: the World Bank indicators listed above, grouped under R&D, Healthcare, and Economic Performance]

Datasets - Eurostat

• Statistical office of the European Union (EU)

• Provides statistical information to the institutions of the EU and promotes the harmonization of statistical methods across its member states, candidates for accession, and European Free Trade Association (EFTA) countries

• Published as RDF

• http://eurostat.linked-statistics.org/

Datasets - Eurostat

• Annual expenditure on public & private educational institutions per pupil/student

• Biotechnology patent applications to EPO by priority year, country and metropolitan regions

• Economically active population by sex, age and NUTS2 regions

• Financial aid to students

Datasets - Eurostat

[Figure: the Eurostat indicators listed above, grouped under Educational Performance and Economic Performance]

Data Extraction

• SPARQL

• Advantages of LD:

• Discovering relevant datasets

• Data available in a single standardized structured format (RDF)

• Avoids heterogeneity among similar kinds of data (measures and their units)

• Supported query mechanism to acquire data
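As an illustration of the query mechanism, the sketch below builds a SPARQL query for statistical observations and flattens a result document in the standard SPARQL JSON results format. It is written in Python with the standard library only, not the R tooling mentioned in the slides. It assumes the datasets are modelled with the RDF Data Cube vocabulary and the SDMX `refArea`, `refPeriod`, and `obsValue` properties; the slides do not show the actual queries, and the dataset URI and function names are hypothetical placeholders.

```python
import json
import urllib.parse
import urllib.request

# Namespaces of the RDF Data Cube and SDMX-RDF vocabularies.
PREFIXES = """\
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>
"""


def build_query(dataset_uri, limit=1000):
    """Build a query for (area, period, value) observations of one dataset.

    Assumes observations carry refArea, refPeriod and obsValue properties.
    """
    return PREFIXES + f"""
SELECT ?area ?period ?value WHERE {{
  ?obs a qb:Observation ;
       qb:dataSet <{dataset_uri}> ;
       sdmx-dimension:refArea ?area ;
       sdmx-dimension:refPeriod ?period ;
       sdmx-measure:obsValue ?value .
}}
LIMIT {int(limit)}
"""


def parse_results(payload):
    """Flatten a SPARQL JSON results document into a list of dicts."""
    doc = json.loads(payload)
    variables = doc["head"]["vars"]
    return [
        {v: binding[v]["value"] for v in variables if v in binding}
        for binding in doc["results"]["bindings"]
    ]


def run_query(endpoint, query):
    """Send the query over HTTP GET and return parsed rows (needs network)."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_results(response.read().decode("utf-8"))
```

The endpoint URL is left to the caller; `run_query(endpoint, build_query(dataset_uri))` would return one dictionary per observation, with all values as strings that still need to be converted to numbers before analysis.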

Structural Equation Modeling

• Statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions

• Latent variables

• Observed variables

• Measured by

• Exploratory Factor Analysis (EFA)

• Confirmatory Factor Analysis (CFA)
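For orientation, a SEM of this kind is commonly written in LISREL-style notation as a measurement part and a structural part. The symbols below are the conventional ones and do not come from the slides. The last three lines sketch how the research question might translate into structural equations, assuming R&D is the only exogenous latent variable and there are no paths among the three outcome variables; the slides do not state the authors' exact specification.

```latex
\begin{align}
  x    &= \Lambda_x \xi + \delta       && \text{measurement model, exogenous indicators} \\
  y    &= \Lambda_y \eta + \varepsilon && \text{measurement model, endogenous indicators} \\
  \eta &= B \eta + \Gamma \xi + \zeta  && \text{structural model} \\[6pt]
  \eta_{\mathrm{EcoP}}  &= \gamma_1 \, \xi_{\mathrm{R\&D}} + \zeta_1 \\
  \eta_{\mathrm{EduP}}  &= \gamma_2 \, \xi_{\mathrm{R\&D}} + \zeta_2 \\
  \eta_{\mathrm{Hcare}} &= \gamma_3 \, \xi_{\mathrm{R\&D}} + \zeta_3
\end{align}
```

Here $x$ and $y$ are observed variables, $\xi$ and $\eta$ are latent variables, $\Lambda_x$ and $\Lambda_y$ hold the factor loadings, $\delta$ and $\varepsilon$ are measurement errors, $B$ and $\Gamma$ hold the effect weights between latent variables, and $\zeta$ collects the structural disturbances.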

Structural Equation Modeling

• Step 1: Specify latent variables through a sequence of EFA and CFA

• EFA to detect latent variables

• CFA to confirm the structure

• Step 2: Specify & identify the SEM based on the research question by inserting one variable at a time

• Statistical analysis done in R*

* http://www.r-project.org/

Result

• 4 latent variables

• 12 (of 18) observed variables

• 20 (of 28) European countries

• 10 years: 1999 - 2009

Result

[Figure: SEM model for R&D, annotated with effect weights, factor loadings, and measurement errors]

Conclusion

• Using LD to evaluate the impact of R&D in Europe backed by robust statistical analysis

• Complex data analysis on LD can lead to important & meaningful insights into publicly available data

Future Work

• R + SPARQL*

• Streamlined process

• Application of dynamic systems modeling

• More variables, more datasets

• More countries, regional level analysis

* http://cran.r-project.org/web/packages/SPARQL/