Using Linked Data to Evaluate the Impact of Research and Development in Europe:
A Structural Equation Model
Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon
Outline
• Research Question
• Datasets
• Data Extraction
• Structural Equation Modeling
• Result
• Conclusion & Future Work
Research Question
• Research and Development (R&D) has a direct effect on
• Economic Performance (GDP) - EcoP
• Education Performance (public spending on education) - EduP
• Healthcare Performance (birth rate, death rate) - Hcare
Our Approach
• Identify relevant statistical datasets
• Choose & extract appropriate variables
• Exclude variables with low data quality
• Feed variables into a Structural Equation Model (SEM)
• Exclude variables that do not covary with the others
• Obtain a stable model aligned with the hypothesis
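The two exclusion steps above can be sketched roughly as follows. This is a minimal illustration only, not the authors' procedure: the function names, the toy data, and both thresholds (`max_missing`, `min_abs_corr`) are assumptions, and "low data quality" is approximated here by the share of missing observations.

```python
import math

def missing_share(values):
    """Fraction of observations that are missing (None)."""
    return sum(v is None for v in values) / len(values)

def pearson(xs, ys):
    """Pearson correlation over the pairs where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 3:
        return 0.0
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def screen_variables(data, max_missing=0.2, min_abs_corr=0.3):
    """Drop sparse variables, then variables that covary with no other one.

    Both thresholds are illustrative assumptions, not values from the slides.
    """
    # Step 1: exclude variables with too many missing observations.
    kept = {k: v for k, v in data.items() if missing_share(v) <= max_missing}
    # Step 2: exclude variables whose correlation with every other
    # variable that survived step 1 stays below the threshold.
    result = {}
    for name, values in kept.items():
        others = [o for o in kept if o != name]
        if any(abs(pearson(values, kept[o])) >= min_abs_corr for o in others):
            result[name] = values
    return result

# Hypothetical toy series, one value per year.
toy = {
    "rd_expenditure": [1.0, 1.2, 1.5, 1.9, 2.4],
    "gdp": [10.0, 11.0, 13.5, 16.0, 20.0],
    "sparse_series": [None, None, 3.0, None, 1.0],
    "unrelated": [5.0, 1.0, 4.0, 1.0, 5.0],
}
screened = sorted(screen_variables(toy))
print(screened)
```

In a real analysis the screening would more likely be driven by the factor-analysis results than by a fixed correlation cut-off; the sketch only shows the shape of the filtering.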
Datasets - World Bank
• International financial institution that collects and processes large amounts of data on the basis of economic models and makes them openly available
• Published as RDF
• http://worldbank.270a.info/
Datasets - World Bank
• Adolescent fertility rate
• Birth rate
• Death rate
• Fertility rate
• GDP
• Health expenditure, public
• High technology exports
• Immunization, DPT, measles
• Incidence of tuberculosis
• Mortality rate, infant
• Public spending on education
• R&D expenditure
• Researchers in R&D
Datasets - World Bank
• The variables are grouped under three headings: R&D, Healthcare, Economic Performance
Datasets - Eurostat
• Statistical office of the European Union (EU)
• Provides statistical information to the institutions of the EU to promote harmonization of statistical methods across its member states and candidates for accession as well as European Free Trade Association (EFTA) countries
• Published as RDF
• http://eurostat.linked-statistics.org/
Datasets - Eurostat
• Annual expenditure on public & private educational institutions per pupil/student
• Biotechnology patent applications to EPO by priority year, country and metropolitan regions
• Economically active population by sex, age and NUTS2 regions
• Financial aid to students
Datasets - Eurostat
• The variables are grouped under two headings: Educational Performance, Economic Performance
Data Extraction
• SPARQL
• Advantages of Linked Data (LD):
• Discovering relevant datasets
• Data available in a single standardized structured format (RDF)
• Avoiding heterogeneity of similar kinds of data (measures and their units)
• Supported query mechanism to acquire data
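As an illustration of the query mechanism, the sketch below builds a SPARQL query for observations in an RDF Data Cube dataset and the corresponding HTTP GET URL, without sending the request. Several things here are assumptions rather than details from the slides: the `/sparql` endpoint path, the dataset URI (a placeholder), and the use of the SDMX `refArea`, `refPeriod` and `obsValue` properties, which a given dataset may or may not expose.

```python
from urllib.parse import urlencode

# Assumed endpoint path; the slides give only the site URL.
ENDPOINT = "http://worldbank.270a.info/sparql"

# Placeholder dataset URI, used for illustration only.
DATASET = "http://worldbank.270a.info/dataset/EXAMPLE"

def build_query(dataset_uri, limit=100):
    """Return a SPARQL SELECT for (area, period, value) observations."""
    return f"""
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX sdmx-dimension: <http://purl.org/linked-data/sdmx/2009/dimension#>
PREFIX sdmx-measure: <http://purl.org/linked-data/sdmx/2009/measure#>

SELECT ?area ?period ?value WHERE {{
  ?obs a qb:Observation ;
       qb:dataSet <{dataset_uri}> ;
       sdmx-dimension:refArea ?area ;
       sdmx-dimension:refPeriod ?period ;
       sdmx-measure:obsValue ?value .
}}
ORDER BY ?area ?period
LIMIT {limit}
""".strip()

def build_request_url(endpoint, query):
    """Encode the query as a GET request, as in the SPARQL protocol."""
    return endpoint + "?" + urlencode({"query": query})

query = build_query(DATASET)
url = build_request_url(ENDPOINT, query)
print(url)
```

Because every dataset shares the same cube vocabulary, the same query shape can be reused for each indicator by swapping the dataset URI, which is the practical benefit of the standardized format noted above.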
Structural Equation Modeling
• Statistical technique for testing and estimating causal relations using a combination of statistical data and qualitative causal assumptions
• Latent variables
• Observed variables
• Measured by
• Exploratory Factor Analysis (EFA)
• Confirmatory Factor Analysis (CFA)
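For reference, the relation between latent and observed variables is commonly written in LISREL-style notation as below. This is the general textbook formulation, not necessarily the exact parameterization used in this work; reading R&D as the exogenous latent variable and EcoP, EduP and Hcare as endogenous ones is an assumption based on the research question.

```latex
\begin{aligned}
\mathbf{x} &= \Lambda_x \boldsymbol{\xi} + \boldsymbol{\delta}
  && \text{(measurement model, exogenous indicators)} \\
\mathbf{y} &= \Lambda_y \boldsymbol{\eta} + \boldsymbol{\varepsilon}
  && \text{(measurement model, endogenous indicators)} \\
\boldsymbol{\eta} &= B \boldsymbol{\eta} + \Gamma \boldsymbol{\xi} + \boldsymbol{\zeta}
  && \text{(structural model)}
\end{aligned}
```

Here x and y are vectors of observed variables, ξ and η are the exogenous and endogenous latent variables, Λ_x and Λ_y hold the factor loadings, Γ and B hold the structural path coefficients, and δ, ε, ζ are error terms. The factor analyses concern the first two equations; the hypothesized direct effects correspond to the entries of Γ.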
Structural Equation Modeling
• Step 1: Specify latent variables through a sequence of EFA and CFA
• EFA to detect latent variables
• CFA to confirm the structure
• Step 2: Specify & identify SEM based on the research question by inserting one variable at a time
• Statistical analysis done in R*
* http://www.r-project.org/
Result
• 4 latent variables
• 12 (of 18) observed variables
• 20 (of 28) European countries
• 10 years: 1999 - 2009
Conclusion
• LD can be used to evaluate the impact of R&D in Europe, backed by robust statistical analysis
• Complex data analysis on LD can lead to important & meaningful insights on publicly available data
Future Work
• R + SPARQL*
• Streamlined process
• Application of dynamic systems modeling
• More variables, more datasets
• More countries, regional level analysis
* http://cran.r-project.org/web/packages/SPARQL/