Failing Fast with Explore&Query © 2013 Vinge Free AB, Sweden



DESCRIPTION

If you are going to fail because your assumptions were wrong, do it before you waste time and resources. This also applies to the promise of Linked Open Data as a data source for analytics. That certain data is published does not mean it is useful for a particular type of task. This case study shows how to quickly validate a semantic data source from a quantitative and completeness perspective. First slideshare on Explore&Query: http://www.slideshare.net/JrgenKerstna/can-spsrqlexplore-query-with-vinge-tutorial

Citation preview

Page 1: Failing fast with Explore&Query

Failing Fast with Explore&Query


Page 2: Failing fast with Explore&Query

Failing fast is a good thing

If you are going to fail because your assumptions were wrong, do it before you waste time and resources.

This also applies to the promise of Linked Open Data as a data source for analytics. That certain data is published does not mean it is useful for a particular type of task.

This case study shows how to quickly validate a semantic data source from a quantitative and completeness perspective.

The tool used is the Free Edition of Explore&Query from Vinge Free.


Page 3: Failing fast with Explore&Query

Film industry analysis

The case study goes as follows: DBpedia contains information about films, their directors, producers, release dates, starring actors, composers, budgets, etc. That is seemingly everything needed to produce a comprehensive statistical analysis of the film industry.

Let's start by looking at the data and making a simple initial quantification: how many facts do we actually have at our disposal, and do we believe they provide a sufficiently accurate statistical base to avoid skewed results?


Page 4: Failing fast with Explore&Query

Films

The main concept in the film industry is the film. Let's find out how many films there are in DBpedia: there are 78,000 films.
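The slides obtain this count through the Explore&Query interface and do not show the underlying query. As a rough sketch of an equivalent, a SPARQL query of the following shape could be sent to a DBpedia endpoint. The class name `dbo:Film` and the helper `count_query` are assumptions made for illustration; they are not taken from the slides.

```python
# Sketch: build a SPARQL query that counts the instances of one class.
# Assumption: films are typed as dbo:Film in the DBpedia ontology.
PREFIXES = "PREFIX dbo: <http://dbpedia.org/ontology/>\n"


def count_query(class_name: str = "dbo:Film") -> str:
    """Return a SPARQL query counting distinct resources of class_name."""
    return (
        PREFIXES
        + "SELECT (COUNT(DISTINCT ?film) AS ?n)\n"
        + f"WHERE {{ ?film a {class_name} . }}"
    )


print(count_query())
```

The function only builds the query text, so it can be inspected before it is run against any endpoint.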


Page 5: Failing fast with Explore&Query

Every film should have a director?

But over 13,000 films are the creation of no one?
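One way to express "films with no director" in SPARQL is a `FILTER NOT EXISTS` clause. The sketch below builds such a query and works out what share of the 78,000 films the 13,000 figure represents. The property name `dbo:director` and both helper names are assumptions for illustration, and because the slide says "over 13,000", the computed share is a lower bound.

```python
# Sketch: count films that lack a given property, and compute the gap share.
# Assumption: dbo:Film and dbo:director are the relevant DBpedia terms.
PREFIXES = "PREFIX dbo: <http://dbpedia.org/ontology/>\n"


def missing_property_query(prop: str) -> str:
    """Return a SPARQL query counting films with no value for prop."""
    return (
        PREFIXES
        + "SELECT (COUNT(DISTINCT ?film) AS ?n)\n"
        + "WHERE {\n"
        + "  ?film a dbo:Film .\n"
        + f"  FILTER NOT EXISTS {{ ?film {prop} ?value . }}\n"
        + "}"
    )


def share(part: int, total: int) -> float:
    """Return part as a percentage of total."""
    return 100.0 * part / total


print(missing_property_query("dbo:director"))
# Figures from the slides: over 13,000 of 78,000 films have no director.
print(f"at least {share(13_000, 78_000):.1f}% of films lack a director")
```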


Page 6: Failing fast with Explore&Query

Every film has starring actors

Yet even though films usually have several stars, our list keeps shrinking.


Page 7: Failing fast with Explore&Query

Budget of films v. sales

Comparing budget with sales reveals that a film's cost and its turnover are only sparsely available.


Page 8: Failing fast with Explore&Query

Length of the films v. cost

Is the money spent on quality or quantity? We are leaking films here as well.


Page 9: Failing fast with Explore&Query

More fields of interest

Adding release date, studio, distributor and composer: only 400 films are left from 77,000.
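The shrinkage follows from how required fields combine: every property added as a mandatory pattern keeps only the films that have a value for it. A minimal sketch of that idea is shown below, together with the retention rate implied by the slide's figures. All property names in `REQUIRED` are assumptions that should be checked against the endpoint, and the studio field is left out because its property name is not known here.

```python
# Sketch: require every listed property, then compute how many films survive.
PREFIXES = "PREFIX dbo: <http://dbpedia.org/ontology/>\n"

# Hypothetical property names for the fields named in the slides.
# The studio field is omitted: its property name is not known here.
REQUIRED = [
    "dbo:director", "dbo:starring", "dbo:budget", "dbo:gross",
    "dbo:runtime", "dbo:releaseDate", "dbo:distributor", "dbo:musicComposer",
]


def complete_films_query(props) -> str:
    """Return a SPARQL query counting films that have every property."""
    patterns = "".join(f"  ?film {p} ?v{i} .\n" for i, p in enumerate(props))
    return (
        PREFIXES
        + "SELECT (COUNT(DISTINCT ?film) AS ?n)\n"
        + "WHERE {\n"
        + "  ?film a dbo:Film .\n"
        + patterns
        + "}"
    )


def retention(kept: int, total: int) -> float:
    """Return the percentage of records that survive the filter."""
    return 100.0 * kept / total


print(complete_films_query(REQUIRED))
# Figures from the slide: 400 films left from 77,000.
print(f"{retention(400, 77_000):.2f}% of films have all required fields")
```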


Page 10: Failing fast with Explore&Query

We can find exactly how much we lack

There are 1769 films without the “length” property but with all the others.
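The same kind of breakdown can be reproduced on any local extract: for each record, list the missing fields and count the records where exactly one field is absent. The sketch below illustrates the logic only. The function name `sole_gaps` and the toy rows are made up for the example and are not DBpedia data.

```python
from collections import Counter


def sole_gaps(records, fields):
    """Count records that lack exactly one field, keyed by that field."""
    gaps = Counter()
    for rec in records:
        # A field counts as missing if the key is absent or its value is None.
        missing = [f for f in fields if rec.get(f) is None]
        if len(missing) == 1:
            gaps[missing[0]] += 1
    return gaps


# Made-up toy rows for illustration, not real film data.
FIELDS = ["director", "budget", "length"]
films = [
    {"director": "A", "budget": 10, "length": 95},
    {"director": "B", "budget": 12, "length": None},
    {"director": "C", "budget": None, "length": None},
    {"director": "D", "budget": 7},
]

print(sole_gaps(films, FIELDS))
```

A field with a high count here is the single obstacle for many otherwise complete records, which makes it the first candidate to drop or to source elsewhere.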


Page 11: Failing fast with Explore&Query

Not all fields are equally important

If we make everything optional, are we back in business?
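In SPARQL terms, making a field optional corresponds to wrapping its pattern in `OPTIONAL { ... }`, so a film is kept even when the value is absent. The following sketch builds such a query. The property names and the helper `optional_query` are assumptions for illustration, not the query that Explore&Query generates.

```python
# Sketch: keep every film and fetch extra fields only where they exist.
PREFIXES = "PREFIX dbo: <http://dbpedia.org/ontology/>\n"


def var(prop: str) -> str:
    """Derive a variable name from a prefixed property, e.g. ?budget."""
    return "?" + prop.split(":")[-1]


def optional_query(optional, required=(), limit=100) -> str:
    """Return a SPARQL query with required and OPTIONAL property patterns."""
    lines = ["  ?film a dbo:Film ."]
    lines += [f"  ?film {p} {var(p)} ." for p in required]
    lines += [f"  OPTIONAL {{ ?film {p} {var(p)} . }}" for p in optional]
    columns = ["?film"] + [var(p) for p in list(required) + list(optional)]
    return (
        PREFIXES
        + f"SELECT {' '.join(columns)}\n"
        + "WHERE {\n"
        + "\n".join(lines)
        + f"\n}}\nLIMIT {limit}"
    )


# Hypothetical property names; every field is optional, as on the slide.
print(optional_query(["dbo:director", "dbo:budget", "dbo:runtime"]))
```

Passing a property in `required` instead of `optional` moves it back to a mandatory pattern, which is one way to encode that not all fields are equally important.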


Page 12: Failing fast with Explore&Query

Have we failed or not?

The answer is left to the viewer. But it was fast to find out.


Page 13: Failing fast with Explore&Query


Download from http://www.vingefree.com/querybyexplore