Data monetization


S ANAND, CHIEF DATA SCIENTIST, GRAMENER

MONETISING DATA

REMOVING YOUR MENTAL HURDLES

DATA

ANALYSIS VISUALS

INSIGHTS REPORTS

EXPLORATION

IS EVERYWHERE


COMMON COMPLAINT #1

WE DON’T HAVE DATA

We have internal information. Getting information from outside is our challenge. There’s no way of doing that.

– Senior Editor, Leading Media Company

India’s religions

United Kingdom’s religions

UNCOVER YOUR DARK DATA

Source: http://www.patrickcheesman.com/dark-data-problems-and-solutions/

• INACCESSIBLE data (e.g. technology is outdated)
• FORGOTTEN data (e.g. collected, but not actively used)
• UNCOLLECTED data (e.g. information exists, not digitized)
• SINGLE PURPOSE data (e.g. used for a specific purpose)

We’ve used network diagrams to detect terrorism and corporate fraud, to identify product affinities, and for behavioural customer segmentation.
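As a minimal sketch of the product-affinity case, assume transactions are available as baskets of product names. Each product becomes a node, and the weight of an edge is the number of baskets that contain both products. The function name `affinity_edges` and the sample baskets are hypothetical and are used only for illustration. The resulting weighted edge list is what a network diagram would draw.

```python
from collections import Counter
from itertools import combinations


def affinity_edges(baskets):
    """Weighted edge list of a product-affinity network.

    Each basket is an iterable of product names (duplicates are ignored).
    The weight of edge (a, b) is the number of baskets containing both.
    """
    edges = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical (a, b) order
        for a, b in combinations(sorted(set(basket)), 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical baskets, for illustration only
baskets = [
    ["coffee", "croissant"],
    ["coffee", "croissant", "juice"],
    ["tea", "biscuit"],
    ["coffee", "juice"],
]
edges = affinity_edges(baskets)
print(edges.most_common(2))  # [(('coffee', 'croissant'), 2), (('coffee', 'juice'), 2)]
```

Heavier edges indicate products that are bought together more often. A layout tool can then place strongly linked products close to each other.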

AUGMENT YOUR DATA SOURCES

DATA IS EVERYWHERE

COMMON COMPLAINT #1

WE DON’T HAVE DATA

COMMON COMPLAINT #2

THE DATA ISN’T STRUCTURED

CRM DATA, SALES DATA, PRICING DATA, CALL RECORDS, WEB LOG DATA, VENDOR INVOICES, SOCIAL MEDIA DATA, CLICKTHROUGH DATA, COMPETITOR RESEARCH, CUSTOMER TRANSACTIONS…

CENSUS DATA, E-COMMERCE PRICES, COMMODITY PRICES, STOCK MARKET DATA, FINANCIAL REPORTING, SOCIAL MEDIA DATA, MOBILE PENETRATION, AADHAAR DATA, COURT CASE BRIEFS, SHAPE FILES…

How does the Mahabharata, one of the largest epics at 1.8 million words, lend itself to text analytics?

Can this ‘unstructured data’ be processed to extract analytical insights?

What does sentiment analysis of this tome convey?

Is there a better way to explore relations between characters?

How can closeness of characters be analysed & visualized?

Visualising the Mahabharata
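One simple way to measure the closeness of characters is to count how often two characters are mentioned in the same passage. This is a sketch under stated assumptions. The text is assumed to be already split into passages. A list of character names is assumed to exist. A whole-word, case-sensitive mention is treated as an appearance. The passages below are made up, and `cooccurrence` is a hypothetical helper.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(passages, characters):
    """Count, for each pair of characters, the passages that mention both.

    A passage mentions a character if the name appears as a whole word
    (case-sensitive). Higher counts serve as a proxy for closeness.
    """
    patterns = {name: re.compile(rf"\b{re.escape(name)}\b") for name in characters}
    pairs = Counter()
    for passage in passages:
        present = sorted(n for n, p in patterns.items() if p.search(passage))
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


# Made-up sample passages; a real run would split the full text into sections
passages = [
    "Arjuna spoke to Krishna before the battle.",
    "Krishna and Arjuna rode together.",
    "Bhishma addressed Duryodhana.",
    "Arjuna faced Bhishma on the field.",
]
characters = ["Arjuna", "Krishna", "Bhishma", "Duryodhana"]
closeness = cooccurrence(passages, characters)
print(closeness.most_common(1))  # [(('Arjuna', 'Krishna'), 2)]
```

These pair counts can be used directly as edge weights in a character network. Nicknames and alternate names for the same character would have to be merged before counting.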

“Can we help CFOs understand what questions are being asked by investors and analysts during earnings releases? How is this different from the competition?”

– Product Head, Global Financial Services Firm

WHAT DO FINANCIAL ANALYSTS ASK IBM VS MSFT?
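A comparison like this could start from the keywords that are over-represented in one company's analyst questions relative to another's. The sketch below ranks terms by a smoothed log-ratio of relative frequencies. This is one common choice; the slides do not say which method was used. Several details are assumptions: the stopword list, the add-one smoothing, the function names, and the sample questions, which are hypothetical.

```python
import math
import re
from collections import Counter

# Assumed minimal stopword list; a real one would be much longer
STOPWORDS = {"the", "a", "an", "of", "on", "in", "and", "to", "is",
             "your", "you", "how", "what", "for", "about"}


def term_counts(questions):
    """Lower-case word counts across a list of questions, minus stopwords."""
    words = (w for q in questions for w in re.findall(r"[a-z]+", q.lower()))
    return Counter(w for w in words if w not in STOPWORDS)


def distinctive_terms(a, b, top=3):
    """Terms most over-represented in corpus a relative to corpus b.

    Scores are smoothed log-ratios of relative frequency; add-one smoothing
    keeps terms that never occur in one corpus finite. Ties break alphabetically.
    """
    ca, cb = term_counts(a), term_counts(b)
    vocab = set(ca) | set(cb)
    na = sum(ca.values()) + len(vocab)
    nb = sum(cb.values()) + len(vocab)
    score = {t: math.log((ca[t] + 1) / na) - math.log((cb[t] + 1) / nb) for t in vocab}
    return sorted(vocab, key=lambda t: (-score[t], t))[:top]


# Hypothetical analyst questions, not taken from real transcripts
ibm = ["How is mainframe demand trending?", "What about mainframe margins?",
       "Update on consulting signings?"]
msft = ["How is Azure growth trending?", "What about Azure capacity?",
        "Update on Office pricing?"]
print(distinctive_terms(ibm, msft))  # ['mainframe', 'consulting', 'demand']
```

Words that are common to both sets, such as "trending" and "update" in this sample, score zero and drop out of the ranking.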

DATA IS EVERYWHERE

EXTRACT THE METADATA

AUGMENT YOUR DATA SOURCES

COMMON COMPLAINT #2

THE DATA ISN’T STRUCTURED

COMMON COMPLAINT #3

THE DATA ISN’T RICH / CLEAN

COMMON: WHO, WHAT, WHEN, WHERE
TEXT: TEXT KEYWORDS, SENTIMENT
IMAGE: VISUAL RECOGNITION
AUDIO / CALLS: TRANSCRIPTS, MOOD ANALYSIS

“Can we get the results of every single election in history, and create a portal to visualize these results?”

– Rajdeep Sardesai, CNN-IBN

The PDF files have a reasonably clear structure

… that translates into text that can be parsed
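Once a PDF has been converted to text, rows with a consistent layout can be pulled out with a regular expression. The row layout assumed here is "serial, candidate name, party, votes". That layout, the regex, `parse_rows` and the sample text are all hypothetical; real election PDFs would need a pattern tuned to their actual columns.

```python
import re

# Hypothetical layout of one result row after PDF-to-text conversion:
#   "<serial> <candidate name> <party> <votes>", votes may contain commas
ROW = re.compile(r"^\s*(\d+)\s+(.+?)\s+([A-Z][A-Z()\-]*)\s+([\d,]+)\s*$")


def parse_rows(text):
    """Turn text lines into (serial, name, party, votes) tuples.

    Lines that do not match the row pattern (titles, totals) are skipped.
    """
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            serial, name, party, votes = m.groups()
            rows.append((int(serial), name, party, int(votes.replace(",", ""))))
    return rows


sample = """Constituency: Example North
1  Ramesh Kumar   INC   45,321
2  Sita Devi      BJP   44,910
Total votes polled 90,231"""
rows = parse_rows(sample)
print(rows)
```

Skipped lines, such as the "Constituency:" header, can be handled separately to tag each row with the constituency it belongs to.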

Not every spelling error is easily identifiable by the first letter

… with several names spelt wrong

These are, in fact, two different constituencies

But these are exactly the same

... and so are these

I’ve no idea if these are 2, or 3, constituencies!

… with the ability for the system to correct errors automatically
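A common way to correct spellings automatically is to match each raw name against a curated canonical list. Names with no close match are sent for manual review, because similar spellings can be genuinely different constituencies. This sketch uses Python's `difflib.get_close_matches`. The 0.85 cutoff, the canonical list and the misspellings are assumptions chosen for illustration; the slides do not describe how their system works.

```python
from difflib import get_close_matches


def normalise(raw_names, canonical, cutoff=0.85):
    """Map raw spellings to a canonical constituency list.

    Returns (mapped, review): mapped is {raw: canonical}, and review lists
    names with no canonical match above `cutoff`, left for a human to check.
    """
    lookup = {c.lower(): c for c in canonical}
    mapped, review = {}, []
    for raw in raw_names:
        match = get_close_matches(raw.lower(), list(lookup), n=1, cutoff=cutoff)
        if match:
            mapped[raw] = lookup[match[0]]
        else:
            review.append(raw)
    return mapped, review


canonical = ["Bangalore North", "Bangalore South", "Chandni Chowk"]
raw = ["Banglore North", "Chandni Chowk", "Chandini Chowk", "Bhopal"]
mapped, review = normalise(raw, canonical)
print(mapped)
print(review)  # ['Bhopal']
```

A high cutoff favours leaving a name for review over wrongly merging two real constituencies. Raising or lowering it trades manual effort against that risk.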

DATA IS EVERYWHERE

TRANSFORM THE DATA & ENRICH IT

EXTRACT THE METADATA

AUGMENT YOUR DATA SOURCES

COMMON COMPLAINT #3

THE DATA ISN’T RICH / CLEAN


COMMON COMPLAINT #1

WE DON’T HAVE THE TOOLS

LET’S LOOK AT 15 YEARS OF US BIRTH DATA

This is a dataset (1975 – 1990) that has been around for several years, and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known.

For example:
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?

Chart: more births vs. fewer births, on average, for each day of the year (from 1975 to 1990)
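A day-of-year chart like this can be built from daily birth counts. First average each calendar day (month, day) across years, then divide by the overall daily mean so that 1.0 means "an average day". This is a minimal sketch. The input shape (a dict from `datetime.date` to a count), the helper name and the numbers are hypothetical; none of it is the US dataset.

```python
from collections import defaultdict
from datetime import date


def day_of_year_index(daily_counts):
    """Average births for each (month, day), relative to the overall daily mean.

    daily_counts maps datetime.date -> number of births on that date.
    Values above 1.0 mean more births than an average day; below 1.0, fewer.
    """
    totals, days = defaultdict(int), defaultdict(int)
    for d, n in daily_counts.items():
        totals[(d.month, d.day)] += n
        days[(d.month, d.day)] += 1
    overall = sum(daily_counts.values()) / len(daily_counts)
    return {k: (totals[k] / days[k]) / overall for k in totals}


# Made-up counts for three calendar days over two years
counts = {
    date(1980, 7, 4): 80, date(1981, 7, 4): 70,
    date(1980, 7, 5): 100, date(1981, 7, 5): 110,
    date(1980, 12, 25): 60, date(1981, 12, 25): 60,
}
idx = day_of_year_index(counts)
print(idx[(12, 25)])  # 0.75
```

Each (month, day) value then maps to one cell of a 12 × 31 calendar grid, coloured from "fewer births" to "more births".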

THE PATTERN IN INDIA IS QUITE DIFFERENT

This is a birth date dataset obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns.

For example:
• Is there an aversion to the 13th, or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?

Chart: more births vs. fewer births, on average, for each day of the year (from 2007 to 2013)

THIS ADVERSELY IMPACTS CHILDREN’S MARKS

It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their recorded birth dates brought forward, they are younger than their records suggest, and these younger children suffer.

Children “born” on the 1st, 5th, 10th, 15th etc. of the month tend to score lower marks on average.

Chart: higher marks vs. lower marks, on average, for children born on a given day of the year (from 2007 to 2013)
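To check for this effect, split children by whether their recorded birth date falls on a "round" day of the month. Then compare the share of children in each group, and each group's average marks. This is a sketch under two assumptions. First, the set of round days is assumed to be {1, 5, 10, 15, 20, 25}, since the slide only says "1st, 5th, 10th, 15th etc.". Second, the input is assumed to be (birth_date, marks) pairs. `heaping_report` and the sample records are hypothetical.

```python
from datetime import date
from statistics import mean

# Assumption: which days of the month count as "round"
ROUND_DAYS = {1, 5, 10, 15, 20, 25}


def heaping_report(records):
    """Compare children recorded on round-numbered days with everyone else.

    records: iterable of (birth_date, marks). Returns the share of children
    on round days and the mean marks of each group. With births spread
    evenly, only about a fifth of children would fall on these six days.
    """
    rows = list(records)
    on_round = [m for d, m in rows if d.day in ROUND_DAYS]
    others = [m for d, m in rows if d.day not in ROUND_DAYS]
    return {
        "round_share": len(on_round) / len(rows),
        "round_mean": mean(on_round),
        "other_mean": mean(others),
    }


# Made-up records, for illustration only
records = [
    (date(2008, 6, 1), 60), (date(2008, 6, 1), 62), (date(2008, 6, 10), 58),
    (date(2008, 6, 17), 70), (date(2008, 6, 23), 68),
]
print(heaping_report(records))
```

A round-day share well above about 20% suggests that birth dates have been adjusted. A lower `round_mean` would match the pattern described on the slide.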

DEPLOY MODERN TOOLS

ANALYSIS IS EVERYWHERE

COMMON COMPLAINT #1

WE DON’T HAVE THE TOOLS

COMMON COMPLAINT #2

WE DON’T GET INSIGHTS

R, SAS, EXCEL, PYTHON, DATABASES, ML SERVICES

RESTAURANT FOUND AN UNUSUAL DIP IN SALES

A restaurant chain had data for every single transaction made over a few years. Plotting this as a time series showed them nothing unusual.

However, the same data on a calendar map reveals a very different story.

Specifically, at the bottom-left point-of-sale terminal, sales dip every Wednesday. At the bottom-right point-of-sale terminal, sales rise every Wednesday (almost as if to compensate for the loss).

It turns out that the manager closes the bottom-left counter every Wednesday afternoon because of a staff shortage, assuming that this causes no loss of sales. There is, however, a net loss every Wednesday.
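The calendar-map finding reduces to one comparison: average daily sales on Wednesdays versus other days. Run it for each terminal and for all terminals combined; the combined figure shows whether the shift between counters nets out. This is a sketch. The `(date, terminal, amount)` input shape, the terminal names and the figures are hypothetical.

```python
from collections import defaultdict
from datetime import date


def _avg_gap(day_totals, weekday):
    """Average daily total on `weekday` minus the average on all other days."""
    on = [v for d, v in day_totals.items() if d.weekday() == weekday]
    off = [v for d, v in day_totals.items() if d.weekday() != weekday]
    return sum(on) / len(on) - sum(off) / len(off)


def weekday_effect(sales, weekday=2):
    """Per-terminal and combined weekday effect on daily sales.

    sales: iterable of (date, terminal, amount). weekday follows
    date.weekday(), where Monday is 0, so 2 is Wednesday.
    Returns {terminal: gap, ..., "net": gap for all terminals together}.
    """
    per_terminal = defaultdict(lambda: defaultdict(float))
    combined = defaultdict(float)
    for d, terminal, amount in sales:
        per_terminal[terminal][d] += amount
        combined[d] += amount
    result = {t: _avg_gap(days, weekday) for t, days in per_terminal.items()}
    result["net"] = _avg_gap(combined, weekday)
    return result


# Made-up sales; 3 January 2024 was a Wednesday
sales = [
    (date(2024, 1, 2), "left", 100.0), (date(2024, 1, 2), "right", 100.0),
    (date(2024, 1, 3), "left", 40.0), (date(2024, 1, 3), "right", 130.0),
    (date(2024, 1, 4), "left", 100.0), (date(2024, 1, 4), "right", 100.0),
]
print(weekday_effect(sales))  # {'left': -60.0, 'right': 30.0, 'net': -30.0}
```

In this toy data the right-hand counter recovers only part of the left-hand counter's Wednesday drop, so the net effect stays negative.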

DEPLOY MODERN TOOLS

ANALYSIS IS EVERYWHERE

TEST DATASETS, ANONYMISATION, EVALUATION CRITERIA, IMPROVEMENT METRIC, DATA INFRASTRUCTURE, MODEL INFRASTRUCTURE, VISUALS INFRASTRUCTURE

SET UP AN ML PLATFORM

INFRASTRUCTURE FOR RAPIDITY

COMMON COMPLAINT #2

MODELS ARE COMPLICATED

COMMON COMPLAINT #3

IMPLEMENTATIONS ARE SLOW

Nationwide statistics on the behaviour and performance of students

Over 1,000 questions, each administered to several lakh students across the country

Having books improves reading ability

Having more books at home improves children’s performance in reading. (But children typically have only 1–10 books at home.)

… but the impact on the social score is smaller

While having more books improves the reading % score by 8%, it increases the social % score by only 4%.

Tuitions help very little

… but children of illiterate parents do worse

Watching TV occasionally is good

Children who watch TV every day don’t do as well as children who watch TV only once a week.

But children who never watch TV fare the worst.

Watching TV every day helps improve children’s reading ability a little bit more…

… but mathematical abilities fall dramatically at that point

Having educated parents helps most

This table shows the % improvement in score due to each factor.

THIS TECHNIQUE CAN BE APPLIED TO ANY DATASET

AUTOMATING ANALYSIS IN POULTRY FARMING

We group by every input factor

… and calculate the impact on every metric.

By moving from average to the best group, what’s the improvement?

The actual performance by each group is shown

Group:        0-3m | 3-6m | 6m-1yr | 1-2 yrs | > 2 yrs
Performance:  11   | 12.3 | 12.7   | 15.3    | 16.1
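The steps above are: group by every input factor, compute each group's mean for every metric, and measure the gain from moving from the overall average to the best group. They can be sketched as follows. Two assumptions apply: higher metric values are better, and records are plain dicts. The field names (`feed`, `shed`, `weight_g`) and the values are hypothetical; this is not the poultry dataset.

```python
from collections import defaultdict
from statistics import mean


def group_impact(rows, factors, metrics):
    """% gain from the overall mean to the best group's mean, per (factor, metric).

    rows: list of dicts. Assumes higher metric values are better.
    Returns {(factor, metric): (best_group, pct_improvement)}.
    """
    out = {}
    for metric in metrics:
        overall = mean(r[metric] for r in rows)
        for factor in factors:
            groups = defaultdict(list)
            for r in rows:
                groups[r[factor]].append(r[metric])
            best = max(groups, key=lambda g: mean(groups[g]))
            out[(factor, metric)] = (best, 100 * (mean(groups[best]) / overall - 1))
    return out


# Hypothetical farm records, for illustration only
rows = [
    {"feed": "A", "shed": "old", "weight_g": 2000},
    {"feed": "A", "shed": "new", "weight_g": 2400},
    {"feed": "B", "shed": "old", "weight_g": 1600},
    {"feed": "B", "shed": "new", "weight_g": 2000},
]
for key, (best, pct) in group_impact(rows, ["feed", "shed"], ["weight_g"]).items():
    print(key, best, round(pct, 1))
```

The group means computed inside the loop correspond to the "actual performance by each group" row. The percentage is the "improvement by moving from average to the best group". For metrics where lower is better, such as mortality, `max` would become `min`.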

Our product can create visualisations from data automatically, without any supervision.

Above is an example. Irrespective of the dataset, this visual shows which input parameters have a significant impact on the output. Another such example is the cluster scatterplot.

Only significant results shown
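The slides do not say which significance test decides what is shown. One assumption-light option is a two-sided permutation test on the difference in group means. It repeatedly shuffles the pooled values and counts how often a random split is at least as extreme as the observed one. The sketch below stands in for whatever test the product actually uses. The 2,000 permutations, the fixed seed and any 0.05 threshold are all assumptions.

```python
import random
from statistics import mean


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Shuffles the pooled values, re-splits them into groups of the original
    sizes, and counts how often the absolute mean difference is at least as
    large as the observed one. Add-one correction keeps p above zero.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_permutations + 1)


# Made-up groups: clearly separated vs. identical
print(permutation_p_value([10, 11, 12, 13], [20, 21, 22, 23]) < 0.05)  # True
print(permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4]))  # 1.0
```

An automated report could run this for each (factor, group) comparison and show only those below the chosen threshold. When many comparisons are screened at once, a multiple-testing correction would also be worth considering.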

68% correlation between AUD & EUR

Plot of 6 months of daily AUD vs. EUR values

Block of correlated currencies

… clustered hierarchically
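A block of correlated series like this can be produced with two steps. First compute the pairwise Pearson correlation. Then cluster hierarchically using a distance of 1 − r, so strongly correlated series merge first. This is a sketch. The slides do not name a linkage method, so average linkage and the 0.5 stopping distance are assumptions. The series names and values are made up and are not real exchange rates.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def cluster(series, max_distance=0.5):
    """Average-linkage agglomerative clustering with distance = 1 - correlation.

    series: {name: list of daily values}. Repeatedly merges the closest pair
    of clusters until no pair is within max_distance; returns sorted clusters.
    """
    dist = {frozenset(p): 1 - pearson(series[p[0]], series[p[1]])
            for p in combinations(series, 2)}
    clusters = [[name] for name in series]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs
            d = sum(dist[frozenset((a, b))] for a in clusters[i] for b in clusters[j])
            d /= len(clusters[i]) * len(clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > max_distance:
            break
        clusters[i] += clusters.pop(j)  # i < j, so index i is unaffected by pop
    return sorted(sorted(c) for c in clusters)


# Made-up daily values for three hypothetical currencies
series = {
    "CUR_A": [1.0, 1.1, 1.2, 1.3, 1.2],
    "CUR_B": [2.0, 2.2, 2.4, 2.6, 2.4],
    "CUR_C": [5.0, 4.0, 3.0, 2.0, 3.0],
}
print(cluster(series))  # [['CUR_A', 'CUR_B'], ['CUR_C']]
```

Reordering the rows and columns of a correlation heatmap by these clusters places correlated currencies next to each other, so they appear as blocks.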

Restaurant: Product Sales Correlation


DEPLOY MODERN TOOLS

ANALYSIS IS EVERYWHERE

CLUSTER PLOTS, CORRELATIONS, CROSS TABULATION, GROUP MEANS, KEYWORD EXTRACTION, NETWORK ANALYSIS, SANKEY DRILLDOWNS, SENTIMENT ANALYSIS…

INFRASTRUCTURE FOR RAPIDITY

COMMON COMPLAINT #3

IMPLEMENTATIONS ARE SLOW

BUILD AND USE TEMPLATES


S ANAND, CHIEF DATA SCIENTIST, GRAMENER

THE CAPABILITIES ARE IN YOUR REACH TODAY

EXPLORE THE ART OF DATA
