S ANAND, CHIEF DATA SCIENTIST, GRAMENER
MONETISING DATA
REMOVING YOUR MENTAL HURDLES
DATA
ANALYSIS
VISUALS
INSIGHTS
REPORTS
EXPLORATION
IS EVERYWHERE
COMMON COMPLAINT #1
WE DON’T HAVE DATA
“We have internal information. Getting information from outside is our challenge. There’s no way of doing that.”
– Senior Editor, Leading Media Company
India’s religions
United Kingdom’s religions
UNCOVER YOUR DARK DATA
Source: http://www.patrickcheesman.com/dark-data-problems-and-solutions/
• INACCESSIBLE data (e.g. technology is outdated)
• FORGOTTEN data (e.g. collected, but not actively used)
• UNCOLLECTED data (e.g. information exists, not digitized)
• SINGLE PURPOSE data (e.g. used for a specific purpose)
We’ve used network diagrams to detect terrorism, corporate fraud, product affinities and behavioural customer segmentation
AUGMENT YOUR DATA SOURCES
DATA IS EVERYWHERE
COMMON COMPLAINT #1
WE DON’T HAVE DATA
COMMON COMPLAINT #2
THE DATA ISN’T STRUCTURED
CRM DATA, SALES DATA, PRICING DATA, CALL RECORDS, WEB LOG DATA, VENDOR INVOICES, SOCIAL MEDIA DATA, CLICKTHROUGH DATA, COMPETITOR RESEARCH, CUSTOMER TRANSACTIONS, …
CENSUS DATA, E-COMMERCE PRICES, COMMODITY PRICES, STOCK MARKET DATA, FINANCIAL REPORTING, SOCIAL MEDIA DATA, MOBILE PENETRATION, AADHAR DATA, COURT CASE BRIEFS, SHAPE FILES, …
How does the Mahabharata, one of the largest epics at 1.8 million words, lend itself to text analytics?
Can this ‘unstructured data’ be processed to extract analytical insights?
What does sentiment analysis of this tome convey?
Is there a better way to explore relations between characters?
How can closeness of characters be analysed & visualized?
Visualising the Mahabharata
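One simple way to approximate the closeness of characters is to count how often two names appear in the same sentence. The sketch below assumes that approach; it is not necessarily how the visualisation in this deck was built. The sample sentences and the name list are invented for illustration and are not taken from the epic's text.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence(text, characters):
    """Count how often each pair of characters is named in the same sentence."""
    pairs = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[A-Za-z]+", sentence))
        # Sort so that each pair is always stored in the same order.
        present = sorted(name for name in characters if name in words)
        for a, b in combinations(present, 2):
            pairs[(a, b)] += 1
    return pairs


# Illustrative sample only (hypothetical sentences).
sample = (
    "Arjuna spoke to Krishna before the battle. "
    "Krishna counselled Arjuna. "
    "Bhima fought Duryodhana. "
    "Arjuna and Bhima returned."
)
names = ["Arjuna", "Krishna", "Bhima", "Duryodhana"]
counts = cooccurrence(sample, names)
for pair, n in counts.most_common():
    print(pair, n)
```

The pair counts can serve as edge weights in a network diagram. A real corpus would also need alias handling (one character known by several names), which this sketch leaves out.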
“Can we help CFOs understand what questions are being asked by investors and analysts during earnings releases? How is this different from the competition?”
– Product Head, Global Financial Services Firm
WHAT DO FINANCIAL ANALYSTS ASK IBM VS MSFT?
DATA IS EVERYWHERE
EXTRACT THE META DATA
AUGMENT YOUR DATA SOURCES
COMMON COMPLAINT #2
THE DATA ISN’T STRUCTURED
COMMON COMPLAINT #3
THE DATA ISN’T RICH / CLEAN
COMMON: WHO, WHAT, WHEN, WHERE
TEXT: TEXT KEYWORDS, SENTIMENT
IMAGE: VISUAL RECOGNITION
AUDIO / CALLS: TRANSCRIPTS, MOOD ANALYSIS
“Can we get the results of every single election in history, and create a portal to visualize these results?”
– Rajdeep Sardesai, CNN-IBN
The PDF files have a reasonably clear structure
… that translates into text that can be parsed
Not every spelling error is easily identifiable by the first letter
… with several names spelt wrong
These are, in fact, two different constituencies
But these are exactly the same
... and so are these
I’ve no idea if these are 2, or 3, constituencies!
… with the ability for the system to correct errors automatically
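Automatic correction of misspelt names can be sketched with fuzzy string matching from Python's standard library. This is an assumed technique, not necessarily the one used for the election portal. The constituency list and the `cutoff` value are illustrative.

```python
from difflib import get_close_matches


def canonicalise(name, known_names, cutoff=0.8):
    """Return the closest known spelling, or the cleaned name if none is close enough."""
    # Normalise case and collapse repeated whitespace before matching.
    cleaned = " ".join(name.upper().split())
    matches = get_close_matches(cleaned, known_names, n=1, cutoff=cutoff)
    return matches[0] if matches else cleaned


# Hypothetical reference list of canonical spellings.
known = ["BANGALORE NORTH", "BANGALORE SOUTH", "MYSORE"]

corrected = canonicalise("Bangalor  North", known)
unmatched = canonicalise("Delhi", known)
print(corrected, "|", unmatched)
```

A similarity cutoff cannot tell two genuinely different constituencies with near-identical names from one constituency spelt two ways, so ambiguous matches still need a manual review list.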
DATA IS EVERYWHERE
TRANSFORM THE DATA & ENRICH IT
EXTRACT THE META DATA
AUGMENT YOUR DATA SOURCES
COMMON COMPLAINT #3
THE DATA ISN’T RICH / CLEAN
DATA
ANALYSIS
VISUALS
INSIGHTS
REPORTS
EXPLORATION
IS EVERYWHERE
COMMON COMPLAINT #1
WE DON’T HAVE THE TOOLS
LET’S LOOK AT 15 YEARS OF US BIRTH DATA
This is a dataset (1975 – 1990) that has been around for several years, and has been studied extensively. Yet, a visualization can reveal patterns that are neither obvious nor well known.
For example,
• Are birthdays uniformly distributed?
• Do doctors or parents exercise the C-section option to move dates?
• Is there any day of the month that has unusually high or low births?
• Are there any months with relatively high or low births?
More births / Fewer births … on average, for each day of the year (from 1975 to 1990)
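The "average births for each day of the year" view can be computed by grouping daily counts by (month, day) across years and comparing each cell with the overall average. The sketch below assumes the data arrives as (date, births) pairs. The figures are invented, not taken from the US dataset.

```python
from collections import defaultdict
from datetime import date


def mean_births_by_day(records):
    """Average the daily birth counts for each (month, day) across all years."""
    totals = defaultdict(lambda: [0, 0])  # (month, day) -> [sum of births, number of years]
    for day, births in records:
        cell = totals[(day.month, day.day)]
        cell[0] += births
        cell[1] += 1
    return {key: total / years for key, (total, years) in totals.items()}


# Hypothetical counts for two calendar days over two years.
records = [
    (date(1975, 7, 4), 8000), (date(1976, 7, 4), 8200),
    (date(1975, 7, 5), 9500), (date(1976, 7, 5), 9700),
]
averages = mean_births_by_day(records)
overall = sum(averages.values()) / len(averages)
# Ratios above 1 mean more births than a typical day; below 1 mean fewer.
relative = {key: value / overall for key, value in averages.items()}
print(relative)
```

Mapping `relative` to a colour scale on a month-by-day grid gives the "more births / fewer births" heatmap described above.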
THE PATTERN IN INDIA IS QUITE DIFFERENT
This is a birth date dataset that’s obtained from school admission data for over 10 million children. When we compare this with births in the US, we see none of the same patterns.
For example,
• Is there an aversion to the 13th or is there a local cultural nuance?
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
More births / Fewer births … on average, for each day of the year (from 2007 to 2013)
THIS ADVERSELY IMPACTS CHILDREN’S MARKS
It’s a well-established fact that older children tend to do better at school in most activities. Since many children have had their birth dates brought forward, these younger children suffer.
Children “born” on the 1st, 5th, 10th, 15th etc. of the month tend to score lower marks on average.
• Are holidays avoided for births?
• Which months have a higher propensity for births, and why?
• Are there any patterns not found in the US data?
Higher marks / Lower marks … on average, for children born on a given day of the year (from 2007 to 2013)
DEPLOY MODERN TOOLS
ANALYSIS IS EVERYWHERE
COMMON COMPLAINT #1
WE DON’T HAVE THE TOOLS
COMMON COMPLAINT #2
WE DON’T GET INSIGHTS
R, SAS, EXCEL, PYTHON, DATABASES, ML SERVICES
RESTAURANT FOUND AN UNUSUAL DIP IN SALES
A restaurant chain had data for every single transaction made over a few years. Plotting this as a time series showed them nothing unusual.
However, the same data on a calendar map reveals a very different story.
Specifically, at the bottom-left point-of-sale terminal, sales dip every Wednesday. At the bottom-right point-of-sale terminal, sales rise every Wednesday (almost as if to compensate for the loss).
It turns out that the manager closes the bottom-left counter every Wednesday afternoon due to shortage of staff, assuming that it results in no loss of sales. There is, however, a net loss every Wednesday.
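The weekday effect that a calendar map makes visible can also be checked numerically by averaging daily sales per terminal and weekday. The sketch below assumes transactions arrive as (terminal, date, amount) tuples. The terminal labels and amounts are hypothetical.

```python
from collections import defaultdict
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def mean_sales_by_weekday(transactions):
    """Average daily sales for each (terminal, weekday) pair."""
    # Step 1: add up individual transactions into one total per terminal per day.
    daily = defaultdict(float)
    for terminal, day, amount in transactions:
        daily[(terminal, day)] += amount
    # Step 2: average those daily totals by weekday.
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of daily totals, number of days]
    for (terminal, day), amount in daily.items():
        cell = totals[(terminal, WEEKDAYS[day.weekday()])]
        cell[0] += amount
        cell[1] += 1
    return {key: total / days for key, (total, days) in totals.items()}


# Hypothetical transactions for two terminals, "A" and "B".
transactions = [
    ("A", date(2024, 1, 2), 100.0), ("A", date(2024, 1, 2), 50.0),
    ("A", date(2024, 1, 3), 60.0),
    ("B", date(2024, 1, 2), 120.0),
    ("B", date(2024, 1, 3), 170.0),
    ("A", date(2024, 1, 9), 130.0),
    ("A", date(2024, 1, 10), 40.0),
]
by_weekday = mean_sales_by_weekday(transactions)
print(by_weekday)
```

Summing the weekday averages across terminals shows whether a rise at one counter fully offsets a dip at another, which is the net-loss question raised above.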
DEPLOY MODERN TOOLS
ANALYSIS IS EVERYWHERE
TEST DATASETS, ANONYMISATION, EVALUATION CRITERIA, IMPROVEMENT METRIC, DATA INFRASTRUCTURE, MODEL INFRASTRUCTURE, VISUALS INFRASTRUCTURE
SET UP AN ML PLATFORM
INFRASTRUCTURE FOR RAPIDITY
COMMON COMPLAINT #2
MODELS ARE COMPLICATED
COMMON COMPLAINT #3
IMPLEMENTATIONS ARE SLOW
Nation-wide statistics on behaviour and performance of students
Over 1,000 questions, each administered to several lakhs of students across the country
Having books improves reading ability
Having more books at home improves the performance of children when it comes to reading. (But children typically have only 1-10 books at home.)
… but the impact in social is less
While having more books improves the reading % score by 8%, it only increases the social % by 4%.
Tuitions help very little
… but children of illiterate parents do worse
Watching TV occasionally is good
Children who watch TV every day don’t do as well as children who watch TV only once a week.
But children who never watch TV fare the worst.
Watching TV every day helps improve children’s reading ability a little bit more…
… but mathematical abilities fall dramatically at that point
Having educated parents helps most
This table shows the % improvement in score due to each factor
THIS TECHNIQUE CAN BE APPLIED TO ANY DATASET
AUTOMATING ANALYSIS IN POULTRY FARMING
We group by every input factor
… and calculate the impact on every metric.
By moving from average to the best group, what’s the improvement?
The actual performance by each group is shown
0-3m    3-6m    6m-1yr  1-2 yrs > 2 yrs
11      12.3    12.7    15.3    16.1
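The group-means technique described above (group by every input factor, compute every metric, then ask how much moving from the average to the best group would gain) can be sketched as follows. The field names and values are hypothetical, and the sketch assumes that a higher metric value is better.

```python
from collections import defaultdict
from statistics import mean


def group_means(rows, factor, metric):
    """Mean of `metric` for each distinct value of `factor`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[factor]].append(row[metric])
    return {group: mean(values) for group, values in groups.items()}


def improvement(rows, factor, metric):
    """Relative gain from moving the overall average up to the best group's average."""
    overall = mean(row[metric] for row in rows)
    best = max(group_means(rows, factor, metric).values())
    return (best - overall) / overall


def impact_table(rows, factors, metrics):
    """Improvement for every (factor, metric) combination."""
    return {(f, m): improvement(rows, f, m) for f in factors for m in metrics}


# Hypothetical farm records: two input factors, one output metric.
rows = [
    {"feed": "X", "shed": "old", "weight": 1.8},
    {"feed": "X", "shed": "new", "weight": 2.0},
    {"feed": "Y", "shed": "old", "weight": 2.2},
    {"feed": "Y", "shed": "new", "weight": 2.0},
]
table = impact_table(rows, factors=["feed", "shed"], metrics=["weight"])
for key, gain in sorted(table.items(), key=lambda item: -item[1]):
    print(key, f"{gain:.1%}")
```

Sorting by gain ranks the factors by potential impact. A production version would also test each difference for statistical significance before reporting it, since small groups can look best by chance.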
Our product can create visualisations from data automatically, without any supervision.
Above is an example. Irrespective of the dataset, this visual shows which input parameters have a significant impact on the output. Another such example is the cluster scatterplot.
Only significant results shown
68% correlation between AUD & EUR
Plot of 6 month daily AUD - EUR values
Block of correlated currencies
… clustered hierarchically
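A correlation matrix can be reordered so that correlated series sit next to each other by clustering on the distance 1 - correlation. The sketch below uses plain single-linkage agglomerative clustering and returns only the leaf order. This is an assumption about the general approach, not the product's actual algorithm, and the series values are invented.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_order(series):
    """Order series names so that highly correlated series end up adjacent."""
    clusters = [[name] for name in series]

    def distance(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(1 - pearson(series[a], series[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find and merge the two closest clusters.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: distance(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Hypothetical daily values for four currencies.
series = {
    "AUD": [1.0, 1.1, 1.2, 1.1, 1.3],
    "JPY": [5.0, 4.8, 4.9, 4.6, 4.5],
    "EUR": [2.0, 2.1, 2.3, 2.2, 2.5],
    "CHF": [3.0, 2.9, 2.9, 2.7, 2.6],
}
order = cluster_order(series)
print(order)
```

Drawing the correlation matrix with rows and columns in `order` makes blocks of correlated currencies appear along the diagonal.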
Restaurant: Product Sales Correlation
DEPLOY MODERN TOOLS
ANALYSIS IS EVERYWHERE
CLUSTER PLOTS, CORRELATIONS, CROSS TABULATION, GROUP MEANS, KEYWORD EXTRACTION, NETWORK ANALYSIS, SANKEY DRILLDOWNS, SENTIMENT ANALYSIS, …
INFRASTRUCTURE FOR RAPIDITY
COMMON COMPLAINT #3
IMPLEMENTATIONS ARE SLOW
BUILD AND USE TEMPLATES
DATA
ANALYSIS
VISUALS
INSIGHTS
REPORTS
EXPLORATION
IS EVERYWHERE
S ANAND, CHIEF DATA SCIENTIST, GRAMENER
THE CAPABILITIES ARE IN YOUR REACH TODAY
EXPLORE THE ART OF DATA