Social Media Analytics using Azure Technologies Koray Kocabaş


Page 1: Social media analytics using Azure Technologies

Social Media Analytics using Azure Technologies

Koray Kocabaş

Page 2: Social media analytics using Azure Technologies

#sqlsatistanbul

Sponsors

Media Sponsor

Main Sponsor

Swag Sponsor

Page 3: Social media analytics using Azure Technologies

What do we need?

Just a quick blog post, update on LinkedIn, or a tweet on Twitter is all we need.

Page 4: Social media analytics using Azure Technologies

Session Evaluations

Evaluate sessions and get a chance for the raffle: http://spoke.at/sqlsat451

Page 5: Social media analytics using Azure Technologies

About Me...

Koray Kocabaş
Data Platform (SQL Server) MVP
Yemeksepeti Business Intelligence
Bahcesehir University Instructor

@koraykocabas
https://tr.linkedin.com/in/koraykocabas

Blog: http://www.misjournal.com
E-Mail: [email protected]

Page 6: Social media analytics using Azure Technologies

The Data Deluge

Page 7: Social media analytics using Azure Technologies

What kinds of solutions use Big Data?

• Clickstream analysis to find buying patterns
• Sentiment analysis for text data
• Fraud detection; forensic analysis
• Machine learning
• Healthcare research
• Predictive maintenance

Just dream it. Data is everywhere!

Page 8: Social media analytics using Azure Technologies
Page 9: Social media analytics using Azure Technologies

Twitter launched in 2006

Active users per month: ~316 million (August), ~320 million (October)

80% of users are on mobile!

Tweets per second: 6,000
Tweets per day: ~500 million
Tweets per year: ~200 billion

Twitter generates a lot of data (12 TB per day)
90% of buyers trust peer recommendations

55% of Twitter users are female
The average Twitter user has 27 followers
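The per-day and per-year tweet figures follow from the per-second rate. A quick arithmetic check in Python, assuming a constant 6,000 tweets per second and a 365-day year (both simplifications, so the results only approximate the rounded figures on the slide):

```python
# Assumption: a steady rate of 6,000 tweets per second, taken from the slide.
TWEETS_PER_SECOND = 6_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

tweets_per_day = TWEETS_PER_SECOND * SECONDS_PER_DAY
tweets_per_year = tweets_per_day * 365  # assumption: 365-day year

print(f"{tweets_per_day:,}")   # 518,400,000      (slide: ~500 million)
print(f"{tweets_per_year:,}")  # 189,216,000,000  (slide: ~200 billion)
```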

Page 10: Social media analytics using Azure Technologies

Why is it so popular?

Page 11: Social media analytics using Azure Technologies
Page 12: Social media analytics using Azure Technologies
Page 13: Social media analytics using Azure Technologies

Twitter Problems

• Event-based data
• Unstructured data
• Detailed event information
• Streaming
• Who is the influencer?

Dashboards for Tweets

• TweetTracker
• TweetArchivist
• Radian6
• Sysomos
• TweetDeck
• Hootsuite

Page 14: Social media analytics using Azure Technologies

PROBLEMS...

Page 15: Social media analytics using Azure Technologies

1. Collect Twitter Data & Get Simple Information

2. Data Enrichment

3. Store Semi-Structured Data

4. Analyze Semi-Structured Data

5. Visualize Meaningful Results

Page 16: Social media analytics using Azure Technologies

Page 17: Social media analytics using Azure Technologies

Collect Twitter Data & Get Simple Information

Page 18: Social media analytics using Azure Technologies

Page 19: Social media analytics using Azure Technologies

Real-Time Analytics

• Intake millions of events per second
• Process data from connected devices/apps
• Detect patterns and anomalies in streaming data
• Transform, augment, correlate, temporal operations
• No hardware (PaaS offering)
• Up and running in a few clicks (and within minutes)
• No performance tuning
• Efficiently pay only for usage
• Not paying for idle resources
• Low startup costs
• Scale from small to large when required
• Only SQL queries are needed (other solutions, such as Apache Storm, can require thousands of lines of code)

Page 20: Social media analytics using Azure Technologies

Stream Analytics Query Language Functions

DML Statements

• SELECT

• FROM

• WHERE

• GROUP BY

• HAVING

• CASE

• JOIN

• UNION

Windowing Extensions

• Tumbling Window

• Hopping Window

• Sliding Window

• Duration

Aggregate Functions

• SUM

• COUNT

• AVG

• MIN

• MAX

Scaling Functions

• WITH

• PARTITION BY

Date and Time Functions

• DATENAME

• DATEPART

• DAY

• MONTH

• YEAR

• DATETIMEFROMPARTS

• DATEDIFF

• DATEADD

String Functions

• LEN

• CONCAT

• CHARINDEX

• SUBSTRING

Statistical Functions

• VAR

• VARP

• STDEV

Page 21: Social media analytics using Azure Technologies

[Figure: timeline from 0 to 30 seconds]

Page 22: Social media analytics using Azure Technologies

Tumbling Windows

The count of tweets every 10 seconds

[Figure: timeline from 0 to 30 seconds divided into 10-second windows, with counts 4, 4, and 5]

SELECT Topic, COUNT(*) AS Count
FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
GROUP BY Topic, TumblingWindow(second, 10)
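As a rough illustration of what the tumbling-window query computes, the Python sketch below assigns each event to exactly one fixed 10-second window and counts events per topic. The sample events and the function name `tumbling_counts` are made up for this sketch, and the (start, end] boundary handling is an assumption about how the service closes windows rather than something stated on the slide.

```python
import math
from collections import Counter

def tumbling_counts(events, size):
    """Count events per (window_end, topic) using non-overlapping windows.

    Assumption: an event with timestamp ts belongs to the single window
    (window_end - size, window_end].
    """
    counts = Counter()
    for ts, topic in events:
        window_end = math.ceil(ts / size) * size
        counts[(window_end, topic)] += 1
    return dict(counts)

# Hypothetical sample: (seconds since start, topic)
events = [(1, "sql"), (3, "sql"), (6, "azure"), (9, "sql"),
          (12, "sql"), (14, "azure"), (17, "sql"), (19, "sql"),
          (21, "sql"), (23, "sql"), (25, "azure"), (27, "sql"), (29, "sql")]

counts = tumbling_counts(events, 10)

# Sum over topics to get one total per window.
totals = Counter()
for (window_end, _topic), n in counts.items():
    totals[window_end] += n
print(sorted(totals.items()))  # [(10, 4), (20, 4), (30, 5)]
```

Because the windows never overlap, every event is counted exactly once.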

Page 23: Social media analytics using Azure Technologies

Hopping Windows

Every 5 seconds, give me the count of tweets over the last 10 seconds, by topic

[Figure: timeline from 0 to 30 seconds]

SELECT Topic, COUNT(*) AS Count
FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
GROUP BY Topic, HoppingWindow(second, 10, 5)
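A hopping window differs from a tumbling one in that windows overlap, so a single event can be counted more than once. The sketch below models the query with a 10-second window that advances every 5 seconds. The name `hopping_counts`, the sample data, and the (start, end] boundaries are assumptions made for illustration.

```python
import math
from collections import Counter

def hopping_counts(events, size, hop):
    """Count events per (window_end, topic) for overlapping windows.

    Assumption: windows end at multiples of `hop` and cover
    (window_end - size, window_end].
    """
    counts = Counter()
    for ts, topic in events:
        end = math.ceil(ts / hop) * hop  # earliest window end at or after ts
        while end - size < ts:           # window still contains ts
            counts[(end, topic)] += 1
            end += hop
    return dict(counts)

# Hypothetical sample: (seconds since start, topic)
sample = [(1, "sql"), (6, "sql"), (12, "sql")]
print(hopping_counts(sample, 10, 5))
# {(5, 'sql'): 1, (10, 'sql'): 2, (15, 'sql'): 2, (20, 'sql'): 1}
```

With a size of 10 and a hop of 5, each event lands in two windows, which is why three events produce a total count of six.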

Page 24: Social media analytics using Azure Technologies

Sliding Windows

Report topics whose tweet count exceeds a threshold of 8 within a 5-second window

[Figure: timeline from 0 to 30 seconds]

SELECT Topic, COUNT(*) AS Count
FROM sqlsaturdaystream TIMESTAMP BY CreatedAt
GROUP BY Topic, SlidingWindow(second, 5)
HAVING COUNT(*) > 8
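The sliding-window query can be approximated in Python by keeping, per topic, only the timestamps from the last 5 seconds and checking the count each time an event arrives. This is a simplified sketch: it evaluates the window only on arrival, whereas the service may also emit results when events leave the window. The function name `sliding_alerts` and the sample data are hypothetical.

```python
from collections import defaultdict, deque

def sliding_alerts(events, size, threshold):
    """Return (ts, topic, count) whenever a topic exceeds `threshold`
    events inside the window (ts - size, ts]."""
    alerts = []
    recent = defaultdict(deque)  # topic -> timestamps still inside the window
    for ts, topic in sorted(events):
        window = recent[topic]
        window.append(ts)
        # Drop timestamps that have fallen out of the window.
        while window[0] <= ts - size:
            window.popleft()
        if len(window) > threshold:
            alerts.append((ts, topic, len(window)))
    return alerts

# Hypothetical burst: nine "sql" tweets within 5 seconds, then one late tweet.
burst = [(t, "sql") for t in (1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 9)]
print(sliding_alerts(burst, size=5, threshold=8))  # [(5, 'sql', 9)]
```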

Page 25: Social media analytics using Azure Technologies

Stream Analytics

Event Hub

Page 26: Social media analytics using Azure Technologies

Data Enrichment

Page 27: Social media analytics using Azure Technologies

[Figure: Azure Machine Learning diagram with three columns: Data, Azure Machine Learning, Consumers]

Stages: Business problem, Modeling, Deployment, Business value

Data

• Local storage: upload data from PC…
• Cloud storage: Azure Storage, Azure Table, Hive, etc.

Azure Machine Learning

• ML Studio (Web IDE)
• Workspace: Experiments, Datasets, Trained models, Notebooks, Access settings
• ML Web Services (REST API Services)
• Azure Marketplace (Applications store)
• Azure ML Gallery (community)
• Data, Model, API; Manage API

Consumers

• Excel
• Business Apps

Page 28: Social media analytics using Azure Technologies

Page 29: Social media analytics using Azure Technologies

https://sites.google.com/site/miningtwitter/questions/sentiment/sentiment

http://www.slideshare.net/ajayohri/twitter-analysis-by-kaify-rais

Sentiment140 (formerly known as "Twitter Sentiment") allows you to discover the sentiment of a brand, product, or topic on Twitter.

Page 30: Social media analytics using Azure Technologies

SQL Server 2016 CTP 3.1

Revolution R Open 3.2.2 for Revolution R Enterprise

Revolution R Enterprise 7.5.0

Revolution R Enterprise is able to deliver speeds 42 times faster than competing technology from SAS.

Microsoft announced on January 23, 2015 that they had reached an agreement to purchase Revolution Analytics for an as yet undisclosed amount.

Page 31: Social media analytics using Azure Technologies

The Klout Score is a number between 1 and 100 that represents your influence.

Klout collects and normalizes more than 12 billion signals a day

Hive data warehouse of more than 1 trillion rows

Klout was acquired by Lithium Technologies for $200 million

Page 32: Social media analytics using Azure Technologies

Store Semi-Structured Data

Analyze Semi-Structured Data

Page 33: Social media analytics using Azure Technologies

Page 34: Social media analytics using Azure Technologies

Developed by Facebook. Later it was adopted in Apache as an open source project.

A data warehouse infrastructure built on top of Hadoop for providing data summarization, query and analysis

Integration between Hadoop and BI and visualization

Provides a SQL-like language called HiveQL to query data

Supports CREATE INDEX and partitioning

Commonly described as not supporting UPDATE (this isn't correct)

Hive provides users, groups, and roles, but it's not designed for high security.

Console (hive>), script, ODBC/JDBC, SQuirreL, HUE, Web Interface, etc.

Most popular Business Intelligence Tools support Hive

Page 35: Social media analytics using Azure Technologies

Data Types

Primitive Data Types: int, bigint, float, double, boolean, decimal, string, timestamp, date etc.

Complex Data Types: arrays, maps, structs

ARRAY<string>: workplace: istanbul, ankara
STRUCT<sex:string,age:int>: Female,25
MAP<string,int>: SOLR:92
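For readers more familiar with general-purpose languages, the three complex types above map naturally onto a list, a record, and a dictionary. The Python sketch below mirrors the slide's example values. The column names `person` and `scores` are hypothetical, since the slide only names `workplace`.

```python
from dataclasses import dataclass

@dataclass
class Person:
    """Mirrors STRUCT<sex:string,age:int>."""
    sex: str
    age: int

# One row with the three complex column types from the slide.
row = {
    "workplace": ["istanbul", "ankara"],  # ARRAY<string>
    "person": Person("Female", 25),       # STRUCT<sex:string,age:int> (hypothetical column name)
    "scores": {"SOLR": 92},               # MAP<string,int> (hypothetical column name)
}

# Access patterns analogous to HiveQL's workplace[0], person.age, scores['SOLR']
print(row["workplace"][0])    # istanbul
print(row["person"].age)      # 25
print(row["scores"]["SOLR"])  # 92
```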

Hive | RDBMS
SQL interface | SQL interface
Focus on analytics | May focus on online or analytics
No transactions | Transactions usually supported
Partition adds, no random inserts | Random insert and update supported
Distributed processing via map/reduce | Distributed processing varies by vendor (if available)
Scales to hundreds of nodes | Seldom scales beyond 20 nodes
Built for commodity hardware | Often built on proprietary hardware (especially when scaling out)
Low cost per petabyte | What's a petabyte? :) (note: Are you sure?)

Page 36: Social media analytics using Azure Technologies

http://hortonworks.com/wp-content/uploads/downloads/2013/08/Hortonworks.CheatSheet.SQLtoHive.pdf

Page 37: Social media analytics using Azure Technologies

Page 38: Social media analytics using Azure Technologies

Originally developed at Yahoo! (Huge contributions from Hortonworks, Twitter)

A platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs

Processing large semi-structured data sets using Hadoop MapReduce

Write complex MapReduce jobs using a simple script language (Pig Latin)

Pig provides a number of aggregation functions (AVG, COUNT, SUM, MAX, MIN, etc.)

Developers can write user-defined functions (UDFs)

Console (grunt), script, java, HUE (Hadoop User Experience by Cloudera)

Easy to use and efficient

Page 39: Social media analytics using Azure Technologies

Data Types

Simple Data Types: int, float, double, chararray (UTF-8), bytearray
Complex Data Types: map (key, value), tuple, bag (list of tuples)

Commands

Loading: LOAD, STORE, DUMP
Filtering: FILTER, FOREACH, DISTINCT
Grouping: JOIN, GROUP, COGROUP, CROSS
Ordering: ORDER, LIMIT
Merging & Split: UNION, SPLIT

SQL SCRIPT | PIG SCRIPT
SELECT * FROM TABLE | A = LOAD 'DATA' USING PigStorage('\t') AS (col1:int, col2:int, col3:int);
SELECT col1+col2, col3 FROM TABLE | B = FOREACH A GENERATE col1+col2, col3;
SELECT col1+col2, col3 FROM TABLE WHERE col3>10 | C = FILTER B BY col3>10;
SELECT col1, col2, sum(col3) FROM X GROUP BY col1, col2 | D = GROUP A BY (col1,col2); E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
... HAVING sum(col3) > 5 | F = FILTER E BY $2>5;
... ORDER BY col1 | G = ORDER F BY $0;
SELECT DISTINCT col1 FROM TABLE | I = FOREACH A GENERATE col1; J = DISTINCT I;
SELECT col1, COUNT(DISTINCT col2) FROM TABLE GROUP BY col1 | K = GROUP A BY col1; L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
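To make the step-by-step nature of the Pig script concrete, the sketch below reproduces the GROUP, SUM, FILTER, ORDER, and DISTINCT-count relations in plain Python on a tiny in-memory data set. The sample rows and function names are invented for illustration. Pig itself would run these steps as distributed jobs rather than in a single process.

```python
from collections import defaultdict

# Hypothetical rows of (col1, col2, col3), standing in for relation A.
A = [(1, 10, 3), (1, 10, 4), (1, 11, 1), (2, 20, 2), (2, 20, 1), (3, 30, 9)]

def group_sum_filter_order(rows, min_total):
    # D = GROUP A BY (col1, col2); E = FOREACH D GENERATE FLATTEN(group), SUM(A.col3);
    sums = defaultdict(int)
    for col1, col2, col3 in rows:
        sums[(col1, col2)] += col3
    e = [(c1, c2, total) for (c1, c2), total in sums.items()]
    # F = FILTER E BY $2 > min_total;
    f = [r for r in e if r[2] > min_total]
    # G = ORDER F BY $0;
    return sorted(f, key=lambda r: r[0])

def count_distinct_col2(rows):
    # K = GROUP A BY col1; L = FOREACH K {M = DISTINCT A.col2; GENERATE FLATTEN(group), COUNT(M);}
    distinct = defaultdict(set)
    for col1, col2, _col3 in rows:
        distinct[col1].add(col2)
    return {col1: len(values) for col1, values in distinct.items()}

print(group_sum_filter_order(A, 5))  # [(1, 10, 7), (3, 30, 9)]
print(count_distinct_col2(A))        # {1: 2, 2: 1, 3: 1}
```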

Page 40: Social media analytics using Azure Technologies

Ohhh Finally Demo Time!

Page 41: Social media analytics using Azure Technologies

Visualize Meaningful Results

Page 42: Social media analytics using Azure Technologies

Page 43: Social media analytics using Azure Technologies

Big Data Analytics, Implementing Big Data Analysis, Big Data Analytics with HDInsight, Big Data and Business Analytics Immersion, Getting Started with Microsoft Azure Machine Learning

Real World Big Data in Azure, Big Data on Amazon Web Services, Reporting with MongoDB, Cloud Business Intelligence, HDInsight Deep Dive: Storm HBase and Hive, Data Science & Hadoop Workflows at Scale With Scalding, SQL on Hadoop - Analyzing Big Data with Hive

Introduction to Big Data Analytics, Machine Learning with Big Data, Big Data Analytics for Healthcare, Data Science at Scale, The Data Scientist's Toolbox, R Programming

Master Big Data and Hadoop Step by Step, Hadoop Essentials, Hadoop Starter Kit, Data Analytics using Hadoop eco system, Big Data: How Data Analytics Is Transforming the World, Applied Data Science with R, Hadoop Enterprise Integration

Data Science and Analytics in Context, Introduction to Big Data with Spark, Data Science and Machine Learning Essentials, Machine Learning for Data Science and Analytics, Statistical Thinking for Data Science and Analytics

Page 44: Social media analytics using Azure Technologies
