55
User Behavior Analysis in Big Data Morteza(Mori) Zihayat

User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 1/5518/11/2016

User Behavior Analysis in Big Data

Morteza(Mori) Zihayat

Page 2: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 2/5518/11/2016

About Myself Mitacs Elevate Postdoctoral Research Fellow

Faculty of Information, University of Toronto

Big Data Scientist

Spectrum Computing, IBM

Globe and Mail

Main research interests

User modeling

Big Data Mining and Engineering

Finding Meaningful Patterns from Structured and Unstructured Big Data

Parallel and Distributed Data Mining

Social Network Analysis

Page 3: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 3/5518/11/2016

Outline

Introduction to Big Data

User Modeling in Digital Media

Depression Acuity Detection

Cogniciti: An Online Brain Health Assessment

Conclusion

Page 4: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 4/5518/11/2016

Outline

Introduction to Big Data

User Modeling in Digital Media

Depression Acuity Detection

Cogniciti: An Online Brain Health Assessment

Conclusion

Page 5: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 5/5518/11/2016

Big Data

50 MILLION

TWEETS PER DAY

EACH DEVICE GENERATES

156 MB OF DATA PER DAY

50 MILLION DEVICES200 Gigabytes

THE SIZE OF A SINGLE

SEQUENCED HUMAN GENOME

150 Exabytes

HEALTHCARE DATA

0.5% ever analyzed and used

(MIT Technology review)

Value

Volume

Velocity

Variety

Veracity

Page 6: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 6/5518/11/2016

Big Data to Data Science

Data Scientist:

The Sexiest Job of the 21st Century

Harvard

Business Review

Oct. 2012

(c) 2012 Biocomicals

by Dr. Alper Uzon

Page 7: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 7/5518/11/2016

Challenges

Read/Write to disk is slow

Use multiple disks for parallel read

Hardware failure

Single machine/disk failure

Keep multiple copies of data

How do we merge data from different reads

Distributed processing or Hadoop MapReduce

Page 8: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 8/5518/11/2016

Apache Hadoop

Hadoop is an open-source software framework written in Java

Distributed storage

Computer clusters

Two main components

HDFS (Hadoop Distributed File System)

Provides Distributed Storage

MapReduce (Distributed Data Processing Model)

Provides Distributed Processing

Page 9: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 9/5518/11/2016

MapReduce

MapReduce is a method for distributing a task across multiple nodes

Consists of two phases

Map

Reduce

The data is process in the form of <key,value>

Each map task processes a discrete portion of the overall data

After all Maps are complete, the system distributes the intermediate data to nodes which

perform Reduce phase (aggregation)

Page 10: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 10/5518/11/2016

Example: Word counting

• Counting the number of occurrences of each word in a large collection of documents

• Input: documents

• Output: <word,frequency>

Divide collection of documents among the

class.

Each students gives count of individual word

in a document. Repeats for assigned quota of documents

Sum up the counts from all the documents

to give final answer

Page 11: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 11/5518/11/2016

Word Count Execution

the quick

brown

fox

the fox

ate the

cow

how now

brown

cow

Student 1

Student 2

Student 3

Student 1

Student 2

brown, 2

fox, 2

how, 1

now, 1

the, 3

ate, 1

cow, 2

quick, 1

the, 1

brown, 1

fox, 1

quick, 1

the, 1

fox, 1

the, 1

ate,1

cow,1

how, 1

now, 1

brown, 1

cow,1

brown, 1

brown, 1

Input Output

Page 12: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 12/5518/11/2016

Word Count Execution

the quick

brown

fox

the fox

ate the

cow

how now

brown

cow

Map

Map

Map

Reduce

Reduce

brown, 2

fox, 2

how, 1

now, 1

the, 3

ate, 1

cow, 2

quick, 1

the, 1

brown, 1

fox, 1

quick, 1

the, 1

fox, 1

the, 1

ate,1

cow,1

how, 1

now, 1

brown, 1

cow,1

brown, 1

brown, 1

Input OutputMap tasks Reduce tasks

Page 13: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 13/5518/11/2016

Problem #1

MapReduce I/O sandbags runtime for advanced analytics.

Must persist results after each pass through data

Advanced analytics often requires multiple passes through data

Hadoop

Storage

Hadoop

StorageCompute Store

Page 14: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 14/5518/11/2016

Example

Goal: find best line separating two sets of points

target

random initial line

0

500

1000

1500

2000

2500

3000

3500

4000

4500

1 5 10 20 30

Run

nin

g T

ime

(s)

Number of Iterations

Hadoop

Page 15: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 15/5518/11/2016

Apache Spark

A response to limitations in the

MapReduce

UC Berkeley’s AMP Lab (2009)

Fully open sourced in 2010

15

Page 16: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 16/5518/11/2016

Spark Performance

Machine Learning

100x faster than MapReduce

Queries (Shark)

100x faster than Hive

Streaming

2X throughput of Storm

Graph (GraphX)

10X faster than MapReduce

Page 17: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 17/5518/11/2016

Outline

Introduction to Big Data

User Modeling in Digital Media

Depression Acuity Detection

Cogniciti: An Online Brain Health Assessment

Conclusion

Page 18: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 18/5518/11/2016

The Globe and Mail

Page 19: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 19/5518/11/2016

Digital Media

Dataset Characteristics:

Attribute: 246

Year: 2014-2015

Size: 2,842 GB

Year 2014 (Jan-Jul):

Number of Records: 264,735,412

Number of Visits: 51,748,518

Number of Users: 19,760,853

Page 20: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 20/5518/11/2016

Preprocessing

Data Preprocessing and Cleaning

Filtering out irrelevant hits

Extracting the user types

Extracting the event of interest

Computing the time spent

Roll-up form hit to visit and the user

Converting to expressive format (i.e., json)

Page 21: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 21/5518/11/2016

Frequency-based News Recommendation

News articles are not independent

Pattern(behavior): a list of visited articles

Finding sets of co-occurrence news articles

• Extract rules from the patterns

• 𝑛𝑒𝑤𝑠1 → 𝑛𝑒𝑤𝑠2

Offline

Online

Web Clickstreams

• Discover Frequent Patterns from user logs

• <News1,New2>Phase 1

Phase 2

• Recommend articlesPhase 3

Page 22: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 22/5518/11/2016

Frequent Pattern Mining

Frequent Pattern Mining (FPM)

FPM is a fundamental research topic in data mining

Example application

Discover sets of items (i.e., itemsets) that are frequently purchased together by customers

TID Transaction

T1 {Bread, Milk}

T2 {Bread, Milk}

T3 {Bread, Milk, Diaper, Beer}

T4 {Bread, Milk, Diaper, Beer}

T5 {Diamond, Necklace}

T6 {Diamond, Necklace}

Minimum support threshold: 60%

Sup({Bread, Milk}) = 4/6 = 66.6%

{Bread, Milk} is a frequent itemset

Page 23: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 23/5518/11/2016

Domain Driven Actionable Knowledge Discovery

The identified patterns are handed over to business people

They cannot interpret the patterns for business use

There are many patterns/not informative

Not interested to business needs.

How to interpret the patterns to business actions

Page 24: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 24/5518/11/2016

Insufficiency of Frequent Pattern Mining

In Market Analysis

Business objective: Increase Revenue

May lose infrequent but valuable patterns

May present too many frequent but unprofitable

patterns

Cannot find patterns having high profits

Which patterns can bring high

profits and make high revenue?

Page 25: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 25/5518/11/2016

A Motivation Example

TID Transaction

T1 {Bread(1), Milk(1)}

T2 {Bread(1), Milk(1)}

T3 {Bread(1), Milk(1), Diaper(3), Beer(6)}

T4 {Bread(1), Milk(1), Diaper(3), Beer(6)}

T5 {Diamond(1), Necklace(1)}

T6 {Diamond(1), Necklace(1)}

Item Unit Profit

Bread 20

Milk 30

Diamond 1,000

Necklace 300

Diaper 300

Beer 70

{Diamond, Necklace}: $2600 {Diaper, Beer}:$2640

{Bread, Milk}: $200

Page 26: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 26/5518/11/2016

High Utility Sequential Pattern Mining

Given a set of sequences: find all sequences whose utility is > a user-specified minimum

threshold

Each item has quantity in a transaction

Each item has a value (e.g., price)

Items Profit

Milk $3

Egg $2

Birthday Cake $20

Birthday Card $10

Bread $1

CID TID

C1 T1 {(Bread,2), (Milk,6)}

C1 T2 {(Birthday Card,2)}

C1 T3 { (Birthday Cake,2), (egg,3)}

C2 T3 {(Bread,2), (Milk,4), (Yoghurt,3), (Tuna,5)}

C2 T4 {(egg,5), (Pizza,4), (Juice,2)}

C3 T5 { (Bread,2),( Yoghurt,4), (Milk,3)}

C3 T6 {(Milk,1), (cheese,2)}

Page 27: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 27/5518/11/2016

What is utility? Utility of item in a transaction = internal utility (quantity of items in the transaction) x external utility (profit of the item).

U(Milk,T1 ) = 3 × 6 = 18

Utility of itemset in a sequence = sum of utilities of its items:

U({Bread, Milk}, C1)= 2 × 1 + 6 × 3 = 20

Utility of sub-sequence in a sequence = sum of its itemsets’ utilities

U(<{Milk}{egg}>,C1) = 3 × 6 + 3 × 2 = 24

If more than one occurrence, then maximum value among occurrences

Items Profit

Milk $3

Egg $2

Birthday Cake $20

Birthday Card $10

Bread $1

CID TID

C1 T1 {(Bread,2), (Milk,6)}

C1 T2 {(Birthday Card,2)}

C1 T3 { (Birthday Cake,2), (egg,3)}

C2 T3 {(Bread,2), (Milk,4), (Yoghurt,3), (Tuna,5)}

C2 T4 {(egg,5), (Pizza,4), (Juice,2)}

C3 T5 { (Bread,2),( Yoghurt,4), (Milk,3)}

C3 T6 {(Milk,1), (cheese,2)}

Page 28: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 28/5518/11/2016

Problem 1: Actionability

- Frequent patterns

- 75% frequency

- 2 minutes

- Actionable patterns

- 30% frequency

- 12 minutes

𝑁𝑒𝑤𝑠1

𝑁𝑒𝑤𝑠2 𝑁𝑒𝑤𝑠3

𝑁𝑒𝑤𝑠6 𝑁𝑒𝑤𝑠7

Actionability Business objective

GAP

Page 29: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 29/5518/11/2016

Problem 2: News cold-start

Recent news

Reading behavior

Visited articles

Newly-published articles

Page 30: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 30/5518/11/2016

Probabilistic Language Model

Article distribution over topics

User Based Relational Topic Model

Attractive Topic based

Rules

Personalized News Recommendation

User Logs

RecommendedArticles

Utility-based Reading Behavior

Discovery

News reading

behaviorsRule Engine

Attractive article-based

Rules

First Stage: Utility-based recommendation rules

Second Stage: Topic-based recommendation rulesNews Corpus

PENSYS: The Proposed Framework

Page 31: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat

FIRST STAGE: NEWS LEVEL

Page 32: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 32/5518/11/2016

Stage 1: Utility-based Pattern Mining

Cursor

movement

Time Spent

Engagement Measures

Editorial

Board

Freshness

External knowledge

Internal UtilityFunction (F)

External UtilityFunction (G)

𝑈𝑡𝑖𝑙𝑖𝑡𝑦 = 𝑈 (𝐹, 𝐺) Patterns with Utility ≥ Threshold

Attractive news reading behavior

𝑈𝑡𝑖𝑙𝑖𝑡𝑦(𝑛𝑒𝑤𝑠) = 𝑇𝑖𝑚𝑒 𝑠𝑝𝑒𝑛𝑡 × 𝐹𝑟𝑒𝑠ℎ𝑛𝑒𝑠𝑠

Page 33: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 33/5518/11/2016

Stage 1 Overview

• Extract Utility-Based rules from the patterns

• 𝑛𝑒𝑤𝑠1 → 𝑛𝑒𝑤𝑠2

• Recommendation

Offline

Online

• Discover High Utility Patterns from user logs

• <News1,New2>Phase 1

Phase 2

Phase 3

Web Clickstreams

Page 34: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 34/5518/11/2016

Top-2 High Utility Patterns

Num. Set of news Time

(mins)

Support

1.

Vigil held for daughter of Conservative Party

president 2537 107

MH17: Disaster ratchets up Russia-Ukraine

tensions

2.

Retiree, 60, wonders how long her money will last ,

1473 102Which is better, a RRIF or an annuity You may be

surprised

Page 35: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 35/5518/11/2016

Top-2 Frequent Patterns

Num. Set of news Time

(mins)

Support

1.

Target faces calls to withdraw from Canada ,

144 254Mike Duffy facing 31 charges from Senate

expenses scandal, RCMP says

2.

Florida police say , La Prairie, Quebec mayor dies

from wasp stings 105 286

Canadian professor was killed in targeted attack

Page 36: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 36/5518/11/2016

Recommendation Rules

Page 37: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 37/5518/11/2016

Performance

Only 116000 records

Over 4 GB memory usage in average

Around 45 mins run time

Parallelization

Partition search space

Discover candidates

Aggregate results and find patterns

Page 38: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 38/5518/11/2016

Big Data Framework

Page 39: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 39/5518/11/2016

Results

Page 40: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat

SECOND STAGE: TOPIC LEVEL

Page 41: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 41/5518/11/2016

Personalized News Recommendation

User Logs

RecommendedArticles

Utility-based Reading Behavior

Discovery

News reading

behaviorsRule Engine

Attractive article-based

Rules

First Stage: Utility-based recommendation rules

News Corpus

PENSYS: The Proposed Framework

Page 42: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 42/5518/11/2016

Stage 2: A Semantic Relational Topic Model

News cold start problem

Content based recommendation systems

Similar topics

User behavior?

Hybrid approach

Page 43: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 43/5518/11/2016

Stage 2: A Semantic Relational Topic Model

Attractive News Reading Behavior

p nwm nwn

Topic model p 𝑤𝑖 𝑡p 𝑡𝑖 𝑛𝑤

Aggregate

Semantic relationaltopic model 𝑝 𝑡𝑖 𝑡𝑗

Page 44: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 44/5518/11/2016

Topic-based Rules Example

ElectionSport

Confidence:82%

Page 45: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 45/5518/11/2016

Outline

Introduction to Big Data

User Modeling in Digital Media

Depression Acuity Detection

Cogniciti: An Online Brain Health Assessment

Conclusion

Page 46: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 46/5518/11/2016

Depression

Recalling events

Affected by current mental state

Low quality of data

Diagnosed by

Problems:

Screening questionnaire

Page 47: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 47/5518/11/2016

Depression Acuity Detection

Mental state

Physical wellness

Learning models

Sooner More Accurate

Page 48: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 48/5518/11/2016

Proposed Framework

Apache Spark

Learning Phase

Physical behavioral

patterns

Language model

Emotion model

Learning Models

Clinicians

Social Media

- Type

- Time stamp

- Duration

- Linguistic styles

- Sentiment analysis

Students

Physical Activity

Page 49: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 49/5518/11/2016

Results

1. Depression terms

65% higher

2. Mood classification

69% accuracy

3. Physical wellness indicators

Time

Duration

Sequential order of events matters

Page 50: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 50/5518/11/2016

Outline

Introduction to Big Data

User Modeling in Digital Media

Depression Acuity Detection

Cogniciti: An Online Brain Health Assessment

Conclusion

Page 51: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 51/5518/11/2016

Brain Health

Motivation

47 million people suffer from dementia (Alzheimer’s Society of Canada)

Early detection of dementia: $219 billion saving

Solution

An early warning test for brain health

Page 52: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 52/5518/11/2016

Assessments

Demographic Test Shape match test

Face-name associationLetter-number alternation

Stroop interference

Unified Score

Page 53: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 53/5518/11/2016

72%

153,000

41,000

Test re-test reliability

Site Visits

Completed Assessments

Strong consumer

response from one

Canadian media release

Page 54: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat 54/5518/11/2016

Conclusions

User behavior analysis in big data is an important area of research

We have done some (hopefully) interesting work in this area

Utility-based pattern discovery in big data streams

Depression acuity detection

Online Brain Health Assessment

A lot more research needs to be done!

Page 55: User Behavior Analysis in Big Datadata.science.uoit.ca/wp-content/uploads/2016/08/SDA_15... · 2019-12-18 · Big Data Scientist Spectrum Computing, IBM Globe and Mail Main research

Morteza Zihayat

User Behavior Analysis in Big Data

[email protected]