Real Time Geodemographics

Real time Geodemographics: Requirements and Challenges

Muhammad Adnan, Paul Longley

Current Geodemographic classifications

• Census data
  • E.g. the OA (Output Area) dataset has 41 census variables.

• Variables are weighted according to their importance in classification.

• The K-means clustering algorithm is used to cluster data into homogeneous groups.
  • Multiple runs of K-means are needed because of its instability.
  • 10,000 times (Singleton, 2008)

Need for real time Geodemographics

• Current classifications are created using static data sources.

• The rate and scale of current population change are making large surveys (census) increasingly redundant.
• Significant hidden value in transactional data.

• Data is increasingly available in near real time, e.g. ONS NESS API.
• Application specific (bespoke) classifications have demonstrated utility (Longley & Singleton, 2009).

What are real time Geodemographics?

Specification Estimation Testing

Computational challenges

• Integration of large and possibly disparate databases.
  • E.g. NHS data; Census data

• Data normalisation and optimization for fast transactions.

• Minimizing computational time of clustering algorithms (Very Important)!

• Common protocol
  • XML (SOAP)

• Use of non-traditional data sources (Singleton, 2008).
  • E.g. Flickr; Facebook

Important Challenge: Selection of clustering algorithm

• K-means
• PAM (Partitioning Around Medoids)
• CLARA (Clustering Large Applications)
• GA (Genetic Algorithm)

K-means

• Attempts to find cluster centroids by minimising the within-cluster sum of squared distances.

• K-means is unstable because its result depends on the initial seed assignment.
• Sensitive to outliers.

• Creating a Geodemographic classification requires running the algorithm multiple times.
  • 10,000 times (Singleton, 2008)
  • Computationally expensive in a real time environment.
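The multiple-run procedure can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used for the results in these slides: the function names (`kmeans_once`, `best_of_n_runs`), the random seeding scheme, and the default of 100 runs are all assumptions made for the example. Each run starts from different random seeds, and the run with the lowest within-cluster sum of squares is kept.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from random seeds; returns (wcss, centroids, labels)."""
    centroids = rng.sample(points, k)  # initial seeds: k distinct observations
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    wcss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return wcss, centroids, labels

def best_of_n_runs(points, k, n_runs=100, seed=0):
    """Repeat k-means n_runs times and keep the run with the lowest WCSS."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_runs))
    return min(runs, key=lambda run: run[0])

if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    wcss, centroids, labels = best_of_n_runs(data, k=2, n_runs=20)
    print(wcss, labels)
```

The cost of this approach grows linearly with the number of runs, which is why 10,000 repetitions are hard to accommodate in a real time setting.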

K-means (100 runs of k-means on OAC data set for k=4)

An example of bad clustering result (K-means)

Alternative Clustering Algorithms

• PAM (Partitioning Around Medoids) tries to minimise the sum of distances of the objects to their cluster medoids (representative objects chosen from the data).
  • Less sensitive to outliers than K-means.
  • Does not scale to larger data sets.

• CLARA (Clustering Large Applications) draws multiple samples of the dataset, applies PAM to each sample and returns the best result.

• GA (Genetic Algorithm) is inspired by models of biological evolution. It produces results through a breeding procedure.
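The relationship between PAM and CLARA described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation benchmarked in these slides: `pam` here is a reduced swap search from a random start (it omits PAM's separate build phase), and the names `total_cost`, `n_samples`, and `sample_size` are chosen for the example. CLARA runs that search on each random sample, then scores the resulting medoids against the full data set and keeps the best.

```python
import random

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, m) for m in medoids) for p in points)

def pam(points, k, rng):
    """Simplified PAM: random start, then accept cost-lowering swaps."""
    medoids = rng.sample(points, k)
    cost = total_cost(points, medoids)
    improved = True
    while improved:  # stop when no medoid/non-medoid swap lowers the cost
        improved = False
        for i in range(k):
            for candidate in points:
                if candidate in medoids:
                    continue
                trial = medoids[:i] + [candidate] + medoids[i + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True
    return medoids

def clara(points, k, n_samples=5, sample_size=40, seed=0):
    """CLARA: run PAM on random samples, keep medoids best on ALL points."""
    rng = random.Random(seed)
    size = min(sample_size, len(points))
    best_medoids, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = rng.sample(points, size)
        medoids = pam(sample, k, rng)
        cost = total_cost(points, medoids)  # scored on the full data set
        if cost < best_cost:
            best_medoids, best_cost = medoids, cost
    return best_medoids, best_cost

if __name__ == "__main__":
    data = ([(i * 0.1, 0.0) for i in range(30)]
            + [(100 + i * 0.1, 0.0) for i in range(30)])
    print(clara(data, k=2, sample_size=20))
```

Because the expensive swap search only ever sees `sample_size` points, its cost no longer grows with the full data set, which is what makes CLARA a candidate where PAM is not.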

This paper compares

• K-means
• CLARA
• GA

By using three data normalisation techniques

• Z-Scores
• Range Standardisation
• Principal Component Analysis

• Algorithm stability of K-means, CLARA, and GA

Data normalisation techniques used

• Z-Scores
  • Widely used variable normalisation technique
  • Can create outliers in the datasets

• Range Standardisation
  • Standardises values to the range 0-1
  • Can erase interesting patterns in the data

• Principal Component Analysis
  • Reduces the dimensions of a data set
  • Can erase interesting patterns in the data
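The first two techniques are simple enough to sketch per variable in plain Python. The function names are illustrative, and the use of the population standard deviation (`pstdev`) rather than the sample standard deviation is an assumption; the slides do not say which was used.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant variable: no spread to scale by
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

def range_standardise(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant variable: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

if __name__ == "__main__":
    variable = [1, 2, 3, 4, 100]
    print(z_scores(variable))
    print(range_standardise(variable))
```

With the sample variable `[1, 2, 3, 4, 100]`, range standardisation maps the first four values into roughly the interval 0 to 0.03, because the single extreme value defines the top of the range. This is one way the differences between ordinary observations can be squeezed out of the data.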

Comparing computational efficiency (Z-scores)

PAM, and GA on the three geographic aggregations of a dataset covering London.

Figure 1: OA (Output Area) level results

Figure 2: LSOA (Lower Super Output Area) level results

Figure 3: Ward level results

Comparing computational efficiency (Range Standardisation)

Comparing computational efficiency (PCA)

Algorithm Stability (w.r.t. Computational time)

Figure 10: Running k-means on OA (Output Area) for 120 times on each iteration

Figure 11: Running CLARA on OA (Output Area) for 120 times on each iteration

Figure 12: Running GA on OA (Output Area) for 120 times on each iteration
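A timing comparison of this kind can be set up with a small harness such as the one below. It is a generic sketch, not the benchmarking code behind Figures 10 to 12: `time_runs` and `summarise` are hypothetical names, `cluster_fn` stands for any clustering function that takes the data as its only argument, and the default of 120 repetitions simply mirrors the figure captions.

```python
import time
from statistics import mean, pstdev

def time_runs(cluster_fn, data, n_runs=120):
    """Call cluster_fn(data) n_runs times; return wall-clock seconds per run."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        cluster_fn(data)
        timings.append(time.perf_counter() - start)
    return timings

def summarise(timings):
    """Mean and spread of run times; a small spread suggests stable timing."""
    return {
        "mean": mean(timings),
        "sd": pstdev(timings),
        "min": min(timings),
        "max": max(timings),
    }

if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real clustering call.
    timings = time_runs(lambda d: sorted(d), list(range(1000)))
    print(summarise(timings))
```

Reporting the spread as well as the mean matters here, since an algorithm with a low average run time but occasional very slow runs is harder to rely on in a real time system.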

K-means and Principal Component Analysis

• PCA can be used to facilitate K-means clustering by reducing dimensions (Ding & He, 2004).

Figure 13: K-means result for 41 “OAC variables”

Figure 14: K-means result for 26 “OAC Principal Components”

K=4 (99% similar)


[Figure: Kmeans vs. PCA_Kmeans; x-axis: No. of clusters (1 to 49)]

Conclusion

• CLARA is a plausible alternative to k-means in a real time Geodemographic classification system.

• K-means might be combined with PCA to reduce computation time.

• In an online environment k-means is better for small data sets.

• Exploration of non-traditional data sources.
