Building a Big Data Analytics Service Framework
for Mobile Advertising and Marketing
Lei Deng
School of Computer
Northwestern Polytechnical University
Xi'an, China
Jerry Gao and Chandrasekar Vuppalapati
Computer Engineering Department
San Jose State University
San Jose, United States
Corresponding mail: [email protected]
Abstract - The unprecedented growth in mobile device
adoption and the rapid advancement of mobile technologies and
wireless networks have created new opportunities in mobile
marketing and advertising. The opportunities for mobile
marketers and advertisers include real-time customer
engagement, improved customer experience, brand
loyalty, increased revenue, and higher customer satisfaction.
The challenges for marketers and advertisers, however,
include how to analyze the troves of data that mobile devices emit
and how to derive customer engagement insights from that
mobile data. This paper addresses the challenge by
developing a big data mobile marketing analytics and
advertising recommendation framework. The proposed
framework supports both offline and online advertising
operations, in which selected analytics techniques are used
to provide advertising recommendations based on collected
big data on mobile users' profiles, access behaviors, and
mobility patterns. The paper presents a prototype solution
design as well as its application and selected experimental
results.
Keywords - big data analytics, big data application, big data
analytics service, mobile advertising, mobile marketing, and
e-commerce
I. INTRODUCTION
Mobile advertising poses three distinctive opportunities and challenges to marketers and advertisers: 1) the unprecedented device adoption, 2) the customer engagement challenges arising from huge mobile datasets, and 3) the potential impact of mobility on digital marketing.
First, mobile device adoption is increasing at a rapid pace. As indicated in [1], "every day more than 1 million new Android devices are activated worldwide". Similarly, in September 2014 Apple announced that it had sold over 10 million iPhone 6 devices in the first three days of availability, only 1 million more than the over 9 million iPhone 5c and 5s devices it sold in 2013 [2]. In [3], the McKinsey Global Institute predicted that the full potential of the mobile Internet is yet to be realized; over the coming decade, this technology could fuel significant transformation and disruption, not least from its potential to bring two billion to three billion more people into the connected world, mostly from developing economies. The McKinsey Global Institute estimates that the mobile Internet could generate an annual economic impact of $3.7 trillion to $10.8 trillion globally by 2025. This value would come from three main sources: a) improved delivery of services, b) productivity increases in select work categories, and c) the value of Internet use for the new Internet users likely to be added by 2025 [24].
Second, mobile devices come with different form factors, technologies, data points, and operating systems. The same is true of their users, who exhibit diverse demographics, personal preferences, behaviors, social presence, and location usage. Cohen [4] stated that, according to Nielsen, 61% of US subscribers owned a smartphone. The demographics of mobile usage in the US show that 81% of adults aged 25-34 have smartphones, almost 70% of US teens aged 13-17 use a smartphone, and 50% of US adults aged 55+ own a smartphone. This clearly shows the age variation associated with smartphone usage in the US. As the complexity of mobile phones increases and as mobile users' demographics and personal preferences diverge, the size of the datasets associated with devices and users will increase dramatically. In order to engage with customers in a meaningful way, one needs to analyze these huge and diverse datasets.
Third, a recent (May 2014) Gartner market analysis [5] reveals that mobility increasingly defines digital marketing. According to the Gartner research, consumers increasingly use their mobile phones as the remote control for their lives. The Gartner mobility market survey reveals that: a) 43% of respondents spend more time on tablets than on desktops, b) 80% of smartphone owners have used their device while shopping, c) 53% of searchers purchase as a result of a smartphone search, and d) 86% use their phones while consuming other media. The social and commerce activities that consumers engage in on their mobile phones are redefining mobile advertising.
It is clear that mobile devices present huge opportunities
to marketers and advertisers. However, there are
challenges too. The challenges are associated with the type of
the mobile data (structured and unstructured); the exclusivity of
the data; the privacy and context information associated with
the data; and the mining of advertising insights from the
mobile user data. Unless the data is thoroughly mined, there is a
huge chasm between mobile users' expectations and the
campaigns marketers and advertisers run to target the
mindshare and wallet share of mobile users. Analytics will
help marketers and advertisers frame and understand
consumer behavior, including how consumers expect to be
communicated with. In essence, analytics will help to close
this chasm.
This paper addresses the strong demand for mobile advertising
using a big data analytics approach. Its major contribution
is an innovative solution based on a big data analytics
service framework that supports mobile advertising through
multi-dimensional big data analytics. In this approach, we focus
on location-based mobile advertising by analyzing
big data including mobile user profiles, mobile app usage
patterns, location-based mobile user access patterns, and
merchant-related data. The paper presents our big data
analytics methods, service framework design, and our
case study results.
The structure of this paper is as follows. Section 2
discusses the basic concepts and methods of big data
analytics for mobile advertising. Section 3 presents our big
data analytics service system, focusing on its service
framework. Section 4 discusses the related design and
implementation decisions, and Section 5 shows a case study.
The conclusion and future work are included in Section 6.
II. UNDERSTANDING BIG DATA ANALYTICS FOR MOBILE
ADVERTISING
Big Data Analytics
Big data computing environments for analytics: Hadoop
attracts much attention as the first open-source
distributed computing environment, but other
platforms offer interesting advantages over a typical
Hadoop deployment, especially for real-time analytics
of dynamic information, where Hadoop does not meet the
requirements. In contrast to Hadoop, a batch processing
framework, Storm is a stream processing framework that
focuses on continuous computation [6]. Storm was developed
at Twitter to process the hundreds of millions of tweets generated
every day and is now an open-source big data analysis system.
Spark is a scalable data analysis platform based on in-memory
computing and has a performance advantage over Hadoop's
cluster storage approach [7]. Spark is written in Scala, offers
a unified data processing environment, and supports iterative
tasks over distributed datasets.
Big data analytics techniques: There are many big data
techniques, including association rule learning, data mining, cluster analysis, crowdsourcing, machine learning, text analytics, classification, data fusion, network analysis, optimization, predictive modeling, regression, spatial analysis, time series analysis, and others. Which ones are used depends on the type of data being analyzed, the technology available, and the research questions one is trying to solve.
Marketing analytics and advertising recommendation
Decision Tree: The decision tree algorithm is used to classify
records by their attributes and decide the outcome of the class attribute.
Constructing a decision tree requires both a class attribute and item
attributes. A decision tree is a tree-like structure
in which intermediate nodes represent attributes of the data,
leaf nodes represent outcomes, and the branches
hold attribute values. Decision trees are widely used in
classification because no domain knowledge is needed
to construct them. Figure 1 shows simple decision
trees.
Figure 1 Decision Tree Examples
The primary step in the decision tree algorithm is to identify
the root node for the given dataset. Multiple methods exist
to decide the root node; information gain
and Gini impurity are the primary ones. The
root node plays an important role in deciding which
side of the decision tree a record falls into. Like all
classification methods, decision trees are constructed
using training data and evaluated with test data.
Information Gain: Information gain is used to select the root node
and the branch nodes in the decision tree. It is
calculated from entropy and information. For a dataset S whose
classes occur with proportions p_1, ..., p_m, entropy is
calculated using the following formula [8]:

Entropy(S) = - SUM_i p_i * log2(p_i)

The information of an attribute A that splits S into subsets S_v is calculated using the following formula:

Info(A) = SUM_v (|S_v| / |S|) * Entropy(S_v)

The information gain of an attribute is the difference between
the entropy and the information of that attribute,
Gain(A) = Entropy(S) - Info(A). The attribute
with the highest information gain becomes the root node, and the next-level
nodes are identified using the attributes with the next highest
information gain. The algorithm is listed below.
Algorithm:
Step 1: Calculate the information gain for all attributes.
Step 2: Select as the root node the attribute with the highest
information gain.
Step 3: For each value of the root node:
Step 4: Create a node for the attribute with the next highest
information gain.
Step 5: For each value of the node:
Step 6: Create the subset of the training data for this node.
Step 7: If all values of the class attribute are the same, create a leaf node
and stop.
Step 8: Else go to Step 4 and continue.
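As an illustration, the entropy and information-gain computation described above can be sketched in Python. The weather/wind attributes and play labels below are hypothetical toy data, not drawn from our system:

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy of a list of class labels: -sum(p_i * log2(p_i))."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """Entropy of the full set minus the weighted entropy after splitting on one attribute."""
    total = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    info = sum((len(s) / total) * entropy(s) for s in subsets.values())
    return entropy(labels) - info

# Hypothetical records: attributes (weather, wind) and class label "play"
rows = [("sunny", "weak"), ("sunny", "strong"), ("rain", "weak"), ("rain", "strong")]
labels = ["no", "no", "yes", "yes"]

# Root node = attribute index with the highest information gain
root = max(range(2), key=lambda i: information_gain(rows, labels, i))
```

Here splitting on the weather attribute separates the classes perfectly, so it is chosen as the root.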
K-Means Clustering: Clustering identifies and groups
similar items. K-means clustering partitions the items into k clusters based on their similarity,
where k is the number of clusters that must be decided before
the clustering process starts. The whole solution depends on
the value of k, so it is very important to choose it correctly.
Each data point is assigned to a cluster based on the
Euclidean distance between the point and the centroid of the
cluster. The initial centroids can be chosen in one of three ways:
1. Dynamically chosen: choose the first k items and assign them to the k clusters.
2. Randomly chosen: randomly select k values and assign them to the k clusters.
3. Choosing from upper and lower boundaries: choose values that are very distant from each other and use them as initial values for each cluster.
Clustering Algorithm: (see Figure 2)
Figure 2 Clustering Algorithm
Step 1: Choose the initial centroids using one of the three methods above.
Step 2: For each additional value:
Step 3: Calculate the Euclidean distance between this point
and the centroid of each cluster.
Step 4: Move the value to the nearest cluster.
Step 5: Calculate the new centroid for the cluster.
Step 6: Repeat Steps 3 to 5.
Step 7: Calculate the centroid of each cluster.
Step 8: For each value:
Step 9: Calculate the Euclidean distance between this
value and the centroid of all clusters.
Step 10: Move the value to the nearest cluster.
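A minimal sketch of this procedure, assuming the "first k items" (dynamically chosen) initialization and two-dimensional points; the sample points are hypothetical:

```python
import math

def kmeans(points, k, iterations=10):
    """K-means with 'first k items' initialization and Euclidean distance."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assign every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(c) / len(members) for c in zip(*members)]
    return centroids, clusters

points = [(1, 1), (1.5, 2), (8, 8), (9, 9)]
centroids, clusters = kmeans(points, k=2)
```

On this toy input the two centroids converge to roughly (1.25, 1.5) and (8.5, 8.5), separating the two obvious groups.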
K- Nearest Neighbour: The k-nearest-neighbor method was
first described in the early 1950s. The method is labor
intensive when given large training sets, and did not gain
popularity until the 1960s when increased computing power
became available. It has since been widely used in the area of
pattern recognition [8].
Nearest-neighbor classifiers are based on learning by
analogy, that is, by comparing a given test tuple with training
tuples that are similar to it. The training tuples are described
by n attributes. Each tuple represents a point in an n-
dimensional space. In this way, all of the training tuples are
stored in an n-dimensional pattern space. When given an
unknown tuple, a k-nearest-neighbor classifier searches the
pattern space for the k training tuples that are closest to the
unknown tuple. These k training tuples are the k “nearest
neighbors” of the unknown tuple.
When the k closest points are obtained, the unknown
sample is assigned to the most common class among
those k points. In the case of k = 1, the unknown sample is
assigned to the class of the single closest point in the pattern space.
Closeness is measured using a distance metric between two
points; commonly used metrics include the Euclidean,
Manhattan, and Minkowski distances. Among these, the most used
similarity/distance metric is Euclidean distance, followed by
Manhattan. kNN has the following properties:
1. It is instance-based.
2. It is a lazy learning algorithm.
1. It is instance-based
2. It is a Lazy Learning algorithm
Eager learners construct a generalization model before starting
the classification, i.e., before receiving any unknown samples.
Backpropagation and decision tree
induction are examples of eager learners. In contrast to eager
learners, lazy learners like k-nearest neighbors do not
construct the classifier until they receive an unknown sample;
they simply store the training samples until then.
As a result, lazy learners may
incur additional costs in computation and in
memory to store all the training samples, especially when the
number of training samples to compare against an unknown sample is large.
Therefore, lazy learner algorithms need efficient storage and
indexing techniques. Since computation in lazy
learners is delayed until an unknown sample is presented,
classification is slower. Despite these disadvantages, lazy learners
are very fast at training.
kNN applies equal weights to all attributes, unlike backpropagation
or decision tree induction; this may degrade accuracy when
many irrelevant attributes are present in the data [8].
An instance can be considered a point in the n-dimensional
pattern space, with every feature of the instance mapped to one
dimension of that space. Closeness is defined
using the distance measures mentioned above, each
of which tries to maximize the distance between points
that lie in different classes
while minimizing the distance between analogous points
that lie in the same class.
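The classification step can be sketched as follows; the labeled two-dimensional training points are hypothetical toy data:

```python
import math
from collections import Counter

def knn_classify(training, query, k=3):
    """training: list of (point, label) pairs.
    Assign the majority label among the k nearest points (Euclidean distance)."""
    neighbors = sorted(training, key=lambda t: math.dist(t[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points in a 2-D pattern space
training = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
            ((6, 6), "B"), ((7, 7), "B")]
label = knn_classify(training, (1.5, 1.5), k=3)  # nearest three are all "A"
```

Note the lazy-learning property described above: no model is built in advance, and all the work happens when `knn_classify` is called.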
Recommendation Systems: Recommender systems fall
into two classes: content-based and collaborative
filtering. Content-based systems examine properties of the
items recommended. For instance, if a Netflix user has
watched many cowboy movies, the system recommends a movie
classified in the database as having the "cowboy" genre.
Collaborative filtering systems recommend items based on
similarity measures between users and/or items; the items
recommended to a user are those preferred by similar users.
Recommendation systems typically apply knowledge discovery
techniques to provide personalized recommendations. The
number of people accessing the web today has grown
tremendously, and dealing with such a sparse dataset is a
big challenge for a recommender system. In traditional
recommendation systems that use collaborative filtering,
the amount of work increases as the number of
users increases. Newer recommender systems are
designed to quickly produce highly accurate recommendations
for users on the web. Two types of collaborative
filtering techniques are generally used to provide
recommendations: user-based collaborative filtering and item-based
collaborative filtering. A high-level architecture of a
recommendation system is shown below [9].
Figure 3 One Recommendation Process [9]
The recommendation process is performed in three steps, each
of which is handled by a separate component:
Content Analyzer: When information has no structure (e.g.,
geo-location details, user preferences, social media posts),
a pre-processing step is needed to extract
structured, relevant information. The responsibility of this
component is to represent the content of items (e.g., documents,
web pages, geo-locations, product descriptions, etc.) coming
from information sources in a form suitable for the next
processing steps. Data items are analyzed with feature extraction
techniques in order to shift the item representation from the
original information space to the target one. This
representation is the input to the Profile Learner and the
Filtering Component.
Profile Learner: This module collects data representative
of the user's preferences and attempts to generalize this
data in order to construct the user profile.
Filtering Component: This module exploits the user
profile to suggest relevant items by matching the profile
representation against that of items to be recommended.
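The three components above can be sketched with a simple set-based content representation. The ad items, tags, and Jaccard-overlap ranking below are illustrative assumptions, not the system's actual feature extraction:

```python
def analyze(item_tags):
    """Content Analyzer: turn raw item descriptions into feature sets."""
    return {item: set(tags) for item, tags in item_tags.items()}

def learn_profile(features, liked):
    """Profile Learner: generalize the user's preferences as the union of
    features of the items the user liked."""
    profile = set()
    for item in liked:
        profile |= features[item]
    return profile

def filter_items(features, profile, exclude):
    """Filtering Component: rank unseen items by Jaccard overlap with the profile."""
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0
    candidates = [(item, jaccard(tags, profile))
                  for item, tags in features.items() if item not in exclude]
    return sorted(candidates, key=lambda t: -t[1])

# Hypothetical ad items with content tags
item_tags = {
    "ad1": ["coffee", "downtown"],
    "ad2": ["coffee", "drive-thru"],
    "ad3": ["furniture", "suburb"],
}
features = analyze(item_tags)
profile = learn_profile(features, liked=["ad1"])
ranked = filter_items(features, profile, exclude={"ad1"})
```

The user who liked the downtown coffee ad is ranked toward the other coffee ad rather than the furniture one, mirroring the content-based behavior described above.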
III. THE SYSTEM OVERVIEW
To support system scalability, we take a holistic approach to location-based ad recommendations. Our system leverages the latest open-source technologies to create a big data processing platform. The core recommendation engine trains a predictive model on a training set using machine learning algorithms such as collaborative filtering, clustering, and classification.
System Architecture
As shown in Figure 4, we designed and developed a domain-specific big data service platform for mobile advertising and marketing. The system enables location-based
advertising that engages target customers by studying their profiles and dynamic behavior patterns. Unlike other data analytics engines, our system provides a holistic advertising recommendation approach for mobile users through a real-time, big-data-based solution for precise marketing and analysis. The current system uses state-of-the-art big data technologies, such as MongoDB and Spark, over a cloud
infrastructure. The outcome of this research project consists of three parts: a) precise market advertising and analysis algorithms; b) recommendation analytics algorithms; and c) a prototype system that implements the proposed algorithms in a location-based advertising solution. Figure 4 shows the detailed layered architecture.
Figure 4 A Big Data Ad Recommendation Service System Architecture
The System Functions
Figure 4 shows the following function components.
1) Device Location: If the user has enabled location
sharing on the device, the system can track the user's location via the GeoID
associated with the current location. The GeoID can be
resolved into a latitude and longitude corresponding to the zip code
of the location associated with that user.
2) Mapping (latitude, longitude) to (zip code,
country, state, street #): Latitude and longitude can be
mapped to an address by looking them up in a database. The accuracy may vary
depending on the GPS signal and connection, or on the
Wi-Fi router's accuracy.
3) User Profile and Interests: The profile of the user to
whom the ads will be served also needs to be tracked.
Various features are tracked, such as
gender, age, address, profession, and interests. Based on the
user's profile as well as the GeoID, the platform
provides modeled recommendations to the customer in his
mobile applications.
4) Ad Publisher Product Item Sets: Similar to mobile
user data, the advertiser product data items are very
important for generating the most appropriate ads for the mobile
user. Our system design correlates the advertiser product
items to user profiles or preferences based on a pre-defined
system mapping and recommendation output. At a minimum,
our system assumes an advertiser will store all the product item
attributes that enable correlating mobile users to the
products (see Table 1 below).
Table 1: The Product Attribute Set
Product Item Set Attributes
Product Details – include name, description, availability
Product Category – includes type
Product Attributes such as color, size, mode
Product dimensions
Model Number
Customer reviews (if any)
Product ratings
Product Cost and Manufacturing Details
Product & Location specifics
5) Identifying relevant advertisements: The
recommendation engine develops a predictive model
based on which relevant advertisements are
provided to the end user. The training dataset is used
to create the model by applying machine learning techniques
such as collaborative filtering, clustering, and classification.
A key benefit of the collaborative filtering
approach is that it does not depend on machine-analyzable
content and hence is capable of accurately
recommending products without prior knowledge of the items
themselves.
6) Customer-Oriented Requirements: The customer-oriented
requirements for our system generally involve
providing qualified ad recommendations based on user
preferences, behavior, and insights from social profiles. The
system tracks the geo-location of the user and derives a
location ID and a location category; the location category
could be an industrial district, university, hospital, etc. Based
on the location category associated with the user's location and
the user profile, the recommendation engine pushes ads
to the end user. Table 2 below lists the related attributes.
Table 2: Mobile User Attributes
Mobile User or Customer Attributes Summary (expandable based on the use cases)
Mobile User Demographics
Mobile Location & Location Preferences
Social Profile (items preferred or posted on the internet)
Mobile Application Access
Mobile Commerce Data & Item Click-Through Data
Customer Purchase Transaction Data
Key Technologies and Solutions
We have used a number of technologies and developed
several solutions to support our recommendation system.
1) Real-time analytics based on Spark
We have developed a recommendation engine that addresses
both off-line and real-time requirements.
Online processing generally involves direct interaction with
one or more systems and the most current slice of data, for
purposes such as data profiling to detect outliers, real-time
customer engagement, and real-time advertising [10]. This processing usually involves
relatively low throughput, since significant time may be spent
waiting for user input. Offline processing is typically batch-oriented
and often involves large volumes of data being
processed with little or no intervention (usually by a system
operator rather than an end user) [11].
Apache Spark Streaming is an open-source big data
processing system intended for distributed, real-time stream
processing. Streaming implements a data flow model in which
data (time-series facts) flows continuously through a topology
(a network of transformation entities). The slice of data being
analyzed at any moment by an aggregate function is specified
by a sliding window, a concept from Complex Event Processing
(CEP). A sliding window may cover the "last minute" or the
"last 60 minutes", and it constantly shifts over time. Data
can be fed to Streaming through distributed messaging systems
like Flume, Twitter, ZeroMQ, Kinesis, or plain old TCP
sockets [12].
Spark Streaming Architecture:
Receives data streams from input sources, processes them
in a cluster, and pushes results out to databases/dashboards
Scalable, fault-tolerant, with second-scale latencies [10]
Chops data streams into batches of a few seconds
Spark treats each batch of data as an RDD (Resilient
Distributed Dataset) and processes it using RDD
operations
Processed results are pushed out in batches [11]
Discretized Stream (DStream)
Represents a stream of data
Implemented as a sequence of RDDs
MapReduce is performed on each batch for aggregation
[10]
The size of each batch is defined by the window length,
in seconds; the sliding interval is the time difference
between two consecutive windows [10]
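The window-length and sliding-interval idea can be illustrated without a cluster. The sketch below simulates windowing over micro-batches in plain Python; it is not the Spark Streaming API itself, and for simplicity it expresses both parameters in batches rather than seconds:

```python
def sliding_windows(batches, window_length, sliding_interval):
    """Simulate sliding-window aggregation over a list of micro-batches.
    window_length: how many batches each window spans.
    sliding_interval: how many batches the window advances between evaluations."""
    windows = []
    for end in range(window_length, len(batches) + 1, sliding_interval):
        # Flatten the batches covered by the current window
        window = [x for batch in batches[end - window_length:end] for x in batch]
        windows.append(window)
    return windows

# Hypothetical micro-batches of click counts; window of 3 batches sliding by 1
batches = [[1], [2, 2], [3], [4], [5]]
totals = [sum(w) for w in sliding_windows(batches, window_length=3, sliding_interval=1)]
```

Each successive total aggregates the most recent three batches, which is exactly the "constantly shifting slice" behavior described above.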
The recommendation engine contains an off-line training
system for producing pre-aggregations for ad recommendation
to end users. It also works in real-time fashion based on the
Spark system: the real-time recommendation system loads
in-stream data as training datasets and can leverage the
pre-aggregation results produced by the off-line, batch-mode
trained machine learning models.
2) GEO information integration with profile datasets
To support location-based mobile advertising, our
system needs several fundamental profile datasets, such as
geography information describing several aspects of a
location. There is also other important information for
targeting and ad mapping, such as application profiles,
merchant profiles (i.e., shop profiles), and user profiles.
The ad-related information is stored in MongoDB, a persistent
data repository that is constantly updated. On the
client side, the system is built with pure browser-based
HTML5 technologies.
3) Decisions based on ML algorithms
The system provides both online and offline approaches
for delivering ad recommendations as a service to ad and
marketing service users. Our goal in
designing the system is to support different business vertical
datasets from various business users, so the solutions should
require minimal know-how in business domain
knowledge. Hence, we selected the decision
tree algorithm, since it is an ideal supervised learning
approach requiring minimal domain knowledge.
The system also uses the K-means clustering approach. Figure 5 shows the procedure below.
Figure 5 The Procedure of K-Means
4) Clustering
We chose the K-means algorithm, which takes a number k specifying the desired number of clusters. To start, the algorithm takes the first k items as the centers of the k clusters. During each iteration, each remaining item is assigned to the closest center, populating the clusters. In subsequent iterations, the cluster centers are re-computed from the clusters formed in the previous pass and the cluster membership is re-evaluated. In our case, the items are mobile user profile data.
The system uses the adjusted cosine measure, given by the following formula, which looks at the angle between two vectors of ratings; a smaller angle implies greater similarity. We normalize our data to remove noise. In this version of the cosine measure, the difference in rating scale is taken into account:

sim(i, j) = SUM_{c in I_i,j} (R_i,c - A_i)(R_j,c - A_j) / ( sqrt(SUM_{c in I_i,j} (R_i,c - A_i)^2) * sqrt(SUM_{c in I_i,j} (R_j,c - A_j)^2) )

where R_i,c is the rating of item c by user i, A_i is the average rating of user i over the co-rated items, and I_i,j is the set of items rated by both user i and user j.
5) Similarity Analysis
A small but important step in this system is the similarity analysis. It takes the product item information and the customer information and tries to find similarities between them, using the association rule mining technique to identify customer patterns. As input, this step takes the customer's profile, interests, purchase history related to the advertiser's or marketer's products, and other profile-based information. With the help of association rule mining, the system tries to find associations between the products to be suggested and the customer's buying history, location, preferences, and patterns. The figure below explains the methods used to calculate the association and to predict the confidence of the customer in buying or switching to the product.
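The core association measures can be sketched as support and confidence over purchase transactions; the transactions below are hypothetical:

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent:
    support(antecedent UNION consequent) / support(antecedent)."""
    return support(transactions, set(antecedent) | set(consequent)) / \
           support(transactions, antecedent)

# Hypothetical purchase histories, one set of items per transaction
transactions = [
    {"phone", "case"},
    {"phone", "case", "charger"},
    {"phone", "charger"},
    {"case"},
]
conf = confidence(transactions, {"phone"}, {"case"})
```

Here phone buyers also bought a case in two of three phone transactions, so the rule phone -> case has confidence 2/3; a system like the one described would use such rules to estimate the customer's likelihood of buying a suggested product.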
The system constructs a utility matrix, using a utility function, to capture users' preferences and ratings for previously recommended or purchased products, and uses them to recommend new products to other users. The utility function and the matrix it generates are shown below. The utility matrix gives a clear picture of the missing preferences for the items; the aim of the system is to fill those gaps by matching products and their attributes to the customer's preferences.
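A minimal sketch of such a utility matrix, using None to mark the missing preferences the recommender must fill; the users, items, and ratings are hypothetical:

```python
def build_utility_matrix(ratings, users, items):
    """Build a users x items matrix of ratings; None marks a missing preference
    that the recommendation system aims to fill in."""
    return [[ratings.get((u, i)) for i in items] for u in users]

# Hypothetical (user, item) -> rating observations
ratings = {("u1", "ad1"): 4, ("u1", "ad3"): 2, ("u2", "ad2"): 5}
matrix = build_utility_matrix(ratings, ["u1", "u2"], ["ad1", "ad2", "ad3"])
```

The None entries are exactly the gaps described above: u1's preference for ad2 and u2's preferences for ad1 and ad3 are unknown and must be predicted.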
6) Machine learning with high performance
An in-memory cluster computing platform is used that can
increase performance by up to 100x over a traditional Hadoop
deployment. This allows our platform to load data into the
cluster's memory and query it repeatedly, making it suitable
for various machine learning algorithms. It also allows data to be
processed faster, which helps in scaling the application.
Figure 6 System Component Architecture
IV. SYSTEM DESIGN AND IMPLEMENTATION
This section discusses three items: a) the system component
architecture design, b) the system interface and connectivity
design, and c) the recommendation engine workflow.
1) Fundamental data components (profiles and GEO
info): This location-based system needs several
fundamental profile datasets, such as geography information
describing several aspects of a location. There is
also other important information for targeting and
ad mapping: application profiles, merchant
profiles (shop profiles), and user profiles. When all the data
for a location (GEO ID) match each other, we will
have good ads, or some good candidates, for the user's ad
space; they are then prioritized according to business
strategies. All this information is stored in
MongoDB and kept continuously refreshed.
2) Other data components: A few other data
components are also important for this system: the
recommendations and results for pushing ads to end
users, and the ad and profile analysis for customer review.
These are stored in MongoDB.
The recommendation and click-through rate (CTR) history
are used to improve the recommendation machine learning models.
Output files come from the off-line, batch-mode recommendation
system based on Hadoop, and several training
datasets reside in HDFS.
Load Balancer: In order to support incremental and
spiky workloads, our system supports load balancing.
Analytics/Recommendation Engine: We provide a real-time
ad recommendation engine. This
recommendation engine involves an off-line training
system for producing pre-aggregations for ad
recommendation to end users, and it also works on the
real-time processing Spark system.
The real-time recommendation system loads in-stream
data as refreshed training datasets and can leverage the
pre-aggregation results produced by the off-line, batch-mode
trained machine learning models. The recommendation engine
tackles issues such as the cold-start problem by using
various machine learning techniques to improve performance
for new users accessing our system. It handles both offline
and real-time training of datasets to create recommendation
sets for each user. The most recent shopping history is also
used to understand user behavior.
3) Sampling Engine: To obtain a smaller but
representative dataset, this engine performs
preprocessing to improve performance and to reduce
the pressure that the huge data scale puts on the production
system.
4) Security: Our system supports RBAC (Role-Based
Access Control).
5) Web services and APIs: We build web services and
APIs for retrieving results from the service layers. This
component sits in the communication layer.
6) User Interface: The system serves two groups of users.
The first group is system customers: the
merchant users or ad publishers who own a business and
purchase the advertisement services the system provides.
They need ad publishing, ad detail UIs, ad content
management, ad analysis, strategies, and profit analysis.
The second group is system end users: the
advertisement receivers on mobile devices.
b) System Interface and Connectivity Design
In this section we discuss the system interface and connectivity design of the analytics service platform. The end user or customer (merchant) logs into the system from a device through the Internet and then browses the web or opens an app. The service request is sent to the web server, which communicates with the recommendation engine. The recommendation engine uses the data and algorithms stored in the database to produce real-time recommendations for end users and customers.
c) Recommendation Engine Process Flow
The whole process flow can be divided into three key steps: pre-process, machine-learning process, and post-process. Each of these three steps contains several innovations.
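The three steps above can be sketched as a simple pipeline. The bodies of the stages are deliberately trivial stand-ins (the learning step here is just frequency counting), since the paper does not detail the actual models; only the pre-process → learn → post-process structure is taken from the text:

```python
def preprocess(raw_events):
    """Pre-process: drop empty records and normalize fields."""
    return [e.strip().lower() for e in raw_events if e and e.strip()]

def train(events):
    """Machine-learning step, stubbed as frequency counting; in the real
    system this is an off-line or in-stream trained model."""
    counts = {}
    for e in events:
        counts[e] = counts.get(e, 0) + 1
    return counts

def postprocess(model, k=2):
    """Post-process: rank model output into a top-k recommendation list."""
    return sorted(model, key=model.get, reverse=True)[:k]

raw = ["Pizza", "pizza ", "", "Coffee", "pizza"]
recs = postprocess(train(preprocess(raw)))
print(recs)  # ['pizza', 'coffee']
```

Keeping the three stages as separate functions mirrors the architecture: the sampling engine feeds pre-processing, the learning step can be swapped between batch and streaming back ends, and post-processing shapes the final recommendation set.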
Figure 8. Personalized recommendation for the end user
A CASE STUDY
We simulate obtaining the user's GEO information by clicking on Google Map, as shown in Figures 7 and 8. Based on the returned latitude and longitude, a highly personalized recommendation list is displayed for the specific user. We also examine a bubble chart built from zip codes. In Figure 9, bubbles of the same color belong to the same city but different zip codes, and the larger the bubble, the more merchants that zip code contains. Similar location-related distributions are given in Figure 10, where we used the training dataset to observe user reviews across different cities around Phoenix, AZ.
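Turning a clicked latitude/longitude into a nearby-merchant list amounts to a great-circle distance filter. The sketch below uses the haversine formula; the merchant names, coordinates, and the 20 km radius are invented for illustration (loosely placed around Phoenix, AZ, to match the case study):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Hypothetical merchants around Phoenix, AZ (lat, lon).
merchants = {
    "coffee_shop": (33.45, -112.07),
    "pizza_place": (33.51, -112.10),
    "bookstore":   (34.05, -111.10),   # well outside the radius
}

user = (33.448, -112.074)              # location clicked on the map
nearby = [name for name, (la, lo) in merchants.items()
          if haversine_km(user[0], user[1], la, lo) < 20]
print(nearby)  # ['coffee_shop', 'pizza_place']
```

The real system would push this filter into the data layer (e.g., a geo index) rather than scanning all merchants, but the distance computation itself is the same.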
CONCLUSION AND FUTURE WORK
This paper presented a novel approach to supporting a location-based ad recommendation system using current state-of-the-art technologies. The project provides a decision-based approach to handle the various use cases associated with pushing relevant ads to end users. Our objective is to carry out complete testing and benchmarking, which would enable us to put forward a scalable big data ad-processing platform in the current market. The project also provides a pilot data analytics approach that lets merchants view their end users. Once this approach has been beta-tested based on merchant reviews, we will improve the data analytics component of the system. The scope of the current project is very large, and many features can still be added to achieve high scalability with real-time processing and recommendation. Currently the system handles only off-line modeling and training; in the future, we plan to provide online modeling and training of datasets to enrich the mobile user experience.
Figure 7. Obtaining the user's GEO information by clicking on Google Map
Figure 9. Number-of-merchants bubble chart based on zip codes
Figure 10. User reviews by different cities around Phoenix, AZ
References
[1] Google, "Android, the world's most popular mobile platform," 11 Jan. 2015. [Online]. Available: http://developer.android.com/about/index.html
[2] C. Jones, "Why 10 Million iPhones Means A Lot More Than 9 Million," Forbes, Sep. 2014. [Online]. Available: http://www.forbes.com/sites/chuckjones/2014/09/23/why-10-million-iphones-means-a-lot-more-than-9-million/
[3] J. Manyika et al., "Disruptive technologies: Advances that will transform life, business, and the global economy," McKinsey Global Institute, May 2013. [Online]. Available: http://www.mckinsey.com/insights/business_technology/disruptive_technologies
[4] H. Cohen, "How Your Audience Uses Mobile Now." [Online]. Available: http://heidicohen.com/67-mobile-facts-from-2013-research-charts/
[5] M. Kihn and M. McGuire, "Mobile Marketing and Data-Driven Marketing," Gartner Webinars, 14 May 2014. [Online]. Available: http://www.gartner.com/webinar/2689618
[6] A. Giamas, "Spark, Storm and Real Time Analytics," 2014. [Online]. Available: http://www.infoq.com/news/2014/01/Spark-Storm-Real-Time-Analytics
[7] M. T. Jones, "Spark, an alternative for fast data analytics," 2012. [Online]. Available: http://www.ibm.com/developerworks/library/os-spark/
[8] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, 3rd ed. Morgan Kaufmann (Elsevier), 2011.
[9] P. Lops et al., "Content-based Recommender Systems: State of the Art and Trends." [Online]. Available: http://www.ics.uci.edu/~welling/teaching/CS77Bwinter12/handbook/ContentBasedRS.pdf
[10] "Real-Time, Online and Offline Complex Event Processing." [Online]. Available: http://www.thecepblog.com/2009/02/08/real-time-online-and-offline-complex-event-processing (accessed Nov. 5, 2014).
[11] X. Liu, "Understanding Big Data Processing and Analytics," 19 Sep. 2013. [Online]. Available: http://www.developer.com/db/understanding-big-data-processing-and-analytics.html (accessed Nov. 5, 2014).
[12] T. Das, "Spark Streaming." [Online]. Available: http://stanford.edu/~rezab/sparkclass/slides/td_streaming.pdf