Upload
others
View
0
Download
0
Embed Size (px)
Citation preview
Extract value fromFacebook Data
Abstract.................................................................................................................................................3
Introduction...........................................................................................................................................3
Building Blocks.....................................................................................................................................4• Configuration files
Parameterized Map/Reduce Program..................................................................................................5• Parameters• Extraction Process
Conclusion............................................................................................................................................6
About the Author...................................................................................................................................6
Contents
© Happiest Minds Technologies. All Rights Reserved2
Abstract
In present times any marketing or customer strategy is
incomplete without a social media presence. With custom-
ers depending all the more on social media channels to
access and disseminate information and reviews, it
becomes all the more important for organizations to tap
social media channels for actionable insights. For analytic
engines that churn out insights required for quick and
intelligent decisions, social media is a key channel that
needs to be explored on a consistent basis.
Organizations are increasingly looking towards accelera-
tors and frameworks that enable them to get the required
intelligence from social media channels. Having the right
accelerator enables the organization make intelligent
decisions regarding their customer behaviour.
Extraction Process and FlowThe process and cornerstones of the accelerator is based
on the understanding that Facebook exposes its data in
form of a structured Facebook schema which can be
accessed via GraphAPIs.
Introduction
Modern organisations lay a lot of emphasis on offering
customized services to their customers. In such a situa-
tion, the customer’s social profile and behaviour related
information play a crucial role. Most of the organisations
have an analytic pattern that is customer centric, descrip-
tive, predictive as well as prescriptive. Organisations have
been putting in huge chunks of investments to get the
required view from their customer data and expect a quick
return on their investments.
When the need of the hour is a deliverable system that is
astute as well as fast and reliable, organisations need to
look at quick plug and play accelerators that will allow
them to access the required information quickly in real
time. The main benefit of such data for an organisation is
that it provides the time it needs to concentrate on analytic
problem statements which gives more importance to data.
The accelerator should not only be quick but also be
effective and enterprising, be able to adapt to changing
conditions, as well as be able to make the best use of the
available resources.
Facebook AcceleratorWith the amount of time the current generation spends on
social media, it is natural that most enterprises are now
trying to keep in touch with their customers through social
channels. It is no surprise that the top social media chan-
nels like Facebook, LinkedIn and Twitter serve as sources
of data in current times.
© Happiest Minds Technologies. All Rights Reserved3
Hosted on an open source big data frame work
Leverages the power of disruptive technology and
ensures that data is available near real time
Since the accelerator is powered by metadata file, it
allows changes to be made as well as version up grada-
tion of face book schema without altering the code.
•
•
•
Facebook has a mechanism called Facebook Query Language (FQL) to allow data querying from the entire Facebook
schema. The complete schema can be found in the URL [https://developers.facebook.com/docs/reference/fql/ ]. A project
by the name “RestFB” - A subset of FQL schema, provides third party classes for the accelerator.
Building Blocks
Configuration files
© Happiest Minds Technologies. All Rights Reserved4
Mandatory configuration file: The tables and columns in this file are imperative for other tables to gather data. While these
tables are independent, the tables in the optional configuration file are dependent.
Optional configuration file: The tables in this file and their corresponding columns are dependent on the tables in the man-
datory configuration file.
•
•
EVENT
eid
name
nid
pic
host
description
event_type
eveny_subtype
start_time
end_time
creator
update_time
location
venue
•
•
•
•
•
•
•
•
•
•
•
•
•
•
STREAM
post_id
app_id
source_id
updated_time
created_time
actor_id
target_id
message
action_links
attachment
comments
likes
privacy•
•
•
••
•
•
•
•
•
•
•
•
PAGE
page_id
pic
page_url
type
company_overview
location
bio
fan_count
••
•
•
•
•
•
•
LIKE
object_id
object_id_cursor
object_type
post_id
Li kecol
user_id•
••
•
•
•
••
•
•
••
•
•
•
•
•
••
USER
uid
first_name
last_name
name
pic
birthday
sex
relationship_status
current_location
interests
about_me
profile_url
family••
•
••
•
•
••
•
•
••
COMMENT
xid
post_id
from id
time
text
id
username
reply_xid•
•
•
•
•
••
•
FQL Table Schema
© Happiest Minds Technologies. All Rights Reserved5
This program is devised to distil data from Facebook and load it in HDFS.
Parameters:
Metadata files make use of linked hashed maps to make sure they retain the order of the existing tables. Optional configu-
ration files are given to all reducers through the distributed cache.
Parameterized Map/Reduce Program
Page id
App id + secure key
Configuration files
Desired database name in HIVE
Number of machines can be specified too (reducers to be launched by Hadoop)
Extraction process involves the following steps
Configuration files are subjected to changes if any
Job is launched with the correct Face book page id as argument
Inside the mapper :
The mandatory configuration file is processed and it collates data from stream and event table
The HDFS folder is used to write output files for the stream and event table
•
•
•
••
•
•
•
Data Access from Analytics programs
Load
Configuration File +
ID and Access Token
FB name or ID of the brand page
Script calls a Map Reduce job
to fetch data in parallel
Fetch Data
Access
HIVEHDFS
Using a plug and play accelerator, teams will get access to almost all the data in near real time and help them do the actual
work (of analytics) rather than data collection and data cleansing. This helps organizations obviate the excess time required
for mundane activities and focus on the more relevant analytics that drive customer insights and revenue growth.
© Happiest Minds Technologies. All Rights Reserved6
Conclusion
Inside the reducer:
The reducer is used to hold the post IDs from the streams and event tables. The number of IDs processed by
the reducers is calculated as the Total number of post IDs/number of reducers fired up. In this way the reduc
ers will have an even distribution of load. While the key is the number of reducers, values are represented by
the post IDs.
The reducers will write their own files which correlate with the tables in the optional configuration file. The
number of reducers can be provided as a parameter while submitting the job.
Post Map-Reduce phase:
Hive script creates database and tables according to the corresponding names specified.
Data from HDFS is copied into appropriate tables in the database created in the above step.
Now data is available in a tabular format and teams requiring this data can connect to Hive database and work
on it.
At the end of the job, the particular program would have collated enough data which gives information on the post, likes
on the post, comments made on the post, number of likes, users who have engaged with the post and basic user informa-
tion. The data will be pushed to Hive Database specified by the user into appropriate tables. The table names and
columns are in accordance with those specified in the configuration files.
Bhawna Manchanda is a Big Data Architect. She plays a key role in conceptualizing and implement-
ing BIG Data Solutions/Framework and Strategies in Happiest minds. She has also worked exten-
sively with Leading Banks in BIDW space.
About the Author
Bhawna ManchandaBig Data Architect
Sunny Malik has a Master’s Degree in Computer Science from University of Southern California
(USC). He has worked extensively on Application Development using open-source technologies
and currently focused on Big Data Technologies and Algorithm Development.
Sunny MalikBig Data Technologies and Algorithm Development
Skanda Bhargav is a Cloudera Certified Hadoop developer. He is a Computer Science graduate
from Viswesvaraya Technological University, Belgaum popularly known as VTU. He has contributed
to 3 books on Big Data subject which was published by http://www.packtpub.com/ .His interests are
Hadoop, Hive, Map Reduce and Sqoop.
Skanda BhargavHadoop developer.
•
•
© Happiest Minds Technologies. All Rights Reserved
Happiest Minds, the Mindful IT Company, applies agile methodologies to enable digital transformation for enterprises and technology providers by delivering seamless customer experience, business efficiency and actionable insights. We leverage a spectrum of disruptive technologies such as: Big Data Analytics, AI & Cognitive Computing, Internet of Things, Cloud, Security, SDN-NFV, RPA, Blockchain, etc. Positioned as “Born Digital . Born Agile”, our capabilities spans across product engineering, digital business solutions, infrastructure management and security services. We deliver these services across industry sectors such as retail, consumer packaged goods, edutech, e-commerce, banking, insurance, hi-tech, engineering R&D, manufacturing, automotive and travel/transportation/hospitality.
Headquartered in Bangalore, India; Happiest Minds has operations in USA, UK, The Netherlands, Australia and Middle East.
To know more about our offerings. Please write to us at [email protected]
Happiest Minds
© Happiest Minds. All Rights Reserved. E-mail: [email protected]
Visit us: www.happiestminds.com
Follow us on
7
This document is an exclusive property of Happiest Minds Technologies.