Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop Joshua Schultz 1 (Undergraduate), Jonathan Vieyra 2 (Undergraduate), Enyue Lu 1 (Faculty Mentor) 1 Dept. of Math. & Computer Science, Salisbury University, Salisbury, USA 2 Dept. of Computer Science, California State Polytechnic University, Pomona, USA Analyzing patterns in large-scale graphs, such as social networks (e.g. Facebook, Linkedin, Twitter) has many applications including community identification, blog analysis, intrusion and spamming detections. Currently, it is impossible to process information in large–scale graphs with millions even billions of edges with a single computer. In this project, we take advantage of MapReduce, a programming model for processing large datasets, to detect important graph patterns using open source Hadoop on Amazon EC2. The aim of this paper is to show how MapReduce cloud computing with the application of graph pattern detection scales on real world data. • Implement MapReduce graph algorithms to enumerate important patterns including • Triangles: three-vertex complete graphs • Rectangles: four-vertex cycles • K-trusses: every edge is in K-2 triangles • Components: there is a path between any pair of vertices • Barycentric clusters: highly connected subgraphs • Analyze the performance of MapReduce graph algorithms • Create a visualization algorithm to visualize the detected graph patterns A synthetic graph G with 100 vertices and 398 edges Enumerating Triangles on graph G Enumerating 4-truss on G Data processed on a cluster was ran on Amazons Elastic MapReduce “small” computers. Each computer was outfitted with 1.7 GB of memory 160 GB of storage and the equivalent of a 1.7GHz Xeon processor. We used different datasets ranging from 1 MB to 1GB including wiki-Vote (7,115 Vertices, 103,689 Edges, 1MB), soc- Slashdot0811 (77,360 Vertices, 905,468 Edges, 10MB), and soc- LiveJournal1 (4,847,571 Vertices, 68,993,773 Edges, 1GB) from Snap Stanford. Enumerating Rectangles on G As shown in the figure above, triangle and rectangle enumerating algorithms scale well when datasets get larger. The above figure shows a steady decline in running time as the number of computers increases for large data (e.g. 1GB). The above figure indicates that the scalability of the truss algorithm depends on the number of MapReduce iterations. This research is funded by National Science Foundation CCF-1156509 under the Research Experience for Undergraduates Program. We would like to thank Professor Randal Burns at Johns Hopkins University for his valuable inputs on our work. Motivation Contributions Graph Visualization for Detected Patterns Experimental Setting Experimental Results MapReduce Programming Model Acknowledgment

Analyzing Patterns in Large-Scale Graphs Using …faculty.salisbury.edu/~ealu/REU/Projects_File/Poster/MapReduce...Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop

Download PDF Report

Upload
ngotram
View
214
Download
2

Embed Size (px)

Citation preview

Page 1: Analyzing Patterns in Large-Scale Graphs Using …faculty.salisbury.edu/~ealu/REU/Projects_File/Poster/MapReduce...Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop

Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop

Joshua Schultz1 (Undergraduate), Jonathan Vieyra2 (Undergraduate), Enyue Lu1 (Faculty Mentor) 1Dept. of Math. & Computer Science, Salisbury University, Salisbury, USA

2Dept. of Computer Science, California State Polytechnic University, Pomona, USA

Analyzing patterns in large-scale graphs, such as social

networks (e.g. Facebook, Linkedin, Twitter) has many

applications including community identification, blog analysis,

intrusion and spamming detections. Currently, it is impossible

to process information in large–scale graphs with millions even

billions of edges with a single computer. In this project, we take

advantage of MapReduce, a programming model for processing

large datasets, to detect important graph patterns using open

source Hadoop on Amazon EC2. The aim of this paper is to

show how MapReduce cloud computing with the application of

graph pattern detection scales on real world data.

• Implement MapReduce graph algorithms to enumerate

important patterns including

• Triangles: three-vertex complete graphs

• Rectangles: four-vertex cycles

• K-trusses: every edge is in K-2 triangles

• Components: there is a path between any pair of vertices

• Barycentric clusters: highly connected subgraphs

• Analyze the performance of MapReduce graph algorithms

• Create a visualization algorithm to visualize the detected

graph patterns

A synthetic graph G with 100 vertices and 398 edges

Enumerating Triangles on graph G

Enumerating 4-truss on G

Data processed on a cluster was ran on Amazons Elastic MapReduce

“small” computers. Each computer was outfitted with 1.7 GB of

memory 160 GB of storage and the equivalent of a 1.7GHz Xeon

processor. We used different datasets ranging from 1 MB to 1GB

including wiki-Vote (7,115 Vertices, 103,689 Edges, 1MB), soc-

Slashdot0811 (77,360 Vertices, 905,468 Edges, 10MB), and soc-

LiveJournal1 (4,847,571 Vertices, 68,993,773 Edges, 1GB) from

Snap Stanford.

Enumerating Rectangles on G As shown in the figure above, triangle and rectangle

enumerating algorithms scale well when datasets get larger.

The above figure shows a steady decline in running time as

the number of computers increases for large data (e.g. 1GB).

The above figure indicates that the scalability of the truss

algorithm depends on the number of MapReduce iterations.

This research is funded by National Science Foundation

CCF-1156509 under the Research Experience for

Undergraduates Program. We would like to thank Professor

Randal Burns at Johns Hopkins University for his valuable

inputs on our work.

Motivation

Contributions

Graph Visualization for Detected Patterns Experimental Setting

Experimental Results

MapReduce Programming Model

Acknowledgment

Towards analyzing large graphs with quantum ... - arXiv

Documents

Building and Analyzing Social Networks Insider Threat Analysis with Large Graphs

Documents

Link Structure Graphs for Representing and Analyzing Web Sites

Documents

Analyzing Probabilistic Graphs Michalis Potamias

Documents

STING: A Framework for Analyzing Spacio-Temporal Interaction Networks and Graphs

Technology

MRShare: Sharing Across Multiple Queries in MapReduce · 2019-07-12 · MRShare: Sharing Across Multiple Queries in MapReduce ... Present-day enterprise success often relies on analyzing

Documents

1-2 Analyzing Graphs of tions and Relations

Documents

1.5 Analyzing Graphs of Functions Examples: Find the ... · 1.5 Analyzing Graphs of Functions Examples: Find the domain and range of several graphs. Vertical Line Test for Functions:

Documents

Analyzing Big Data using Hadoop MapReduce - … · Analyzing Big Data using Hadoop MapReduce ... example, Hadoop is used in ... MapReduce is generally defined as a programming model

Documents

IBM Streams Processing Language: Analyzing Big Data in motionhirzels.com/martin/papers/ibmrd13-spl.pdf · MapReduce [5], Dryad [6], and various languages implemented on top of MapReduce,

Documents

A MapReduce-Based Maximum-Flow Algorithm for Large Small-World Network Graphs

Documents

5.4 - Analyzing Graphs of Polynomial Functions Day 1

Documents

Lecture 4 - Analyzing Massive Graphs Part I

Documents

Mining Tera-Scale Graphs with MapReduce: Theory ...ukang/proposal.pdfMining Tera-Scale Graphs with MapReduce: Theory, Engineering and Discoveries U Kang October 2011 CMU-CS-11-XXX

Documents

1 Lesson 5.3.1 Analyzing Graphs. 2 Lesson 5.3.1 Analyzing Graphs California Standards: Statistics, Data Analysis, and Probability 1.1 Compute the range,

Documents

Analyzing Probabilistic Graphs

Documents

Algorithmic Approaches for Analyzing Large Graphs: Sampling, Sketching ...mcgregor/slides/18-porto.pdf · Algorithmic Approaches for Analyzing Large Graphs: Sampling, Sketching, Streaming

Documents

A MapReduce Input Format for Analyzing Big High-Energy ... · MapReduce heavily relies on the Google File System (GFS) [43], which aims to provide reliable and distributed storage

Documents

Analyzing Graphs of Polynomials

Documents

Analyzing Graphs and Tables - Buffalo State Collegemath.buffalostate.edu/~it/projects/Bodkin.pdf · Analyzing Graphs and Tables By: Paul Bodkin ... when describing objects, ... (Worksheet

Documents

Analyzing Graphs of Functions and Relationsczschwartz.weebly.com/uploads/2/1/4/0/21406690/chapter_1...Analyzing Graphs of Functions and Relations You identified functions. (Lesson

Documents

6 - analyzing graphs

Documents

Analyzing Fleet Test Data · 2 Topics Reasons for analyzing fleet data MATLAB datatypes for improved productivity Scaling up analysis with mapreduce and parallel computing Machine

Documents

MapReduce. MapReduce Outline MapReduce Architecture MapReduce Internals MapReduce Examples JobTracker Interface

Documents

1 QSX: Querying Social Graphs Parallel models for querying graphs beyond MapReduce Vertex-centric models –Pregel (BSP) –GraphLab GRAPE

Documents

Visually Analyzing People with Graphs

Data & Analytics

Analyzing and Characterizing Small-World Graphs

Documents

Extracting and Analyzing Hidden Graphs from Relational ... · Extracting and Analyzing Hidden Graphs from Relational Databases Konstantinos Xirogiannopoulos University of Maryland,

Documents

RDFPath: Path Query Processing on Large RDF Graphs with …zablocki/talks/RDFPath_Path_Query... · RDFPath Path Query Processing on Large RDF Graphs with MapReduce 1st Workshop on

Documents

Analyzing Solar Energy Graphs: MY NASA DATA · 2013-03-26 · Analyzing Solar Energy Graphs: MY NASA DATA Presented by: Marti Phipps. ... Solar Cell Energy Availability from ... This

Documents