
Dagstuhl Scalable Visual Analytics

University of Illinois

Role of Mashups, Cloud Computing, and Parallelism for Visual Analytics

Loretta Auvil


Outline


Software Silos

We continue to build silos. Why?

• I’m only creating a prototype for my paper…
• I want to have control…
• I want to write my own code…
• I can do it faster…
• I’m not funded to integrate with…
• …

Images from Google Search


From Silos to Mashups

Definition: A mashup is a web page or application that uses and combines data, presentation, or functionality from two or more sources to create new services.

Why do we want this?
• Enable our services in many applications and on a variety of devices (laptop, high-resolution display wall, iPad, iPhone, and others)
• Sharing and reuse are good things
• Reach communities with our tools and their data!

What can we do to change this?
• We can design and create data-driven solutions so that they can be mashed up with other tools.
• We can build web services that can be deployed or accessed.
• We can create APIs for others to use.
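To make the mashup idea concrete, here is a minimal sketch that combines two sources into a new service: location names extracted from a text, and a gazetteer of coordinates, joined into map markers. Both sources are hypothetical in-memory stand-ins (the names, approximate coordinates, and the `map_markers` function are illustrative only); a real mashup would fetch them from web services or APIs.

```python
from typing import Dict, List, Tuple

# Hypothetical source 1: location entities extracted from a document.
EXTRACTED_LOCATIONS: List[str] = ["Urbana", "Wadern", "Urbana", "Atlantis"]

# Hypothetical source 2: a gazetteer of approximate (lat, lon) coordinates.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Urbana": (40.11, -88.21),
    "Wadern": (49.54, 6.89),
}


def map_markers(locations: List[str],
                gazetteer: Dict[str, Tuple[float, float]]) -> List[dict]:
    """Combine both sources into marker records a map widget could render.

    Each place appears once with its mention count; places the gazetteer
    does not know (e.g. "Atlantis") are skipped.
    """
    counts: Dict[str, int] = {}
    for name in locations:
        counts[name] = counts.get(name, 0) + 1
    markers = []
    for name, count in counts.items():  # dicts keep first-seen order
        if name in gazetteer:
            lat, lon = gazetteer[name]
            markers.append({"name": name, "lat": lat, "lon": lon,
                            "mentions": count})
    return markers


if __name__ == "__main__":
    for marker in map_markers(EXTRACTED_LOCATIONS, GAZETTEER):
        print(marker)
```

Neither source knows about the other; the new service exists only in the join, which is what makes it easy to swap in a different extractor or gazetteer.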

How can we do this?


Mashup Framework

[Figure: mashup framework architecture diagram. Labeled elements: Components; Virtualization Infrastructure; Meandre Infrastructure; Component Repository; Component Discovery; Meandre Data-Intensive Flows; Apps, Services, Plugins, and Web Apps; Data; Analytics; Visualization; Developer Tools; Repositories; Data Analysis; Components, Flows; User Interfaces; Computational Resources; Visualizations; Meandre Workbench.]


Scientific Workflows

• Kepler
• Triana
• BPEL
• Ptolemy II
• Taverna
• Trident
• Meandre
• VisTrails

David De Roure slide (slightly modified)


Meandre for Mashups

Major Capabilities
• Dataflow execution
• Semantic technology (using RDF to store metadata)
• Web-oriented
  • Supports publishing services for data, analytics, and visualization
• Modular components
  • Encapsulation and execution mechanism
  • Promotes reuse, sharing, and collaboration
• Cloud-friendly infrastructure

Note (for Tom): Meandre trades off some performance for reuse, flexibility, and modular components, with the option to parallelize components to improve performance.
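The dataflow execution and modular-component ideas can be illustrated with a small sketch. It assumes a simplified model, not Meandre's actual API: each hypothetical `Component` wraps one function, runs in its own thread, and fires whenever data arrives on its input queue. Components are wired into a linear flow, and a sentinel marks the end of the stream.

```python
import queue
import threading
from typing import Callable, List, Optional

_END = object()  # sentinel marking the end of a data stream


class Component:
    """Hypothetical modular component: reads items from its inbox,
    applies `fn`, and forwards results downstream."""

    def __init__(self, name: str, fn: Callable[[object], object]):
        self.name = name
        self.fn = fn
        self.inbox: "queue.Queue[object]" = queue.Queue()
        self.outbox: Optional["queue.Queue[object]"] = None

    def run(self) -> None:
        # Dataflow semantics: the component fires each time data arrives.
        while True:
            item = self.inbox.get()
            if item is _END:
                if self.outbox is not None:
                    self.outbox.put(_END)  # propagate end-of-stream
                return
            result = self.fn(item)
            if self.outbox is not None:
                self.outbox.put(result)


def run_flow(components: List[Component], inputs: List[object]) -> List[object]:
    """Wire components into a linear flow, run each in its own thread,
    and collect everything emitted by the last component."""
    sink: "queue.Queue[object]" = queue.Queue()
    for upstream, downstream in zip(components, components[1:]):
        upstream.outbox = downstream.inbox
    components[-1].outbox = sink
    threads = [threading.Thread(target=c.run) for c in components]
    for t in threads:
        t.start()
    for item in inputs:
        components[0].inbox.put(item)
    components[0].inbox.put(_END)
    results = []
    while True:
        item = sink.get()
        if item is _END:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    flow = [Component("tokenize", str.split), Component("count", len)]
    print(run_flow(flow, ["a b c", "d e"]))
```

Because each component only sees its own queues, components can be reused in other flows unchanged; the cost is the queueing overhead between stages, which is the performance trade-off the note mentions.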


Components

Analytics

• Unsupervised Learning
  • Clustering
  • Frequent Pattern Analysis (Rule Association)
• Supervised Learning
  • Naïve Bayesian
  • Support Vector Machines (Weka)
  • Decision Trees (C4.5)
• Optimization Approaches
  • Genetic Algorithm
• Text Analysis (POS tagging, Entity Extraction)
  • OpenNLP
  • Stanford NER

Visualization

• Geographic (Google Maps)

• Temporal (Simile)

• Network Graphs – Link Nodes and Arcs (Protovis)

• Parallel Coordinates (Protovis)

• Stacked Area Chart (Flare)

• Tag Cloud Maker

• Decision Tree (Applet D2K)

• Naïve Bayes (Applet D2K)

• Rule Association (Applet)

• Dendrogram (GWT)
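A tag cloud maker typically counts term frequencies and maps them to font sizes. The sketch below assumes regex tokenization, a tiny illustrative stop-word list, and linear scaling between hypothetical `min_size` and `max_size` point sizes. It shows the general technique, not the SEASR Tag Cloud Maker's actual algorithm.

```python
import re
from collections import Counter
from typing import Dict

STOP_WORDS = {"the", "a", "and", "of", "to", "in"}  # illustrative only


def tag_cloud(text: str, top_n: int = 10,
              min_size: int = 10, max_size: int = 40) -> Dict[str, int]:
    """Return {term: font_size} for the top_n most frequent non-stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi = top[0][1]   # highest frequency among the selected terms
    lo = top[-1][1]  # lowest frequency among the selected terms
    span = hi - lo
    sizes = {}
    for term, freq in top:
        # Linear interpolation; if all counts tie, every term gets max_size.
        scale = (freq - lo) / span if span else 1.0
        sizes[term] = round(min_size + scale * (max_size - min_size))
    return sizes


if __name__ == "__main__":
    print(tag_cloud("data data data flow flow cloud the the the the"))
```

Linear scaling is the simplest choice; logarithmic scaling is a common alternative when a few terms dominate the counts.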


Example: Zotero and SEASR

Meandre Services from Firefox Plugin
• Readability Analysis
• Tag Cloud Analysis
• Date Entity to Simile Timeline
• Network Analysis
• Automatic Summarization
• Location Entity to Google Map


An Ideological Metaphor & Definition

Cloud Metaphor
The term "cloud" is used as a metaphor for the Internet, based on how it is depicted in computer network diagrams, and is an abstraction for the complex infrastructure it conceals.

Cloud Computing – Definition
The first academic use of this term appears to define it as a computing paradigm where the boundaries of computing will be determined by economic rationale rather than technical limits.

Cloud computing is a paradigm of computing in which dynamically scalable and often virtualized resources are provided as a service over the Internet. Users need not have knowledge of, expertise in, or control over the technology infrastructure in the "cloud" that supports them.

http://en.wikipedia.org/wiki/Cloud_computing


Cloud Computing

How can we leverage these computation environments?

Known issues
• Cloud mechanics have a steep learning curve
• Data movement to the cloud
• Security

Next-generation data-intensive applications will:
• Use cloud computing technologies and conduits
• Require adaptation of programming paradigms
• Leverage a flexible and modular architecture
• Promote processing and resources at scale
• Use distributed dataflow designs that allow processing to be co-located with data sources and enable transparent scalability


Meandre in the Clouds

Meandre
• Data-intensive execution engine
• Component-based programming architecture
• Orchestrates cloud deployments
• Leverages cloud conduits

NCSA Virtual Machines & Enterprise Cloud
• VMware, Xen, & Eucalyptus
• ElasticFox & AMS Web Application


Components for Amazon & Eucalyptus

Components can be created to:
• List images
• Launch/terminate instances
• Transfer data or programs to running instances
• Trigger computation processes
• Monitor processes and/or persistent services
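The component responsibilities listed above can be chained into one orchestration flow. To keep the sketch self-contained and runnable, it assumes a hypothetical `CloudClient` interface and an in-memory `FakeCloud` stand-in. The method names are illustrative and do not belong to any real SDK. A real component would instead call the EC2-compatible APIs that Amazon and Eucalyptus expose (for example, the `DescribeImages`, `RunInstances`, and `TerminateInstances` actions).

```python
import itertools
from typing import Dict, List, Protocol


class CloudClient(Protocol):
    """Hypothetical interface; method names are illustrative, not a real SDK."""
    def list_images(self) -> List[str]: ...
    def launch(self, image: str, count: int) -> List[str]: ...
    def transfer(self, instance: str, payload: str) -> None: ...
    def trigger(self, instance: str) -> None: ...
    def status(self, instance: str) -> str: ...
    def terminate(self, instance: str) -> None: ...


class FakeCloud:
    """In-memory stand-in so the flow can be exercised without a cloud."""
    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.state: Dict[str, str] = {}
        self.files: Dict[str, List[str]] = {}

    def list_images(self) -> List[str]:
        return ["img-analytics", "img-base"]  # hypothetical image names

    def launch(self, image: str, count: int) -> List[str]:
        ids = [f"i-{next(self._ids)}" for _ in range(count)]
        for i in ids:
            self.state[i] = "running"
            self.files[i] = []
        return ids

    def transfer(self, instance: str, payload: str) -> None:
        self.files[instance].append(payload)

    def trigger(self, instance: str) -> None:
        self.state[instance] = "done"  # the fake finishes instantly

    def status(self, instance: str) -> str:
        return self.state[instance]

    def terminate(self, instance: str) -> None:
        self.state[instance] = "terminated"


def orchestrate(cloud: CloudClient, image: str, payloads: List[str]) -> Dict[str, str]:
    """List images -> launch -> transfer -> trigger -> monitor -> terminate."""
    if image not in cloud.list_images():
        raise ValueError(f"unknown image: {image}")
    instances = cloud.launch(image, len(payloads))
    for inst, payload in zip(instances, payloads):
        cloud.transfer(inst, payload)
        cloud.trigger(inst)
    # Monitor: a single status check here; a real component would poll
    # until each process completes or a timeout expires.
    final = {inst: cloud.status(inst) for inst in instances}
    for inst in instances:
        cloud.terminate(inst)
    return final


if __name__ == "__main__":
    print(orchestrate(FakeCloud(), "img-analytics", ["a.csv", "b.csv"]))
```

Coding the flow against an interface rather than a specific provider is one way to let the same orchestration run on Amazon or on a private Eucalyptus cloud.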


Cloud Orchestration Data Flow


Parallelism

Writing parallel code can be hard, and debugging it is even harder…

But we need it because:
• our data sets are growing…
• software tools can help
• hardware is also available

• MapReduce model: a powerful abstraction (software framework) developed by Google to support distributed computing on large data sets on clusters of computers
  • Hadoop is an open-source implementation
• GPUs
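The MapReduce model is usually illustrated with word count. This single-process sketch only mimics the three phases (map, shuffle/group-by-key, reduce). It is not Hadoop's API; a real framework would distribute each phase across a cluster and handle the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Emit (word, 1) for every word in one input split."""
    return [(word, 1) for word in document.lower().split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # Each map call is independent, so a cluster could run them in parallel.
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle_phase(intermediate)
    # Reduce calls are also independent, one per key.
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big compute big data"]))
```

The programmer writes only the map and reduce functions; partitioning, grouping, and fault tolerance are left to the framework, which is what makes the abstraction powerful.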


Meandre for Parallelism

• Implemented a scripting language (ZigZag)
• Implemented MapReduce in Meandre
• Automatic parallelization for stateless components

Adding the operator [+4] or [+4!] to a component in the flow below expands it into a directed graph with parallel copies of that component.

    # Describes the data-intensive flow
    #
    @pu = push()
    @pt = pass( string:pu.string ) [+4!]
    print( object:pt.string )
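Why replicating a stateless component is safe can be sketched in Python. Because `pass` keeps no state between calls, four copies can process items concurrently without coordinating. In this sketch a `ThreadPoolExecutor` with four workers stands in for the four replicas, and `push`, `pass_component`, and the printing loop are hypothetical stand-ins for the ZigZag components. `Executor.map` returns results in input order. The slide does not say what the `!` suffix changes, so the ordering here should not be read as its semantics.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def push() -> List[str]:
    """Hypothetical source component: emits a few strings."""
    return [f"item-{i}" for i in range(8)]


def pass_component(s: str) -> str:
    """Stateless component: the output depends only on the input."""
    return s


def run_parallel(items: Iterable[str], replicas: int = 4) -> List[str]:
    """Run `replicas` copies of the stateless component, like `[+4]`."""
    with ThreadPoolExecutor(max_workers=replicas) as pool:
        # map() yields results in the same order as the inputs.
        return list(pool.map(pass_component, items))


if __name__ == "__main__":
    for s in run_parallel(push()):
        print(s)  # plays the role of the print() sink in the flow
```

Threads illustrate the structure only; for CPU-bound components in Python, a `ProcessPoolExecutor` (or separate machines, as in Meandre's distributed flows) would be needed for real speedups.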


Scaling Genetic Algorithms in Meandre

Intel 2.8 GHz quad-core, 4 GB RAM; average of 20 runs.
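Genetic algorithms parallelize naturally because the fitness evaluations within one generation are independent. The following minimal sketch assumes a OneMax problem (maximize the number of 1 bits), binary tournament selection, one-point crossover, and bit-flip mutation. These choices and all parameter values are illustrative; they are not the configuration behind the reported runs. The `map_fn` hook is where a parallel map can be plugged in.

```python
import random
from typing import Callable, List

Genome = List[int]


def onemax(genome: Genome) -> int:
    """Fitness: number of 1 bits (OneMax benchmark)."""
    return sum(genome)


def tournament(pop: List[Genome], fits: List[int],
               rng: random.Random, k: int = 2) -> Genome:
    """Pick the fittest of k randomly chosen individuals."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: fits[i])]


def evolve(n_bits: int = 32, pop_size: int = 40, generations: int = 60,
           p_mut: float = 0.02, seed: int = 0,
           map_fn: Callable = map) -> int:
    """Return the best fitness found.

    `map_fn` evaluates a whole population; passing a process pool's map
    spreads the independent evaluations across cores.
    """
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = 0
    for _ in range(generations):
        fits = list(map_fn(onemax, pop))  # independent, parallelizable
        best = max(best, max(fits))
        offspring = []
        while len(offspring) < pop_size:
            a = tournament(pop, fits, rng)
            b = tournament(pop, fits, rng)
            cut = rng.randrange(1, n_bits)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with per-bit probability p_mut.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            offspring.append(child)
        pop = offspring
    return max(best, max(map_fn(onemax, pop)))


if __name__ == "__main__":
    print(evolve())
```

Under an `if __name__ == "__main__":` guard, calling `evolve(map_fn=pool.map)` with a `concurrent.futures.ProcessPoolExecutor` spreads the evaluations over cores. Speedups grow with the cost of each fitness evaluation, since cheap evaluations are dominated by communication overhead.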


And With Hadoop

60 dual quad-core Xeons with 8 GB RAM, Gigabit Ethernet

Resource exhaustion


Summary