Future Directions and Course Wrap-up Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems April 29, 2008


Future Directions and Course Wrap-up

Zachary G. Ives
University of Pennsylvania

CIS 455 / 555 – Internet and Web Systems

April 29, 2008


Today’s Plan

Reminders:
  Project demos the afternoon of May 9
  Project report due May 12 before the final, INCLUDING experimental evaluation

A brief discussion on experimental methodology and some suggestions

Where the Internet and Web might be heading

A few minutes for any pressing questions … and course evaluations


Experiments – Show It’s So!

The general goal: to demonstrate that a real-world artifact provides a benefit
  Versus some benchmark or naïve strategy
  We also want to understand why there’s a benefit

Some common kinds of experiments:
  Usability: some sort of user tests, versus a benchmark
  Performance: as we increase the workload, what happens?
  Scalability: as we increase the data, devices, nodes, what happens?
  Complexity: especially for things like code, what happens as we make the task harder or bigger?


Experimentation

In general, experiments should follow the scientific method:
  Hypothesis (e.g., our method will do better than XYZ on workloads like QWV, which are representative of domain ABC)
  Experiment (examine this – may need many trials, random workloads, etc.)
  Conclusion (show, with statistically significant measurements, that the hypothesis is true)

Often, the hypothesis almost goes unsaid in computer science – it’s implicit in the choice of the problem – but it is there!

Note that many attributes, e.g., elegance, style, are not very amenable to experiments

Others, like expressiveness, generally need to be proven rather than run


Experimental Workloads

There are generally three kinds of systems experiments:
  Synthetic microbenchmark: experimental runs are done over inputs that are generated to stress a specific factor, but are not particularly realistic
    Examples: a hard disk random access test; a web server’s maximum throughput
    Really shows the factor of interest; can be tweaked, scaled, etc.
  Synthetic based on real behavior: experimental runs are done over inputs that are modeled after real data, but perhaps generated randomly
    Examples: SPEC benchmarks; TPC-W web transaction benchmark
    Enables us to generate more inputs, testing scalability, etc.
  Real-world: traces are collected of real system behavior over real data
    Disadvantage: hard to quantify or control the different factors
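A rough sketch of the "synthetic based on real behavior" idea: generate a query workload whose term popularity follows a Zipf-like distribution. The Zipf model, the vocabulary, the exponent `s`, and the name `zipf_workload` are all assumptions made for illustration; the lecture does not prescribe any particular generator.

```python
import random
from collections import Counter


def zipf_workload(vocabulary, num_queries, s=1.0, seed=0):
    """Draw num_queries terms; the term at rank r has weight 1 / r**s."""
    rng = random.Random(seed)  # fixed seed so a run can be repeated exactly
    weights = [1.0 / (rank ** s) for rank in range(1, len(vocabulary) + 1)]
    return rng.choices(vocabulary, weights=weights, k=num_queries)


if __name__ == "__main__":
    # Hypothetical vocabulary, ordered from most to least popular.
    vocab = ["web", "server", "crawler", "index", "query", "cache"]
    workload = zipf_workload(vocab, num_queries=1000)
    # Popular terms should dominate the generated workload.
    print(Counter(workload).most_common(3))
```

Because the generator is seeded and parameterized, the same workload can be regenerated for every system under test, and `num_queries` can be scaled up to probe scalability.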


Experimental Methodology

Consider the important factors that you wish to examine (and demonstrate)
  Scalability – can typically be in terms of running time, size of the problem, space consumed, etc.
  Here: performance is what matters

Break it down into individual parameters
  Crawl & index time; time to answer a query; etc.

Consider a workload that helps measure the parameter
  Crawl 1000 documents; run 50 queries 10 times apiece; etc.

Vary one parameter at a time, study effects
  Number of machines; number of threads per machine; etc.

Run experiment multiple times; average and show 95% confidence intervals in line (continuous) or bar (discrete) chart
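One possible way to compute the average and a 95% confidence interval for repeated runs is sketched below. It assumes a normal approximation (z of about 1.96), which is a simplification: with only a handful of trials, a Student-t critical value would give a wider interval. The function name and the latency numbers are hypothetical.

```python
import math
import statistics


def mean_and_ci95(samples):
    """Return (mean, half_width) of an approximate 95% confidence interval."""
    if len(samples) < 2:
        raise ValueError("need at least two trials to estimate variance")
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    # Normal approximation; an assumption, see the note above the code.
    z = statistics.NormalDist().inv_cdf(0.975)
    return mean, z * sem


if __name__ == "__main__":
    # Hypothetical query latencies (ms) from 10 repeated runs of one setting.
    latencies_ms = [102.0, 98.5, 110.2, 95.1, 101.7, 99.3, 104.8, 97.6, 100.9, 103.4]
    mean, half = mean_and_ci95(latencies_ms)
    print(f"{mean:.1f} ms +/- {half:.1f} ms")
```

The half-width is what would be drawn as the error bar on each point of the line or bar chart, one point per value of the parameter being varied.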


The Future: Where Is the Internet and the Web Headed?

Technology trends:
  Larger numbers of compute nodes (clusters, embedded devices, multicore, etc.)
  Bandwidth goes up, latency doesn’t
  Wireless and mobile devices
  Heterogeneous devices on the same network

General goals:
  Provide higher-level programming abstractions, more automatic configuration/inference, especially as complexity goes up
  Scalability, reliability, availability, …


Trend 1: Wireless Sensor Devices

Useful for environmental monitoring
Interesting connection between digital & physical world
Challenges:
  Many, many devices (redundancy)
  Limited power, CPU, bandwidth
  High rate of failure and error
  Very local knowledge – only proximity


http://robotics.eecs.berkeley.edu/~pister/SmartDust/


The Problems of Focus

Hardware: more efficient, more powerful nodes
Robustness: need to combine info from many sensors to account for individual errors
Routing: need to aggregate data in a power-efficient way
Streams: data is an infinitely long sequence – how do we deal with that?
  Summarization data structures (data is roughly according to this distribution)
  Operations over “sliding windows”
Programming: how do we express what we want to do with sensor networks?
  Surprisingly effective: XQuery/SQL-like languages for monitoring data (e.g., TinyDB [Madden+03])
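To make the "sliding windows" idea concrete, here is a minimal sketch of a running average over the most recent readings of an unbounded stream. The class name and the sample readings are hypothetical, and this is not how TinyDB or any specific system implements windows.

```python
from collections import deque


class SlidingWindowAverage:
    """Running mean over the most recent `size` readings of a stream."""

    def __init__(self, size):
        if size <= 0:
            raise ValueError("window size must be positive")
        self.window = deque(maxlen=size)  # the oldest reading is evicted automatically
        self.total = 0.0

    def add(self, reading):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # subtract the reading about to be evicted
        self.window.append(reading)
        self.total += reading
        return self.total / len(self.window)


if __name__ == "__main__":
    avg = SlidingWindowAverage(size=3)
    for reading in [20.0, 22.0, 21.0, 35.0, 23.0]:  # hypothetical temperature samples
        print(avg.add(reading))
```

The state kept per stream is bounded by the window size, which is the point: the stream itself never has to be stored.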


Example: Sensor Net Research at Penn (Ives, Guha, Lee, Loo; Mihaylov, Liu, Jacob)

The Internet has many “streaming” data sources & devices
  Motes, routers, monitoring software on servers, etc.

Can we build apps that let us integrate and monitor relevant data, without worrying about device specifics?

The key idea: use query languages (think XQuery or SQL) as the basic way of requesting sensor data
  Extend with ideas from data integration, to support heterogeneous sensors, combining sensor data with databases, etc.
  Figure out how to optimize these queries
  Supplement the query language with Java, etc.


Why Query Languages?

They make programming data-centric, not device-centric
  Everything abstracted as tables / XML documents
  Request all data values with a particular property, etc.

They allow for simple composition (views)

They are amenable to optimization
  Idea: place computation at “the right” nodes in the network


Basic Approach 1/5

Hide physical connectivity and location details from programmer – group data sources into abstract relations

Mic(lat, long,time,sample)

Video(lat, long,time,frame)


Basic Approach 2/5

Mic(lat, long,time,sample)

Video(lat, long,time,frame)

Represent each sensor as the source of a stream of time-varying tuples

Video streams, one per sensor (frame images not preserved):
(385301,770201,1,[frame]), (385301,770201,2,[frame]), …
(385302,770201,1,[frame]), (385302,770201,2,[frame]), …
(385303,770201,1,[frame]), (385303,770201,2,[frame]), …
(385301,770202,1,[frame]), (385301,770202,2,[frame]), …
(385301,770202,1,[frame]), (385301,770202,2,[frame]), …
(385302,770202,1,[frame]), (385302,770202,2,[frame]), …

Mic streams, one per sensor:
(385300,770200,1,―), (385300,770200,2,―), …
(385302,770200,1,―), (385302,770200,2,┘), …
(385300,770201,1,―), (385300,770201,2,┘), …
(385301,770202,1,┘), (385301,770202,2,┐), …
(385300,770200,1,―), (385300,770200,2,―), …


Basic Approach 3/5

“Show me all of the video frames between [38°53.01’,77°02.01’] and [38°53.03’,77°02.01’] with a [image]”

“How many video frames with a [image] are also near a microphone sample with sound?”

… Can also combine with lookups in tables to do data integration

e.g., “Show me video frames with a [image] that fall within the coordinates of the conference room in RoomTable”

e.g., “Find the ssn of Bob Smith, use this to look up his transponder ID, and show me video near him”

Support queries based on properties of the data, independent of the devices
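A small sketch of what "queries based on properties of the data" could look like in plain Python: tuples are selected by position and content, and no device is ever named. The `VideoTuple` type, the `frames_in_box` function, the string stand-ins for frames, and the `matches` predicate are all hypothetical; the system described in the lecture uses a declarative query language instead.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class VideoTuple(NamedTuple):
    lat: int    # encoded as in the slides, e.g. 385301 for 38°53.01'
    long: int   # e.g. 770201 for 77°02.01'
    time: int
    frame: str  # stand-in for the image payload


def frames_in_box(video: Iterable[VideoTuple],
                  lat_range: tuple, long_range: tuple,
                  matches: Callable[[str], bool] = lambda frame: True) -> Iterator[VideoTuple]:
    """Select tuples by position and content, never by device identity."""
    for t in video:
        if (lat_range[0] <= t.lat <= lat_range[1]
                and long_range[0] <= t.long <= long_range[1]
                and matches(t.frame)):
            yield t


if __name__ == "__main__":
    stream = [
        VideoTuple(385301, 770201, 1, "frame-a"),
        VideoTuple(385303, 770201, 1, "frame-b"),
        VideoTuple(385301, 770202, 1, "frame-c"),  # outside the longitude range
    ]
    for t in frames_in_box(stream, (385301, 385303), (770201, 770201)):
        print(t)
```

Because the function is a generator, it consumes the input lazily and so also works on a stream that never ends.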


Basic Approach 4/5

Support logical views – “abstract sensors” integrating data from different types of lower-level sensors

Video streams (frame images not preserved):
(385301,770201,1,[frame]), (385301,770201,2,[frame])
(385302,770201,1,[frame]), (385302,770201,2,[frame])
(385303,770201,1,[frame]), (385303,770201,2,[frame])
(385301,770202,1,[frame]), (385301,770202,2,[frame])
(385301,770202,1,[frame]), (385301,770202,2,[frame])
(385302,770202,1,[frame]), (385302,770202,2,[frame])

Mic streams:
(385300,770200,1,―), (385300,770200,2,―), …
(385302,770200,1,―), (385302,770200,2,┘), …
(385300,770201,1,―), (385300,770201,2,┘), …
(385301,770202,1,┘), (385301,770202,2,┐), …
(385300,770200,1,―), (385300,770200,2,―), …

AVObservations(lat, long,time,frame,sample) :- video(lat,long,time,frame), mic(lat2,long2,time,sample)

  where dist(lat,long,lat2,long2) < 5m and sample > ― and frame > [image]
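The view definition can be read as a join of the two relations on equal time plus a proximity test. The sketch below spells that out as a nested-loop join over finite lists. Everything concrete in it is an assumption: the numeric `sample` loudness, the `frame_ok` predicate standing in for the frame condition, the `grid_dist` function (which works in encoded coordinate units, not metres), and the thresholds. A real stream system would also need windows, since the inputs never end.

```python
from typing import Callable, Iterable, Iterator, NamedTuple


class Video(NamedTuple):
    lat: int
    long: int
    time: int
    frame: str


class Mic(NamedTuple):
    lat: int
    long: int
    time: int
    sample: float  # hypothetical numeric loudness standing in for the audio sample


class AVObservation(NamedTuple):
    lat: int
    long: int
    time: int
    frame: str
    sample: float


def av_observations(video: Iterable[Video], mic: Iterable[Mic],
                    dist: Callable[[int, int, int, int], float],
                    max_dist: float, min_sample: float,
                    frame_ok: Callable[[str], bool]) -> Iterator[AVObservation]:
    """Nested-loop join of video and mic tuples on equal time and proximity."""
    mic_tuples = list(mic)  # materialized so it can be rescanned per video tuple
    for v in video:
        if not frame_ok(v.frame):
            continue
        for m in mic_tuples:
            if (m.time == v.time and m.sample > min_sample
                    and dist(v.lat, v.long, m.lat, m.long) < max_dist):
                yield AVObservation(v.lat, v.long, v.time, v.frame, m.sample)


def grid_dist(lat, long, lat2, long2):
    """Hypothetical distance in encoded coordinate units, not metres."""
    return ((lat - lat2) ** 2 + (long - long2) ** 2) ** 0.5


if __name__ == "__main__":
    video = [Video(385303, 770201, 1, "frame-a"), Video(385301, 770202, 1, "frame-b")]
    mic = [Mic(385302, 770200, 1, 0.9), Mic(385300, 770200, 1, 0.0)]
    for obs in av_observations(video, mic, grid_dist, max_dist=2.0,
                               min_sample=0.5, frame_ok=lambda f: True):
        print(obs)
```

Consumers of `av_observations` see a single "abstract sensor" and never learn which physical camera or microphone produced each tuple.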


Basic Approach 5/5

(385303,770201,1,[frame]), (385303,770201,2,[frame])

(385301,770202,1,[frame]),

(385302,770200,2,┘), …

(385300,770201,2,┘), …

(385301,770202,1,┘), (385301,770202,2,┐), …

(385303,770201,1,[frame],┘), (385301,770202,1,[frame],┘), (385303,770201,2,[frame],┘), (385303,770201,2,[frame],┘), (385303,770201,2,[frame],┐),

Support logical views – “abstract sensors” integrating data from different types of lower-level sensors

AVObservations(lat, long,time,frame,sample) :- video(lat,long,time,frame), mic(lat2,long2,time,sample)

  where dist(lat,long,lat2,long2) < 5m and sample > ― and frame > [image]


Challenges We Are Addressing

Data integration has been based on static data
  Adapt mappings, queries to stream data, including timing, synchronization, link properties, …

Optimization of queries is hard in the simplest case, and here we need to do it in distributed fashion with limited knowledge
  Distribute computation to the network, and to the devices with the “right” position and “right” capabilities


From Small to Big Devices: Cloud Computing

Four years ago, “grid computing” – mostly intended for science – tried to make large supercomputers available to run batch jobs
  “Grid” as in “electric power grid”
  Very difficult problems: allocating jobs to nodes, locating resources, scheduling, etc.
  Many felt this didn’t succeed

Today’s buzzword: “cloud computing”
  Actually captures many different compute models
  Basics: someone else with cluster expertise maintains large numbers of machines; they run your jobs for you


Cloud Computing Capabilities

Google App Engine: hosts Web apps
  Python-based programming environment, get/put storage interface, connections to Google accounts, URL fetching
  Automatic scaling & load balancing

Amazon: hosts a variety of compute, storage jobs
  Simple Storage Service (S3) – get/put access via REST/SOAP
  Elastic Compute Cloud (EC2) – runs virtual machines
  SimpleDB – table-oriented storage / query interface

Can you name a few challenges?


The Semantic Web

Tim Berners-Lee, creator of the web:
  Let’s re-imagine the Web as a means of interlinking meaning rather than just providing hyperlinks
  All information will be annotated with its semantics, and it will be easy to map between different interpretations
  Google, ca. 2010, might actually be able to give you answers instead of web pages

A nice dream – is it realizable?


The Content of the Semantic Web

Resource Description Framework (RDF)
  Triples, describing objects (with IDs), properties, and values (which also may reference other objects)
    (NetworkBook, hasAuthor, Rexford)
    (Rexford, memberOf, Person)

RDF triples describe a graph, not a tree

RDF has (several) XML representations with some built-in concepts like identity
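To illustrate why triples form a graph, the sketch below stores the two triples from the slide in plain Python and follows property edges across them. It deliberately avoids any RDF library; the helper names `build_graph` and `follow` are hypothetical.

```python
from collections import defaultdict

# Triples from the slide: (subject, property, value).
triples = [
    ("NetworkBook", "hasAuthor", "Rexford"),
    ("Rexford", "memberOf", "Person"),
]


def build_graph(triples):
    """Index triples as subject -> list of (property, value) outgoing edges."""
    graph = defaultdict(list)
    for subject, prop, value in triples:
        graph[subject].append((prop, value))
    return graph


def follow(graph, start, *props):
    """Follow a chain of properties from start; return every node reached."""
    frontier = {start}
    for prop in props:
        frontier = {value for node in frontier
                    for p, value in graph.get(node, []) if p == prop}
    return frontier


if __name__ == "__main__":
    g = build_graph(triples)
    # What is the author of NetworkBook a member of?
    print(follow(g, "NetworkBook", "hasAuthor", "memberOf"))
```

The value of one triple (Rexford) is the subject of another, so any node can be reached from several directions, which a tree would not allow.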


RDF Example, Visualized


Semantics through Ontologies

Schemas describe simple relationships between concepts

An ontology is like a very sophisticated class hierarchy over which queries may make inferences:
  Expresses basic concepts and their relationships
  In the Semantic Web, express constraints on concepts in a language called OWL:
    PetOwner(x) <=> Person(x) and CardinalityOf(Pet(x,y)) > 0
    DogOwner(x) <=> PetOwner(x) and Exists Pet(x,y), y isa Dog
    CatOwner(x) <=> PetOwner(x) and Exists Pet(x,y), y isa Cat

Can ask: what classes does a person with a dog and a cat belong to? Is Person(EricMiller) a DogOwner?
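The three definitions can be evaluated directly over a small set of facts, as in the sketch below. Note the simplifying assumption: this is a closed-world check over the facts given, whereas OWL reasoning is open-world, so a real reasoner would treat missing facts as unknown rather than false. The function name, the pet names, and the facts about EricMiller are made up for illustration.

```python
def classes_of(individual, persons, pets, kinds):
    """Return the classes from the slide's definitions that `individual` satisfies.

    persons: set of individuals known to be a Person
    pets:    set of (owner, pet) pairs, i.e. the Pet(x, y) relation
    kinds:   dict mapping each pet to its kind, e.g. "Dog" or "Cat"
    """
    result = set()
    if individual not in persons:
        return result
    result.add("Person")
    owned = [pet for owner, pet in pets if owner == individual]
    if owned:  # CardinalityOf(Pet(x, y)) > 0
        result.add("PetOwner")
        if any(kinds.get(pet) == "Dog" for pet in owned):
            result.add("DogOwner")
        if any(kinds.get(pet) == "Cat" for pet in owned):
            result.add("CatOwner")
    return result


if __name__ == "__main__":
    # Hypothetical facts; the pet names are made up for illustration.
    persons = {"EricMiller"}
    pets = {("EricMiller", "Rex"), ("EricMiller", "Tom")}
    kinds = {"Rex": "Dog", "Tom": "Cat"}
    print(sorted(classes_of("EricMiller", persons, pets, kinds)))
```

Under these made-up facts, a person with both a dog and a cat falls into Person, PetOwner, DogOwner, and CatOwner at once, which is the kind of inference the slide's first question asks about.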


Ontologies and the Semantic Web

The goal: start categorizing things into ontologies – map meanings of various entities
  Different ontologies can be defined, with different “namespaces”
  Can build many different topic-specific Semantic Webs

The Semantic Web technologies have been fairly stable for a while…
  So why aren’t we seeing more implementations?


A Possible Pitfall of Today’s SW

Data integration teaches us that there’s a huge problem in mapping between different representations
  In database-land, these are schemas; in the Semantic Web, ontologies
  The Semantic Web doesn’t have good technologies for mapping – simple conversions, e.g., dollars to Euros, aren’t expressible

A middle ground:
  Can we extend ideas from data integration, like mappings between XML schemas, to get most of the Semantic Web’s benefits?
  Something we and others are pursuing – e.g., Hyperion @ Toronto, Orchestra @ Penn, …


Recap

Distributed, Web-scale systems are here to stay
  They create many issues that are not totally resolved, and for which there is no one answer:
    Heterogeneity
    Timing
    Partitioning and replication
    Consistency and integrity
    Etc.

This course tried to give you a sense of the issues and state-of-the-art – as well as the skills to go out and work in this domain
  I hope the amount of work we all sank into the material (and the homeworks) will pay off for you!
  And stay tuned – there’s lots more to come!