52
Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY 1 A Model for the Systems Architecture of the Future Prof. Paul A. Strassmann George Mason University, December 5, 2005

Lecture v4

Embed Size (px)

DESCRIPTION

 

Citation preview

Page 1: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

1

A Model for the SystemsArchitecture of the Future

Prof. Paul A. StrassmannGeorge Mason University, December 5, 2005

Page 2: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

2

Data-Centric Era; IBM Dominates

1950-1980

Months⇒Weeks

Hundred Sources

Page 3: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

3

Workgroup-Centric Era; Microsoft, INTEL Dominate

1950-1980 1980-2010

Months⇒Weeks

Weeks⇒Days

Hundred Sources

Million Sources

Page 4: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

4

Network-Centric Era; Google and Cisco?

1950-1980 1980-2010 2010-

Months⇒Weeks

Weeks⇒Days

Days⇒Real-Time

Hundred Sources

Million Sources

Billions Sources

Data

+Text

+Multi-Media

Page 5: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

5

Example of a Network-Centric System

Page 6: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

6

Network-Centric Requirements (2010)

• Downtime (< 5 min/yr);• Display (200 Billion ops/sec);• Connectivity (> 1 Gigabyte/sec);• Access (< 0.25 sec);• Innovation (< 1 day);• Security (> 8 sigma).

Page 7: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

7

Performance (2005)

• Infrastructure = > 50% of spending;• Security = ?;• Integration = > 50% of applications;• Network downtime = > 1 hour/year;• Innovation = > 1 year.

Page 8: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

8

Conclusion

• Network-Centric systems cannot bebuilt on Workgroup-Centric architecture.

Page 9: Lecture v4

9

Network-Centric Principles (Google)

1. Build & operate protected informationnetwork;

2. Offer universal connectivity for:– Collection, processing and storing of

information;– Provide secured communications.

3. Maintain shared data models;4. Require continued upgrading & innovation.

Page 10: Lecture v4

10

Google Principle #1

Build & operate protected information network

Page 11: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

11

Standard Google Clusters Operate Net

• 359 racks• 31,654 machines• 63,184 CPUs• 126,368 Ghz of processing power• 63,184 Gb of RAM• 2,527 Tb of Hard Drive space

• Appx. 40 million searches/day

Page 12: Lecture v4

12

Clusters Have Identical Architecture

IndexServers

DocumentServers

WebServers

WebSwitch

Page 13: Lecture v4

13

Google Cluster Set-Up = Three Days

Page 14: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

14

Google Infrastructure: Key to Growth (1)

• >200,000 custom-built commodity servers;• Acting as one parallel supercomputer;• Fault tolerant hardware.• Storage capacity >5 petabytes; low response latency

(0.2 sec); >80GB per server, distributed;• Indexed >8 billion web pages; Indexing is

computationally complex (>500M * > 2B matrix)• Capital and operating costs at fraction of large scale

commercial servers; traffic growth 20-30%/month; datacenters (>12); in US, Europe and Asia.

Page 15: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

15

Google Infrastructure: Key to Growth (2)• >200,000 commodity Linux servers built to custom

specifications; distributed cluster architecture; acting as oneparallel supercomputer; scaleable;

• >50,000 requests/sec; fault tolerant (no single point of failure);diverse hardware; stripped version of Red Hat;

• Storage capacity >5 petabytes;• >80GB per server;• Indexed >8 billion web pages;• Indexing is complex (500M x 2B matrix)• Capital and operating costs at fraction of large scale commercial

servers; traffic growth 20-30%/month; data centers (>12); in US,Europe and Asia.

Page 16: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

16

Google Infrastructure: Key to Growth (3)• >200,000 commodity Linux servers built to custom

specifications; distributed cluster architecture; acting as oneparallel supercomputer; scaleable;

• >50,000 requests/sec; fault tolerant (no single point of failure);diverse hardware; stripped version of Red Hat;

• Storage capacity >5 petabytes; low response latency (0.2 sec);>80GB per server, distributed;

• Indexed >8 billion web pages; Indexing is computationallycomplex (>500M * > 2B matrix)

• Capital and operating costs a fraction of commercial servers;• Traffic growth 20-30%/month;• Data centers (>20), in US, Europe and Asia.

Page 17: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

17

Architecture for Reliability

• Replication (3x+) for redundancy;• Replication for proximity and response;• Reliability with software and architecture,

not with hardware.

Page 18: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

18

Indexing for Response

• Dynamic indexing of 8B+ pages;• Dynamic indexing of 1B+ images;• Indexing of 1B+ messages;• Index broken into “shards” and

distributed across data centers.

Page 19: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

19

Query Serving Infrastructure

• Processing a single query may involve1000+ servers;

• Index Servers access Index Shards;• Document Servers access Doc Shards;• Response times monitored to assure

<0.25 sec latency.

Page 20: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

20

Google MapReduce System (1)

• Coordinates servers in real-time;• Automates distribution of workload;• Fault tolerance and service reconstitution;• Systems-wide I/O cluster scheduling;• Status and performance monitoring.

Page 21: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

21

Google MapReduce System (2)

• Coordination of servers in real-time;• Automates distribution of workload;• Fault tolerance & service reconstitution;• Systems-wide cluster scheduling;• Status and performance monitoring.

Page 22: Lecture v4

22

Google Principle #2

Universal connectivity

Page 23: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

23

Multi-Lingual Services

Page 24: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

24

Search in Arabic Media

Page 25: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

25

Video Searches

Page 26: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

26

Google Base - Connecting Diverse Sources

Locate events within45 miles of New York in

November, 2005

Page 27: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

27

Semantic Parsing

• Tools parse millions of documents;• Automated learning for related information.

– Query: “Bay Area Cooking Classes”

– Finds: “San Francisco College Classes”;“The Magic of Thai Cuisine”

Page 28: Lecture v4

28

Google Principle #3

Shared data models

Page 29: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

29

Data Engineering

• Standard file management: The GoogleFile System (GFS);

• Standard job scheduling: The Global WorkQueue (GWQ);

• Standard network management: TheGoogle MapReduce system.

Page 30: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

30

Google File System (GFS)

• Replicated Masters manage MetaDatadirectories;

• Data transfers directly at the machinelevel within 2,000+ clusters;

• File broken into 64 MB chunks for2000+ MB/second read/write load;

• All file chunks at least triplicate forsafety.

Page 31: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

31

Data Dictionary for Interoperability

Page 32: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

32

Application Interfaces

Page 33: Lecture v4

33

Google Principle #4

Upgrading & Innovation

Page 34: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

34

Deliver On-Line Services

Page 35: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

35

Shopping Services

Page 36: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

36

Environment for Rapid Innovation

Page 37: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

37

A New Application Launched in 15 Minutes

Page 38: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

38

Occupy the Desktop

Page 39: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

39

Multimedia Services

Page 40: Lecture v4

40

Workgroup vs. Network Architectures

Comparison Summaries

Page 41: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

41

Workgroup Computing Today:Millions of Local Applications+Local Data

Page 42: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

42

Work-Groups Vulnerable Today

Page 43: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

43

New Internet: Billions of Browsers,Secure Shared Applications+Data

ApplicationApplicationApplication

Browsers

Page 44: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

44

Workgroup vs. Network Architectures (1)

Workgroup Centric Network Centric

Strategy: Capture Desktop Strategy: Occupy InternetCustomer’s labor and capital Labor and capital in networkUser-specific infrastructures Infrastructure is universalSystems controls by user Network controls in networkOperating system dependency Open source browsersLicense Software Pay for UseData read from files Data assembled in context

Page 45: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

45

Workgroup vs. Network Architectures (2)

Workgroup Centric Network Centric

Strategy: Capture Desktop Strategy: Occupy InternetCustomer’s labor and capital Labor and capital in networkUser-specific infrastructures Infrastructure is universalSystems controls by user Network controls in networkOperating system dependency Open source browsersLicense Software Pay for UseData read from files Data assembled in context

Page 46: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

46

Workgroup vs. Network Architectures (3)

Workgroup Centric Network Centric

Strategy: Capture Desktop Strategy: Occupy InternetCustomer’s labor and capital Labor and capital in networkUser-specific infrastructures Infrastructure is universalSystems controls by user Network controls in networkOperating system dependency Open source browsersLicense Software Pay for UseData read from files Data assembled in context

Page 47: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

47

Workgroup vs. Network Architectures (4)

Workgroup Centric Network Centric

Strategy: Capture Desktop Strategy: Occupy InternetCustomer’s labor and capital Labor and capital in networkUser-specific infrastructures Infrastructure is universalSystems controls by user Network controls in networkOperating system dependency Open source browsersLicense Software Pay for UseData read from files Data assembled in context

Page 48: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

48

Workgroup vs. Network Architectures (5)

Workgroup Centric Network Centric

Strategy: Capture Desktop Strategy: Occupy InternetCustomer’s labor and capital Labor and capital in networkUser-specific infrastructures Infrastructure is universalSystems controls by user Network controls in networkOperating system dependent Open source browsersLicense Software Pay for UseData read from files Data assembled in context

Page 49: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

49

Workgroup vs. Network Architectures (6)

Workgroup Centric Network Centric

Strategy: Capture Desktop Strategy: Occupy InternetCustomer’s labor and capital Labor and capital in networkUser-specific infrastructures Infrastructure is universalSystems controls by user Network controls in networkOperating system dependent Open source browsersLicense Software Pay for UseData read from files Data assembled in context

Page 50: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

50

Workgroup vs. Network Architectures (7)

Workgroup Centric Network Centric

Strategy: Capture Desktop Strategy: Occupy InternetCustomer’s labor and capital Labor and capital in networkUser-specific infrastructures Infrastructure is universalSystems controls by user Network controls in networkOperating system dependent Open source browsersLicense Software Pay for UseData read from files Data assembled in context

Page 51: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

51

The Future

Technology• All electronic devices on Internet.• Data, voice, video, sensor inputs accessible.• Phone, TV and print media displaced.

Services• Systems respond to questions.• Information is displayed in context.• Applications for decision-making.

Page 52: Lecture v4

Prof. Strassmann, GMU Lecture, 12/05/05 - REPRODUCED BY PERMISSION ONLY

52

Relevance for National Security Systems

• Workgroups to Network-Centric services.• Migrate through displacement.• Invest savings in innovation.