Next Generation Information Systems
Avi Silberschatz
Department of Computer Science
Yale University
URL: www.cs.yale.edu/~avi
Silberschatz, Next Generation Information Systems
The Digital Age
Digital information forms the glue for blending the fields of computing, communication and entertainment.
At the center of this revolution is data that is stored, accessed and delivered in digital format. Some of the major issues surrounding this type of data are:
Data must be available to users anytime and anywhere, with the desired QoS.
Data access must adhere to privacy and security policies.
Data Interoperability.
Fast access to data, which implies support for queries with approximate answers.
Data analysis and mining capabilities over very large datasets.
Many of the advances in information systems are due to the development of new technologies. These advances, in turn, are driving the development of even newer technologies.
Research Challenges
Storage, retrieval, and delivery of multimedia data
Storage System Issues
QoS issues of continuous media data (e.g., video and audio)
Approximate answers
useful for very large data sets
useful for Web searching
Data mining
Discovering “interesting” patterns in very large data sets
Discovering “interesting” patterns from incomplete information
Data Interoperability
Privacy and security
Next-generation networks
Converged networks
Network Management
Multimedia Data
Regular Data
text, binary, image
Database Data
tuples, objects
Continuous Media Data
Video Data
The display (playback) of the data must be continuous with a fixed rate, which is typically 30 frames/second.
A viewer may wish to control the way the data is to be displayed by applying various VCR-type operations to the video data.
Audio Data
The playback must be continuous with a fixed rate, which is dependent on the sample rate.
A listener may wish to control the way the data is played back.
Storage System Issues
Rapid growth in storage capacity demand
World-wide installed storage: 738 petabytes in 2000
Over 75% per year storage capacity increase over the next 5 years
Reaches a zettabyte in 2009
Data stored at Global 2500 companies doubles every 18 months
Data stored at e-commerce companies grows at 400% a year
Management
40-50% of company IT budget is spent on storage
The fraction of IT budget spent on storage is expected to grow
The cost of storage management exceeds the cost of storage equipment
Management: $300 per GB per year
Low-end storage: $14-$50 per GB (packaged, powered, networked)
Management cost is expected to grow
Storage requirements
24 x 7 operation
Disaster recovery
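The growth figures above mix two conventions: a doubling period and a yearly percentage. A small helper that converts between them makes the numbers comparable. This is plain compound-growth arithmetic added for illustration; the function names are hypothetical.

```python
import math

def annual_growth_from_doubling(months: float) -> float:
    """Annual growth rate implied by a doubling period in months (0.59 means 59%)."""
    return 2 ** (12 / months) - 1

def doubling_months_from_annual(rate: float) -> float:
    """Doubling period in months implied by an annual growth rate (0.75 means 75%)."""
    return 12 * math.log(2) / math.log(1 + rate)

# "Doubles every 18 months" as a yearly rate, and "75% per year" as a doubling period.
print(f"{annual_growth_from_doubling(18):.1%} per year")
print(f"{doubling_months_from_annual(0.75):.1f} months to double")
```

Doubling every 18 months corresponds to roughly 59% growth per year, while 75% per year means capacity doubles in about 15 months.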
Storage is Moving Into the Network
Motivation
Use commodity IP-based networks
IT staff know-how
Distance and universal access
Applications
Disaster recovery
Archiving
Backups
Content distribution
Managed storage
Value-added storage services
Consolidation of storage
IP-Based Network Storage
Storage is managed, possibly by different domains
Storage devices are connected over networking infrastructure
[Figure: client sites #1 and #2, each on a LAN, connected over a metro/WAN to fileservers on LANs and SANs]
IP-based Network Storage (Cont.)
IETF standards are being drafted
Most popular: iSCSI and FCIP
Almost all networking and storage companies are participating in these standards
Issues
Performance
Reliability
Future
End-to-end iSCSI: end-to-end IP storage networking? Demise of FC?
Hybrid? FC (InfiniBand) SAN islands connected over IP networks; FC SANs in data centers accessed by IP networks
Network Storage Security
Customers may not trust the storage service provider (SSP)
Storage consolidation across different customers is essential to make storage outsourcing viable. However, customers may not trust each other.
Threat model
Disclosure of data to an eavesdropper intercepting communication
Disclosure of data to storage service provider (SSP) and to other customers of the SSP
Manipulation of communication by an attacker
Manipulation of data by the SSP or other customers of the SSP
Challenges
High-throughput encryption (e.g., at 1 Gbps or 10 Gbps)
security without hindering performance
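The two manipulation threats can be countered, for example, with a keyed message authentication code that only the customer can compute. The sketch below uses Python's standard `hmac` module with SHA-256. Binding each tag to a block number and version, so that a provider cannot swap or roll back blocks, is an illustrative assumption and not a scheme from the slides. It does not address disclosure, which additionally requires encryption.

```python
import hashlib
import hmac
import os

def tag_block(key: bytes, block_id: int, version: int, data: bytes) -> bytes:
    """HMAC-SHA256 tag binding the data to its (hypothetical) block id and version."""
    header = block_id.to_bytes(8, "big") + version.to_bytes(8, "big")
    return hmac.new(key, header + data, hashlib.sha256).digest()

def verify_block(key: bytes, block_id: int, version: int, data: bytes, tag: bytes) -> bool:
    """Constant-time check that the stored block is unmodified and in the right place."""
    return hmac.compare_digest(tag_block(key, block_id, version, data), tag)

# The key stays with the customer, so neither the SSP nor other tenants can forge tags.
key = os.urandom(32)
data = b"quarterly results"
tag = tag_block(key, 7, 1, data)
print(verify_block(key, 7, 1, data, tag))         # unmodified block
print(verify_block(key, 7, 1, b"tampered", tag))  # data altered by the SSP
print(verify_block(key, 8, 1, data, tag))         # block moved to another slot
```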
Multimedia Storage and Delivery Issues
The size of some databases is enormous, especially those that are used for data mining (e.g., cash register transactions).
Largest commercial database: 30 terabytes
Some information sources generate data at an astonishing rate (e.g., satellite images).
EOS – 1-2 terabytes per day
The BBC is planning to digitize the last 50 years of programming.
Continuous media data is voluminous:
100 minute MPEG-1 video requires 1.125GB
100 minute HDTV video requires 15GB
Continuous media data require support for QoS.
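The storage figures for continuous media follow from bit rate times duration. Assuming constant bit rates of about 1.5 Mbps for MPEG-1 and 20 Mbps for compressed HDTV (assumed values, not stated on the slide), the arithmetic reproduces both numbers, with GB = 10^9 bytes:

```python
def storage_bytes(bitrate_bps: float, minutes: float) -> float:
    """Bytes needed for a constant-bit-rate stream of the given duration."""
    return bitrate_bps * minutes * 60 / 8

MPEG1_BPS = 1.5e6  # assumed MPEG-1 bit rate
HDTV_BPS = 20e6    # assumed compressed HDTV bit rate

for label, bps in [("MPEG-1", MPEG1_BPS), ("HDTV", HDTV_BPS)]:
    print(label, storage_bytes(bps, 100) / 1e9, "GB")
```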
System Resources to be Managed for QoS
Storage server resources
Tertiary storage
Secondary storage
Buffer space
I/O bus
Processor(s)
Network
Research Issues
Admission control
Disk scheduling
Buffer management
Storage management: data layout, varying disk transfer rates, disk striping, meta data, fault tolerance
Tertiary storage
Cycle-based Scheduling
Let $T$ be the length of a service cycle.
Maintain a queue of requests $R_1, R_2, \ldots, R_n$. Each request corresponds to a request to view a CM clip and has an associated rate $r_i$.
For each request $R_i$, a buffer of size $2 T r_i$ is allocated.
Requests in the queue are served in a cyclic order using double buffering. In each cycle $I$:
get data from disk into buffer $(I \bmod 2)$
transfer data from buffer $((I + 1) \bmod 2)$ to the client
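The double-buffering cycle can be simulated in a few lines. This is a minimal sketch under simplifying assumptions: each cycle reads exactly $T r_i$ bits per request into one half of the $2 T r_i$ buffer, and the client drains the other half in full. `Request` and `run_cycle` are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Request:
    """Hypothetical CM request: rate r_i (bits/s) under a service cycle of length T (s)."""
    name: str
    rate: float
    cycle_len: float
    buffers: list = field(default_factory=lambda: [0.0, 0.0])  # bits held in each half

    @property
    def buffer_size(self) -> float:
        """Total allocation 2 * T * r_i (two halves of T * r_i each)."""
        return 2 * self.cycle_len * self.rate

def run_cycle(requests, cycle):
    """Service cycle I: fill buffer (I mod 2) from disk, drain buffer ((I + 1) mod 2)."""
    fill, drain = cycle % 2, (cycle + 1) % 2
    delivered = {}
    for req in requests:                                  # cyclic order over the queue
        req.buffers[fill] = req.cycle_len * req.rate      # disk -> buffer: T * r_i bits
        delivered[req.name] = req.buffers[drain]          # buffer -> client
        req.buffers[drain] = 0.0
    return delivered

clips = [Request("clip-a", 1.5e6, 1.0), Request("clip-b", 64e3, 1.0)]
for cycle in range(3):
    print(cycle, run_cycle(clips, cycle))
```

Cycle 0 delivers nothing because the first half is still filling; from cycle 1 on, each client receives $T r_i$ bits per cycle. That one-cycle start-up delay is the price of double buffering.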
Disk Scheduling
Requests are serviced in service cycles (rounds).
At the beginning of a service cycle, requests are ordered in C-SCAN order.
At the beginning of every service cycle, it is ensured that
$$\sum_i \left( t_{rot} + t_{settle} + \frac{T r_i}{r_{disk}} \right) + 2\,t_{seek} \le T \qquad \text{and} \qquad \sum_i 2 T r_i \le B$$
hold (where $t_{rot}$, $t_{settle}$, and $t_{seek}$ are the rotational delay, settle time, and seek time, respectively, $r_{disk}$ is the disk transfer rate, and $B$ is the buffer pool size).
The value of $T$ is adjusted depending on the workload.
In every service cycle,
$$\min\left( T r_i,\; 2 T r_i - (\text{offset of last retrieved} - \text{offset of last consumed}) \right)$$
bits of data are retrieved for each request.
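Both per-cycle steps are small enough to state as code. In this sketch, requests are dicts with a hypothetical `cyl` field (the cylinder of the next block). C-SCAN serves cylinders at or beyond the current head position in ascending order and then wraps to the lowest. `bits_to_retrieve` evaluates the min expression, so a request's $2 T r_i$ buffer never overflows.

```python
def cscan_order(requests, head_cyl):
    """C-SCAN: ascending from the head position, then wrap around to the lowest cylinder."""
    ahead = sorted((r for r in requests if r["cyl"] >= head_cyl), key=lambda r: r["cyl"])
    behind = sorted((r for r in requests if r["cyl"] < head_cyl), key=lambda r: r["cyl"])
    return ahead + behind

def bits_to_retrieve(T, r_i, last_retrieved, last_consumed):
    """min(T r_i, 2 T r_i - (offset of last retrieved - offset of last consumed))."""
    buffered = last_retrieved - last_consumed  # bits still waiting in the buffer
    return min(T * r_i, 2 * T * r_i - buffered)

queue = [{"id": 1, "cyl": 50}, {"id": 2, "cyl": 10}, {"id": 3, "cyl": 70}]
print([r["id"] for r in cscan_order(queue, head_cyl=40)])
print(bits_to_retrieve(T=1.0, r_i=100, last_retrieved=250, last_consumed=100))
```

With the head at cylinder 40, the sweep visits cylinders 50 and 70 and then wraps to 10. A request with 150 bits still buffered is topped up by only 50 bits instead of the full $T r_i = 100$.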
Admission Control
The queue is bounded by an admission control scheme.
The service time of each request is estimated.
A request is admitted only if the sum of the estimated service times for all admitted requests does not exceed the duration of service cycle T.
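Here is a sketch of this admission test. It assumes that each request's per-cycle service time is $t_{rot} + t_{settle} + T r_i / r_{disk}$ and that one C-SCAN sweep adds $2 t_{seek}$ per cycle, mirroring the disk-scheduling constraint earlier in the deck. The function names and parameter values are hypothetical.

```python
def service_time(T, r_i, r_disk, t_rot, t_settle):
    """Estimated per-cycle service time: rotational + settle latency plus transfer of T*r_i bits."""
    return t_rot + t_settle + T * r_i / r_disk

def admit(new_rate, admitted_rates, T, r_disk, t_rot, t_settle, t_seek):
    """Admit the new request only if all estimated service times (plus 2*t_seek) fit in T."""
    rates = admitted_rates + [new_rate]
    total = 2 * t_seek + sum(service_time(T, r, r_disk, t_rot, t_settle) for r in rates)
    return total <= T

disk = dict(T=1.0, r_disk=80e6, t_rot=0.004, t_settle=0.001, t_seek=0.01)
print(admit(1.5e6, [1.5e6] * 40, **disk), admit(1.5e6, [1.5e6] * 41, **disk))
```

With a 1 s cycle, an 80 Mbps disk, and 1.5 Mbps streams (assumed numbers), each stream costs 23.75 ms per cycle. That means 41 streams fit and a 42nd is rejected.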
Admission Control (cont.)
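Reserve a fraction $\alpha T$ of the service cycle $T$ ($0 \le \alpha \le 1$) for continuous media requests.
A request (real-time or non-real-time) is admitted if
$$\sum_{i \in RT} \left( t_{rot} + t_{settle} + \frac{T r_i}{r_{disk}} \right) + \sum_{i \in NRT} \left( t_{rot} + t_{settle} + t_i^{nr} \right) + 2\,t_{seek} \le T$$
A real-time request is admitted only if, in addition,
$$\sum_{i \in RT} \left( t_{rot} + t_{settle} + \frac{T r_i}{r_{disk}} \right) + 2\,t_{seek} \le \alpha T$$
where $t_i^{nr}$ is the estimated transfer time of non-real-time request $R_i$.
The above scheme ensures that both continuous and non-continuous media requests are allocated time during a service cycle. Any time during a service cycle that is unused by continuous media requests is allocated to non-continuous media requests.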
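The two-level test can be sketched as follows. Here `cost` is $T r_i / r_{disk}$ for a real-time request and $t_i^{nr}$ for a non-real-time one. All names and parameter values are illustrative assumptions.

```python
def admit_request(kind, cost, rt_costs, nrt_costs, T, alpha, t_rot, t_settle, t_seek):
    """kind is 'rt' or 'nrt'; rt_costs / nrt_costs hold the costs of already-admitted requests."""
    rt = rt_costs + ([cost] if kind == "rt" else [])
    nrt = nrt_costs + ([cost] if kind == "nrt" else [])
    lat = t_rot + t_settle
    rt_time = sum(lat + c for c in rt) + 2 * t_seek
    total = rt_time + sum(lat + c for c in nrt)
    if total > T:                              # every request must fit in the whole cycle
        return False
    if kind == "rt" and rt_time > alpha * T:   # real-time load is capped at alpha * T
        return False
    return True

p = dict(T=1.0, alpha=0.5, t_rot=0.004, t_settle=0.001, t_seek=0.01)
print(admit_request("rt", 0.1, [0.1] * 4, [], **p))  # a fifth real-time request exceeds alpha*T
print(admit_request("nrt", 0.9, [], [], **p))        # idle real-time time goes to non-real-time
```

The $\alpha T$ bound is checked only for real-time requests. As a result, a non-real-time request can use any part of the cycle that real-time requests leave idle, matching the last point on the slide.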
Length of T
What about the length of T?
Buffer Space Constraints
Assume infinite disk bandwidth.
Let $B$ be the available buffer size.
Let $N$ be the number of admitted clients.
Requirement:
$$\sum_{i=1}^{N} 2 T r_i \le B$$
For a given buffer size $B$, the larger $T$ is, the fewer clients can be admitted.
[Plot: N versus T]
Disk Bandwidth Constraints
Assume infinite buffer space.
Use C-SCAN disk scheduling.
Requirement:
$$\sum_{i=1}^{N} \frac{T r_i}{r_{disk}} + N\,(t_{rot} + t_{settle}) + 2\,t_{seek} \le T$$
The larger $T$ is, the larger $N$ can be.
[Plot: N versus T]
Combining Disk & Buffer Constraints
[Plot: N versus T, showing the disk constraint and buffer constraint curves]
The optimal $T$, where the two constraint curves intersect, is obtained by solving a quadratic equation derived from the disk and buffer space constraints.
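Assume that all $N$ requests share the same rate $r$. Then the buffer constraint gives $N \le B / (2 T r)$ and the disk constraint gives $N \le (T - 2 t_{seek}) / (T r / r_{disk} + t_{rot} + t_{settle})$. Setting them equal yields $2 r T^2 - (4 r t_{seek} + B r / r_{disk})\,T - B\,(t_{rot} + t_{settle}) = 0$, whose positive root is the $T$ at which the two curves cross. The sketch below solves this equation. The equal-rate assumption, the disk-constraint form used here, and the numbers are illustrative.

```python
import math

def optimal_cycle(B, r, r_disk, t_rot, t_settle, t_seek):
    """Positive root of 2 r T^2 - (4 r t_seek + B r / r_disk) T - B (t_rot + t_settle) = 0."""
    a = 2 * r
    b = -(4 * r * t_seek + B * r / r_disk)
    c = -B * (t_rot + t_settle)
    # c < 0 < a, so exactly one root is positive
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)

def n_buffer(T, B, r):
    """Streams allowed by the buffer constraint N * 2 T r <= B."""
    return B / (2 * T * r)

def n_disk(T, r, r_disk, t_rot, t_settle, t_seek):
    """Streams allowed by the disk constraint N (T r / r_disk + t_rot + t_settle) + 2 t_seek <= T."""
    return (T - 2 * t_seek) / (T * r / r_disk + t_rot + t_settle)

disk = dict(r=1.5e6, r_disk=80e6, t_rot=0.008, t_settle=0.002, t_seek=0.02)
T_opt = optimal_cycle(B=1e9, **disk)
N = math.floor(min(n_buffer(T_opt, 1e9, disk["r"]), n_disk(T_opt, **disk)))
print(f"T* = {T_opt:.2f} s, N = {N}")
```

Consider 1 Gbit of buffer, 1.5 Mbps streams, an 80 Mbps disk, 10 ms of rotational plus settle time, and a 20 ms seek (all assumed). With these values the sketch gives $T^*$ of about 6.8 s and 49 concurrent streams. A shorter $T$ is disk-bound and a longer one is buffer-bound.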
Minimizing Response Time
Under some workloads (e.g., requests with small $r_i$'s, such as 64 Kbps), the value of $T$ that maximizes throughput can be high (e.g., 20 secs.).
This might yield high response times.
Solution:
maintain small $T$ values
in order not to degrade throughput, data for each request $R_i$ is prefetched from disk once every $k_i$ service cycles (instead of in every service cycle)
The maximum amount of data prefetched is $k_i T r_i$
The buffer space allocated to $R_i$ is $(k_i + 1) T r_i$
Minimizing Response Time (contd.)
Issues:
Calculation of the $k_i$'s
Admission control:
$\mathrm{lcm}(k_1, k_2, \ldots, k_n)$ service cycles to manage
For a request $R_i$, finding the least loaded service cycles $u_i + l\,k_i$, for $l = 0, 1, 2, \ldots, \frac{\mathrm{lcm}(k_1, k_2, \ldots, k_n)}{k_i} - 1$
To reduce response time, start a new request $R_i$ in the first possible service cycle and then move it incrementally to the selected least loaded service cycle.
This solution also provides higher throughput for workloads with small $r_i$'s.
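Choosing $u_i$ can be sketched as a search over offsets in $[0, k_i)$. Each offset is scored by the peak load among the cycles $u_i + l\,k_i$ within one hyperperiod of $\mathrm{lcm}(k_1, \ldots, k_n)$ cycles. Interpreting "least loaded" as "smallest peak load" is an assumption, as are the function names. `math.lcm` requires Python 3.9 or later.

```python
from math import lcm

def prefetch_plan(k_i, T, r_i):
    """(bits prefetched per visit, buffer allocated) = (k_i T r_i, (k_i + 1) T r_i)."""
    return k_i * T * r_i, (k_i + 1) * T * r_i

def extend(load, new_len):
    """Tile a periodic per-cycle load over a longer hyperperiod (new_len is a multiple of len(load))."""
    return [load[c % len(load)] for c in range(new_len)]

def choose_offset(k_i, load, cost):
    """Pick u_i in [0, k_i) whose cycles u_i + l*k_i have the smallest peak load; book cost there."""
    H = len(load)
    assert H % k_i == 0, "hyperperiod must be a multiple of k_i"

    def cycles(u):
        return range(u, H, k_i)  # u + l*k_i for l = 0 .. H/k_i - 1

    u_i = min(range(k_i), key=lambda u: max(load[c] for c in cycles(u)))
    new_load = list(load)
    for c in cycles(u_i):
        new_load[c] += cost
    return u_i, new_load

load = [0.5, 0.1]      # per-cycle load of existing requests (all with k = 2)
H = lcm(2, 4)          # a new request arrives with k_i = 4
u_i, load = choose_offset(4, extend(load, H), cost=0.2)
print(u_i, load, prefetch_plan(4, 1.0, 64e3))
```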
Querying Huge Data Sets
Example query: “Give me all objects (e.g., images) that look like this.”
If we are dealing with PetaBytes of data, this may take days or weeks.
One solution is to capture “meta data” information about the stored objects as the objects are stored in the database.
Querying is done against the “meta data”.
Major issue: the nature of the meta data.
Another solution is to provide support for “approximate answers”.
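The meta-data approach can be sketched as follows. A compact feature is computed once when an object is stored, and similarity queries then scan only these features instead of the raw objects. Here the feature is a coarse color histogram, which is an assumed choice of meta data. The class and function names are hypothetical.

```python
from collections import Counter

def color_histogram(pixels, bins=4):
    """Coarse RGB histogram (assumed meta data), normalized to sum to 1."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    n = len(pixels)
    return {cell: c / n for cell, c in counts.items()}

def l1_distance(h1, h2):
    """L1 distance between two sparse histograms."""
    cells = set(h1) | set(h2)
    return sum(abs(h1.get(c, 0.0) - h2.get(c, 0.0)) for c in cells)

class MetaIndex:
    """Hypothetical store that keeps only per-object meta data for querying."""

    def __init__(self):
        self.meta = {}  # object id -> histogram computed at insert time

    def insert(self, obj_id, pixels):
        self.meta[obj_id] = color_histogram(pixels)

    def query(self, pixels, k=3):
        """Return the k object ids whose meta data is closest to the query's."""
        q = color_histogram(pixels)
        return sorted(self.meta, key=lambda oid: l1_distance(self.meta[oid], q))[:k]

index = MetaIndex()
index.insert("sunset", [(250, 90, 20)] * 8 + [(30, 30, 120)] * 2)
index.insert("ocean", [(10, 60, 200)] * 10)
print(index.query([(240, 100, 30)] * 10, k=1))
```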
Providing Approximate Answers
Traditional databases provide exact answers to queries, but...
In massive data environments, queries can take minutes to hours due to disk I/O
In distributed environments, data may be remote or currently unavailable
In real-time environments, even single I/O may be too slow
Providing Approximate Answers (Cont.)
Trade-off accuracy for performance: e.g., 30 minutes for exact answer vs. 3 seconds for an approximate answer with 5% error
Examples where fast approximate answers are preferred:
drill-down query sequence in data mining: searching for the “interesting” queries
tentative answer when base data unavailable
leading digits suffice (e.g., 3.5 million vs. 3.512 million)
Can proceed to the exact answer, if desired
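One simple way to trade accuracy for speed is to answer an aggregate from a uniform random sample and report a confidence interval. The sketch below assumes a sample drawn without replacement and uses a normal approximation to estimate a SUM with a 95% error bound. It is an illustration of the idea, not Aqua's estimator.

```python
import math
import random

def approx_sum(sample, population_size, z=1.96):
    """Estimate SUM over a table from a uniform sample drawn without replacement.

    Returns (estimate, half_width): an approximate 95% confidence interval
    (normal approximation with finite-population correction).
    """
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    fpc = (population_size - n) / (population_size - 1)
    se_mean = math.sqrt(var / n * fpc)
    return population_size * mean, z * population_size * se_mean

random.seed(42)
table = [random.randint(1, 100) for _ in range(200_000)]  # stand-in for a large table
sample = random.sample(table, 10_000)                      # the synopsis
estimate, err = approx_sum(sample, len(table))
print(f"approx {estimate:,.0f} +/- {err:,.0f}; exact {sum(table):,}")
```

Reading 5% of the rows yields an answer within about 1% of the exact SUM in this synthetic case. Real error depends on the data's variance and the query's selectivity.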
The AQUA System
Aqua precomputes and maintains small synopses of the data
Aqua provides approximate answers with accuracy guarantees by rewriting user queries, as depicted in the figure
Approximate Query Engine for data warehousing
[Figure: a client (HTML/XML browser or Excel) sends SQL query Q over the network; Aqua rewrites it as SQL query Q’, a fast query on the Aqua synopses, in place of the slow query on the warehouse data in the DBMS for the large data warehouse; the result is returned with error bounds]
Aqua Synopses: The Key Ingredient
The first system to provide fast, highly accurate approximate answers for a broad class of queries arising in data warehousing scenarios
A (small) surrogate for the actual data
Must accurately estimate the exact answers from the synopses
As data is updated, synopses must be kept up to date
We developed new techniques for summarizing data, and for adapting these summaries to changes in both the data and the query mix.
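Keeping a sample-based synopsis current under insertions can be done with reservoir sampling (Vitter's Algorithm R). This standard technique is swapped in for illustration; the slides do not say which maintenance method Aqua uses. The class name is hypothetical, and deletions are not handled.

```python
import random

class ReservoirSynopsis:
    """Uniform random sample of fixed size maintained under insertions (Algorithm R)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.rng = rng if rng is not None else random.Random()
        self.sample = []
        self.seen = 0  # rows inserted so far

    def insert(self, row):
        self.seen += 1
        if len(self.sample) < self.capacity:
            self.sample.append(row)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen)
            if j < self.capacity:              # keep the new row with probability capacity/seen
                self.sample[j] = row

    def estimate_count(self, predicate):
        """Scale the matching fraction of the sample up to the full table."""
        if not self.sample:
            return 0.0
        hits = sum(1 for row in self.sample if predicate(row))
        return self.seen * hits / len(self.sample)

syn = ReservoirSynopsis(capacity=1_000, rng=random.Random(7))
for order_id in range(100_000):  # rows arriving as the data is updated
    syn.insert(order_id)
print(len(syn.sample), syn.estimate_count(lambda row: row % 10 == 0))
```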
Private, Public, and Sensitive Information in a Wired World
Private information
Only the data subject has a right to it.
Public information
Everyone has a right to it.
Sensitive information
“Legitimate users” have a right to it.
It can harm data subjects, data owners, or data users if it is misused.
Erosion of Privacy
“You have zero privacy. Get over it.” – Scott McNealy, 1999
Changes in technology are making privacy harder:
increased use of computers and networks
reduced cost for data storage
increased ability to process large amounts of data
Privacy is becoming more critical as public awareness, potential misuse, and conflicting goals increase.
“Public Records” in the Internet Age
Depending on State and Federal law, “public records” can include:
Birth, death, marriage, and divorce records
Court documents and arrest warrants (including those of people who were acquitted)
Property ownership and tax-compliance records
Driver’s license information
Occupational certification
They are, by definition, “open to inspection by any person.”
Traditionally: many public records were “practically obscure.”
Stored at the local level on hard-to-search media, e.g., paper, microfiche, or offline computer disks.
Often not accurately or usefully indexed.
Now: More and more public records, especially Federal records, are being put on public web pages in standard, searchable formats.
Issues
Should some Internet-accessible public records be only conditionally accessible?
Should data subjects have more control?
Should data collectors be legally obligated to correct mistakes?
Examples of Sensitive Information
Copyrighted works
Certain financial information
Health Information
Question: Should some information now in “public records” be reclassified as “sensitive”?
State of Technology
We have the ability (if not always the will) to prevent improper access to private information. Encryption is very helpful here.
We have little or no ability to prevent improper use of sensitive information. Encryption is less helpful here.
The PORTIA Project
PORTIA: Privacy, Obligations, and Rights in Technology of Information Assessment
A large ITR grant from NSF: a five-year, multi-institutional, multi-disciplinary, multi-modal research project on end-to-end handling of sensitive information in a wired world
Researchers from:
Stanford: Dan Boneh, Hector Garcia-Molina, John Mitchell, Rajeev Motwani
Yale: Joan Feigenbaum, Ravi Kannan, Avi Silberschatz
University of NM: Stephanie Forrest
Stevens Institute: Rebecca Wright
NYU: Helen Nissenbaum
Plus participation by the software industry, key user communities, advocacy organizations, and non-CS academics.
http://crypto.stanford.edu/portia
PORTIA Goals
Produce a next generation of technology for handling sensitive information that is qualitatively better than the current generation’s.
Enable end-to-end handling of sensitive information over the course of its lifetime.
Formulate an effective conceptual framework for policy making and philosophical inquiry into the rights and responsibilities of data subjects, data owners, and data users.
Five Major Research Themes
Privacy-preserving data mining and privacy-preserving surveillance
Database policy enforcement tools
Sensitive data in P2P systems
Policy-enforcement tools for database systems
Identity theft and identity privacy
Privacy and Security on the Web
An increasing number of web sites require user registration, which enables personalized services. This, however, raises some concerns:
Privacy concerns: providing the same user name (or e-mail address) allows the creation of comprehensive dossiers; providing your e-mail address reveals your true identity
Security concerns: using the same user name and password at multiple web sites enables passwords obtained from insecure sites to be used to help determine passwords at secure sites
Junk e-mail: giving your e-mail address makes you susceptible to junk e-mail
Inconvenience: people have to invent and remember multiple user names and passwords
The LPWA system
[Figure: user Arun Netravali visits quote.com, my.yahoo.com, and expedia through LPWA; each site sees a different alias username and password (axyz, x45t; Czar, 4rt5; Boss, 56yh)]
A tool for combining privacy, security, and convenience. It enables personalized services by generating consistent, untraceable aliases for use on the web.
The LPWA Proxy
Properties
Privacy: web sites cannot collude to create dossiers
Security: different passwords for different web sites
Convenience: no need to remember multiple user names and passwords
Alias e-mail addresses support communication from web sites back to users and allow control of junk e-mail
Generation of Aliases
At the first invocation of the LPWA proxy
The user provides:
the user’s e-mail address, id
a secret S (a random string)
Registering
The user types \u, \p, and \@ for the username, password, and e-mail address, respectively.
LPWA uses id, S, and the domain name of the web site being visited to compute the user’s alias.
Repeat visits
The user again types \u and \p for the username and password.
LPWA computes the same alias username and password.
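The key property is that the alias is a deterministic function of id, S, and the domain. Repeat visits therefore reproduce the same alias, while different sites see unrelated values. The sketch below uses HMAC-SHA256 keyed with S as that function. This is an assumed stand-in, and LPWA's actual construction may differ. The alias mail domain `alias.example` is a placeholder.

```python
import hashlib
import hmac

def lpwa_alias(user_id: str, secret: str, domain: str):
    """Hypothetical alias generator: (username, password, e-mail) for one site.

    HMAC-SHA256 keyed with the secret S stands in for LPWA's real function.
    The same (id, S, domain) always yields the same alias; without S, aliases
    for different domains cannot be linked to each other or to id.
    """
    domain = domain.lower()  # treat EXPEDIA.com and expedia.com as one site

    def derive(label: str, length: int) -> str:
        msg = f"{label}|{user_id}|{domain}".encode()
        return hmac.new(secret.encode(), msg, hashlib.sha256).hexdigest()[:length]

    username = "u" + derive("user", 8)
    password = derive("pass", 16)
    email = derive("mail", 10) + "@alias.example"  # placeholder forwarding domain
    return username, password, email

print(lpwa_alias("user@example.com", "random-secret", "quote.com"))
print(lpwa_alias("user@example.com", "random-secret", "expedia.com"))
```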
Network System Challenges
Next-generation networks: will be simpler and lower cost, and will provide customized services for consumers and businesses
Converged networks: will incorporate the best features of today’s voice and data networks
Network management: will automate many of the functions that are currently done by people
Next-generation networks
Yesterday’s Networks
Point-to-point optical links
Circuit switched, centrally managed
Separate networks for voice, data, video
Fixed, closed
[Figure: yesterday’s networks: rings of ADMs interconnected through DCSs and 5E switches, with separately managed (NM) voice, data, and video networks (PSTN, CLEC, local ISP)]
Next-Generation Networks
All-optical mesh backbone
Packet switched, distributed
Unified network for customized multimedia services
Open APIs for ISV services
[Figure: next-generation network layers: optical layer, electronic layer, service layer]
Next generation converged networks
[Figure: the data network and the voice network converge into the next generation network, which supports converged applications. Attributes shown: high bandwidth, IP protocol, low per-bit cost, rapid evolution, efficiency, reliability, scalability, ubiquity, availability, ease of use]
Network Management Challenges
Managing today’s networks is extremely challenging due to their increased complexity
Networks contain hundreds of network elements and thousands of physical links
Network elements follow a multitude of protocols (e.g., BGP, OSPF, IS-IS, RIP)
Networks are heterogeneous and contain equipment from multiple vendors
Manually managing networks
is tedious, labor-intensive, time-consuming, and error-prone
is not cost-effective due to severe shortages and high costs of skilled labor
Critical need for software tools that automate network management tasks
Next-Generation Network Management
Next-generation network management software functionality includes:
Keeping track of network inventory and topology
Monitoring network link bandwidth and latency
Storing, analyzing, and reporting network performance data
Load balancing by appropriately configuring network parameters
Automating and simplifying network configuration tasks (e.g., VPNs)
Value proposition:
Ease management and configuration of ISP networks
Optimize utilization of network resources
Goal: make networks self-administering and self-tuning
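As an example of monitoring link bandwidth, utilization can be computed from two readings of an interface octet counter. This sketch assumes 32-bit counters such as SNMP's `ifInOctets`, which wrap at $2^{32}$, and at most one wrap between samples. The function names, link names, and the 0.8 threshold are hypothetical.

```python
COUNTER_MAX = 2 ** 32  # assumes 32-bit octet counters (e.g., SNMP ifInOctets)

def link_utilization(octets_prev, octets_now, seconds, capacity_bps, counter_max=COUNTER_MAX):
    """Fraction of link capacity used between two counter readings (handles one wrap)."""
    delta = (octets_now - octets_prev) % counter_max
    return delta * 8 / (seconds * capacity_bps)

def hot_links(utilization, threshold=0.8):
    """Hypothetical load-balancing trigger: links whose utilization exceeds a threshold."""
    return sorted(link for link, u in utilization.items() if u > threshold)

util = {
    "nyc-chi": link_utilization(1_000, 676_000_000, 60, 100e6),
    "chi-sfo": link_utilization(4_294_000_000, 300_000_000, 60, 100e6),  # counter wrapped
}
print(util, hot_links(util))
```

A self-tuning system could feed the `hot_links` output into reconfiguration, for instance adjusting routing weights. That step is outside this sketch.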
There are many approaches to predicting the future
“How do you want it – the crystal mumbo-jumbo or statistical probability?”
I think there is a world market for maybe five computers. (Thomas Watson, 1943)
Video won’t be able to hold onto any market it captures after the first six months. People will soon get tired of staring at a plywood box every night. (Darryl F. Zanuck, head of 20th Century Fox, 1946)
640K ought to be enough for anybody. (Bill Gates, 1981)
Five predictions for the new millennium
1. A mega-network of networks will enfold the earth in a communications “skin” with ubiquitous connectivity and enormous bandwidth.
2. By 2010, there will be so many interconnected devices that the volume of “infrachatter” among communicating machines will surpass communications among humans.
3. Bandwidth will be too cheap to meter.
4. Consumers and businesses will have a vast variety of individualized, custom services, written by countless programmers on an open mega-network.
5. Virtual reality will become a reality and will transform the way people live and conduct their business. This lecture will be given from the comfort of my office without me having to travel.