CompSci 510: Graduate OS Landon Cox January 15, 2016

About me
  Background
    BS in Math, CS: Duke '99
    PhD in CSE: Michigan '05
  Research interests
    OS, distributed systems, privacy, and security
  Why am I a professor?
    Research and teaching are a lot of fun
    It's the family business (dad is a law professor)

About the TA
  Teaching Assistant
    Animesh Srivastava
    Office hours to be announced

About CompSci 510
  CompSci 510 is about operating-systems research
    You will read a lot of old and new papers
    You will perform a semester-long research project
  What CompSci 510 is not about
    Learning basic operating systems concepts
    Will do some review, but you should know this material already
  Who should take it?
    Graduate students and undergrads who enjoyed CompSci 310

First, a little philosophy
  What is computer science?

Structure and Interpretation of Computer Programs
  Harold Abelson and Gerald Jay Sussman
  Longtime book for MIT's first course in CS
  "Underlying our approach to this subject is our conviction that 'computer science' is not a science and that its significance has little to do with computers."
  "Mathematics provides a framework for dealing precisely with notions of 'what is.' Computation provides a framework for dealing precisely with notions of 'how to.'"

How we got here
  Constructs for describing computation
    i.e., learn how to program
  Physical constructs for realizing computation
    i.e., learn about hardware design
  Then we branch
    Theory of computation (the "what is" of "how to"?)
    Artificial intelligence
    Design of computer systems
  And branch again (within systems)
    OS, databases, architecture, software engineering, compilers, networks, security, reliability

Across systems categories
  Many common, important problems
    Fault tolerance
    Coordination of concurrent activities
    Geographically separated but linked data
    Large-scale data sets
    Protection from mistakes and attacks
    Interactions with many entities
  All of these problems lead to complexity

Complexity
  How do we control complexity in a system?
    Build abstractions that hide unimportant details
    Establish conventional interfaces
    Enable composition of simple, well-defined components
  None of this is specific to computer systems
    Just principles of good engineering
    Civil engineering, urban planning, mechanical engineering, aviation and space flight, electrical engineering, ecology and political science

Two roles of the OS

OS as illusionist
  Abstractions, hardware reality
  [Figure: layers of hardware, OS, and applications]
    Abstractions: files, web; virtual memory; threads
    Hardware reality: disk, NIC; page tables; atomic test-and-set

OS as government
  Main government functions
    Resource manager (who gets what and when)
      Lock acquisition
      Processes
      Disk requests
      Page eviction
    Isolation and security (law and order)
      Access control
      Kernel bit
      Authentication

Two roles of the OS
  Abstractions
    Modularity
    Simplicity
    Hide messy reality
  Government
    Law and order
    Fair, efficient allocation
    Source of trust
  Goals for each role?
  How does the OS enforce modularity?
  How does the OS ensure fair allocation?
  What is the basis for trust? Why do we trust the government?

Key questions for semester
  What are the right abstractions?
  How should we enforce modularity?
  How do we ensure fair, efficient resource allocation?
  Is there a reasonable basis for trust?
  We will read a lot of papers this semester
    Useful to think about them in terms of these questions
    Sometimes goals are in tension (e.g., modularity vs. efficiency)
    Good papers explain the trade-offs

Course administration
  Syllabus is online
  Reading list/schedule is subject to change
  In general, two papers per lecture
  Grade composition
    Paper presentations and summaries (5%)
    Programming projects (20%)
    Research project (25%)
    In-class midterm (25%)
    In-class final exam (25%)

Paper presentations & summaries
  Post summaries to Piazza
    Summaries will be available to all
    Due before class
  Summaries must include
    Two positives
    Two negatives
    Two questions
  Presentations: 2nd half of semester

Programming projects
  Done in groups of two or three
    Registration form will be up next week
    Register on course website by January 22
  Two small programming projects
    1. Concurrency and synchronization (50%)
    2. File systems and storage (50%)

Research projects
  Done in groups of two or three
  Five phases
    1. Form groups (due after concurrency project)
    2. Write proposal (20% of project grade)
    3. Write status report (10% of project grade)
    4. Write final report (60% of project grade)
    5. Give presentation (10% of project grade)

Exams
  Both will be in-class
  Will cover topics covered to that point
  Often will ask about composing systems
    How would SimOS run on top of LFS?
    Requires a deep understanding of both

Syllabus: project collaboration
  Okay between groups
    Programming syntax, course concepts
    "What does this part of the project specification mean?"
  Not okay between groups
    Design/writing of another's program
    Includes prior class solutions and Piazza
    "How do I do this part of the handout?"
    Don't post details of your solution to Piazza
  If in doubt, ask me

Thoughts on cheating
  Cheating is a form of laziness.
  I like to think that cheating happens elsewhere.
  Duke students work hard and don't cut corners.

Quick review of OS
  Common themes in computer systems
    Atomicity
    Fault tolerance
    Protection and trust
  Should be familiar with each concept
  Will do a quick review of each

Atomicity
  What does it mean for an operation to be atomic?
    The operation occurs without interruption
    No interleaving between atomic operations
  Goal: high-level atomic ops from low-level
  Which CPU operations are atomic?
    Load, store, test-and-set, interrupt enable/disable
    We used these to implement locks, condition variables (CVs), and semaphores

Synchronization layers (top to bottom)
  Concurrent program
  Higher-level synchronization (reader-writer functions)
  High-level synchronization (locks, monitors, semaphores)
  Hardware (load/store, interrupt enable/disable, test-and-set)

Atomicity
  Which network operations are atomic?
    Send/receive Ethernet frame
    We used these to implement byte streams

Protocol layers (top to bottom)
  NFS (files), HTTP (web), SMTP (email), SSH (login)
  RPC
  TCP, UDP
  IP
  Ethernet, ATM, PPP

Atomicity
  Which storage operations are atomic?
    Read/write a disk block
    We used these to implement transactions

Storage layers (top to bottom)
  User program
  Database (transaction begin, commit)
  File system (open, close, read, write)
  Hardware (block read/write)

Fault tolerance

Dealing with failure
  In what ways can the network fail?
    Messages can be reordered, dropped, and corrupted
  How do we deal with reordered messages?
    Assign a sequence number to each message
  What is a connection?
    A sequence of related messages
    Applications determine which messages are related
  How do we deal with dropped messages?
    Send the message again
  How do we detect that a message was dropped?
    Require the receiver to send an acknowledgement (ACK)
  Possible reasons we didn't receive an ACK?
    Message was delayed or dropped
    ACK was delayed or dropped
  What if we assume a delay when there was a drop?
    We'll wait forever, kind of like a deadlock
  What if we assume a drop when there was a delay?
    We'll send duplicate messages
  How can we handle duplicate messages?
    Just drop duplicates using the sequence number
  What can happen if we have too many duplicates?
    Can create crippling network congestion
    If network congestion was causing delays, creates positive feedback
  How can we limit or eliminate the positive feedback loop?
    If you start to see dropped messages, send at a slower rate (TCP)

Dealing with failure
  Processes in a distributed system can fail too
    Bugs can crash processes
    Hardware failures can bring down machines
  It is easiest to think about fail-stop failures
    Implicit assumption was that running == correct
  Can you think of scenarios in which this isn't the case?
    If a process has a bug that causes incorrect behavior
    If a process becomes compromised
  This larger class of failures is called Byzantine faults
  Famous Byzantine Fault Tolerance result
    Can only ensure correctness if fewer than 1/3 of processes are faulty

Client-server model
  [Figure: clients connected to a single server]
  Problems with this model?
    1. Performance of accessing over the network
    2. Single point of failure (availability)
    3. Performance bottleneck of server (scalability)

1. Performance of accessing over the network
  How can we make this faster?
    Caching!
  [Figure: clients holding cached copies of server state S=v]
  What should happen if I modify my copy?
    Could update other copies
    Could invalidate other copies
  What should happen if two people modify concurrently?
    Let server pick a winner (e.g., last writer wins)
    Server serializes updates (assigns a canonical order)

Availability and scalability
  What can we do about availability and scalability?
    Add more servers
    Now we have to keep servers consistent too!
    Introduces lots of issues for large-scale web services
  Where should writes go? (defines write set)
  Where should reads go? (defines read set)
  [Figure: a writer and a reader contacting three replicas, each holding S=v]

Say reads and writes can go to 1 of 3 servers
  What can happen? Good and bad?
    Good: fast reads and writes
    Bad: readers can get stale data (copies eventually converge via async gossiping)
  What about availability? How many failures can we tolerate?
    Reads and writes can tolerate one or two failures

Say reads come from 2 and writes go to 1
  What can happen? Good and bad?
    Writes are still fast, reads are slower
    Readers can still get stale data
  What about availability? How many failures can we tolerate?
    Reads can tolerate one failure, but not two
    Writes can tolerate one or two failures

Say reads come from 2 and writes go to 2
  What can happen? Good and bad?
    Writes are slower, reads are slower
    Readers always get the latest copy
  Why are readers guaranteed to get the latest copy?
    Size of read set + size of write set > # replicas
    Guarantees overlap between the two sets
  What about availability? How many failures can we tolerate?
    Availability suffers (can tolerate 1 failure, not 2)

Protection and trust
  Define trust
    Expectation of correct behavior
  For anything to be useful, you have to trust something
  Need to protect yourself from components you don't trust
  How are processes on the same machine protected from each other?
    Separate address spaces, managed by the kernel
    Controlled transitions to/from the kernel via system calls
    Everyone trusts the kernel
  How do processes on separate machines protect themselves?
    Have to rely on secure communication, correct protocols
      Confidentiality
      Authentication
      Freshness

Symmetric key encryption
  Keys
    E-key = d-key (hence symmetric)
    Sender and receiver know the key
    Nobody else knows it
    Sometimes called the secret key
  Symmetric key algorithms are fast
  [Figure: encrypt and decrypt with a shared key]

Public key encryption
  Keys
    E-key ≠ d-key
  Typically, encrypt() = decrypt() = crypt()
  [Figure: encrypt, decrypt, crypt]

Authenticating SSL public keys
  I want to send my credit card number (CCN) to e-trade
    No one but e-trade should see my message
    E-trade wants to know it's really me
  Use Secure Sockets Layer (SSL)

Authenticating e-trade
  E-trade has a public key
  How do you learn this public key?
    Web solution: someone else vouches for the key
    Often called a certification authority (CA)
    E.g., Verisign
  E-trade sends you their public key
    Public key is digitally signed by Verisign
    {e-trade's public key is Etrade-public} verisign-private
  Decrypt using Verisign's public key
    I see that Verisign endorses Etrade-public
  Once talking to e-trade, establish session key
    {use session key K-sec} Etrade-public
  How do you know Verisign's public key?
    Hard-coded into Firefox/IE binary
  How to trust the Firefox binary?
    Downloaded from firefox.com (possibly over SSL)
    Without SSL, maybe downloaded with included cryptographic hash
  Why trust this?
    Went out and verified the hash, got the hash from 3 places, ...

Certificate authorities (CAs)
  Say we get the right key
  Why do we trust the CAs?
    Because we have to trust something
  Verisign in 2001
    Issued cert to someone pretending to be Microsoft
  Mozilla has a list of 36 root CAs
    Indirectly trusts Etisalat (UAE) via Verizon
    Etisalat installed spyware on 100k Blackberries
  Who controls the CAs?
  What if they are compromised?
    By government?
    By hackers?
  Lots of interesting questions

Next week
  Review the basics (things you should know)
    Concurrency/synchronization
    Address spaces
    Storage
  After that we'll start reading papers

Any questions?
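
Recap sketch: read and write sets

The replication slides compared three read/write-set choices across 3 replicas. A minimal Python sketch of that comparison follows. The function name `quorum_properties` and its parameters are illustrative assumptions, not from the lecture, and the failure counts assume an operation succeeds whenever enough replicas are still alive.

```python
def quorum_properties(n_replicas, read_set, write_set):
    """Summarize a read/write-set configuration over n_replicas servers.

    Hypothetical helper for illustration; sizes are counts of replicas
    contacted per operation.
    """
    return {
        # Overlap rule from the slides:
        # size of read set + size of write set > # replicas
        "reads_see_latest": read_set + write_set > n_replicas,
        # An operation can proceed while at least that many replicas are up.
        "read_failures_tolerated": n_replicas - read_set,
        "write_failures_tolerated": n_replicas - write_set,
    }


if __name__ == "__main__":
    # The three configurations discussed in lecture, all with 3 replicas.
    for r, w in [(1, 1), (2, 1), (2, 2)]:
        print(f"reads from {r}, writes to {w}:", quorum_properties(3, r, w))
```

Only the last configuration (reads from 2, writes to 2) satisfies the overlap rule, and it is also the one that tolerates just one failure for both reads and writes, which is the consistency versus availability trade-off from the slides.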