Reliable Client-Server Communication



Reliable Communication

• So far: Concentrated on process resilience (by means of process groups).
• What about reliable communication channels?
• Error detection:
  – Framing of packets to allow for bit error detection
  – Use of frame numbering to detect packet loss
• Error correction:
  – Add so much redundancy that corrupted packets can be automatically corrected
  – Request retransmission of lost, or last N packets
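The two error-detection ideas above can be sketched in a few lines. This is only an illustration under assumed conventions: the frame layout (a 4-byte CRC-32 followed by a 4-byte sequence number and the payload) and the function names are hypothetical, not taken from the slides.

```python
import struct
import zlib

def make_frame(seq, payload):
    """Build a frame: CRC-32 over (sequence number + payload), then the body."""
    body = struct.pack("!I", seq) + payload
    return struct.pack("!I", zlib.crc32(body)) + body

def parse_frame(frame):
    """Return (seq, payload), or None if the checksum reveals a bit error."""
    (crc,) = struct.unpack_from("!I", frame)
    body = frame[4:]
    if zlib.crc32(body) != crc:
        return None  # corrupted frame detected
    (seq,) = struct.unpack_from("!I", body)
    return seq, body[4:]

def missing_sequence_numbers(received):
    """Gaps in the received frame numbers indicate lost frames."""
    seen = set(received)
    if not seen:
        return []
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]
```

A receiver that gets `None` from `parse_frame`, or a non-empty list from `missing_sequence_numbers`, would then request retransmission, which is the second error-correction option listed above.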


Reliable Communication

• Observation: Most of this work assumes point-to-point communication
  – TCP reliable
  – Mask omission failure (loss of messages)
  – What if TCP connection breaks?
• High-level communication facilities.


Traditional RPC

Principle of RPC between a client and server program.


Remote Procedure Calls

• A remote procedure call occurs in the following steps:

1. The client procedure calls the client stub in the normal way.
2. The client stub builds a message and calls the local operating system.
3. The client’s OS sends the message to the remote OS.
4. The remote OS gives the message to the server stub.
5. The server stub unpacks the parameters and calls the server.
6. The server does the work and returns the result to the stub.
7. The server stub packs it in a message and calls its local OS.
8. The server’s OS sends the message to the client’s OS.
9. The client’s OS gives the message to the client stub.
10. The stub unpacks the result and returns to the client.
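The division of labour between the two stubs can be sketched as follows. This is a minimal illustration, not a real RPC library: the class names, the JSON message format, and the in-process "transport" that stands in for both operating systems and the network are all assumptions made for the example.

```python
import json

class ServerStub:
    """Unpacks the request, calls the server procedure, packs the reply."""
    def __init__(self, procedures):
        self.procedures = procedures  # procedure name -> callable

    def handle(self, request_bytes):
        request = json.loads(request_bytes)                    # unpack parameters
        result = self.procedures[request["proc"]](*request["args"])  # call the server
        return json.dumps({"result": result}).encode()         # pack the result

class ClientStub:
    """Looks like a local procedure to the caller, but sends a message."""
    def __init__(self, transport):
        self.transport = transport  # callable: request bytes -> reply bytes

    def call(self, proc, *args):
        request = json.dumps({"proc": proc, "args": list(args)}).encode()  # build message
        reply = self.transport(request)     # stands in for OS-to-OS delivery both ways
        return json.loads(reply)["result"]  # unpack the result for the caller

def add(a, b):
    return a + b

server_stub = ServerStub({"add": add})
client_stub = ClientStub(server_stub.handle)  # direct call replaces the network here
```

Calling `client_stub.call("add", 2, 3)` goes through message building, delivery, unpacking, execution, and the return path, while the caller sees only an ordinary function call.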


RPC Failures

• Five different classes of failures.

1. Can’t find server.
2. Request message lost.
3. Server crashes after receiving request.
4. Reply message is lost.
5. Client crashes after sending request.


Methods

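• 1: No server: report back to client
  – Raise an exception.
  – Transparency is lost.
• 2: Lost request: resend the message
  – Start a timer; if it expires before a reply arrives, send the request again.
  – Or is the server down? The client cannot tell a lost request from a crashed server.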
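A timer-and-resend loop for lost requests might look like the sketch below. The `LossyChannel` class is a hypothetical stand-in that drops a fixed number of requests and signals the expired timer with `TimeoutError`; a real client would wait on a socket instead.

```python
class LossyChannel:
    """Hypothetical channel that silently drops the first `drops` requests."""
    def __init__(self, drops):
        self.drops = drops
        self.requests_seen = 0

    def send(self, message, timeout):
        self.requests_seen += 1
        if self.requests_seen <= self.drops:
            # Simulates the client's timer expiring with no reply.
            raise TimeoutError("no reply within %.1fs" % timeout)
        return "reply to " + message

def call_with_retransmit(channel, message, max_attempts=3, timeout=0.5):
    """Resend the same request each time the timer expires."""
    for _ in range(max_attempts):
        try:
            return channel.send(message, timeout)
        except TimeoutError:
            continue  # timer expired: send the request again
    # After repeated timeouts the client still cannot tell whether the
    # requests were lost or the server is down.
    raise ConnectionError("no reply after %d attempts" % max_attempts)
```

Bounding the number of attempts is a design choice of this sketch; without a bound, a client talking to a dead server would retry forever.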


3: Server Crashes

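• Harder issue: The server can crash at two different points: before carrying out the request, or after carrying it out but before sending the reply.
  – The client could treat the two cases differently if it knew which one occurred.
  – But the client only knows that no reply arrived, so how can it tell the cases apart and act accordingly?
• Solution?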


Server Crashes

• At least once: The server guarantees it will carry out an operation at least once, no matter what. So keep trying until a reply comes back.
• At most once: The server guarantees it will carry out an operation at most once. So report failure immediately.
• No general solution for exactly once.
• Consider a print server that crashes and comes back up.
  – Client sends a message, gets an ack.
  – Server sends a completion message either right before or right after printing.
  – After a crash, the client can: never reissue, always reissue, reissue only if no ack was received, or reissue only if an ack was received.


Print Server

• Three events that can happen at the server:

1. Send the completion message (M).
2. Print the text (P).
3. Crash (C).


Server Crashes

• These events can occur in six different orderings:

1. M → P → C: A crash occurs after sending the completion message and printing the text.
2. M → C (→ P): A crash happens after sending the completion message, but before the text could be printed.
3. P → M → C: A crash occurs after printing the text and sending the completion message.
4. P → C (→ M): The text is printed, after which a crash occurs before the completion message could be sent.
5. C (→ P → M): A crash happens before the server could do anything.
6. C (→ M → P): A crash happens before the server could do anything.


Server Crashes

• Server crashes and comes back up.
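The claim that no client strategy gives exactly-once printing can be checked by enumerating every combination of client strategy and server ordering. The sketch below makes two assumptions that the slides do not state explicitly: the client's "ack" is taken to be the completion message M, and a reissued request is printed once by the recovered server without a second crash. All names are hypothetical.

```python
# Server either sends the completion message first (M->P) or prints first (P->M).
SERVER_ORDERS = {"M->P": ("M", "P"), "P->M": ("P", "M")}

# Each client strategy decides whether to reissue, given whether M arrived.
CLIENT_STRATEGIES = {
    "always": lambda acked: True,
    "never": lambda acked: False,
    "only if ack": lambda acked: acked,
    "only if no ack": lambda acked: not acked,
}

def outcome(order, crash_after, reissue):
    """crash_after = number of server events completed before the crash (0..2)."""
    done = order[:crash_after]
    printed = int("P" in done)  # printed before the crash?
    acked = "M" in done         # did the client receive M?
    if reissue(acked):
        printed += 1            # assumed: the recovered server prints once more
    return {0: "ZERO", 1: "OK", 2: "DUP"}[printed]

def table():
    """Map each client strategy to its outcome for all six orderings."""
    return {
        name: {
            (order_name, crash_after): outcome(order, crash_after, reissue)
            for order_name, order in SERVER_ORDERS.items()
            for crash_after in (2, 1, 0)
        }
        for name, reissue in CLIENT_STRATEGIES.items()
    }

if __name__ == "__main__":
    for name, row in table().items():
        print(name, row)
```

Under these assumptions, every row contains at least one ZERO or DUP entry, so each strategy fails for some ordering of M, P, and C.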


4: Reply Lost

• Detecting lost replies can be hard, because the server may also have crashed. You don’t know whether the server has carried out the operation.
• Solution:
  – None, except that you can try to make your operations idempotent: repeatable without harm if they happen to have been carried out before.
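The difference between an idempotent and a non-idempotent operation shows up as soon as a request is retransmitted after a lost reply. The `Account` class below is a hypothetical example chosen for illustration.

```python
class Account:
    def __init__(self, balance=0):
        self.balance = balance

    def deposit(self, amount):
        """Not idempotent: every repeated request adds the amount again."""
        self.balance += amount
        return self.balance

    def set_balance(self, amount):
        """Idempotent: repeating the request leaves the same final state."""
        self.balance = amount
        return self.balance
```

If the reply to `deposit(50)` is lost and the client resends, the account is credited twice; resending `set_balance(150)` does no harm.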


5: Client Crashes

• Problem: The server is doing work and holding resources for nothing (called an orphan computation).
  – Orphan is killed (or rolled back) by the client when it reboots.
  – Client broadcasts a new epoch number when recovering ⇒ servers kill orphans.
  – Require computations to complete within T time units. Old ones are simply removed.

• Question: What’s the rolling back for?
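The epoch-number approach can be sketched from the server's side as follows. The class and method names are hypothetical, and the sketch assumes each request carries the client's current epoch number.

```python
class Server:
    def __init__(self):
        self.client_epoch = {}  # client id -> latest epoch seen
        self.running = {}       # computation id -> (client id, epoch)

    def start(self, comp_id, client_id, epoch):
        """Accept a request unless it predates the client's last known reboot."""
        if epoch < self.client_epoch.get(client_id, epoch):
            return False  # stale request from an earlier incarnation
        self.client_epoch[client_id] = epoch
        self.running[comp_id] = (client_id, epoch)
        return True

    def on_new_epoch(self, client_id, epoch):
        """Handle the broadcast from a recovering client: kill its orphans."""
        self.client_epoch[client_id] = epoch
        orphans = [
            comp for comp, (cid, e) in self.running.items()
            if cid == client_id and e < epoch
        ]
        for comp in orphans:
            del self.running[comp]  # release the resources held by the orphan
        return orphans
```

Only computations started by the rebooted client in an older epoch are removed; work for other clients is left untouched.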