- Client request to execute a function on a server
- Client sends message with parameters and waits for reply
- Server parses message, performs operation, puts result into a message, and returns it to the client
- Client needs connection to server, server must implement required function
- client/server implementation autogenerated from procedure spec, e.g. Protobuf for serializing/deserializing code
- gRPC is a specific implementation of this concept
serialization
- data is serialized to convert complex data structures into a flat format, ensuring data integrity, efficient transmission, and making data hardware/OS-independent
failure handling
-
failure handling
- how does either side know if a message is dropped?
- what if client or server crashes?
- what if server crashes after performing operation but before replying?
- what if server appears to crash but is just slow?
- what if network partitions?
- ^ some are handled by TCP but what if a TCP socket fails? or we’re using a diff protocol?
-
how do we know what knowledge others have?
- how to reason about the state of the system based on the messages we have?
fault model (from CSE452)
- Async fail stop nodes
- async: computer may be arbitrarily slow to respond
- fail stop: node fails by stopping
- never sends garbage data, or violates protocol, or forgets state
- alternative: allow some nodes to behave arbitrarily
- network is async and may delay, drop, reorder, or duplicate messages
- network could arbitrarily delay a message while still being correct
- does not corrupt messages
- network may partition some nodes from each other
- a group of nodes can’t talk to another group
- network is commutative and transitive
- if A can message B, B can message A
- if A can message B and B to C, A can send to C
approaches
client timeout / retransmission
- when client timer expires
- do we know if original message was delivered, in flight, or dropped?
- resend - allow duplication to be handled by the server
- when server gets a request already handled
- reply may have been dropped → send another copy of the reply
- → client may receive multiple replies to the same Remote Procedure Call (RPC)
- network could reorder replies
- client needs to distinguish which replies go with which requests → unique message IDs
unique message IDs
- distinguish replies to diff RPCs
- include message ID in every request/reply
- when client retransmits, more effective to use same ID as original request
RPC Semantics
- at least once
- at most once
- exactly once
- server remembers what they’ve sent, and can resend exact same reply if client didn’t receive it
- client keeps resending request on timeout
- TCP handles a lot, but it could timeout and client reconnects
- end to end principle
When can a server safely discard old RPCs?
- never?
- server has timeout to resend un-acked replies?
- per-client RPC seq numbers, so every RPC includes “client Q has received all replies ⇐ X”?
- only one outstanding RPC at a time from each client?
- arrival of seq+1 allows server to discard messages ⇐seq from that client
- consideration: duplicates of old requests may arrive late and out of order