• Client request to execute a function on a server
    • Client sends message with parameters and waits for reply
    • Server parses message, performs operation, puts result into a message, and returns it to the client
  • Client needs connection to server, server must implement required function
  • client/server implementation autogenerated from procedure spec, e.g. Protobuf for serializing/deserializing code
  • gRPC is a specific implementation of this concept

serialization

  • data is serialized to convert complex data structures into a flat format, ensuring data integrity, efficient transmission, and making data hardware/OS-independent

failure handling

  • failure handling

    • how does either side know if a message is dropped?
    • what if client or server crashes?
    • what if server crashes after performing operation but before replying?
    • what if server appears to crash but is just slow?
    • what if network partitions?
    • ^ some are handled by TCP but what if a TCP socket fails? or we’re using a diff protocol?
  • how do we know what knowledge others have?

    • how to reason about the state of the system based on the messages we have?

fault model (from CSE452)

  • Async fail stop nodes
    • async: computer may be arbitrarily slow to respond
    • fail stop: node fails by stopping
    • never sends garbage data, or violates protocol, or forgets state
    • alternative: allow some nodes to behave arbitrarily
  • network is async and may delay, drop, reorder, or duplicate messages
    • network could arbitrarily delay a message while still being correct
    • does not corrupt messages
  • network may partition some nodes from each other
    • a group of nodes can’t talk to another group
  • network is commutative and transitive
    • if A can message B, B can message A
    • if A can message B and B to C, A can send to C

approaches

client timeout / retransmission

  • when client timer expires
    • do we know if original message was delivered, in flight, or dropped?
    • resend - allow duplication to be handled by the server
  • when server gets a request already handled
    • reply may have been dropped send another copy of the reply
    • client may receive multiple replies to the same Remote Procedure Call (RPC)
  • network could reorder replies
    • client needs to distinguish which replies go with which requests unique message IDs

unique message IDs

  • distinguish replies to diff RPCs
  • include message ID in every request/reply
  • when client retransmits, more effective to use same ID as original request

RPC Semantics

  • at least once
  • at most once
  • exactly once
    • server remembers what they’ve sent, and can resend exact same reply if client didn’t receive it
    • client keeps resending request on timeout
  • TCP handles a lot, but it could timeout and client reconnects
  • end to end principle

When can a server safely discard old RPCs?

  • never?
  • server has timeout to resend un-acked replies?
  • per-client RPC seq numbers, so every RPC includes “client Q has received all replies X”?
  • only one outstanding RPC at a time from each client?
    • arrival of seq+1 allows server to discard messages seq from that client
    • consideration: duplicates of old requests may arrive late and out of order