MsgSend*() functions
Normally, the MsgSend*() functions return EBADF or ESRCH when a connection is stale or closed on the server end (e.g., because the server dies). In an HA scenario, the servers themselves often come back (e.g., they're restarted) and begin to offer their services properly again almost immediately. Rather than merely terminating the message transmission with an error, in some cases it might be possible to perform recovery and continue with the message transmission.
The HA library functions that cover all the MsgSend*() varieties are designed to do exactly this. When a specific invocation of one of the MsgSend*() functions fails, a client-provided recovery function is called. This recovery function can attempt to reestablish the connection and return control to the HA library's MsgSend*() function. As long as the connection ID returned by the recovery function is the same as the old connection ID (which in many cases is easy to ensure via close/open/dup2() sequences), then the MsgSend*() functions can now attempt to retransmit the data.
If at any point the errors returned by MsgSend*() are anything other than EBADF or ESRCH, these errors are propagated back to the client. Note also that if the connection ID isn't an HA-aware connection ID, or if the client hasn't provided a recovery function or that function can't re-obtain the same connection ID, then the error is allowed to propagate back to the client to handle in whatever way it likes.
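The retry rules described above can be sketched in portable C. This is an illustration only, not the HA library's implementation: the names ha_conn_sketch, send_fn, recover_fn, and send_with_recovery() are hypothetical, the sim_* helpers stand in for a real server, and the max_tries bound is an added assumption to keep the loop finite.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical types for illustration; these are NOT the HA library's API. */
typedef int (*send_fn)(int coid, const void *msg, size_t len);
typedef int (*recover_fn)(int coid, void *handle);

typedef struct {
    recover_fn recover;   /* NULL means the client registered no recovery function */
    void      *handle;    /* opaque client data handed back to recover() */
    int        max_tries; /* assumption: bound on recovery attempts */
} ha_conn_sketch;

/* Send a message; on EBADF or ESRCH, ask the client to recover and retry. */
int send_with_recovery(const ha_conn_sketch *c, int coid, send_fn send,
                       const void *msg, size_t len)
{
    assert(c != NULL && send != NULL);
    int attempts = 0;

    for (;;) {
        int rc = send(coid, msg, len);
        if (rc != -1)
            return rc;                      /* success */
        if (errno != EBADF && errno != ESRCH)
            return -1;                      /* any other error propagates */

        int saved = errno;
        if (c->recover == NULL || attempts >= c->max_tries) {
            errno = saved;                  /* no recovery possible */
            return -1;
        }
        attempts++;
        if (c->recover(coid, c->handle) != coid) {
            errno = saved;                  /* same connection ID not re-obtained */
            return -1;
        }
        /* Connection ID is unchanged, so retransmit. */
    }
}

/* Simulated server, used only to exercise the sketch. */
int sim_server_up = 0;
int sim_recover_calls = 0;

int sim_send(int coid, const void *msg, size_t len)
{
    (void)coid; (void)msg;
    if (!sim_server_up) { errno = EBADF; return -1; }
    return (int)len;
}

int sim_recover(int coid, void *handle)
{
    (void)handle;
    sim_recover_calls++;
    sim_server_up = 1;   /* pretend the server was restarted */
    return coid;         /* same connection ID as before */
}
```

With sim_server_up set to 0, a call such as send_with_recovery(&c, 5, sim_send, "hi", 2) fails once with EBADF, invokes sim_recover(), and then succeeds on the retransmission. If no recovery function is registered, the original error reaches the caller unchanged.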
Clients can change their recovery functions. And since clients can also pass around recovery/connection information (which in turn is passed by the HA library to the recovery function), clients can construct complex recovery mechanisms that can be modified dynamically.
The client-side recovery library lets clients reconstruct the state required to continue the message transmission after reconnecting to either the same server or to a different server. The client is responsible for determining what constitutes the state that must be reconstructed and for performing this appropriately while the recovery function is called.
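As a minimal sketch of such a recovery function, the following uses only POSIX calls (open(), dup2(), lseek(), close()) on an ordinary file descriptor. The recovery_state structure, the recover_same_fd() name, and its (int, void *) signature are assumptions made for illustration. The state being reconstructed here is simply a file offset that the client is assumed to have been tracking.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical client-defined state; the client decides what belongs here. */
typedef struct {
    const char *path;    /* where to reconnect */
    int         oflag;   /* flags used for the original open() */
    off_t       offset;  /* position the client had reached before the failure */
} recovery_state;

/* Reconnect, force the new descriptor onto the old number, restore the offset.
 * Returns oldfd on success, -1 on failure. */
int recover_same_fd(int oldfd, void *handle)
{
    recovery_state *st = handle;
    assert(st != NULL && st->path != NULL);

    int newfd = open(st->path, st->oflag);
    if (newfd == -1)
        return -1;

    if (newfd != oldfd) {
        /* dup2() closes oldfd if it is still open, then reuses its number. */
        if (dup2(newfd, oldfd) == -1) {
            close(newfd);
            return -1;
        }
        close(newfd);
    }

    /* Reconstruct the client's state: seek back to where it left off. */
    if (lseek(oldfd, st->offset, SEEK_SET) == (off_t)-1)
        return -1;

    return oldfd;
}
```

Because the function returns the same descriptor number it was given, code that holds the old value can keep using it after recovery. A client reconnecting to a different server would change only the path stored in its state.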
