QNX RTOS v4 Knowledge Base

Title Proc: lost reply accross net: freeing reply...
Ref. No. QNX.000006090
Category(ies) Network
Issue Machines start up, and after logging in, users get the message "Proc: lost reply accross net: freeing reply..."

The systems were working, then a day later the error started to appear.  What causes this error?
Solution This message is logged by Proc when it determines that a Reply() from a remote process to a local Send() has been "lost" across the network.

Basically, a process (the client) calls Send() to a process on another node; that process (the server) calls Receive(), and the client becomes REPLY-blocked.  The server processes the data it received and then calls Reply(), which is non-blocking.  Reply() can report success or failure, but only for what it can detect at the time of the call: once Reply() has returned, there is no way to tell the server that the reply was later lost.  As far as the server (the process that called Reply()) is concerned, everything went fine.
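The asymmetry described above can be illustrated with a toy model. This is a sketch under stated assumptions, not QNX code: `proc_t`, `model_send()`, `model_reply()` and the `frame_delivered` flag are hypothetical names, and the real Send()/Receive()/Reply() calls are only mimicked by state changes. The point it shows is that the server's Reply() returns success whether or not the reply frame actually reaches the client.

```c
#include <stdbool.h>

/* Toy model of a QNX 4 Send()/Receive()/Reply() exchange between two nodes.
 * All type and function names are hypothetical; they are NOT the QNX API. */

typedef enum {
    PROC_READY,           /* running or runnable */
    PROC_RECEIVE_BLOCKED, /* waiting in Receive() */
    PROC_REPLY_BLOCKED    /* Send() delivered, waiting for Reply() */
} proc_state_t;

typedef struct {
    proc_state_t state;
    int reply;            /* payload delivered by the server's Reply() */
} proc_t;

/* Client calls Send(): the server leaves its Receive() with the message
 * and the client becomes REPLY-blocked. */
void model_send(proc_t *client, proc_t *server)
{
    client->state = PROC_REPLY_BLOCKED;
    server->state = PROC_READY;
}

/* Server calls Reply(). In this model it returns 0 once the reply has been
 * handed off; `frame_delivered` stands in for what happens on the wire
 * afterwards. If the frame is lost, the server still sees success and the
 * client stays REPLY-blocked. */
int model_reply(proc_t *server, proc_t *client, int value, bool frame_delivered)
{
    if (frame_delivered) {
        client->reply = value;
        client->state = PROC_READY;
    }
    server->state = PROC_READY; /* Reply() never blocks the server */
    return 0;                   /* the later loss cannot be reported here */
}
```

Calling `model_send()` followed by `model_reply(..., false)` leaves the server READY with a zero return value while the client remains `PROC_REPLY_BLOCKED`, which is exactly the stuck state the article describes.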

The reply gets "lost" because of network failures (out-of-window (OOW) collisions, exhausted retries, other hardware failures, ...).

There is NO WAY to report this to the server (the program that called Reply()), so it cannot Reply() again; it just continues operating as normal.  Yet the client that called Send() cannot be unblocked until it gets the lost reply.  So one end thinks it is waiting for a reply, and the other thinks it is done.  (On earlier versions of the OS, the processes simply stayed in this state.)

Proc/Net on the client's node (where the client is REPLY-blocked) detects this lapse in the protocol by polling the remote process and checking its state.  If the remote process has changed to the READY state and is quiet, Proc concludes that the reply was lost across the network.
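The polling check can be sketched as a small detector. The actual logic inside Proc/Net is not documented in this article, so everything here is an assumption for illustration: "quiet" is modeled as a hypothetical per-VC activity counter that has not changed since the previous poll, and the number of consecutive quiet polls required (`threshold`) is an invented parameter.

```c
#include <stdbool.h>

/* Hypothetical sketch of the lost-reply check. Names, the activity counter
 * and the threshold are assumptions, not Proc/Net internals. */

typedef enum { REMOTE_READY, REMOTE_OTHER } remote_state_t;

typedef struct {
    remote_state_t state;      /* server state reported by the remote node */
    unsigned long vc_activity; /* hypothetical count of traffic on the VC */
} remote_poll_t;

typedef struct {
    unsigned long last_activity; /* counter value seen on the previous poll */
    int quiet_polls;             /* consecutive polls that were READY and quiet */
    int threshold;               /* quiet polls needed before declaring loss */
} lost_reply_detector_t;

void detector_init(lost_reply_detector_t *d, int threshold)
{
    d->last_activity = 0;
    d->quiet_polls = 0;
    d->threshold = threshold;
}

/* Called on each poll while the local client is REPLY-blocked.
 * Returns true once the reply should be treated as lost. */
bool detector_poll(lost_reply_detector_t *d, remote_poll_t p)
{
    bool quiet = (p.state == REMOTE_READY) &&
                 (p.vc_activity == d->last_activity);

    d->last_activity = p.vc_activity;
    d->quiet_polls = quiet ? d->quiet_polls + 1 : 0; /* any activity resets */
    return d->quiet_polls >= d->threshold;
}
```

Requiring several consecutive quiet polls, rather than acting on a single READY observation, is a design choice meant to avoid tearing down a circuit while a reply is still in transit.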

Proc then tears down the virtual circuit (VC) between the two systems, unblocks the client that called Send(), and logs the error.  The client's Send() should also return with an error.  The local pid reported in the message is that of the VC.
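Because the server has already processed the request by the time its reply is lost, a client that simply re-sends after a failed Send() may cause the request to be applied twice. The sketch below shows one possible client-side policy; it is an assumption, not QNX behaviour. `send_once_fn` stands in for a wrapper around the real Send() call (which returns -1 on failure), and `fake_send_once()` is a hypothetical stand-in transport for trying the policy off-target.

```c
#include <stdbool.h>

/* Hypothetical client-side handling of a failed Send(). Only idempotent
 * requests are retried; otherwise the outcome is reported as unknown. */

typedef int (*send_once_fn)(void *ctx); /* 0 = reply received, -1 = failed */

typedef enum {
    SEND_OK,             /* a reply was received */
    SEND_FAILED_SAFE,    /* every attempt failed; request is safe to repeat */
    SEND_OUTCOME_UNKNOWN /* failed; the server may already have acted on it */
} send_result_t;

send_result_t send_with_policy(send_once_fn send_once, void *ctx,
                               bool idempotent, int max_attempts)
{
    int attempts = idempotent ? max_attempts : 1;
    if (attempts < 1)
        attempts = 1;                    /* always try at least once */

    for (int i = 0; i < attempts; i++) {
        if (send_once(ctx) == 0)
            return SEND_OK;
    }
    return idempotent ? SEND_FAILED_SAFE : SEND_OUTCOME_UNKNOWN;
}

/* Fake transport for testing: fails the first `failures_left` calls,
 * then succeeds, counting every call. */
typedef struct { int failures_left; int calls; } fake_link_t;

int fake_send_once(void *ctx)
{
    fake_link_t *link = ctx;
    link->calls++;
    if (link->failures_left > 0) {
        link->failures_left--;
        return -1;
    }
    return 0;
}
```

With a link that fails twice, an idempotent request succeeds on the third attempt, while a non-idempotent one stops after the first failure and returns `SEND_OUTCOME_UNKNOWN`, leaving the application to reconcile state with the server.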

Because this is more than likely a hardware problem, try moving the machines to different ports on the hub, and try different network cables and cards.

Check the output of netinfo and traceinfo for more details or additional errors that can help determine whether there is a problem with the cards.