Title |
Out of Window Collisions in QNX4 |
Ref. No. |
QNX.000009556 |
Category(ies) |
Network |
Issue |
When our applications run, the Out of Window (OOW) collision count (as reported by netinfo -l) exactly matches the number of frames we've run. For example, if we let the application run for 8 frames, the OOW count also increases by 8. What could we be doing in software that would cause this? What is an Out-of-Window collision, and why is it bad?
|
Solution |
Out of Window collisions are caused by hardware, not by software. You probably have bad wiring, adapters, or hubs. Check all of these, as well as your connections.
A collision happens (and is detected) when two or more NICs try to transmit on the wire at the same time. Normally, both detect this, halt their transmission, report a TX collision, wait a little while, then try again. (The "wait a little while" is an exponential backoff: after each successive collision, a NIC picks a random wait from a range that doubles in size, so the typical wait grows roughly as n, 2n, 4n, etc.) This is how bandwidth is allocated on an Ethernet. There is a certain period from the start of transmission of a packet in which a collision is supposed to be detected (the "window"). This corresponds to the amount of time it takes to transmit a minimum-length Ethernet packet. If a collision is detected later in the transmission than this window, an Out of Window collision is logged. On a properly configured Ethernet these should never happen.
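The backoff and the window described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular NIC or the QNX driver implements it: it assumes 10 Mbit/s Ethernet with the standard 64-byte minimum frame, ignores the preamble, and the function and constant names are made up for this example.

```python
import random

# Assumed figures for 10 Mbit/s Ethernet; all names here are illustrative.
BIT_RATE_BPS = 10_000_000
MIN_FRAME_BITS = 64 * 8                      # minimum frame: 64 bytes = 512 bits
SLOT_TIME_S = MIN_FRAME_BITS / BIT_RATE_BPS  # about 51.2 microseconds
MAX_BACKOFF_EXPONENT = 10                    # the range stops doubling here


def backoff_slots(collision_count, rng=random):
    """Pick a random wait, in slot times, after the Nth successive collision.

    The range of possible waits doubles with each collision
    (0..1, 0..3, 0..7, ...) until the exponent is capped.
    """
    exponent = min(collision_count, MAX_BACKOFF_EXPONENT)
    return rng.randrange(2 ** exponent)


def classify_collision(bits_sent):
    """Label a collision by how far into the frame the sender was.

    A collision seen before a minimum-length frame's worth of bits has
    gone out is normal; one seen later is Out of Window.
    """
    return "in-window" if bits_sent < MIN_FRAME_BITS else "out-of-window"
```

For instance, classify_collision(100) returns "in-window", while classify_collision(4000), a collision seen well into a large frame, returns "out-of-window".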
Why is this bad?
Well, if a large packet has an OOW collision with a minimum-sized packet, the small packet may be destroyed by the collision (collisions destroy both packets) without the transmitting NIC knowing that the collision has happened, which means a lost packet. The transmitting NIC won't retry the transmission, because it won't know the collision has occurred. This has a couple of effects. First, the NIC reporting the OOW collisions is often not the one at fault: it detected the problem but may not be causing it, which makes this a difficult fault to track down. Second, QNX protocol control packets (ACKs, NACKs, etc.) tend to be minimum-size packets, so losing them can cause definite problems for QNX networking: generally loss of throughput, and sometimes dropped VCs or the ends of a VC getting out of sync.
The usual causes of OOW collisions are one or more bad NICs, bad cabling, too long a cable run, a bad hub or a bad port on a hub, or too many hubs between machines (or before the termination of the collision domain). Switches, which terminate collision domains, are good things: they allow a larger network without OOW issues and give better throughput on each segment, since packets from one segment don't need to be copied to other segments and so won't collide with traffic there. |
|