
Re: conserver freezes when console presents binary data?

Bryan Stansell bryan@conserver.com
Wed, 14 Apr 2004 16:14:23 -0700 (PDT)


On Wed, Apr 14, 2004 at 10:31:39AM -0400, nathan r. hruby wrote:
> Hi!
> 
> We seem to be having this problem concerning binary data shoved out a
> console and through conserver making conserver freeze.

well, that's really not supposed to happen.  i've even done some
(minimal) testing doing xmodem transfers to and from consoles (local
shell commands) using the '^Ec|' sequence.  found a few binary-data bugs
and fixed them (a few releases ago).  i was able to xfer multiple megs
of data without a hiccup.  it was definitely shoving 8-bit data without
problems.  perhaps i should try that test again...

> Apr 14 10:03:14 xoff kernel: [<c015f9e6>] sys_close [kernel] 0x66 (0xe0f97fb0)

assuming i'm reading this right, one of the child processes was locked
up trying to close the serial port (i assume it was a serial port since
higher up it was doing cyclades stuff).  all the files are opened
non-blocking (O_NDELAY for the serial ports, but under linux
O_NONBLOCK==O_NDELAY - i'm going to change all the O_NDELAY to
O_NONBLOCK, btw), so it concerns me that a close would block.  at least,
i'm assuming it was blocked on the close.
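
for illustration only, here is a minimal python sketch of what "opened
non-blocking" means at the system-call level.  it is not conserver's
code (conserver is written in C); the helper names open_nonblocking and
is_nonblocking are made up for this example, and /dev/null in the usage
note stands in for a real serial device.

```python
import fcntl
import os


def open_nonblocking(path):
    """Open path read/write with O_NONBLOCK set (hypothetical helper)."""
    # O_NOCTTY keeps a tty from becoming our controlling terminal.
    return os.open(path, os.O_RDWR | os.O_NONBLOCK | os.O_NOCTTY)


def is_nonblocking(fd):
    """Return True if the descriptor's status flags include O_NONBLOCK."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFL)
    return bool(flags & os.O_NONBLOCK)
```

calling open_nonblocking("/dev/null") and passing the result to
is_nonblocking() should report that the flag is set.  python also
exposes os.O_NDELAY on unix, so the two values can be compared directly
on a given platform.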

my big question is, why was it trying to close the connection to the
serial port?  were there any conserver messages logged that might
explain it?  or errors?  or anything?

> So, the conserver process seems "frozen" in that any console commands
> seem to hang.  I assume that's cause it's locked waiting for tty
> operations.

yeah, if you try and connect to any console managed by that process,
it'll lock up.  if you poll the servers it can lock too (stuff like
'console -u', 'console -w', etc) since the children provide the info.

> So my question is: is this expected or possible buglet fodder?  I'm
> thinking that just telling conserver to strip the 7th bit should hopefully
> make this go away, but any guidance folks could share would be much
> appreciated.

i doubt the high-bit stripping will help.  if it does, i'd love to know.

can you make this happen at will?  if so, i'd suggest copying the
conserver.cf file, commenting out all but one console, and then tell
conserver to ignore the console by connecting to it and doing a '^Ecd'.
then start up another conserver process with 'conserver -C new.cf -DDD
-p 7777 > /tmp/conserver.log 2>&1'.  connect to the console with
'console -p 7777 (consolename)' and cause it to lock up.  then just nuke
the conserver processes that are using '-p 7777'.  oh, and connect to
the console with the real server and do a '^Eco' to bring it back up.
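
the debug-instance steps above could be wrapped in a small script.  this
is only a sketch in python: the function names are hypothetical, the
flags are copied from the commands above, and it assumes the trimmed
config file already exists and that conserver is on the PATH.

```python
import subprocess


def debug_conserver_argv(cf_path, port=7777):
    """Build the argument list for the throwaway debug server."""
    return ["conserver", "-C", cf_path, "-DDD", "-p", str(port)]


def console_argv(console_name, port=7777):
    """Build the argument list for connecting to the debug server."""
    return ["console", "-p", str(port), console_name]


def start_debug_conserver(cf_path, log_path="/tmp/conserver.log", port=7777):
    """Start the debug server with stdout and stderr sent to log_path."""
    with open(log_path, "wb") as log:
        # The child inherits its own copy of the log descriptor, so it is
        # safe for the parent to close the file once Popen returns.
        return subprocess.Popen(
            debug_conserver_argv(cf_path, port),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
```

start_debug_conserver("new.cf") returns a Popen object for the parent
process it launched.  since conserver runs child processes, cleaning up
afterwards may still mean finding the '-p 7777' processes by hand.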

the /tmp/conserver.log file will be HUGE.  and i mean **HUGE**.  but,
it'll show every bit of detail, and hopefully will shed some light on
what's going on (unless you've figured out what it is by now).

and, of course, there was the off-mailing list suggestion that perhaps
it's a software flow-control issue.  dunno if you've checked into that
yet, but maybe that somehow influences the close().  but
again, why is it trying to close the device?

as a final "aside", i've talked to another person about issues with the
cyclades Y cards and problems with conserver.  afaik, he's still having
problems - weird things where some ports work, others don't.  maybe
these are tied together somehow.  we never got anywhere with his
problems.  it makes me think there's some low-level serial stuff that
the cyclades drivers have issues with...or something.

ok...i'm done with my long rambling.  maybe, hopefully, something in
here will be of help.  if you can trigger the problem, i'd love to see
the conserver log (with the -DDD).

Bryan