
Re: conserver eventually goes catatonic after SIGPIPE (on NetBSD)

Bryan Stansell bryan@conserver.com
Tue, 28 May 2002 18:06:18 -0700 (PDT)


On Tue, May 28, 2002 at 06:10:56PM -0400, Greg A. Woods wrote:
> Conserver definitely needs a SIGPIPE handler -- unless it wants to die
> unexpectedly every time it writes to a client that has disconnected
> abnormally.  The kernel sends SIGPIPE to processes writing to closed
> sockets just as it does to processes writing to closed pipes....

yep.  guess it's just never come up before 'cause it's mostly
processing a read() when it notices broken things (a client would have
to be sending data at just the right time).  i didn't even look for a
SIGPIPE handler until this came up, actually.  regardless, yes, this
needs to be added.

> code currently supports systems without sigaction(2).  Is that
> important?  I'll bet conserver won't even compile on systems without
> sigaction() despite the fact that API alone is handled portably....

yeah, i'm not sure myself.  i've been adding and changing things and
trying to not break older systems, but at this point, i have no idea if
i've succeeded.  i haven't heard complaints, so that's promising, but i
don't want to just assume everyone is posix-compliant or has foo and
bar or whatever.  i wish i knew for sure.  so, i'd like to try and keep
it as forgiving as possible.

> The alternative is to keep a list of chat child PIDs and walk through
> them after SIGCHLD is caught too....  That's a bit more work, but still
> easily doable (I've got similar code already tested and working for some
> other applications).  It seems a little messier since it introduces
> concurrency in the error handling for chat processes where they're
> currently really more synchronous....

and that's one thing i'm worried about...synchronous behavior.
anything that conserver does that isn't more "event-based" causes the
server (or a set of consoles, in this case) to freeze until it
completes.  the re-read of the config file is one example of this.
depending on the size of your config file, you can see a noticeable
delay while it does all its work and rebuilds all its data
structures.  the chat patch you submitted does similar things - it
forks off chat and waits for it to return before continuing.  depending
on timeouts in that chat script, network availability, etc, it can cause
a major hang in conserver while it's waiting for things to happen.  if
a console goes down and it retries, that group of consoles is going to
hang 'til the script completes.  and it's even worse if conserver has
been given the -O flag: that hang will occur repeatedly.

but don't get me wrong, i really like the idea of conserver being able
to handle a chat-like situation.  we just need to get things so they
don't cause the server to stop processing other consoles and client
connections.  we're already forking off processes as consoles and
looping over them - it just needs to be extended to the chat scripts (i
think - haven't really put a lot of thought into it - but i doubt it
would be *that* hard).
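
a rough sketch of the non-blocking version, assuming the chat script
is a separate executable and the main loop can poll for its exit on
each pass.  start_chat and poll_chat are hypothetical names, not
functions from conserver or from the chat patch:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* hypothetical helper: start a chat script and return right away.
 * returns the child's pid, or -1 if fork() failed */
static pid_t start_chat(const char *path)
{
    pid_t pid = fork();

    if (pid == 0) {
        /* child: become the chat script */
        execl(path, path, (char *)NULL);
        _exit(127);             /* exec failed */
    }
    return pid;
}

/* hypothetical helper: check on a chat child without blocking.
 * returns 1 if it has finished (*status is filled in),
 * 0 if it is still running, -1 on error */
static int poll_chat(pid_t pid, int *status)
{
    pid_t r = waitpid(pid, status, WNOHANG);

    if (r == pid)
        return 1;
    if (r == 0)
        return 0;
    return -1;
}
```

the main loop would call poll_chat() for each pending chat on every
pass (or after a SIGCHLD) and keep servicing the other consoles and
clients in between, instead of sitting in a blocking wait.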

Bryan