Re: conserver -> cyclade, ssh timeouts

Bryan Stansell bryan@conserver.com
Thu, 1 Dec 2005 23:01:29 -0800 (PST)

On Thu, Dec 01, 2005 at 02:03:06PM -0800, Phil Dibowitz wrote:
> [Wed Nov 30 19:30:01 2005] conserver (6459): ERROR: [swx3.sys.adm1]
> initialization rate exceeded: forcing down
> [Wed Nov 30 19:30:56 2005] conserver (6459): [swx3.sys.adm1] automatic
> reinitialization
> [Wed Nov 30 19:30:56 2005] conserver (6459): [swx3.sys.adm1] console up
> At which point it almost immediately dies again:
> [Wed Nov 30 19:30:56 2005] conserver (6459): [swx3.sys.adm1] exit(255)
> [Wed Nov 30 19:30:56 2005] conserver (6459): [swx3.sys.adm1] automatic
> reinitialization

here's the deal with these messages.  if a console is forced down (like
above), the 'console up' message is generated when it succeeds in coming
back up.  when a console just goes down, gets retried, and comes back
fine, no 'console up' is displayed (not sure why, but there was reasoning
at the time).  that's the real difference between all your logs.  in general,
conserver is forking off the command, seeing it exit with a -1 (255,
since the exit status is printed as unsigned), and retrying.  each run
seems to stay up about 4 seconds before it exits.  since this is a
command, and conserver has no idea whether the command is succeeding or
not (as long as it forks off, it assumes so), it marks the console as up
(for about 4 seconds), catches it dying, and retries.  looks like it was
respawning a bit quicker above, so it hit the reinitialization rate
limit...but that doesn't really deviate from the general behavior.
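
to illustrate the -1 vs. 255 bit, here's a small python sketch (not
conserver code; the function names are made up for the example).  it
assumes a POSIX system, where only the low 8 bits of an exit code reach
the parent.  openssh itself documents exiting with 255 when an error
occurs, which fits what the logs show.

```python
import subprocess
import sys


def as_exit_status(code):
    """Reduce an exit code to the low 8 bits a POSIX parent sees."""
    return code & 0xFF


def observed_status(code):
    """Run a child that exits with `code` and return what the parent observes."""
    child = [sys.executable, "-c", f"import sys; sys.exit({code})"]
    return subprocess.run(child).returncode


if __name__ == "__main__":
    print(as_exit_status(-1))   # 255
    print(observed_status(-1))  # same value on a POSIX system
```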

it doesn't look like a conserver issue to me (it's doing what i think it
should, at least).  it sounds like the old ssh connection isn't being
dropped on the cyclades side (probably because it never got a FIN
packet, for whatever reason?), so it's still keeping the port in use.  conserver
can't tell what's really going on since the command could be doing
anything.  i'd suggest having what conserver forks off do some logging
somewhere to see if ssh is getting a connection refused or failing to
authenticate or what...that might help diagnose the issue.  the fact that
the fork, ssh, etc. is taking so long (about 4 seconds) makes me think it
might not be just a simple "connection refused".
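
a rough sketch of the kind of logging wrapper i mean, in python.  this
is only an illustration under assumptions: the log path and the
run_logged name are made up, and it simply runs whatever command line
you hand it.  stdin and stdout are left alone so the console session
still works; only stderr (where ssh prints its complaints) goes to the
log, along with how long the command ran and how it exited.

```python
#!/usr/bin/env python3
import subprocess
import sys
import time

LOG_PATH = "/tmp/console-wrapper.log"  # hypothetical location, pick your own


def run_logged(argv, log_path=LOG_PATH):
    """Run argv, appending start time, stderr, runtime and exit status to a log."""
    start = time.time()
    with open(log_path, "a") as log:
        log.write(f"{time.ctime(start)} start: {' '.join(argv)}\n")
        log.flush()  # make sure the start line lands before the child's stderr
        # stdin/stdout are inherited from the caller; stderr is sent to the log
        status = subprocess.call(argv, stderr=log)
        elapsed = time.time() - start
        log.write(f"{time.ctime()} exit({status}) after {elapsed:.1f}s\n")
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # usage: wrapper.py <command> [args...]
    sys.exit(run_logged(sys.argv[1:]))
```

you'd have conserver fork off the wrapper with your existing ssh command
line as its arguments.  adding -v to the ssh arguments should put a lot
more detail into the log about where the connection attempt stalls.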