Re: Reconnect causes slowdown in 7.2.1

Aaron Burt aaron@osdl.org
Tue, 7 Jan 2003 15:49:39 -0800 (PST)


On Mon, 6 Jan 2003, Bryan Stansell wrote:
> On Mon, Jan 06, 2003 at 04:22:37PM -0800, Aaron Burt wrote:
> > The slowness is in the response time of console connections.  This
> > includes things like the time it takes to make the initial connection
> > (time from typing "console dev4-001" to getting a "[Enter `^Ec?' for
> > help]") or, once you're connected, from hitting a key and getting a
> > response from the machine you're connected to.
>
> are we talking about a second or more like 10 second delays?  see below
> for more.

From just over 1 second up to about 18 seconds to connect, depending on
how many consoles are in reinit.  Keystroke-to-host response times are
typically about half the connect time.  I timed by counting seconds, so
the accuracy leaves something to be desired.

Strangely, Conserver commands and responses consistently took around 2
seconds, whether 1 or 7 consoles were in reinit.  This was true for
^Ec commands and for "console down" messages when sending keystrokes to
downed consoles.

> > The slowness affects all 16 consoles managed by a given Conserver process.
>
> that makes sense...when a process is "busy" waiting, it'll hang the
> group of consoles it's managing.

That's a shame.

> > So, is this problem addressed by the reinit changes in 7.2.3?
>
> i'd have to say no.  i believe most things are the same in regards to
> this issue, but, of course, there are other things the new stuff has
> that could be useful.  but, i'm digressing.

Indeed.  The ability to turn off auto-reinit, for one.  I'll have to see
if I can find a way to dump a list of consoles in reinit and to force a
console down/up.  With that, I should be able to automagically find and
fix blocked ports, which is a common problem after network outages and
suchlike.

> > Or does the sleep in the reinit loop hold up the whole Conserver process?
>
> there are a couple of sleep() calls that would hold up a conserver
> process.  each are less than a second, however, and i'd really be
> surprised if they were "stacking up" and giving you long pauses

That's what they appear to be doing.  I found a group that had around 7
ports in reinit, and the delay decreased in a linear fashion as I brought
ports out of reinit.

The reinit retries also happened faster as fewer ports were in reinit.
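
To illustrate why the delay would scale linearly, here is a minimal Python
sketch of the behaviour I think I'm seeing.  It assumes (my guess, not
something read out of the Conserver source) that one process services its
consoles in a single loop and that every console in reinit blocks that loop
for a fixed time on each pass.  The function names and the 1-second
retry_block_s value are hypothetical.

```python
def service_pass_delay(consoles, retry_block_s=1.0):
    """Seconds that one pass of a single-threaded service loop spends blocked.

    consoles: iterable of state strings, "up" or "reinit" (hypothetical names).
    retry_block_s: assumed blocking time per reinit attempt (a sleep or a
    slow connect); this is a made-up figure, not a measured one.
    """
    return sum(retry_block_s for state in consoles if state == "reinit")


def delays_as_ports_recover(total=16, in_reinit=7, retry_block_s=1.0):
    """Return (ports_in_reinit, blocked_seconds) pairs as ports come back up."""
    results = []
    for n in range(in_reinit, -1, -1):
        consoles = ["reinit"] * n + ["up"] * (total - n)
        results.append((n, service_pass_delay(consoles, retry_block_s)))
    return results


if __name__ == "__main__":
    for n, delay in delays_as_ports_recover():
        print(n, "in reinit ->", delay, "s blocked per pass")
```

Under those assumptions the blocked time per pass is just the number of
ports in reinit times the per-attempt block, which would match both the
linear drop in delay and the faster retries as ports recover.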

> (although a second can seem like a long time too).  if you're seeing
> very long pauses, it's more likely the call to connect() that's
> hanging.  was your terminal server actively rejecting the
> reverse-telnet connections, or is it just half-opening the socket?

It was sending "port in use" or some such and then dropping the
connection.
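
For telling these cases apart from the outside, something like the following
Python sketch could probe a reverse-telnet port with a bounded wait.  It is
not part of Conserver; probe_port and its status strings are hypothetical
names of mine, and the 3-second timeout is arbitrary.  It uses only the
standard socket module.

```python
import socket
import time


def probe_port(host, port, timeout=3.0):
    """Classify how a TCP port responds, never waiting much longer than
    `timeout` seconds for the connect or for the read phase.

    Returns (status, data) where status is one of (hypothetical labels):
      "timeout" - connect never completed (e.g. half-open or filtered)
      "refused" - connection actively rejected
      "error"   - some other socket error while connecting
      "dropped" - connected, then the peer closed (e.g. after "port in use")
      "open"    - connected and still open when the wait ran out
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return ("timeout", b"")
    except ConnectionRefusedError:
        return ("refused", b"")
    except OSError:
        return ("error", b"")

    chunks = []
    deadline = time.monotonic() + timeout
    with sock:
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return ("open", b"".join(chunks))
            sock.settimeout(remaining)
            try:
                data = sock.recv(1024)
            except socket.timeout:
                # Peer is connected but quiet: treat the port as usable.
                return ("open", b"".join(chunks))
            except OSError:
                # Reset or similar while reading: count it as a drop.
                return ("dropped", b"".join(chunks))
            if not data:
                # Orderly close by the peer, possibly after a message.
                return ("dropped", b"".join(chunks))
            chunks.append(data)
```

With a terminal server behaving like ours, I would expect a "dropped"
status with the "port in use" text in the returned data, whereas a
half-opened socket should show up as "timeout".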