Re: what is normal conserver hang during reconfig

Sun, 23 Oct 2016 06:42:17 GMT · Bryan Stansell

Finally got time to look at things.  strace is perfect, thanks for
suggesting that.

So running something like
strace -t -o strace.out.2 -p 3198  
and sending a SIGHUP to the parent process showed the issue.
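
For anyone wanting to dig through a trace like this, here is a minimal
sketch (not part of conserver) that totals wall-clock time per syscall
from "strace -t" output.  It assumes the plain single-process format
produced by the command above, one "HH:MM:SS syscall(...)" entry per
line, and also accepts the fractional timestamps that "strace -tt"
emits.  The function name is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as: 10:00:00 open("/etc/hosts", O_RDONLY) = 3
# The fractional part is optional so "strace -tt" output also parses.
STRACE_LINE = re.compile(r"^(\d\d):(\d\d):(\d\d(?:\.\d+)?)\s+(\w+)\(")


def seconds_per_syscall(lines):
    """Charge the gap between consecutive trace lines to the earlier syscall.

    This is a rough attribution: the gap includes any time the process
    spent between calls, not only time inside the kernel.
    """
    totals = Counter()
    prev = None  # (timestamp in seconds since midnight, syscall name)
    for line in lines:
        match = STRACE_LINE.match(line)
        if match is None:
            continue  # signal markers, "resumed" lines, exit notices
        hours, minutes, seconds, name = match.groups()
        now = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        if prev is not None:
            prev_time, prev_name = prev
            # The modulo keeps the gap positive if the trace crosses midnight.
            totals[prev_name] += (now - prev_time) % 86400
        prev = (now, name)
    return totals
```

Something like seconds_per_syscall(open("strace.out.2")).most_common(5)
would then list the calls that account for most of the elapsed time.
With -t the resolution is one second, so -tt gives a finer picture.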

So the way we've always set things up was to automatically generate one
config file per console server from our equipment database.  This means
there are 264 files that are #included into the main config file.
The first 30s of "hang" is each process opening each file, reading it in,
and closing it.  I'm wondering if we need to block I/O during this, or
whether that could be done before we start blocking?
Once that is done there is another 10s of hang while we do the DNS
lookup for each console host, as you thought (open /etc/hosts, make a DNS
query, resolve it).
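
The two phases can be timed separately outside of conserver.  Below is
a minimal sketch, assuming you supply a list of include file paths and
a list of console hostnames yourself; the function names are
hypothetical, and the lookup uses the system resolver through Python's
socket.gethostbyname rather than anything conserver-specific.

```python
import socket
import time


def time_reads(paths):
    """Open, fully read, and close each file; return elapsed seconds."""
    start = time.monotonic()
    for path in paths:
        with open(path, "rb") as handle:
            handle.read()
    return time.monotonic() - start


def time_lookups(hosts, resolve=socket.gethostbyname):
    """Resolve each hostname in turn; return (elapsed seconds, failed hosts).

    The resolver is a parameter so it can be swapped out for testing.
    """
    start = time.monotonic()
    failed = []
    for host in hosts:
        try:
            resolve(host)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(host)
    return time.monotonic() - start, failed
```

Note that a second run of time_reads will usually be served from the
page cache, so only the first run after the cache has gone cold says
much about the disk.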

I tried putting all the configs into one file but that didn't change
anything.  So then I started wondering.  Our IT had long ago made the
console servers VMs.  It's never seemed like an issue, but I compared
some basic dd commands and found my problem server has terrible I/O
throughput ... sigh.  For comparison, one of my good servers has about
80Mbp/s read/write and the bad one has around 15Mbp/s read/write.
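
The exact dd invocations aren't shown here, so the following is only a
rough stand-in: a dd-style sequential write test that writes a scratch
file in 1 MiB blocks, forces it to disk with fsync, and reports MiB/s.
The function name, sizes, and directory argument are all assumptions
for illustration.

```python
import os
import tempfile
import time


def write_throughput_mib_s(directory, total_mib=64, block_mib=1):
    """Write total_mib of zeros into a temp file in directory; return MiB/s."""
    block = b"\0" * (block_mib * 1024 * 1024)
    blocks = total_mib // block_mib
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                handle.write(block)
            handle.flush()
            os.fsync(handle.fileno())  # include the time to reach the disk
        elapsed = time.monotonic() - start
    finally:
        os.remove(path)  # always clean up the scratch file
    # Guard against a zero reading on clocks with coarse resolution.
    return (blocks * block_mib) / max(elapsed, 1e-9)
```

Running it against a directory on the same filesystem on both the good
and the bad server gives numbers that can be compared directly, though
a small test size mostly measures caches rather than the disk.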

So I'm going to look into moving the VM or getting the disk perf up, which
should solve most of my issues, but I also wonder if the conserver code
could be reorganized without too much trouble to avoid blocking when the
disk is slow?  It's possible what I'm asking is dumb, just throwing it
out there.
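
To make the question concrete, here is a generic sketch of the "read
first, block later" idea.  It is not how conserver works (conserver is
written in C and uses multiple processes), and every name in it is
hypothetical.  The slow file reads happen on a worker thread, and the
main loop only picks up the finished result when it is ready.

```python
import queue
import threading


def load_config(paths):
    """The slow part: read every include file into memory."""
    loaded = {}
    for path in paths:
        with open(path) as handle:
            loaded[path] = handle.read()
    return loaded


class BackgroundReloader:
    """Read config files off the main loop; hand over the result when done."""

    def __init__(self, paths):
        self.paths = list(paths)
        self._ready = queue.Queue()

    def request_reload(self):
        # Call this from the SIGHUP handling path; it returns immediately.
        # If a file cannot be read, the worker dies and nothing is queued.
        worker = threading.Thread(
            target=lambda: self._ready.put(load_config(self.paths)),
            daemon=True,
        )
        worker.start()

    def poll(self):
        # Call this once per main-loop pass; returns None until a load finishes.
        try:
            return self._ready.get_nowait()
        except queue.Empty:
            return None
```

The main loop would keep servicing consoles, call poll() on each pass,
and only pause briefly to apply a new config once one comes back.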

-denis

On Tue, Oct 18, 2016 at 08:39:19PM -0700, Bryan Stansell via users wrote:
> Off the top of my head, I agree that there shouldn't be anything fixed in the newer code to address this.  The code does block all activity when it processes a HUP signal, but that's supposed to be "quick".  :-|
> 
> Each process (the main and children) rereads the config file and figures out if there's anything to do.  The main process is in charge of spawning new consoles (or reconfigured), and the children are responsible for letting go of old ones (or reconfigured).
> 
> With that in mind, how many consoles are each child managing?  The compile time default can be seen with a "conserver -V", but it can be overridden with -m.  I'm honestly not sure if having more or less would be better or even change things (more processes would use more cores, but also "slam" the system with that many things reading and processing the config).
> 
> Conserver tries very hard to multiplex across all the consoles, even when bringing up and tearing down things.  The reread of the config puts all that on hold, so it probably has to do with that.
> 
> One issue I've seen before is the magnitude of DNS lookups done when a config is loaded.  It all depends on the config, of course, but you could end up generating a lot of requests.  Maybe it doesn't apply in your environment, but it can be an unexpected source of trouble.
> 
> Aside from that, another server will certainly share the load (and, set up right, the end users won't even notice).  It would be interesting to look at an strace (assuming linux) of a process when it gets a HUP (even without any changes to configs).  Just send one of the children a HUP so it minimizes the impact.  With timestamps, it might highlight what is causing the issue (like the DNS query case, but could be anything).
> 
> Bryan 
> 
> > On Oct 18, 2016, at 6:04 PM, Denis Hainsworth via users <users@conserver.com> wrote:
> > 
> > Running v 8.1.18.  Rereading the SIGHUP section of the man page I'm
> > still thinking I've configured something wrong.  SIGHUP says conserver
> > rereads the config files and then adds/deletes consoles as needed and
> > only touches running consoles if they have changed.  If that's true I
> > wouldn't expect a 30s buffer of input/output on a console that hasn't
> > changed, should I?
> > I also don't see anything in CHANGES that sounds like this is a bug
> > that has been fixed.
> > 
> > -denis
> > 
> > On Fri, Oct 14, 2016 at 12:05:44PM -0400, Denis Hainsworth wrote:
> >> I love conserver.  I have  a minor issue and I was curious what options
> >> there might be.
> >> 
> >> So I have a conserver setup running against 262 servers (mostly digis or
> >> ser2net machines).  It works great.  However when we need to update due
> >> to a config change we run "kill -HUP" against the parent.  With the
> >> number of consoles (I think) this causes about a 30s "hang" when
> >> interacting with any console which corresponds to the reconfig time.
> >> 
> >> Does this make sense, and is it per the current design?  Any chance there is
> >> a clever way to make it block for less time?  Barring that I intend to
> >> spin up a new server to share the load of my current server and reduce
> >> the reconfig time.
> >> 
> >> I was mostly curious if there was a config issue or if this description
> >> doesn't make any sense to folks and it means I have something else going
> >> on like too many down consoles or something.
> >> -denis
> >> 
> >> -- 
> >> __________________________
> >> Denis Alan Hainsworth     
> >> denis.hainsworth@gmail.com
> > 
> > -- 
> > __________________________
> > Denis Alan Hainsworth     
> > denis.hainsworth@gmail.com
> > _______________________________________________
> > users mailing list
> > users@conserver.com
> > https://www.conserver.com/mailman/listinfo/users

-- 
__________________________
Denis Alan Hainsworth     
denis.hainsworth@gmail.com