
Re: what is normal conserver hang during reconfig

Bryan Stansell bryan@conserver.com
Sun, 23 Oct 2016 17:34:36 GMT


I'm glad you were able to find the source of "most" of your troubles.  I quote that because, yes, theoretically the code could be a lot nicer and not block while reconfiguring.  The reconfiguration code never got folded into the loop that handles I/O, but it could...and really should.  No one has ever called it out as a serious enough problem before.  :-)

I'll certainly put it on the list to look at...but it's not a "simple" change, that's for sure.

Bryan

> On Oct 22, 2016, at 11:42 PM, Denis Hainsworth via users <users@conserver.com> wrote:
> 
> Finally got time to look at things.  strace is perfect, thanks for
> suggesting that.
> 
> So running something like
> strace -t -o strace.out.2 -p 3198  
> and sending a SIGHUP to the parent process showed the issue.
> 
> So the way we've always set things up was to automatically generate one
> config file per console server from our equipment database.  This means
> there are 264 files that are #included into the main config file.
> The first 30s of "hang" is each process opening each file, reading it in,
> and closing it.  I'm wondering if we need to block I/O during this, or
> perhaps that could be done before we start blocking?
> Once that is done there is another 10s of hang while we do the DNS
> lookup for each console host, as you thought (open /etc/hosts, make a DNS
> query, resolve it).
> 
> I tried putting all the configs into one file but that didn't change
> anything.  So then I started wondering.  Our IT had long ago made the
> console servers VMs.  It's never seemed like an issue, but I compared
> some basic dd commands and found my problem server has terrible I/O
> throughput ... sigh.  To compare, one of my good servers has about
> 80Mbp/s read/write and the bad one has around 15Mbp/s read/write.
> 
> So I'm going to look into moving the VM or getting the disk perf up, which
> should solve most of my issues, but I also wonder if the conserver code
> could be re-organized without too much trouble to avoid blocking
> when the disk is slow?  It's possible what I'm asking is dumb,
> just throwing it out there.
> 
> -denis
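
For anyone wanting to repeat the measurement described in the quoted message, here is a minimal sketch of how a log written by "strace -t -o FILE -p PID" could be summarized.  The function name summarize_strace is hypothetical and not part of conserver or strace.  It assumes each log line starts with an HH:MM:SS timestamp, which is what -t produces when a single process is traced without -f.

```shell
# Hypothetical helper (not part of conserver or strace): summarize a log
# produced by `strace -t -o FILE -p PID`.
# Assumes each line looks like:  17:34:36 open("/etc/hosts", O_RDONLY) = 3
summarize_strace() {
    log="$1"
    # Count open()/openat() calls; a large count during a reconfig
    # suggests many #included files are being read one by one.
    opens=$(grep -c -E '^[0-9:]+ (open|openat)\(' "$log")
    # The first and last timestamps bracket the traced interval.
    first=$(head -n 1 "$log" | cut -d' ' -f1)
    last=$(tail -n 1 "$log" | cut -d' ' -f1)
    echo "opens=$opens first=$first last=$last"
}
```

Running "summarize_strace strace.out.2" after sending the SIGHUP prints one line with the number of opens and the first and last timestamps in the log.  For per-call durations rather than wall-clock stamps, strace's -T option (time spent in each syscall) or -c option (summary counts) may be more direct.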