
Question about conserver scale-out ability

Guang Cheng Li liguangc@cn.ibm.com
Wed, 18 Nov 2009 07:29:58 GMT


Hi,

We use conserver to handle the consoles in our cluster. Everything worked perfectly until several months ago, when the cluster started growing larger and larger. The cluster currently has 2,000 nodes and will grow to 16,000 nodes in the near future, and we are already seeing problems at 2,000 nodes.

1. conserver starts responding slowly after it has been running for a while (maybe several days; I am not sure exactly). When it is slow, opening a node console probably takes more than 10 seconds, and occasionally the console cannot be opened at all. We have to restart conserver to fix the problem.

2. Restarting conserver takes a very long time: about 5 minutes to finish initialization with 2,000 nodes. During initialization, rcons gets a "Connection refused" error.
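
For illustration only, the "Connection refused" window in item 2 could be ridden out by polling the conserver port before opening a console. This is a minimal sketch, not part of conserver or xCAT: the function name wait_for_port is hypothetical, and port 782 is assumed as the conserver primary port (adjust it to whatever your daemon actually listens on).

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds.

    Retries every `interval` seconds and gives up (returns False)
    once `timeout` seconds have passed without a successful connect.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A refused or timed-out connection raises OSError.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Assumption: conserver runs locally on port 782.
    if wait_for_port("localhost", 782, timeout=300.0):
        print("conserver is accepting connections")
    else:
        print("conserver did not come up within the timeout")
```

A successful connect only shows that the daemon is listening again; it does not show that every console has finished initializing.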


We tried some scaling tuning for conserver, listed below, but it does not seem to help much. Could you give me some further guidance on tuning conserver for scale? Thank you.

1. Hierarchy: we set up several conserver hosts in the cluster and use the "master" keyword on the central management node to specify which conserver host each console goes to.

#xCAT BEGIN aixcn1 CONS
console aixcn1 {
type exec;
master aixsn1;
}
#xCAT END aixcn1 CONS

2. Consoles per daemon process: we changed the number of consoles each conserver process can handle to 64 by starting the conserver daemon with -m 64.
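
To make the two tuning steps above concrete, here is a rough sketch of the arithmetic behind them. It spreads consoles round-robin over several conserver hosts, prints a stanza in the same shape as the aixcn1 example, and estimates how many conserver processes each host would need for a given -m value. The helper names, the node names beyond aixcn1/aixsn1, and the count of 8 conserver hosts are assumptions for illustration, not our actual layout.

```python
import math
from collections import Counter


def assign_masters(nodes, masters):
    """Map each node to a conserver host, round-robin."""
    return {node: masters[i % len(masters)] for i, node in enumerate(nodes)}


def render_stanza(node, master):
    """Render one console stanza in the format shown in item 1."""
    return "\n".join([
        f"#xCAT BEGIN {node} CONS",
        f"console {node} {{",
        "type exec;",
        f"master {master};",
        "}",
        f"#xCAT END {node} CONS",
    ])


def processes_needed(consoles, per_process):
    """Processes required when each one manages at most `per_process` consoles."""
    return math.ceil(consoles / per_process)


if __name__ == "__main__":
    # Hypothetical layout: 2,000 nodes over 8 conserver hosts, -m 64.
    nodes = [f"aixcn{i}" for i in range(1, 2001)]
    masters = [f"aixsn{i}" for i in range(1, 9)]
    mapping = assign_masters(nodes, masters)

    print(render_stanza("aixcn1", mapping["aixcn1"]))

    per_host = Counter(mapping.values())
    for master in masters:
        count = per_host[master]
        print(master, count, "consoles,", processes_needed(count, 64), "processes")
```

With everything on a single host, 2,000 consoles at -m 64 works out to 32 processes, and 16,000 consoles to 250.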



Thanks,
-------------------------------------------------------------------------
Li,Guang Cheng (李光成)
IBM China Software Development Laboratory