Minutes of session 6.

"Managing Large Facilities in the LHC era" (What works? What doesn't? What won't?)

Chair - John Gordon.
Panel Members: Mona Agerwaal, Catalin Condurache, Alessandra Forti, Pete Gronbech, Lawrence Lowe, Colin Morey, Fraser Speirs, Stephen Childs (standing in for John Walsh).
[recorded by: Jamie Ferguson]

OPENING REMARKS BY PANEL MEMBERS
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Mona - Site administrator at IC. Worked on SC3. Need to focus on security. Need more tests like SC3.
Catalin - Involved at the Tier-1. Also involved in support and has experience of kickstart. Installed the first VO box.
Alessandra - Northgrid coordinator. Has set up a 1000-node farm. Going for kickstart.
Pete - Southgrid coordinator. Moved from LCFG. From a central management node all nodes can be coordinated.
Lawrence - Birmingham run tripwire. Keeping RPMs up to date, issues with yum, etc. - all this has to be emailed back to the sys admin.
Fraser - Scotgrid coordinator. Use xcat - IBM offering vendor support.
Colin - Sysadmin at Manchester. Need tools such as yum if managing many machines (more than approx. 7).
Stephen - Grid Ireland. Need a centralised system. Possibly moving to quattor - gained experience on this - better than LCFG. Still on 2.4.0, hopefully going to 2.6.0.

DISCUSSION
^^^^^^^^^^

Pete Gronbech - The package manager indicates which RPMs need to be installed/updated.
John Walsh - asked panel about using cfs engine.
Alessandra Forti - if altering software on one machine, just ssh in.
Pete Gronbech - recommend using yum. All nodes should update in coordination.
John Walsh - asked panel about a central repository for software.
Colin Morey - problem with nodes going down.
Pete Gronbech - use mirrors from a central node.
Steve Traylen - yum now called "packety".
A discussion then suggested that LCFG/quattor tools were causing problems and that the package manager in quattor provides extra functionality to yum.
Owen Synge - asked panel how much interaction there is with the site's computing services during upgrades.
Alessandra Forti - At Manchester we manage most of the farm. It depends how free the computing center lets you be. Good relations with the computing center staff are important.
Pete Gronbech - Many sites suffer with firewall issues.
Stephen Childs - try upgrades on a testbed first.
Fraser Speirs - At Glasgow the software tool "nagios" informs the sysadmin when nodes are down.
Colin Morey - nagios can also check whether switches have gone down, which makes it more efficient.
Stephen Childs - nagios is quite a flexible package.
Graham Stewart - What seems to work is simple and flexible tools. A GridPP best practice document would be good, e.g. "ten best software tools".
Alessandra Forti - What works for one site may not work for another. It should be down to the local site to decide.
John Walsh - asked panel about manpower at sites.
Alessandra Forti - 24/7 support could be maintained with help from their computing center.
Chair - The type of uninterruptible power supply (UPS) is a consideration for sites.
Alessandra - the bdii would be the worst node to go down, as it makes the entire site disappear.
Pete Gronbech - saving money on cheap hardware is a false economy.
Chair - buying really expensive hardware is not an option either.
Peter Watkins - asked panel about the decision on where to locate clusters: computing center or elsewhere.
Alessandra Forti - the computing center was better, as there was hassle in the physics dept. In a bid, one should think about sundry costs (electricity bills, etc.). We don't pay for electricity and in return we share part of our cluster.
John Walsh - asked the panel about hibernation techniques to save power.
Chair - machines could be shut down at the weekend.
Tony Doyle - AMD are looking into more efficient chips, so vendors are aware of this issue.
Colin Morey - A site may need a new power supply, which is expensive.
Chair - a long time is sometimes needed to prepare for a large increase in hardware.

MAIN POINTS
^^^^^^^^^^^

Sys admins seem happy with their package managers.
We should share common knowledge (about software tools) more.
There are extra costs (over and above the price of the hardware) involved in having large clusters.