Minutes of LCG GD UK/I Phone Conference, 4th August 2004

Present

Birmingham: P Watkinson
Cambridge: F Brochu
Dublin: K Rothford
London Tier 2: O Maroney
NorthGrid: A Forti
RAL: S Traylen, D Kant, C Brew, J Coles, B Saunders
RHUL: S George
ScotGrid: S Thorn
SouthGrid: R Newman

1. Site Reports

ScotGrid
========

S Thorn reporting on behalf of F Speirs. Glasgow and Durham have been in a slow phase due to holiday time and other commitments on manpower, so are essentially at the same position as last meeting. Edinburgh moved their ScotGrid hardware to a new location on the 2nd August. There is still a network access problem but this should be fixed early next week.

London
======

Imperial HEP farm at LCG-2_1_1, running without problems. Imperial LeSC farm deployment delayed by holidays. There is also a dependency on Durham for the Sun Grid Engine information provider.

SG reported on RHUL. Site still at LCG-2_0_0. There had been problems with both Atlas and LHCb software installation. An Atlas file appears to be corrupted when transferred to the RHUL SE. Standard tests revealed no problems with replication at the site. Further testing is to be done with large files. The LHCb software manager had had problems with job submission (failure to copy the input sandbox from the RB). OM had been unable to replicate the problem and will now attempt to run tests from the LHCb UI using the LHCb RB.

UCL-HEP farm at LCG-2_1_1. Some problems with upgrade, but site now running without problems. UCL-CCC still at LCG-2_0_0 as they were waiting for a lull in job submission to upgrade. Unfortunately yesterday there was a major power failure and the site is currently down.

Brunel's test site appears to work but gsiftp is being blocked by the firewall to the CE and SE.

QMUL upgrading to LCG-2_1_1. CE and SE have been upgraded, but the WN require a manual installation on Fedora so this is taking longer.
The site has not yet been used by Atlas, although the supplied software has been tested successfully, because a problem with the MDS information (SI00) was reported. This will be fixed.

SouthGrid
=========

Birmingham have upgraded to LCG-2_1_1 and are running well.

Oxford has just joined the testzone at LCG-2_1_1, although there was immediately a complaint about the wrong SI00 information being published (now fixed). The site has now had LHCb software installed and is flooded with jobs.

Cambridge are running LCG-2_1_1 and were in the corezone. They have had problems with the wrong memory information being reported by MDS, and have also recently, for no apparent reason, disappeared from the corezone BDII into the testzone BDII.

CB reported on RAL-PPD. Upgrade to LCG-2_1_1 completed and now flooded with LHCb jobs. 24 new nodes were to be added but there have been problems with the power supply so they cannot be added yet. CB had posted a general procedure to LCG-ROLLOUT for taking a site off-line for maintenance and upgrades and asked for comments. ST said it looked like "the most you could possibly do".

Bristol, Warwick no activity.

Swansea, Sussex made an appearance in the agenda but no-one had any information about them.

NorthGrid
=========

Manchester upgraded to LCG-2_1_1, now has many Atlas and LHCb jobs arriving.

Liverpool has had no changes since last meeting.

Lancaster upgraded to LCG-2_1_1 yesterday.

Ireland
=======

ST read a report from email. A transactional rollout of LCG-2_1_1 to all Ireland sites is planned (they are currently running a version of EDG 2).

Tier 1/A
========

160 nodes added and were immediately filled. A substantial queue configuration rearrangement has been worked out, assigning one queue per VO. This ensures that the resources are fairly shared between Atlas and LHCb (previously flooded by LHCb). There was much interest in how to do this, but it is not an LCFG configuration option.
A number of files must be hacked to prevent LCFG from overwriting the modifications.

2. DC reports

FB said BDII used by LHCb had limitations so only UCL-CCC, UCL-HEP, Manchester and RAL were being used by LHCb currently. They were also having problems with file replication at Manchester (they are not using edg-replica-manager but instead an Atlas-specific replication tool). Nevertheless, UK grid sites were providing 18% of the production (second only to Spain). ST suggested RAL could provide additional BDII or RB for Atlas to be able to use more UK sites.

3. Status of LCG releases.

LCG-2_1_1 was released 2 weeks ago. ST reported a new release LCG-2_2_0 is in preparation, but is apparently still undergoing testing. This should (finally) include RGMA. OM suggested that more warning be given about new releases, as production systems would need time to upgrade safely. However, it was pointed out that sites were not being required to upgrade immediately upon a new release.

4. Network requirements

SG had received some information since the previous meeting, including information on the Atlas Data Challenge mailing list. While not complete, it had been sufficient for SG's needs. It was noted that it was difficult to measure this as a site needed to be running in a stable condition first. There will also be half a GridPP support post in September to work on this.

5. Head Nodes.

BS reported the order had been made with delivery expected 4-6 weeks from today.

6. Metrics.

JC had circulated a document giving an initial list of metrics to start a discussion. A task force set up to decide this will meet next week. Suggestions and comments are requested.

7. AOB.

A request from John Gordon was passed on that sites join the UK Ganglia monitoring.

AF suggested adding to the FAQ a page on the upgrade to LCG-2_1_1.

CB suggested adding to the FAQ a page on 'how to drain a site' as an expansion of the 'how to stop your PBS queues' FAQ.