London Tier 2 Technical Group. Meeting 2.
20 May 2004, 3:00pm to 5:00pm
Rm 539 Blackett Labs, Imperial College London.

Present:
Paul Kyberd (Brunel)
Owen Maroney (Imperial HEP)
Alex Martin (QMUL)
David McBride (Imperial LeSC)
Steven Newhouse (Imperial LeSC)
Keith Sephton (Imperial LeSC)
Ben Waugh (UCL)
Chris Williams (QMUL)

Apologies:
David Colling (Imperial HEP)
Sukhbir Johal (RHUL)
Barry MacEvoy (Imperial HEP)
Grigori Rybkine (RHUL)

2.1 Actions from last meeting

- 1.0 Due to email problems OM and PK had failed to circulate minutes from the last meeting. The following action list was taken from notes made by OM rather than from agreed minutes.

- 1.1 References to 3 SLA documents circulated by email and added to FAQ page. All sites encouraged to read the SLA documents and feed comments back to OM on the proposed metrics.

- 1.2 No documents have been found specifying the system requirements of experiments for sites (in terms of OS, libraries etc.). The Experiment Software Installation method (link to document is provided on FAQ page) was discussed:
    - Each experiment has designated Experiment Software Managers (ESMs).
    - The Site Administrators will provide a space for each supported experiment to install software (the means of doing this is on the TB-SUPPORT list and linked from the FAQ page).
    - It is the primary responsibility of the ESM to install, verify and maintain Exp Software in this space.
    - The ESM decides which sites in the LCG Corezone will actually be used for production/analysis.
    - The relationship between the ESM and the Site Admin will be highly variable, depending upon the site's local knowledge of the Exp Software needs.
  Actions:
    -- Is the measurement of the resources a site supplies to LCG based upon the availability of a site or upon the actual utilisation of a site by experiments? (OM)
    -- Can LT2 develop a clear statement as to which experiments we plan to support? (Each site to supply OM with a list of the experiments it wishes to see supported.)
    -- Who are the ESMs for supported experiments? (OM)

- 1.3 Grid Operations Centre (GOC) monitoring is principally based upon globus-job-run and edg-job-submit using the GridPP "green dot" technology. SN suggested that GOC be informed about the GITS monitoring used by the e-Science network. OM to contact GOC with information about this and see if they plan to implement this or an equivalent.

- 1.4 There exists a recipe for installing WN without using LCFG. However, it is still a binary RPM installation assuming RH7.3. From the list of RPMs it is unclear if it can be called "minimal", though. SN reported problems with documentation in CVS, including problems compiling the LaTeX documents themselves. It was not clear how errors in documentation should be fed back to developers. OM to investigate if a Bugzilla exists for this or if this should be via the same support mechanisms as technical problems. If appropriate OM to add Bugzilla URL to FAQ page.

- 1.5 All UK sites should regard their Regional Operations Centre (ROC), i.e. RAL, as their primary source of system support. The preferred method of contacting ROC support in the UK is via the TB-Support mailing list. Direct email contact with support staff at ROC should also cc OM.

- 1.6 Source RPMs. Requests have been made for source RPMs for installation at QMUL and LeSC. Initial response from LCG was not positive, with the suggestion that services could be compiled with source code from CVS. OM and Steve Traylen have supplied John Gordon with information as to why this does not seem possible and asked for the request to be raised at GDB. It was suggested that a possible way forward would be for QMUL and LeSC to offer themselves as 'early adopters' of porting LCG code to other OSs, feeding experiences back into LCG documentation.
  Action: SN to provide more details on problems already encountered in using CVS source code. OM to follow up on request for source RPMs.
  SN requested clarification of the management procedures in LCG for handling these requests, as it was not clear whether porting an existing release of LCG could be considered deployment/support or development.

- 1.7 VO monitoring. Pending more sites joining LCG.

- 1.8 Website. FAQ page exists. Some problems encountered in granting admin rights; individual site pages to follow resolution of this problem. OM to add more links to front page, especially to the FAQ. All members to provide the Distinguished Name (e.g. /C=UK/O=eScience/OU=Bristol/L=Physics/CN=owen maroney) of their grid certificate to OM. All sites to consider what can be added to website.

- 1.9 Mailing lists. LT2 mailing list updated. KS in process of setting up LT2-Technical mailing list. It was agreed that minutes would be circulated on the LT2-Technical mailing list and, when agreed, sent to the LT2-"comprehensive" mailing list.

- 1.10 iptables configuration for each service node (OM). No progress on this.

- 1.11 Private addresses and outbound WN access. AM has supplied OM with recipe. OM to test with view to using at Brunel with PK.

- 1.12 GridICE monitoring. No progress on this. OM to contact Dave Kant of GOC to see if they can provide an LT2 GridICE page.

2.2 Site Status

- Brunel
    Status: LCFG server installed and WN and CE installed on test node. SE in process of installation. Some services on WN and CE not working. Problems also encountered with installation on nodes with SCSI discs (OM to ask Steve Traylen to provide SCSI disc installation kernel).
    By next meeting: Installed working CE, SE, WN on external network.

- Imperial
    -- HEP. BM not present to report. OM said site was still in LCG, had a green dot and had been in use ("infinite" queue usually full). AM to respond to BM's request for more information on using Maui with PBS.
    -- LeSC. Attempted manual installation using RPMs following LCG-2 manual installation notes.
       However, had problems with the workload management software due to its need for two different compiler versions. The LeSC OS is a locally modified RH7.2 while the LCG-2 notes are for installation on CERN RH7.3. Then tried to build from source code from CVS, but the LaTeX documentation on the LCG autobuild environment could not be built. OM offered to look further into the CVS source code but SN requested that effort be concentrated on source RPMs as this would be a more reliable method.
       By next meeting: SN unable to say until situation with source RPMs resolved.

- QMUL
    Status: CE, SE, WN operational. GOC had been reporting problems with MDS on CE so no 'green dot'. CW had not found problems but OM had reproduced the GOC failure. OM to check again and notify CW if still having problems. In process of setting up Torque queue on test nodes running Fedora. CW wishes CE to send jobs to both the Torque queue and the LCFG-installed WN queue. This is not possible using existing LCG-2 middleware, so modification of LCG-2 scripts is necessary. After this is solved will attempt to port LCG-2 WN software to Fedora. Main farm WN on private network but have already implemented NAT using iptables. Added to LeSC request for source RPMs for porting LCG to Fedora or, failing that, more information on the autobuild environment for building from CVS code. RPM build of Atlas DC2 software for Fedora taking place using local expertise. There remains an issue of how software for other experiments could be deployed.
    By next meeting: Resolved PBS queue issue and, if source RPMs made available, to have begun porting to Fedora.

- RHUL
    No-one present. OM to request status report from them by email.

- UCL
    -- HEP. Status: CE-SE-WN in testzone. BW working on recipe, supplied by OM and Steve Traylen, for integrating existing PBS farm, but this requires changing the way the home directories are shared amongst the PBS farm.
       By next meeting: Existing 10 node PBS farm operational in LCG.
    -- e-Science cluster.
       No-one present but BW believed they had an LCFG server installed. BW to email them requesting that they notify OM of current status and plans.

2.3 Website. See 1.8 above.

2.4 Matters arising from LCG. See 1.2 above.

2.5 Matters arising from GridPP. DC not present so no report on frontend nodes or support posts. OM circulated proposal from Rhys Newman (Southern Tier 2 Co-ordinator) for LCG Site Admin training course. General agreement that if such a course was held sites would try to send someone, but that a week-long course would be impractical.

2.6 AOB. None.

Next meeting: week of 7-11 June. Suggestion: Wednesday 9th June 3:00pm?