GridPP PMB Minutes 379 (01.03.10)
=================================

Present: David Britton (Chair), Steve Lloyd, Tony Cass, Glenn Patrick, John Gordon, Sarah Pearce, Andrew Sansum, Tony Doyle, Jeremy Coles, Dave Colling, Robin Middleton, Pete Clarke (Suzanne Scott, Minutes)

Apologies: David Kelsey, Roger Jones, Neil Geddes

1. GridPP4 Proposal: next steps
================================

DB reported that the GridPP4 proposal had been submitted last week, as planned, by 5.00 pm on 24th February. DB advised that next steps included the PPRP meeting on 15th April in Glasgow, as a consequence of which, a number of people will miss the 2nd day of the RHUL meeting. Between now and then, DB noted that we had to:

a) do some financial planning - led by SP;
b) understand the CERN hardware paper - the CPU was a good costing match but the disk was not easy to understand - AS to deal with this;
c) more depth required to the Work Breakdown Schedule (WBS) - SP to do;
d) the area of EGI/NGI/NGS needed to be addressed - our arguments need to be refined and strategy agreed - some background work was needed on this.

In relation to Action 367.2, DB advised RM that now would be a good point to progress this over the next week in order that a strategy can be developed.

ACTIONS:

379.1 Re GridPP4 proposal and forthcoming PPRP meeting: SP to begin work on 'background' financial planning.

379.2 Re GridPP4 proposal and forthcoming PPRP meeting: AS to look at the CERN hardware paper and work on the CPU and disk costings.

379.3 Re GridPP4 proposal and forthcoming PPRP meeting: SP to add more detailed information to the WBS.

379.4 Re GridPP4 proposal and forthcoming PPRP meeting: RM to progress the EGI/NGI/NGS model for next week's PMB (in relation to Actions 367.2 & 375.9).

DB advised that his paper on NGI within the GridPP4 Proposal needed to be expanded with new information, and circulated. An outline on effort was required.
RM/SP would assimilate the information and circulate a new updated paper before next week's PMB. This would be a transition document addressing the possibility that:

1. There would be no NGI;
2. There would be no future funding for NGS.

ACTION 379.5 RM/SP to assimilate the information in DB's paper on NGI within the GridPP4 Proposal, and circulate a new updated paper before next week's PMB. This would be a transition document addressing the possibility that: 1. There would be no NGI; 2. There would be no future funding for NGS.

DB would prepare a talk for the PPRP covering all these subjects. TD advised that Robin Tasker would be giving a talk on networking at an upcoming JANET meeting. PC advised that next week there was a JANET OPN meeting, on Monday or Tuesday. TD noted that if the OC documents could be released as public, then RT could use them. This was agreed.

ACTION 379.6 SL to ensure that the OC documents are made publicly available.

In addition, DB advised that there would be a site visit at RAL around 12-14 May, which was three weeks following the PPRP meeting.

There were no other issues for discussion on the GridPP4 proposal.

2. GridPP24 Agenda
===================

DB had circulated a draft Agenda for GridPP24 at RHUL. DB noted that the meeting theme was 'Data'. SP asked about the inclusion of KE? DB advised that first data should be flowing by April, the proposal was that Day 1 would incorporate (during 2nd session) talks from the Experiments on data-taking. The first session would focus on the GridPP4 proposal. SP would summarise the current status of GridPP3.

DB advised that there was room for a headline talk focussed on LHC. Last time we had Roger Bailey - were there any suggestions for an external speaker? SL could give a talk? TD thought that someone from CERN would be relevant. DC suggested asking Lyn Evans? TD commented that it wasn't about the machine so much as about the response to CERN computing. Would TC be willing to do this?
DB noted that the space available straddles steady data-taking/long-term plans/issues/headlines ... it was about the big picture. TC would think about it. DB considered that the big picture from someone at the Tier-0 would be useful, giving scene-setting.

On the first day in the afternoon, there would be a CASTOR discussion for the longer-term. Other Tier-1s were dropping CASTOR, and there were upgrade issues to discuss. What planning is required to address issues historically seen, etc? The issue of CASTOR needed to be raised as it is the highest risk in GridPP4. DB asked for opinions from the PMB. AS agreed that it was sensible to discuss this. DB asked if Matt Viljoen could speak? AS noted yes.

On Day Two, it was noted that those attending the PPRP in Glasgow were those who usually attend the OC: DB, SL, TD, SP, JG. DB noted there would be the normal site reports on Day Two, from the Tier-2s. DC would chair the first session. DB would ask the Tier-2s who they wanted to send as speakers. There would then follow a report from the Storage Workshop (JJ or Wahid Bhimji). JC would chair the session after coffee. It was agreed that WB, Sam Skipsey, Brian, and JJ should be on the panel.

For the closing session, RJ could chair if possible. This would cover Tier-1 status (AS or deputy). Something about EGEE/EGI landscape including NGI was required (not JG), probably RM or NG, or Andy Richards could deal with this. The slot afterwards could possibly be taken by Will Venters, talking about Cloud Computing, or Tier-2 questionnaire/incidents, or KE/EI. JG also noted that sites and shared resources with Universities from a tech point of view, would also be a good subject to discuss. DB agreed this would be good, but possibly better discussed at Ambleside.

For the DB on Friday, DB and TD did not intend to fly back to RHUL, but SL would. DB would ask SL for the Deployment Board Agenda. DB would check on sponsorship arrangements.

3. Week's Notes
================

a) Re the GridPP3-4 bridging posts - SP had sent an email to Tony Medland, but had received no reply as yet.

b) Tier-2 JeS Forms - first grants had now been awarded; new JeS forms were required, but the funding was assuming a flat spend profile, which was not suitable, and STFC didn't want to change that.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

Fabric:

1) FY09 procurements:
   - All disk and CPU from one supplier has been delivered.
   - 40 of 60 disk servers have been delivered by the second supplier, the remaining disk servers and all of the CPU will be delivered this week.

2) UPS supplier. Building Projects Group have reviewed the change made (additional cable) to the UPS supply. There was a 50% improvement in the 3 kHz noise. However they have concluded that adding additional cable is unlikely to meet our requirements for the UPS. They are working on alternative solutions.

3) The humidification system is now installed in R89.

Service:

1) SAM test availability for the ops VO was 100%.

2) We experienced load-related problems on Monday and Tuesday on an ATLAS software NFS server. A number of interventions were made to try and address the problem and eventually we had to stop ATLAS jobs and start with a different job mix. We gradually re-opened the ATLAS job cap limit through Wednesday and Thursday. Although the problem ceased to manifest we are unclear whether the cause was a pathological set of ATLAS jobs or a problem in our capacity/configuration.

SI-2 ATLAS weekly review & plans
---------------------------------

RJ was absent.

SI-3 CMS weekly review & plans
-------------------------------

DC reported that things were on hold, waiting at present.

SI-4 LHCb weekly review & plans
--------------------------------

GP reported a couple of disk server problems last week at the Tier-1. There had been an incompatibility between root and dCache which was now resolved.
They were waiting on data.

SI-5 Production Manager's Report
---------------------------------

JC reported as follows:

1) The DNS for gridpp.ac.uk was down for much of the weekend with the result that many sites have failed all SAM tests for this period and the GridPP website has been unavailable. An investigation is underway as to why the secondary DNS servers have not worked - a post-mortem report will be available later this week. The problem was known about over the weekend but could not be solved remotely and there is no 'on-call' at Manchester. Sites running their own top-level BDII have been less affected. As yet there have been no user complaints but anyone using lcg-utils is likely to have been affected as are VOs running on the GridPP VOMS (no new proxy creation during the period). Sites want to know the implication for their monthly availability/reliability figures.

2) Following the APEL outage many GridPP sites have found a need to republish data for parts of January/February. Checking today it is evident that several sites still need to republish and others need to resolve issues to start publishing again.

3) UKQCD have raised a question about merging VO lists – they have some overlap with and desire to share some data with the international ILDG VO. One way forward may be to have a single VO and make use of groups/roles. Will this affect the VO shares? (A question to Glenn / the UB). It was reported that Wahid Bhimji was looking into this. PC reported that these were two separate VOs and they would want to share data, but not want to be considered as a single VO. JC would pursue this issue.

ACTION 379.7 JC to follow-up the issue of merging VO lists and ILDG VO.

4) Regional portal testing (using regional Nagios in place of SAM) last week concluded that there were no showstoppers and deployment into production of the new dashboards will proceed from today. Bugs/issues identified by the UKI ROD team have been addressed.
SI-6 LCG Management Board Report
---------------------------------

There was no report.

SI-7 Dissemination Report
--------------------------

SP reported that info on the CERN event was awaited.

AOB
===

JG noted three issues:

1. The Tier-1 was not correctly reporting use of biomed. Biomed had a VO called 'bio', but there was also a separate VO called 'bio' - and incorrect reporting was affecting the percentage level of use. JG was flagging this as an issue.

2. JG asked about the definition of seed resources in the work breakdown - Camont had used 1.1%, were there any other VOs we can add to the seed money total? DB suggested phenogrid? No, they were not applicable here. However, NanoCMOS and Lumerical would certainly be included. JG was advised to contact Dug McNab at Glasgow for exact details.

3. JG asked about the Chair for NGS - suggestions please. DB noted that GridPP needed to keep a correct distance in relation to this.

REVIEW OF ACTIONS
=================

354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There were a few suggestions in the deployment team about how to progress in this area. It needs writing up and an implementation plan. Ongoing.

366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015. AS noted this depended on money costs. DB advised this related to long-term plans and power capacity. Physical footprint space? Alternatives? Early action on AS required. AS had sent tech questions round the team and would forward inputs when available. AS noted that alternative further costings were required. AS to progress. Ongoing.

367.2 RM to fill-in the grey boxes on DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed. RM reported that there wasn't enough information available at present to carry out this action, but he had met with Andy Richards.
Ongoing.

375.9 RM to provide a skeleton outline plan, including post details, of GridPP/NGS convergence. RM reported that a draft plan would be available soon. Ongoing.

377.1 Re DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed, RM would speak to Andy Richards and provide a draft to SL by Wednesday for the GridPP4 proposal. This did not now appear to be required for the proposal. Done, item closed.

378.1 SP to add a GridPP Operations Plan within Project Milestones, noted for September in preparation for the OC meeting. Done, item closed.

378.2 DB to resolve the PPRP representation at the meeting in Glasgow - probably not all PMB members would be required, and some should stay at RHUL as GridPP24 was running simultaneously. Done, item closed.

ACTIONS AS AT 01.03.10
======================

354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There were a few suggestions in the deployment team about how to progress in this area. It needs writing up and an implementation plan. JC to progress.

366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015. DB advised this related to long-term plans and power capacity. Physical footprint space? Alternatives? AS had sent tech questions round the team and would forward inputs when available. AS noted that alternative further costings were required. AS to progress.

367.2 RM to fill-in the grey boxes on DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed. RM reported that there wasn't enough information available at present to carry out this action, but he had met with Andy Richards. RM/SP to circulate a document.

375.9 RM to provide a skeleton outline plan, including post details, of GridPP/NGS convergence. RM reported that a draft plan would be available soon.
RM/SP to circulate a document.

379.1 Re GridPP4 proposal and forthcoming PPRP meeting: SP to begin work on 'background' financial planning.

379.2 Re GridPP4 proposal and forthcoming PPRP meeting: AS to look at the CERN hardware paper and work on the CPU and disk costings.

379.3 Re GridPP4 proposal and forthcoming PPRP meeting: SP to add more detailed information to the WBS.

379.4 Re GridPP4 proposal and forthcoming PPRP meeting: RM to progress the EGI/NGI/NGS model for next week's PMB (in relation to Actions 367.2 & 375.9).

379.5 RM/SP to assimilate the information in DB's paper on NGI within the GridPP4 Proposal, and circulate a new updated paper before next week's PMB. This would be a transition document addressing the possibility that: 1. There would be no NGI; 2. There would be no future funding for NGS.

379.6 SL to ensure that the OC documents are made publicly available [done following the meeting].

379.7 JC to follow-up the issue of merging VO lists and ILDG VO.

INACTIVE CATEGORY
=================

359.4 JC to follow up dTeam actions from the DB, as follows:

---------------------------
05.02 dTeam to try and sort out CPU shares and priority resources, at Glasgow first (perhaps by raising the job priority in Panda).
---------------------------

JC would check the situation with Graeme Stewart (who was currently on annual leave). JC followed up with Graeme and the other experiments. A test was started but this area has been deemed low priority and further progress is not expected for some time. ATLAS see no issues with contention. LHCb are not intending to pursue anything in this area. A CMS discussion has started but again it does not appear to be urgent. If the experiments are not pushing this internally then there is nothing for the deployment team to follow up! It was noted there was no priority in ATLAS at present, this will be pending for a while. Move to inactive as it is a long-term action.
---------------------

The next PMB would take place on Monday 8th March at 12:55 pm.