GridPP PMB Minutes 357 (02.09.09)
=================================
Present: David Britton (Chair), Roger Jones, Sarah Pearce, John Gordon, Andrew Sansum, Glenn Patrick, David Kelsey, Andy Richards, Robin Middleton, Tony Cass, Dave Colling

Apologies: Tony Doyle, Steve Lloyd, Jeremy Coles, Pete Clarke, Neil Geddes

1. Status of the OC Documents
==============================

PMB 138 - Project Status
------------------------
DB advised that he had received no feedback on the Project Status report, and asked that all members read the main section and provide comments. DB had also written the Introduction and Overview.

DB noted that the section on LCG status from TC was fine. For the EGI part, DB had sent comments to RM. RM noted that he had suggested new wording for one section in relation to exchange rates; he would check the other part and provide wording. DB advised that the Tier-1 status section looked fine. DB advised that submission was usually one week in advance, at the F2F meeting; however, he needed comments before Friday.

Re Deployment Status, DB had received a second version from JC. There was a table of numbers, and issues remained outstanding regarding the old MoU: the Tier-1 numbers were incorrect. The reliability/availability graphs were not normalised to CPU capacity, so small sites were having an adverse effect on the plots.

DB noted that the major user reports were OK, although they differed in style and focus; the experiment reps needed to decide whether they wanted to modify them. RJ, DC and GP to read the three documents and let DB have any changes if required. DB noted a question mark in the ATLAS section to which RJ needed to respond. The User Board report was fine.

The Dissemination/KE & EI section did not have quite the right focus: it covered all of GridPP3 rather than the last six months. DB advised that we would need to provide another report in due course, so we could not duplicate information.
SP advised that Neasan O'Neill would go through the report and remove anything that fell within the earlier period. DB noted that slightly more formal language would also be required for the OC.

ACTION 357.1 ALL: to read the Project Status report and send amendments to DB by Friday.

PMB 139 - Project Map
---------------------
SP advised that she had circulated a version of this a week ago; no comments had been received. DB noted that PC was currently doing an expanded OPN report, and the wording from its intro could be used by SP. DB commented that there was no high-level summary or conclusion in the Project Map report; a steer to the reader was required at the beginning and end.

PMB 140 - Resource Report
-------------------------
SP asked whether the Resource Report should be amalgamated with the Project Map. DB noted that they were more effective as separate documents, as they were often reused for different audiences. SP noted that comments on the Resource Report were welcome.

RM pointed out that, in relation to travel, the figures looked like an underspend but were not (cf. the budget vs the report). RM advised that the difference was largely due to the EGEE claim, which was lower than in other years because we shared the experiment and middleware people at the end of GridPP2, plus, in addition, a Collaboration Meeting. RM noted that the plot included EGEE figures. DB asked that this be resolved offline, and that a footnote be added to explain the £27k rebate.

PMB 141 - OPN Report
--------------------
Regarding this document, DB had spoken to PC, who was going to incorporate the previously presented information. He would then pass this to DB for an introduction to be added. The document did not require a large input from the PMB.

PMB 142 - CASTOR
----------------
The CASTOR document had been provided by AS. DB noted that there were a lot of references to 2.1.8, but these should be reviewed in light of the CASTOR email discussion.
DB asked that AS add in the note on the consensus previously reached on this topic (in April 2009); see DB's email summary. AS would revisit the document.

PMB 143 - EGI/NGI
-----------------
JG reported that he had not progressed this document further; he simply needed to finalise it. DB asked about SSCs. JG noted that he had a 2-page summary in relation to names & titles. It was noted that SSC and Ganga effort had not yet been resolved, but we should flag that we are looking to be a partner. JG to circulate the document tomorrow.

DB asked if there was any other business in relation to the OC Documents. There was none.

2. HEPSPEC06 and Tier-2 Accounting
===================================
This item was deferred to the OC as neither JC nor SL was present. JC was required in relation to benchmarking info and the processing of old accounting data; SL was required in relation to how and when this is translated into the Tier-2 hardware allocations.

3. CASTOR 2.1.8
================
DB reported that he had circulated the current position statement, and this issue would be discussed further at the F2F. Statement reproduced below:

---------------------------------------------------
After discussions with the Experiments and assurances of support from CERN, the UK made the decision in April 2009 to remain with CASTOR 2.1.7 for first data. There were a number of issues that informed this decision:

1) ATLAS, CMS and LHCb did not feel at that time the need to upgrade versions.
2) CERN made a sufficient commitment to support 2.1.7.
3) STEP09 and the move of the Tier-1 to R89 made it more difficult to schedule a major upgrade with sufficient contingency, with first data then scheduled for September.
4) Previous experience with CASTOR upgrades in 2008 had dictated that extreme caution was required.

The issue of CASTOR has been (and still is) under scrutiny by the GridPP Oversight Committee and the funding agency in the UK.
Subsequent to this decision, the STEP09 exercise validated the RAL Tier-1 as ready for data. CASTOR performed at, or in some cases well beyond, the levels anticipated for the first data run. In this light, it may be understood that a complete upgrade of RAL to CASTOR 2.1.8 at this point is not defensible.

However, we also understand the concerns expressed by CERN that (a) support for 2.1.7 becomes increasingly hard with no instance at CERN and (b) the task of upgrading becomes increasingly difficult the longer we wait.

The middle ground that seems most attractive to us is for the UK to receive a strong request from one of the LHC experiment reps in the UK (presumably this could be LHCb, based on the comments at the MB, or possibly ALICE) to upgrade to 2.1.8. Provided that such an experiment is willing to accept fully the risks inherent in upgrading, we would be prepared to schedule such an upgrade of a single experiment-specific CASTOR instance at RAL to 2.1.8. The UK would be happy to run a hybrid system and gain the confidence needed in 2.1.8 before upgrading the other experiments. We believe this would be the best way forward, but reiterate that the timing and the adoption of the associated risks must be fully accepted by the experiment in question.
----------------------------------------------------

AS noted that we would need to be careful about committing to an upgrade: it involved major preparatory work, and we would need to assess our ability to carry this out prior to data-taking. DB agreed, noting that we would need a strong 'ask' from LHCb for us to upgrade, and that LHCb would need to be willing to take full responsibility for this measure. GP noted that they needed a realistic assessment and estimate from the Tier-1 in relation to both manpower and timing. LHCb might want to do this, but it was contingent on other things. DB advised that we would need an assurance that the risk was low.
This issue would be put on the Agenda for the F2F meeting at Cambridge.

4. Week's Notes
================
DB reported that the SSC draft had been submitted on Monday. DB had circulated a one-page proforma relating to input on partners. DB asked that RJ, NG and GP each complete one of three one-page proformas, one per experiment. Inputs were required by Friday.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------
AS reported on some main issues:
- the chiller units had been switched off, so there was no risk of water at present; the situation was under control
- there was no update on the aircon issue
- re disk server acceptance, it was early days, but they may have a disk drive problem, which would hopefully be resolved fairly quickly
- re swine flu, there was nothing to add at present
- there had been a major outage at the end of last week in order to address a security vulnerability, and this had affected the worker nodes
- procurement was going well: the disk pre-qualification stage had been turned around and the invitation to tender was out

DB noted that the 2009 MoU commitments were due at the end of September. DB also noted that the transfer of £300k from the 2008 to the 2009 budget needed to be explained. DK noted they had flagged that this was not an underspend but rather a delay; SP needed to make this clear in the Resource Report.

SI-2 ATLAS weekly review & plans
---------------------------------
RJ noted only the question about the CASTOR version change; no other major news at present.

SI-3 CMS weekly review & plans
-------------------------------
DC noted he had circulated the recent plots. No other news.

SI-4 LHCb weekly review & plans
--------------------------------
GP reported there had been smooth Monte Carlo production; no real UK issues. The disk servers were down at present, and the CASTOR issue was being discussed.

SI-5 Production Manager's report
---------------------------------
JC was absent.
SI-6 LCG Management Board report
---------------------------------
DB noted that the main issue was CASTOR.

SI-7 Dissemination report
--------------------------
SP noted that Neasan O'Neill was looking for helpers to man the stand at EGEE'09.

REVIEW OF ACTIONS
=================
348.2 JC to investigate whether the decrease in the job success rate metric in the last quarter is due to time-outs at busy sites or to job aborts caused by incorrectly set-up environments. This was still in progress; DB noted that the next Quarterly Reports will help and possibly render the action redundant. SP asked that this remain open until the next Quarterly Reports.
350.5 JC to check and verify that the contact list on the GOCDB is up-to-date - to be done by September.
354.1 JC to get more info on e-NMR status and report back; JC to also raise the issue of GridPP support for them at dTeam.
354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring.
354.4 DB to co-ordinate the 16-20 page Project Status report for the OC and ensure it is submitted on time.
354.5 DB to write a 1-page Introduction for the OC Project Status Report.
354.6 TC to write a 1-page report on LCG Status for the OC Project Status Report.
354.7 RM, with input from NG, to write a 1-2 page report on EGEE/EGI Status for the OC Project Status Report.
354.9 JC, with input from SL, to write a 2-4 page report on UK Deployment, for the OC Project Status Report.
354.10 TD to write a 1-page Technical Director's report, for the OC Project Status Report.
354.11 RJ to provide a 2-page User Report, to include relevant figures, on behalf of ATLAS, for the OC Project Status Report.
354.12 DC to provide a 2-page User Report, to include relevant figures, on behalf of CMS, for the OC Project Status Report.
354.13 GP (& RN) to provide a 2-page User Report, to include relevant figures, on behalf of LHCb, for the OC Project Status Report.
354.14 GP to provide a 1-page User Board Report, for the OC Project Status Report.
354.15 SP to provide a 1-2 page summary report on EI/KT & Dissemination, for the OC Project Status Report.
354.16 SP to provide the ProjectMap Report, to include the Risk Register, for the OC Project Status Report.
354.17 SP to provide the Resource Report, for the OC Project Status Report.
354.20 JG to provide a report on EGI/NGI/NGS and future scenarios (point 35 and Action from last OC meeting), for the OC Project Status Report.
355.4 JG to do a draft Agenda for the e-science review visit.
356.1 JG to deal with EGI issues for the EGI section of the OC document.
356.2 RJ to provide DB with targets/rates context for STEP'09 and draft distribution rates; RJ to provide text on figures meeting the requirements for Tier-1 running; RJ to provide DB with info on Tier-2 numbers.
356.3 DB to discuss the issue of HEPSPEC06 benchmarking with SL and JC offline, and raise an appropriate action following discussion.
356.4 A new individual document on the case for the OPN back-up link to be prepared for the OC by DB and PC, addressing all issues required.

ACTIONS AS AT 02.09.09
======================
348.2 JC to investigate whether the decrease in the job success rate metric in the last quarter is due to time-outs at busy sites or to job aborts caused by incorrectly set-up environments. This was still in progress; DB noted that the next Quarterly Reports will help and possibly render the action redundant. SP asked that this remain open until the next Quarterly Reports.
350.5 JC to check and verify that the contact list on the GOCDB is up-to-date - to be done by September.
354.1 JC to get more info on e-NMR status and report back; JC to also raise the issue of GridPP support for them at dTeam.
354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring.
354.4 DB to co-ordinate the 16-20 page Project Status report for the OC and ensure it is submitted on time.
354.5 DB to write a 1-page Introduction for the OC Project Status Report.
354.6 TC to write a 1-page report on LCG Status for the OC Project Status Report.
354.7 RM, with input from NG, to write a 1-2 page report on EGEE/EGI Status for the OC Project Status Report.
354.9 JC, with input from SL, to write a 2-4 page report on UK Deployment, for the OC Project Status Report.
354.10 TD to write a 1-page Technical Director's report, for the OC Project Status Report.
354.11 RJ to provide a 2-page User Report, to include relevant figures, on behalf of ATLAS, for the OC Project Status Report.
354.12 DC to provide a 2-page User Report, to include relevant figures, on behalf of CMS, for the OC Project Status Report.
354.13 GP (& RN) to provide a 2-page User Report, to include relevant figures, on behalf of LHCb, for the OC Project Status Report.
354.14 GP to provide a 1-page User Board Report, for the OC Project Status Report.
354.15 SP to provide a 1-2 page summary report on EI/KT & Dissemination, for the OC Project Status Report.
354.16 SP to provide the ProjectMap Report, to include the Risk Register, for the OC Project Status Report.
354.17 SP to provide the Resource Report, for the OC Project Status Report.
354.20 JG to provide a report on EGI/NGI/NGS and future scenarios (point 35 and Action from last OC meeting), for the OC Project Status Report.
355.4 JG to do a draft Agenda for the e-science review visit.
356.1 JG to deal with EGI issues for the EGI section of the OC document.
356.2 RJ to provide DB with targets/rates context for STEP'09 and draft distribution rates; RJ to provide text on figures meeting the requirements for Tier-1 running; RJ to provide DB with info on Tier-2 numbers.
356.3 DB to discuss the issue of HEPSPEC06 benchmarking with SL and JC offline, and raise an appropriate action following discussion.
356.4 A new individual document on the case for the OPN back-up link to be prepared for the OC by DB and PC, addressing all issues required.
357.1 ALL: to read the Project Status report and send amendments to DB by Friday.

TIMELINE FOR OC DOCUMENTS:
Sep 7th: F2F at Cambridge and OC papers submitted.
Sep 15th: OC at MRC in London.

FURTHER PMB MEETING DATES:
Sep 7th - F2F at Cambridge