GridPP PMB Minutes 358 F2F (07.09.09)
=====================================

Present: David Britton (Chair), Roger Jones, Sarah Pearce, John Gordon, Andrew Sansum, Glenn Patrick (remote), David Kelsey, Robin Middleton, Tony Cass, Dave Colling, Tony Doyle, Steve Lloyd, Jeremy Coles, Pete Clarke

Apologies: Trish Mullins, Neil Geddes

1. OC Papers - final sign-off
=============================

DB advised that the OC papers needed to be submitted tonight, so they had to be signed off today.

Project Status v9 had been sent round on Saturday. The outstanding issues were the Tier-1 percentages (which it was agreed to delete) and the CMS section. DC confirmed he would remove the question mark regarding Oxford being certified for CMS production. DB asked whether anything else outstanding needed to be changed or corrected. JC advised that he had sent a list of errata to DB.

Project Map - SP reported that this was complete, but she had received no comments. Typos had been corrected. SP would send the final version to DB today. SP noted one outstanding issue, the post at the Tier-1, which she needed to check with AS.

Resource Report - DB had added 30,000 CPU for 2010. He had spoken to Ian Bird regarding definitive experiment requirements. There were issues with the LHCb numbers: the new ones from Nick and Raja did not agree with the global ones from IB. Therefore the CPU should be put back to the 'normal' level.

OPN Document - PC had received no comments; it was noted that only the UK did not have a backup link. DB would change the wording. PC would send the document to DB.

CASTOR - AS reported that the document was almost complete. DB advised that the references to 5.1.8 should be changed.

EGI/NGI - JG had circulated a final version with extra comments. DB noted that issues in the last paragraph needed to be changed: who was the single body in the UK? STFC? There was discussion regarding the membership fee and JISC.
What would we tell the OC regarding STFC? JG was happy to remove the last sentence or, alternatively, to add information that steered a course between taking ownership and not; liaison with NGS was required. JG would re-draft the paragraph.

SP asked if there was anything else to add in relation to the OC documents. There was not. The OC meeting would take place next Tuesday, 15 September, at the Medical Research Council in London.

2. GridPP/NGS Convergence
=========================

SP noted that a working group had been set up at the last F2F meeting at UCL. What progress had there been, and what was the current status? It was reported that there was now site involvement in GridPP, and a wiki was available from the GridPP website. There were also affiliates, but only 2 partner sites.

DB asked whether anyone used NGS, what the practical status was, and whether anyone used GridPP. RJ noted that NGS users simply used us as a remote login. JC advised that the next step was to enable VOs coming in through NGS. There was a discussion regarding GridPP & NGS involvement, and VO communities using portals. JC noted that there had been a problem with Bristol, where they had closed the NGS VO due to non-use. DB observed that we have no users who cross the boundaries, and few have benefited from it. DB also noted that shared resources might make funding difficult.

Little progress was reported from the working group. JG advised that the issue was strategic for GridPP4, and we may not get funding. DB advised that over the next 5 years we need a high-quality, responsive service over which GridPP has substantive control; this was the essential premise, although the specifics were difficult. TD noted that the subtext was to maintain the high-level service we currently have, and people were converging towards this in a natural way at present. JC advised that convergence presented operational issues, and he was focusing on progress at various sites and on what next steps were required operationally.
Following that, he would focus on NGI activities and the task list, both for now and for the future. JC talked through the current status, using the same table of tasks as had been presented last time at UCL.

DB cited issues about funding and staffing; he had a high-level understanding with NGS, but how did this list map to NGI/EGI tasks? DB observed that things were drifting, and asked whether another meeting was scheduled, or whether there was a clear set of actions and a timeframe. JC noted that there was not, but asked what the PMB wanted them to achieve over the next 3-6 months. SP noted that the table under discussion looked like a transfer from GridPP to NGS, and this was not actually happening. DB asked whether we could identify this in terms of manpower in GridPP3: what manpower effort do we put into this, and what would the bid be for in GridPP4? DB advised that we needed VOMS and a CA, so we would need to bid for some crucial tasks in GridPP4 and focus on this from the manpower side, to see where the effort was going to be allocated in GridPP4. JG reported that NGS had not yet signed off on any of these proposals. DB wanted to focus on a subset, e.g. what was the specific aim relating to VOMS for the next 6 months? TD advised that this should be listed under Institutions rather than GridPP/NGS, as a lot of these tasks were handled at RAL.

SP summarised that the PMB had agreed to:
1. identify Institutes
2. identify manpower
3. decide who is bidding for what

SP said she would work with the working group on these issues, and would propose a draft transition plan by the end of the year; she would also consider GridPP4 requirements.

ACTION 358.1 SP to work with the working group on the following issues in relation to GridPP/NGS convergence: 1. identify Institutes; 2. identify manpower; 3. decide who is bidding for what. A draft transition plan would be made available by the end of the year; GridPP4 requirements would also be considered.

3. EGEE/EGI update
==================

RM reported on some updates in relation to the upcoming EGEE'09 meeting in Barcelona, EGI and NGI. JG advised that there would be no extension of funding at the end of contracts, but money could be moved around within the UK if necessary. DB advised Institutes to wait and get further information at Barcelona. JG noted that he had circulated a document regarding EGI & gLite, and also the first EGI draft for the OC.

Regarding SSCs, DB noted that 20+ FTEs were envisaged within SSCs; we want to be involved as an active partner and we hope for funded posts. DB advised that he would be in Barcelona. The SSC proposal had already been submitted.

It was noted that for the mapping of existing EU-funded posts, particularly those of the Deputy Tier-2 Co-ordinators, we would need to check what the transition is and what its consequences are. JG reported that the transition document covered this; we have put in a 'high' bid (done with Andy). DB noted reservations about the 'international tasks', specifically CIC and monitoring: can we justify these? JG advised that we need to wait and see how it spreads out across the board, as we do not know yet. DB noted that contracts will expire and we need to carry out a mapping exercise for our own purposes.

The meeting broke for lunch.

4. CASTOR report
================

AS provided an update. It was noted that LHCb wanted to upgrade to 2.1.8 in September, but we had said that this was not possible. DB advised that we would need a strong plea from LHCb that they wanted this by December; we need to judge the effect on other experiments and on data-taking with reference to 2.1.7, which both works and is supported. JG noted that going from 2.1.7 to 2.1.8 was a substantial move.

ACTION 358.2 GP will talk to LHCb, see if they can progress the issue of CASTOR 2.1.8, and come back to us. We would require a strong plea from LHCb that they want this by December.
358.3 AS will review the 2.1.8 statements in the CASTOR paper in light of the LHCb situation, and make them less definitive.

5. HEPSPEC06 benchmarking
=========================

DB asked what the background to this was. DC noted that if you know what you have been publishing, there is a conversion factor. JC advised that he was trying to understand the current status of sites. JC gave a brief presentation. 13 sites were running HEPSPEC06, so we were 2/3 of the way there; however, there was some concern about hardware allocations. DB noted that the ATLAS accounting did not agree, as Graeme Stewart had already highlighted. JC responded that some sites had simply guessed the equivalent kSI2K. DB advised that we need to be able to defend the numbers, so we need to be clear about how we do this. JC asked whether it should be recalibrated. DB noted that the meeting needed to make a proposal to the Deployment Board on Thursday about how we handle this issue: how do we calculate the numbers? SL noted that we should benchmark everyone and then calculate back. JG observed that we should soon have the figures per site, as we have figures per quarter; we would therefore have two sets of numbers and could renormalise. DC suggested that setting up a sub-group to handle this specific issue would be useful. DB agreed, proposing the four Tier-2 Co-ordinators plus a couple of senior people (including SL) to moderate; there should be a proposal to go to the Deployment Board.

DC proposed the following course of action:
1. specint them all
2. calculate what we can
3. adjust the ones we can't
4. compare the adjustment with those who haven't done it properly
5. if within 10% then ok
6. set up a sub-group comprising JC, SL and the four Tier-2 Co-ordinators
7. agree timescale

The figure was £400k this financial year, from STFC. Only one month could be allowed for convergence, as time was short; the proposed date was Friday 16 October. SL advised that the process should not advantage sites that cannot do it.
Decisions should be referred to the PMB. This was all agreed.

6. Tier-2 Accounting
====================

It was reported that £1.998 million was available, with £400k this financial year. SL noted that it had been agreed to include (declared) disk; this was 3 to 1 last time, and we should use the same formula as in the Quarterly Reports. It was asked whether SAM performance should be included. DB queried whether we would be double-counting. SL advised that all sites were above 90%, so we should not include the SAM tests this time. DB agreed: the more factors taken into account, the greater the complexity, and we needed to be able to defend the decisions, so the simpler the better. It was agreed not to use the SAM performance tests this time.

Would there be a hardship fund? TD noted that there would not, as it had created problems last time. DB noted that we should be sensitive about this issue and not publicise it until there was a need. DC suggested that it would be wrong to give money to sites that had consistently failed to deliver. There was also the issue of 'fiddling' the figures. TD advised that cross-calibration should solve the latter issue. DC agreed that in principle it was feasible, but we would need analysis tests. GP advised that LHCb could look at calibration for the UK sites. DC noted that HEPSPEC should help make this easier.

Already-agreed figures were 100 for LHC, 90 for other HEP and 75 for all others. SL noted that there was a matrix of which experiments were included at which Institutes; last time, Institutes only received resources according to the experiments they were part of, but there remained the Manchester issue. DB asked whether we should determine this, or whether it should be down to the experiments to say. TD asked what we do about new affiliations. DB asked whether CMS wanted to limit resources to specific Institutes. A matrix used to distribute results had a larger effect.
DC advised that CMS had used RHUL greatly; it was not a bad idea to ask the experiments which sites they wanted to use, and to use this criterion to help define the metrics. DB advised that this did not add complexity; further, should the experiments determine the 1s and the 0s and let our metrics determine the allocation? This was generally agreed. TD noted that Durham did not affiliate to experiments; that was their policy. SL noted that they would have a share for CMS. DB proposed that a decision should not be made right now. SL advised that it was a step towards cutting out some of the Institutes. PC also noted the danger of inconsistent schemes.

It was agreed that this issue required further discussion and deliberation. A decision was not required until October; however, it had to be spelled out for the Collaboration Board. DB suggested that PMB members should think on this issue over the next day or so, and decide by the end of the week on a concrete proposal to let the experiments decide on the 0s and 1s. This was agreed.

ACTION 358.4 ALL: PMB members to think on the issue of Tier-2 accounting over the next day or so, and decide by the end of the week on a concrete proposal to let the experiments decide on the 0s and 1s.

7. Tier-1 Spend Planning
========================

AS presented disk, CPU and tape tender information, and the delivery plan. It was noted that we needed to confirm volumes at the start of October and place orders then. DB advised that he had the model for resource requirements, and would discuss it offline. DB noted that AS's pricing models were different from the ones he had; AS's slide was for internal use only. DK noted that the only issue was the allocation at RAL, and for that we would need to wait for the OC meeting.

There was no other business.
REVIEW OF ACTIONS
=================

348.2 JC to investigate whether the decrease in the job success rate metric in the last quarter is due to time-outs at busy sites or to job aborts caused by incorrectly set-up environments. This was still in progress - DB noted that the next Quarterly Reports will help and possibly render the action redundant. SP asked that this remain open until the next Quarterly Reports.
350.5 JC to check and verify that the contact list on the GOCDB is up-to-date - to be done by September.
354.1 JC to get more information on e-NMR status and report back; JC also to raise the issue of GridPP support for them at dTeam.
354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation and a mechanism for monitoring.
354.4 DB to co-ordinate the 16-20 page Project Status report for the OC and ensure it is submitted on time.
354.5 DB to write a 1-page Introduction for the OC Project Status Report.
354.6 TC to write a 1-page report on LCG Status for the OC Project Status Report.
354.7 RM, with input from NG, to write a 1-2 page report on EGEE/EGI Status for the OC Project Status Report.
354.9 JC, with input from SL, to write a 2-4 page report on UK Deployment, for the OC Project Status Report.
354.10 TD to write a 1-page Technical Director's report, for the OC Project Status Report.
354.11 RJ to provide a 2-page User Report, to include relevant figures, on behalf of ATLAS, for the OC Project Status Report.
354.12 DC to provide a 2-page User Report, to include relevant figures, on behalf of CMS, for the OC Project Status Report.
354.13 GP (& RN) to provide a 2-page User Report, to include relevant figures, on behalf of LHCb, for the OC Project Status Report.
354.14 GP to provide a 1-page User Board Report, for the OC Project Status Report.
354.15 SP to provide a 1-2 page summary report on EI/KT & Dissemination, for the OC Project Status Report.
354.16 SP to provide the ProjectMap Report, to include the Risk Register, for the OC Project Status Report.
354.17 SP to provide the Resource Report, for the OC Project Status Report.
354.20 JG to provide a report on EGI/NGI/NGS and future scenarios (point 35 and Action from last OC meeting), for the OC Project Status Report.
355.4 JG to draft an Agenda for the e-science review visit.
356.1 JG to deal with EGI issues for the EGI section of the OC document.
356.2 RJ to provide DB with targets/rates context for STEP'09 and draft distribution rates; RJ to provide text on figures meeting the requirements for Tier-1 running; RJ to provide DB with info on Tier-2 numbers.
356.3 DB to discuss the issue of HEPSPEC06 benchmarking with SL and JC offline, and raise an appropriate action following discussion.
356.4 A new individual document on the case for the OPN back-up link to be prepared for the OC by DB and PC, addressing all issues required.
357.1 ALL: to read the Project Status report and send amendments to DB by Friday.

ACTIONS AS AT 07.09.09
======================

348.2 JC to investigate whether the decrease in the job success rate metric in the last quarter is due to time-outs at busy sites or to job aborts caused by incorrectly set-up environments. This was still in progress - DB noted that the next Quarterly Reports will help and possibly render the action redundant. SP asked that this remain open until the next Quarterly Reports.
350.5 JC to check and verify that the contact list on the GOCDB is up-to-date - to be done by September.
354.1 JC to get more information on e-NMR status and report back; JC also to raise the issue of GridPP support for them at dTeam.
354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation and a mechanism for monitoring.
354.4 DB to co-ordinate the 16-20 page Project Status report for the OC and ensure it is submitted on time.
354.5 DB to write a 1-page Introduction for the OC Project Status Report.
354.6 TC to write a 1-page report on LCG Status for the OC Project Status Report.
354.7 RM, with input from NG, to write a 1-2 page report on EGEE/EGI Status for the OC Project Status Report.
354.9 JC, with input from SL, to write a 2-4 page report on UK Deployment, for the OC Project Status Report.
354.10 TD to write a 1-page Technical Director's report, for the OC Project Status Report.
354.11 RJ to provide a 2-page User Report, to include relevant figures, on behalf of ATLAS, for the OC Project Status Report.
354.12 DC to provide a 2-page User Report, to include relevant figures, on behalf of CMS, for the OC Project Status Report.
354.13 GP (& RN) to provide a 2-page User Report, to include relevant figures, on behalf of LHCb, for the OC Project Status Report.
354.14 GP to provide a 1-page User Board Report, for the OC Project Status Report.
354.15 SP to provide a 1-2 page summary report on EI/KT & Dissemination, for the OC Project Status Report.
354.16 SP to provide the ProjectMap Report, to include the Risk Register, for the OC Project Status Report.
354.17 SP to provide the Resource Report, for the OC Project Status Report.
354.20 JG to provide a report on EGI/NGI/NGS and future scenarios (point 35 and Action from last OC meeting), for the OC Project Status Report.
355.4 JG to draft an Agenda for the e-science review visit.
356.1 JG to deal with EGI issues for the EGI section of the OC document.
356.2 RJ to provide DB with targets/rates context for STEP'09 and draft distribution rates; RJ to provide text on figures meeting the requirements for Tier-1 running; RJ to provide DB with info on Tier-2 numbers.
356.3 DB to discuss the issue of HEPSPEC06 benchmarking with SL and JC offline, and raise an appropriate action following discussion.
356.4 A new individual document on the case for the OPN back-up link to be prepared for the OC by DB and PC, addressing all issues required.
357.1 ALL: to read the Project Status report and send amendments to DB by Friday.
358.1 SP to work with the working group on the following issues in relation to GridPP/NGS convergence: 1. identify Institutes; 2. identify manpower; 3. decide who is bidding for what. A draft transition plan would be made available by the end of the year; GridPP4 requirements would also be considered.
358.2 GP will talk to LHCb, see if they can progress the issue of CASTOR 2.1.8, and come back to us. We would require a strong plea from LHCb that they want this by December.
358.3 AS will review the 2.1.8 statements in the CASTOR paper in light of the LHCb situation, and make them less definitive.
358.4 ALL: PMB members to think on the issue of Tier-2 accounting over the next day or so, and decide by the end of the week on a concrete proposal to let the experiments decide on the 0s and 1s.

The date of the NEXT F2F meeting was 10th December at Imperial.
The date of the next REGULAR PMB meeting was 21st September at 12.55pm.