GridPP PMB Minutes 378 (22.02.10)
=================================

Present: John Gordon (Chair), Sarah Pearce (remote), Andrew Sansum, Tony Doyle, Jeremy Coles, Roger Jones, Dave Colling, Robin Middleton, Pete Clarke, (Suzanne Scott, Minutes)

Apologies: David Britton, David Kelsey, Steve Lloyd, Tony Cass, Glenn Patrick, Neil Geddes

1. GridPP4 Proposal - latest status
====================================

JG advised that various actions had been left by DB in relation to v10.0. Submission of the proposal would happen on the 24th. DB would return tomorrow, and SL would update the current version of the proposal tonight, following any comments today.

It was noted that in the 'Wider Computing Context' former section 12, the beginning paragraphs had been cut. These could be used within an outline, or as part of a covering letter. The remainder of the section was now included within a Cloud Computing part in section 7.

In relation to the Tier-2 posts, RJ had sent comments back to DB in relation to the work packages. DC would do likewise later today. PC had done the case for Edinburgh, and had sent this to DB, copied to SL. RJ noted that he took issue with the two Tier-2 Co-ordinators as they seemed to be justified differently; he had sent his comments to DB. Was TD happy about the ScotGrid post descriptions? TD noted yes. DK had sent comments to DB on RALPP T2.

JG advised that the meeting would go through the document, dealing with each comment in turn. The following was agreed:

SL1 - SS to do a page of Acronyms
SP2 - agreed to change to 'Computing in the LHC era'
SP4 - change to '10's' of PetaBytes (this may be mentioned later on in the document also, should be changed there too)
SP5 - remove 'international' and 'national'
SP6 - change to EGI both times
SP8 - leave as is, but change 'One' UK Tier-2 site to 'A' UK Tier-2 site; and delete sentence after STEP09.
SP10 - change to 'File Transfer Services'
SP11 - JC to re-write this WP-C section as description, rather than a list
SP12 - add 'Computing' Resource Review Board (CRRB); and in the following paragraph, use the acronym only
SP13 - move highlighted text to the beginning of section 6.5 (omit 'also')
comment P15 - let DB deal with this feedback (TD noted that a footnote might be better)
SL17 - Alice and Atlas in the table should be in CAPS
On Page 10 (table at top) TD noted that a footnote is required re CPU (KHS06). Move the footnote on p12 re HEPSPEC to this page.
SP18 - split up these paragraphs into two sections, entitled: 'Group Analysis Sites' and 'Generic Support' - it was suggested to leave to DB to do.
SP19 - move sentence to beginning of para that starts 'The location of potential EGI funded ...' on p18
Table 4 (on p17) to be moved to the end of the Tier-2 effort section (on p18)
SP20 - leave as is
SP21 - not sure, DB to decide
SP22/SL23 - fine as is, deleted section could be used in covering letter or in an outline
SL24 - change figure to 100k
SP25 - sentence should be trimmed to be: 'A number of security and privacy issues remain'.
SP27 - list does justify the post, so leave as is
SP28 - omit highlighted text
SP30 - omit highlighted text
SP32 - delete highlighted text, sentence should read: '... GridPP will provide an effective ...'.
SP33 - delete highlighted text; the sentence above should be amended to read: 'The planned posts will share the operation of the monitoring ...'.
SL34 - leave as is
SP35 - leave as is
SL36 - the meeting supported deleting the highlighted text - DB to decide
SP37 - delete the highlighted text
SP38 - leave as is, SP will provide an added sentence
SL39 - use revised quote if one has been agreed - DB to advise
SP40 - leave as is
SL41 - SP will proof-read these post descriptions on Wednesday
P42 - DB to decide, but generally it reads ok as is
SL43 - delete entire sentence that ends in 'assigned'
SP44 - risk register is ok - it prints landscape

PC advised that the proposal seemed in good shape overall. JG noted that any other comments on 10.4 should be advised to SL today; anything new should be circulated to the PMB.

Appendices 21: Project Milestones
---------------------------------

SP had checked if we could just do an annual list, and had been advised yes. JG noted that we could add that after one whole cycle, the milestones would be reviewed. TD noted that deployment issues were missing from here. JG asked about the User Board. TD suggested that a broader operations plan was needed. SP asked whether no. 6 should be changed to an operations plan? There ensued a discussion on the Tier-1 and general review dates. AS had hoped for late April for a review as the procurement cycle needed feedback from the previous year. It was agreed that SP should add a GridPP Operations Plan in September, in preparation for the OC meeting.

ACTION 378.1 SP to add a GridPP Operations Plan within Project Milestones, in September in preparation for the OC meeting.

Following the above review of points, the document would be handed back to SL this afternoon.

Week's Notes
============

1. GridPP3-4 bridging posts - SP advised that she had the information she needed from TD, but was awaiting information from DC. He would provide the info this afternoon.

2. Tier-2 JeS forms - STFC had said that grant letters would be sent out today for the Tier-2 hardware, but it appeared that a flat profile had been produced from the JeS forms, and this was being queried, which meant that the grant letters were delayed.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS reported as follows:

Fabric:

1) The problem lot of disk servers have now passed our acceptance tests and are available for deployment.

2) FY09 procurements:
- Delivery of the disk servers is scheduled for mid-February with a second tranche from one supplier on March 4th.
- CPU deliveries are scheduled for mid-February (problems leading to delays to one tranche have been resolved but we are waiting for a new delivery date). We are finalising delivery details, but expect several deliveries this week.

3) A meeting to review plans for the UPS supply is scheduled for Tuesday.

4) An initial meeting to plan the FY10 procurements is scheduled for Tuesday. Our intention is to place orders in September for a December delivery.

5) Planning is underway to commence the migration of CMS to T10KB tape drives.

Service:

1) SAM test availability for the ops VO was 100%.

2) We are experiencing load-related problems on an ATLAS NFS software server. We are investigating the cause and possible solutions.

SI-2 ATLAS weekly review & plans
---------------------------------

RJ had nothing to report.

SI-3 CMS weekly review & plans
-------------------------------

DC had nothing to report.

SI-4 LHCb weekly review & plans
--------------------------------

GP was absent.

SI-5 Production Manager's Report
---------------------------------

JC reported as follows:

1) With the transition to EGI, EGEE tools and services are being reviewed. Those that are not widely used are being removed. This is the case for the site weekly reports – they were part of the CIC portal for Resource Centres. There is now a consultation about the dteam VO.
The deployment team are keen to keep this VO as it is a common core that allows help with testing and debugging between countries. Other transition areas require further understanding. One that might be of particular concern to GridPP is that of the operational security tools (such as Pakiti, the secure mail lists and wiki).

2) The APEL status has much improved in the last week. The database is now recovered but running “at risk”. Updates can be tracked via
http://goc.grid.sinica.edu.tw/gocwiki/ApelIssues-Jan_Feb_2010
Looking at GridPP site results for January and February there is a need to check whether gaps are real. For example RAL Tier-1 has a gap in the last week of January, Oxford and Cambridge that week plus the start of February, and Manchester misses a large section, and Sheffield, Durham and ECDF seem to have stopped in January. The message from APEL support read: “After our integrity checks, we can guarantee that 99.8% of the data have been restored, and we have no reason to think that the remaining 0.2% have been lost. We however advise all sites to check their data on the accounting portal and report any inconsistencies through a GGUS ticket.”

3) As required for any site that schedules a downtime of longer than 1 month, the site UKI-LT2-RHUL has been moved to the status “suspended”. Once the machine room move has completed and services are fully available the site will need to be retested to regain certified status.

4) This week UKI is taking part in a trial of a new Nagios regional dashboard. This will be run in parallel with the current dashboard to evaluate it and provide feedback before it is moved into full use. The plan is to move to a CERN run regional Nagios on 1st March (i.e. move away from the SAM submission framework – availability will still be SAM based). Also from 1st March ROCs will be deploying their own (latest) Nagios that once validated will (from the end of March) take over from the CERN run regional Nagios.
5) The WLCG Tier-2 availability report for January is now available:
https://twiki.cern.ch/twiki/bin/viewfile/LCG/SamMbReports?filename=Tier2_Reliab_201001.pdf
Results overall are down on the previous month (likely due to more interventions). The (reliability:availability) figures are: London (96%:95%); NorthGrid (91%:91%); ScotGrid (97%:97%) and SouthGrid (92%:72%). The SouthGrid figures are due to Oxford cooling problems (late December until mid-January) and RAL-PPD which has had several outages due to building power supply interventions and a reconfiguration of dCache nodes.

6) It was previously reported that benchmarking on SL5 required further guidance. As it turns out the HEPiX guidelines can be followed and will lead to a consistent benchmark value. The issue was that certain libraries were required to run the suite in 32-bit compatibility mode.

SI-6 LCG Management Board Report
---------------------------------

There was no report.

SI-7 Dissemination
-------------------

SP advised that the STFC event for First Physics would take place in March in London (to coincide with the CERN event). RJ asked whether there was a Tier-1 inauguration event happening on 30th March, in the middle of IoP? Yes, however feedback on this had been given. JG noted an action on DB to resolve the PPRP representation at the meeting in Glasgow - probably not all PMB members would be required, and some should stay at RHUL as GridPP24 was running simultaneously.

ACTION 378.2 DB to resolve the PPRP representation at the meeting in Glasgow - probably not all PMB members would be required, and some should stay at RHUL as GridPP24 was running simultaneously.

REVIEW OF ACTIONS
=================

354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There were a few suggestions in the deployment team about how to progress in this area.
It needs writing up and an implementation plan. Ongoing.

366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015. AS noted this depended on money costs. DB advised this related to long-term plans and power capacity. Physical footprint space? Alternatives? Early action on AS required. AS had sent tech questions round the team and would forward inputs when available. AS noted that alternative further costings were required. AS to progress. Ongoing.

367.2 RM to fill-in the grey boxes on DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed. Ongoing. RM reported that he had met with Andy Richards, but there was more work to be done and nothing definitive had been decided. There were ongoing discussions about EGI effort but no direct answer. DB noted that the PPRP needed to ensure they were not funding beyond the GridPP remit, and that GridPP were not under threat if NGS4 did not get funded. DB advised that this issue was important, and information on a UK NGI and NGS remits, would be needed by Wed 24th February when the final version is submitted. DB would circulate the version of the proposal to SL on Friday, who would have the token until DB returned. Comments to SL next week. RM reported that there wasn't enough information available at present to carry out this action. However for the GridPP4 proposal, RM would speak to Andy Richards and provide a draft to SL by Wednesday. Ongoing.

375.2 DB to co-ordinate post descriptions for the Tier-2 posts, which should be as unique as possible in order to present a strong case. Done, item closed.

375.3 TD to do the data posts. TD would do this by Friday and forward to SL. Done, item closed.

375.4 PMB ALL: those relevant to do their own post descriptors. Done, item closed.

375.5 DB to do the Admin Asst post. SS/TD would do this and forward to SL. Done, item closed.
375.9 RM to provide a skeleton outline plan, including post details, of GridPP/NGS convergence. [Previous action background: SP to work with the working group on the following issues in relation to GridPP/NGS convergence: 1. identify Institutes 2. identify manpower 3. decide who is bidding for what - a draft transition plan would be made available by the end of the year; GridPP4 requirements would also be considered. SP was waiting on the Working Group to reply to her. A meeting had been held before Christmas re a transition plan. SP was awaiting a skeleton outline plan from RM, allocating people to sections. This action to be re-allocated to RM. Done for SP - action closed.] RM reported that this action was on hold, but ongoing, until a clearer picture emerged. RM noted that a draft plan was almost finished.

376.1 SP to feed-in management information to SL whilst DB is away (for the proposal document and in line with RMR information required). Done, item closed.

376.4 All: Risk Register owners to send text comments to SP by the end of the week, including numbers if possible, but 'low', 'medium' or 'high' was also fine, and the table would be checked at the next PMB. Done, item closed.

377.1 Re DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed, RM would speak to Andy Richards and provide a draft to SL by Wednesday for the GridPP4 proposal.

ACTIONS AS AT 22.02.10
======================

354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There were a few suggestions in the deployment team about how to progress in this area. It needs writing up and an implementation plan.

366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015. AS noted this depended on money costs. DB advised this related to long-term plans and power capacity. Physical footprint space?
Alternatives? Early action on AS required. AS had sent tech questions round the team and would forward inputs when available. AS noted that alternative further costings were required. AS to progress.

367.2 RM to fill-in the grey boxes on DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed. RM reported that there wasn't enough information available at present to carry out this action, but he had met with Andy Richards.

375.9 RM to provide a skeleton outline plan, including post details, of GridPP/NGS convergence. RM reported that a draft plan would be available soon.

377.1 Re DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed, RM would speak to Andy Richards and provide a draft to SL by Wednesday for the GridPP4 proposal. This did not now appear to be required for the proposal.

378.1 SP to add a GridPP Operations Plan within Project Milestones, noted for September in preparation for the OC meeting.

378.2 DB to resolve the PPRP representation at the meeting in Glasgow - probably not all PMB members would be required, and some should stay at RHUL as GridPP24 was running simultaneously.

INACTIVE CATEGORY
=================

359.4 JC to follow up dTeam actions from the DB, as follows:
---------------------------
05.02 dTeam to try and sort out CPU shares and priority resources, at Glasgow first (perhaps by raising the job priority in Panda).
---------------------------
JC would check the situation with Graeme Stewart (who was currently on annual leave). JC followed up with Graeme and the other experiments. A test was started but this area has been deemed low priority and further progress is not expected for some time. ATLAS see no issues with contention. LHCb are not intending to pursue anything in this area. A CMS discussion has started but again it does not appear to be urgent. If the experiments are not pushing this internally then there is nothing for the deployment team to follow up!
It was noted there was no priority in ATLAS at present; this will be pending for a while. Move to inactive as it is a long-term action.

---------------------

The next PMB would take place on Monday 1st March at 12:55 pm, chaired by DB.