GridPP PMB Minutes 377 (15.02.10)
=================================

Present: Sarah Pearce (Chair, remote), Tony Doyle, Jeremy Coles, Steve Lloyd, Tony Cass (remote), Glenn Patrick, Roger Jones, Dave Colling, Robin Middleton, Pete Clarke, Neil Geddes (Suzanne Scott, Minutes)

Apologies: David Britton, David Kelsey, John Gordon, Andrew Sansum

1. GridPP4 Proposal
====================

SP noted that DB had circulated version 10.0 and SL was awaiting inputs. PC advised that he had put time aside to read through the proposal and asked what the plan was going forward. SL reported that the plan was to take inputs over the next 2-3 days; he would speak to SP on Wednesday morning. Inputs from everyone should be sent to SL before Wednesday. DB had left a 'to do' list. The new version of the proposal would be circulated on Wednesday.

PC advised that in the document, at the end of section 4 re NGI, words like 'contribute' etc. should be revisited, as it sounded like we were contributing to keeping things going - could NG reword? SL advised this related to section 8 as well, and changes to earlier sections of the document should fall into line with section 8. PC thought it was important in the first few pages not to give incorrect flags.

Post descriptions - SL noted that post descriptions were still awaited. TD would do his own role and the data roles. The experiment support posts were already in. It was agreed to use 'RAL' or 'Daresbury' for post locations, rather than 'STFC'. SL would look at the Executive Summary.

WP breakdown - SP noted that the latest version was in version 10. The Project Map had boxes with a 'Manager' allocated to each box. GP thought that 'Manager' within the WP boxes was fine, but for tasks it would be preferable to just put someone's name, rather than a title as well. This was agreed.

NGS/EGI dependencies - SP advised that the OC had been strong on this. NG/JG should look at this, section 8.2 particularly.

SP would circulate high-level milestones.
For the experiment milestones, DC noted it was preferable to make them July or September. It was agreed that July was better.

Risk Register - SP noted she had received inputs. Were the items in the correct order? There ensued a discussion on each point and amendments were made. SP would change the document accordingly. Everyone should provide further comments to SP on the risk register by Wednesday. SP would circulate the updated version.

2. RAL memory limit
====================

RJ had circulated an email regarding this. There was nothing more to add at present: this could be discussed again next week if necessary, when AS was present.

3. Week's Notes
================

SP noted that there were two deputy Tier-2 posts at both Imperial and Glasgow - she was awaiting figures from DC and TD. TD would provide costs by the end of the week.

JeS forms for the Tier-2 hardware funding had to be completed. DC would do this. An award letter was awaited.

Re Cloud Computing, TD had an hour-long technical interview, commissioned by JISC, with someone from Southampton. TD had also spoken to Will Venters, who had sent some articles. TD would circulate to the PMB and do a response draft.

STANDING ITEMS
==============

SI-1 Tier-1 Manager's Report
-----------------------------

AS was absent.

SI-2 ATLAS weekly review & plans
---------------------------------

RJ advised that the main item was the memory issue at RAL. RJ noted that the machine schedule would have an effect on the resource request.

SI-3 CMS weekly review & plans
-------------------------------

DC reported there had been a problem over the weekend which required investigation: skimming jobs had been sent which had an effect on RAL. This related to ~21 jobs which had affected the network with downloads. They would need to check with RAL re the head buffers etc.; it wasn't good that 21 jobs could cause havoc. All else was quiet at present.
SI-4 LHCb weekly review & plans
--------------------------------

GP noted low activity with ~300 running jobs. There was the issue of incompatibility between dCache and root in relation to libraries - this had stopped LHCb running at dCache sites, therefore 4 of the Tier-1 sites were down; RAL and CERN were active, picking up the jobs. This week DIRAC was being updated, so things were likely to be quiet.

SI-5 Production Manager's Report
---------------------------------

JC reported as follows:

1) The GDB last week (http://indico.cern.ch/conferenceDisplay.py?confId=72049) covered: Middleware version control; ARGUS (looking good for further rollout); Middleware; Operational security; EGI without ROSCOE (there is an attempt to cover the posts that would have been funded); Tier-1 service coordination and distributed databases; virtualisation working group update; site management issues; multi-user pilot job update; CREAM status (results from all experiments look promising); OSG update and the EGEE-EGI transition. The Tier-2 summary report is now online: https://www.gridpp.ac.uk/wiki/GDB_10th_February_2010.

2) SuperBvo.org has now been enabled at QMUL and configuration information placed in the approved VOs page. Enablement at the Tier-1 is pending on the addition of a new CE for non-LHC access to SL5 resources.

3) Access to UCL-CENTRAL resources is now available through the UCL-HEP CE and as such the resources now appear under UCL-HEP: http://gstat-prod.cern.ch/gstat/summary/GRID/GRIDPP/.

4) There is still no sign of an APEL database recovery: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php. This has led to some frustration/questions from the WLCG/EGEE operations community. Has there been any news in this area?
5) EGEE publishes a report on the “quality” of the regional operations work (it looks at things like the number of alarms closed/escalated in a set time frame and to some extent shows up regional preferences for allowing alarms to remain without ticketing if the situation is already known). For January the report indicates that the UKI work is running smoothly: https://edms.cern.ch/file/1020727/1/EGEE-III-SA1-1020727-Regional_and_global_assessment_of_operations-January10.pdf. December was slightly worse due to the vacation period. Going forward we need to rework the rota as James Cullen (the NorthGrid deputy coordinator) will be leaving soon.

SI-6 LCG Management Board Report
---------------------------------

There was no report available.

SI-7 Dissemination Report
--------------------------

Nothing to report at present.

REVIEW OF ACTIONS
=================

354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There were a few suggestions in the deployment team about how to progress in this area. It needs writing up and an implementation plan. ONGOING.

366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015. AS noted this depended on money costs. DB advised this related to long-term plans and power capacity. Physical footprint space? Alternatives? Early action on AS required. AS had sent tech questions round the team and would forward inputs when available. DC noted to the meeting that today was the 16th Nov - only 4 weeks remained until Imperial, by which time we needed to have made extensive progress. To be discussed at the F2F on Friday. AS noted that alternative further costings were required. AS to progress. ONGOING.

367.2 RM to fill-in the grey boxes on DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed. Ongoing.
RM reported that he had met with Andy Richards, but there was more work to be done and nothing definitive had been decided. There were ongoing discussions about EGI effort but no direct answer. DB noted that the PPRP needed to ensure they were not funding beyond the GridPP remit, and that GridPP were not under threat if NGS4 did not get funded. DB advised that this issue was important, and information on a UK NGI and NGS remits would be needed by Wed 24th February when the final version is submitted. DB would circulate the version of the proposal to SL on Friday, who would have the token until DB returned. Comments to SL next week. RM reported that there wasn't enough information available at present to carry out this action. However, for the GridPP4 proposal, RM would speak to Andy Richards and provide a draft to SL by Wednesday.

ACTION 377.1 Re DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed, RM would speak to Andy Richards and provide a draft to SL by Wednesday for the GridPP4 proposal.

375.1 GP to provide post descriptions for experiment-specific posts in Appendix A. GP would forward this today. Done, item closed.

375.2 DB to co-ordinate post descriptions for the Tier-2 posts, which should be as unique as possible in order to present a strong case. ONGOING.

375.3 TD to do the data posts. TD would do this by Friday and forward to SL. ONGOING.

375.4 PMB ALL: those relevant to do their own post descriptors. ONGOING.

375.5 DB to do the Admin Asst post. SS/TD would do this and forward to SL.

375.9 RM to provide a skeleton outline plan, including post details, of GridPP/NGS convergence. [Previous action background: SP to work with the working group on the following issues in relation to GridPP/NGS convergence: 1. identify Institutes 2. identify manpower 3. decide who is bidding for what - a draft transition plan would be made available by the end of the year; GridPP4 requirements would also be considered.
SP was waiting on the Working Group to reply to her. A meeting had been held before Christmas re a transition plan. SP was awaiting a skeleton outline plan from RM, allocating people to sections. This action to be re-allocated to RM. Done for SP - action closed.] RM reported that this action was on hold, but ongoing, until a clearer picture emerged.

376.1 SP to feed-in management information to SL whilst DB is away (for the proposal document and in line with RMR information required). ONGOING.

376.2 SP to check the Risk Register terminology, specifically the difference between 'existing' and 'current', 'inherent' and 'residual' on the form, also the effect of mitigation and how that should be correctly expressed. Done, item closed.

376.3 SP would make the agreed changes to the STFC Risk Register and complete the rest of the table, and bring this back to the next PMB for discussion. Done, item closed.

376.4 All: Risk Register owners to send text comments to SP by the end of the week, including numbers if possible, but 'low', 'medium' or 'high' was also fine, and the table would be checked at the next PMB. ONGOING.

ACTIONS AS AT 15.02.10
======================

354.2 JC to consult with site admins on a framework policy for releases, with a mechanism for escalation, plus a mechanism for monitoring. JC reported that the consultation happened. There were a few suggestions in the deployment team about how to progress in this area. It needs writing up and an implementation plan.

366.8 AS to confirm that the Tier-1 proposes to use Tape-based storage in the period 2011 - 2015. AS noted this depended on money costs. DB advised this related to long-term plans and power capacity. Physical footprint space? Alternatives? Early action on AS required. AS had sent tech questions round the team and would forward inputs when available. DC noted to the meeting that today was the 16th Nov - only 4 weeks remained until Imperial, by which time we needed to have made extensive progress.
To be discussed at the F2F on Friday. AS noted that alternative further costings were required. AS to progress.

367.2 RM to fill-in the grey boxes on DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed. Ongoing. RM reported that he had met with Andy Richards, but there was more work to be done and nothing definitive had been decided. There were ongoing discussions about EGI effort but no direct answer. DB noted that the PPRP needed to ensure they were not funding beyond the GridPP remit, and that GridPP were not under threat if NGS4 did not get funded. DB advised that this issue was important, and information on a UK NGI and NGS remits would be needed by Wed 24th February when the final version is submitted. DB would circulate the version of the proposal to SL on Friday, who would have the token until DB returned. Comments to SL next week. RM reported that there wasn't enough information available at present to carry out this action. However, for the GridPP4 proposal, RM would speak to Andy Richards and provide a draft to SL by Wednesday.

375.2 DB to co-ordinate post descriptions for the Tier-2 posts, which should be as unique as possible in order to present a strong case.

375.3 TD to do the data posts. TD would do this by Friday and forward to SL.

375.4 PMB ALL: those relevant to do their own post descriptors.

375.5 DB to do the Admin Asst post. SS/TD would do this and forward to SL.

375.9 RM to provide a skeleton outline plan, including post details, of GridPP/NGS convergence. [Previous action background: SP to work with the working group on the following issues in relation to GridPP/NGS convergence: 1. identify Institutes 2. identify manpower 3. decide who is bidding for what - a draft transition plan would be made available by the end of the year; GridPP4 requirements would also be considered. SP was waiting on the Working Group to reply to her. A meeting had been held before Christmas re a transition plan.
SP was awaiting a skeleton outline plan from RM, allocating people to sections. This action to be re-allocated to RM. Done for SP - action closed.] RM reported that this action was on hold, but ongoing, until a clearer picture emerged.

376.1 SP to feed-in management information to SL whilst DB is away (for the proposal document and in line with RMR information required).

376.4 All: Risk Register owners to send text comments to SP by the end of the week, including numbers if possible, but 'low', 'medium' or 'high' was also fine, and the table would be checked at the next PMB.

377.1 Re DB's UK NGI diagram of a minimal NGI, as to what NGS would be doing in the areas listed, RM would speak to Andy Richards and provide a draft to SL by Wednesday for the GridPP4 proposal.

INACTIVE CATEGORY
=================

359.4 JC to follow up dTeam actions from the DB, as follows:
---------------------------
05.02 dTeam to try and sort out CPU shares and priority resources, at Glasgow first (perhaps by raising the job priority in Panda).
---------------------------
JC would check the situation with Graeme Stewart (who was currently on annual leave). JC followed up with Graeme and the other experiments. A test was started but this area has been deemed low priority and further progress is not expected for some time. ATLAS see no issues with contention. LHCb are not intending to pursue anything in this area. A CMS discussion has started but again it does not appear to be urgent. If the experiments are not pushing this internally then there is nothing for the deployment team to follow up! It was noted there was no priority in ATLAS at present; this will be pending for a while. Move to inactive as it is a long-term action.

---------------------

DB would be absent next week (22.02.10) so JG will chair. The next PMB will take place on 22.02.10 at 12:55 pm.

The meeting closed at 2.25 pm.