Deployment Team Completed Actions

From GridPP Wiki

This is a Wiki area to track deployment actions

Action ID prefix                          Status
D = From Deployment team meeting          Open = Action has been created
O = From monthly Operations meeting       Progress = Action is being worked on
BR = Created by Buck Rogers               Closed = Action is complete
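The action IDs in the tables below appear to follow a PREFIX-YYMMDD-N pattern, where the middle field looks like the date of the meeting that raised the action. That reading is inferred from the entries themselves rather than stated on this page, so the following is only a minimal sketch of how such IDs might be split up for sorting or filtering. The names `parse_action_id`, `ActionId` and `PREFIX_SOURCE` are hypothetical, and the assumption that every two-digit year belongs to the 2000s is marked in the code.

```python
import re
from datetime import date
from typing import NamedTuple

# Assumed layout: PREFIX-YYMMDD-N, e.g. "D-051020-1" or "O-151215-02".
ACTION_ID = re.compile(r"^([A-Z]+)-(\d{2})(\d{2})(\d{2})-(\d+)$")

# Prefix meanings copied from the legend above; prefixes that the legend
# does not list (such as SB or JC) are reported as "unknown".
PREFIX_SOURCE = {
    "D": "Deployment team meeting",
    "O": "monthly Operations meeting",
    "BR": "Created by Buck Rogers",
}


class ActionId(NamedTuple):
    prefix: str
    raised: date
    sequence: int
    source: str


def parse_action_id(text: str) -> ActionId:
    """Split an action ID into prefix, assumed meeting date and sequence number."""
    match = ACTION_ID.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised action ID: {text!r}")
    prefix, yy, mm, dd, seq = match.groups()
    # Assumption: all two-digit years fall in 2000-2099.
    raised = date(2000 + int(yy), int(mm), int(dd))
    return ActionId(prefix, raised, int(seq), PREFIX_SOURCE.get(prefix, "unknown"))
```

Parsing the date this way makes it possible to order actions by when they were raised, which the mixed date formats in the "Target date" and "Date closed" columns do not allow directly.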


Actions from dteam meetings
Action ID Action description Owner Target date Status Date closed Notes



O-151215-02 Provide status of RAL WMSes for JC. Catalin Closed Discussed 15-12-2015
O-151215-03 Clarify process for declaring data loss to ATLAS. Sam Closed Discussed 19/1/2016, Sam to double check this and close if appropriate.
O-151013-01 HTTP TF SAM Test organising for volunteers Glasgow, Bristol, Oxford (also IPv6 only?), Imperial (dual-stack) Closed - will be managed via the new GGUS tickets for HTTP TF Discussed 1-12-2015
D-091013-02 All sites please check their country/ROC designation, http://gstat-prod.cern.ch/gstat/summary/country/; For help, see http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_my_site_information . Also check logical / physical CPU and storage info. ALL sites 2009-10-27 CLOSED Possible meeting at the next sites meeting.
D-100907-03 Make a decision on whether to use WMS monitoring a la http://svr031.gla.scotgrid.ac.uk/rbwmsmon/monitoring.html at RAL and IC. Gareth, Catalin, Daniela, Duncan 2010-09-21 CLOSED (2011-02-8 meeting) 2010-11-02 Not at Imperial - not used enough to justify.
D-101019-01 Review and document experiment procedures for failed disk servers at Tier-2s Sam, Brian, Wahid 2010-10-19 Closed 2011-01-12 Relevant pages exist at SRM_File_Loss and SE_Lost_Disk-Server which are being updated in the latter case.
D-110208-01 Review status of VO share publishing at sites in GStat2. All 2011-02-08 Closed
D-110222-01 Publicise ATLAS sonar test links and presentations Graeme 2011-03-01 Closed 2011-02-22 Email sent to dteam list.
D-110222-02 Pass site request to be able to ask LHCb pilots not to pick up new work to DIRAC team Raja 2011-03-01 closed
D-110222-03 Find a NorthGrid site willing to support CERN@SCHOOL VO Alessandra 2011-03-1 closed Manchester is enabling it
D-110503-01 Confirm if LHCb jobs can be restricted to 24hrs at T2s Raja 2011-05-10 Closed Sites can restrict LHCb jobs to 24hrs - the jobs will not terminate automatically at 24hrs!



D-101019-01 WLCG MB will be reviewing information about Storage and CPU deployment at end of October. All sites should check GSTAT to ensure that numbers are being published correctly T2 coordinators 2010-10-19 Closed 2010-11-16
D-110111-01 Investigate the procedure for adjusting site availabilities/reliabilities for test failures due to monitoring failures Jeremy 2011-01-11 Closed Jan: Feedback indicates that WLCG office needs to be informed of cases to adjust (i.e. we cannot modify DB entries ourselves or somehow tag result periods in question). All changes have to be flagged manually!


D-051020-1 Write a Wiki page on how the site local LFC is used in LCG and what the requirements are at sites. Graeme 28-10-05 Closed 04-11-05 Site Local Catalog Middleware


D-051101-7 Create wiki entry for 10 Easy Questions answers and mail URL to list Fraser 01-11-05 Closed 01-11-05 See GridPP Answers to 10 Easy Network Questions
D-051020-2 Talk to Catalin Condurache who installed LFCs at Tier-1 Graeme 28-10-05 Closed 25-10-05 See LFC YAIM Install and LFC Mysql Remote Host
D-051020-5 Forward documentation summary email to dteam list Stephen 21-10-05 Closed 21-10-05
D-051020-8 Call Cambridge to discuss how they can be more involved in deployment activities Pete 26-10-05 Closed 25-10-05 Camb installed DPM , Yves and Pete to visit on 2.11.05
D-051101-10 Upload contents of EGEE security handbook to GridPP Wiki Alessandra 08-11-05 Closed 04-11-05 cut&pasted&formatted
D-051101-5 Follow up with QMUL's storage problems Greig Cowan 08-11-05 Closed 08-11-05 QMUL have now installed DPM with 18TB of disk attached
D-051020-3 Write in wiki about Tier1 experiences with LFC Steve 28-10-05 Closed 11-Nov-05 RAL Tier1 LCG File Catalog
D-051101-3 Follow up on questions about Brunel and submitting SFTs to sites not in the RB used by Polish SFT submission Jeremy 08-11-05 Closed Site requires a new CE. Approach was documented by Henry.
D-051101-9 Contact Ian Neilson for clarification on purpose of GOC DB security contacts - who is expected to be listening to it, and what response/authority capabilities should they have? Jeremy 08-11-05 Closed Ian's response was circulated 25-11-05. Security contacts should have the ability to contain and investigate an incident at a site. The CSIRT list is to keep "related" security people involved.
D-051111-2 Circulate link to talk to PMB. Jeremy 11-11-05 Closed 11-11-05
D-051111-8 For next ROC report, include request for procedure to get VOs to remove data from sites Jeremy 14-11-05 Closed 14-11-05 Request was included
D-051111-9 Follow up with Peter Kunszt regarding FTS talking to different flavours of SRM2. Jens 25-11-05 Closed 25-11-05 See mail sent to tb-support 16/11/05.
O-051115-4 Change At to For in milestone document for VO Box targets Jeremy 15-11-05 Closed    
D-051020-7 Send a list of quarterly report source information for each page of the report to Jeremy. List any problems encountered with each source. Coordinators 26-10-05 Closed 05-01-06 FS Completed 2005-11-04. PG Completed 2005-10-31. JC closed action 2006-01-05.

D-051101-2 Decide how to use testbed machines in relation to PPS Jeremy 08-11-05 Closed 05-01-06 The current feeling is that the PPS and testbed are to be kept separate. Several sites are joining the PPS. The testbed machines are arriving mid-late Nov.
D-051101-6 Follow up on networking document Jeremy 08-11-05 Closed 05-01-06 Action dealt with elsewhere.
D-051111-3 Reconfirm expectations for Tier-2 hardware Jeremy 25-11-05 Closed 05-01-06 This is in relation to SC4. The hardware should be sufficient to cope with 1TB continuous transfer - the files may be deleted.
O-051115-2 Clarify "Feedback FTS upgrade issue to CERN team or BD to raise on SC list" Graeme (Formerly: Jeremy (Brian Davies)) 15-11-05 (Mod: 05-12-05) Closed 12-12-05 Configuration error, which was resolved.
SB-051123-2 Enable non-members to send mail to dteam, and preserve CCs Jeremy 25-12-05 Closed 05-01-06 This is not possible with the JISCMAIL service. Do we want to move our lists?
SB-051123-3 Update the user data management web pages Graeme 25-12-05 Closed 05-01-06 First revision is leaner and meaner - we may want to do more.
SB-051123-4 Update the user documentation web pages Stephen 25-12-05 Closed 23-12-05 Completed to first order: http://www.gridpp.ac.uk/deployment/users/ - will need ongoing maintenance
D-051203-2 Contact T1 to confirm their readiness to do T2 transfers in 2nd and 3rd weeks of December. Jeremy 03-12-05 Closed 05-01-06 Confirmed in December
O-051115-3 Investigate methods to remove old transfer files (for SC4) Team 15-11-05 Closed 23-12-05 Graeme's python script can do this.
D-051125-1 Follow up on LeSC/SFT/RB problem with LCG/lcg-rollout Olivier 25-12-05 closed    
D-051203-1 Report to the DTeam list about readiness of sites Coordinators 03-12-05 Closed 2006-01-23  
D-051020-6 Discuss if and how to move the GridPP deployment page content to the wiki area All 30-01-06 Closed 2006-01-23 By email then dedicated meeting (JC).
D-051101-8 Ask sites to complete 10 Easy Networking Questions on wiki - timescale 3 weeks, then escalate to T2b Coordinators 08-11-05 Closed 2006-01-23 FS Completed 01-11-05, AF completed 09-11-05
O-051115-5 Ensure sites are warning site network contacts about data transfer test schedules Coordinators 15-11-05 Closed 2006-01-23 FS: Done.
D-051125-3 Nominate site in each T2 to have LFC running by end Dec Coordinators 25-12-05 Closed 2006-01-23 PG Bham, Oxf and Cam done by end of Dec.
D-060105-2 Move the DTeam mailing list to CERN or Glasgow (preserve CCs, allow non-members to send) Jeremy 25-01-06 Closed 2006-01-23  
D-051111-4 Audit sites to find out what sysadmins currently do for security monitoring/updates. Revisit at future meeting. Coordinators 25-11-05 Closed 2006-01-23 AF sent an email, received 2 answers waiting for the others. FS sent email 2005-11-14. PG sent email 2006-01-05
D-051111-6 Ensure security incident prevention is topic of future meeting. Jeremy 25-11-05 Closed This links in with Linda Cornwall's work. Review in January. Scheduled for 10th March meeting
D-051115-6 Follow up on ATLAS numbers Alessandra 31-01-06 Closed   See O-051115-6
D-051203-4 Define plan for completing weekly CIC reports - T2Cs to do it, or site admins to do it? Jeremy 03-12-05 Closed   The site managers do it. The T2Cs check on the results! Working as of 23rd January.
D-051221-3 Nominate next 2 sites per Tier2 for throughput testing T2Cs 13-01-06 Closed   PG Bham,Oxf. FS: Only Durham left.
D-051221-4 Follow up with Matt about getting security contact info out of GOCDB Jeremy 13-01-06 Closed   It is on his to-do list already. Scheduled for week starting 30-01-06. Ian Neilson has been provided an interim script.
D-051221-5 CIC Site Reports: Raise editing timeframe issues with ... ? Jeremy 20-01-06 Closed   This was raised at weekly ops meeting. Sites now have from Friday to Monday to update reports.


JC-060103-1 Complete Q4 2005 reports Coordinators 09-01-06 Closed 09-01-06  
D-060105-3 Check responsibilities for Condor and SGE support in APEL Jeremy 25-01-06 Closed 25-01-06 Dave Kant has written the RPMs. Condor will be tested by Santanu. SGE requires reworking by David McBride to fit the standard LCG approach.
D-051101-11 Talk to NeSC people about possible training opportunities for GridPP people. Jeremy 04-04-06 Closed http://www.nesc.ac.uk/training/events/index.html & http://www.egee.nesc.ac.uk/schedreg/index.html reveal current courses. We can register people on any course if spaces available or request new courses.
D-051111-1 Review training courses provided by NESC (see links: http://www.nesc.ac.uk/training/events/index.html and http://www.egee.nesc.ac.uk/schedreg/index.html). Report at next meeting - courses of interest (to yourself and Tier-2 in general) and others that should be organised. All 04-04-06 Closed FS completed 2005-11-14
D-051111-7 Follow up on new server purchase and ensure Birmingham added to federated Ganglia area asap. Alessandra 04-04-06 Closed 11-11-05 Contacted vendor and got a reply. 15-11-05 Machines arrived at the department. Will follow up on Birmingham in ganglia at a later stage when machines get installed.
D-051203-3 Circulate to DTeam the reference to the 90 day logging requirement Alessandra 04-04-06 Closed    
D-060105-1 Collect answers about security update procedure (D-051111-4) and add them to the wiki Olivier 04-04-06 Closed    
D-060105-4 Follow up with IN2P3 to ensure that the SFT history has sufficient information for the quarterly reports Jeremy 04-04-06 Closed   Followed up at ROC manager's meeting 16-01-06. There is information and the requirement for a query with time has been put forward.
D-060105-5 Create a web page with information about GridPP-approved VOs (or links to info on the CIC portal) Jeremy 04-04-06 Closed   Looks like it will be a GridPP page as the ROC managers did not agree a procedure.
D-060113-1 Check up on whether ATLAS and other expts will really need LFC boxes at T2’s given that ATLAS are moving VO functionality back to CERN. Graeme 04-04-06 Closed    
D-060113-2 Follow up with Andrew about T1 representation at experiments meetings. Jeremy 04-04-06 Closed 25-01-06 Andrew is happy for his members to be information links for T2s. Not all the meetings they attend are relevant, but where they are, information can be passed both ways. Go via Jeremy or Steve T.
D-060119-1 Forward link to ROC mgr when presentations uploaded Jeremy 04-04-06 Closed    
D-060310-1 Recommend to each site that they send someone to the SC4 Tier-2 workshop. T2 coords 04-04-06 Closed   FS Completed 2006-03-13.
D-060310-4 Study TPM material Alessandra/Pete 04-04-06 Closed    
D-051101-1 Contact LCG/gLite working group in relation to tools. Alessandra 03-02-06 Closed 30-05-06 First phone meeting on the 10-11-05 + email exchange
O-051115-1 Forward sizing formula to TB-SUPPORT Jeremy 05-02-06 Closed 30-05-06 Discussion has now moved to the storage group and dteam lists. Ratios of kSI2K:TB ranging from 2:1 (ATLAS) to 4:1 (LHCb) have been circulated and are to be updated.
O-051115-6 Follow up on ATLAS TDR numbers - number of jobs, how long jobs run for etc. Alessandra 15-11-05 Closed 30-05-06 See Roger Jones' (ATLAS) talk at CHEP06 via these links: http://agenda.cern.ch/fullAgenda.php?ida=a056461 and http://indico.cern.ch/conferenceTimeTable.py?confId=048
O-051115-8 Find out whether experiments are happy with service from sites Jeremy 18-04-06 Closed 30-05-06; Raised at PMB for UB input - none received yet. Update 15-01-06: Mentioned at GridPP15. Update 30-05-06: At the EGEE ops meeting on Monday the VOs expressed satisfaction with the service currently being provided.
O-051115-9 Follow up on VO published information (publication of VOs as active, VO server information, VOMS endpoints...) Jeremy 05-02-06 Closed   Raised on ROC managers list and directly with Rolf Rumler. CIC will publish more but we should still follow up. Update 05-01-06: Will be discussed by PMB/UB in January. Update 30-05-06: Matter now taken up by Grid Operations at CERN.
D-051221-1 Distill advice on tuning dCache for optimal performance Greig 01-05-06 Closed 30-05-06;  Optimising_dCache_Performance This will be an ongoing task as we discover more about tuning dCache. See also FTS_vs_srmcp.


D-051221-2 Distill advice on tuning DPM for optimal performance Graeme 01-05-06 Closed 01-04-06 See Performance and Tuning#DPM
D-060119-2 Document in Wiki how to remove an RLS or LFC entry for files gone AWOL Graeme 10-05-06 Closed 02-05-06 See File Catalog Maintenance
D-060310-2 Dteam should review the current agenda for the Tier-2 workshop/training sessions and comment to the list. All 12-04-06 Closed 30-05-06; Comments received from some members but not all. Still, the agenda is now taking shape.
D-060328-1 Ensure fair share policies correctly implemented at sites Tier-2 Coordinators 21-04-06 Closed 30-05-06; Action superseded by request for sites to upload policies to wiki;
O-060412-2 Develop fuller proposal for SE security tests Jens - storage group 17-05-06 Closed 02-05-06 Security Service Challenges
O-060425-1 Add Pete, David and Barry to the pre-release/testing mail list Jeremy 26-04-06 Closed 30-05-06; 30-05-06;
O-060502-2 Check that Dave Colling is on egee-uki-testing list. Jeremy 09-05-06 Closed 30-05-06; He is on the list;
O-060502-3 GridPP dteam should have a deployment plan to deal with gLite 3.0 release. Jeremy 16-05-06 Closed 30-05-06; There will be a limited deployment during the week commencing 29th June. Other sites to upgrade by the end of June (mid-July at the latest if they attended the Tier-2 workshop!);
O-060502-4 Create GridPP dteam Google calendar. Greig/Alessandra 09-05-06 Closed 10-05-06  
D-051020-4 Update the GridPP security challenges pages Alessandra 28-02-06 Closed 30-05-06 Put a link to a security service challenge wiki page, start editing also the wiki page.


O-060502-5 Transfer test assessment Greig, Graeme, Jeremy 09-05-06 Closed 12-05-06 See Service_Challenge_Transfer_Test_Summary.
D-060523-2 Clarify T2 gLite upgrade timescale with Markus Jeremy 30-05-06 Closed 30-05-06; The official request is 1st June for LCG sites. This is not possible and the UKI plan will use a 4 week window from the end of May.
O-060412-3 Confirm workshop dates of 19th & 20th June Jeremy 13-04-06 Closed 13-04-06; The dates are correct. Maite is working on the agenda now.
O-060425-2 Fraser to update his mail and resend to the list. All to then comment in assigned order Fraser then TPMers 29-04-06 Closed 30-05-06; presentation made to ROC managers in mid-May. GGUS will respond;
O-060502-7 Report on gLite 3.0 RC2 deployment issues Yves 02-05-06 Closed 02-05-06;  


D-051101-4 To review SB's list of common failure modes, investigate scriptability and circulate results to list Alessandra 31-01-06 Closed 11-07-06 NAGIOS is seen as the way forward here.
D-051111-5 Starting with Alessandra, review document in Wiki, edit areas of concern (highlight issues with an asterisk), then pass the "editing token" to the next person in the team to ensure everyone contributes. All 01-05-06 Closed 11-07-06 Review of this document is now out of date. It was felt this action had outlived its usefulness (if indeed it ever had any...)


O-051115-7 Follow up on CMS plans for file server requirements in light of computing model Team/Olivier 01-07-06 Closed 11-07-06 CMS's plans will become clearer at CSA06 .


D-051125-2 Get dCache gridftp xfer performance out of dCache and publish via RGMA Steve T (was Jens and Graeme) 15-02-06 Closed 11-07-06 See here for globus format publisher. See DCache and GridView for progress or rather lack of it. Will probably hand it off to the storage group. Update 30-05-06. Action passed to Storage Group.


D-060113-3 Each Dteam member should have a look at their area of the Web support pages and update if necessary. Dteam 01-07-06 Closed 11-07-06 Fraser moved the "static" web-pages to the wiki. Jeremy to review allocations (new action created). Neason is working on automatic reminders to content owners which supersedes this action.
D-060113-4 Provide outline of Site 'Care and Maintenance' Doc/page. Stephen 01-07-06 Closed 11-07-06 Presentation at GridPP 16. Stephen is working on final document.


D-060310-3 Follow up with Andrew McNab RE: logging tool. Alessandra 20-06-06 Closed 11-06-07 No effort in this area now.


O-060412-4 Follow up on multiple tickets being assigned for same problem (ref. Durham) Jeremy/Philippa 10-05-06 Closed 11-07-06 30-05-06: issue has been mentioned again in weekly ops report. It was thought this happened due to a ticket closure notice not going from UKI to GGUS. Philippa needs to confirm this explanation. Issue not seen again.


O-060502-1 Create a central location to record fair share policies. T2 coordinators 09-05-06 Closed 01-06-06; Olivier has created the pages: http://www.gridpp.ac.uk/wiki/Current_VO_Fairshares_at_T2/T1;


O-060502-6 Speak to Andrew about the acceptance testing of CASTOR Jens 09-05-06 Closed 09-07-06 'Tis done. Main issue is availability atm.


D-060523-1 Check CASTOR sticky bit support Jens 30-05-06 Closed 09-07-06 No sticky bit.


D-060523-3 Check published values of GlueCEPolicyMaxCPUTime Coordinators 30-05-06 Closed   LT2 sites checked. This is really only required for SGE and Condor sites;


D-060530-2 Circulate list of Tier-2 open UKI session proposed discussion topics Jeremy 10-06-06 Closed 11-07-06 Meeting at CERN has happened


D-060530-3 Set up wiki page for fair share allocations to be recorded by sites Olivier 07-06-06 Closed 01-06-06 http://www.gridpp.ac.uk/wiki/Current_VO_Fairshares_at_T2/T1


D-060609-2 create a wiki page listing required and provided nagios sensors. Fraser 16-06-06 Closed 11-07-06 Rationalising NAGIOS actions.


D-060609-3 Send email to the list detailing the required nagios sensors. Alessandra/Steve 16-06-06 Closed 11-07-06 Rationalising NAGIOS actions.


O-060613-1 Raise purchasing at T2 Board. Alessandra Next Mtg. Closed 11-07-06 Was raised.


O-060613-6 Report to UK/I about MonAMI deployment on Glasgow cluster Graeme 31-08-06 Closed 11-07-06 No deployment was made on the current cluster. Revisit after new cluster is installed.



O-060613-7 Care and Maintenance document should include advice on backup. Stephen B. 31-07-06 Closed 11-07-06 Has been noted (this wasn't a real action)


O-060613-9 Follow up on logging problems discovered with PBS at RHUL during the UKI Security Service Challenge Jeremy/Alessandra 2006-07-31 Closed 11-07-06 Alessandra raised the issue at the OSCT meeting. Pal Andersen has an action on him to follow up the tickets. Tickets have been raised.


D-060630-1 Follow up Lancaster dCache WAN tests Storage group TBD Closed 11-07-06 Working dCache pool at Manchester accessible from Lancaster. Security concerns remain.
D-060630-5 Start populating wiki with nagios scripts All Target date Closed 11-07-06 Rationalisation of NAGIOS actions.


D-060630-6 Look if it is possible to use lemon sensors underneath nagios TBD Target date Closed 11-07-06 No effort identified to carry this through. Lemon has been linked from NAGIOS wiki page in the hope that some kindly fairy will do this for us....


D-060630-9 Convert the pictorial diagram into wiki page Fraser Target date Closed 11-07-06 Deployment Team Page Ownership


D-060630-11 Check what Sheffield is doing: the number of supported VOs has gone down again Alessandra 07/07/06 Closed 11-07-06 Incautious use of yaim tool not advised ;-)
O-060613-3 Feed back to LHCb that error logging infrastructure is useful, but actual errors are not always that useful Jeremy 2006-07-31 Closed 2006-08-01 This was raised at the SC meeting and LHCb recognised the limitations, but they lacked effort to address it right now. Other experiments were informed of the potential usefulness of such logging.
D-060630-8 Give suggestions for security service challenges All Unknown Closed 2006-08-01 Suggestions were made.
D-060711-1 Email to TB-SUPPORT appealing for NAGIOS plugins to be made available Fraser 14-07-06 Closed 2006-08-01 Greig found a DESY dCache plugin (see Nagios Plugins)
D-060711-2 Put Stephen's list of common site problems into the wiki to identify necessary sensors Olivier 14-07-06 Closed 14-07-06 Added the information in the Nagios_Plugins


O-060412-1 Rework SC milestones Jeremy, Graeme & Data Management Post Holder 11-08-06 Closed 22-08-06; Update 30-05-06: Approach was reviewed after HEPSYSMAN but milestones to be agreed. No progress will be made this month after which Graeme takes over a new role.
D-060630-2 Check if there is an SFTs history Jeremy Target date Closed 22-08-06; There is an SFT history we can use. ;
D-060630-7 Raise at the ops meeting the problem of consistency between SFT and CIC portal report Jeremy Target date Closed 22-08-06; Raised and it was to be checked;
D-060801-2 Upload 1/4 reports for ScotGrid and NorthGrid Alessandra and Graeme 2006-08-03 Closed 10-08-06
D-060808-5 Present the plan for the transfer tests at the next UKI meeting (16/08) Jeremy 08/08/06 Closed 16-08-06
D-060530-5 Check on status of minos VO in the UK - do we remove from VOMS? Jeremy 07-06-06 Closed 22-08-06 Did not remove them from VOMS, because US VO is not VOMS enabled.
O-060613-2 Improve contact with ATLAS software managers Jeremy 2006-07-31 Closed 2006-08-22 22-08-06 There is a series of discussions taking place in GridPP and EGEE operations. Check with Alessandra about the operations meeting VO software discussion. Subsequent discussion at ops meeting - Harry Renshall to get VOs to provide information on what they need.
O-060613-8 Clarify how admins can invoke an OPS-VO SFT run on their own site Jeremy 2006-07-31 Closed 2006-08-22 Will be done through SFT admin tool.


D-060630-3 Ask CIC portal people if we can have access to the reports database for analysis Jeremy 15-09-06 Closed 2006-08-22 CIC portal people have done this.
D-060630-10 Go through Fraser's page (diagram) and check that the correct pages are listed under each person's name All 2006-08-22 Closed    
D-060808-2 Apel accounting stopped publishing at Sheffield Alessandra 21/08/06 Closed 2006-08-22 No progress - need to now raise a ticket against the site.
D-060808-3 Apel accounting stopped publishing at QMUL Olivier 17/08/06 Closed 2006-08-22 Transient problem.


D-060808-4 Apel accounting stopped publishing at Birmingham/RALPP Pete 21/08/06 Closed 22/08/06 All southgrid sites publishing OK


D-060808-6 Concerning the transfer tests, come up with a bandwidth milestone per site which scales with the number of CPUs Jamie/Graeme 22/08/06 Closed 22/08/06 Email sent to gridpp-sc mailing list (containing a formula) for discussion. Now waiting on figures from the experiments to plug into that formula


D-060815-2 Define SC milestones for October, to be reviewed by dteam Jamie 22/08/06 Closed 22/08/06 New milestones added for simultaneous read/write tests and wording altered to make targets more flexible
D-060815-3 Send list of filenames of failed Castor transfers to RAL, and RAL to debug Jamie/Jens 22/08/06 Closed 22/08/06 Filenames were sent. Castor had many issues when file copies failed. Should be OK now. Possible repeat of copies if time permits
D-060523-4 Follow up with sites consistently marking SFT probs non-relevant Jeremy 19-08-06 Closed 20-09-06 Raised at UKI meeting and some sites contacted directly. 22-08-06 Some sites contacted. CIC portal now reports which sites have updated in table. ;
D-060830-1 Have meeting with T2 reference site personnel to discuss transfer test plans Jeremy, Jamie, Greig 2006-09-06 Closed September
D-060830-3 Contact sites that have yet to hand over details of personnel that will conduct transfer tests Jeremy, Jamie 2006-09-06 Closed September
D-060822-1 Draw up revised CASTOR-T2 testing schedule Jamie 2006-08-29 Closed 10-09-06
D-060822-2 Circulate CMS testing schedule Matt and Jens 2006-08-29 Closed 2006-08-23 [1]
D-060822-3 Contact Simon Metson (CMS) to ensure testing schedules are coordinated. Matt and Jens 2006-08-29 Closed 2006-08-23 Done - Jamie and Simon are coordinating their tests.
D-060808-1 Find out how the SFT and GSTAT collect the software version information. Graeme 2006-09-19 Closed 2006-09-19 GS - SFTs execute /opt/lcg/bin/lcg-version. gStat takes the highest "version" of GlueHostApplicationSoftwareRunTimeEnvironment published in the BDII.
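The note on D-060808-1 says gStat reports the highest "version" among the GlueHostApplicationSoftwareRunTimeEnvironment values a site publishes. As a rough illustration of that selection step only, here is a minimal sketch. The tag shape (`LCG-2_7_0`, `GLITE-3_0_0`), the rule of comparing the numeric parts alone, and the function name `highest_version` are all assumptions made for the example; this does not reproduce gStat's actual code, and fetching the values from the BDII is left out.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical tag shape: NAME-MAJOR_MINOR_PATCH, such as "LCG-2_7_0".
VERSION_TAG = re.compile(r"^(LCG|GLITE)-(\d+)_(\d+)_(\d+)$")


def highest_version(tags: Iterable[str]) -> Optional[str]:
    """Return the tag with the largest numeric version, or None if none match."""
    best_key: Optional[Tuple[int, int, int]] = None
    best_tag: Optional[str] = None
    for tag in tags:
        match = VERSION_TAG.match(tag)
        if match is None:
            continue  # ignore VO software tags and anything else
        # Compare numerically so that 2_10_0 ranks above 2_7_0.
        key = (int(match.group(2)), int(match.group(3)), int(match.group(4)))
        if best_key is None or key > best_key:
            best_key, best_tag = key, tag
    return best_tag
```

A site that still publishes an old tag alongside a new one would be reported at the newer version under this rule, which is one reason the gStat figure can differ from what the SFT's `lcg-version` call returns.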


SB-051123-1 Update the security policy page Stephen/Alessandra 05-09-06 Closed   30-05-06: Stephen to confirm the updates. Link found to non-existent pages - needs investigated.


O-060425-3 Contact Maria-Dimou to resolve David's VOMRS status problem Jeremy 19-08-06 Closed   Update 30-05-06: request for more information has been sent. Issue not yet resolved. 22-8-06 Checking on progress;
D-060609-1 Report to list on CMS monitoring of sites. Dave 16-06-06 Closed  


D-060630-4 Verify who can do the analysis Jeremy 2006-09-19 Closed   22-08-06 Who has the time or interest!?; 22-09-06: No progress. Now also needs to be linked with GridView and RTM monitoring? Olivier is working on this.


D-060630-13 Raise phenogrid problems with the lcg-utils at the TCG (Alessandra) Olivier/Data Management Person 31-09-06 Closed   Progress. A reported to TCG. GS reports that J-P B said this would be worked on in August. Need to verify fix as it comes through.
D-060808-8 Register UK VOs with CIC Portal Alessandra 2006-09-08 Closed


D-060815-5 Send to Olivier info about killing jobs that exceed memory limit Steve 22/08/06 Closed    


D-060830-2 Review minutes of today's meeting regarding security and discuss at next meeting All 2006-09-12 Closed
D-061003-1 Start a file transfer blog Jamie 2006-10-10 Closed http://filetrasfertests.blogspot.com/
D-061003-5 Investigate "self-service" method in VOMRS for getting re-registered Alessandra, Graeme 2006-10-10 Closed It works. See Graeme's dteam message of 17 October 2006 17:43:21 BST. No wiki article though.
D-061003-6 Check open actions (!) All 2006-10-17 Closed Jens, Graeme, Greig and Jamie reviewed as of 2006-10-10.


D-061010-2 Discussion of issues involved in automating bandwidth transfer tests - see minutes for background. All 2006-11-03 Closed Conclusion was it is too much effort to maintain. Concentrate on using VO and FTS monitoring and our manual transfer tests when required.
D-061010-3 Send top ROC issues list to TB-SUPPORT for comments. Phillipa 2006-10-17 Closed


D-061010-5 TPM team 7 shift clashes with GridPP 16. Phillipa 2006-10-17 Closed
D-061024-1 Contact GridPP networking people to ask about support. Jeremy 2006-11-07 Closed JC contacted Robin Tasker et al but for MAN type problems route is via local site networking people upwards.
D-061024-2 Show dteam how fairshares have been implemented in London. Olivier 2006-11-07 Closed OvdA provided details at the F2F dteam meeting
D-061103-4 Greig 2006-11-28 Closed dpm-drain works in v1.5.10 but with some bugs that will be fixed in the next version
D-060530-1 Review deployment page allocations Jeremy 19-08-06 Closed 07-12-19 Update 22-09-06: Still to be followed up.
O-060613-5 T2 Technical boards to discuss feasibility of cross-supporting sites to meet MoU. Coordinators 2006-09-19 Closed 07-12-19 In LT2 no access at Brunel except for Olivier. Most other sites happy with sudo access. Need to document problems in NorthGrid (generally no joy). See minutes of last technical board meeting for details of implementation. SouthGrid all sys admins agree, but we have to check with RAL security before going forward. Cambridge and Birmingham in progress. Completed at ScotGrid, see description in minutes of [2]
D-060815-1 Invent formula for site's expected transfer rate (see also D-060808-6) dteam 1/12/06 Closed 07-12-19 Jamie created a formula, we are now waiting for the experiments input.
D-061003-2 Clarify roles and mapping to roles in dteam VO Jeremy, Olivier, Yves 2006-10-18 Closed 07-12-19
D-061010-4 Resurrect dTeam issues list to enable tracking of new issues which arise. Jeremy 2006-10-17 Closed 07-12-19 The issue log exists Deployment Issues
D-061010-7 Document procedure for registering a new certificate in VOMS using the old one. Wiki + circulate to UKHEPGRID list. Alessandra 2006-10-17 Done See D-061003-5


D-061017-1 Followup UCL Transfer tests. Olivier 2006-10-26 Closed 07-12-19
D-061024-4 Report current levels of Tier-2 disk usage to the list. Greig 2006-10-31 Closed 07-12-19 'Tis done for London (Olivier's document)
D-061031-1 Plan remaining T2 file transfer tests. Graeme 2006-11-07 Done Timetable Not many tests happened. Action now to be raised on User:Andrew elwell
D-061031-2 Send LHC experiment contacts for LT2 to dteam list. Olivier 2006-11-07 Completed
D-061031-3 Send Phillipa the ticket number about not publishing SAM/SFTs for UKI-SCOTGRID-GLASGOW Graeme 2006-11-07 Completed GGUS Ticket
D-061103-2 Write a sensor to get numbers from GStat and publish it into RGMA. Dave Kant 2006-11-28 Done 2006-12-05  
D-061103-5 Put graphs on storage accounting page. Dave Kant/Greig 2006-11-28 closed 07-12-19
D-061103-9 Post fairshares talk from GDB to list. Jeremy 2006-11-14 closed 07-12-19
D-061103-17 Discuss the collection of contact details of the individual sites. T2Cs 2006-11-28 closed 07-12-19 ScotGrid filling in security contacts - done. SouthGrid PDG has phone numbers of Mobiles of other sysadmins.
D-061114-2 Check the status of sites in their Tier-2 with respect to the Torque vulnerability/patch T2Cs 2006-11-17 Progress   ScotGrid ok. SouthGrid ok.
D-061212-3 Give the original url of the document copied at http://www.gridpp.ac.uk/wiki/Incident_Response_Handbook. The url should be put in the wiki. Alessandra 2006-12-20 closed 06-12-19  
D-070220-1 Pete to update existing site-info.def entries removing old ldap references where appropriate Pete 2007-03-07 Done 2007-02-20 Entries updated as using Oxford and RALPPD info.


O-060613-4 Raise at Tier 2 Board whether Tier2s can treat MoU commitments as aggregated over all sites in the T2 Jeremy Next Meeting (of T2B) Closed 2006-02-16 JC sent information in June. 22-09-06: This needs to be raised again.


D-061103-14 Ask the expt reps to come along to the dteam meeting when data challenges are ongoing. Jeremy 2006-11-14 Closed
D-061103-20 Take issues from dteam f2f to the ops meeting. Jeremy 2006-11-28 Closed
D-061114-3 Check the situation in LHCb regarding legal responsibilities around use of generic pilot jobs as seen within the experiment Raja (JC) 2006-11-21 Closed 07-12-19 Email circulated "some time ago"
D-061121-1 Forward links to OSCT and vuln. proc. again Alessandra 2006-11-28 Done 2006-11-21 [3]
D-061212-1 Remind the sites at the UKI Monthly meeting that they have to notify a downtime via the broadcast tool. They have to select ROC and VO users. Jeremy 2006-12-13 Closed
D-070206-3 Write a GridPP abstract for EGEE UF and circulate it. Deadline 2007/02/14 Jeremy 2007-02-13 Closed    
D-060815-6 Upload Nagios scripts Derek 30/11/06 Closed 2006-02-28 Reassigned from Steve to Derek in mtg 20061205. Nagios scripts now available in T1 CVS and linked to from wiki.

D-061103-11 Calculate the current CPU to disk ratios so that Jeremy can take to the UB. Greig 2007-01-20 Closed Need to speak to the T2Cs or the quarterly reports to get the CPU numbers. Greig wrote a script to extract information from BDII. We need more disk!
D-070206-1 Create GridPP operations blog Alessandra 2007-02-13 Done
D-070227-1 Open a ticket against Liverpool for Steve Lloyd's job failures Alessandra 2007-03-07 Done   No ticket for now until I hear progress on the switches.
D-070227-4 Post a summary on daemons running on the WN to TB-SUPPORT and ask sites opinion Alessandra 2007-03-07 Done   Summary sent to TBS
D-070313-1 Send link to EGEE outside EU travel Graeme 2007-03-16 Closed   [4]
D-060530-4 Update VO information page with new VO information Alessandra (for UK VOMS enabled VOs) & All (rest) 07-01-31 Done When a VO fills in their ID card on the CIC portal replace their entry with a link. Hard to reach certain VO mgrs. I think that maintaining up-to-date YAIM snapshots is more efficient.
D-061003-3 Raise bug against VOMS for storing CA issuer name Jens 2006-10-10 Done Savannah bug #20789. VOMS modified but mod v. not deployed
D-061003-4 Contact Iain Neilson for workaround procedure in VOMS when CA issuer name changes Alessandra 2006-10-10 Done Procedure is now in gridpp and goc wiki
D-061103-13 Birmingham should speak to their local networking people in order to report back what he learnt from MAN. Need to ask them not to rate cap. Pete 2006-11-28 Done MID MAN are not rate capping


D-061107-1 JC concerned about e.g. Liverpool running 150 jobs while having 614 free CPUs; need to compare the number of jobs waiting and running with BDII info. Alessandra 2006-11-28 Done The problem is bad communication between the rack switches and the University one. It will hopefully be solved in April 2007.
D-061114-1 install a CVS repository on the GridPP site and to mail GridPP/UKI sites seeking information about local repositories (how many use them). Alessandra 2006-12-08 Done   Repository has been installed at the end of January and can be used.
D-061103-3 Go through the same accounting analysis that OvdA has done for London. T2Cs 2007-01-31 Closed. 2007-05-15 Extended 07-12-19. Closed 2007-05-15: No longer needed.


D-061103-6 Set milestones within storage group to enable VO specific pools. Get dpm sites to deploy the new plugin. Greig/Jens 2006-11-28 Closed 2007-05-15 Deployment of plugin is complete. It will move into production gLite. Milestones for VO specific pools may no longer be required once we have SRM2.2 spaces.
D-061103-7 Write up something about quality of storage to give to UB. Jens 2006-11-28 Closed 2007-05-15 This is presumably about custodial/output/replica. No further work needed now.
D-070109-1 Check how VO support units are setup in GGUS Philippa/Jeremy 2007-01-31 Closed 2007-05-15 Original meaning unclear but GGUS tickets can forward to experiment support lists. Philippa sent additional information to list 2007-05-15.
D-070206-2 Send a list of sites that have made the change to Frederic. Might be overridden by script sent by Greig. T2C 2007-02-13 Closed 2007-05-15 ScotGrid: Glasgow and Durham done (Ed are dCache). SouthGrid complete (5.3.07).
D-070213-1 Report which UK T2s have Atlas transfer problems Frederic 2007-02-20 Closed 2007-05-15 Either done or no longer needed (or both).
D-061130-1 check site purchase plans till July 2007. (I've just entered this action originally talked about in Nov 06....PDG) T2Cs 2007-03-07 Closed 2007-05-15 Now part of T2 reviews. Closed.
D-070410-2 Add dates for experiment software weeks to the dteam calendar Greig 2007-04-17 Closed 2007-05-15 Still waiting for CMS dates.
OV-070424-2 Put the text in the quarterly report Olivier 2007-04-30 Closed   Not clear what this is, but is probably done?
D-070213-2 Add Total and CamOnt details to wiki Yves (and Pete) 2007-02-20 Closed   Camont and totalep are in the wiki (PG)
D-061031-4 T2s to establish experiment contacts with the LHC VOs. T2Cs 2006-11-28 Closed Done for LT2 (see D-061031-2). ScotGrid now done (ATLAS, LHCb). Progress in SouthGrid for Atlas, CMS, and Alice. NorthGrid has contacts with Atlas, LHCb, Babar and Dzero.
D-061103-1 Understand networking topology at each site. Particularly which other departments are using the same equipment that you are using. T2Cs 2006-11-28 Closed   ScotGrid diagrams now up to date. Southgrid diagrams need updating. Northgrid current as far as HEP concerned. London done.


D-061103-10 Summarise points regarding site installation and configuration to the list Alessandra 2006-11-14 Closed This was not site installation and configuration it was configuration of VOs and subgroups and the fact that yaim is not modular enough to allow a simple reconfiguration. (More info reqd?)
D-061103-12 Go to the UB and take the storage figures in order to find out if OPN is required. Jeremy 2006-11-28 Closed Particularly between sites that are known to have large amounts of storage. Next UB in March - joint UB/DTEAM. JC: No sites are going to need a dedicated connection at the moment.
D-061103-16 Check that Universities have network security teams. T2Cs 2006-11-28 Closed ScotGrid filling in security contacts - done. SouthGrid added a private page on Southgrid web site detailing contacts. Olivier says done in GOCDB for LT2.


D-061103-18 Understand how best to use each of the experiment dashboards/tools. Put on a future agenda.  ?? 2006-11-28 Closed There are now links to the dashboards in the wiki
D-061205-1 Check whether megatable entries correspond to resources actually available in UK Jeremy 2006-12-19 Closed   Note the change in MB reporting available resources as opposed to allocated resources.
D-061212-2 Assess the storage failure cases from the user point of view and see what need to be improved in the middleware. Storage Group 2006-02-01 Closed.   Not always storage problem - access may depend on other services. E.g. infosys and availability of closeSE(s). Now a Storage Group issue.
D-070220-2 Tier-2 coordinators to review VOs supported in their Tier-2 and update the wiki with VO information not currently included (creating an “Other” category where the VO is Tier-2 specific and not GridPP approved) Tier-2 coordinators 2007-03-07 Closed   Alessandra (8/3/07): updated dzero and gridpp and removed ldap instances from a few VOs like hone and biomed. PG edited the wiki twice in March (including camont details). GS nothing to add. ovda added ltwo


JC-070424-1 Send which sam test will be added as critical Jeremy 2007-04-30 Closed    


D-070501-1 Find out about tape congestion issues at RAL over weekend Derek 2007-05-7 Closed   Problem was with ADS being overloaded, losing files. Derek has identified files for LHCb, need feedback from Tim Folkes for Atlas files.
D-060801-1 Revisit security audit procedures and incident response Jeremy and Alessandra 2006-08-26 Closed 2007-08-17 Propose that sites audit each other? Inc.resp. mirrors EGEE OSCG. Closed - new action opened for Mingchao
D-060922-1 Start a services page Jeremy 2006-10-03 Closed 2007-08-17 VOMS, helpdesk, RB, file catalog, MyProxy, RGMA registry, CA, APEL acct. http://www.gridpp.ac.uk/wiki/Grid_services - created early 2007. Not much take-up.
D-06-11-21-2 Forward info about NeSC-hosted UKI web site Jeremy 2006-11-28 Closed 2008-08-17 http://www.eu-egee.org.uk/home.html
JC-070228-1 Complete the experiment contacts at sites table http://www.gridpp.ac.uk/wiki/Site_contacts T2Cs 2007-03-16 Closed 17-08-2007 Northgrid has no local contacts. Southgrid already completed this. ScotGrid done. LT2 Done
D-070626-1 Review the services page and incorporate missing services Jeremy 2007-07-31 Closed 2007-08-15; Services covered but not much support for the page!


D-070731-1 Check possible signals that can be passed to job-wrapper for PMS Alessandra 2007-08-07 Closed  
D-070731-2 Check with Yves the reasons why his ngs.ac.uk installation tests okay but there are problems for Oxford and Liverpool Jeremy 2007-08-07 Closed 17-08-2007; Mail exchange initiated between Matt and Yves. Progress made - to be reviewed at future meeting.


D-070814-2 Follow up on Northgrid and London reports Jeremy 2007-08-14 Closed  


D-070814-3 Check the routing for the support form on the GridPP site. Jeremy 2007-08-21 Open 2007-08-17; Ticket goes into the request queue as "less urgent".


D-061010-6 Document storage class implementation policy at RAL on wiki. Jens 2006-10-24 Closed See RAL Tier1 CASTOR SRM#Endpoint and SAPaths for a rudimentary overview of LCG svc classes available. Greig will check whether info is sufficient.
D-061024-3S Document optimal combinations of storage hardware and software/kernel/OSes Greig 2006-11-07 Closed Biggie. Work ongoing in storage group.


D-070227-3 Make a list of SE occupation problems on the wiki Greig 2007-03-07 Closed   This is probably about full SEs. Discussed at gridpp and is documented on the gridpp wiki. Over to storage group.


D-070508-1 Check that T1-T2 channels support OPS VO Andrew 2007-05-15 Closed   Complete, Andrew will check with Matt Hodges shortly.
D-070710-1 Send config_BDII commands to TB-SUPPORT Alessandra 2007-07-17 Closed http://northgrid-tech.blogspot.com/2007/08/updating-glue-schema_10.html
D-070807-4 Ensure T2Cs at least have required access to Footprints and that online instructions for use are up-to-date Philippa 2007-08-21 Closed  
D-070814-1 Develop policy for removing a storage element Greig+Jeremy+All 2007-09-28 Closed   JJ the policy has been around for some time
D-070814-4 Suggest discussion topics for PMB-DTEAM F2F All 2007-08-21 Closed  
D-060630-12 Check never ending jobs for the metrics of failed hours Dave Colling 2006-08-29 Closed   If a job has no life after 48 hours it gets cut off; if it comes back to life it can be restored.


D-061017-2 Check that the new version of the CIC report contains fewer entries. Olivier 2006-10-26 Closed SAM tests failing due to external problems should be filtered out but are not yet. 17/08/07 should this go to someone else now? This problem affects all sites equally, and there is not much we can do about it right now.


D-070227-2 Look at the 2 days BDII plot of gstat for all the UK sites to identify possible failure patterns. Olivier 2007-03-07 Closed   Result was plots show single plugin failure as expected. Leave open because people still interested in the plots. See plot at http://londongrid.blogspot.com/2007/06/bdii-counts.html


D-070807-2 write up publishing queue prioritisation use case for Glueschema 1.2 David 2007-08-14 Closed  


D-070410-1 Send URL to agendas for CMS software weeks to the list Dave C 2007-04-17 Closed Email sent


D-061103-8 Follow up with Andrew and the T1 to discuss transfer testing to international sites. Jeremy 2007-11-28 Closed 2007-10-23 Do we need to actively deal with T2 international transfers? May deal with via experiment transfers. 17/08/07 - not seen as a priority. Held up while CASTOR was unstable. Revisit in September. Revisited in October - some testing done via experiment challenges. AS considering other possibilities.


D-061010-1 Review notes on site status in the wiki to ensure they reflect accurately each site's status. Alessandra 2006-10-17 Closed 2007-10-23 Graeme did this for ScotGrid 2006-10-31. PDG has emailed the sysadmins in SouthGrid and asked for updates. 2006-11-14. Alessandra Can't remember what this was about.




Actions from dteam meetings
Action ID Action description Owner Target date Status Date closed Notes


D-060822-4 Document ticket follow-up/closure procedures Jeremy 2007-01-31 Open 22-09-06: Procedures touched upon at last UKI meeting but nothing documented. Needs to be on UKI ROC web-site too. Reassigned 07-12-19. 15-05-07 workflow for UKIROC helpdesk put in this week, published when ready. The current process is for T2Cs to follow up with reminders sent from Footprints each week - do we need more? 30-10-07: Still need a reference web-page/documentation.



D-061103-15 Create a checklist of 5 security items for the T2Cs to check at each site. Dave Kelsey/ Mingchao 2006-11-28 Open


D-061103-19 Setting of alarms in the GridLoad system. Dave Colling 2006-11-28 Ongoing Gidon is rewriting the code and will add alarms. 23-10-07: Not top priority but in progress.


D-061219-1 Follow up with OSCT re handbook Mingchao 2007-11-29 Open   Transferred to Mingchao (from AF/JC). This relates to whether updates to the OSCT incident handling handbook need to be reflected in the GridPP wiki entry. How will we take incident response forward?


D-070220-3 Collate plans from CMS and what is known regarding the ATLAS transfers to derive a more detailed schedule for joint and dteam testing – for site bandwidth and site SE-WAN capabilities Andrew Elwell 2007-05-31 Progress   Should maintain a wiki page with collated plans. 23-10-07: Still in progress.



D-070501-2 Submit a test ticket to GGUS to see if it gets to LHCB production list Jeremy 2007-05-7 Closed 2007-10-23 See also D-070109-1. Ticket submitted. Ticket was submitted in June. The TPM did not know how to deal with the request and asked "the user" me what to do with it but I never received the question so the ticket was closed - they assumed it was no longer relevant - my response 23/10 reopened ticket 21389. I've now submitted a new ticket: 28248 (on 23rd October). 30-10-07: Second ticket resolved within 12hrs.


PMB-070815-1 Raise the issue of PPS feedback information relating to upgrades issues with the relevant individual(s) on the PPS, and ask if there was anything else that could be done. Jeremy 2007-09-14 Closed 2007-10-23; A blog will be created by Marian and will welcome input from Yves and Barry.


D-070807-1 Resend the storage accounting use case to Alessandra and to the mail list (previously sent to Lawrence). Greig 2007-08-14 Closed  
D-071015-2 Remove Glasgow's PPS queue from footprint. Graeme/Jeremy 2007-10-31 Closed Ticket raised: https://gus.fzk.de/ws/ticket_info.php?ticket=28259 and done.
D-071106-2 Setup vo.scotgrid.ac.uk Graeme 2007-11-30 Closed


JC-060801-1 Revisit security audit procedures and incident response Mingchao 2007-08-17 Closed 2007-10-17 Propose that sites audit each other? Inc.resp. mirrors EGEE OSCG. Was action D-060801-1
D-070807-21 Circulate administration tool suggestions to the DTEAM list. All 2007-08-21 Closed 2008-01-22
D-071009-1 Follow up the set up of net mon boxes at each site. At RAL the box is not on the same network as the SE which makes it useless. Andrew 2007-10-31 Closed 2008-01-22 23-10-07: Andrew will follow up one last time!


D-071106-1 See why LHCb software not being installed on Oxford SL4 cluster. Raja 2007-11-13 Closed PG and RN will talk
D-071015-1 Set up Glasgow to take part in the adhoc ATLAS DDM functionality tests. Graeme 2007-11-31 Closed Glasgow (and other T2s) took part in successful DDM FT07 test in January.
D-071120-1 Ticket gStat about filtering 4444/6666 values with CE plugins fail (otherwise RRD plots are useless) Graeme 2007-11-27 Closed https://gus.fzk.de/pages/ticket_details.php?ticket=29296
D-070710-2 Create wiki page containing information about SRM2.2 for sites Greig 2007-07-24 Closed 2008-03-25 Wiki page started http://www.gridpp.ac.uk/wiki/SRM
D-080115-1 Test VO configurator tool and feedback the result Alessandra 2008-03-01 Closed 2008-03-25 AF emailed the dteam list, "not too impressed"
D-080318-1 Ask Stephen McCallister to close all footprints tickets where GGUS ticket is solved or verified. Jeremy Closed 2008-03-25


D-071120-2 Send ATLAS MC monitoring links to dTeam list. Graeme 2007-11-27 Closed 2008-04-08 ATLAS Monitoring For Sites


D-080226-1 Ask how to specify downtimes with different vos and queue lengths Jeremy Closed It is not currently possible. The requirement has been passed on to the GOCDB team.


D-080506-1 Follow up with supernemo if they need castor space Jens or Derek 2008-05-13 Closed
D-080506-2 Check with Matt who the list of NGS software is for. Jeremy 2008-05-13 Closed The document is not really aimed at GridPP. The PMB recently reviewed GridPP-NGS overlaps and while we should continue to support the interoperation, adjusting the software list is not a top priority.


D-080506-3 Ask sites for comments on the networking results circulated by Andrew T2C 2008-05-13 Closed
D-080506-4 Follow up with sites on the local mon box throughput T2C 2008-05-13 Closed


D-080603-1 Setup a Wiki for T2Cs to post a report of the highlights of the GDB meetings Jeremy Coles 2008-06-14 Closed 2008-06-10 Setup by someone and edited by me!
D-080610-2 Sites to ensure one CE is marked in GOC DB as an APEL node. T2C 2008-06-17 Closed ScotGrid done.
D-080610-1 Any UK sites interested in trialling a CREAM CE? (e.g., IC) JC 2008-06-17 Closed Barry is doing this at IC


D-071009-1 Get public access to the network tests results at RAL. Derek 2007-10-31 Ongoing Looks unlikely to happen after speaking to Martin. Andrew to follow up with why this is necessary.
D-080520-1 T2s to discuss deployment of federated nagios monitoring of SAM tests T2C 2008-05-27 closed This is now becoming urgent. Sites should have Nagios installed by the end of September. AE explained his findings at the dteam meeting on 9.9.08. See ST tutorial at EGEE conference.
D-090113-01 Confirm how to apply for regional status in the GOC. Jeremy 2009-01-20 closed 2009-02-10 Regional status is applied for in the GOC DB under 'request new role'


D-080325-1 Follow up with Guenter re UK TPM team update (Duncan replacing Olivier) Jeremy 2008-03-25 Closed Followed up and Pete also mailed Guenter. What about training? TPM update done but what about the training? Duncan now T7. This must be completed now...
D-090113-02 T2Cs to apply for regional status in the GOC. T2Cs 2009-01-20 Closed (by DTR) Done: PG,KM,DM,JCullen,AF,DB,DTR,GS
D-090505-01 Put Kashif in contact with appropriate people for WLCG Nagios Jeremy 2009-05-12 Completed
D-080226-2 Follow up on downtime broadcasts scope and the current status Jeremy Closed 2008-06-10 Broadcast scope was changed and spam notices reduced by use of more targeted selections.
D-080826-1 Follow up with Steve Lloyd re maintenance of SL's tests Jeremy 2008-08-26 Closed QMUL have now installed a new machine for the tests. JC to follow up again with SL on a backup to cover periods when he is away. JC followed up and QMUL now have backup sysadmin locally. No new problems reported.
D-080930-2 to follow up re updating data in gocdb Jeremy 2008-09-30 Closed JC could not remember which data so would review the minutes again. It was contact information and this was reviewed.


D-081021-1 to check where email addresses for GGUS ticket allocation are being taken from Jeremy 2008-10-21 Closed A wiki page was setup by Alessandra and this is now in use by the UKI ticket assignment team.


D-081125-01 to check what's the plan for the availability accounting and how it is done now to exclude periods that are not site fault Jeremy 2008-12-02 Closed Various discussions indicated that the only flexible system is for sites to raise events on a case by case basis. Sites should do this either via their T2 coordinator or directly to Jeremy.
D-090120-02 Feedback information about rpm versioning for updates and patches to GDB. Pete and everyone 2009-01-27 Progress Feedback collated from tb-support emails. Marcus was sent UK input.
D-090317-01 Look for failures during CERN network downtime on 19th, report to Jeremy everyone 2009-03-19 CLOSED, as the 19th is long past. Nothing was reported.
D-070807-3 Look at whether it is feasible for information about an interrupted job to flow into the L&B system David 2007-08-14 Progress   Had no response from Italy, will ask again about this. 23-10-07: Still in progress.


JC-070822-1 Develop criteria and process for removing site performance data where the site is taking risks for GridPP Jeremy/DTEAM 2007-10-12 Progress 2008-06-10 The dates for the next round will be discussed at the DB - therefore this is becoming a priority. June08- Next round will run from September.


JC-080708-1 Implement site performance wiki page that reads current graphs Jeremy 2008-07-30 Progress 2009-07-30 Item was on hold pending questions about site report integration. Template page now created and this will go live by the end of July.


D-080930-1 Follow up with NGS: someone from NGS should join the dteam mailing list or dteam meeting when requested Jens 2008-09-30 CLOSED John Quley (sp?) and Jens Jensen are both known to be available - Jens is an optimal choice, since he attends anyway.


D-090120-01 Sergey to create a post-mortem about VOMS incident. Sergey 2009-01-27 Progress the post-mortem has been created. Wiki entry still outstanding.
D-090317-02 Feedback on gridview dashboard to Jeremy everyone 2009-03-24 Some items in mins of meeting 090317 (CLOSED)
D-090623-01 Formally announce closure of IC-LeSC site to ROC (ROC = JC in this case) LT2 (Daniela/Duncan/Dave) 2009-06-30 Closed 2009-07-07 Site marked as closed in early July. Warnings were sent to users in advance. No feedback from GridPP management or users.
D-090623-04 Publicise Glasgow STEP post mortem on TB-SUPPORT Graeme/Sam 2009-06-30 Closed 2009-06-23 Message sent to TB-SUPPORT
D-090623-13 Review actions older than this meeting to update status and close if possible Jeremy 2009-07-07 Closed Actions reviewed and closed where possible/appropriate.
D-090505-02 Find information about TeraGrid tag publishing Jens 2009-05-12 Closed 2009-07-07 Publishing applications on NGS and TG


D-090623-02 Investigate load/performance issues on Glasgow WMSs Mike 2009-07-07 Closed 2009-07-31 Steve Lloyd test failures coincident with backup of WMS database, which was locking the DB (for ~45mins) and preventing job-handling. Glasgow will implement binary logging on the DB to greatly improve backup performance.


D-090623-06 Cross check between LRMS records and APEL at Glasgow Mike 2009-07-07 Done Largely ok, but open issue with some VOs. GGUS #49246. Script for torque logs was written + circulated.
D-090623-07 Cross check between LRMS records and APEL at Manchester James 2009-07-07 Done 2009-08-18 Manchester had problems with accounting records around Christmas 2008. In the months since then the APEL and pbs records match well.
D-090623-08 Cross check between LRMS records and APEL at Oxford Pete 2009-07-07 Done 2009-7-14 The script has been run and a spreadsheet comparing batch results with APEL produced.


D-090623-10 Summarise Glasgow progress on resiliency Dug 2009-07-07 CLOSED Page created, Glasgow's info added, space for other sites to add info.


D-090707-01 Examine and potentially revise method of site reports (with reference to comments on reliability graphs, etc) Jeremy 2009-07-14 Closed 2009-10-21 Feedback sent to EGEE ops & SA1. New GridPP page in prototype.
D-090714-01 Promote the 'Glasgow' method to discover site scaling limitations for ATLAS analysis jobs. Schedule HC panda test. Sam 2009-07-21 Closed 2009-07-21


D-090714-02 Understand why WMSs refuse some Manchester VOMS server issued credentials and raise GGUS ticket. Jens 2009-07-21 Closed 2009-07-21 See Jon Churchill's mail to Rollout or NGS-Ops


D-090714-03 Investigate ATLAS feedback for running on SL5. Graeme 2009-07-28 Closed RALPP to migrate; also T1 (check FZK for progress).
D-090714-05 Summarise SL5 discussion for PMB Jeremy 2009-07-21 Closed Summary with JG. Done September.
D-090623-11 LHCb to compare APEL records with internal accounting Raja 2009-07-21 Closed Raja and Graeme agreed on what to do - and did it.


D-091013-01 Setup a wiki page to track site SL5 migration status Jeremy Closed Set up as http://www.gridpp.ac.uk/wiki/Site_status_and_plans
D-090623-03 Investigate load/performance issues on IC WMS Daniela -> Barry 2009-07-07 Closed 2009-07-07 Hardware is v old (from Barry)
D-090623-09 Cross check between LRMS records and APEL at IC-HEP Duncan 2009-07-07 Closed Waiting on scripts (should have been distributed by now)
D-091027-01 Distribute details on tweaking NFS threads for software areas. Sam 2009-11 Closed
D-091027-02 Check on issues with multiple LHCb VOMS servers and post instructions if necessary Raja 2009-11 Closed 2010-26
D-091027-03 Find out details of Lancaster SCAS trials. Jeremy 2009-11 Closed Information circulated early November.
D-091027-04 Post some notes on subclusters/CEs to help with logical/physical CPU publishing Derek 2009-11 Closed Report sent to DTeam list
D-090623-12 Prioritisation schemes used for T3 access in other countries? JC to ask GDB contacts Jeremy 2009-07-14 Closed 2010-04-27 20-07-09 Asked John Gordon on how to best progress this in GDB context and also for his initial input. DB request (and PMB action) for test with Panda (with Graeme). Experiments have no interest in this at the moment - feedback given to PMB. Need to mail DB.
D-091103-01 Tier 2 coordinators to review security document Tier 2 Coordinators Closed 2010-04-27
D-100511-02 Circulate details on BNL VOMS certificate configuration Graeme Closed 2010-05-17 Message posted to TB-SUPPORT
D-100620-01 Verify that Santanu has actually fixed Cambridge for ATLAS. Pete Closed 22-07-2010 Santanu has double checked the problem is fixed, and Graeme reopened the site
D-100620-02 Pete to ask the PMB for clarity on using the 2009 or 2010 MOUs for the Tier-2 Q2 reports. Pete Closed 20-07-2010 Steve Lloyd confirmed we should start using the 2010MoU's in our Q3 report, although the WLCG will be looking for these levels from June 2010.
D-091103-01 Tier 2 coordinators to review security document Tier 2 Coordinators Closed 2010-09-21
D-100511-01 Ensure BNL VOMS certificate is accepted for ATLAS users T2Cs Closed 2010-09-21


D-090707-03 With reference to D-090623-10, T2s and sites to contribute to resiliency page everyone? http://www.gridpp.ac.uk/wiki/Resiliency_and_Disaster_Planning 2009-07-14 Closed 2010-11-02 20-07-09 JC asked sites to add to the page. Review over next month. 02-11-10 Noted no further additions, considered a product of its time. Perhaps return to it at HepSysMan.


D-100616-01 Post links from HEPSYSMAN talk on "the wiki" Mingchao Closed 2010-11-02
D-100907-01 Ask CMS and LHCb for inputs on building an "upcoming events" page for sites (ATLAS sample: http://atlas-speakers-committee.web.cern.ch/atlas-speakers-committee/ConfTalks2010.html). Jeremy 2010-09-21 Closed 2010-11-02 No such page available. iCal page at http://svr001.gla.scotgrid.ac.uk/cgi-bin/ukidowntime.py lists Atlas major dates and other UKI sites downtimes.
D-100907-02 Continue to investigate transfer problems from RAL-NDGF (GGUS 61306, 61835). Graeme, Gareth 2010-09-14 Closed Took 29 days to identify a faulty line card in CERN OPN router. Atlas have requested a post-mortem as it took far too long
D-101019-02 GSTAT now contains lists of what each Federation has pledged, each Tier 2 coordinator should check to see that their Tier 2 pledge is correct T2 coordinators 2010-10-19 Closed 2010-11-02
D-090721-01 Produce DN list for dteam for AuthZ Jeremy 2009-07-28 Closed 2010-11-02 List available but not online. Still progressing.
D-091013-02 All sites please check their country/ROC designation, http://gstat-prod.cern.ch/gstat/summary/country/; For help, see http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_my_site_information . Also check logical / physical CPU and storage info. ALL sites 2009-10-27 CLOSED Possible meeting at the next sites meeting.
D-100907-03 Make a decision on whether to use WMS monitoring a la http://svr031.gla.scotgrid.ac.uk/rbwmsmon/monitoring.html at RAL and IC. Gareth, Catalin, Daniela, Duncan 2010-09-21 CLOSED (2011-02-8 meeting) 2010-11-02 Not at Imperial - not used enough to justify.
D-101019-01 Review and document experiment procedures for failed disk servers at Tier-2s Sam, Brian, Wahid 2010-10-19 Closed 2011-01-12 Relevant pages exist at SRM_File_Loss and SE_Lost_Disk-Server which are being updated in the latter case.
D-110222-01 Publicise ATLAS sonar test links and presentations Graeme 2011-03-01 Closed 2011-02-22 Email sent to dteam list.
D-110222-02 Pass site request to be able to ask LHCb pilots not to pickup new work to DIRAC team Raja 2011-03-01 closed
D-110222-03 Find a NorthGrid site willing to support CERN@SCHOOL VO Alessandra 2011-03-1 closed Manchester is enabling it
O-130205-01 Check that the documentation on how to close a CE for downtime is complete. Steve Jones/CJW & JC 05-02-2013 closed CJW has agreed to review this document when created. Up for discussion in March meeting. Some discussion. SJ documented the options at https://www.gridpp.ac.uk/wiki/Scheduled_Downtimes Can't reach full consensus. Close.
O-160524-03 Create new ROD rota Jeremy 2016-05-31 Closed Rota extended w/c 1st August. Need to reconfirm Tier-1 effort.


O-160116-01 Jeremy to follow up about making glue alarm not critical Jeremy 2016-08-23 Closed 2016-08-23 Pass on feedback to ask for glue alarm to be made non-critical [referencing https://ggus.eu/?mode=ticket_info&ticket_id=118930]


O-160712-00 Assign ticket to Argo devs regarding failures at Glasgow Kashif 27-06-2016 Closed As discussed in today's meeting, argo needs to add some tweaks to the setting that Kashif had applied to the old gridppnagios services. 23-08-2016

See also: Deployment Team Action items