Deployment Team Completed Actions
From GridPP Wiki
Revision as of 12:40, 16 February 2016 by Peter Gronbech
This is a Wiki area to track deployment actions.
Action ID prefix | Status |
---|---|
D = From Deployment team meeting | Open = Action has been created |
O = From monthly Operations meeting | Progress = Action is being worked on |
BR = Created by Buck Rogers | Closed = Action is complete |
Action ID | Action description | Owner | Target date | Status | Date closed | Notes | |
---|---|---|---|---|---|---|---|
O-151215-02 | Provide status of RAL WMSes for JC. | Catalin | In progress | Discussed 15-12-2015 | |||
O-151215-03 | Clarify process for declaring data loss to ATLAS. | Sam | In progress | Discussed 19/1/2016, Sam to double check this and close if appropriate. | |||
O-151013-01 | HTTP TF SAM Test organising for volunteers | Glasgow, Bristol, Oxford (also IPv6 only?), Imperial (dual-stack) | In progress - will be managed via the new GGUS tickets for HTTP TF | Discussed 1-12-2015 | |||
D-091013-02 | All sites please check their country/ROC designation, http://gstat-prod.cern.ch/gstat/summary/country/; For help, see http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_my_site_information . Also check logical / physical CPU and storage info. | ALL sites | 2009-10-27 | CLOSED | Possible meeting at the next sites meeting. | ||
D-100907-03 | Make a decision on whether to use WMS monitoring a la http://svr031.gla.scotgrid.ac.uk/rbwmsmon/monitoring.html at RAL and IC. | Gareth, Catalin, Daniela, Duncan | 2010-09-21 | CLOSED (2011-02-8 meeting) | 2010-11-02 Not at Imperial - not used enough to justify. | ||
D-101019-01 | Review and document experiment procedures for failed disk servers at Tier-2s | Sam, Brian, Wahid | 2010-10-19 | Closed | 2011-01-12 Relevant pages exist at SRM_File_Loss and SE_Lost_Disk-Server which are being updated in the latter case. | ||
D-110208-01 | Review status of VO share publishing at sites in GStat2. | All | 2011-02-08 | Closed | |||
D-110222-01 | Publicise ATLAS sonar test links and presentations | Graeme | 2011-03-01 | Closed | 2011-02-22 | Email sent to dteam list. | |
D-110222-02 | Pass site request to be able to ask LHCb pilots not to pickup new work to DIRAC team | Raja | 2011-03-01 | closed | |||
D-110222-03 | Find a NorthGrid site willing to support CERN@SCHOOL VO | Alessandra | 2011-03-1 | closed | Manchester is enabling it | ||
D-110503-01 | Confirm if LHCb jobs can be restricted to 24hrs at T2s | Raja | 2011-05-10 | Closed | Sites can restrict LHCb jobs to 24hrs - the jobs will not terminate automatically at 24hrs! | ||
D-101019-01 | WLCG MB will be reviewing information about Storage and CPU deployment at end of October. All sites should check GSTAT to ensure that numbers are being published correctly | T2 coordinators | 2010-10-19 | Closed | 2010-11-16 | ||
D-110111-01 | Investigate the procedure for adjusting site availabilities/reliabilities for test failures due to monitoring failures | Jeremy | 2011-01-11 | Closed | Jan: Feedback indicates that WLCG office needs to be informed of cases to adjust (i.e. we cannot modify DB entries ourselves or somehow tag result periods in question). All changes have to be flagged manually! | ||
D-051020-1 | Write a Wiki page on how the site local LFC is used in LCG and what the requirements are at sites. | Graeme | 28-10-05 | Closed | 04-11-05 | Site Local Catalog Middleware | |
D-051101-7 | Create wiki entry for 10 Easy Questions answers and mail URL to list | Fraser | 01-11-05 | Closed | 01-11-05 | See GridPP Answers to 10 Easy Network Questions | |
D-051020-2 | Talk to Catalin Condurache who installed LFCs at Tier-1 | Graeme | 28-10-05 | Closed | 25-10-05 | See LFC YAIM Install and LFC Mysql Remote Host | |
D-051020-5 | Forward documentation summary email to dteam list | Stephen | 21-10-05 | Closed | 21-10-05 | ||
D-051020-8 | Call Cambridge to discuss how they can be more involved in deployment activities | Pete | 26-10-05 | Closed | 25-10-05 | Camb installed DPM , Yves and Pete to visit on 2.11.05 | |
D-051101-10 | Upload contents of EGEE security handbook to GridPP Wiki | Alessandra | 08-11-05 | Closed | 04-11-05 | cut&pasted&formatted | |
D-051101-5 | Follow up with QMUL's storage problems | Greig Cowan | 08-11-05 | Closed | 08-11-05 | QMUL have now installed DPM with 18TB of disk attached | |
D-051020-3 | Write in wiki about Tier1 experiences with LFC | Steve | 28-10-05 | Closed | 11-Nov-05 | RAL Tier1 LCG File Catalog | |
D-051101-3 | Follow up on questions about Brunel and submitting SFTs to sites not in the RB used by Polish SFT submission | Jeremy | 08-11-05 | Closed | Site requires a new CE. Approach was documented by Henry. | ||
D-051101-9 | Contact Ian Neilson for clarification on purpose of GOC DB security contacts - who is expected to be listening to it, and what response/authority capabilities should they have? | Jeremy | 08-11-05 | Closed | Ian's response was circulated 25-11-05. Security contacts should have the ability to contain and investigate an incident at a site. The CSIRT list is to keep "related" security people involved. | ||
D-051111-2 | Circulate link to talk to PMB. | Jeremy | 11-11-05 | Closed | 11-11-05 | ||
D-051111-8 | For next ROC report, include request for procedure to get VOs to remove data from sites | Jeremy | 14-11-05 | Closed | 14-11-05 | Request was included | |
D-051111-9 | Follow up with Peter Kunszt regarding FTS talking to different flavours of SRM2. | Jens | 25-11-05 | Closed | 25-11-05 | See mail sent to tb-support 16/11/05. | |
O-051115-4 | Change At to For in milestone document for VO Box targets | Jeremy | 15-11-05 | Closed | |||
D-051020-7 | Send a list of quarterly report source information for each page of the report to Jeremy. List any problems encountered with each source. | Coordinators | 26-10-05 | Closed | 05-01-06 | FS completed 2005-11-04; PG completed 2005-10-31; JC closed action 2006-01-05 | |
D-051101-2 | Decide how to use testbed machines in relation to PPS | Jeremy | 08-11-05 | Closed | 05-01-06 | The current feeling is that the PPS and testbed are to be kept separate. Several sites are joining the PPS. The testbed machines are arriving mid-late Nov. | |
D-051101-6 | Follow up on networking document | Jeremy | 08-11-05 | Closed | 05-01-06 | Action dealt with elsewhere. | |
D-051111-3 | Reconfirm expectations for Tier-2 hardware | Jeremy | 25-11-05 | Closed | 05-01-06 | This is in relation to SC4. The hardware should be sufficient to cope with 1TB continuous transfer - the files may be deleted. | |
O-051115-2 | Clarify "Feedback FTS upgrade issue to CERN team or BD to raise on SC list" | Graeme (Formerly: Jeremy (Brian Davies)) | 15-11-05 (Mod: 05-12-05) | Closed | 12-12-05 | Configuration error, which was resolved. | |
SB-051123-2 | Enable non-members to send mail to dteam, and preserve CCs | Jeremy | 25-12-05 | Closed | 05-01-06 | This is not possible with the JISCMAIL service. Do we want to move our lists? | |
SB-051123-3 | Update the user data management web pages | Graeme | 25-12-05 | Closed | 05-01-06 | First revision is leaner and meaner - we may want to do more. | |
SB-051123-4 | Update the user documentation web pages | Stephen | 25-12-05 | Closed | 23-12-05 | Completed to first order: http://www.gridpp.ac.uk/deployment/users/ - will need ongoing maintenance | |
D-051203-2 | Contact T1 to confirm their readiness to do T2 transfers in 2nd and 3rd weeks of December. | Jeremy | 03-12-05 | Closed | 05-01-06 | Confirmed in December | |
O-051115-3 | Investigate methods to remove old transfer files (for SC4) | Team | 15-11-05 | Closed | 23-12-05 | Graeme's python script can do this. | |
D-051125-1 | Follow up on LeSC/SFT/RB problem with LCG/lcg-rollout | Olivier | 25-12-05 | closed | |||
D-051203-1 | Report to the DTeam list about readiness of sites | Coordinators | 03-12-05 | Closed | 2006-01-23 | ||
D-051020-6 | Discuss if and how to move the GridPP deployment page content to the wiki area | All | 30-01-06 | Closed | 2006-01-23 | By email then dedicated meeting (JC). | |
D-051101-8 | Ask sites to complete 10 Easy Networking Questions on wiki - timescale 3 weeks, then escalate to T2b | Coordinators | 08-11-05 | Closed | 2006-01-23 | FS Completed 01-11-05, AF completed 09-11-05 | |
O-051115-5 | Ensure sites are warning site network contacts about data transfer test schedules | Coordinators | 15-11-05 | Closed | 2006-01-23 | FS: Done. | |
D-051125-3 | Nominate site in each T2 to have LFC running by end Dec | Coordinators | 25-12-05 | Closed | 2006-01-23 | PG Bham, Oxf and Cam done by end of Dec. | |
D-060105-2 | Move the DTeam mailing list to CERN or Glasgow (preserve CCs, allow non-members to send) | Jeremy | 25-01-06 | Closed | 2006-01-23 | ||
D-051111-4 | Audit sites to find out what sysadmins currently do for security monitoring/updates. Revisit at future meeting. | Coordinators | 25-11-05 | Closed | 2006-01-23 | AF sent an email, received 2 answers waiting for the others. FS sent email 2005-11-14. PG sent email 2006-01-05 | |
D-051111-6 | Ensure security incident prevention is topic of future meeting. | Jeremy | 25-11-05 | Closed | This links in with Linda Cornwall's work. Review in January. Scheduled for 10th March meeting | ||
D-051115-6 | Follow up on ATLAS numbers | Alessandra | 31-01-06 | Closed | See O-051115-6 | ||
D-051203-4 | Define plan for completing weekly CIC reports - T2Cs to do it, or site admins to do it? | Jeremy | 03-12-05 | Closed | The site managers do it. The T2Cs check on the results! Working as of 23rd January. | ||
D-051221-3 | Nominate next 2 sites per Tier2 for throughput testing | T2Cs | 13-01-06 | Closed | PG Bham,Oxf. FS: Only Durham left. | ||
D-051221-4 | Follow up with Matt about getting security contact info out of GOCDB | Jeremy | 13-01-06 | Closed | It is on his to-do list already. Scheduled for week starting 30-01-06. Ian Neilson has been provided an interim script. | ||
D-051221-5 | CIC Site Reports: Raise editing timeframe issues with ... ? | Jeremy | 20-01-06 | Closed | This was raised at weekly ops meeting. Sites now have from Friday to Monday to update reports. | ||
JC-060103-1 | Complete Q4 2005 reports | Coordinators | 09-01-06 | Closed | 09-01-06 | ||
D-060105-3 | Check responsibilities for Condor and SGE support in APEL | Jeremy | 25-01-06 | Closed | 25-01-06 | Dave Kant has written the RPMs. Condor will be tested by Santanu. SGE requires reworking by David McBride to fit the standard LCG approach. | |
D-051101-11 | Talk to NeSC people about possible training opportunities for GridPP people. | Jeremy | 04-04-06 | Closed | http://www.nesc.ac.uk/training/events/index.html & http://www.egee.nesc.ac.uk/schedreg/index.html reveal current courses. We can register people on any course if spaces available or request new courses. | ||
D-051111-1 | Review training courses provided by NESC (see links: http://www.nesc.ac.uk/training/events/index.html and http://www.egee.nesc.ac.uk/schedreg/index.html). Report at next meeting - courses of interest (to yourself and Tier-2 in general) and others that should be organised. | All | 04-04-06 | Closed | FS completed 2005-11-14 | ||
D-051111-7 | Follow up on new server purchase and ensure Birmingham added to federated Ganglia area asap. | Alessandra | 04-04-06 | Closed | 11-11-05 Contacted vendor and got a reply. 15-11-05 Machines arrived at the department. Will follow up on Birmingham in ganglia at a later stage when machines get installed. | ||
D-051203-3 | Circulate to DTeam the reference to the 90 day logging requirement | Alessandra | 04-04-06 | Closed | |||
D-060105-1 | Collect answers about security update procedure (D-051111-4) and add them to the wiki | Olivier | 04-04-06 | Closed | |||
D-060105-4 | Follow up with IN2P3 to ensure that the SFT history has sufficient information for the quarterly reports | Jeremy | 04-04-06 | Closed | Followed up at ROC managers' meeting 16-01-06. There is information and the requirement for a query with time has been put forward. | ||
D-060105-5 | Create a web page with information about GridPP-approved VOs (or links to info on the CIC portal) | Jeremy | 04-04-06 | Closed | Looks like it will be a GridPP page as the ROC managers did not agree a procedure. | ||
D-060113-1 | Check up on whether ATLAS and other expts will really need LFC boxes at T2’s given that ATLAS are moving VO functionality back to CERN. | Graeme | 04-04-06 | Closed | |||
D-060113-2 | Follow up with Andrew about T1 representation at experiments meetings. | Jeremy | 04-04-06 | Closed | 25-01-06 | Andrew is happy for his members to be information links for T2s. Not all the meetings they attend are relevant, but where they are, information can be passed both ways. Go via Jeremy or Steve T. | |
D-060119-1 | Forward link to ROC mgr when presentations uploaded | Jeremy | 04-04-06 | Closed | |||
D-060310-1 | Recommend to each site that they send someone to the SC4 Tier-2 workshop. | T2 coords | 04-04-06 | Closed | FS Completed 2006-03-13. | ||
D-060310-4 | Study TPM material | Alessandra/Pete | 04-04-06 | Closed | |||
D-051101-1 | Contact LCG/gLite working group in relation to tools. | Alessandra | 03-02-06 | Closed | 30-05-06 | First phone meeting on the 10-11-05 + email exchange | |
O-051115-1 | Forward sizing formula to TB-SUPPORT | Jeremy | 05-02-06 | Closed | 30-05-06 | Discussion has now moved to the storage group and dteam lists. Ratios of kSI2K:TB ranging from 2:1 (ATLAS) to 4:1 (LHCb) have been circulated and are to be updated. | |
O-051115-6 | Follow up on ATLAS TDR numbers - number of jobs, how long jobs run for etc. | Alessandra | 15-11-05 | Closed | 30-05-06 | See Roger Jones' (ATLAS) talk at CHEP06 via these links: http://agenda.cern.ch/fullAgenda.php?ida=a056461 and http://indico.cern.ch/conferenceTimeTable.py?confId=048 | |
O-051115-8 | Find out whether experiments are happy with service from sites | Jeremy | 18-04-06 | Closed | 30-05-06 | Raised at PMB for UB input - none received yet. Update 15-01-06: Mentioned at GridPP15. Update 30-05-06: At the EGEE ops meeting on Monday the VOs expressed satisfaction with the service currently being provided. | |
O-051115-9 | Follow up on VO published information (publication of VOs as active, VO server information, VOMS endpoints...) | Jeremy | 05-02-06 | Closed | Raised on ROC managers list and directly with Rolf Rumler. CIC will publish more but we should still follow up. Update 05-01-06: Will be discussed by PMB/UB in January. Update 30-05-06: Matter now taken up by Grid Operations at CERN. | ||
D-051221-1 | Distill advice on tuning dCache for optimal performance | Greig | 01-05-06 | Closed | 30-05-06 | Optimising_dCache_Performance. This will be an ongoing task as we discover more about tuning dCache. See also FTS_vs_srmcp. | |
D-051221-2 | Distill advice on tuning DPM for optimal performance | Graeme | 01-05-06 | Closed | 01-04-06 | See Performance and Tuning#DPM | |
D-060119-2 | Document in Wiki how to remove an RLS or LFC entry for files gone AWOL | Graeme | 10-05-06 | Closed | 02-05-06 | See File Catalog Maintenance | |
D-060310-2 | Dteam should review the current agenda for the Tier-2 workshop/training sessions and comment to the list. | All | 12-04-06 | Closed | 30-05-06 | Comments received from some members but not all. Still, the agenda is now taking shape. | |
D-060328-1 | Ensure fair share policies correctly implemented at sites | Tier-2 Coordinators | 21-04-06 | Closed | 30-05-06 | Action superseded by request for sites to upload policies to wiki | |
O-060412-2 | Develop fuller proposal for SE security tests | Jens - storage group | 17-05-06 | Closed | 02-05-06 | Security Service Challenges | |
O-060425-1 | Add Pete, David and Barry to the pre-release/testing mail list | Jeremy | 26-04-06 | Closed | 30-05-06 | 30-05-06 | |
O-060502-2 | Check that Dave Colling is on egee-uki-testing list. | Jeremy | 09-05-06 | Closed | 30-05-06 | He is on the list | |
O-060502-3 | GridPP dteam should have a deployment plan to deal with gLite 3.0 release. | Jeremy | 16-05-06 | Closed | 30-05-06 | There will be a limited deployment during the week commencing 29th June. Other sites to upgrade by the end of June (mid-July at the latest if they attended the Tier-2 workshop!) | |
O-060502-4 | Create GridPP dteam Google calendar. | Greig/Alessandra | 09-05-06 | Closed | 10-05-06 | ||
D-051020-4 | Update the GridPP security challenges pages | Alessandra | 28-02-06 | Closed | 30-05-06 | Added a link to a security service challenge wiki page and started editing the wiki page as well. | |
O-060502-5 | Transfer test assessment | Greig, Graeme, Jeremy | 09-05-06 | Closed | 12-05-06 | See Service_Challenge_Transfer_Test_Summary. | |
D-060523-2 | Clarify T2 gLite upgrade timescale with Markus | Jeremy | 30-05-06 | Closed | 30-05-06 | The official request is 1st June for LCG sites. This is not possible and the UKI plan will use a 4 week window from the end of May. | |
O-060412-3 | Confirm workshop dates of 19th & 20th June | Jeremy | 13-04-06 | Closed | 13-04-06 | The dates are correct. Maite is working on the agenda now. | |
O-060425-2 | Fraser to update his mail and resend to the list. All to then comment in assigned order | Fraser then TPMers | 29-04-06 | Closed | 30-05-06 | Presentation made to ROC managers in mid-May. GGUS will respond | |
O-060502-7 | Report on gLite 3.0 RC2 deployment issues | Yves | 02-05-06 | Closed | 02-05-06 | | |
D-051101-4 | To review SB's list of common failure modes, investigate scriptability and circulate results to list | Alessandra | 31-01-06 | Closed | 11-07-06 | NAGIOS is seen as the way forward here. | |
D-051111-5 | Starting with Alessandra, review document in Wiki, edit areas of concern (highlight issues with an asterisk), then pass the "editing token" to the next person in the team to ensure everyone contributes. | All | 01-05-06 | Closed | 11-07-06 | Review of this document is now out of date. It was felt this action had outlived its usefulness (if indeed it ever had any...) | |
O-051115-7 | Follow up on CMS plans for file server requirements in light of computing model | Team/Olivier | 01-07-06 | Closed | 11-07-06 | CMS's plans will become clearer at CSA06. | |
D-051125-2 | Get dCache gridftp xfer performance out of dCache and publish via RGMA | Steve T (was Jens and Graeme) | 15-02-06 | Closed | 11-07-06 | See here for globus format publisher. See DCache and GridView for progress or rather lack of it. Will probably hand it off to the storage group. Update 30-05-06. Action passed to Storage Group. | |
D-060113-3 | Each Dteam member should have a look at their area of the Web support pages and update if necessary. | Dteam | 01-07-06 | Closed | 11-07-06 | Fraser moved the "static" web-pages to the wiki. Jeremy to review allocations (new action created). Neason is working on automatic reminders to content owners which supersedes this action. | |
D-060113-4 | Provide outline of Site 'Care and Maintenance' Doc/page. | Stephen | 01-07-06 | Closed | 11-07-06 | Presentation at GridPP 16. Stephen is working on final document. | |
D-060310-3 | Follow up with Andrew McNab RE: logging tool. | Alessandra | 20-06-06 | Closed | 11-06-07 | No effort in this area now. | |
O-060412-4 | Follow up on multiple tickets being assigned for same problem (ref. Durham) | Jeremy/Philippa | 10-05-06 | Closed | 11-07-06 | 30-05-06: issue has been mentioned again in weekly ops report. It was thought this happened due to a ticket closure notice not going from UKI to GGUS. Philippa needs to confirm this explanation. Issue not seen again. | |
O-060502-1 | Create a central location to record fair share policies. | T2 coordinators | 09-05-06 | Closed | 01-06-06 | Olivier has created the pages: http://www.gridpp.ac.uk/wiki/Current_VO_Fairshares_at_T2/T1 | |
O-060502-6 | Speak to Andrew about the acceptance testing of CASTOR | Jens | 09-05-06 | Closed | 09-07-06 | 'Tis done. Main issue is availability atm. | |
D-060523-1 | Check CASTOR sticky bit support | Jens | 30-05-06 | Closed | 09-07-06 | No sticky bit. | |
D-060523-3 | Check published values of GlueCEPolicyMaxCPUTime | Coordinators | 30-05-06 | Closed | LT2 sites checked. This is really only required for SGE and Condor sites | ||
D-060530-2 | Circulate list of Tier-2 open UKI session proposed discussion topics | Jeremy | 10-06-06 | Closed | 11-07-06 | Meeting at CERN has happened | |
D-060530-3 | Set up wiki page for fair share allocations to be recorded by sites | Olivier | 07-06-06 | Closed | 01-06-06 | http://www.gridpp.ac.uk/wiki/Current_VO_Fairshares_at_T2/T1 | |
D-060609-2 | Create a wiki page listing required and provided nagios sensors. | Fraser | 16-06-06 | Closed | 11-07-06 | Rationalising NAGIOS actions. | |
D-060609-3 | Send email to the list detailing the required nagios sensors. | Alessandra/Steve | 16-06-06 | Closed | 11-07-06 | Rationalising NAGIOS actions. | |
O-060613-1 | Raise purchasing at T2 Board. | Alessandra | Next Mtg. | Closed | 11-07-06 | Was raised. | |
O-060613-6 | Report to UK/I about MonAMI deployment on Glasgow cluster | Graeme | 31-08-06 | Closed | 11-07-06 | No deployment was made on the current cluster. Revisit after new cluster is installed. | |
O-060613-7 | Care and Maintenance document should include advice on backup. | Stephen B. | 31-07-06 | Closed | 11-07-06 | Has been noted (this wasn't a real action) | |
O-060613-9 | Follow up on logging problems discovered with PBS at RHUL during the UKI Security Service Challenge | Jeremy/Alessandra | 2006-07-31 | Closed | 11-07-06 | Alessandra raised the issue at the OSCT meeting. Pal Andersen has an action on him to follow up the tickets. Tickets have been raised. | |
D-060630-1 | Follow up Lancaster dCache WAN tests | Storage group | TBD | Closed | 11-07-06 | Working dCache pool at Manchester accessible from Lancaster. Security concerns remain. | |
D-060630-5 | Start populating wiki with nagios scripts | All | Target date | Closed | 11-07-06 | Rationalisation of NAGIOS actions. | |
D-060630-6 | Look if it is possible to use lemon sensors underneath nagios | TBD | Target date | Closed | 11-07-06 | No effort identified to carry this through. Lemon has been linked from NAGIOS wiki page in the hope that some kindly fairy will do this for us.... | |
D-060630-9 | Convert the pictorial diagram into wiki page | Fraser | Target date | Closed | 11-07-06 | Deployment Team Page Ownership | |
D-060630-11 | Check what Sheffield is doing: the number of supported VOs has gone down again | Alessandra | 07/07/06 | Closed | 11-07-06 | Incautious use of yaim tool not advised ;-) | |
O-060613-3 | Feed back to LHCb that error logging infrastructure is useful, but actual errors are not always that useful | Jeremy | 2006-07-31 | Closed | 2006-08-01 | This was raised at the SC meeting and LHCb recognised the limitations, but they lacked effort to address it right now. Other experiments were informed of the potential usefulness of such logging. | |
D-060630-8 | Give suggestions for security service challenges | All | Unknown | Closed | 2006-08-01 | Suggestions were made. | |
D-060711-1 | Email to TB-SUPPORT appealing for NAGIOS plugins to be made available | Fraser | 14-07-06 | Closed | 2006-08-01 | Greig found a DESY dCache plugin (see Nagios Plugins) | |
D-060711-2 | Put Stephen's list of common site problems into the wiki to identify necessary sensors | Olivier | 14-07-06 | Closed | 14-07-06 | Added the information to Nagios_Plugins | |
O-060412-1 | Rework SC milestones | Jeremy, Graeme & Data Management Post Holder | 11-08-06 | Closed | 22-08-06 | Update 30-05-06: Approach was reviewed after HEPSYSMAN but milestones to be agreed. No progress will be made this month after which Graeme takes over a new role. | |
D-060630-2 | Check if there is an SFT history | Jeremy | Target date | Closed | 22-08-06 | There is an SFT history we can use. | |
D-060630-7 | Raise at the ops meeting the problem of consistency between SFT and CIC portal report | Jeremy | Target date | Closed | 22-08-06 | Raised and it was to be checked | |
D-060801-2 | Upload 1/4 reports for ScotGrid and NorthGrid | Alessandra and Graeme | 2006-08-03 | Closed | 10-08-06 | ||
D-060808-5 | Present the plan for the transfer tests at the next UKI meeting (16/08) | Jeremy | 08/08/06 | Closed | 16-08-06 | ||
D-060530-5 | Check on status of minos VO in the UK - do we remove from VOMS? | Jeremy | 07-06-06 | Closed | 22-08-06 | Did not remove them from VOMS, because US VO is not VOMS enabled. | |
O-060613-2 | Improve contact with ATLAS software managers | Jeremy | 2006-07-31 | Closed | 2006-08-22 | 22-08-06 There are a series of discussions taking place in GridPP and EGEE operations. Check with Alessandra about the operations meeting VO software discussion. Subsequent discussion at ops meeting - Harry Renshall to get VOs to provide information on what they need. | |
O-060613-8 | Clarify how admins can invoke an OPS-VO SFT run on their own site | Jeremy | 2006-07-31 | Closed | 2006-08-22 | Will be done through SFT admin tool. | |
D-060630-3 | Ask CIC portal people if we can have access to the reports database for analysis | Jeremy | 15-09-06 | Closed | 2006-08-22 | CIC portal people have done this. | |
D-060630-10 | Go through Fraser's page (diagram) and check that the correct pages are listed under each person's name | All | 2006-08-22 | Closed | |||
D-060808-2 | Apel accounting stopped publishing at Sheffield | Alessandra | 21/08/06 | Closed | 2006-08-22 | No progress - need to now raise a ticket against the site. | |
D-060808-3 | Apel accounting stopped publishing at QMUL | Olivier | 17/08/06 | Closed | 2006-08-22 | Transient problem. | |
D-060808-4 | Apel accounting stopped publishing at Birmingham/RALPP | Pete | 21/08/06 | Closed | 22/08/06 | All southgrid sites publishing OK | |
D-060808-6 | Concerning the transfer tests, come up with a bandwidth milestone per site which scales with the number of CPUs | Jamie/Graeme | 22/08/06 | Closed | 22/08/06 | Email sent to gridpp-sc mailing list (containing a formula) for discussion. Now waiting on figures from exp. to plug into that formula | |
D-060815-2 | Define SC milestones for October, to be reviewed by dteam | Jamie | 22/08/06 | Closed | 22/08/06 | New milestones added for simultaneous read/write tests and wording altered to make targets more flexible | |
D-060815-3 | Send list of filenames of failed Castor transfers to RAL, and RAL to debug | Jamie/Jens | 22/08/06 | Closed | 22/08/06 | Filenames were sent. Castor had many issues when file copies failed. Should be OK now. Possible repeat of copies if time permits | |
D-060523-4 | Follow up with sites consistently marking SFT probs non-relevant | Jeremy | 19-08-06 | Closed | 20-09-06 Raised at UKI meeting and some sites contacted directly. | 22-08-06 Some sites contacted. CIC portal now reports which sites have updated in table. | |
D-060830-1 | Have meeting with T2 reference site personnel to discuss transfer test plans | Jeremy, Jamie, Greig | 2006-09-06 | Closed | September | ||
D-060830-3 | Contact sites that have yet to hand over details of personnel that will conduct transfer tests | Jeremy, Jamie | 2006-09-06 | Closed | September | ||
D-060822-1 | Draw up revised CASTOR-T2 testing schedule | Jamie | 2006-08-29 | Closed | 10-09-06 | ||
D-060822-2 | Circulate CMS testing schedule | Matt and Jens | 2006-08-29 | Closed | 2006-08-23 | [1] | |
D-060822-3 | Contact Simon Metson (CMS) to ensure testing schedules are coordinated. | Matt and Jens | 2006-08-29 | Closed | 2006-08-23 | Done - Jamie and Simon are coordinating their tests. | |
D-060808-1 | Find out how the SFT and GSTAT collect the software version information. | Graeme | 2006-09-19 | Closed | 2006-09-19 | GS - SFTs execute /opt/lcg/bin/lcg-version. gStat takes the highest "version" of GlueHostApplicationSoftwareRunTimeEnvironment published in the BDII. | |
SB-051123-1 | Update the security policy page | Stephen/Alessandra | 05-09-06 | Closed | 30-05-06: Stephen to confirm the updates. Link found to non-existent pages - needs investigated. | ||
O-060425-3 | Contact Maria Dimou to resolve David's VOMRS status problem | Jeremy | 19-08-06 | Closed | Update 30-05-06: request for more information has been sent. Issue not yet resolved. 22-8-06 Checking on progress | ||
D-060609-1 | Report to list on CMS monitoring of sites. | Dave | 16-06-06 | Closed | | ||
D-060630-4 | Verify who can do the analysis | Jeremy | 2006-09-19 | Closed | 22-08-06 Who has the time or interest!?; 22-09-06: No progress. Now also needs to be linked with GridView and RTM monitoring? Olivier is working on this. | ||
D-060630-13 | Raise phenogrid problems with the lcg-utils at the TCG (Alessandra) | Olivier/Data Management Person | 31-09-06 | Closed | Progress. A reported to TCG. GS reports that J-P B said this would be worked on in August. Need to verify fix as it comes through. | ||
D-060808-8 | Register UK VOs with CIC Portal | Alessandra | 2006-09-08 | Closed | | ||
D-060815-5 | Send to Olivier info about killing jobs that exceed memory limit | Steve | 22/08/06 | Closed | | ||
D-060830-2 | Review minutes of today's meeting regarding security and discuss at next meeting | All | 2006-09-12 | Closed | |||
D-061003-1 | Start a file transfer blog | Jamie | 2006-10-10 | Closed | http://filetrasfertests.blogspot.com/ | ||
D-061003-5 | Investigate "self-service" method in VOMRS for getting re-registered | Alessandra, Graeme | 2006-10-10 | Closed | It works. See Graeme's dteam message of 17 October 2006 17:43:21 BST. No wiki article though. | ||
D-061003-6 | Check open actions (!) | All | 2006-10-17 | Closed | Jens, Graeme, Greig and Jamie reviewed as of 2006-10-10. | ||
D-061010-2 | Discussion of issues involved in automating bandwidth transfer tests - see minutes for background. | All | 2006-11-03 | Closed | Conclusion was it is too much effort to maintain. Concentrate on using VO and FTS monitoring and our manual transfer tests when required. | ||
D-061010-3 | Send top ROC issues list to TB-SUPPORT for comments. | Philippa | 2006-10-17 | Closed | | ||
D-061010-5 | TPM team 7 shift clashes with GridPP 16. | Philippa | 2006-10-17 | Closed | |||
D-061024-1 | Contact GridPP networking people to ask about support. | Jeremy | 2006-11-07 | Closed | JC contacted Robin Tasker et al but for MAN type problems route is via local site networking people upwards. | ||
D-061024-2 | Show dteam how fairshares have been implemented in London. | Olivier | 2006-11-07 | Closed | OvdA provided details at the F2F dteam meeting | ||
D-061103-4 | Greig | 2006-11-28 | Closed | dpm-drain works in v1.5.10 but with some bugs that will be fixed in the next version | |||
D-060530-1 | Review deployment page allocations | Jeremy | 19-08-06 | Closed | 07-12-19 | Update 22-09-06: Still to be followed up. | |
O-060613-5 | T2 Technical boards to discuss feasibility of cross-supporting sites to meet MoU. | Coordinators | 2006-09-19 | Closed | 07-12-19 | In LT2 no access at Brunel except for Olivier. Most other sites happy with sudo access. Need to document problems in NorthGrid (generally no joy). See minutes of last technical board meeting for details of implementation. SouthGrid all sys admins agree, but we have to check with RAL security before going forward. Cambridge and Birmingham in progress. Completed at ScotGrid, see description in minutes of [2] | |
D-060815-1 | Invent formula for site's expected transfer rate (see also D-060808-6) | dteam | 1/12/06 | Closed | 07-12-19 | Jamie created a formula, we are now waiting for the experiments' input. | |
D-061003-2 | Clarify roles and mapping to roles in dteam VO | Jeremy, Olivier, Yves | 2006-10-18 | Closed | 07-12-19 | ||
D-061010-4 | Resurrect dTeam issues list to enable tracking of new issues which arise. | Jeremy | 2006-10-17 | Closed | 07-12-19 | The issue log exists Deployment Issues | |
D-061010-7 | Document procedure for registering a new certificate in VOMS using the old one. Wiki + circulate to UKHEPGRID list. | Alessandra | 2006-10-17 | Done | See D-061003-5 | ||
D-061017-1 | Followup UCL Transfer tests. | Olivier | 2006-10-26 | Closed | 07-12-19 | ||
D-061024-4 | Report current levels of Tier-2 disk usage to the list. | Greig | 2006-10-31 | Closed | 07-12-19 | 'Tis done for London (Olivier's document) | |
D-061031-1 | Plan remaining T2 file transfer tests. | Graeme | 2006-11-07 | Done | Timetable Not many tests happened. Action now to be raised on User:Andrew elwell | ||
D-061031-2 | Send LHC experiment contacts for LT2 to dteam list. | Olivier | 2006-11-07 | Completed | |||
D-061031-3 | Send Phillipa the ticket number about not publishing SAM/SFTs for UKI-SCOTGRID-GLASGOW | Graeme | 2006-11-07 | Completed | GGUS Ticket | ||
D-061103-2 | Write a sensor to get numbers from GStat and publish it into RGMA. | Dave Kant | 2006-11-28 | Done | 2006-12-05 | ||
D-061103-5 | Put graphs on storage accounting page. | Dave Kant/Greig | 2006-11-28 | closed | 07-12-19 | ||
D-061103-9 | Post fairshares talk from GDB to list. | Jeremy | 2006-11-14 | closed | 07-12-19 | ||
D-061103-17 | Discuss the collection of contact details of the individual sites. | T2Cs | 2006-11-28 | closed | 07-12-19 | ScotGrid filling in security contacts - done. SouthGrid PDG has phone numbers of Mobiles of other sysadmins. | |
D-061114-2 | Check the status of sites in their Tier-2 with respect to the Torque vulnerability/patch | T2Cs | 2006-11-17 | Progress | ScotGrid ok. SouthGrid ok. | ||
D-061212-3 | Give the original url of the document copied at http://www.gridpp.ac.uk/wiki/Incident_Response_Handbook. The url should be put in the wiki. | Alessandra | 2006-12-20 | closed | 06-12-19 | ||
D-070220-1 | Pete to update existing site-info.def entries removing old ldap references where appropriate | Pete | 2007-03-07 | Done | 2007-02-20 | Entries updated as using Oxford and RALPPD info.
| |
O-060613-4 | Raise at Tier 2 Board whether Tier2s can treat MoU commitments as aggregated over all sites in the T2 | Jeremy | Next Meeting (of T2B) | Closed | 2006-02-16 | JC sent information in June. 22-09-06: This needs to be raised again.
| |
D-061103-14 | Ask the expt reps to come along to the dteam meeting when data challenges are ongoing. | Jeremy | 2006-11-14 | Closed | |||
D-061103-20 | Take issues from dteam f2f to the ops meeting. | Jeremy | 2006-11-28 | Closed | |||
D-061114-3 | Check the situation in LHCb regarding legal responsibilities around use of generic pilot jobs as seen within the experiment | Raja (JC) | 2006-11-21 | Closed | 07-12-19 | Email circulated "some time ago" | |
D-061121-1 | Forward links to OSCT and vuln. proc. again | Alessandra | 2006-11-28 | Done | 2006-11-21 | [3] | |
D-061212-1 | Remind the sites at the UKI Monthly meeting that they have to notify a downtime via the broadcast tool. They have to select ROC and VO users. | Jeremy | 2006-12-13 | Closed | |||
D-070206-3 | Write a GridPP abstract for EGEE UF and circulate it. Deadline 2007/02/14 | Jeremy | 2007-02-13 | Closed | |||
D-060815-6 | Upload Nagios scripts | Derek | 30/11/06 | Closed | 2006-02-28 | Reassigned from Steve to Derek in mtg 20061205. Nagios scripts now available in T1 CVS and linked to from wiki | |
D-061103-11 | Calculate the current CPU to disk ratios so that Jeremy can take to the UB. | Greig | 2007-01-20 | Closed | Need to speak to the T2Cs or the quarterly reports to get the CPU numbers. Greig wrote a script to extract information from BDII. We need more disk! | ||
D-070206-1 | Create GridPP operations blog | Alessandra | 2007-02-13 | Done | |||
D-070227-1 | Open a ticket against Liverpool for Steve Lloyd's job failures | Alessandra | 2007-03-07 | Done | No ticket for now until I hear progress on the switches. | ||
D-070227-4 | Post a summary on daemons running on the WN to TB-SUPPORT and ask sites opinion | Alessandra | 2007-03-07 | Done | Summary sent to TBS | ||
D-070313-1 | Send link to EGEE outside EU travel | Graeme | 2007-03-16 | Closed | [4] | ||
D-060530-4 | Update VO information page with new VO information | Alessandra (for UK VOMS enabled VOs) & All (rest) | 07-01-31 | Done | When a VO fills in their ID card on the CIC portal replace their entry with a link. Hard to reach certain VO mgrs. I think that maintaining uptodate YAIM snapshots is more efficient. | ||
D-061003-3 | Raise bug against VOMS for storing CA issuer name | Jens | 2006-10-10 | Done | Savannah bug #20789. VOMS modified but mod v. not deployed | ||
D-061003-4 | Contact Iain Neilson for workaround procedure in VOMS when CA issuer name changes | Alessandra | 2006-10-10 | Done | Procedure is now in gridpp and goc wiki | ||
D-061103-13 | Birmingham should speak to their local networking people in order to report back what he learnt from MAN. Need to ask them not to rate cap. | Pete | 2006-11-28 | Done | MID MAN are not rate capping |
| |
D-061107-1 | JC concerned about e.g. Liverpool running 150 jobs but having 614 free CPUs; need to compare number of jobs waiting and running with BDII info. | Alessandra | 2006-11-28 | Done | The problem is bad communication between the rack switches and the University one. It will hopefully be solved in April 2007. | ||
D-061114-1 | install a CVS repository on the GridPP site and to mail GridPP/UKI sites seeking information about local repositories (how many use them). | Alessandra | 2006-12-08 | Done | Repository has been installed at the end of January and can be used. | ||
D-061103-3 | Go through the same accounting analysis that OvdA has done for London. | T2Cs | 2007-01-31 | Closed. | 2007-05-15 | Extended 07-12-19. Closed 2007-05-15: No longer needed.
| |
D-061103-6 | Set milestones within storage group to enable VO specific pools. Get dpm sites to deploy the new plugin. | Greig/Jens | 2006-11-28 | Closed | 2007-05-15 | Deployment of plugin is complete. It will move into production gLite. Milestones for VO specific pools may no longer be required once we have SRM2.2 spaces. | |
D-061103-7 | Write up something about quality of storage to give to UB. | Jens | 2006-11-28 | Closed | 2007-05-15 | This is presumably about custodial/output/replica. No further work needed now. | |
D-070109-1 | Check how VO support units are setup in GGUS | Philippa/Jeremy | 2007-01-31 | Closed | 2007-05-15 | Original meaning unclear but GGUS tickets can forward to experiment support lists. Philippa sent additional information to list 2007-05-15. | |
D-070206-2 | Send a list of sites that have made the change to Frederic. Might be overridden by script sent by Greig. | T2C | 2007-02-13 | Closed | 2007-05-15 | ScotGrid: Glasgow and Durham done (Ed are dCache). SouthGrid complete (5.3.07). | |
D-070213-1 | Report which UK T2s have Atlas transfer problems | Frederic | 2007-02-20 | Closed | 2007-05-15 | Either done or no longer needed (or both). | |
D-061130-1 | check site purchase plans till July 2007. (I've just entered this action originally talked about in Nov 06....PDG) | T2Cs | 2007-03-07 | Closed | 2007-05-15 | Now part of T2 reviews. Closed. | |
D-070410-2 | Add dates for experiment software weeks to the dteam calendar | Greig | 2007-04-17 | Closed | 2007-05-15 | Still waiting for CMS dates. | |
OV-070424-2 | Put the text in the quarterly report | Olivier | 2007-04-30 | Closed | Not clear what this is, but is probably done? | ||
D-070213-2 | Add Total and CamOnt details to wiki | Yves (and Pete) | 2007-02-20 | Closed | Camont and totalep are in the wiki (PG) | ||
D-061031-4 | T2s to establish experiment contacts with the LHC VOs. | T2Cs | 2006-11-28 | Closed | Done for LT2 (see D-061031-2). ScotGrid now done (ATLAS, LHCb). Progress in SouthGrid for Atlas, CMS, and Alice. NorthGrid has contacts with Atlas, Lhcb,Babar and Dzero. | ||
D-061103-1 | Understand networking topology at each site. Particularly which other departments are using the same equipment that you are using. | T2Cs | 2006-11-28 | Closed | ScotGrid diagrams now up to date. Southgrid diagrams need updating. Northgrid current as far as HEP concerned. London done.
| ||
D-061103-10 | Summarise points regarding site installation and configuration to the list | Alessandra | 2006-11-14 | Closed | This was not site installation and configuration it was configuration of VOs and subgroups and the fact that yaim is not modular enough to allow a simple reconfiguration. (More info reqd?) | ||
D-061103-12 | Go to the UB and take the storage figures in order to find out if OPN is required. | Jeremy | 2006-11-28 | Closed | Particularly between sites that are known to have large amounts of storage. Next UB in March - joint UB/DTEAM. JC: No sites are going to need a dedicated connection at the moment. | ||
D-061103-16 | Check that Universities have network security teams. | T2Cs | 2006-11-28 | Closed | ScotGrid filling in security contacts - done. SouthGrid added a private page on Southgrid web site detailing contacts. Olivier says done in GOCDB for LT2.
| ||
D-061103-18 | Understand how best to use each of the experiment dashboards/tools. Put on a future agenda. | ?? | 2006-11-28 | Closed | There are now links to the dashboards in the wiki | ||
D-061205-1 | Check whether megatable entries correspond to resources actually available in UK | Jeremy | 2006-12-19 | Closed | Note the change in MB reporting available resources as opposed to allocated resources. | ||
D-061212-2 | Assess the storage failure cases from the user point of view and see what need to be improved in the middleware. | Storage Group | 2006-02-01 | Closed. | Not always storage problem - access may depend on other services. E.g. infosys and availability of closeSE(s). Now a Storage Group issue. | ||
D-070220-2 | Tier-2 coordinators to review VOs supported in their Tier-2 and update the wiki with VO information not currently included (creating an “Other” category where the VO is Tier-2 specific and not GridPP approved) | Tier-2 coordinators | 2007-03-07 | Closed | Alessandra (8/3/07): updated dzero and gridpp and removed ldap instances from a few VOs like hone and biomed. PG edited the wiki twice in March (including camont details). GS nothing to add. ovda added ltwo
| ||
JC-070424-1 | Send which sam test will be added as critical | Jeremy | 2007-04-30 | Closed |
| ||
D-070501-1 | Find out about tape congestion issues at RAL over weekend | Derek | 2007-05-7 | Closed | Problem was with ADS being overloaded, losing files. Derek has identified files for LHCb, need feedback from Tim Folkes for Atlas files. | ||
D-060801-1 | Revisit security audit procedures and incident response | Jeremy and Alessandra | 2006-08-26 | Closed | 2007-08-17 | Propose that sites audit each other? Inc.resp. mirrors EGEE OSCG. Closed - new action opened for Mingchao | |
D-060922-1 | Start a services page | Jeremy | 2006-10-03 | Closed | 2007-08-17 | VOMS, helpdesk, RB, file catalog, MyProxy, RGMA registry, CA, APEL acct. http://www.gridpp.ac.uk/wiki/Grid_services - created early 2007. Not much take-up. | |
D-06-11-21-2 | Forward info about NeSC-hosted UKI web site | Jeremy | 2006-11-28 | Closed | 2008-08-17 | http://www.eu-egee.org.uk/home.html | |
JC-070228-1 | Complete the experiment contacts at sites table http://www.gridpp.ac.uk/wiki/Site_contacts | T2Cs | 2007-03-16 | Closed | 17-08-2007 | Northgrid has no local contacts. Southgrid already completed this. ScotGrid done. LT2 Done | |
D-070626-1 | Review the services page and incorporate missing services | Jeremy | 2007-07-31 | Closed | 2007-08-15; | Services covered but not much support for the page!
| |
D-070731-1 | Check possible signals that can be passed to job-wrapper for PMS | Alessandra | 2007-08-07 | Closed | |||
D-070731-2 | Check with Yves the reasons why his ngs.ac.uk installation tests okay but there are problems for Oxford and Liverpool | Jeremy | 2007-08-07 | Closed | 17-08-2007; | Mail exchange initiated between Matt and Yves. Progress made - to be reviewed at future meeting.
| |
D-070814-2 | Follow up on Northgrid and London reports | Jeremy | 2007-08-14 | Closed |
| ||
D-070814-3 | Check the routing for the support form on the GridPP site. | Jeremy | 2007-08-21 | Open | 2007-08-17; | Ticket goes into the request queue as "less urgent".
| |
D-061010-6 | Document storage class implementation policy at RAL on wiki. | Jens | 2006-10-24 | Closed | See RAL Tier1 CASTOR SRM#Endpoint and SAPaths for a rudimentary overview of LCG svc classes available. Greig will check whether info is sufficient. | ||
D-061024-3S | Document optimal combinations of storage hardware and software/kernel/OSes | Greig | 2006-11-07 | close | Biggie. Work ongoing in storage group.
| ||
D-070227-3 | Make a list of SE occupation problems on the wiki | Greig | 2007-03-07 | Close | This is probably about full SEs. discussed at gridpp and is documented on the gridpp wiki. Over to storage group.
| ||
D-070508-1 | Check that T1-T2 channels support OPS VO | Andrew | 2007-05-15 | Closed | Complete, Andrew will check with Matt Hodges shortly. | ||
D-070710-1 | Send config_BDII commands to TB-SUPPORT | Alessandra | 2007-07-17 | Closed | http://northgrid-tech.blogspot.com/2007/08/updating-glue-schema_10.html | ||
D-070807-4 | Ensure T2Cs at least have required access to Footprints and that online instructions for use are up-to-date | Philippa | 2007-08-21 | Closed | |||
D-070814-1 | Develop policy for removing a storage element | Greig+Jeremy+All | 2007-09-28 | Closed | JJ the policy has been around for some time | ||
D-070814-4 | Suggest discussion topics for PMB-DTEAM F2F | All | 2007-08-21 | Closed | |||
D-060630-12 | Check never ending jobs for the metrics of failed hours | Dave Colling | 2006-08-29 | Closed | If a job has no life after 48 hours it gets cut off; if it comes back to life it can be restored.
| ||
D-061017-2 | Check that the new version of the CIC report contains fewer entries. | Olivier | 2006-10-26 | Closed | SAM tests failing due to external problems should be filtered out but are not yet. 17/08/07 should this go to someone else now? This problem affects all sites equally, and there is not much we can do about it right now.
| ||
D-070227-2 | Look at the 2 days BDII plot of gstat for all the UK sites to identify possible failure patterns. | Olivier | 2007-03-07 | Closed | Result was plots show single plugin failure as expected. Leave open because people still interested in the plots. See plot at http://londongrid.blogspot.com/2007/06/bdii-counts.html
| ||
D-070807-2 | write up publishing queue prioritisation use case for Glueschema 1.2 | David | 2007-08-14 | Closed |
| ||
D-070410-1 | Send URL to agendas for CMS software weeks to the list | Dave C | 2007-04-17 | Closed | Email sent
| ||
D-061103-8 | Follow up with Andrew and the T1 to discuss transfer testing to international sites. | Jeremy | 2007-11-28 | Closed | 2007-10-23 | Do we need to actively deal with T2 international transfers? May deal with via experiment transfers. 17/08/07 - not seen as a priority. Held up while CASTOR was unstable. Revisit in September. Revisited in October - some testing done via experiment challenges. AS considering other possibilities.
| |
D-061010-1 | Review notes on site status in the wiki to ensure they reflect accurately each site's status. | Alessandra | 2006-10-17 | Closed | 2007-10-23 | Graeme did this for ScotGrid 2006-10-31. PDG has emailed the sysadmins in SouthGrid and asked for updates. 2006-11-14. Alessandra Can't remember what this was about.
| |
Action ID | Action description | Owner | Target date | Status | Date closed | Notes
| |
---|---|---|---|---|---|---|---|
D-060822-4 | Document ticket follow-up/closure procedures | Jeremy | 2007-01-31 | Open | 22-09-06: Procedures touched upon at last UKI meeting but nothing documented. Needs to be on UKI ROC web-site too. Reassigned 07-12-19. 15-05-07 workflow for UKIROC helpdesk put in this week, published when ready. The current process is for T2Cs to follow up with reminders sent from Footprints each week - do we need more? 30-10-07: Still need a reference web-page/documentation.
| ||
D-061103-15 | Create a checklist of 5 security items for the T2Cs to check at each site. | Dave Kelsey/ Mingchao | 2006-11-28 | Open |
| ||
D-061103-19 | Setting of alarms in the GridLoad system. | Dave Colling | 2006-11-28 | Ongoing | Gidon is rewriting the code and will add alarms. 23-10-07: Not top priority but in progress.
| ||
D-061219-1 | Follow up with OSCT re handbook | Mingchao | 2007-11-29 | Open | Transferred to Mingchao (from AF/JC). This relates to whether updates to the OSCT incident handling handbook need to be reflected in the GridPP wiki entry. How will we take incident response forward?
| ||
D-070220-3 | Collate plans from CMS and what is known regarding the ATLAS transfers to derive a more detailed schedule for joint and dteam testing – for site bandwidth and site SE-WAN capabilities | Andrew Elwell | 2007-05-31 | Progress | Should maintain a wiki page with collated plans. 23-10-07: Still in progress.
| ||
D-070501-2 | Submit a test ticket to GGUS to see if it gets to LHCB production list | Jeremy | 2007-05-7 | Closed | 2007-10-23 | See also D-070109-1. Ticket submitted. Ticket was submitted in June. The TPM did not know how to deal with the request and asked "the user" me what to do with it but I never received the question so the ticket was closed - they assumed it was no longer relevant - my response 23/10 reopened ticket 21389. I've now submitted a new ticket: 28248 (on 23rd October). 30-10-07: Second ticket resolved within 12hrs.
| |
PMB-070815-1 | Raise the issue of PPS feedback information relating to upgrades issues with the relevant individual(s) on the PPS, and ask if there was anything else that could be done. | Jeremy | 2007-09-14 | Closed | 2007-10-23; | A blog will be created by Marian and will welcome input from Yves and Barry.
| |
D-070807-1 | Resend the storage accounting use case to Alessandra and to the mail list (previously sent to Lawrence). | Greig | 2007-08-14 | Closed | |||
D-071015-2 | Remove Glasgow's PPS queue from footprint. | Graeme/Jeremy | 2007-10-31 | Closed | Ticket raised: https://gus.fzk.de/ws/ticket_info.php?ticket=28259 and done. | ||
D-071106-2 | Setup vo.scotgrid.ac.uk | Graeme | 2007-11-30 | Closed |
| ||
JC-060801-1 | Revisit security audit procedures and incident response | Mingchao | 2007-08-17 | Closed | 2007-10-17 | Propose that sites audit each other? Inc.resp. mirrors EGEE OSCG. Was action D-060801-1 | |
D-070807-21 | Circulate administration tool suggestions to the DTEAM list. | All | 2007-08-21 | Closed | 2008-01-22 | ||
D-071009-1 | Follow up the set up of net mon boxes at each site. At RAL the box is not on the same network as the SE which makes it useless. | Andrew | 2007-10-31 | Closed | 2008-01-22 | 23-10-07: Andrew will follow up one last time!
| |
D-071106-1 | See why LHCb software not being installed on Oxford SL4 cluster. | Raja | 2007-11-13 | Closed | PG and RN will talk | ||
D-071015-1 | Set up Glasgow to take part in the adhoc ATLAS DDM functionality tests. | Graeme | 2007-11-31 | Closed | Glasgow (and other T2s) took part in successful DDM FT07 test in January. | ||
D-071120-1 | Ticket gStat about filtering 4444/6666 values with CE plugins fail (otherwise RRD plots are useless) | Graeme | 2007-11-27 | Closed | https://gus.fzk.de/pages/ticket_details.php?ticket=29296 | ||
D-070710-2 | Create wiki page containing information about SRM2.2 for sites | Greig | 2007-07-24 | Closed | 2008-03-25 | Wiki page started http://www.gridpp.ac.uk/wiki/SRM | |
D-080115-1 | Test VO configurator tool and feedback the result | Alessandra | 2008-03-01 | Closed | 2008-03-25 | AF emailed the dteam list, "not too impressed" | |
D-080318-1 | Ask Stephen McCallister to close all footprints tickets where GGUS ticket is solved or verified. | Jeremy | Closed | 2008-03-25 |
| ||
D-071120-2 | Send ATLAS MC monitoring links to dTeam list. | Graeme | 2007-11-27 | Closed | 2008-04-08 | ATLAS Monitoring For Sites
| |
D-080226-1 | Ask how to specify downtimes with different vos and queue lengths | Jeremy | Closed | It is not currently possible. The requirement has been passed on to the GOCDB team.
| |||
D-080506-1 | Follow up with supernemo if they need castor space | Jens or Derek | 2008-05-13 | Closed | |||
D-080506-2 | Check with Matt who the list of NGS software is for. | Jeremy | 2008-05-13 | Closed | The document is not really aimed at GridPP. The PMB recently reviewed GridPP-NGS overlaps and while we should continue to support the interoperation, adjusting the software list is not a top priority.
| ||
D-080506-3 | Ask sites comments on the networking results circulated by Andrew | T2C | 2008-05-13 | Closed | |||
D-080506-4 | Follow up with sites on the local mon box throughput | T2C | 2008-05-13 | Closed |
| ||
D-080603-1 | Setup a Wiki for T2Cs to post a report of the highlights of the GDB meetings | Jeremy Coles | 2008-06-14 | Closed | 2008-06-10 | Setup by someone and edited by me! | |
D-080610-2 | Sites to ensure one CE is marked in GOC DB as an APEL node. | T2C | 2008-06-17 | Closed | ScotGrid done. | ||
D-080610-1 | Any UK sites interested in trialling a CREAM CE? (e.g., IC) | JC | 2008-06-17 | Closed | Barry is doing this at IC
| ||
D-071009-1 | Get public access to the network tests results at RAL. | Derek | 2007-10-31 | Ongoing | Looks unlikely to happen after speaking to Martin. Andrew to follow up with why this is necessary. | ||
D-080520-1 | T2s to discuss deployment of federated nagios monitoring of SAM tests | T2C | 2008-05-27 | closed | This is now becoming urgent. Sites should have Nagios installed by the end of September. AE explained his findings at the dteam meeting on 9.9.08. See ST tutorial at EGEE conference. | ||
D-090113-01 | Confirm how to apply for regional status in the GOC. | Jeremy | 2009-01-20 | closed | 2009-02-10 | Regional status is applied for in the GOC DB under 'request new role'
| |
D-080325-1 | Follow up with Guenter re UK TPM team update (Duncan replacing Olivier) | Jeremy | 2008-03-25 | Closed | Followed up and Pete also mailed Guenter. What about training? TPM update done but what about the training? Duncan now T7. This must be completed now... | ||
D-090113-02 | T2Cs to apply for regional status in the GOC. | T2Cs | 2009-01-20 | Closed (by DTR) | Done: PG,KM,DM,JCullen,AF,DB,DTR,GS | ||
D-090505-01 | Put Kashif in contact with appropriate people for WLCG Nagios | Jeremy | 2009-05-12 | Completed | |||
D-080226-2 | Follow up on downtime broadcasts scope and the current status | Jeremy | Closed | 2008-06-10 | Broadcast scope was changed and spam notices reduced by use of more targeted selections. | ||
D-080826-1 | Follow up with Steve Lloyd re maintenance of SL's tests | Jeremy | 2008-08-26 | Closed | QMUL have now installed a new machine for the tests. JC to follow up again with SL on a backup to cover periods when he is away. JC followed up and QMUL now have backup sysadmin locally. No new problems reported. | ||
D-080930-2 | to follow up re updating data in gocdb | Jeremy | 2008-09-30 | Closed | JC could not remember which data so would review the minutes again. It was contact information and this was reviewed.
| ||
D-081021-1 | to check where email addresses for GGUS ticket allocation is being taken from | Jeremy | 2008-10-21 | Closed | A wiki page was setup by Alessandra and this is now in use by the UKI ticket assignment team.
| ||
D-081125-01 | to check what's the plan for the availability accounting and how it is done now to exclude periods that are not site fault | Jeremy | 2008-12-02 | Closed | Various discussions indicated that the only flexible system is for sites to raise events on a case by case basis. Sites should do this either via their T2 coordinator or directly to Jeremy. | ||
D-090120-02 | Feedback information about rpm versioning for updates and patches to GDB. | Pete and everyone | 2009-01-27 | Progress | Feedback collated from tb-support emails. Marcus was sent UK input. | ||
D-090317-01 | Look for failures during CERN network downtime on 19th, report to Jeremy | everyone | 2009-03-19 | CLOSED, as the 19th is long passed. | Nothing was reported. | ||
D-070807-3 | Look at whether it is feasible for information about an interrupted job to flow into the L&B system | David | 2007-08-14 | Progress | Had no response from Italy, will ask again about this. 23-10-07: Still in progress.
| ||
JC-070822-1 | Develop criteria and process for removing site performance data where the site is taking risks for GridPP | Jeremy/DTEAM | 2007-10-12 | Progress | 2008-06-10 | The dates for the next round will be discussed at the DB - therefore this is becoming a priority. June08- Next round will run from September.
| |
JC-080708-1 | Implement site performance wiki page that reads current graphs | Jeremy | 2008-07-30 | Progress | 2009-07-30 | Item was on hold pending questions about site report integration. Template page now created and this will go live by the end of July.
| |
D-080930-1 | Follow up with NGS; someone from NGS should join the dteam mailing list or dteam meeting when requested | Jens | 2008-09-30 | CLOSED | John Quley (sp?) and Jens Jensen are both known to be available - Jens is an optimal choice, since he attends anyway.
| ||
D-090120-01 | Sergey to create a post-mortem about VOMS incident. | Sergey | 2009-01-27 | Progress | the post-mortem has been created. Wiki entry still outstanding. | ||
D-090317-02 | Feedback on gridview dashboard to Jeremy | everyone | 2009-03-24 | Some items in mins of meeting 090317 (CLOSED) | |||
D-090623-01 | Formally announce closure of IC-LeSC site to ROC (ROC = JC in this case) | LT2 (Daniela/Duncan/Dave) | 2009-06-30 | Closed | 2009-07-07 | Site marked as closed in early July. Warnings were sent to users in advance. No feedback from GridPP management or users. | |
D-090623-04 | Publicise Glasgow STEP post mortem on TB-SUPPORT | Graeme/Sam | 2009-06-30 | Closed | 2009-06-23 | Message sent to TB-SUPPORT | |
D-090623-13 | Review actions older than this meeting to update status and close if possible | Jeremy | 2009-07-07 | Closed | Actions reviewed and closed where possible/appropriate. | ||
D-090505-02 | Find information about TeraGrid tag publishing | Jens | 2009-05-12 | Closed | 2009-07-07 | Publishing applications on NGS and TG
| |
D-090623-02 | Investigate load/performance issues on Glasgow WMSs | Mike | 2009-07-07 | Closed | 2009-07-31 | Steve Lloyd test failures coincident with backup of WMS database, which was locking the DB (for ~45mins) and preventing job-handling. Glasgow will implement binary logging on the DB to greatly improve backup performance.
| |
D-090623-06 | Cross check between LRMS records and APEL at Glasgow | Mike | 2009-07-07 | Done | Largely ok, but open issue with some VOs. GGUS #49246. Script for torque logs was written + circulated. | ||
D-090623-07 | Cross check between LRMS records and APEL at Manchester | James | 2009-07-07 | Done | 2009-08-18 | Manchester had problems with accounting records around Christmas 2008. In the months since then the APEL and pbs records match well. | |
D-090623-08 | Cross check between LRMS records and APEL at Oxford | Pete | 2009-07-07 | Done | 2009-7-14 | The script has been run and a spreadsheet comparing batch results with APEL produced.
| |
D-090623-10 | Summarise Glasgow progress on resiliency | Dug | 2009-07-07 | CLOSED | Page created, Glasgow's info added, space for other sites to add info.
| ||
D-090707-01 | Examine and potentially revise method of site reports (with reference to comments on reliability graphs, etc) | Jeremy | 2009-07-14 | Closed | 2009-10-21 Feedback sent to EGEE ops & SA1. New GridPP page in prototype. | ||
D-090714-01 | Promote the 'Glasgow' method to discover site scaling limitations for ATLAS analysis jobs. Schedule HC panda test. | Sam | 2009-07-21 | Closed | 2009-07-21 |
| |
D-090714-02 | Understand why WMSs refuse some Manchester VOMS server issued credentials and raise GGUS ticket. | Jens | 2009-07-21 | Closed | 2009-07-21 | See Jon Churchill's mail to Rollout or NGS-Ops
| |
D-090714-03 | Investigate ATLAS feedback for running on SL5. | Graeme | 2009-07-28 | Closed | RALPP to migrate; also T1 (check FZK for progress). | ||
D-090714-05 | Summarise SL5 discussion for PMB | Jeremy | 2009-07-21 | Closed | Summary with JG. Done September. | ||
D-090623-11 | LHCb to compare APEL records with internal accounting | Raja | 2009-07-21 | Closed | Raja and Graeme agreed on what to do - and did it.
| ||
D-091013-01 | Setup a wiki page to track site SL5 migration status | Jeremy | Closed | Set up as http://www.gridpp.ac.uk/wiki/Site_status_and_plans | |||
D-090623-03 | Investigate load/performance issues on IC WMS | Daniela -> Barry | 2009-07-07 | Closed | 2009-07-07 | Hardware is v old (from Barry) | |
D-090623-09 | Cross check between LRMS records and APEL at IC-HEP | Duncan | 2009-07-07 | Closed | Waiting on scripts (should have been distributed by now) | ||
D-091027-01 | Distribute details on tweaking NFS threads for software areas. | Sam | 2009-11 | Close | |||
D-091027-02 | Check on issues with multiple LHCb VOMS servers and post instructions if necessary | Raja | 2009-11 | Closed | 2010-26 | ||
D-091027-03 | Find out details of Lancaster SCAS trials. | Jeremy | 2009-11 | Closed | Information circulated early November. | ||
D-091027-04 | Post some notes on subclusters/CEs to help with logical/physical CPU publishing | Derek | 2009-11 | Closed | Report sent to DTeam list | ||
D-090623-12 | Prioritisation schemes used for T3 access in other countries? JC to ask GDB contacts | Jeremy | 2009-07-14 | Closed | 2010-04-27 | 20-07-09 Asked John Gordon on how to best progress this in GDB context and also for his initial input. DB request (and PMB action) for test with Panda (with Graeme). Experiments have no interest in this at the moment - feedback given to PMB. Need to mail DB. | |
D-091103-01 | Tier 2 coordinators to review security document | Tier 2 Coordinators | Closed | 2010-04-27 | |||
D-100511-02 | Circulate details on BNL VOMS certificate configuration | Graeme | Closed | 2010-05-17 | Message posted to TB-SUPPORT | ||
D-100620-01 | Verify that Santanu has actually fixed Cambridge for ATLAS. | Pete | Closed | 22-07-2010 | Santanu has double checked the problem is fixed, and Graeme reopened the site | ||
D-100620-02 | Pete to ask the PMB for clarity on using the 2009 or 2010 MOUs for the Tier-2 Q2 reports. | Pete | Closed | 20-07-2010 | Steve Lloyd confirmed we should start using the 2010MoU's in our Q3 report, although the WLCG will be looking for these levels from June 2010. | ||
D-091103-01 | Tier 2 coordinators to review security document | Tier 2 Coordinators | Closed | 2010-09-21 | |||
D-100511-01 | Ensure BNL VOMS certificate is accepted for ATLAS users | T2Cs | Closed | 2010-09-21 |
| ||
D-090707-03 | With reference to D-090623-10, T2s and sites to contribute to resiliency page | everyone? http://www.gridpp.ac.uk/wiki/Resiliency_and_Disaster_Planning | 2009-07-14 | Closed | 2010-11-02 | 20-07-09 JC asked sites to add to the page. Review over next month. 02-11-10 Noted no further additions, considered a product of its time. Perhaps return to it at HepSysMan.
| |
D-100616-01 | Post links from HEPSYSMAN talk on "the wiki" | Mingchao | Closed | 2010-11-02 | |||
D-100907-01 | Ask CMS and LHCb for inputs on building an "upcoming events" page for sites (ATLAS sample: http://atlas-speakers-committee.web.cern.ch/atlas-speakers-committee/ConfTalks2010.html). | Jeremy | 2010-09-21 | Closed | 2010-11-02 | No such page available. iCal page at http://svr001.gla.scotgrid.ac.uk/cgi-bin/ukidowntime.py lists Atlas major dates and other UKI sites downtimes. | |
D-100907-02 | Continue to investigate transfer problems from RAL-NDGF (GGUS 61306, 61835). | Graeme, Gareth | 2010-09-14 | Closed | Took 29 days to identify a faulty line card in CERN OPN router. Atlas have requested a post-mortem as it took far too long | ||
D-101019-02 | GSTAT now contains lists of what each Federation has pledged, each Tier 2 coordinator should check to see that their Tier 2 pledge is correct | T2 coordinators | 2010-10-19 | Closed | 2010-11-02 | ||
D-090721-01 | Produce DN list for dteam for AuthZ | Jeremy | 2009-07-28 | Closed | 2010-11-02 | List available but not online. Still progressing. | |
D-091013-02 | All sites please check their country/ROC designation, http://gstat-prod.cern.ch/gstat/summary/country/; For help, see http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_my_site_information . Also check logical / physical CPU and storage info. | ALL sites | 2009-10-27 | CLOSED | Possible meeting at the next sites meeting. | ||
D-100907-03 | Make a decision on whether to use WMS monitoring a la http://svr031.gla.scotgrid.ac.uk/rbwmsmon/monitoring.html at RAL and IC. | Gareth, Catalin, Daniela, Duncan | 2010-09-21 | CLOSED (2011-02-8 meeting) | 2010-11-02 Not at Imperial - not used enough to justify. | ||
D-101019-01 | Review and document experiment procedures for failed disk servers at Tier-2s | Sam, Brian, Wahid | 2010-10-19 | Closed | 2011-01-12 Relevant pages exist at SRM_File_Loss and SE_Lost_Disk-Server which are being updated in the latter case. | ||
D-110222-01 | Publicise ATLAS sonar test links and presentations | Graeme | 2011-03-01 | Closed | 2011-02-22 | Email sent to dteam list. | |
D-110222-02 | Pass site request to be able to ask LHCb pilots not to pick up new work to DIRAC team | Raja | 2011-03-01 | Closed | |||
D-110222-03 | Find a NorthGrid site willing to support CERN@SCHOOL VO | Alessandra | 2011-03-01 | Closed | Manchester is enabling it | ||
O-130205-01 | Check that the documentation on how to close a CE for downtime is complete. | Steve Jones/CJW & JC | 05-02-2013 | Closed | CJW has agreed to review this document when created. Up for discussion in March meeting. Some discussion. SJ documented the options at https://www.gridpp.ac.uk/wiki/Scheduled_Downtimes Can't reach full consensus. Close. |
See also: Deployment Team Action items