RAL Tier1 CASTOR Experiments Completed Actions 2008

From GridPP Wiki
Revision as of 11:55, 13 January 2009 by James Thorne

RAL Tier1 CASTOR Experiments actions closed in 2008.

Action ID Priority Experiment(s) Owner Description Status Completed date
A-20071128-03 All Tim Folkes Set up tape migration to give writes priority over reads. Tim to make change. 16/01/2008
A-20071219-03 All Shaun De Witt, Bonny Strong Plan how to operate shared CASTOR instance for small experiments. 16/01/2008
A-20080109-01 LHCb Derek Ross Send list of pools not yet released for LHCb to Raja. 16/01/2008
A-20080109-03 ALICE Andrew Sansum Follow up with John Gordon regarding ALICE CASTOR. 16/01/2008
A-20080109-04 CMS Chris Kruk Free up 3rd CMS disk server for testing. 16/01/2008
A-20080109-05 ATLAS Shaun De Witt, Catalin Condurache Send Frederic a list of files being transferred between tape and disk. 16/01/2008
A-20080116-01 All David Corney Progress Glue Schema issues, especially information about which versions are being processed. Details required for the next meeting. 23/01/2008
A-20080116-08 LHCb Bonny Strong Two new boxes required to set up new SRM V1s for LHCb to test stripping (Bonny to clarify with Raja and Philippe Charpentier). 23/01/2008
A-20080116-11 All Andrew Sansum, David Corney Confirm plans to move to ORACLE RAC 23/01/2008
A-20071114-03 All Cheney Ketley Speak to Jonathan Wheeler to try and resolve Nagios problems on CASTOR servers. Nagios configuration done. Investigation of dual RAID cards in disk servers ongoing. 30/01/2008
A-20071212-02 All Jens Jensen, Chris Kruk Set up dynamic publishing of free space information. Aim for the end of January 2008.
2008-01-09: Review after CCRC08 meeting on 10 January.
30/01/2008
A-20080109-06 All I-20070725-01, Andrew Sansum, David Corney, Tim Folkes Progress ordering of extra tape drives. Tape drives ordered. 30/01/2008
A-20080116-09 All Derek Ross Advise Tier2s of CCRC08, summarising the information we currently have. Ongoing. 30/01/2008
A-20080123-01 CMS James Thorne Install gcc and James Jackson's key on the CMS test disk servers. gcc and key installed. James J. has confirmed he can login. 30/01/2008
A-20080123-02 ATLAS Stephen Burke Find out who is CCRC08 contact for ATLAS. 30/01/2008
A-20080123-06 ATLAS Matt Hodges Send list of files transferred via FTS to gdss159 to Catalin. 30/01/2008
A-20071219-01 CMS Tim Folkes Reserve a number of tape drives for CMS test. 06/02/2008
A-20080109-02 LHCb Tim Folkes, Catalin Condurache, Derek Ross Try to reduce time taken to stage & migrate a tape so that LHCb tape migration can be completed by end of January Ongoing 06/02/2008
A-20080116-02 All Bonny Strong, Matt Hodges Check FTS2 upgrade plans prior to CCRC08. Is it doable, usable, stable? Do we need it? Ongoing, waiting on Matt. 06/02/2008
A-20080116-03 BaBar Chris Brew Confirm with Matt when excess BaBar disc has been released for use elsewhere. Escalate with BaBar to allow MINOS some disk. All but two NFS servers released. Waiting on Tier1. 06/02/2008
A-20080116-07 LHCb Andrew Sansum Set up access for LHCb server accounts (Raja to send Chris SSH key). Raja's key sent to Chris Kruk. Andrew needs to request a box from Fabric Team. Using a single monitoring key. 06/02/2008
A-20080116-10 CMS Bonny/Shaun Review optimum configuration of disk servers based on James's analysis. Waiting for analysis. 06/02/2008
A-20080123-03 All Andrew Sansum Roll out network routing change, probably during CASTOR downtime on Tuesday, 29/01/2008. Done for all but new CMS servers from Tier2. 06/02/2008
A-20080123-04 CMS James Jackson, Andrew Sansum Run RFIO tests on disk servers with fabric team monitoring. Ongoing. 06/02/2008
A-20080116-06 All Matt Hodges (with support from Andrew Sansum if required) Confirm disk allocation and deployment timetable is happening this week (16th Jan), and advise the experiments and the CASTOR team of the outcome. Received information from ALICE. Need confirmation from Glenn. 13/02/2008
A-20080130-03 All Matt Hodges Track/resolve certificate problems with FTS. Ongoing. Certificates need to be real name of machine. 13/02/2008
A-20080130-04 CMS James Jackson Define everything CMS want on tape. Ongoing. 13/02/2008
A-20080130-07 All Derek Ross Check/review versions of gLite and other software needed for CCRC08 (e.g. on worker nodes) and update if necessary. Done, but worker node update still needs doing. 13/02/2008
A-20080206-03 CMS Derek Ross, Matt Hodges, Jens Jensen Apply for certificates for FTS hosts. See A-20080130-03. 13/02/2008
A-20080206-05 CMS James Jackson Get tape drive stats from CMS 10.5 TB staging test. 13/02/2008
A-20080206-07 ALICE Andrew Sansum Send Shaun the location of the ALICE configuration requirements. 13/02/2008
A-20080206-08 ALICE Bonny Strong, Shaun De Witt CASTOR disk server for ALICE. 13/02/2008
A-20080206-09 All VO reps Check dates for planned upgrades in March and feed back any problems. 13/02/2008
A-20080206-02 Low CMS Chris Brew Determine versions of gSOAP etc. and tell Shaun and Jens. Jens and Shaun informed. Closed. 20/02/2008
A-20080206-04 Low ATLAS Shaun De Witt Check whether stager rm or nsrm happens first (for Brian) as reported and actual free space differ. Closed 20/02/2008
A-20080213-01 LHCb Raja Nandakumar Email Shaun the LHCb dashboard address. It is http://dashboard.cern.ch/lhcb 20/02/2008
A-20080213-03 CMS Bonny Strong The number of job slots on the 13 Tier2 disk servers given to CMS needs correcting as currently set to 50. Closed. 20/02/2008
A-20080213-05 All James Jackson, Andrew Sansum Decide on agenda items for meeting with CERN on 18/02/2008. Closed. 20/02/2008
A-20080206-01 Low All Raja Nandakumar LCG infosites to publish space tokens. Raja raised a ticket. Closed. 27/02/2008
A-20080213-04 High All Bonny Strong, Martin Bly Arrange meeting to discuss and plan backplane replacements. Meeting held. Closed. 27/02/2008
A-20080220-03 Medium All Matt Hodges Clarify with Glenn experiment requirements and confirm free space for ATLAS. Done. Closed. 27/02/2008
A-20080220-05 High All Andrew Sansum Publicise scale of March disruptions to PMB. Done. Closed. 27/02/2008
A-20080220-07 Medium All Martin Bly, Bonny Strong Add tuning to deployment procedure. Done. Closed. 27/02/2008
A-20080220-08 Medium All Martin Bly Send info on CMS tuning changes to Bonny. Done. Closed. 05/03/2008
A-20080227-01 High All Derek Ross, Martin Bly Identify how best to notify VOs of backplane plans. Closed. 05/03/2008
A-20080130-02 High All Chris Kruk Investigate HTTP alternative to NFS mount for LSF. Closed. 12/03/2008
A-20080305-01 High All Bonny Strong Review and update upgrade schedule and inform VOs. Due to David Edoh's departure. Closed. 12/03/2008
A-20080206-06 Low ALICE Shaun De Witt Put CERN CASTOR team in touch with RAL to get CASTOR + xrootd cookbook. Cristina chased up CERN re. this. CERN have responded via email to Bonny. Bonny will let Cristina know if the response is what she was looking for.

2008-02-27: Shaun will contact Bonny to see if cookbook is what she was looking for as Cristina would like feedback.

2008-03-12: Closed.

12/03/2008
A-20080220-02 Medium All Andrew Sansum Long term Investigate level at which the Tier1 reports available disk to the User Board. 2008-03-05: RAS emailed interested parties; waiting for feedback on suggestions.

2008-03-19: Done and closed.

19/03/2008
A-20080305-02 Medium ATLAS, LHCb Catalin Condurache Estimate completion date for dCache->CASTOR migration. Done: End of April 19/03/2008
A-20080312-01 ATLAS Catalin Condurache Confirm ATLAS VO can be progressed. Closed as nonsensical. 19/3/2008
A-20080312-02 All Experiment reps Review dates in Bonny's plan and confirm OK. Closed. 19/03/2008
A-20080312-03 All Tim Folkes Send list of lost files to James Jackson, Stephen Burke, Frederic Brochu and Shaun De Witt. Closed. 19/03/2008
A-20080220-01 Low All Shaun De Witt Script to determine files in stager but not in name server. Closed 26/03/2008
A-20071017-01 Medium All James Jackson Investigate problems with disk to tape migration rates. Met with CERN developers; got hints for improvement which are being tested.

2008-02-27: Looks like changes to IO scheduler and read ahead value may help.

2008-03-05: Still investigating scheduler and read ahead.

2008-03-19: On hold while James works on other things. Big read aheads and cfq seem to give better performance.

2008-03-26: Still on hold.

2008-04-02: Closed as added as an agenda item.

02/04/2008
A-20080213-02 Medium All David Corney Report back on the power outage post-mortem once complete. Will probably report on 27/02/2008. Report sent. Closed. 16/04/2008
A-20080220-06 Medium All Martin Bly Apply appropriate tuning to existing disk servers. 2008-02-27: CMS done.

2008-03-05: Needs applying to others.

2008-03-19: Will be complete once backplane swaps done.

Closed.

16/04/2008
A-20080305-03 Low All Chris Brew, Matt Hodges Draft text explaining why we configure the FTS the way we do and put on wiki. Closed. 16/04/2008
A-20080319-01 CMS Bonny Strong, Chris Brew Release Tier2/CMS borrowed servers by 8 April. Released. Closed. 16/04/2008
A-20080319-03 CMS Bonny Strong Send list of lost files to CMS. Closed. 16/04/2008
A-20080402-01 High All Shaun De Witt Remove unnecessary announcements from, and add new ones to, GOCDB. Closed. 16/04/2008
A-20080402-02 ATLAS Bonny Strong, Catalin Condurache Try to get more information for tickets like "ATLAS SRM doesn't work". Closed. 16/04/2008
A-20080402-03 LHCb Bonny Strong, Shaun De Witt Create plan for 2.1.6 upgrades. Closed. 16/04/2008
A-20080402-04 All Shaun De Witt Check on status of gdss153. Closed. 16/04/2008
A-20080319-02 Low ATLAS Andrew Sansum, Brian Davies Decide whether to apply WAN tuning to ATLAS T0_raw. Closed. 07/05/2008
A-20080402-05 LHCb Chris Kruk, James Thorne Check on "down" machines in LHCb ganglia (124 and 163). 124 moved to ATLAS. Need to check on 163. Closed. 07/05/2008
A-20080416-01 ALICE Shaun De Witt Check on xrootd progress for ALICE. Closed. 07/05/2008
A-20080430-01 ALICE Derek Ross Check ALICE space token requirements and send to Shaun De Witt. Closed. 07/05/2008
A-20080430-02 ALICE Catalin Condurache, Bonny Strong, Cristina Lazzeroni Follow up xrootd install, meeting on afternoon of 30/4/2008? Closed. 07/05/2008
A-20080130-01 Low All Derek Ross Escalate necessity to be able to put individual instances and components of service into downtime. Ongoing. Need ability to separate individual VOs.

2008-04-30: Will soon have individual instances of services for each VO.

2008-05-21: Closed

21/05/2008
A-20080416-02 High All Chris Kruk Upgrade repack server so Tim can begin testing. Closed 07/05/2008
A-20080507-01 ATLAS Brian Davies Ticket to Tier1 helpdesk to apply WAN tuning to ATLAS disk servers. Closed 07/05/2008
A-20080514-05 LHCb Raja Nandakumar Send name of LHCb tape migration "expert" to James Jackson. Closed 07/05/2008
A-20080514-02 CMS Chris Brew Modify number of streams/files to Taiwan. Shaun would prefer that this was not done until after the ATLAS T1-T1 tests (Fri 16 May).

2008-05-21: Waiting until certificate problems are resolved.

2008-05-28: Closed

28/05/2008
A-20080514-04 ATLAS Stephen Burke Send name of ATLAS tape migration "expert" to James Jackson. Brian Davies. Closed. 28/05/2008
A-20080514-01 Medium LHCb Shaun De Witt Tell Martin which LHCb servers to apply WAN tuning to. Closed. 04/06/2008
A-20080521-02 High BaBar Shaun De Witt Contact CERN developers to find out if one can run both RFIO and xrootd on the same CASTOR disk server (cf. A-20071219-02). Closed. 04/06/2008
A-20080528-01 ALICE Catalin Condurache Confirm with Cristina that he has all the answers that he needs regarding xrootd. Closed. 04/06/2008
A-20080528-02 LHCb Bonny Strong Look into slow staging from tape in dCache for LHCb. Closed. 04/06/2008
A-20080528-03 LHCb Bonny Strong Send Raja info on how close LHCb are to garbage collection. Closed. 04/06/2008
A-20080116-04 Low All Andrew Sansum, David Corney Review extension of ADS as official back-end to dCache. Ongoing. Need cost per month. Closed. 18/6/2008
A-20080123-05 Low All Bonny Strong Script to report files that have been lost on a failed disk server. 85% done. Script reports name server path. Closed. 18/6/2008
A-20080220-04 Low All Jens Jensen Contact CERN to see if they want the information provider developed at RAL. Closed. 18/6/2008
A-20080521-01 All Andrew Sansum Investigate routing to PIC via the OPN. Possibly next week. Fixed. Closed. 18/6/2008
A-20080604-01 All Shaun De Witt Dedicate three drives to each experiment to avoid all being affected if an experiment triggers the "tape hogging" bug. Closed. 18/6/2008
A-20080604-02 ALICE Bonny Strong Email Cristina to update her on state of xrootd plus CASTOR. Closed. 18/6/2008
A-20080618-01 All David Corney, Shaun De Witt, Bonny Strong Check what was decided regarding taking SRMv1 machines out of on-call system and how it was decided. They are no longer calling out. Closed. 25/6/2008
A-20071219-02 High BaBar Shaun De Witt Set up CASTOR server for d0t1 for BaBar on gen instance. No disk server. Can we do both RFIO and xrootd on same machine? Closed. 2/7/2008
A-20080116-05 High MINOS Bonny Strong Provide MINOS with scratch disk for testing CASTOR (use "6 month" disk) Need space from that released by BaBar (A-20080116-03). Waiting for disk server. Closed. 2/7/2008
A-20080514-03 Medium All Matt Hodges Gather post-CCRC08 experiment disk provisioning requirements and inform Martin. Closed. 2/7/2008
A-20080625-01 LHCb Shaun De Witt Contact Raja regarding shutting down SRMv1 for LHCb. Closed. 2/7/2008
A-20080604-03 High All Andrew Sansum, Martin Bly Approach experiments to determine their expectations during an LHC downtime. Ongoing. Informal chats so far. Closed. 23/7/2008
A-20080709-01 High All Bonny Strong Confirm CASTOR 2.1.7 upgrade time-scale is OK with Dbase group (given RAC commitments). Closed. 23/7/2008
A-20080820-03 High ATLAS Bonny Strong, Derek Ross, Tier1 Duty Admin Bonny to give Derek and Duty Admin text for announcing downtime until 12.00 Wednesday 27 August. Closed. 10/9/2008
A-20080820-04 High ALICE Chris Kruk Put request for remote access for A-20080806-01 into a ticket for Martin. Closed. 10/9/2008
A-20080416-03 Low All Tim Folkes Upgrade all tape servers' RAM to 8 GB. Tried with one server that already has 8 GB RAM. Waiting for sizable transfers to see if it has made a difference. Tests showed no particular improvement. Closed. 17/9/2008
A-20080130-06 Low All Andrew Sansum Track/progress network tuning and/or different network stack to improve rates to remote sites, e.g. Fermilab. Ongoing. Try transfer tests to ASGC. James Jackson has set up test pool. Now waiting on fabric team. Closed as not going to happen for a long while. 17/9/2008
A-20080820-05 Medium LHCb Bonny Strong, Shaun De Witt Rebalance files within LHCb rDST. Taken off line. Closed. 17/9/2008
A-20080910-01 Medium LHCb Shaun De Witt Re-open Raja's ticket regarding problems on Sunday 7 September and investigate. Done. Closed. 17/9/2008
A-20080910-02 Medium ATLAS Brian Davies Set up discussion with James Jackson and Tim Folkes regarding tape families. Closed. 17/9/2008
A-20080806-01 Medium ALICE Cristina Lazzeroni, Shaun De Witt Possibly arrange for an ALICE xrootd expert to have access to machines. See A-20080820-04. Done. 24/9/2008
A-20080820-01 Medium All Jens Jensen, Bonny Strong Deploy latest CASTOR Information Provider (CIP) on Tuesday 26 August. Done. 24/9/2008
A-20080820-02 Medium All Derek Announce CIP deployment at an ops meeting. Done. 24/9/2008
A-20080917-01 High All James Thorne Bring up delay in disk server provisioning at Fabric meeting. Effort from others? Done. 24/9/2008
A-20080924-01 High All Shaun De Witt, Bonny Strong Investigate possibility of turning off synchronisation of stager database to avoid a repeat of the file deletion problem. Closed. 22/10/2008
A-20081015-02 ILC James Thorne Join Jan Strube (rep for ILC) to weekly circulation list. Closed. 22/10/2008
A-20081015-03 All Matt Hodges Send link pointing to more detailed disc deployment info, including service class information to the distribution list for this meeting. Closed. 22/10/2008
A-20081015-04 All Matt Hodges Ensure that the CASTOR team are routinely (weekly?) advised of new experiment disk allocations, ideally outside and before this meeting, but this may be a sensible new agenda item for this CASTOR call? Closed. 22/10/2008
A-20081015-06 All David Corney Check with CERN about their plans (if any) to move to CASTOR 2.1.9. Closed. 22/10/2008
A-20081015-07 All Matt Hodges Send info on new experiments contacts to James Thorne, who will add them to the circulation list and standard weekly request for reminders to join this meeting. Closed. 22/10/2008
A-20081015-09 All? Shaun De Witt Investigate xrootd transfer failures. Closed. 22/10/2008
A-20080618-02 CMS Chris Brew CMS to test GridFTPv2 internal. Ongoing, waiting for disk servers. On hold until after first run. Closed. 29/10/2008
A-20081015-05 All Experiment reps, Matt Hodges Experiments need to contact Matt formally, by email, with requests for new disc deployment. CMS and ATLAS have agreed to this. LHCb (Raja) also needs to be consulted. Closed. 29/10/2008
A-20081015-01 CMS Chris Brew Send copy of CMS tape service plans to Bonny and Shaun. Done. 29/10/2008
A-20081015-08 All James Thorne Circulate results of report from James Thorne arising from action A-20080130-05 to meeting participants. Done. 12/11/2008
A-20081029-01 LHCb Bonny Strong Provide list of lost files to Raja. Done. 12/11/2008
A-20081029-02 All Shaun De Witt Send summary of cross talk and Big ID problems to John Gordon. Done. 12/11/2008
A-20081029-03 All Bonny Strong Initiate discussion in RAL CASTOR team regarding setting up nonProd as a service class for CIP publishing. Done. 12/11/2008
A-20081112-01 ILC Bonny Strong Give ILC access to disk on the gen instance by the end of this week. Done. 19/11/2008
A-20081112-02 ATLAS Bonny Strong, Shaun De Witt Chase for decision on ATLAS disk sitting in nonProd. Done. 19/11/2008
A-20081112-04 All David Corney Propose moving meeting to Wednesday afternoon. David to send out proposal. Done. 19/11/2008
A-20081119-03 All David Corney, James Thorne Move meeting to 13.30 on Wednesdays. Done. 26/11/2008
A-20081119-01 All Experiment reps. Add experiment test plans to the GridPP wiki and send links to James Thorne for adding to agenda. Closed. 26/11/2008
A-20081022-01 Medium All James Thorne Investigate rumoured performance hit of new 3ware firmware. Performance increase on ext3 filesystems. Done. 10/12/2008
A-20081126-01 All Shaun De Witt, Experiment Reps Co-ordinate testing of upgraded SRM endpoints. Done. 10/12/2008
A-20081126-03 CMS James Jackson Arrange a meeting to discuss testing multiple service classes on one disk pool (with Shaun De Witt, Chris Brew,...). Done. 10/12/2008
A-20081126-05 ALICE Peter Faulkner Ask Cristina to test xrootd on gen instance and let Shaun know whether it's working. Done. 10/12/2008
A-20081126-06 All Andrew Sansum, Martin Bly Circulate plan for move to new building, when complete. Done. 10/12/2008