RAL Tier1 CASTOR Experiments Completed Actions 2008
Latest revision as of 11:55, 13 January 2009
Actions from RAL Tier1 CASTOR Experiments Actions closed in 2008.
|Action ID||Priority||Experiment(s)||Owner||Description||Status||Completed date|
|A-20071128-03||All||Tim Folkes||Set up tape migration to give writes priority over reads.||Tim to make change.||16/01/2008|
|A-20071219-03||All||Shaun De Witt, Bonny Strong||Plan how to operate shared CASTOR instance for small experiments.||16/01/2008|
|A-20080109-01||LHCb||Derek Ross||Send list of pools not yet released for LHCb to Raja.||16/01/2008|
|A-20080109-03||ALICE||Andrew Sansum||Follow up with John Gordon regarding ALICE CASTOR.||16/01/2008|
|A-20080109-04||CMS||Chris Kruk||Free up 3rd CMS disk server for testing.||16/01/2008|
|A-20080109-05||ATLAS||Shaun De Witt, Catalin Condurache||Send Frederic a list of files being transferred between tape and disk.||16/01/2008|
|A-20080116-01||All||David Corney||Progress Glue Schema issues, especially information about which versions are being processed. Details required for the next meeting.||23/01/2008|
|A-20080116-08||LHCb||Bonny Strong||Two new boxes required to set up new SRM V1s for LHCb to test stripping (Bonny to clarify with Raja and Philippe Charpentier)||23/01/2008|
|A-20080116-11||All||Andrew Sansum, David Corney||Confirm plans to move to ORACLE RAC||23/01/2008|
|A-20071114-03||All||Cheney Ketley||Speak to Jonathan Wheeler to try and resolve Nagios problems on CASTOR servers.||Nagios configuration done. Investigation of dual RAID cards in disk servers ongoing.||30/01/2008|
|A-20071212-02||All||Jens Jensen, Chris Kruk||Set up dynamic publishing of free space information.|| Aim for the end of January 2008.
2008-01-09: Review after CCRC08 meeting on 10 January.
|A-20080109-06||All||I-20070725-01, Andrew Sansum, David Corney, Tim Folkes||Progress ordering of extra tape drives.||Tape drives ordered.||30/01/2008|
|A-20080116-09||All||Derek Ross||Advise Tier2s of CCRC08, summarising the information we currently have.||Ongoing.||30/01/2008|
|A-20080123-01||CMS||James Thorne||Install gcc and James Jackson's key on the CMS test disk servers.||gcc and key installed. James J. has confirmed he can log in.||30/01/2008|
|A-20080123-02||ATLAS||Stephen Burke||Find out who is CCRC08 contact for ATLAS.||30/01/2008|
|A-20080123-06||ATLAS||Matt Hodges||Send list of files transferred via FTS to gdss159 to Catalin.||30/01/2008|
|A-20071219-01||CMS||Tim Folkes||Reserve a number of tape drives for CMS test.||06/02/2008|
|A-20080109-02||LHCb||Tim Folkes, Catalin Condurache, Derek Ross||Try to reduce time taken to stage & migrate a tape so that LHCb tape migration can be completed by end of January||Ongoing||06/02/2008|
|A-20080116-02||All||Bonny Strong, Matt Hodges||Check FTS2 upgrade plans prior to CCRC08. Is it doable, usable, stable? Do we need it?||Ongoing, waiting on Matt.||06/02/2008|
|A-20080116-03||BaBar||Chris Brew||Confirm with Matt when excess BaBar disc has been released for use elsewhere||Escalate with BaBar to allow MINOS some disk. All but two NFS servers released. Waiting on Tier1.||06/02/2008|
|A-20080116-07||LHCb||Andrew Sansum||Set up access for LHCb server accounts (Raja to send Chris SSH key)||Raja's key sent to Chris Kruk. Andrew needs to request a box from Fabric Team. Using a single monitoring key.||06/02/2008|
|A-20080116-10||CMS||Bonny/Shaun||Review optimum configuration of disk servers based on James's analysis.||Waiting for analysis.||06/02/2008|
|A-20080123-03||All||Andrew Sansum||Roll out network routing change, probably during CASTOR downtime on Tuesday, 29/01/2008.||Done for all but new CMS servers from Tier2.||06/02/2008|
|A-20080123-04||CMS||James Jackson, Andrew Sansum||Run RFIO tests on disk servers with fabric team monitoring.||Ongoing.||06/02/2008|
|A-20080116-06||All||Matt Hodges (with support from Andrew Sansum if required)||Confirm disk allocation and deployment timetable is happening this week (16th Jan), and advise the experiments and the CASTOR team of the outcome||Received information from ALICE. Need confirmation from Glenn.||13/02/2008|
|A-20080130-03||All||Matt Hodges||Track/resolve certificate problems with FTS.||Ongoing. Certificates need to be real name of machine.||13/02/2008|
|A-20080130-04||CMS||James Jackson||Define everything CMS want on tape.||Ongoing.||13/02/2008|
|A-20080130-07||All||Derek Ross||Check/review versions of gLite and other software needed for CCRC08 (e.g. on worker nodes) and update if necessary.||Done, but worker node update still needs doing.||13/02/2008|
|A-20080206-03||CMS||Derek Ross, Matt Hodges, Jens Jensen||Apply for certificates for FTS hosts.||See A-20080130-03.||13/02/2008|
|A-20080206-05||CMS||James Jackson||Get tape drive stats from CMS 10.5 TB staging test.||13/02/2008|
|A-20080206-07||ALICE||Andrew Sansum||Send Shaun the location of the ALICE configuration requirements.||13/02/2008|
|A-20080206-08||ALICE||Bonny Strong, Shaun De Witt||CASTOR disk server for ALICE.||13/02/2008|
|A-20080206-09||All||VO reps||Check dates for planned upgrades in March and feed back any problems.||13/02/2008|
|A-20080206-02||Low||CMS||Chris Brew||Determine versions of gSOAP etc. and tell Shaun and Jens.||Jens and Shaun informed. Closed.||20/02/2008|
|A-20080206-04||Low||ATLAS||Shaun De Witt||Check whether stager rm or nsrm happens first (for Brian) as reported and actual free space differ.||Closed||20/02/2008|
|A-20080213-01||LHCb||Raja Nandakumar||Email Shaun the LHCb dashboard address.||It is http://dashboard.cern.ch/lhcb||20/02/2008|
|A-20080213-03||CMS||Bonny Strong||The number of job slots on the 13 Tier2 disk servers given to CMS needs correcting as currently set to 50.||Closed.||20/02/2008|
|A-20080213-05||All||James Jackson, Andrew Sansum||Decide on agenda items for meeting with CERN on 18/02/2008.||Closed.||20/02/2008|
|A-20080206-01||Low||All||Raja Nandakumar||LCG infosites to publish space tokens.||Raja raised a ticket. Closed.||27/02/2008|
|A-20080213-04||High||All||Bonny Strong, Martin Bly||Arrange meeting to discuss and plan backplane replacements.||Meeting held. Closed.||27/02/2008|
|A-20080220-03||Medium||All||Matt Hodges||Clarify with Glenn experiment requirements and confirm free space for ATLAS.||Done. Closed.||27/02/2008|
|A-20080220-05||High||All||Andrew Sansum||Publicise scale of March disruptions to PMB.||Done. Closed.||27/02/2008|
|A-20080220-07||Medium||All||Martin Bly, Bonny Strong||Add tuning to deployment procedure.||Done. Closed.||27/02/2008|
|A-20080220-08||Medium||All||Martin Bly||Send info on CMS tuning changes to Bonny.||Done. Closed.||05/03/2008|
|A-20080227-01||High||All||Derek Ross, Martin Bly||Identify how best to notify VOs of backplane plans.||Closed.||05/03/2008|
|A-20080130-02||High||All||Chris Kruk||Investigate HTTP alternative to NFS mount for LSF.||Closed.||12/03/2008|
|A-20080305-01||High||All||Bonny Strong||Review and update upgrade schedule and inform VOs.||Due to David Edoh's departure. Closed.||12/03/2008|
|A-20080206-06||Low||ALICE||Shaun De Witt||Put CERN CASTOR team in touch with RAL to get CASTOR + xrootd cookbook.|| Cristina chased up CERN re. this. CERN have responded via email to Bonny. Bonny will let Cristina know if the response is what she was looking for.
2008-02-27: Shaun will contact Bonny to see if cookbook is what she was looking for as Cristina would like feedback.
|A-20080220-02||Medium||All||Andrew Sansum||Long term: investigate the level at which the Tier1 reports available disk to the User Board.|| 2008-03-05: RAS emailed interested parties; waiting for feedback on suggestions.
2008-03-19: Done and closed.
|A-20080305-02||Medium||ATLAS, LHCb||Catalin Condurache||Estimate completion date for dCache->CASTOR migration.||Done: End of April||19/03/2008|
|A-20080312-01||ATLAS||Catalin Condurache||Confirm ATLAS VO can be progressed.||Closed as nonsensical.||19/3/2008|
|A-20080312-02||All||Experiment reps||Review dates in Bonny's plan and confirm OK.||Closed.||19/03/2008|
|A-20080312-03||All||Tim Folkes||Send list of lost files to James Jackson, Stephen Burke, Frederic Brochu and Shaun De Witt.||Closed.||19/03/2008|
|A-20080220-01||Low||All||Shaun De Witt||Script to determine files in stager but not in name server.||Closed||26/03/2008|
|A-20071017-01||Medium||All||James Jackson||Investigate problems with disk to tape migration rates.|| Met with CERN developers; got hints for improvement which are being tested.
2008-02-27: Looks like changes to IO scheduler and read ahead value may help.
2008-03-05: Still investigating scheduler and read ahead.
2008-03-19: On hold while James works on other things. Big read aheads and cfq seem to give better performance.
2008-03-26: Still on hold.
2008-04-02: Closed as added as an agenda item.
|A-20080213-02||Medium||All||David Corney||Report back on the power outage post-mortem once complete.||Will probably report on 27/02/2008. Report sent. Closed.||16/04/2008|
|A-20080220-06||Medium||All||Martin Bly||Apply appropriate tuning to existing disk servers.|| 2008-02-27: CMS done.
2008-03-05: Needs applying to others.
2008-03-19: Will be complete once backplane swaps done.
|A-20080305-03||Low||All||Chris Brew, Matt Hodges||Draft text explaining why we configure the FTS the way we do and put on wiki.||Closed.||16/04/2008|
|A-20080319-01||CMS||Bonny Strong, Chris Brew||Release Tier2/CMS borrowed servers by 8 April.||Released. Closed.||16/04/2008|
|A-20080319-03||CMS||Bonny Strong||Send list of lost files to CMS.||Closed.||16/04/2008|
|A-20080402-01||High||All||Shaun De Witt||Remove unnecessary announcements from, and add new ones to, GOCDB.||Closed.||16/04/2008|
|A-20080402-02||ATLAS||Bonny Strong, Catalin Condurache||Try to get more information for tickets like "ATLAS SRM doesn't work".||Closed.||16/04/2008|
|A-20080402-03||LHCb||Bonny Strong, Shaun De Witt||Create plan for 2.1.6 upgrades.||Closed.||16/04/2008|
|A-20080402-04||All||Shaun De Witt||Check on status of gdss153.||Closed.||16/04/2008|
|A-20080319-02||Low||ATLAS||Andrew Sansum, Brian Davies||Decide whether to apply WAN tuning to ATLAS T0_raw.||Closed.||07/05/2008|
|A-20080402-05||LHCb||Chris Kruk, James Thorne||Check on "down" machines in LHCb ganglia (124 and 163).||124 moved to ATLAS. Need to check on 163. Closed.||07/05/2008|
|A-20080416-01||ALICE||Shaun De Witt||Check on xrootd progress for ALICE.||Closed.||07/05/2008|
|A-20080430-01||ALICE||Derek Ross||Check ALICE space token requirements and send to Shaun De Witt.||Closed.||07/05/2008|
|A-20080430-02||ALICE||Catalin Condurache, Bonny Strong, Cristina Lazzeroni||Follow up xrootd install, meeting on afternoon of 30/4/2008?||Closed.||07/05/2008|
|A-20080130-01||Low||All||Derek Ross||Escalate necessity to be able to put individual instances and components of service into downtime.|| Ongoing. Need ability to separate individual VOs.
2008-04-30: Will soon have individual instances of services for each VO.
|A-20080416-02||High||All||Chris Kruk||Upgrade repack server so Tim can begin testing.||Closed||07/05/2008|
|A-20080507-01||ATLAS||Brian Davies||Ticket to Tier1 helpdesk to apply WAN tuning to ATLAS disk servers.||Closed||07/05/2008|
|A-20080514-05||LHCb||Raja Nandakumar||Send name of LHCb tape migration "expert" to James Jackson.||Closed||07/05/2008|
|A-20080514-02||CMS||Chris Brew||Modify number of streams/files to Taiwan.|| Shaun would prefer that this was not done until after the ATLAS T1-T1 tests (Fri 16 May).
2008-05-21: Waiting until certificate problems are resolved.
|A-20080514-04||ATLAS||Stephen Burke||Send name of ATLAS tape migration "expert" to James Jackson.||Brian Davies. Closed.||28/05/2008|
|A-20080514-01||Medium||LHCb||Shaun De Witt||Tell Martin which LHCb servers to apply WAN tuning to.||Closed.||04/06/2008|
|A-20080521-02||High||BaBar||Shaun de Witt||Contact CERN developers to find out if one can run both RFIO and xrootd on the same CASTOR disk server (c.f. A-20071219-02).||Closed.||04/06/2008|
|A-20080528-01||ALICE||Catalin Condurache||Confirm with Cristina that he has all the answers that he needs regarding xrootd.||Closed.||04/06/2008|
|A-20080528-02||LHCb||Bonny Strong||Look into slow staging from tape in dCache for LHCb.||Closed.||04/06/2008|
|A-20080528-03||LHCb||Bonny Strong||Send Raja info on how close LHCb are to garbage collection.||Closed.||04/06/2008|
|A-20080116-04||Low||All||Andrew Sansum, David Corney||Review extension of ADS as official back-end to dCache||Ongoing. Need cost per month. Closed.||18/6/2008|
|A-20080123-05||Low||All||Bonny Strong||Script to report files that have been lost on a failed disk server.||85% done. Script reports name server path. Closed.||18/6/2008|
|A-20080220-04||Low||All||Jens Jensen||Contact CERN to see if they want the information provider developed at RAL.||Closed.||18/6/2008|
|A-20080521-01||All||Andrew Sansum||Investigate routing to PIC via the OPN.||Possibly next week. Fixed. Closed.||18/6/2008|
|A-20080604-01||All||Shaun De Witt||Dedicate three drives to each experiment to avoid all being affected if an experiment triggers the "tape hogging" bug.||Closed.||18/6/2008|
|A-20080604-02||ALICE||Bonny Strong||Email Cristina to update her on state of xrootd plus CASTOR.||Closed.||18/6/2008|
|A-20080618-01||All||David Corney, Shaun De Witt, Bonny Strong||Check what was decided regarding taking SRMv1 machines out of on-call system and how it was decided.||They are no longer calling out. Closed.||25/6/2008|
|A-20071219-02||High||BaBar||Shaun De Witt||Set up CASTOR server for d0t1 for BaBar on gen instance||No disk server. Can we do both RFIO and xrootd on same machine? Closed.||2/7/2008|
|A-20080116-05||High||MINOS||Bonny Strong||Provide MINOS with scratch disk for testing CASTOR (use "6 month" disk)||Need space from that released by BaBar (A-20080116-03). Waiting for disk server. Closed.||2/7/2008|
|A-20080514-03||Medium||All||Matt Hodges||Gather post-CCRC08 experiment disk provisioning requirements and inform Martin.||Closed.||2/7/2008|
|A-20080625-01||LHCb||Shaun De Witt||Contact Raja regarding shutting down SRMv1 for LHCb.||Closed.||2/7/2008|
|A-20080604-03||High||All||Andrew Sansum, Martin Bly||Approach experiments to determine their expectations during an LHC downtime.||Ongoing. Informal chats so far. Closed.||23/7/2008|
|A-20080709-01||High||All||Bonny Strong||Confirm CASTOR 2.1.7 upgrade time-scale is OK with Dbase group (given RAC commitments)||Closed.||23/7/2008|
|A-20080820-03||High||ATLAS||Bonny Strong, Derek Ross, Tier1 Duty Admin||Bonny to give Derek and Duty Admin text for announcing downtime until 12.00 Wednesday 27 August.||Closed.||10/9/2008|
|A-20080820-04||High||ALICE||Chris Kruk||Put request for remote access for A-20080806-01 into a ticket for Martin.||Closed.||10/9/2008|
|A-20080416-03||Low||All||Tim Folkes||Upgrade all tape servers' RAM to 8 GB.||Tried with one server that already has 8 GB RAM. Waiting for sizable transfers to see if it has made a difference. Tests showed no particular improvement. Closed.||17/9/2008|
|A-20080130-06||Low||All||Andrew Sansum||Track/progress network tuning and/or different network stack to improve rates to remote sites, e.g. Fermilab.||Ongoing. Try transfer tests to ASGC. James Jackson has set up test pool. Now waiting on fabric team. Closed as not going to happen for a long while.||17/9/2008|
|A-20080820-05||Medium||LHCb||Bonny Strong, Shaun De Witt||Rebalance files within LHCb rDST.||Taken off line. Closed.||17/9/2008|
|A-20080910-01||Medium||LHCb||Shaun De Witt||Re-open Raja's ticket regarding problems on Sunday 7 September and investigate.||Done. Closed.||17/9/2008|
|A-20080910-02||Medium||ATLAS||Brian Davies||Set up discussion with James Jackson and Tim Folkes regarding tape families.||Closed.||17/9/2008|
|A-20080806-01||Medium||ALICE||Cristina Lazzeroni, Shaun De Witt||Possibly arrange for an ALICE xrootd expert to have access to machines.||See A-20080820-04. Done.||24/9/2008|
|A-20080820-01||Medium||All||Jens Jensen, Bonny Strong||Deploy latest CASTOR Information Provider (CIP) on Tuesday 26 August.||Done.||24/9/2008|
|A-20080820-02||Medium||All||Derek||Announce CIP deployment at an ops meeting.||Done.||24/9/2008|
|A-20080917-01||High||All||James Thorne||Bring up delay in disk server provisioning at Fabric meeting. Effort from others?||Done.||24/9/2008|
|A-20080924-01||High||All||Shaun De Witt, Bonny Strong||Investigate possibility of turning off synchronisation of stager database to avoid a repeat of the file deletion problem.||Closed.||22/10/2008|
|A-20081015-02||ILC||James Thorne||Join Jan Strube (rep for ILC) to weekly circulation list.||Closed.||22/10/2008|
|A-20081015-03||All||Matt Hodges||Send link pointing to more detailed disc deployment info, including service class information to the distribution list for this meeting.||Closed.||22/10/2008|
|A-20081015-04||All||Matt Hodges||Ensure that the CASTOR team are routinely (weekly?) advised of new experiment disk allocations, ideally outside and before this meeting, but this may be a sensible new agenda item for this CASTOR call?||Closed.||22/10/2008|
|A-20081015-06||All||David Corney||Check with CERN about their plans (if any) to move to CASTOR 2.1.9.||Closed.||22/10/2008|
|A-20081015-07||All||Matt Hodges||Send info on new experiments contacts to James Thorne, who will add them to the circulation list and standard weekly request for reminders to join this meeting.||Closed.||22/10/2008|
|A-20081015-09||All?||Shaun De Witt||Investigate xrootd transfer failures.||Closed.||22/10/2008|
|A-20080618-02||CMS||Chris Brew||CMS to test GridFTPv2 internal.||Ongoing, waiting for disk servers. On hold until after first run. Closed.||29/10/2008|
|A-20081015-05||All||Experiment reps, Matt Hodges||Experiments need to contact Matt formally, by email, with requests for new disc deployment. CMS and ATLAS have agreed to this. LHCb (Raja) also needs to be consulted.||Closed.||29/10/2008|
|A-20081015-01||CMS||Chris Brew||Send copy of CMS tape service plans to Bonny and Shaun.||Done.||29/10/2008|
|A-20081015-08||All||James Thorne||Circulate results of report from James Thorne arising from action A-20080130-05 to meeting participants.||Done.||12/11/2008|
|A-20081029-01||LHCb||Bonny Strong||Provide list of lost files to Raja.||Done.||12/11/2008|
|A-20081029-02||All||Shaun De Witt||Send summary of cross talk and Big ID problems to John Gordon.||Done.||12/11/2008|
|A-20081029-03||All||Bonny Strong||Initiate discussion in RAL CASTOR team regarding setting up nonProd as a service class for CIP publishing.||Done.||12/11/2008|
|A-20081112-01||ILC||Bonny Strong||Give ILC access to disk on the gen instance by the end of this week.||Done.||19/11/2008|
|A-20081112-02||ATLAS||Bonny Strong, Shaun De Witt||Chase for decision on ATLAS disk sitting in nonProd.||Done.||19/11/2008|
|A-20081112-04||All||David Corney||Propose moving meeting to Wednesday afternoon.||David to send out proposal. Done.||19/11/2008|
|A-20081119-03||All||David Corney, James Thorne||Move meeting to 13.30 on Wednesdays.||Done.||26/11/2008|
|A-20081119-01||All||Experiment reps.||Add experiment test plans to the GridPP wiki and send links to James Thorne for adding to agenda.||Closed.||26/11/2008|
|A-20081022-01||Medium||All||James Thorne||Investigate rumoured performance hit of new 3ware firmware.||Performance increase on ext3 filesystems. Done.||10/12/2008|
|A-20081126-01||All||Shaun de Witt, Experiment Reps||Co-ordinate testing of upgraded SRM endpoints.||Done.||10/12/2008|
|A-20081126-03||CMS||James Jackson||Arrange a meeting to discuss testing multiple service classes on one disk pool (with Shaun de Witt, Chris Brew,...).||Done.||10/12/2008|
|A-20081126-05||ALICE||Peter Faulkner||Ask Cristina to test xrootd on gen instance and let Shaun know whether it's working.||Done.||10/12/2008|
|A-20081126-06||All||Andrew Sansum, Martin Bly||Circulate plan for move to new building, when complete.||Done.||10/12/2008|