RAL Tier1 CASTOR Experiments Completed Actions 2008
RAL Tier1 CASTOR Experiments actions closed in 2008.
Action ID | Priority | Experiment(s) | Owner | Description | Status | Completed date |
---|---|---|---|---|---|---|
A-20071128-03 | All | Tim Folkes | Set up tape migration to give writes priority over reads. | Tim to make change. | 16/01/2008 | |
A-20071219-03 | All | Shaun De Witt, Bonny Strong | Plan how to operate shared CASTOR instance for small experiments. | 16/01/2008 | ||
A-20080109-01 | LHCb | Derek Ross | Send list of pools not yet released for LHCb to Raja. | 16/01/2008 | ||
A-20080109-03 | ALICE | Andrew Sansum | Follow up with John Gordon regarding ALICE CASTOR. | 16/01/2008 | ||
A-20080109-04 | CMS | Chris Kruk | Free up 3rd CMS disk server for testing. | 16/01/2008 | ||
A-20080109-05 | ATLAS | Shaun De Witt, Catalin Condurache | Send Frederic a list of files being transferred between tape and disk. | 16/01/2008 | ||
A-20080116-01 | All | David Corney | Progress Glue Schema issues, especially information about which versions are being processed. Details required for the next meeting. | 23/01/2008 | ||
A-20080116-08 | LHCb | Bonny Strong | Two new boxes required to set up new SRM V1s for LHCb to test stripping (Bonny to clarify with Raja and Phillipe Charpentier) | 23/01/2008 | ||
A-20080116-11 | All | Andrew Sansum, David Corney | Confirm plans to move to ORACLE RAC | 23/01/2008 | ||
A-20071114-03 | All | Cheney Ketley | Speak to Jonathan Wheeler to try and resolve Nagios problems on CASTOR servers. | Nagios configuration done. Investigation of dual RAID cards in disk servers ongoing. | 30/01/2008 | |
A-20071212-02 | All | Jens Jensen, Chris Kruk | Set up dynamic publishing of free space information. | Aim for the end of January 2008. 2008-01-09: Review after CCRC08 meeting on 10 January. | 30/01/2008 | |
A-20080109-06 | All | I-20070725-01, Andrew Sansum, David Corney, Tim Folkes | Progress ordering of extra tape drives. | Tape drives ordered. | 30/01/2008 | |
A-20080116-09 | All | Derek Ross | Advise Tier2s of CCRC08, summarising the information we currently have. | Ongoing. | 30/01/2008 | |
A-20080123-01 | CMS | James Thorne | Install gcc and James Jackson's key on the CMS test disk servers. | gcc and key installed. James J. has confirmed he can login. | 30/01/2008 | |
A-20080123-02 | ATLAS | Stephen Burke | Find out who is CCRC08 contact for ATLAS. | 30/01/2008 | ||
A-20080123-06 | ATLAS | Matt Hodges | Send list of files transferred via FTS to gdss159 to Catalin. | 30/01/2008 | ||
A-20071219-01 | CMS | Tim Folkes | Reserve a number of tape drives for CMS test. | 06/02/2008 | ||
A-20080109-02 | LHCb | Tim Folkes, Catalin Condurache, Derek Ross | Try to reduce time taken to stage & migrate a tape so that LHCb tape migration can be completed by end of January | Ongoing | 06/02/2008 | |
A-20080116-02 | All | Bonny Strong, Matt Hodges | Check FTS2 upgrade plans prior to CCRC08. Is it doable, usable, stable? Do we need it? | Ongoing, waiting on Matt. | 06/02/2008 | |
A-20080116-03 | BaBar | Chris Brew | Confirm with Matt when excess BaBar disc has been released for use elsewhere | Escalate with BaBar to allow MINOS some disk. All but two NFS servers released. Waiting on Tier1. | 06/02/2008 | |
A-20080116-07 | LHCb | Andrew Sansum | Set up access for LHCb server accounts (Raja to send Chris SSH key) | Raja's key sent to Chris Kruk. Andrew needs to request a box from Fabric Team. Using a single monitoring key. | 06/02/2008 | |
A-20080116-10 | CMS | Bonny/Shaun | Review optimum configuration of disk servers based on James's analysis. | Waiting for analysis. | 06/02/2008 | |
A-20080123-03 | All | Andrew Sansum | Roll out network routing change, probably during CASTOR downtime on Tuesday, 29/01/2008. | Done for all but new CMS servers from Tier2. | 06/02/2008 | |
A-20080123-04 | CMS | James Jackson, Andrew Sansum | Run RFIO tests on disk servers with fabric team monitoring. | Ongoing. | 06/02/2008 | |
A-20080116-06 | All | Matt Hodges (with Support from Andrew Sansum if required) | Confirm disk allocation and deployment timetable is happening this week (16th Jan), and advise the experiments and the CASTOR team of the outcome | Received information from ALICE. Need confirmation from Glenn. | 13/02/2008 | |
A-20080130-03 | All | Matt Hodges | Track/resolve certificate problems with FTS. | Ongoing. Certificates need to be real name of machine. | 13/02/2008 | |
A-20080130-04 | CMS | James Jackson | Define everything CMS want on tape. | Ongoing. | 13/02/2008 | |
A-20080130-07 | All | Derek Ross | Check/review versions of gLite and other software needed for CCRC08 (e.g. on worker nodes) and update if necessary. | Done, but worker node update still needs doing. | 13/02/2008 | |
A-20080206-03 | CMS | Derek Ross, Matt Hodges, Jens Jensen | Apply for certificates for FTS hosts. | See A-20080130-03. | 13/02/2008 | |
A-20080206-05 | CMS | James Jackson | Get tape drive stats from CMS 10.5 TB staging test. | 13/02/2008 | ||
A-20080206-07 | ALICE | Andrew Sansum | Send Shaun the location of the ALICE configuration requirements. | 13/02/2008 | ||
A-20080206-08 | ALICE | Bonny Strong, Shaun De Witt | CASTOR disk server for ALICE. | 13/02/2008 | ||
A-20080206-09 | All | VO reps | Check dates for planned upgrades in March and feed back any problems. | 13/02/2008 | ||
A-20080206-02 | Low | CMS | Chris Brew | Determine versions of gSOAP etc. and tell Shaun and Jens. | Jens and Shaun informed. Closed. | 20/02/2008 |
A-20080206-04 | Low | ATLAS | Shaun De Witt | Check whether stager rm or nsrm happens first (for Brian) as reported and actual free space differ. | Closed | 20/02/2008 |
A-20080213-01 | LHCb | Raja Nandakumar | Email Shaun the LHCb dashboard address. | It is http://dashboard.cern.ch/lhcb | 20/02/2008 | |
A-20080213-03 | CMS | Bonny Strong | The number of job slots on the 13 Tier2 disk servers given to CMS needs correcting, as it is currently set to 50. | Closed. | 20/02/2008 | |
A-20080213-05 | All | James Jackson, Andrew Sansum | Decide on agenda items for meeting with CERN on 18/02/2008. | Closed. | 20/02/2008 | |
A-20080206-01 | Low | All | Raja Nandakumar | LCG infosites to publish space tokens. | Raja raised a ticket. Closed. | 27/02/2008 |
A-20080213-04 | High | All | Bonny Strong, Martin Bly | Arrange meeting to discuss and plan backplane replacements. | Meeting held. Closed. | 27/02/2008 |
A-20080220-03 | Medium | All | Matt Hodges | Clarify with Glenn experiment requirements and confirm free space for ATLAS. | Done. Closed. | 27/02/2008 |
A-20080220-05 | High | All | Andrew Sansum | Publicise scale of March disruptions to PMB. | Done. Closed. | 27/02/2008 |
A-20080220-07 | Medium | All | Martin Bly, Bonny Strong | Add tuning to deployment procedure. | Done. Closed. | 27/02/2008 |
A-20080220-08 | Medium | All | Martin Bly | Send info on CMS tuning changes to Bonny. | Done. Closed. | 05/03/2008 |
A-20080227-01 | High | All | Derek Ross, Martin Bly | Identify how best to notify VOs of backplane plans. | Closed. | 05/03/2008 |
A-20080130-02 | High | All | Chris Kruk | Investigate HTTP alternative to NFS mount for LSF. | Closed. | 12/03/2008 |
A-20080305-01 | High | All | Bonny Strong | Review and update upgrade schedule and inform VOs. | Due to David Edoh's departure. Closed. | 12/03/2008 |
A-20080206-06 | Low | ALICE | Shaun De Witt | Put CERN CASTOR team in touch with RAL to get CASTOR + xrootd cookbook. | Cristina chased up CERN re. this. CERN have responded via email to Bonny. Bonny will let Cristina know if the response is what she was looking for. 2008-02-27: Shaun will contact Bonny to see if cookbook is what she was looking for as Cristina would like feedback. 2008-03-12: Closed. | 12/03/2008 |
A-20080220-02 | Medium | All | Andrew Sansum | Long term: investigate the level at which the Tier1 reports available disk to the User Board. | 2008-03-05: RAS emailed interested parties; waiting for feedback on suggestions. 2008-03-19: Done and closed. | 19/03/2008 |
A-20080305-02 | Medium | ATLAS, LHCb | Catalin Condurache | Estimate completion date for dCache->CASTOR migration. | Done: End of April | 19/03/2008 |
A-20080312-01 | ATLAS | Catalin Condurache | Confirm ATLAS VO can be progressed. | Closed as nonsensical. | 19/3/2008 | |
A-20080312-02 | All | Experiment reps | Review dates in Bonny's plan and confirm OK. | Closed. | 19/03/2008 | |
A-20080312-03 | All | Tim Folkes | Send list of lost files to James Jackson, Stephen Burke, Frederic Brochu and Shaun De Witt. | Closed. | 19/03/2008 | |
A-20080220-01 | Low | All | Shaun De Witt | Script to determine files in stager but not in name server. | Closed | 26/03/2008 |
A-20071017-01 | Medium | All | James Jackson | Investigate problems with disk to tape migration rates. | Met with CERN developers; got hints for improvement which are being tested. 2008-02-27: Looks like changes to IO scheduler and read ahead value may help. 2008-03-05: Still investigating scheduler and read ahead. 2008-03-19: On hold while James works on other things. Big read aheads and cfq seem to give better performance. 2008-03-26: Still on hold. 2008-04-02: Closed as added as an agenda item. | 02/04/2008 |
A-20080213-02 | Medium | All | David Corney | Report back on the power outage post-mortem once complete. | Will probably report on 27/02/2008. Report sent. Closed. | 16/04/2008 |
A-20080220-06 | Medium | All | Martin Bly | Apply appropriate tuning to existing disk servers. | 2008-02-27: CMS done. 2008-03-05: Needs applying to others. 2008-03-19: Will be complete once backplane swaps done. Closed. | 16/04/2008 |
A-20080305-03 | Low | All | Chris Brew, Matt Hodges | Draft text explaining why we configure the FTS the way we do and put on wiki. | Closed. | 16/04/2008 |
A-20080319-01 | CMS | Bonny Strong, Chris Brew | Release Tier2/CMS borrowed servers by 8 April. | Released. Closed. | 16/04/2008 | |
A-20080319-03 | CMS | Bonny Strong | Send list of lost files to CMS. | Closed. | 16/04/2008 | |
A-20080402-01 | High | All | Shaun De Witt | Remove unnecessary announcements from, and add new ones to, GOCDB. | Closed. | 16/04/2008 |
A-20080402-02 | ATLAS | Bonny Strong, Catalin Condurache | Try to get more information for tickets like "ATLAS SRM doesn't work". | Closed. | 16/04/2008 | |
A-20080402-03 | LHCb | Bonny Strong, Shaun De Witt | Create plan for 2.1.6 upgrades. | Closed. | 16/04/2008 | |
A-20080402-04 | All | Shaun De Witt | Check on status of gdss153. | Closed. | 16/04/2008 | |
A-20080319-02 | Low | ATLAS | Andrew Sansum, Brian Davies | Decide whether to apply WAN tuning to ATLAS T0_raw. | Closed. | 07/05/2008 |
A-20080402-05 | LHCb | Chris Kruk, James Thorne | Check on "down" machines in LHCb ganglia (124 and 163). | 124 moved to ATLAS. Need to check on 163. Closed. | 07/05/2008 | |
A-20080416-01 | ALICE | Shaun De Witt | Check on xrootd progress for ALICE. | Closed. | 07/05/2008 | |
A-20080430-01 | ALICE | Derek Ross | Check ALICE space token requirements and send to Shaun De Witt. | Closed. | 07/05/2008 | |
A-20080430-02 | ALICE | Catalin Condurache, Bonny Strong, Cristina Lazzeroni | Follow up xrootd install, meeting on afternoon of 30/4/2008? | Closed. | 07/05/2008 | |
A-20080130-01 | Low | All | Derek Ross | Escalate necessity to be able to put individual instances and components of service into downtime. | Ongoing. Need ability to separate individual VOs. 2008-04-30: Will soon have individual instances of services for each VO. 2008-05-21: Closed | 21/05/2008 |
A-20080416-02 | High | All | Chris Kruk | Upgrade repack server so Tim can begin testing. | Closed | 07/05/2008 |
A-20080507-01 | ATLAS | Brian Davies | Ticket to Tier1 helpdesk to apply WAN tuning to ATLAS disk servers. | Closed | 07/05/2008 | |
A-20080514-05 | LHCb | Raja Nandakumar | Send name of LHCb tape migration "expert" to James Jackson. | Closed | 07/05/2008 | |
A-20080514-02 | CMS | Chris Brew | Modify number of streams/files to Taiwan. | Shaun would prefer that this was not done until after the ATLAS T1-T1 tests (Fri 16 May). 2008-05-21: Waiting until certificate problems are resolved. 2008-05-28: Closed | 28/05/2008 | |
A-20080514-04 | ATLAS | Stephen Burke | Send name of ATLAS tape migration "expert" to James Jackson. | Brian Davies. Closed. | 28/05/2008 | |
A-20080514-01 | Medium | LHCb | Shaun De Witt | Tell Martin which LHCb servers to apply WAN tuning to. | Closed. | 04/06/2008 |
A-20080521-02 | High | BaBar | Shaun de Witt | Contact CERN developers to find out if one can run both RFIO and xrootd on the same CASTOR disk server (c.f. A-20071219-02). | Closed. | 04/06/2008 |
A-20080528-01 | ALICE | Catalin Condurache | Confirm with Cristina that he has all the answers that he needs regarding xrootd. | Closed. | 04/06/2008 | |
A-20080528-02 | LHCb | Bonny Strong | Look into slow staging from tape in dCache for LHCb. | Closed. | 04/06/2008 | |
A-20080528-03 | LHCb | Bonny Strong | Send Raja info on how close LHCb are to garbage collection. | Closed. | 04/06/2008 | |
A-20080116-04 | Low | All | Andrew Sansum, David Corney | Review extension of ADS as official back-end to dcache | Ongoing. Need cost per month. Closed. | 18/6/2008 |
A-20080123-05 | Low | All | Bonny Strong | Script to report files that have been lost on a failed disk server. | 85% done. Script reports name server path. Closed. | 18/6/2008 |
A-20080220-04 | Low | All | Jens Jensen | Contact CERN to see if they want the information provider developed at RAL. | Closed. | 18/6/2008 |
A-20080521-01 | All | Andrew Sansum | Investigate routing to PIC via the OPN. | Possibly next week. Fixed. Closed. | 18/6/2008 | |
A-20080604-01 | All | Shaun De Witt | Dedicate three drives to each experiment to avoid all being affected if an experiment triggers the "tape hogging" bug. | Closed. | 18/6/2008 | |
A-20080604-02 | ALICE | Bonny Strong | Email Cristina to update her on state of xrootd plus CASTOR. | Closed. | 18/6/2008 | |
A-20080618-01 | All | David Corney, Shaun De Witt, Bonny Strong | Check what was decided regarding taking SRMv1 machines out of on-call system and how it was decided. | They are no longer calling out. Closed. | 25/6/2008 | |
A-20071219-02 | High | BaBar | Shaun De Witt | Set up CASTOR server for d0t1 for BaBar on gen instance | No disk server. Can we do both RFIO and xrootd on same machine? Closed. | 2/7/2008 |
A-20080116-05 | High | MINOS | Bonny Strong | Provide MINOS with scratch disk for testing CASTOR (use "6 month" disk) | Need space from that released by BaBar (A-20080116-03). Waiting for disk server. Closed. | 2/7/2008 |
A-20080514-03 | Medium | All | Matt Hodges | Gather post-CCRC08 experiment disk provisioning requirements and inform Martin. | Closed. | 2/7/2008 |
A-20080625-01 | LHCb | Shaun De Witt | Contact Raja regarding shutting down SRMv1 for LHCb. | Closed. | 2/7/2008 | |
A-20080604-03 | High | All | Andrew Sansum, Martin Bly | Approach experiments to determine their expectations during an LHC downtime. | Ongoing. Informal chats so far. Closed. | 23/7/2008 |
A-20080709-01 | High | All | Bonny Strong | Confirm CASTOR 2.1.7 upgrade time-scale is ok with Dbase group (given RAC commitments) | Closed. | 23/7/2008 |
A-20080820-03 | High | ATLAS | Bonny Strong, Derek Ross, Tier1 Duty Admin | Bonny to give Derek and Duty Admin text for announcing downtime until 12.00 Wednesday 27 August. | Closed. | 10/9/2008 |
A-20080820-04 | High | ALICE | Chris Kruk | Put request for remote access for A-20080806-01 into a ticket for Martin. | Closed. | 10/9/2008 |
A-20080416-03 | Low | All | Tim Folkes | Upgrade all tape servers' RAM to 8 GB. | Tried with one server that already has 8 GB RAM. Waiting for sizable transfers to see if it has made a difference. Tests showed no particular improvement. Closed. | 17/9/2008 |
A-20080130-06 | Low | All | Andrew Sansum | Track/progress network tuning and/or different network stack to improve rates to remote sites, e.g. FermiLab. | Ongoing. Try transfer tests to ASGC. James Jackson has set up test pool. Now waiting on fabric team. Closed as not going to happen for a long while. | 17/9/2008 |
A-20080820-05 | Medium | LHCb | Bonny Strong, Shaun De Witt | Rebalance files within LHCb rDST. | Taken off line. Closed. | 17/9/2008 |
A-20080910-01 | Medium | LHCb | Shaun De Witt | Re-open Raja's ticket regarding problems on Sunday 7 September and investigate. | Done. Closed. | 17/9/2008 |
A-20080910-02 | Medium | ATLAS | Brian Davies | Set up discussion with James Jackson and Tim Folkes regarding tape families. | Closed. | 17/9/2008 |
A-20080806-01 | Medium | ALICE | Cristina Lazzeroni, Shaun De Witt | Possibly arrange for an ALICE xrootd expert to have access to machines. | See A-20080820-04. Done. | 24/9/2008 |
A-20080820-01 | Medium | All | Jens Jensen, Bonny Strong | Deploy latest CASTOR Information Provider (CIP) on Tuesday 26 August. | Done. | 24/9/2008 |
A-20080820-02 | Medium | All | Derek | Announce CIP deployment at an ops meeting. | Done. | 24/9/2008 |
A-20080917-01 | High | All | James Thorne | Bring up delay in disk server provisioning at Fabric meeting. Effort from others? | Done. | 24/9/2008 |
A-20080924-01 | High | All | Shaun De Witt, Bonny Strong | Investigate possibility of turning off synchronisation of stager database to avoid a repeat of the file deletion problem. | Closed. | 22/10/2008 |
A-20081015-02 | ILC | James Thorne | Join Jan Strube (rep for ILC) to weekly circulation list. | Closed. | 22/10/2008 | |
A-20081015-03 | All | Matt Hodges | Send link pointing to more detailed disc deployment info, including service class information to the distribution list for this meeting. | Closed. | 22/10/2008 | |
A-20081015-04 | All | Matt Hodges | Ensure that the CASTOR team are routinely (weekly?) advised of new experiment disk allocations, ideally outside and before this meeting, but this may be a sensible new agenda item for this CASTOR call? | Closed. | 22/10/2008 | |
A-20081015-06 | All | David Corney | Check with CERN about their plans (if any) to move to CASTOR 2.1.9. | Closed. | 22/10/2008 | |
A-20081015-07 | All | Matt Hodges | Send info on new experiments contacts to James Thorne, who will add them to the circulation list and standard weekly request for reminders to join this meeting. | Closed. | 22/10/2008 | |
A-20081015-09 | All? | Shaun De Witt | Investigate xrootd transfer failures. | Closed. | 22/10/2008 | |
A-20080618-02 | CMS | Chris Brew | CMS to test GridFTPv2 internal. | Ongoing, waiting for disk servers. On hold until after first run. Closed. | 29/10/2008 | |
A-20081015-05 | All | Experiment reps, Matt Hodges | Experiments need to contact Matt formally, by email, with requests for new disc deployment. CMS and ATLAS have agreed to this. LHCb (Raja) also needs to be consulted. | Closed. | 29/10/2008 | |
A-20081015-01 | CMS | Chris Brew | Send copy of CMS tape service plans to Bonny and Shaun. | Done. | 29/10/2008 | |
A-20081015-08 | All | James Thorne | Circulate results of report from James Thorne arising from action A-20080130-05 to meeting participants. | Done. | 12/11/2008 | |
A-20081029-01 | LHCb | Bonny Strong | Provide list of lost files to Raja. | Done. | 12/11/2008 | |
A-20081029-02 | All | Shaun De Witt | Send summary of cross talk and Big ID problems to John Gordon. | Done. | 12/11/2008 | |
A-20081029-03 | All | Bonny Strong | Initiate discussion in RAL CASTOR team regarding setting up nonProd as a service class for CIP publishing. | Done. | 12/11/2008 | |
A-20081112-01 | ILC | Bonny Strong | Give ILC access to disk on the gen instance by the end of this week. | Done. | 19/11/2008 | |
A-20081112-02 | ATLAS | Bonny Strong, Shaun De Witt | Chase for decision on ATLAS disk sitting in nonProd. | Done. | 19/11/2008 | |
A-20081112-04 | All | David Corney | Propose moving meeting to Wednesday afternoon. | David to send out proposal. Done. | 19/11/2008 | |
A-20081119-03 | All | David Corney, James Thorne | Move meeting to 13.30 on Wednesdays. | Done. | 26/11/2008 | |
A-20081119-01 | All | Experiment reps. | Add experiment test plans to the GridPP wiki and send links to James Thorne for adding to agenda. | Closed. | 26/11/2008 | |
A-20081022-01 | Medium | All | James Thorne | Investigate rumoured performance hit of new 3ware firmware. | Performance increase on ext3 filesystems. Done. | 10/12/2008 |
A-20081126-01 | All | Shaun de Witt, Experiment Reps | Co-ordinate testing of upgraded SRM endpoints. | Done. | 10/12/2008 | |
A-20081126-03 | CMS | James Jackson | Arrange a meeting to discuss testing multiple service classes on one disk pool (with Shaun de Witt, Chris Brew,...). | Done. | 10/12/2008 | |
A-20081126-05 | ALICE | Peter Faulkner | Ask Cristina to test xrootd on gen instance and let Shaun know whether it's working. | Done. | 10/12/2008 | |
A-20081126-06 | All | Andrew Sansum, Martin Bly | Circulate plan for move to new building, when complete. | Done. | 10/12/2008 |