RAL Tier1 CASTOR Experiments Completed Actions 2007
From GridPP Wiki
James thorne (Talk | contribs) |
Latest revision as of 11:54, 13 January 2009
RAL Tier1 CASTOR Experiments actions closed in 2007.
Action ID | Priority | Experiment(s) | Owner | Description | Status | Completed date |
---|---|---|---|---|---|---|
A-20070425-01 | ATLAS | Matt Hodges | Discussion on disk allocation needs to be concluded. | 02/05/2007 | ||
A-20070425-02 | ATLAS | Matt Hodges | Stop jobs immediately; add new disk servers after upgrade on 30 April/1 May. Need to have disk allocation details (action A-20070425-01) by Friday 27/4/2007. | 02/05/2007 | ||
A-20070425-03 | ATLAS | Chris Kruk | Check provisioning for LSF licences. | |||
A-20070425-04 | All | Matt Hodges | Create a table on the wiki showing which disk servers are in which CASTOR pools. | See here | 02/05/2007 | |
A-20070425-05 | All | Matt Hodges | Send SSH keys to Chris Kruk. | 02/05/2007 | ||
A-20070425-06 | ATLAS, LHCb | Matt Hodges | Split ATLAS/LHCb shared disk pool. | 02/05/2007 | ||
A-20070425-08 | LHCb, CMS(?) | Raja Nandakumar | Email Dave Newbold a copy of LHCb test plans. | 02/05/2007 | ||
A-20070502-01 | LHCb | Raja Nandakumar | Ensure that files on the list sent by Derek Ross are not requested from CASTOR tape. | 09/05/2007 | ||
A-20070502-02 | All | David Corney | Ensure that the experiments are kept "in the loop" regarding RAL plans for CASTOR configuration change. | 09/05/2007 | ||
A-20070509-02 | All | James Thorne | Provision 26 servers for CASTOR. | 16/05/2007 | ||
A-20070509-05 | CMS | Simon Metson | Arrange a meeting with Bonny and Shaun in Bristol. | 16/05/2007 | ||
A-20070509-01 | All | Shaun | Change service class replication for all experiments. | On hold until production platform working | 23/05/2007 | |
A-20070523-03 | CMS | Simon Metson | Check on key files prior to CMS file deletion. | Can probably delete files; CMS are now checking all key files are on tape. CMS data management system needs syncing. | 13/06/2007 | |
A-20070523-04 | All | Tim Folkes | Update Bonny Strong on repack to ensure progression while Tim is on leave. | 13/06/2007 | ||
A-20070523-05 | ATLAS | Matt Hodges | Provision new disk pools for ATLAS tests. | CASTOR config pending. Ongoing. | 13/06/2007 | |
A-20070531-01 | All | Shaun De Witt, Bonny Strong | Need plan to co-ordinate testing of re-install by the experiments. | 13/06/2007 | ||
A-20070516-01 | All | Nick White | Introduce monitoring to detect "out of memory" on disc servers. | Needs highlighting in Nagios; if OOM starts then it is probably too late. It looks like a low memory problem. Not related to the number of jobs (only 30 per host); may be related to GridFTP. David Corney to ask Martin Bly to contact Nick White. Ongoing. Refer to issue. | 13/06/2007 | |
A-20070523-01 | All | Shaun De Witt, Bonny Strong | Revised plan for separate instances and upgrade for discussion next week to clarify experiment needs, especially ATLAS and LHCb. Include integration with SRM 2.2 endpoint. | Bonny to push hardware install for ATLAS so that it's about 2 weeks behind CMS for 2.1.3. James Thorne working with Cheney to get ATLAS machines set up. | 20/06/2007 | |
A-20070613-01 | CMS | Shaun De Witt | Email Dave Newbold description of SQL (from shell) problem for forwarding onto Dave's SQL contacts. | 20/06/2007 | ||
A-20070613-03 | All | David Corney, James Thorne | Put plan to show timetable of roll out of 2.1.3 CASTOR instances on wiki | See the RAL Tier1 CASTOR 2.1.3 Roll Out page. | 20/06/2007 | |
A-20070620-01 | ATLAS | Bonny Strong, Matt Hodges, Martin Bly | Review (re-)deployment of ATLAS disk servers for dCache. | CASTOR group have two servers that may be available. | 27/06/2007 | |
A-20070620-03 | LHCb | Bonny Strong, Martin Bly | Meet to discuss hardware requirements for LHCb 2.1.3 instance. | Meeting on Monday 25/06/2007? | 26/06/2007 | |
A-20070620-04 | ATLAS | Bonny Strong | Open tape SRM (d0t1) for ATLAS | 27/06/2007 | ||
A-20070627-02 | All | Matt Hodges, Shaun De Witt | Update/review disk server wiki page. | Done. | 04/07/2007 | |
A-20070509-03 | CMS | Simon Metson | Investigate use of "WAN out". | Review at meeting in Bristol on 22/05. Ongoing, some sort of PhEDEx problem. Bonny talking to Simon. Fixed. | 11/07/2007 | |
A-20070523-02 | CMS | Bonny Strong | Set up CMS pool for temporary files. | Ongoing. Don't garbage collect small files; d0t0. This needs to be done before production use. | 11/07/2007 | |
A-20070704-04 | All | James Thorne | Link to SRM endpoint information from the meeting page. | Done. Link to the RAL Tier1 CASTOR SRM page added. | 11/07/2007 | |
A-20070704-05 | All | Bonny Strong | Discussion needed about using different tape pools for different ns paths (e.g. to group all data in a year, or by type). Questions about the CASTOR-specific meaning of pathnames, how other sites would be affected, and whether this should/could be handled by storage tokens in SRMv2. Should include Shaun De Witt. | Site specific. | 11/07/2007 | |
A-20070711-03 | All | James Thorne | Link from the wiki page to James Jackson's CMS testing documentation (RAL Tier1 CASTOR CMS Testing). | Done. | 12/07/2007 | |
A-20070425-07 | All | Andrew Sansum | Put hardware thoughts and discussion on the wiki. | Done | 25/07/2007 | |
A-20070509-04 | All | Andrew Sansum | Re-visit disk server network tuning. Wait until CASTOR source(?) is stable for days | Done | 25/07/2007 | |
A-20070620-02 | ATLAS | Bonny Strong, Stephen Burke | Plan ATLAS 2.1.3 instance testing. | Done | 25/07/2007 | |
A-20070627-01 | ATLAS | Shaun De Witt | Discuss with ATLAS the effect of moving to shorter paths and whether to keep the longer paths for permanent data. | Done | 25/07/2007 | |
A-20070704-01 | CMS | I-20070704-02 James Jackson | Run functionality tests for CMS 2.1.3 instance. Also look at timings (I-20070704-04). | Done | 25/07/2007 | |
A-20070704-02 | ATLAS | Shaun De Witt, Stephen Burke | Discuss moving disk servers from 2.1.2 to new 2.1.3 instance. | Done | 25/07/2007 | |
A-20070711-01 | All | Chris Kruk | Co-ordinate with David Edoh to apply hot fixes for Oracle. | Done | 25/07/2007 | |
A-20070711-02 | All | Andrew Sansum | See also A-20070509. Meet with James Jackson and Nick White to discuss disk server tuning. | Done | 25/07/2007 | |
A-20070711-04 | All | Chris Kruk | Create/update SRM wiki page. | Done | 25/07/2007 | |
A-20070711-08 | ATLAS | Shaun De Witt | Send Brian Davies some hints on testing. | Done | 25/07/2007 | |
A-20070711-11 | All | Shaun De Witt | Deploy SRM2 | Done | 25/07/2007 | |
A-20070704-03 | ATLAS | Stephen Burke | Inform Bonny of decision on whether to move d1 files to 2.1.3 production instance. | Done | 08/08/2007 | |
A-20070711-09 | ATLAS | Brian Davies | Send out test results when ATLAS has some figures. | Done | 08/08/2007 | |
A-20070711-10 | ATLAS | Shaun De Witt, Stephen Burke | Discussion on how we progress to SRM2. For discussion at 1st Aug meeting | Done | 08/08/2007 | |
A-20070725-02 | CMS | James/Simon Metson | Agree date with CMS for formal switch on of production system. | Done | 08/08/2007 | |
A-20070725-03 | ATLAS | Stephen Burke | Decide when to switch the d1t0 endpoint to 2.1.3 and inform the CASTOR team at RAL ASAP, as the test machine upgrade is waiting on this decision. | Done | 08/08/2007 | |
A-20070711-05 | All | David Corney | Review CASTOR@RAL issues list | Let James Thorne have the updated list for the wiki. | 22/08/2007 | |
A-20070725-01 | ATLAS | Shaun | Phone Miguel to find out when T0 testing starts for ATLAS. | Done | 22/08/2007 | |
A-20070808-01 | All | Bonny | Set up new SA path for BDII for RAL SRM. | Done | 22/08/2007 | |
A-20070808-02 | All | David | Publish Experiment test dates via links on this wiki. | Done | 22/08/2007 | |
A-20070711-06 | CMS | Dave Newbold | Script to delete test data. | Ongoing | 29/08/2007 | |
A-20070829-01 | ATLAS, LHCb | David Corney | Contact ATLAS and LHCb to ensure disc tuning can be scheduled ASAP and before Nick White goes on leave. | Closed | 05/09/2007 | |
A-20070822-03 | All | Matt Hodges | Tune FTS | Ongoing. Done for CMS. Closed, see A-20070905-06. | 12/09/2007 | |
A-20070905-03 | All | Dave Newbold, Shaun De Witt | Check RFIO timeout for A-20070905-01. | Timeout is 300s. Closed. | 12/09/2007 | |
A-20070905-08 | CMS | Bonny Strong | Arrange meeting with Simon, Dave N., Shaun, Matt on Friday 7/9/2007 to discuss CMS transfer performance. | Closed. | 12/09/2007 | |
A-20070905-09 | All | Bonny Strong, Shaun De Witt | Check with developers regarding "prepare to get" being scheduled. | Prepare to get is being scheduled. Closed. | 12/09/2007 | |
A-20070905-10 | All | Tim Folkes | Provide Matt with tape usage figures. | Done, closed. | 12/09/2007 | |
A-20070711-07 | All | Simon Metson | Look at logs from CSA06 to see if the transfer "dead time" was in evidence then. | Ongoing. | 26/09/2007 | |
A-20070905-02 | All | Bonny Strong | Run fix script for A-20070905-01 to check the impact on the databases. Script needs to run more frequently than RFIO timeout. | Not automated until we have a better understanding of the problem. | 26/09/2007 | |
A-20070905-04 | All | Bonny Strong | Provide name server hosts on wiki to mitigate against problems with nsrm command. | Ongoing. | 26/09/2007 | |
A-20070905-07 | ATLAS | Shaun De Witt | Chase up LSF tuning with ATLAS. | Ongoing. | 26/09/2007 | |
A-20070912-01 | CMS | Bonny Strong | Investigate upping LSF slots to 8. | 26/09/2007 | ||
A-20070912-02 | CMS | Bonny Strong, Chris Brew | Debug file transfers by tracking a file through the system. | 26/09/2007 | ||
A-20070912-03 | CMS | Chris Kruk | Check Dave Newbold's password to LSF web GUI. | 26/09/2007 | ||
A-20070912-04 | ATLAS | Shaun De Witt | Do LSF tuning for ATLAS (see also A-20070905-07). | 26/09/2007 | ||
A-20070912-05 | All | Shaun De Witt | Put something in Savannah re issue I-20070822-01. | 26/09/2007 | ||
A-20070912-06 | All | Andrew Sansum | Raise issue at PMB meeting. | 26/09/2007 | ||
A-20070926-01 | All | Shaun De Witt | When is the real target date for production deployment of SRM 2.2? | Answer next week. | 10/10/2007 | |
A-20071003-03 | ATLAS | Shaun De Witt | Check dates of ATLAS M5 | 22nd Nov | 10/10/2007 | |
A-20071003-04 | ATLAS | Catalin Condurache | Raise participation in ATLAS M4 reprocessing on the ATLAS UK mailing list. | Problems with the M4 reprocessing software mean M4 reprocessing is delayed. | 17/10/2007 | |
A-20071017-04 | All | Shaun De Witt, Bonny Strong | Set up test for internal CASTOR gridFTP v2. | Being done by CERN. | 24/10/2007 | |
A-20071003-02 | CMS | James Jackson, Bonny Strong | Investigate processing of info from the log archives (A-20071003-01) | Being treated along with A-20071003-01 now so closing this one. | 24/10/2007 | |
A-20070711-12 | CMS | Andrew Sansum | Move CMS WANout disk server to new route to JANET. | Waiting for network people. CMS would like this done before 24/9/2007. Andrew has asked networking for this to be done before Tuesday. Ongoing, escalated. Should be testing by next meeting (20071107). Using gdss128 to test. | 07/11/2007 | |
A-20071003-01 | CMS | James Jackson, Bonny Strong | Investigate archiving and processing of log information. | James has a useful tool set for this now. James will support other VOs who want to use the CMS stuff. Each VO will need access to stager and LSF logs. | 07/11/2007 | |
A-20071017-02 | CMS | Bonny Strong | Reduce by one the number of tape drives for CMS WanInTest. | Done but WanIn didn't work. Bonny to change file class for load test. | 07/11/2007 | |
A-20071024-01 | All | Shaun De Witt | Ask David Edoh to look at database performance before and after upgrade. | 07/11/2007 | ||
A-20071024-02 | ATLAS | Brian Davies | Try to get files out of CASTOR post database upgrade. | 07/11/2007 | ||
A-20071024-03 | CMS | James Jackson | Document logging tools. | 07/11/2007 | ||
A-20070905-01 | All | Bonny Strong | Put fix in place for recalls problem. | Ongoing. Recall problem went away for ATLAS when connections to database were increased. Monitoring. Closed as the issue is being monitored, see issue list. | 07/11/2007 | |
A-20070613-02 | CMS | Shaun De Witt, Bonny Strong, Dave Newbold | Set up discussion on service classes and tape pool mapping | After CSA07. Done. | 14/11/2007 | |
A-20071003-05 | All | Shaun De Witt | Send SRM deadtime issue to CERN | Waiting for a second set of graphs from Brian. Done. | 14/11/2007 | |
A-20071024-04 | All | Bonny Strong, experiment reps. | Discuss possibility of cross-experiment testing. | Done. | 14/11/2007 | |
A-20071107-01 | LHCb | Derek Ross | Change LHCb SA paths in BDII. | Done. | 14/11/2007 | |
A-20070822-01 | ATLAS, LHCb | Catalin Condurache | Plan ATLAS disk usage and dCache -> CASTOR migration. Plan for LHCb too. ATLAS have 84TB on disk and LHCb have 70TB on disk. | 120 MB/s (~10 TB/day) with 8 streams (8 parallel globus URL copies), max. 50 MB/s per server. This is without any checksumming. Need to look at tape to tape migration too. | 28/11/2007 | |
A-20070822-02 | All | I-20070620-01, Bonny Strong | Write wiki How-to on deleting files from CASTOR. | Done | 28/11/2007 | |
A-20070905-05 | All | Andrew Sansum | Put tuning information on wiki, including ext3 journalling options. | Andrew has asked Nick White to do this. Done. On the wiki at RAL Tier1 Disk Server Tuning. | 28/11/2007 | |
A-20070905-06 | ATLAS | Matt Hodges | Determine FTS tuning requirements for ATLAS in the same way as CMS. See also A-20070822-03. | Done | 28/11/2007 | |
A-20071010-01 | ATLAS | Bonny Strong | Stuck recalls for ATLAS are not being monitored/checked; need a tool to do this. | Gone away in 2.1.4 | 28/11/2007 | |
A-20071017-03 | ATLAS | Shaun De Witt | Ask Miguel to open up ATLAS T0 tests to repack instance. | Shaun asked Miguel. Done. | 28/11/2007 | |
A-20071114-01 | All | Shaun De Witt | Put a Request For Enhancement into Savannah for the ability to dedicate tape drives to a service class. | Done. | 28/11/2007 | |
A-20071114-02 | ATLAS, CMS | Matt Hodges | Send a broadcast announcement for the ATLAS and CMS CASTOR downtime. | Done. | 28/11/2007 | |
A-20071128-01 | All | David Corney | Follow up with Jens regarding dynamic/daily publishing of space information from SRMv2. | 12/12/2007 | ||
A-20071128-02 | All | Matt Hodges | Move inbound FTS channels to Globus URL copy. | 12/12/2007 | ||
A-20071017-05 | All | Derek Ross, Jens Jensen | Publish GLUE Schema for SRM2.2. | Ongoing. Done for srmf. | 19/12/2007 | |
A-20071128-04 | ALICE | Matt Hodges | Allocate disk servers for ALICE | Depends on A-20071128-05 | 19/12/2007 | |
A-20071128-05 | ALICE | Catalin Condurache | Engage ALICE in discussion on xrootd/CASTOR | Inform ALICE when test setup ready. | 19/12/2007 | |
A-20071212-01 | All | Derek Ross | Update space token documentation on wiki | 19/12/2007 | ||
A-20071212-03 | All | David Corney | Draft a list of "top ten" Tier1 CASTOR issues and circulate to the experiments. | 19/12/2007 | ||
A-20071212-04 | ALICE | Shaun De Witt | Set up ALICE SRM/CASTOR/xrootd test bed. | 19/12/2007 |
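
The throughput figures recorded under action A-20070822-01 (120 MB/s, roughly 10 TB/day, with 84 TB of ATLAS data and 70 TB of LHCb data on disk) can be sanity-checked with a little arithmetic. The sketch below is only illustrative: it assumes decimal units (1 TB = 10^6 MB), a rate sustained around the clock, and no checksumming or tape overhead. The function names are hypothetical and not part of any CASTOR or dCache tool.

```python
# Figures quoted in action A-20070822-01.
RATE_MB_PER_S = 120   # aggregate rate observed with 8 parallel streams
ATLAS_TB = 84         # ATLAS data on disk
LHCB_TB = 70          # LHCb data on disk

SECONDS_PER_DAY = 24 * 60 * 60
MB_PER_TB = 1_000_000  # assumption: decimal units (1 TB = 10^6 MB)


def tb_per_day(rate_mb_per_s: float) -> float:
    """Convert a sustained transfer rate in MB/s to TB/day."""
    return rate_mb_per_s * SECONDS_PER_DAY / MB_PER_TB


def days_to_migrate(volume_tb: float, rate_mb_per_s: float) -> float:
    """Estimate days needed to copy volume_tb at a sustained rate."""
    return volume_tb / tb_per_day(rate_mb_per_s)


if __name__ == "__main__":
    daily = tb_per_day(RATE_MB_PER_S)
    print(f"{RATE_MB_PER_S} MB/s is about {daily:.1f} TB/day")
    for name, volume in (("ATLAS", ATLAS_TB), ("LHCb", LHCB_TB)):
        days = days_to_migrate(volume, RATE_MB_PER_S)
        print(f"{name}: {volume} TB would take about {days:.1f} days")
```

Under these assumptions the quoted rate is consistent with the "~10 TB/day" estimate in the table. Real migration times would be longer once checksumming and any tape-to-tape copying are included.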