RAL Tier1 weekly operations castor 31/03/2014
Operations News
- Disk deployments: 1 CV’13 server in lhcbDst; 10 CV’13 servers in lhcbNonProd awaiting blessing; 3 CV’13 servers on the way to cmsNonProd
- Disk draining: 2 ATLAS servers drained and 1 in progress; 3 CMS servers drained and 1 in progress
- CMSdisk is at 7% free space
- ACSLS (which controls the tape robot): Tim will stop all tape access for an hour or so. A discussion with Andrew is needed regarding CMS tape recalls.
- ATLAS would like to store c. 2 million EVNT Monte Carlo files; Brian to discuss with Alastair. Other Tier 1s are not keen, but the RAL Tier 1 CASTOR should be able to cope with this.
Operations Problems
- CMS load continues to cause problems; we had to restart the transfer and disk managers to get things working again (Monday 10:45 and Tuesday 17:30). See the restart sketch after this list.
- transfermanagerd was restarted on fdscdlf02 on Thursday.
- The vcert SRM and name server are not accessible due to hypervisor issues after the rack move; some reconfiguration may be required to bring them back. Dimitrios is looking into this.
- A node crash on Neptune caused brief issues with the ATLAS SRM; this is a known issue that has already been logged with Oracle.
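
A minimal sketch of the kind of daemon restart involved, assuming the standard SL5/6 init scripts for the CASTOR scheduling daemons; the exact service names and target nodes are assumptions, not taken from the minutes above:

 # Restart the CASTOR transfer manager on the scheduler node
 # (fdscdlf02 in the Thursday case above)
 service transfermanagerd restart
 # Restart the disk manager on each affected disk server
 service diskmanagerd restart
 # Confirm both daemons came back
 service transfermanagerd status
 service diskmanagerd status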
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in, or planned to go to, GOCDB
Advanced Planning
Tasks
- CASTOR 2.1.14 + SL5/6 testing: the change control went through today with few problems.
- iptables to be installed on lcgcviewer01 to harden the logging system against the injection of junk data by security scans (a sketch of a possible rule set follows this list).
- Quattor cleanup process is ongoing.
- Installation of new Preprod headnodes
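
A minimal sketch of the kind of iptables rules this could mean, assuming the logging system listens on a single TCP port; the port number (5555) and the trusted source range are illustrative placeholders, not values from the plan above:

 # Accept log traffic only from the trusted range (placeholder values)
 iptables -A INPUT -p tcp --dport 5555 -s 130.246.0.0/16 -j ACCEPT
 # Drop everything else aimed at the collector port, including security scans
 iptables -A INPUT -p tcp --dport 5555 -j DROP
 # Persist the rules across reboots on SL5/6
 service iptables save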
Interventions
- (Tue 1 Apr) Facilities CASTOR upgrade; downtime from 09:00 to 16:00.
Staffing
- Castor on Call person: Matthew
- Staff absence/out of the office:
  - (Mon-Fri) Rob A/L
  - (Friday) Bruno possibly A/L