RAL Tier1 weekly operations castor 31/03/2014
Operations News
- Disk deployments: 1 CV’13 server in lhcbDst; 10 CV’13 servers in lhcbNonProd awaiting blessing; 3 CV’13 servers on the way to cmsNonProd
- Disk draining: 2 ATLAS servers drained and 1 in progress; 3 CMS servers drained and 1 in progress
- CMSdisk is at 7% free space
- ACSLS (which controls the tape robot): Tim will stop all tape access for an hour or so. A discussion with Andrew is needed regarding CMS tape recalls.
- ATLAS would like to store c. 2 million EVNT Monte Carlo files; Brian to discuss with Alastair. Other Tier 1s are not keen, but the RAL Tier 1 CASTOR should be able to cope with this.
Operations Problems
- CMS load continues to cause problems; we had to restart the transfer and disk managers to get things working again (Monday 10:45 and Tuesday 17:30). See the restart sketch after this list.
- transfermanagerd was restarted on fdscdlf02 on Thursday.
- The vcert SRM and name server are not accessible due to hypervisor issues after the rack move; some reconfiguration may be required to bring them back. Dimitrios is looking into this.
- A node crash on Neptune caused brief issues with the ATLAS SRM; this is a known issue that has already been logged with Oracle.
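
A minimal sketch of the kind of daemon restart involved, assuming the standard SL5/6 init scripts for the CASTOR scheduling daemons; the exact service names and target nodes are assumptions, not taken from the minutes above:

 # Restart the CASTOR transfer manager on the scheduler node
 # (fdscdlf02 in the Thursday case above)
 service transfermanagerd restart
 # Restart the disk manager on each affected disk server
 service diskmanagerd restart
 # Confirm both daemons came back
 service transfermanagerd status
 service diskmanagerd status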
Blocking Issues
- none
Planned, Scheduled and Cancelled Interventions
Entries in, or planned to go to, GOCDB
Advanced Planning
Tasks
- CASTOR 2.1.14 + SL5/6 testing: the change control went through today with few problems.
- iptables to be installed on lcgcviewer01 to harden the logging system against the injection of junk data by security scans (a sketch of a possible rule set follows this list).
- Quattor cleanup process is ongoing.
- Installation of new Preprod headnodes
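
A minimal sketch of the kind of iptables rules this could mean, assuming the logging system listens on a single TCP port; the port number (5555) and the trusted source range are illustrative placeholders, not values from the plan above:

 # Accept log traffic only from the trusted range (placeholder values)
 iptables -A INPUT -p tcp --dport 5555 -s 130.246.0.0/16 -j ACCEPT
 # Drop everything else aimed at the collector port, including security scans
 iptables -A INPUT -p tcp --dport 5555 -j DROP
 # Persist the rules across reboots on SL5/6
 service iptables save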
Interventions
- (Tue 1 Apr) Facilities CASTOR upgrade; downtime from 09:00 to 16:00.
Staffing
- Castor on Call person: Matthew
- Staff absence/out of the office:
  - (Mon-Fri) Rob A/L
  - (Friday) Bruno possibly A/L