RAL Tier1 weekly operations castor 24/11/2014

From GridPP Wiki

Revision as of 14:14, 21 November 2014 by Christopher Prosser 1e304264ea (Talk | contribs)


Contents

1 Operations News
2 Operations Problems
3 Blocking Issues
4 Planned, Scheduled and Cancelled Interventions
5 Advanced Planning
6 Actions
7 Staffing

Operations News

xrootd security advisory affecting the FAX component within xrootd
SL6 headnode work - stress testing scheduled for the weekend of 22/23 Nov (using realistic jobs from Alastair / Andrew if possible)
Draining - latest estimate is that draining will complete in 11 weeks (assuming no breaks). LHCb draining rate test and pseudo-rebalancing are also underway
WebDAV access for LHCb now working
gdss673 back in production (lhcbRawRdst)

Operations Problems

gdss720 / gdss763 are both drained, out of production and currently being worked on by the Fabric team
gdss659 still in atlasNonProd
lcgclsf02 failed on Tuesday night, root cause unknown - server has been tested by the Fabric team and returned to production
SRM SAM test failures on LHCb/CMS and callout on pluto. Some deadlocking on the databases; root cause is being investigated

Blocking Issues

GridFTP bug in SL6 - any globus copy fails if the client is using a particular library. This is a showstopper for SL6 on disk servers.

Planned, Scheduled and Cancelled Interventions

A Tier 1 database cleanup is planned to remove a number of excess tables and other entities left over from previous CASTOR versions. This will go through change control in the near future.

Advanced Planning

Tasks

Plan and publish SL6 deployment plans
Plans to ensure PreProd represents production in terms of hardware generation are underway
Possible future upgrade to CASTOR 2.1.14-15 post Christmas
Switch admin machines from lcgccvm02 to lcgcadm05
Correct partition alignment issue (3rd CASTOR partition) on new CASTOR disk servers
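
For the partition alignment task, a minimal sketch of the usual check is given here: a partition is aligned when its starting byte offset is a whole multiple of the chosen boundary. The function name, the 512-byte sector size, the 1 MiB boundary and the example start sectors are all assumptions for illustration and are not taken from the new disk servers.

```python
def is_aligned(start_sector: int, sector_size: int = 512,
               boundary: int = 1024 * 1024) -> bool:
    """Return True if the partition's first byte falls on the boundary.

    sector_size and boundary are assumed defaults (512 B sectors, 1 MiB
    boundary); substitute the values appropriate to the real hardware.
    """
    return (start_sector * sector_size) % boundary == 0


# Hypothetical start sectors, not read from any RAL disk server.
print(is_aligned(2048))  # 2048 * 512 bytes is exactly 1 MiB
print(is_aligned(63))    # legacy offset of 63 sectors
```

The start sector of each partition can be read from the partition table and passed in; any partition for which the function returns False would need to be recreated at an aligned offset.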

Interventions

Upgrade Production headnodes to SL6

Actions

Staffing

Castor on Call person
- Matt

Staff absence/out of the office:

- Chris - Friday
- Rob - out Tuesday and Thursday
