RAL Tier1 weekly operations castor 26/01/2015
From GridPP Wiki
Latest revision as of 14:00, 23 January 2015
Operations News
- Draining - ongoing
- Name server SL6 upgrade completed - no issues
- Redundant atlasHotdisk service class and disk pool removed from CASTOR
Operations Problems
- certificates on fdsdss20 to fdsdss30 expire on 1st Feb - Gareth has raised this with Fabric
- castor functional test on lcgccvm02 causing problems - Gareth reviewing
- problems with storageD retrieval from castor - investigation ongoing
- almost all of the 150k zero-size files reported last week have been dealt with; CMS files are still outstanding
- Files with no ns or xattr checksum value in castor are failing transfers from RAL to BNL using the BNL FTS3 server.
Blocking Issues
- GridFTP bug in SL6 - stops any globus copy if a client is using a particular library. This is a show stopper for SL6 on disk servers.
Planned, Scheduled and Cancelled Interventions
- Removal of redundant CASTOR DB tables Monday 26th 9am (Shaun)
- Kernel upgrade on Castor SL5 disk/srm/tape. Tuesday 27/Wednesday 28/Thursday 29
- Kernel upgrade on Castor facilities - scheduled for Monday 26th 9-10am
- Oracle upgrade of preprod 2nd Feb - will require a short outage
- Oracle PSU patching 3rd (Neptune)/4th (Pluto) - castor production at risk 4th Feb
- Upgrade Oracle DB to version 11.2.0.4 (Late February?)
- Upgrade CASTOR to version 2.1.14-14 OR 2.1.14-15 (Early February)
Advanced Planning
Tasks
- DB team need to plan some work which will result in the DBs being under load for approx 1h - not terribly urgent but needs to be done in the new year.
- Provide new VM? to provide castor client functionality to query the backup DBs
- Plans to ensure PreProd represents production in terms of hardware generation are underway
- Possible future upgrade to CASTOR 2.1.14-15 post-Christmas
- Switch admin machines from lcgccvm02 to lcgcadm05
- Correct partition alignment issue (3rd CASTOR partition) on new castor disk servers
Interventions
Actions
- Rob to pick up DB cleanup change control
- Bruno to document processes to control services previously controlled by puppet
- Gareth to arrange meeting castor/fab/production to discuss the decommissioning procedures
Staffing
- Castor on Call person
  - Chris
- Staff absence/out of the office:
  - Rob out until Monday 2nd Feb