RAL Tier1 weekly operations castor 28/07/2014
From GridPP Wiki
Latest revision as of 15:46, 25 July 2014
Operations News
- 2.1.14-13 upgrades are now complete, including switching off the compatibility mode.
- Elastic Search infrastructure has been fixed by James – we need to put tools on the admin node.
- Plans to ensure PreProd represents production in terms of hardware generation are under way. A student will be starting soon with the task of investigating visualisation and querying solutions for CASTOR use.
Operations Problems
- Low-level DB locking issues continue (various VOs); these have been reported to the developers at CERN.
- A potential race condition which could result in data loss has been seen on CMS (2.1.14-13) while investigating a file that would not migrate to tape. CERN have been notified.
- ATLAS xrootd proxy issues - investigation starting.
- CMS xrootd issues
- Facilities CASTOR error
- Possible LHCb I/O problems - ticket raised and investigation started.
- Need to update dteam and ATLAS VOs' voms-servers
Blocking Issues
- V13 deployment issues - the lsfadmin and amanda backup users are clashing; look at removing the amanda backup payload.
Planned, Scheduled and Cancelled Interventions
Advanced Planning
Tasks
- Resume draining on the ATLAS instance (again, once the name server compatibility mode change is complete)
- Switch admin machines from lcgccvm02 to lcgcadm05
- A new VM, configured to run against the standby CASTOR database, will be created as a front-end for dark data and similar queries.
- Replace DLF with Elastic Search
- Correct the partition alignment issue (3rd CASTOR partition) on the new CASTOR disk servers
Interventions
- Upgrade of Facilities CASTOR from 2.1.14-11 to 2.1.14-13 on Wed 30th July
Staffing
- CASTOR on-call person
  - Rob - Weekend 26/27 July and following week
- Staff absence/out of the office:
  - Dataservices away day Monday
  - Chris out Tuesday