RAL Tier1 weekly operations castor 14/07/2014
From GridPP Wiki
Latest revision as of 13:48, 14 July 2014
Operations News
- 2.1.14-13 upgrade for Atlas Stagers completed over the course of last Tuesday/Wednesday (8th-9th July)
- Plans to ensure PreProd represents production in terms of hardware generation are underway.
- Elastic Search has been through some testing, and others are encouraged to use it. A student will be starting next week with the task of investigating visualisation and querying solutions for CASTOR use.
- Deployment of disk servers is due to restart next week.
Operations Problems
- A potential race condition which could result in data loss has been seen on CMS (2.1.14-13) while investigating a file that would not migrate to tape. CERN have been notified.
- The srmbed daemon on the Gen SRMs was unstable on 2014-07-11 (Friday). The reason for the problem was not identified, but it went away, apparently by itself.
- A CMS db locking issue was seen during working hours on 2014-07-11. This will be reported to the developers.
- Atlas SUM test failures have stopped since the dark data search ceased. A new VM configured to run against the standby database will be created as a front-end for such queries. Chris will be leading this.
Blocking Issues
Planned, Scheduled and Cancelled Interventions
- CASTOR 2.1.14-13 upgrade for Repack - planned for Tuesday or Wednesday this week.
- Switch-off of compatibility mode for Tier 1 Name Server
- Upgrade of Facilities CASTOR from 2.1.14-11 to 2.1.14-13.
Advanced Planning
Tasks
- Correct partitioning alignment issue (3rd CASTOR partition) on new castor disk servers
- Put V13 servers in NonProd into production (once name server compatibility mode change complete)
- Resume draining on the ATLAS instance (again, once name server compatibility mode change complete)
- Switch admin machines from lcgccvm02 to lcgcadm05
- Replace DLF with Elastic Search
Interventions
- CASTOR 2.1.14-13 stager upgrades for Tier 1 - 8th July for Atlas
Staffing
- Castor on Call person
  - Rob
- Staff absence/out of the office:
  - Matt interviewing on Monday/Tuesday.