RAL Tier1 weekly operations castor 27/10/2014
From GridPP Wiki
Latest revision as of 14:50, 27 October 2014
Operations News
- xrootd security advisory with FAX component within xrootd
- SL6 headnode work: tested in vcert; next test in preprod, including stress testing
- Final 5 servers have been deployed into lhcbRawRdst
- Draining improvement workaround: put full or almost full disk servers into read-only mode
- CASTOR 2-1-14-14 upgrade priority dropped, as we have a draining workaround. Revisit once the SL6 work is done (in the new year)
Operations Problems
- gdss720 / gdss763 are both drained, out of production and waiting for Fabric work (possibly RAID and other work)
- A few CMS SUM test failures this week, investigations inconclusive
Blocking Issues
- GridFTP bug in SL6: stops any globus copy if a client is using a particular library. This is a show-stopper for SL6 on disk servers.
Planned, Scheduled and Cancelled Interventions
- A Tier 1 Database cleanup is planned so as to eliminate a number of excess tables and other entities left over from previous CASTOR versions. This will be change-controlled in the near future.
- Juan to further patch the CASTOR databases (PSU patches for Pluto and Juno) – standard change ... TBC
- Functional testing of new errata in preprod
Advanced Planning
Tasks
- Plans to ensure preprod represents production in terms of hardware generation are underway
- Possible future upgrade to CASTOR 2.1.14-15 after Christmas
- Switch admin machines from lcgccvm02 to lcgcadm05
- A new VM, configured to run against the standby CASTOR database, will be created as a front-end for dark data and similar queries.
- Correct the partition alignment issue (3rd CASTOR partition) on the new CASTOR disk servers
Interventions
Staffing
- CASTOR on-call person
  - Matt V
- Staff absence/out of the office:
  - Shaun: Monday
  - Bruno: following 2 weeks
  - Chris: Tues-Thurs