RAL Tier1 weekly operations castor 27/10/2014

Operations News

  • xrootd security advisory concerning the FAX component within xrootd
  • SL6 headnode work: tested in vcert; next test in preprod, including stress testing
  • Final 5 servers have been deployed into lhcbRawRdst
  • Draining improvement workaround: put full or almost-full disk servers into read-only mode
  • CASTOR 2.1.14-14 upgrade priority dropped as we have a draining workaround. Revisit once SL6 work is done (in the new year)


Operations Problems

  • gdss720 / gdss763 are both drained, out of production and awaiting work by Fabric (possibly RAID and other work)
  • A few CMS SUM test failures this week; investigations inconclusive


Blocking Issues

  • GridFTP bug in SL6: stops any globus copy if a client is using a particular library. This is a showstopper for SL6 on disk servers.


Planned, Scheduled and Cancelled Interventions

  • A Tier 1 Database cleanup is planned so as to eliminate a number of excess tables and other entities left over from previous CASTOR versions. This will be change-controlled in the near future.
  • Juan to further patch the CASTOR databases (PSU patches for Pluto and Juno) – standard change ... TBC
  • Functional testing of new errata in preprod


Advanced Planning

Tasks

  • Plans to ensure PreProd represents production in terms of hardware generation are underway
  • Possible future upgrade to CASTOR 2.1.14-15 after Christmas
  • Switch admin machines from lcgccvm02 to lcgcadm05
  • A new VM, configured to run against the standby CASTOR database, will be created as a front-end for dark data and similar queries.
  • Correct partition alignment issue (3rd CASTOR partition) on new CASTOR disk servers

Interventions


Staffing

  • CASTOR on-call person
    • Matt V


  • Staff absence/out of the office:
    • Shaun: Monday
    • Bruno: following 2 weeks
    • Chris: Tues-Thurs