RAL Tier1 weekly operations castor 07/04/2014

From GridPP Wiki
Revision as of 13:17, 4 April 2014 by Matthew Viljoen

Operations News

  • Facilities CASTOR was successfully upgraded to 2.1.14-11

Operations Problems

  • CMS load continues to cause problems; we had to restart the transfer and disk managers to get things working again (Monday 10:45 and Tuesday 17:30).
  • transfermanagerd was restarted on fdscdlf02 on Thursday.
  • The vcert SRM and name server are not accessible due to hypervisor issues after the rack move; some configuration may be required to bring them back. Dimitrios is looking into this.
  • A node crash on Neptune caused brief issues with the ATLAS SRM; this is a known issue that has already been logged with Oracle.

Blocking Issues

  • none

Planned, Scheduled and Cancelled Interventions

Entries in/planned to go to GOCDB: none

Advanced Planning

Tasks

  • ATLAS would like to store c. 2 million EVNT Monte Carlo files; Brian to discuss with Alastair. Other Tier 1s are not keen, but the RAL Tier 1 / CASTOR should be able to cope with this.

Interventions

Staffing

  • CASTOR on-call person
    • (Mon-Wed) Matthew
    • (Thu-Fri) Rob?
  • Staff absence/out of the office:
    • (Mon-Fri) Chris A/L
    • (Mon-Wed) Matt in DL then First Aid training
    • (Thu-Fri) Matt A/L