RAL Tier1 weekly operations Fabric 20101108

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
  • Tim:
  • Jonathan:
  • James A:
    • Fixed problems with repository servers.
    • Deployed production-class Pakiti2 web interfaces.
    • Started looking at Quattorisation of CASTOR facilities.
  • James T:
    • Attended meeting on STFC outreach on behalf of eScience
    • STEM ambassador work at a school in Oxford
    • Preparation for SL5 64-bit upgrade of disk servers
    • Fixed a bug in AII which was preventing installation of V07i disk servers
    • Streamline 2009 acceptance tests completed on the majority of hosts
  • Cheney:
    • Investigating what actually happened with CASTOR facilities
    • Writing DB backup check scripts
    • Set up test of Amanda backup & recovery for Kevin H
    • Investigating web intrusions on hinode
    • Applying Quattorised OS updates


  • Kash:
    • 4 days annual leave.
    • Drive replacement.
    • Fixing broken WNs.
    • gdss463: replaced RAID card cables this time. (Fixed)
    • Hardware failure stats/graphs.
    • lcglb01: SMART errors reported to Streamline.
    • Streamline/Areca disk servers crashed due to a single faulty drive. (Ongoing)


Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Entries of type Down, At Risk, or Cancelled that are in, or planned to go into, GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All:
  • Martin:
  • Ian:
  • Tim:
  • Cheney:
    • Investigating what actually happened with CASTOR facilities
    • Writing DB backup check scripts
    • Writing documentation
  • Jonathan:
  • James T:
    • Disk problems in Kash's absence
    • Preparation for LHCb SL5 64-bit upgrade
    • LHCb SL5 64-bit upgrade
  • James A:
    • Building new errata release for Quattor.
    • Rolling out latest errata across farm.
    • Continuing deployment of Pakiti2.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Continuing decommissioning of old batch systems (R 27).

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)
  • Cheney: date for being off has changed, now Nov 24th. Early warning: likely to be off most of December; date subject to change.

Fabric On-Call

  • Kashif Hafeez

Advanced Warning of Requirements and Blocking issues

Services Issues

