RAL Tier1 weekly operations Fabric 20100802

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
    • Virtualisation testing
    • Worked with Cheney on Facilities Castor setup
    • Delivering testbed VMs & training to GST


  • Tim:
  • Jonathan:
    • Away Tuesday-Thursday (so out all week)
  • James A:
  • James T:
    • Special leave all week
  • Cheney:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems. (R 27)
    • gdss163 given back to Castor team.
    • Replaced 3 drives in Streamline 2009 (Test) disk servers.
    • gdss368: replaced 4 x 2 GB memory; given back to Castor team.
    • gdss207: replaced 4-port RAID card. (Fixed)
    • lcg1212: motherboard replaced by HP engineer.
    • gdss187: verify completed; given back to Castor team.
    • Hardware failure stats/graphs.
    • gdss419: two faulty drives. (Intervention)
    • Preparing Viglen 2006 disk servers with new RAID configuration for Castor Preprod.
    • Streamline/Areca disk servers crashed due to a single faulty drive. (Ongoing)


Absences

  • Jonathan on partial retirement (not in on Monday and Friday) and away Tuesday-Thursday (so out all week)
  • James T on special leave all week.

Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All:
  • Martin:
  • Ian:
    • Continue with virtualisation, especially shared storage
    • Some work on Facilities Castor
  • Tim:
  • Cheney:
  • Jonathan:
  • James T:
    • Catch up
    • Streamline 2009 testing
    • Strategic security tasks
    • Disk server IPMI plan
    • Slides for TDG next week
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • gdss417 crashed due to a single drive failure. (Intervention)
    • gdss380: run 7-day acceptance test.
    • Update daily status of Streamline 2009 disk server testing.
    • Continue decommissioning old batch systems. (R 27)

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)
  • Ian out Wednesday-Friday (operation & recovery)
  • Cheney A/L all week
  • Tim A/L all week

Fabric On-Call

  • James T - Monday-Sunday

Advance Warning of Requirements and Blocking Issues

Services Issues

