RAL T1 weekly ops Fabric 20110606

From GridPP Wiki

Developments

  • All:
  • Tim:
  • James A:
    • Worked on job plan.
    • General Quattor assistance.
    • Closed out work on CLF array.
    • Performed bulk renewal of ~300 storage node certificates.
  • Cheney:
    • DMF had a broken file system.
    • Tidy up backups.
    • Write backup status monitoring program.
    • Fix disk arrays.
    • Set up storageD monitor.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old disk servers/batch systems.
    • gdss365: started testing after re-creating RAID array.
    • Firmware update on all Viglen 2007 disk servers (ongoing).
    • Firmware update on Jetstor systems (ongoing); three updated so far.
    • Contacted Areca support regarding problems with SL08 disk servers.
    • Adding SL09, V09 and SL10 to Adaptec Storage Manager for monitoring (ongoing).
    • gdss135: kernel panic.
    • Updated Adaptec StorMan on all SL09 and Viglen 09 disk servers.
    • gdss294: re-created RAID array.


  • Martin:
    • Installed C300 #2 in Tel9
    • Installed Tier1 Castor Services 4 + 10 T10KC tape servers + Arista switch
    • eMROG
    • Various management stuff
  • Ian:
    • Job plan
    • Common Ops Virtualisation work
    • Began setting up EqualLogic iSCSI demo unit
    • Began work on managed provision workflow
    • Induction for editing the e-Science external website


Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All
  • Tim:
  • Cheney:
    • Backups
  • James A:
    • Open Day logistics and planning.
    • Closing out tickets.
    • Work on Storage ITT.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Hardware failure metrics continue.
    • Continue SL08 testing.
    • Continue decommissioning old disk servers/batch systems (R 27).
    • Continue labelling racks and systems in UPS and HPD room.


  • Martin:
    • Work on MAIA systems
    • Database config for Facilities Castor
    • Maintenance for C300 #1
    • Procurement stuff
  • Ian:
    • Further EqualLogic evaluation
    • Further work on provisioning workflow in RT
    • Disk ITT review
    • Prep for Edinburgh HEP Computing workshop
    • Prep for TDG CVMFS talk
    • Tier1 Quattor refactoring workday (Friday)

Absences

Fabric On-Call

  • Ian Primary - Monday - Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues

