RAL T1 weekly ops Fabric 20110620

Developments

  • All:
  • Tim:
    • Castor rack re-organisation
    • DMF instabilities - investigate
    • ADS closedown
    • SDB planning
    • T10KC problems
  • James A:
  • Cheney:
    • Development of scripts to report on Amanda backups
    • More infosec on Amanda backups
    • Fix problems on DB array
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old disk servers/batch systems.
    • Updated wiki with procedures for diagnosing disk servers.
    • Firmware update on all Viglen 2007 disk servers (ongoing).
    • Firmware update on Jetstor systems (ongoing); three updated so far.
    • Ongoing discussion with Areca support regarding problems with SL08 disk servers.
    • Added SL09, V09 and SL10 to Adaptec Storage Manager for monitoring.
    • Started 'verify fix' on SL09 disk servers with bad blocks on drives.
    • Re-create and configure RAID arrays of 5-7 CV 05 disk servers after decommissioning.
    • Quattor02 swap drive in port 0 with R410.


  • Martin:
  • Ian:
    • EqualLogic evaluation
    • Expanding CVMFS replica backend storage
    • Setting up new shared virtual machine manager
    • CVMFS talk at HEP Computing Workshop in Edinburgh


Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All
  • Tim:
    • SDB planning
    • T10KC media problems
    • Sort out STK/IBM maintenance contracts (i.e. pay them)
    • CEDA recalls
  • Cheney:
    • Test the backups
    • Take on more backups
    • Tighten security of backups
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Hardware failure metrics continue.
    • Continue SL08 testing.
    • Continue decommissioning old disk servers/batch systems (R 27).
    • Continue labelling racks and systems in UPS and HPD room.


  • Martin:
  • Ian:
    • Further work on EqualLogic evaluation
    • Migrating hypervisors to new infrastructure
    • OS errata
    • COPs report
    • Plan for StratusLab testbed

Absences

    • Cheney: 1 week holiday, 27th June - 1st July.

Fabric On-Call

  • Ian: Fabric on-call, Monday - Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues

