RAL Tier1 weekly operations Fabric 20110228

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
    • iSCSI performance testing/tuning
    • Building production cvmfs mirror/replica
    • Virtualisation
  • Tim:
  • James A:
  • James T:
    • Applied new WAN tuning to all remaining CASTOR instances
    • Documentation (Loggers, Ganglia)
    • V10/SL10
    • SL08
  • Cheney:
    • DMF disaster recovery testing
    • Set up rsync for Greg Matthews
    • Tinker with backups for Nick H
    • Help Johnathn Churchill with his fibre
    • Investigate problems with the tape controller
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R 27).
    • gdss380: added new MAC address in DHCP; needs re-install.
    • Started adding correct hot spares (SL09 & SL10).
    • gdss496: started verify fix (intervention).
    • gdss115: multiple drive failures (out of production).
    • Reported faulty memory in new Dell system in UPS room.
    • Updating firmware on JetStor systems (ongoing); two updated so far.
    • Checked all SL09 and SL10 disk servers for failed stripes.
    • Test room review (every Monday morning).
    • Checking new Clustervision batch systems (testing).
    • Replaced drive in system in EMC rack (MTI).
    • Replaced drives in loggers1 & 2.
    • SL08 testing continues.
    • Pack Viglen switch and cables for return.


Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All
  • Martin:
  • Ian:
    • Finalising production cvmfs mirror/replica
    • Switch WNs to use on-site cvmfs replica
    • Further iSCSI research
    • Plan deployment of management network hardware


  • Tim:
  • Cheney:
    • DMF disaster recovery
    • Backups
    • Rsync
  • James T:
    • Documentation
    • Preparation for handover
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Correct hot-spare configuration in SL09 disk servers.
    • Hardware failure metrics continue.
    • SL08 testing.
    • Continue decommissioning old batch systems (R 27).

Absences

    • Cheney on leave Tuesday, Wednesday and possibly Thursday.
    • James A on leave Monday.
    • Ian out Wednesday.

Fabric On-Call

  • Kashif Monday - Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues

