RAL T1 weekly ops Fabric 20110801

Developments

  • All:
  • Tim:
  • James A:
  • Cheney:
    • DMF performance and integrity testing
    • backups
    • Set up Oracle backups


  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old disk servers/batch systems.
    • Appointment with Physio.
    • gdss423 replaced 4 x 2 GB memory. (EDAC memory errors)
    • gdss434 EDAC memory errors. Replaced memory and back into production.
    • gdss523 fsprobe errors. Run verify-fix and memory test.
    • gdss435 EDAC memory errors. (Reported)
    • gdss208 re-created RAID array.
    • EMC engineer visit to replace PSU. (Booked in Visbadge)


  • Martin:
  • Ian:
    • leave
    • Getting Aslan started on cdb2sql project
    • Installing more local storage hypervisors
    • Preparation for ATLAS namespace change in CVMFS
    • Work on CPU ITT


Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All:
  • Tim:
  • Cheney:
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Hardware failure review and metrics continue.
    • Continue decommissioning old disk servers/batch systems. (R 27)
    • Continue labelling racks and systems in UPS and HPD room.


  • Martin:
  • Ian:
    • Final planning for ATLAS namespace change in CVMFS
    • Testing of EqualLogic iSCSI array
    • Work on StratusLab test cloud
    • Work with Aslan on cdb2sql project

Absences

  • Cheney: 2 weeks holiday

Fabric On-Call

  • Ian: primary on call, Monday to Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues

