RAL Tier1 weekly operations Fabric 20110221

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
  • Tim:
    • LHCb data recovery
    • Tessella SDB install
    • Netbackup trial
  • James A:
  • James T:
    • Gen upgrade to SL5 64-bit
    • CMS WAN tuning
    • Moved BDII host for Richard
    • iSCSI targets on disk servers
    • Viglen 2010 installations
    • SL08 testing
  • Cheney:
    • DMF DR
    • Got rsync working on DMF
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems. (R 27)
    • gdss380: added new MAC address in DHCP; needs re-install.
    • Change control for Adaptec RAID cards. Approved. (SL09 & SL10)
    • gdss496 started verify fix. (Intervention)
    • gdss113 and gdss121 replaced 4x1GB memory.
    • gdss104 and gdss256 EDAC memory errors. (Cleared)
    • Update firmware on Jetstor systems. (Ongoing)
    • gdss502 drive failures and failed stripes. (Verify fix doesn't fix failed stripes)
    • Test room review. (Every Monday morning)
    • gdss510 motherboard replaced by Streamline. RAID card not seen by system.
    • gdss596 and gdss597 logs sent to Streamline after finishing verify fix.
    • Moved lcgbdii0631 into UPS room.
    • SL08 testing started.
    • SL 2009 auto rebuild on hot spare fails. Set rebuild priority from Low to High.


Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All:
  • Martin:
  • Ian:
  • Tim:
    • Tues - T10KC tape drive meeting in London
    • Wed-Thur - Project Management Course, Coseners
    • Fri - "Show and Tell", non-T1
  • Cheney:
    • DMF DR
  • James T:
    • WAN tuning on other CASTOR instances
    • SL08 Criteria for re-acceptance
    • Viglen 2010 machine for CASTOR team to test
    • Thinking about handover of duties
    • Documentation
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • SL 2009 auto rebuild on hot spare fails. Set rebuild priority from Low to High.
    • Hardware failure metrics continue.
    • SL08 testing.
    • Continuing decommissioning of old batch systems. (R 27)

Absences

  • Cheney might have Monday 28th off.
  • James T leaving Friday 1st April.

Fabric On-Call

  • Monday - Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues



Category:RAL_Tier1