RAL Tier1 weekly operations Fabric 20100920

Developments

  • All:
  • Martin:
  • Ian:
  • Tim:
  • Jonathan:
  • James A:
  • James T:
    • SL09 testing
    • Atlas power off stuff
    • Deployed 2 disk servers to repack
    • CASTOR 2.1.9 upgrade prep
    • A/L Friday PM
  • Cheney:
    • Development of CASTOR facilities
    • Demo Amanda backups to SCT (Kev)
    • Help fix rhubarb snarl-up
    • Fix solarb snarl-up
    • Try to get offline tape robot controller ready


  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems. (R 27)
    • gdss110 fsprobe errors. (Acceptance testing)
    • gdss380 acceptance testing. (Crashed with single faulty drive)
    • gdss477 replaced 24-port RAID card. (Borrowed for gdss473)
    • lcgfts01 replaced drive with hotswap. (sda)
    • gdss280 acceptance testing. (Intervention)
    • lcglb01 faulty drives reported to Streamline.
    • Streamline 2009 disk server RAID card and drives replaced by Streamline engineer. (Testing)
    • Hardware failure stats/graphs.
    • Preparing Viglen 2006 disk servers with new RAID configuration for CASTOR Preprod.
    • Streamline/Areca disk servers crashed due to single faulty drive. (Ongoing)


Absences

  • Jonathan on partial retirement (not in on Monday and Friday)

Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All:
  • Martin:
  • Ian:
  • Tim:
  • Cheney:
    • Quatt the Facilities
    • Prep the spare robot controller for use
  • Jonathan:
  • James T:
    • SL09 testing
    • Atlas power off stuff
    • Deployed 2 disk servers to repack
    • CASTOR 2.1.9 upgrade prep
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Move/decommission systems from Atlas to R89. (HPD/UPS)
    • Update daily status of Streamline 2009 disk servers testing.
    • Continue decommissioning old batch systems. (R 27)

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)

Fabric On-Call

  • Kashif Hafeez

Advanced Warning of Requirements and Blocking issues

Services Issues

