RAL Tier1 weekly operations Fabric 20110207

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
  • Tim:
    • T10KC purchase
    • VTL removal work (sorting out "duff" tapes)
    • Data loss on CS7541
    • DMF disk funnies


  • James A:
  • James T:
    • CMS SL5 64-bit upgrade
    • Updated GridFTP RPMs to 2.1.9-10 on LHCb disk servers
    • 2.1.9-10 upgrade of preProd
    • Moved disk servers to per-instance Quattor cluster
    • Moved disk servers to puppetmaster02
    • Discussed disk server problems with CERN
    • Prepared gdss280 to replace gdss283
  • Cheney:
    • DMF disaster recovery
    • creating some Ganglia for Alistair
    • created some Hadoop testing servers for Brian
    • created some docco for Gareth
    • analysis of security on backups
    • fix NFS problems for Diamond
    • talk about rsync into DMF
    • relocate kit on the ADS benches
    • set up backups trial for Stephen Rankin
    • set up separate backups trial for Freddie Akeroyd
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R 27).
    • gdss380: added new MAC address in DHCP and re-installed.
    • gdss189: read-only filesystem (SCSI errors).
    • gdss496: SCSI errors. Reported to Streamline with logs (intervention).
    • Tier1 Strategy meeting.
    • Fabric Hardware failure metrics.
    • Send logs to VSPL for Jetstor systems.
    • gdss502: drive failures and failed stripes.
    • Disk status catch-up with James T.
    • gdss280 started Acceptance test. Replacement disk server for gdss283.
    • SL 2010 and Viglen 2010 disk servers in testing. Finish testing.
    • SL 2009 Auto rebuild on hotspare fails. Set rebuild priority from Low to High.


Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All:
  • Martin:
  • Ian:
  • Tim:
    • Next year's spend
    • Oracle maintenance schedule for tape/libraries
    • T10KC media prices and availability
    • ADS backup users ref. closedown
  • Cheney:
    • DMF disaster recovery.
  • James T:
    • Replace gdss283 with gdss280
    • Roll out WAN tuning updates for CMS
    • SL08 testing with Kash
  • James A:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • SL 2009 Auto rebuild on hotspare fails. Set rebuild priority from Low to High
    • Hardware failure metrics continue.
    • SL08 testing.
    • Continue decommissioning old batch systems (R 27).

Absences

    • Cheney off Wednesday
    • Tim out Tuesday

Fabric On-Call

  • Monday - Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues

