RAL Tier1 weekly operations Fabric 20110418

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
    • Published errata templates for Quattor managed systems
    • Prepare CVMFS for ATLAS and LHCb production
    • CVMFS callout documentation
    • APR
    • Started prep for Science Oxford talk
  • Tim:
    • ADS shutdown stuff
    • Castor repack stuff
    • APR stuff
    • Chasing Oracle on T10KC stuff
    • Amanda backup stuff
  • James A:
    • Clearing backlog of tickets
    • Tending to Batch and Storage farms
    • Monitoring and fixing network issues
    • Covering for Kash
  • Cheney:
    • Write APR and job plan
    • Fix backups for Sct
    • Fix backups for DMF
    • Model for disk servers
    • Apply journals to DMF clone
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems. (R 27)
    • Test room review. (monthly)
    • gdss496: smartd tool needs to be installed.
    • quattor02: same errors again. (Reported to Dell)
    • Viglen 2007 disk servers firmware update. (Gareth)
    • Update firmware on Jetstor systems. (Ongoing; updated on three.)
    • gdss502: more bad blocks.
    • Streamline spares detail.
    • APR with MJB booked.
    • SL08 testing: 4 drive failures without any crash.


Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All:
  • Martin:
  • Ian:
    • APR
    • Prep for Science Oxford talk
    • Work on recruitment
    • Virtualisation status report/plan
  • Tim:
    • APR
    • ADS Backup meeting
    • DMF Futures
  • Cheney:
    • DMF training
    • Model for disk servers
    • Set up more users on DMF
  • James A:
    • APR and Job Plan
    • Clearing backlog of tickets
    • Encouraging users to move away from touch
    • Tending to Batch and Storage farms
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Hardware failure metrics continue.
    • Continue SL08 testing.
    • Continue decommissioning old batch systems. (R 27)
    • Continue labelling racks and systems in UPS and HPD room.
    • Book review meeting with Andrew and James for Fabric metrics for other hardware failures.

Absences

Fabric On-Call

  • Ian: primary on-call, Monday-Thursday
  • Kash: Fabric on-call, Friday-Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues

