RAL Tier1 weekly operations Fabric 20110307

From GridPP Wiki
Revision as of 16:12, 7 March 2011 by Ian Collier

Developments

  • All:
  • Martin:
  • Ian:
    • Finalising production cvmfs mirror/replica
    • Switched WNs to use on-site cvmfs replica
    • Fixed cvmfs config issues and published solution
    • Further iSCSI benchmarking & research
    • Planning deployment of management network hardware
  • Tim:
  • James A:
    • Benchmarking Viglen 2010 Worker Nodes
  • James T:
    • Documentation
    • Viglen 2010 disk server CASTOR configuration via quattor
    • Handed Viglen 2010 machine to CASTOR team for testing
    • Started working on CASTOR 2.1.10-0 upgrade quattor templates
    • SL08 testing
  • Cheney:
    • Some more testing of rsync for Greg Matthews
    • Some more testing of Amanda backups with Nick Hill
    • Model of disk resources
    • Hinode infosec
    • Trying to obtain SGI licences
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems. (R 27)
    • gdss380: added new MAC address in DHCP; needs re-install.
    • Hotspare configuration in SL09 disk servers. (Completed)
    • gdss496: re-created RAID array with arcconf. (Initializing)
    • gdss210 and gdss283 being cannibalised for parts.
    • Dell engineer visiting today to replace memory.
    • Updating firmware on Jetstor systems. (Ongoing; updated on two so far.)
    • Arranging collection with vendors to return the faulty parts.
    • Test room review. (Every Monday morning)
    • Checking new Clustervision batch systems. (Testing)
    • Added Fabric metrics for the month of February 2011.
    • More drive failures in loggers1 & 2.
    • SL08 testing continues.
    • Viglen switch and cables packed and moved to logistics for collection.
    • Replaced drive in Jetstor2 port 10.
    • Labelling racks and systems in UPS and HPD room.


Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All:
  • Martin:
    • Health & Safety course
    • Work on new database systems
  • Ian:
    • Monitoring cvmfs squids
    • Implement update notification mechanism for cvmfs, together with the developers
    • Finalise plans for management network backbone
    • Virtualisation deployments
    • Sort out firewall/routing issues for new subnet
  • Tim:
  • Cheney:
    • DMF DR
  • James T:
    • Documentation
    • SL08 testing
    • Officially begin handover of disk to James A.
    • Hand over Ganglia to Production Team
  • James A:
    • Deploying Viglen 2010 Worker Nodes
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Hardware failure metrics continue.
    • Continue SL08 testing.
    • Continue decommissioning old batch systems. (R 27)
    • Continue labelling racks and systems in UPS and HPD room.

Absences

  • Martin on H&S course Monday & Tuesday
  • James working at home on Monday, waiting for the gas man

Fabric On-Call

  • Ian: primary on-call, Monday to Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues

