RAL Tier1 weekly operations Fabric 20100329

From GridPP Wiki

Developments

  • All:
  • Martin:
    • Preparation for Open Day talk
    • Work on HEPiX Virtualisation Working Group distribution method proposal
    • Work on Castor database futures
    • Laptop transfer
    • Change controls notifications
  • Ian:
    • Helped James with SL5 disk server dependencies
    • Contributed to Quattor QuestNet networking bid
    • Documentation of Tier1 Quattor instance


  • Tim:
  • Cheney:
    • cleaning machine room
    • investigate SLS timeouts
    • build new robot controller
    • fix ZFS on new robot controller
    • investigate Oracle install problems
    • check over castor151 backups
    • relocate fibre channel switches
    • replace failed drive in VTL
    • fix backup problems on nagger
    • bring up tape servers after mir problems
  • James T:
    • Testing Viglen 09 Kit
    • SL5 64-bit + XFS quattor disk server build
    • Tier1 tour prep
  • Jonathan:
    • continued work on disposals
    • fixed atlasbackup problems on some nodes
    • updated root SSH authorized keys across farm
    • ran scans of log files after security alert
    • Nagios configuration updates
    • continued work on Quattor-managed Nagios slave server
  • James A:
    • SL54 Upgrade progressing.
    • Tier1 Tour Preparations.
    • Batch system training session.
    • Continued testing of Viglen Worker Nodes.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R27).
    • Moved and packed rack sliders from R27 to R89 for return (wrong sliders).
    • MAC addresses of the 13 new Dell systems (for MJB).
    • Streamline engineers replaced a few drives and took 2 disk servers (gdss483 and 494).
    • Castor servers (cdbc13) still working. (Intervention)
    • install01 (intervention)

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)

Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All
  • Martin:
    • Open Day + talk
    • Prep HEPiX site report
    • Work on HEPiX VWG proposal
  • Ian:
    • Further work on Castor servers in Quattor
    • Help ChrisK apply new LSF licenses
    • Work on Virtualisation Platform


  • Tim:
  • Cheney:
    • Build new robot controller
  • James T:
    • Tier1 tours
    • SL5, 64-bit, XFS disk server build
  • Jonathan:
    • Open Day and OPB tours
    • continue reconfiguration of nagios06
    • continue work on disposal of old kit from A1 Upper machine room
  • James A:
    • Tier1 OPBs and open day.
    • Continuation of SL54 upgrade.
    • Continued testing of Viglen Worker Nodes.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Continued decommissioning of old batch systems (R27).

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)
  • James T annual leave Wednesday & Thursday.

Fabric On-Call

Ian Fabric on call Monday-Wednesday

James A Fabric on call Thursday-Monday

Advanced Warning of Requirements and Blocking issues

Services Issues

