RAL Tier1 weekly operations Grid 20090817

Summary of Previous Week

Developments

  • Andrew
    • Induction
  • Catalin
  • Derek
    • Drafting detailed plan for SL5 migration
    • Finished quattorising torque server config
    • Started quattorising WN
    • Implemented helpdesk dump
  • Matt
    • LFC:
      • ATLAS front-end separation (DNS alias, GOCDB, IS changes)
    • WLCG accounting
    • Failover to backup MyProxy host
    • Test deployment of gLite 3.2 (SL5) UI using Quattor

Operational Issues and Incidents

Description | Start | End | Affected VO(s) | Severity
MyProxy unavailable (no backup host following failover) | 2009-08-12 (00:30) | 2009-08-07 (09:30) | All | Medium
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | ATLAS | Low
helpdesk DB tables not backed up | 2009-07-01 | Ongoing | None | Low

Plans for Week(s) Ahead

Development Priorities

  • Andrew
    • Retrieve Grid certificate
    • Start to use LCG
  • Catalin
  • Derek
    • Continue quattorising worker node
    • Bring quattorised batch server to production level
  • Matt
    • LFC
      • Chase ATLAS use of non-ATLAS front-ends
      • Back-end split?
    • FTS middleware updates (to meet baseline requirement)?
  • Richard
    • Induction

Resource Requests

Downtimes

Description | Start | End | Affected VO(s)
LFC ATLAS back-end separation | August 20 (08:00)? | August 20 (12:00)? | ATLAS, MINOS

Requirements and Blocking Issues

Description | Required By | Priority | Status
SL5 Worker Node Kickstart |  | High | Post-kickstart configuration needed; not yet suitable for bulk deployment
Non-capacity HW for testing |  | Medium | Still using the old HW
Hardware for PPS |  | Medium | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this.
lfc0448 disk failures |  | Low | Disk replacement needed

OnCall/AoD Cover

  • Primary OnCall:
  • Grid OnCall: Matt
  • AoD: