RAL Tier1 weekly operations Grid 20090622

From GridPP Wiki

Summary of Previous Week

Developments

  • Catalin
    • Work on LFC streaming
    • WMS service draining
    • Tests on SL5 WNs
  • Derek
    • YII Objectives
    • Wrote Nagios test for CREAM CE issue
    • Investigating possible solutions to LCG CE file limit problem
    • Deployed test Quattor configuration for site and top-level BDIIs
  • Matt
    • Completed R89 Rack Migration templates (whole team).
    • Migrated MyProxy service to non-migrating rack.

Operational Issues and Incidents

Description                                       | Start           | End     | Affected VO(s) | Severity
Production pool account at 32k subdirectory limit | 2009-06-03      | Ongoing | ATLAS          | High
LB01 RAID failure                                 | 2009-06-17      | Ongoing | All            | Low
lfc0448 - SMART errors detected                   | 2009-06-15      | Ongoing | All            | Low
ce.ngs - SAN problems                             | 2009-06-16 - 17 | Done    | egee test vo   | Low
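
The 32k subdirectory limit listed among the incidents above is the kind of condition that can be watched before it is reached. The following is a minimal sketch of a Nagios-style check, not the test used at the site. It assumes the filesystem is ext3, which the report does not state; ext3 allows 32000 links per directory, so at most 31998 subdirectories. The function names, the script name and the 80%/95% thresholds are hypothetical. The return values follow the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import sys

# Assumption: ext3, which caps a directory at 32000 hard links. Two of
# those are used by "." and "..", leaving 31998 subdirectories.
EXT3_MAX_SUBDIRS = 31998


def count_subdirs(path):
    """Count the immediate subdirectories of path, ignoring symlinks."""
    total = 0
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            total += 1
    return total


def check(count, limit=EXT3_MAX_SUBDIRS, warn=0.80, crit=0.95):
    """Map a subdirectory count to a (Nagios exit code, message) pair."""
    used = count / float(limit)
    if used >= crit:
        return 2, "CRITICAL - %d/%d subdirectories" % (count, limit)
    if used >= warn:
        return 1, "WARNING - %d/%d subdirectories" % (count, limit)
    return 0, "OK - %d/%d subdirectories" % (count, limit)


def main(argv):
    """Run the check on argv[1]; print the status line, return the code."""
    if len(argv) != 2:
        print("UNKNOWN - usage: check_subdirs.py DIRECTORY")
        return 3
    try:
        code, message = check(count_subdirs(argv[1]))
    except OSError as exc:
        code, message = 3, "UNKNOWN - %s" % exc
    print(message)
    return code

# To run as a plugin, end the file with: sys.exit(main(sys.argv))
```

Warning well below the hard limit leaves time to clean up or restructure the directory before job submission starts to fail.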

Plans for Week(s) Ahead

Downtimes

Description                    | Start            | End              | Affected VO(s)
WMS drain ahead of R89 move    | 2009-06-17 10:00 | 2009-06-26 12:00 | All

Development Priorities

  • Catalin
    • Support the R89 move (if needed)
    • Finalise recovery documentation
    • Debug LFC streaming (with Carmine)
  • Derek
    • Continue investigating solutions to the LCG CE 32k file limit
    • Refine YII Objectives
    • Quattorise test LFC
  • Matt
    • Plan SL4 to SL5 migration.
    • Migrate MyProxy service to R89 CPU rack.
    • R89 late rota cover Thu/Fri.

Requirements and Blocking Issues

Description                 | Required By | Priority | Status
SL5 Worker Node Kickstart   |             | High     | Post-kickstart configuration needed; not yet suitable for bulk deployment
LB01 RAID failure           |             | Medium   | Disk replacement needed
lfc0448 disk failures       |             | Medium   | Disk replacement needed
Non-capacity HW for testing |             | Medium   | Still using the old HW
Hardware for PPS            |             | Medium   | May need to deploy imminently

OnCall/AoD Cover

  • Primary OnCall
    • Catalin: Fri-Sun
  • Grid OnCall
    • Derek: Mon-Thu
  • AoD
    • None