RAL Tier1 weekly operations Grid 20090727

From GridPP Wiki

Summary of Previous Week

Developments

  • Derek
    • Quattorising torque
    • Worker node update
    • Blog software update
  • Matt
    • Catch up
    • Finish WLCG accounting
    • Move MyProxy to backup host (Kash replaced the disks and made them hot-swappable on both hosts)
    • PPS/CASTOR Pre-Prod post shortlisting
    • Check quattor-generated Maui configuration
    • Deploy PPS top-level BDII
    • Set up a test FTS instance for testing the 2.2 release

Operational Issues and Incidents

Description | Start | End | Affected VO(s) | Severity
lcgce07 - misconfiguration | 2009-07-21 | 2009-07-22 | All | Low (SL5)
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | All | Low
lcgpx0619 - RAID failure | 2009-07-03 | 2009-07-24 | All | Low
helpdesk DB tables not backed up | 2009-07-01 | Ongoing | None | Medium
lcgmon01 - SMART errors detected | 2009-07-18 | Ongoing | None | Medium

Plans for Week(s) Ahead

Development Priorities

  • Catalin
    • Catching up
    • Tune WMS/LB servers
    • Prepare documentation about the LFC separation
  • Derek
    • Continue quattorising torque server
    • Interview Tours (Tues pm)
  • Matt
    • PPS/CASTOR Pre-Prod interviews (Tuesday)
    • Update SL4/SL5 migration plan (distribute to VOs)

Resource Requests

Downtimes

Description | Start | End | Affected VO(s)
LFC ATLAS separation | August | August | All

Requirements and Blocking Issues

Description | Required By | Priority | Status
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment
lfc0448 disk failures | | Medium | Disk replacement needed
lcgmon01 disk failures | | Medium | Disk replacement needed
Non-capacity HW for testing | | Medium | Still using the old HW
Hardware for PPS | | Medium | May need to deploy imminently

OnCall/AoD Cover

  • Primary OnCall
  • Grid OnCall: Catalin (Mon-Thu); Derek (Fri-Sun)
  • AoD