Production Team Report 2010-01-18

RAL Tier1 Production Team Report for 18th January 2010.

Again, thanks to everyone for keeping the systems going despite the snowy weather.

AoD This Week

Mon-Tues: John; Wed: Catalin; Thu: Gareth; Thu-Fri: John

Last Week (21-24 December)

  • Gareth: AoD (a bit less than 1 day). Scheduling and rescheduling changes for January.
  • John: AoD (3+ days). Checksumming files (for LHCb FSPROBE errors), Script to enable Dashboard to query Overwatch (to report servers in intervention).
  • Tiju: A/L.

Changes to Operating procedures

  • None

Declared Outages in GOC DB

  • At Risk on Castor for RAC node memory upgrade (18th Jan 09:00 - 22nd Jan 16:00)
  • SRM updates: Tuesday (GEN, CMS, LHCb), Wednesday (Atlas) (At Risk).
  • Big Intervention on 27th/28th (Wed & Thursday).
    • Stop batch from 20:00 on 24th Jan (Sunday) to 28th 17:00.
    • FTS outage 27th (07:00 - 19:00)
    • LFC outage 27th (08:00 - 19:00)
    • Castor outage 27th 08:00 - 28th 17:00.

More details of the proposed timetable for the changes within those time windows are on the internal Wiki at:

 https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/January2010Plans

The following are expected to be added to the GOC DB:

  • Tuesday 26th: Migration of 3D databases back to EMC disk arrays. Essentially work for the database team, but could, if there are problems, interfere with the strategy meeting?
  • CIP update Thursday 14:30 - 15:30.
  • Grid Services nodes (kernel updates) "At Risk" Wednesday 27th.
    • But LFC front ends "At Risk" Thursday 28th 12:00-14:00

Not yet scheduled in:

  • At Risk for Castor Atlas & LHCb for replacing RAC node (replace cdbc08 with cdbe07).