Production Team Report 2010-11-01

From GridPP Wiki

RAL Tier1 Production Team Report for 1st November 2010.

AoD This Week

Mon - Wed: Tiju; Thu: Gareth; Fri: Tiju

Last week

  • Gareth: AoD (1 day), A/L (half day).
  • John: AoD (4 days).
  • Tiju: Investigated Icinga, added apollo, other Nagios work.

Changes to Operating procedures

  • Primary On-Call Rota now on Google.
  • ATLAS running user jobs (making use of CVMFS).

Declared Outages in GOC DB

  • Tues 2nd Nov: "Warning" on all SRM end points. Tape system unavailable during work on the tape robot to resolve a problem with power supply cooling.
  • Tues 2nd Nov: "Warning" on MyProxy for migration to a Quattorized service.
  • Wed 3rd Nov: "Warning" on Site-BDII for rolling update to gLite 3.2 and SL5.

Advanced Warning

  • Remaining Castor upgrades, current schedule (T.B.C.):
    • Upgrade CMS: Tuesday 16 to Thursday 18 November.
    • Upgrade ATLAS: Monday 6 to Wednesday 8 December.
  • Monday 13th December - UPS test.

Other Changes

  • Fabric:
    • Upgrade disk servers to a 64-bit OS (to resolve the checksum problem).
    • Tape drive microcode update.
    • Double the network link to the tape robot stack (stack 12), postponed from the last TS (requires Castor stop).
    • Swap out the older of the pair of SAN switches for the Tier1 Oracle databases for its new replacement (requires FTS, LFC and 3D stop).
    • Update firmware in RAID controller cards for a batch of disk servers.
  • Database:
    • Revisit non-Castor database multipathing.
  • Grid Services:
    • Replace the MON box with glite-APEL.
  • Castor:
    • Possible SRM update.
  • Networks:
    • None.
  • ATLAS:
    • Enable user jobs.