Production Team Report 2011-01-10

From GridPP Wiki

RAL Tier1 Production Team Report for 10th January 2011.

AoD This Week

Mon: Gareth; Tue-Wed: Tiju; Thu: Gareth; Fri: Tiju

Last week

  • Gareth: AoD for 1 day; work on post mortems.
  • John: AoD for 3 days; Nagios work; dumped the Wiki.
  • Tiju: A/L (annual leave).

Changes to Operating procedures

  • None

Declared Outages in GOC DB

  • Mon/Tue 17/18 January: upgrade of ATLAS Castor disk servers to a 64-bit OS.

Advanced Warning

  • Sat 22nd January - Power outage in Atlas building.

Other Changes

  • Fabric:
    • Application of kernel update to batch server.
    • Addition of a gateway address to enable an additional IP range.
    • Doubling of the network link to the tape robot stack (stack 12). Requires a Castor stop.
    • Replacement of the older of the pair of SAN switches serving the Tier1 Oracle databases with its new replacement. Requires an FTS, LFC and 3D stop.
  • Database:
    • Oracle 10.2.0.5 upgrade (to follow CERN's upgrade of equivalent databases).
    • Revisit multipathing for the non-Castor databases.
    • Increase shared memory for OGMA, LUGH and SOMNUS (proposed for Tuesday 18th January).
  • Grid Services:
    • Changes to increase resilience of the BDII service
    • gLite update on the site BDII nodes.
    • Change to the batch job OS selection mechanism (part of enabling scheduling by node).
    • Deployment of ATLAS Squid servers.
  • Castor:
    • Upgrade CMS disk servers to 64-bit: end of January / early February.
    • Upgrade GEN disk servers to 64-bit: date TBD.
    • Change ATLAS Castor permissions to prevent users from deleting data.
    • Upgrade the Puppetmaster and its clients.
  • Networks:
    • None