Production Team Report 2011-01-10
From GridPP Wiki
Revision as of 16:49, 10 January 2011 by Gareth smith
RAL Tier1 Production Team Report for 10th January 2011.
AoD This Week
- Mon: Gareth
- Tue-Wed: Tiju
- Thu: Gareth
- Fri: Tiju
Last week
- Gareth: AoD for 1 day; work on post mortems.
- John: AoD for 3 days; Nagios work; dumped the Wiki.
- Tiju: A/L (annual leave).
Changes to Operating procedures
- None
Declared Outages in GOC DB
- Mon/Tues 17/18 January: Upgrade ATLAS Castor disk servers to a 64-bit OS.
Advanced Warning
- Sat 22nd January - Power outage in Atlas building.
Other Changes
- Fabric:
- Application of kernel update to batch server.
- Add a further gateway address to enable an additional IP range.
- Double the network link to the tape robot stack (stack 12). This requires a Castor stop.
- Replace the older of the pair of SAN switches serving the Tier1 Oracle databases with its new replacement. This requires an FTS, LFC and 3D stop.
- Database:
- Oracle 10.2.0.5 upgrade. This will follow once CERN has upgraded its equivalent databases.
- Revisit multipathing for the non-Castor databases.
- Increase shared memory for OGMA, LUGH & SOMNUS (proposed for Tuesday 18th Jan).
- Grid Services:
- Changes to increase resilience of the BDII service
- gLite update on the site BDII nodes.
- Change the batch job OS selection mechanism (part of enabling scheduling by node).
- Deployment of ATLAS Squid servers.
- Castor:
- Upgrade CMS disk servers to 64-bit: end January / early February.
- Upgrade GEN disk servers to 64-bit: date TBD.
- Change ATLAS Castor permissions to prevent users from deleting data.
- Upgrade Puppetmaster and clients.
- Networks:
- None