RAL Tier1 Operations Report for 22nd February 2012

Review of Issues during the week 15th to 22nd February 2012.

Some Atlas SRM SAM tests failed early on Friday morning (17th February). There is currently a known problem with the Atlas SRMs, which is worked around by an aggressive re-starter. On this occasion the re-starter itself failed, and one of the SRMs was restarted manually following a call-out.

Resolved Disk Server Issues

None.

Current operational status and issues.

There is a known issue with the Atlas SRMs (see above).
There is a problem with some batch job submission. It is believed to occur when a VO uses information from the BDII in the submission process, and it was exposed by the batch server upgrade last week.

Ongoing Disk Server Issues

None.

Notable Changes made this last week

On Monday (20th February) the CMS Castor instance was upgraded to version 2.1.11-8, with new hardware being introduced for the Atlas Castor head nodes.
The same update for the Atlas Castor instance has just been completed this morning (Wed. 22nd Feb.).

Forthcoming Work & Interventions

Thursday 23 Feb. Application of Oracle "PSU" patches to Atlas 3D & LHCb 3D/LFC systems ("OGMA" & "LUGH").
Tuesday 28th Feb (morning). Electrical work to prepare for moving part of the cooling system onto the UPS supply. Some other electrical work continues throughout the week (27 Feb - 2 Mar).
Week beginning 5th March (TBC) FTS update to version 2.2.8.

Declared in the GOC DB

Monday 27 Feb. Upgrade of LHCb Castor instance to version 2.1.11-8.
Wednesday 29 Feb. Upgrade of GEN Castor instance to version 2.1.11-8.

Advance warning for other interventions

The following items are being discussed and are still to be formally scheduled and announced. We are carrying out a significant amount of work during the current LHC stop.

Databases:
- Regular Oracle "PSU" patches are pending.
- Switch Castor and LFC/FTS/3D to new Database Infrastructure (started)
  - Next step of these changes is to move Castor databases and enable Data Guard.
Castor:
- Update the Castor Information Provider (CIP) (Need to re-schedule.)
- Move to use Oracle 11g (requires a minor Castor update.)
Networking:
- Changes required to extend range of addresses that route over the OPN.
- Install new Routing & Spine layers.
Fabric:
- BIOS/firmware updates, other re-configurations (adding IPMI cards, etc.)
Grid Services:
- Updates of Grid Services (including WMS, FTS, MyProxy, LFC front ends) to EMI/UMD versions.

Entries in GOC DB starting between 15th and 22nd February 2012.

There were no unscheduled outages during this period.

Service	Scheduled?	Outage/At Risk	Start	End	Duration	Reason
lcgvo05.gridpp.rl.ac.uk	SCHEDULED	OUTAGE	22/02/2012 11:00	21/02/2013 12:00	365 days, 1 hour	Outage on Atlas vobox for Alastair to investigate
srm-atlas.gridpp.rl.ac.uk	SCHEDULED	OUTAGE	22/02/2012 08:00	22/02/2012 16:00	8 hours	Update of Atlas Castor instance to version 2.1.11-8
srm-cms.gridpp.rl.ac.uk	SCHEDULED	OUTAGE	20/02/2012 08:00	20/02/2012 15:35	7 hours and 35 minutes	Update of CMS Castor instance to version 2.1.11-8
lcgwms01.gridpp.rl.ac.uk	SCHEDULED	OUTAGE	09/02/2012 15:00	15/02/2012 12:00	5 days, 21 hours	System unavailable - EMI installation
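
As a cross-check of the Duration column, the intervals can be recomputed from the Start and End timestamps. The snippet below is only an illustrative sketch: the function name `outage_duration` and the `OUTAGES` dictionary are hypothetical and not part of any GOC DB tooling, and the timestamps are copied from the table above and assumed to be in DD/MM/YYYY HH:MM format.

```python
from datetime import datetime

# Start/End pairs copied from the table above (assumed DD/MM/YYYY HH:MM).
OUTAGES = {
    "lcgvo05.gridpp.rl.ac.uk": ("22/02/2012 11:00", "21/02/2013 12:00"),
    "srm-atlas.gridpp.rl.ac.uk": ("22/02/2012 08:00", "22/02/2012 16:00"),
    "srm-cms.gridpp.rl.ac.uk": ("20/02/2012 08:00", "20/02/2012 15:35"),
    "lcgwms01.gridpp.rl.ac.uk": ("09/02/2012 15:00", "15/02/2012 12:00"),
}

FMT = "%d/%m/%Y %H:%M"


def outage_duration(start, end):
    """Return (days, hours, minutes) between two timestamps in FMT."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    # timedelta keeps whole days separately; split the remainder into h/m.
    hours, rem = divmod(delta.seconds, 3600)
    return delta.days, hours, rem // 60


for host, (start, end) in OUTAGES.items():
    days, hours, minutes = outage_duration(start, end)
    print(f"{host}: {days} days, {hours} hours, {minutes} minutes")
```

The recomputed values agree with the Duration column; the long lcgvo05 entry comes out at 365 days and 1 hour because the interval spans 29th February 2012.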

Open GGUS Tickets


GGUS ID	Level	Urgency	State	Creation	Last Update	VO	Subject
79428	Green	Less Urgent	In Progress	2012-02-21	2012-02-21	SNO+	glite-wms-job aborted
79720	Green	Very Urgent	Waiting Reply	2012-02-21	2012-02-22	t2k.org	All jobs failing at RAL
79283	Red	Top Priority	In Progress	2012-02-16	2012-02-22	LHCb	Job publishing problem for LHCb at RAL
77026	Red	Less Urgent	In Progress	2011-12-05	2012-02-03		BDII
74353	Red	Very Urgent	Waiting Reply	2011-09-16	2012-02-10	Pheno	Proxy not renewing properly from WMS
68853	Red	Less Urgent	On hold	2011-03-22	2012-02-21		Retirement of SL4 and 32bit DPM Head nodes and Servers (Holding Ticket for Tier2s)
