Tier1 Operations Report 2012-07-04

RAL Tier1 Operations Report for 4th July 2012

Review of Issues during the week 27th June to 4th July 2012
  • On Friday (29th June) and over the weekend (Sunday 1st July) a backlog of migrations to tape built up for CMS. This was not an operational problem and cleared by itself; it was caused by the high rate of tape access during that period.
Resolved Disk Server Issues
  • GDSS586 (AtlasGroupDisk - D1T0) was out of service for a couple of hours on Monday 2nd July for battery replacement.
  • Also on Monday 2nd July four disk servers from Alice Tape (D0T1) were taken out of production for a short while (less than an hour) for a disk controller firmware update.
Current operational status and issues
  • On 12th/13th June the first stage of electrical switching in preparation for the work on the main site power supply took place. The work on the two transformers is expected to last until 18th December: one half of the resilient supply is powered off for around three months while it is overhauled, and the process is then repeated with the other half.
  • There is still a problem with the reporting of disk capacity; this is to be followed up.
Ongoing Disk Server Issues
  • GDSS607 (LHCbDst - D1T0) has been out of service for some time. It is being swapped for a different server.
Notable Changes made this last week
  • Moved Castor databases to Oracle 11. (Currently running without Data Guard, which we expect to reinstate tomorrow.)
  • FTS database moved back to the correct Oracle RAC (Somnus), which is at Oracle 11.
  • Disk servers and Castor headnodes rebooted to update kernels & errata.
  • EMI installation on WMS02. (This means all WMSs now at EMI WMS v3.3.5).
Declared in the GOC DB
  • None
Advanced warning for other interventions
The following items are being discussed and are still to be formally scheduled and announced.
  • The FTS Agents are being progressively moved to virtual machines.

Listing by category:

  • Databases:
    • Switch LFC/FTS/3D to new Database Infrastructure.
  • Castor:
    • A minor update to the Castor Information provider (CIP).
    • Upgrade to version 2.1.12.
  • Networking:
    • Install new Routing layer for the Tier1 and update the way the Tier1 connects to the RAL network. (Planned to take place alongside the replacement of the UKLight Router.)
    • Update Spine layer for Tier1 network.
    • Replacement of UKLight Router.
    • Addition of caching DNS servers to the Tier1 network.
  • Grid Services:
    • Updates of Grid Services as appropriate. (Services are now on EMI/UMD versions unless there is a specific reason not to be.)


Entries in GOC DB starting between 27th June and 4th July 2012

There were no unscheduled outages during the last week. We also note that although the batch system was declared as down during the Castor Oracle database move on 27th June, already running Atlas batch jobs were allowed to continue to run through the intervention. Other VOs' batch jobs were paused. (No batch jobs were allowed to start during this period.)
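
The batch handling above can be illustrated with a rough sketch. Assuming a Torque/Maui batch farm with per-VO queues (the queue names, and the use of a small Python wrapper around the standard Torque commands qmgr, qselect and qsig, are illustrative assumptions, not a record of the actual procedure used):

  #!/usr/bin/env python
  # Illustrative sketch only: pause a Torque batch farm for an intervention
  # while letting one VO's already-running jobs continue.
  import subprocess

  QUEUES = ["atlas", "cms", "lhcb", "alice"]   # hypothetical per-VO queues
  KEEP_RUNNING = "atlas"                       # running jobs here are left alone

  def run(cmd):
      # Echo and execute a batch-system command, raising on failure.
      print("+ " + " ".join(cmd))
      subprocess.check_call(cmd)

  # 1. Stop the scheduler from starting any new jobs, on every queue.
  for q in QUEUES:
      run(["qmgr", "-c", "set queue %s started = False" % q])

  # 2. Suspend jobs that are already running in the other VOs' queues.
  for q in QUEUES:
      if q == KEEP_RUNNING:
          continue
      jobs = subprocess.check_output(["qselect", "-q", q, "-s", "R"]).decode().split()
      for job in jobs:
          run(["qsig", "-s", "suspend", job])

Reversing such an intervention would be the mirror image: set "started = True" on each queue and resume the suspended jobs with "qsig -s resume".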

Service Scheduled? Outage/At Risk Start End Duration Reason
All Castor & Batch SCHEDULED OUTAGE 27/06/2012 08:45 27/06/2012 12:00 3 hours and 15 minutes Storage (Castor) and Batch (CEs) unavailable. Oracle database behind Castor being moved to Oracle 11.
lcgfts.gridpp.rl.ac.uk SCHEDULED OUTAGE 27/06/2012 07:45 27/06/2012 10:45 3 hours Service drained then unavailable while back end (Oracle) database moved back to correct Oracle RAC.
lcgwms02.gridpp.rl.ac.uk SCHEDULED OUTAGE 21/06/2012 12:00 27/06/2012 13:00 6 days, 1 hour EMI installation (WMS v3.3.5).
Open GGUS Tickets
GGUS ID Level Urgency State Creation Last Update VO Subject
83768 Green Urgent Waiting Reply 2012-07-02 2012-07-03 NA62 FTS channel from Liverpool to RAL
83578 Red Urgent Waiting Reply 2012-06-26 2012-06-26 MICE Tape space on Castor for mice reconstructed data
83564 Red Less Urgent Waiting Reply 2012-06-25 2012-07-02 MICE Software area for MICE data reconstruction
68853 Red Less Urgent On hold 2011-03-22 2012-06-25 N/A Retirement of SL4 and 32bit DPM Head nodes and Servers