Tier1 Operations Report 2010-03-24

RAL Tier1 Operations Report for 24th March 2010.

Review of Issues during week 17th to 24th March 2010.

  • GDSS126 (CMS WanOut) was out of production from the end of Thursday afternoon (18th March) to Friday morning (19th) to enable the RAID array to rebuild following a disk failure.
  • A configuration problem was found on server GDSS346 (AtlasMCtape), which was taken out of production from Monday (22nd March) to Tuesday (23rd).
  • There was an issue with CMS migration queues over the weekend. This was resolved on Monday morning by a daemon restart.
  • On Monday 22nd March GLEXEC was enabled on the Worker Nodes.

Current operational status and issues.

  • A rolling upgrade of the batch nodes to SL 5.4 is under way.

Declared in the GOC DB:

  • Thursday 25th March. Clean-up of non-Atlas LFC schema (lfc.gridpp.rl.ac.uk). This is to remove redundant information from when the Atlas and non-Atlas LFCs were split.

Not in the GOC DB. Noted here:

  • Thursday 25th March. Half of the tape drives will receive updated microcode in a rolling update.

Advance warning:

The following items remain to be scheduled:

  • Castor Oracle Database infrastructure. One change remains to be done: the removal of an unstable node from the Oracle RAC and its replacement by another node.
  • Kernel and glibc updates will need to be done on the Oracle database (RAC) nodes. Awaiting change control review.

Entries in GOC DB starting between 17th and 24th March 2010.

There were no entries in the GOC DB for this last week.