RAL Tier1 weekly operations Grid 20110314
From GridPP Wiki
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
ATLAS Frontier server (lcgce04) affected by DNS changes | Thu 10 Mar 18:15 | Fri 11 Mar 13:00 | ATLAS | Medium | Incorrect DNS change; reverted the next day. |
non-LHC WMS (lcgwms03) unavailable | Mon 14 Mar 00:30 | Mon 14 Mar 09:30 | non-LHC | High | Host affected by a high number of I/O operations; reboot needed |
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Oracle DB down - installation of more isolating transformers | lcglfc0{669,670,671,672,673,674,675}, lcgvo-s3-03, lcgvo-s3-04, lcgsql-s3-12 | | Tue 15 Mar 07:40 | Tue 15 Mar 11:00 | All |
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status |
---|---|---|---|---|
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- Working on ATLAS permission change. [On hold]
- Setting up xrootd for ATLAS at RAL.
- Talking to ALICE
- Looking into upgrading castor client on all WN.
- Disk pool merging and DB change.
- Cleaning up dark data [Ongoing]
- Writing change control [Done]
- Moving files! [Done!]
- Preparing for Beauty 2011 conference.
- Requested new VO box for ATLAS Frontier.
Andrew
- CMS squid name changes [Done]
- Learning about setting up tape families for data & MC; responded to 3 tickets [Done]
- Attended CMS UK computing meeting at Imperial [Done]
- PhEDEx Dev instance upgraded to 4_0_0; Prod & Debug to do [Ongoing]
- Improved APEL Nagios plugin [Done]
- Deleted CMS dark data, tidying up empty directories [Done]
- CMS Data Ops
  - MC rereco at FNAL [Ongoing]
Catalin
- preparation for electrical intervention on Tuesday
- investigate another problem/crash on lcgwms03
- involved with CREAM CEs installation and configuration [ongoing]
- work on quattorised ATLAS Frontier installation [ongoing]
- tomcat v6.0.32 upgrade on ATLAS Frontier server [done]
- apply latest errata and kernel [done]
- assist work on LFC Oracle DB change [done]
Derek
- Investigating BLParser issues on lcgce09 [ongoing]
- Publishing whole node queue [done]
- Errata updates [done]
- Improving config of small VOs in Quattor [ongoing]
- Metrics report [done]
Matt
- Deploy testbed LFC and MyProxy. [New]
- Management of FTS groups. [New]
- Prep for training course (Mon-Wed next week). [New]
- Testing Hadoop instance. [Ongoing]
- Contact NFS users. [Ongoing]
Richard
- Dealing with fall-out from moving a top BDII into the UPS room. [Ongoing]
- Trying out new hypervisor (hv-10) to see how much performance has improved; an existing VM has been moved across to it. [Ongoing]
- Building an ARGUS server using the new QWG templates [Ongoing]
- CASTOR items:
  - Built cfssh09.gridpp.rl.ac.uk as a StorageD server. [Done]
  - Running some stress tests on preprod instance. [Ongoing]
VO Reports
ALICE
ATLAS
CMS
- MC reprocessing has started
- Deleted 63 TB dark data and over 443000 empty directories
- PhEDEx 4_0_0 released: supports FTS checksumming and Twitter
LHCb
OnCall/AoD Cover
- Primary OnCall:
- Grid OnCall: Derek
- AoD: