RAL Tier1 weekly operations Grid 20110307
From GridPP Wiki
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s)
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- Working on ATLAS permission change. [On hold]
- Setting up xrootd for ATLAS at RAL.
- Talking to ALICE
- Looking into upgrading castor client on all WN.
- Disk pool merging and DB change.
- Cleaning up dark data [Ongoing]
- Writing change control [Done]
- Moving files! [Ongoing]
- Preparing for Beauty 2011 conference.
- Requested new VO box for ATLAS Frontier.
Andrew
- Migration to FTS groups for CMS [Done]
- Prepared FTS groups setup for ATLAS [Done]
- Feb accounting; migrated tape usage from vmgr to ns in UB schedule & capacity planning system [Done]
- Kernel/errata updates [Done]
- CMS storage consistency check; set up script/cron to run monthly. [Done]
- CMS squid name changes [Ongoing]
- CMS data ops
- Installed new PA instances required for FNAL move to Lustre [Done]
Catalin
- work on quattorised ATLAS Frontier installation
- apply latest errata and kernel
- assist work on LFC Oracle DB change [ongoing]
- involved with CREAM CEs installation and configuration [ongoing]
- two new VOs to be added to the LFC [done]
- GGUS issue with pheno affecting lcgwms03 [done]
Derek
- Catching up after leave [done]
- Investigating load problems on lcgce05 [done]
- Investigating BLParser issues on lcgce09 [ongoing]
- Publishing whole node queue [ongoing]
Matt
- Deploying test Hadoop instance. [Ongoing]
- Contact NFS users. [Ongoing]
- Deploying FTS test instance on new virtual hosts. [Done]
Richard
- Updating Site level BDIIs to level 21. [Ongoing]
- Moving one more top BDII into UPS room for better resilience. [Ongoing]
- Trying out new hypervisor (hv-10) to see how much performance has improved (have moved an existing VM across to the new h/v) [Ongoing].
- Building an ARGUS server using the new QWG templates [Ongoing]
- Working on the "team status page" being developed as an action from team awayday [Ongoing]
- Reviewing G/S process documentation [Ongoing]
- CASTOR items:
  - Developed a script to stress test FTS transfers in/out of preprod instance. [Ongoing]
VO Reports
ALICE
ATLAS
CMS
- 2011-02-28: CREAM CE temporarily blacklisted by a CERN WMS, leading to 35 Job Robot jobs aborting.
- Large MC reprocessing will start across all T1s sometime this week
LHCb
OnCall/AoD Cover
OnCall Rota
- Primary OnCall:
- Grid OnCall: Derek
- AoD: