RAL Tier1 weekly operations Grid 20110509
From GridPP Wiki
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s)
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
Andrew
- April UB schedule, metrics [Done]
- Updated APEL Nagios check (add check of APEL sync test) [Done]
- lcgfts01 OS kernel/errata update [Done]
- Old diskserver removal/draining; removal of cmsWanout; adding diskservers to cmsFarmRead [Ongoing]
- Looked into recent CMS problems [Done]
- Updated FTS Monitor to 1.5.3 [Done]
- Fixing problems with cmsUnmerged plots in castormon [Ongoing]
Catalin
- work on BDII stability [ongoing]
- involved with CREAM CEs installation and configuration [ongoing]
- update gLite LFC [ongoing]
- work on quattorised ATLAS Frontier installation [stalled]
- work on non-LHC WMS stability
Derek
- Catching up after A/L [done]
- Investigating issues with lcgce08 [done]
- Incorporating MySQL tuning params for CREAM CEs into Quattor [done]
- Change control for quattorising lcgce03 [done]
- Trying to get IPMI IP address for services hosts resolved [in progress]
- Documentation [ongoing]
- Moving to 50% Tier 1 on Thursday 12th
VO Reports
ALICE
- Large number of user jobs (~24k out of 26k); efficiency irrelevant, stability of services more important
ATLAS
CMS
- Reprocessing ongoing at all Tier-1s (a lot is still to come...)
- CMS now using 3 GB queue
LHCb
OnCall/AoD Cover
OnCall Rota
- Primary OnCall:
- Grid OnCall: Derek (Mon-Sun)
- AoD: