RAL Tier1 weekly operations Grid 20110221
From GridPP Wiki
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s)
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- Working on ATLAS permission change. [On hold]
- Setting up xrootd for ATLAS at RAL.
- Disk pool merging and DB change.
- Preparing for Beauty 2011 conference.
- File consistency checking.
Andrew
- Nagios plugin for VOBOX proxy renewal [Ongoing]
- Capacity planning systems; preparations for the Capacity Signoff Meeting; post-meeting modifications
- Investigating CMS issues (gdss84 D2D problems; Job Robot failures on 16th Feb)
- CMS data ops
  - Problematic MC rereco at FNAL (now including glite-WMS/LB problems)
  - Started data rereco at RAL, ASGC, FNAL (100 million events)
Catalin
- Involved with CREAM CE installation and configuration [Ongoing]
- 3 days A/L
Derek
- Investigating the effect of whole-node jobs on the scheduler [Done]
- Reviewing CE documentation [Done]
- Tidying up/finishing off in preparation for 2 weeks A/L [Done]
Matt
- Second phase of migration of FTS agents to Quattorised h/w. [New]
- Test new MAUI configuration for gridWN queue. [New]
- Contact NFS users. [New]
- Update CRL checks Nagios plugin. [Done]
- Look at Derek's CA patch for Quattor. [Done]
- First phase of migration of FTS agents to Quattorised h/w. [Done]
- Review VOBOX/CE incident. [Done]
Richard
- Added new gLite updates 21 and 22 to Quattor. Currently building a test top BDII to check the updates. [Ongoing]
- Moving 1 site BDII and 1 top BDII into the UPS room for increased resilience. [Ongoing]
- Trying out the new hypervisor (hv-10) to see how much performance has improved; an existing VM has been moved across to it. [Ongoing]
- Building an ARGUS server using the new QWG templates [Ongoing]
- Working on the "team status page" being developed as an action from team awayday [Ongoing]
- Reviewing G/S process documentation [Ongoing]
- CASTOR items:
  - Working with SDW to import the latest CASTOR Quattor structure into the "cert-in-a-box" cluster. [Ongoing]
VO Reports
ALICE
ATLAS
CMS
- As of this morning, CREAM CE SAM tests are critical and count towards site availability.
LHCb
OnCall/AoD Cover
OnCall Rota
- Primary OnCall:
- Grid OnCall: Matt
- AoD: