RAL Tier1 weekly operations Grid 20100719

Operational Issues

Description	Start	End	Affected VO(s)	Severity	Status
Job status monitoring from CREAMCE	2-Feb-2010		CMS	medium	[10-Feb-2010] WMS patch available soon; CREAMCE new version available soon [07-Apr-2010] CMS tests have shown that WMS patches resolve the problem; still waiting for patch to be installed on the production WMSs in Italy [13-Jul-2010] CNAF WMSs have been updated; testing using backfill is in progress
WMS03	16-Jul-2010	16-Jul-2010	Non-LHC	low	Was unresponsive and rebooted

Downtimes

Description	Hosts	Type	Start	End	Affected VO(s)

Blocking Issues

Description	Requested Date	Required By Date	Priority	Status
HW needed to test Dataguard technology for LFC/FTS	19 May 2010	15 June 2010	Medium	[24-05-2010] HW available; needs to be deployed by Fabric and then handed over to Dataservices
#61658: HW request for CMS Squid VOBOX	30 June 2010		Medium	[30-06-2010] Request made
#62179: Request for new CMS pool accounts	16 July 2010		High	[16-07-2010] Request made

Developments/Plans

Highlights for Tier-1 Ops Meeting

Mayo has now left; please remove any access he may have had
Only 2 Grid Team members in on Wed-Thu
New CMS t1production role
Batch farm full :-), causing issues for CMS :-(

Highlights for Tier-1 VO Liaison Meeting

Investigating options for limiting ALICE jobs after CMS ran work elsewhere over the weekend
Progressing with enabling new CMS role on batch farm
Roll out an upgrade of the top level BDIIs next week (At-risk)
2 crashes of WMS03 with no obvious cause

Detailed Individual Reports

Alastair

Working on ATLAS software server on /afs [ongoing]
Written script to identify unavailable files when a disk server is taken out of production. [testing]
Looking into slow LHCb transfers between SARA and RAL. (fix with James T now)
Working to improve pbsjobs database to allow easier monitoring of production work.
Working on ATLAS Frontier service, monitoring and backup.

Andrew

Investigated slow transfers of an important MC dataset to many T2s [Done]
Added Ganglia monitoring of CMS data transfers (volume per day & rates) to/from CERN, T1s, T2s [Done]
Preparations for new CMS t1production role
- Working on change-control form & implementation plan; submitted request to Fabric for new pool accounts
Updated FTS monitor to v1.4 [Done]
Understanding disk & tape capacity calculations
CMS data ops
- MC production at CNAF
- backfill (MC production) at RAL; testing CREAM CEs
- Data reprocessing at FNAL
Try glite-APEL installation in testbed [To do]
Write script for checksum checking of last file on T10KB tapes [To do]

Catalin

Python course (Mon - Thu) RAL R1

Derek

Sync'd testbed against QWG profiles [Done]
Rebooted lcgwms03 [Done]
Debugging t2k job submission issues
CIC broadcast for lcgce02 decommission [Done]
Writing Strawman Cloud strategy [ongoing]
Sync production templates against QWG

Matt

Richard

Submitted change control request for updating RAL top-level BDIIs [done]
Working on the "team status page" being developed as an action from team awayday [ongoing]
Reviewing G/S process documentation [ongoing]
Developed a tool to help with automating the wiki page on grid middleware versions [done]
CASTOR items:
- Continue trying to get 2.1.9 functional tests running on pre-prod

VO Reports

ALICE

waiting for CREAM-CE 1.6 deployment at RAL
cannot roll out new xrootd version (20100510-1509_dbg) on Castor 2.1.7

ATLAS

CMS

Because CMS was unable to get any job slots at RAL, v2 of an urgent workflow was run at FNAL. The v1 finally generated at RAL has been deleted.
Started to use CREAM CEs again due to upgrade of CNAF WMSs; no problems so far.

LHCb

OnCall/AoD Cover

Primary OnCall: Catalin
Grid OnCall:
AoD:
