RAL Tier1 weekly operations Grid 20100809
From GridPP Wiki
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Job status monitoring from CREAMCE | 2-Feb-2010 | | CMS | Medium | [10-Feb-2010] WMS patch available soon; CREAMCE new version available soon [07-Apr-2010] CMS tests have shown that WMS patches resolve the problem; still waiting for patch to be installed on the production WMSs in Italy [13-Jul-2010] CNAF WMSs have been updated; testing using backfill is in progress [19-Jul-2010] So far everything looks good |
FTS02 | 21-Jul-2010 | | All | High | SMART errors on both FTS02 disks; Fabric have replacements and wish to arrange a swap-out |
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status |
---|---|---|---|---|
HW needed to test Dataguard technology for LFC/FTS | 19 May 2010 | 15 June 2010 | Medium | [24-05-2010] HW available; needs to be deployed by Fabric and then handed over to Dataservices |
Developments/Plans
Highlights for Tier-1 Ops Meeting
- Baseline updates (WMS, BDII, CE)
- Quattor development for FEs (LFC, FTS)
- Comparison between CIP, Overwatch, and Grid disk accounting
- Testing FTS new timeout parameters
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- Working on ATLAS software server [ongoing]
- Working on ATLAS Frontier service, monitoring and backup.
- Working on testing FTS timeout limits.
- Working on ATLAS B-Physics software code.
Andrew
- Updated PhEDEx prod & debug instances to 3_3_2 [Done]
- CMS CASTOR 2.1.9 testing, including xroot with CMSSW [Ongoing]
- DAC-Overwatch-BDII disk capacity consistency web pages
- July accounting [Done]
- A/L from 10th August, back on 20th August
Catalin
- Submitted change control request for glite-WMS update [done]
- ATLAS Frontier monitoring [ongoing]
- Test LFC quattor profiles (SL4 and SL5) [ongoing]
- Prepare gLite updates for WMS03
- Work on improving ganglia monitoring for Grid Services
Derek
- Writing Strawman Cloud strategy [ongoing]
- CREAM CE quattor profile [ongoing]
- Investigating CREAM CE instability [ongoing]
- Handed over blog maintenance to production team
Matt
- Build gLite3.2 FTS test node
- Add timeout configuration to local FTS information (SVN)
- Audit WLCG pledges vs. deployed disk
- Finish first pass of ascii FTS docs; look at build system
Richard
- Submitted c/c request for s/w update on RAL top-level BDIIs [done]
- Working on the "team status page" being developed as an action from team awayday [ongoing]
- Reviewing G/S process documentation [ongoing]
- Demonstrated a prototype version of tool for automating the wiki page on grid middleware versions
- Testing a pair of quattorised site-level BDIIs
- Preparing a talk on how to write a Quattor component
- CASTOR items:
  - Finishing the running of 2.1.9 functional tests on pre-prod instance
VO Reports
ALICE
- Still happy with the 1250-job limit
ATLAS
CMS
LHCb
OnCall/AoD Cover
- Primary OnCall:
- Grid OnCall: Derek
- AoD: