RAL Tier1 weekly operations Grid 20101220
From GridPP Wiki
Revision as of 13:13, 20 December 2010 by Andrew lahiff
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status |
---|---|---|---|---|
DNS change request for ATLAS squids | 07 Dec 2010 | | | RT#70487 |
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- ATLAS TaskForce [ongoing]
- Draining SL08 disk servers deployed to ATLAS service classes.
- Working on ATLAS permission change. [On hold]
Andrew
- Capacity planning system [Ongoing]
- Removed ATLAS/LHCb disk caches from UB schedule scripts [Done]
- Wrote a Nagios plugin to check the CMS job monitoring [Done]
- Updates to CMS job monitoring XML file format [Done]
- Dealing with corrupt files
- CMS data ops
- Rereco, skims at RAL, IN2P3, KIT [Ongoing]
- Dec4 rereco postmortem
Catalin
- Test squid deployments for ATLAS [Done]
- Finalise Quattor templates for ATLAS squid machines [Ongoing]
- Work on Tier1 DB migration plans [Ongoing]
Derek
- Deploying testbed batch system [Ongoing]
- Debugging issue with Magic jobs [Ongoing]
- Initial rollout of setting the operating system config in the PBS MOM on batch workers to SL5 [Ongoing]
- Removed the reservation and increased the job limit for atlassgm to 10 to allow more CVMFS validation jobs over the holiday [Done]
Matt
- T2K FTS configuration. [Done]
- Prep for A/L. [Done]
- Quattorisation of FTM. [Done]
- Deploying PBS JobMon monitoring tools. [Stalled]
- Test FTS SRM/GridFTP ratio configuration. [Stalled]
Richard
- Wrote a gmetric tool to measure the Quattor deploy hit rate, i.e. the percentage of deploys (as recorded in the SVN repository) that were "seen" by a machine [Done]
- A working prototype of a tool for automatically checking middleware baselines is now in place [Done]
- Developing a set of Quattor templates for an ARGUS server [Ongoing]
- Developing a "pseudo-update" to apply gLite update 19 to BDIIs [Ongoing]
- Working on the "team status page" being developed as an action from team awayday [Ongoing]
- Reviewing G/S process documentation [Ongoing]
- CASTOR items:
- Added an LSF server to the "cert-in-a-box" cluster. [Ongoing]
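The deploy hit rate in Richard's report is a simple ratio: deploys seen by a machine divided by deploys recorded in SVN. A minimal Python sketch follows, assuming deploys are identified by SVN revision numbers and each machine reports the revisions it has applied. The function names and the metric name `quattor_deploy_hitrate` are hypothetical and are not taken from the actual tool; the command is only built, not run.

```python
# Hypothetical sketch of the Quattor "deploy hit rate" metric.
# Assumption: deploys are identified by SVN revision numbers, and a
# machine reports the set of revisions it has "seen" (applied).


def deploy_hitrate(deployed_revisions, seen_revisions):
    """Return the percentage of deployed revisions that a machine has seen."""
    deployed = set(deployed_revisions)
    if not deployed:
        raise ValueError("no deploys found in the SVN repository")
    # Revisions seen by the machine that do not correspond to a deploy are ignored.
    hits = len(deployed & set(seen_revisions))
    return 100.0 * hits / len(deployed)


def gmetric_command(hitrate, name="quattor_deploy_hitrate"):
    """Build (but do not run) a Ganglia gmetric command line for the value."""
    return [
        "gmetric",
        f"--name={name}",
        f"--value={hitrate:.1f}",
        "--type=float",
        "--units=%",
    ]


if __name__ == "__main__":
    rate = deploy_hitrate([1201, 1205, 1210, 1214], [1201, 1210, 1214])
    print(rate)  # 75.0
    print(" ".join(gmetric_command(rate)))
```

Publishing the value through gmetric keeps it alongside the other Ganglia host metrics, so a machine that stops picking up deploys would show up as a falling hit rate.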
VO Reports
ALICE
ATLAS
CMS
LHCb
OnCall/AoD Cover
- Primary OnCall:
- Grid OnCall: Catalin (Mon-Thu), Derek (Sat-Sun)
- AoD: