RAL Tier1 weekly operations Grid 20101115
Revision as of 15:37, 16 November 2010 by Alastair dewhurst (Talk | contribs)
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status |
---|---|---|---|---|
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- Monitoring user jobs at RAL. (CVMFS)
- Fixing bugs with ATLAS re-processing to make sure it runs smoothly at RAL.
- Writing script to graph transfer times for FTS transfers [on hold]
- Working on returning gdss326 to production.
- Working on ATLAS permission change. (Found problem with CERN solution)
- Emergency SRM upgrade for ATLAS.
Andrew
- Capacity planning system project [Ongoing]
- Preparations for capacity signoff meeting [Done]
- Installing & configuring Jobview on testbed torque server, CE [Ongoing]
- Installing FTS monitor 1.5 [Ongoing]
- CMS data ops
  - Pile-up MC reprocessing at CNAF [Done]
  - Data rereco, skims at RAL, IN2P3 [Ongoing]
Catalin
- LB service migration to gLite3.2 [ongoing]
- work on (x)ROOT(d); deploy test infrastructure [ongoing]
- test squid on LHCb VOBOX
- work on WMS monitoring [ongoing]
Derek
- Investigation of secure deployment of ssh keys to hosts [ongoing]
- Change control for providing additional CREAM CE for ATLAS [ongoing]
- Investigating solutions for whole node scheduling [ongoing]
- Investigating discrepancies in SAM metric downtimes [ongoing]
- Setting up a test StratusLab deployment [ongoing]
Matt
- Deploying PBS JobMon monitoring tools. [Ongoing]
- Further testing of Quattorised gLite3.2 FTS FEs. [Ongoing]
- Quattorisation of MyProxy nodes. [Ongoing]
- Test FTS SRM/GridFTP ratio configuration. [Ongoing]
Richard
- Working on the tool for automatically checking middleware baselines
- Wrote a CGI script to display recent changes in the SVN repo used by Quattor [Done]
- Added a new Nagios check for stale /etc/noquattor files [Done]
- Developing a set of Quattor templates for an ARGUS server [Ongoing]
- Developing a "pseudo-update" to apply gLite update 19 to BDIIs [Ongoing]
- Wrote a CGI script for logging hardware requests from G/S team in the Fabric queue in RT [Ongoing]
- Working on the "team status page" being developed as an action from team awayday [Ongoing]
- Reviewing G/S process documentation [Ongoing]
- CASTOR items:
  - Using the grid to run many jobs to stress-test the Pre-prod and Facilities instances
VO Reports
ALICE
ATLAS
CMS
- Big reprocessing planned for data and MC starting Dec 15th; might start 2 weeks early.
LHCb
OnCall/AoD Cover
- Primary OnCall:
- Grid OnCall: Matt (Mon-Sun)
- AoD: