RAL Tier1 weekly operations Grid 20101108
From GridPP Wiki
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s)
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- Monitoring user jobs at RAL (CVMFS).
- Fixing bugs in ATLAS reprocessing to make sure it runs smoothly at RAL.
- Writing a script to graph transfer times for FTS transfers [on hold]
- Working on returning gdss398 to production.
- Working on the ATLAS permission change (found a problem with the CERN solution).
Andrew
- Capacity planning system project [Ongoing]
- Preparations for capacity signoff meeting [Ongoing]
- October accounting [Done]
- Applying errata updates to glite-APEL, the CMS VOBOX and the CMS squids [Done]
- Installing & configuring Jobview on testbed torque server [Ongoing]
- Migration to glite-APEL [Done]
- CMS data ops:
  - Pile-up MC reprocessing at CNAF [Ongoing]
  - Data rereco at RAL and IN2P3 [Ongoing]
Catalin
- LB service migration to gLite3.2 [ongoing]
- Work on (x)ROOT(d): deploying test infrastructure [ongoing]
- Work on WMS monitoring [ongoing]
Derek
- Investigating secure deployment of SSH keys to hosts [ongoing]
- Change control for providing an additional CREAM CE for ATLAS [ongoing]
- Investigating solutions for whole node scheduling [ongoing]
- Investigating discrepancies in SAM metric downtimes [ongoing]
Matt
- Deploying PBS JobMon monitoring tools. [New]
- Further testing of Quattorised gLite3.2 FTS FEs. [Ongoing]
- Quattorisation of MyProxy nodes. [Ongoing]
- Testing the FTS SRM/GridFTP ratio configuration.
Richard
- Update RAL site-level BDIIs [Done]
- Wrote a CGI script to display recent changes in the SVN repository used by Quattor
- Added a new Nagios check for stale /etc/noquattor files
- Developing a set of Quattor templates for an ARGUS server [Ongoing]
- Developing a "pseudo-update" to apply a gLite update to BDIIs
- Wrote a CGI script for logging hardware requests from the G/S team in the Fabric queue in RT [Ongoing]
- Working on the "team status page" being developed as an action from team awayday [Ongoing]
- Reviewing G/S process documentation [Ongoing]
- CASTOR items:
  - Using the grid to run many jobs to stress-test the Pre-prod and Facilities instances
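The Nagios check for stale /etc/noquattor files is only named in the report, not described. The sketch below is a hypothetical illustration, assuming that /etc/noquattor is a flag file that holds off Quattor runs on a host and that "stale" means it has been left in place longer than some age threshold. The 1-day and 7-day thresholds and the function names `classify_age`, `check_noquattor` and `main` are invented for the example. It follows the standard Nagios plugin convention of printing one status line and exiting with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN).

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical thresholds: the report does not say what counts as "stale".
WARN_SECONDS = 1 * 86400
CRIT_SECONDS = 7 * 86400


def classify_age(age_seconds, warn=WARN_SECONDS, crit=CRIT_SECONDS):
    """Map the flag file's age (in seconds) to a Nagios state."""
    if age_seconds >= crit:
        return CRITICAL
    if age_seconds >= warn:
        return WARNING
    return OK


def check_noquattor(path="/etc/noquattor", now=None):
    """Return (exit_code, message) describing the flag file at `path`."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except FileNotFoundError:
        # Assumed meaning: no flag file, so Quattor is not being held off.
        return OK, f"OK - {path} not present"
    except OSError as exc:
        # Any other stat failure (e.g. permissions) is reported as UNKNOWN.
        return UNKNOWN, f"UNKNOWN - cannot stat {path}: {exc}"
    age = now - mtime
    state = classify_age(age)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}[state]
    return state, f"{label} - {path} is {age / 86400:.1f} days old"


def main():
    """Print the Nagios status line and return the exit code."""
    code, message = check_noquattor()
    print(message)
    return code
```

Deployed as a plugin, the script would finish with `sys.exit(main())` (after `import sys`) so that Nagios sees the exit status; that call is left out here so the functions can be imported and tested on their own.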
VO Reports
ALICE
ATLAS
CMS
LHCb
OnCall/AoD Cover
OnCall Rota
- Primary OnCall:
- Grid OnCall: Derek (Mon-Sun)
- AoD: Catalin (Wed)