RAL Tier1 weekly operations Grid 20100315
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Job status monitoring from CREAMCE | 2-Feb-2010 | | CMS | medium | [10-Feb-2010] WMS patch available soon; CREAMCE new version available soon |
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status |
---|---|---|---|---|
Hardware for Testbed | | | Medium | Required for change validation, load testing, etc. Also for phased rollout (which replaces PPS). Have initial hardware. [2010-02-22] More hardware expected by end of March. |
Developments/Plans
Highlights for Tier-1 Ops Meeting
- CMS: RAL will be getting the data (custodial) for the possible 900 GeV collisions
- FTS2.2 upgrade on Wednesday
- Disk Deployment meeting Tuesday at 10:00; small number of ongoing issues; moved to F51.
Highlights for Tier-1 VO Liaison Meeting
- FTS2.2 upgrade done.
- Announced plans to non-LHC VOs to support them on SL5 batch.
- Deploying glexec on WN.
Detailed Individual Reports
Alastair
- Investigate ways of installing ATLAS software in a new AFS test area.
- Monitor ATLAS MC production and re-processing currently going on at RAL. [Ongoing]
- Continue ATLAS disk deployment.
Andrew
- February accounting [Done]
- Renewed certificates for lcgvo0598 & lcgvo0599 [Done]
- Sent draft Tier-1 VO survey questions to Glenn for comment [Done]
- CMS data ops
  - Ran 2 MC reprocessing workflows at RAL [Done]
  - Ran 1 rereco MC preproduction workflow at IN2P3 [Done]
  - Installed and set up ProdAgent on new SLC5 CMS VOBOX (at CERN) [Done]
  - Re-started backfill at RAL and IN2P3 [Ongoing]
Catalin
- tidying up Nagios configurations (ALICE VOBOX, CE, SCAS) [done]
- LHCb LFC re-configuration [done]
- work on LFC schema tidying up (w/ Carmine) [ongoing]
- work on Dataguard replication (w/ Carmine) [ongoing]
- quattorise additional LFC frontends (w/ Ian) [ongoing]
- various grid services updates (following TOAST)
Derek
- Change Control and Deploying SCAS servers and glexec
- Deploying SL5 CREAMCE for non-LHC VOs
- Deploying infrastructure host for testbed
- Writing talks for batch system training
Matt
- Tier-1 talk.
- FTS2.2:
  - Submit Change Control request. [Done]
  - Fix t2k/t2k.org configuration problems. [Done]
  - Upgrade confirmed for Wednesday.
- Test SL5 CREAM CE installation. [Done]
- Disk deployment meeting.
- Update resource profiles for Q2/10.
- Organise testbed strategy strand meetings. [Done]
Richard
- Using stress-testing script developed for CASTOR to test behaviour of new BDII server
- Re-working the Grid Services Quattorisation Roadmap as a wiki page
- Working on a proposal on intra-/inter-team communication to meet an action from the team awayday
- Reviewing G/S process documentation
- Further Nagios items from the to-do list (https://wiki.e-science.cclrc.ac.uk/web1/bin/view/EScienceInternal/NagiosTasksToDo)
- CASTOR items:
  - Working on benchmarking plan to establish baseline performance before upgrading to new CASTOR release(s)
Mayo
- Uploaded new Metric system documentation to the Tier1 wiki [Done]
- Fixed bug in Metric system pie charts [Done]
- TSBN spreadsheet backend script to copy data from castoradm1 to TSBN spreadsheet [Done]
- Create Batch job to run TSBN backend script and update web interface automatically [Done]
- Implement feedback into TSBN web interface
- Set up scripts that update TSBN interface to run as scheduled jobs on a Windows machine
- Begin collaboration with SCT on NGS certificate wizard project
- Writing and configuring Nagios NRPE plugins
VO Reports
ALICE
ATLAS
CMS
- RAL will be getting the data (custodial) for the possible 900 GeV collisions
- Unforeseen collisions at 2.36 TeV for 40 minutes on 14th March
LHCb
OnCall/AoD Cover
- Primary OnCall: Catalin
- Grid OnCall:
- AoD: