General updates
|
Monday 25th June
- The batch systems APEL are planning to support are: PBS / Torque; LSF; Grid Engine and SLURM (to be confirmed). Do we have any other needs/requests?
- For those interested in the GridPP review of the Tier-1, see the meeting agenda.
- Good GridPP presence at the JANET6 meeting last Thursday.
- Q2 2012 finishes this week. Tier-2 quarterly reports due by 13th July.
- Changing IP addresses & DNS (DNS updates)
- ATLAS are considering increasing the limit of their ESD file size from the current 10GB to 15GB (or 12GB at least). Would this create any problem at sites?
- ATLAS job recovery now appears to be working well - allowing jobs to finish even if the SE is not up/unstable when the job finishes (see ATLAS subsection).
- A recent SL5 update (to python and its libraries), now in the FNAL and CERN repositories, broke running ATLAS jobs last week.
Monday 18th June
Tuesday 12th June
- There is a WLCG pre-GDB meeting on WN security - the agenda is here. Vidyo connection available.
|
Tier-1 - Status Page
|
Tuesday 26th June
- Problem found & fixed with packets to the RAL Tier1 not being routed over the OPN link (13 - 21 June).
- The major site networking upgrade successfully took place last Tuesday morning (19th June).
- Problem with CMS SRMs over this last weekend affected SUM tests and FTS transfers.
- Castor databases will be updated to Oracle 11 on Wednesday 27th June (declared in GOC DB). Will also move the FTS database back to its correct location. Minimising interruptions to batch services during this time at the VOs' request.
|
Storage & Data Management - Agendas/Minutes
|
Considering input to a community support model for DPM and possible alternatives.
Wednesday 20th June
- snoplus needs/plans on agenda last week.
- The collaboration will depend on the RAL LFC and are looking to increase Tier-2 usage – current needs of 10TB/site will increase to 20TB/site.
- Data taking will start in the autumn and continue for 6 months.
Wednesday 6 June 2012 - we are still digesting CHEP information, see also blog, plus some of the usual operational upgradional stuff. Hoping to find a few spare clock cycles for some slightly more experimental stuff.
Wednesday 23 May 2012 - lots of exciting stuff at CHEP, we have about five things in, some posters, some oral.
|
Accounting - UK Grid Metrics HEPSPEC06
|
Wednesday 6th June - Core-ops
- Request sites to publish HS06 figures from new kit to this page.
- Please would all sites check the HS06 numbers they publish. Will review in detail on 26th June.
Friday 11th May - HEPSYSMAN
- Discussion on HS06 reminding sites to publish using results from 32-bit mode benchmarking. A reminder for new kit results to be posted to the HS06 wiki page. See also the blog article by Pete Gronbech. The HEPiX guidelines for running the benchmark tests are at this link.
|
Documentation - KeyDocs
|
Wednesday, 6th June
Released a document, hep.ph.liv.ac.uk/~sjones/VomsSnooper.odt, that describes how to
- Maintain site VOMS info document for the approved VOs
- Check a site's VOMS records correspond exactly with CIC portal
- Create new site VOMS records direct from CIC portal, without manual transcription
Note: I'm accepting tips from GridPP core task members etc. about other use cases for these processes. This will be converted to wiki format and made available in the normal way.
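A minimal sketch of the "correspond exactly" check listed above, for illustration only. It is not taken from the released document or its tools. It assumes each VOMS record has already been parsed into a (VO, host, port, DN) tuple; the `VomsRecord` type, the function names and the example hostnames used with it are all hypothetical.

```python
from typing import Iterable, NamedTuple


class VomsRecord(NamedTuple):
    """Hypothetical flattened form of one VOMS server record."""
    vo: str
    host: str
    port: int
    dn: str


def diff_voms_records(site: Iterable[VomsRecord],
                      portal: Iterable[VomsRecord]) -> dict:
    """Return the records that differ between the site and the portal."""
    site_set, portal_set = set(site), set(portal)
    return {
        # In the portal reference but absent from the site configuration.
        "missing_at_site": sorted(portal_set - site_set),
        # Configured at the site but no longer in the portal reference.
        "not_in_portal": sorted(site_set - portal_set),
    }


def records_match(site: Iterable[VomsRecord],
                  portal: Iterable[VomsRecord]) -> bool:
    """True only when the two sets of records correspond exactly."""
    diff = diff_voms_records(site, portal)
    return not diff["missing_at_site"] and not diff["not_in_portal"]
```

Comparing as sets ignores ordering and duplicates, so only real differences in VO, host, port or DN are reported.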
Next jobs:
- review the logic/sequence of the VOMS admin process, document it if it works, fix it if it doesn't.
- create standard baseline for proxy renewal process, and write it up in wiki.
Note: I'm accepting tips from other GridPP core team members etc. for document priorities. Please think about where the problems lie (i.e. what costs us yet is easy to fix) and get back to me.
Tuesday, 29th May
- VOMS Records in the GridPP Approved VO list are now up to date with the CIC Portal XML. This can be used by Site Admins to ensure their site-info.def/vo.d directories are up to date. A tool, SidFormatter, will be released this week to facilitate comparison with the benchmark. A process has been devised to ensure that the GridPP Approved VO list is kept up to date to within a week of CIC Portal changes. Consultation to be made about further fields that we may wish to advertise in this manner.
Friday 27th April
- Appeal for a volunteer to enhance "Grid User Crash Course" (https://www.gridpp.ac.uk/wiki/Grid_user_crash_course) with simple use case for dependable proxy renewal for long jobs, as this is a recurrent requirement that has caused multiple queries on TB_SUPPORT.
|
Interoperation - EGI ops agendas
|
Monday 18th June
The EMI-1 updates are just minor revisions: Top BDII, BLAH and StoRM. A repackage for GFAL/lcg-utils to handle the globus lib dependency problems. Further EMI-2 updates, probably of interest only for those doing EA of them.
- Staged rollout: Lots of EMI-2 packages, working their way through the verification/SR process. The software that is just a repackage from EMI-1 to EMI-2 is skipping SR on SL5 - the SL6 versions will be tested. Most of the products in SL5 support upgrade and reconfiguration from the EMI-1 versions.
- Note that CREAM is one of the products that can't do an in-place update - new DB schema, so needs a drain/wipe/re-install.
- Question from Tiziana - anyone using CREAM in Cluster Mode? Any feedback on that?
|
Monitoring - Links MyWLCG
|
Wednesday 6th June
- Ranking continues. Plan to have a meeting in July to discuss good approaches to the plethora of monitoring available.
- Glasgow dashboard now packaged and can be downloaded here.
|
On-duty - Dashboard ROD Rota
|
Monday 25th June - JW
- Only one issue of note this week: I could not close the Cambridge ticket despite it being solved in GGUS. After filling in all the form data correctly, adding the solution, and setting the ticket to "solved" in the dashboard, the dashboard still complained that I had not filled in "something" with correct data. I extended the ticket by 1 day and was able to close it the following day, having entered the same data. Anyhow, it is an unrepeatable, bizarre and annoying problem.
|
Rollout Status WLCG Baseline
|
Monday 11th June
- EMI2 is released but not in Staged Rollout yet. Buyers beware.
Thursday 10th May
- The CREAM CE and the WMS, which were released at the end of April, have finally gone into Staged Rollout.
- Call for more sites to take part in EMI-2 rollout tests.
- The overall SR contributions are in this table.
Friday 27th April
- Updated version information on rollout page
- WN scan indicates some sites not keen on OS updates to those nodes.
|
Security - Incident Procedure Policies
|
Monday 25th June
- Rota availability responses slow
- Is anyone following up on SSC5/6?
- Stratuslab VM (ex UK)
- gridftp
|
|
Services - PerfSonar dashboard
|
Tuesday 26th June
- Cambridge added to UK dashboard
- Currently clarifying the ownership of the GridMon boxes (assumption was ownership was transferred to sites but they may still be in a RAL asset database).
Tuesday 19th June
- Some of the volunteer sites may not have perfsonar by end of June. Which other sites are close?
- GridPP will resume running VOMS. Current plan is for the master to remain at Manchester and to host backups at Oxford/Imperial.
|
Tools - MyEGI Nagios
|
Saturday 23rd June
- Email alerts from Nagios appear to have stopped. Reported to developers.
Tuesday 12th June
- Lancaster backup Nagios now available (link).
- A reminder as to who can see the data: all members of ops and dteam and anyone who is registered in GOCDB as site-admin, regional manager etc. It is also possible to add anyone who has a PKI certificate.
|
Site Updates
|
Monday 25th June
|
|