General updates
|
Tuesday 2nd April
- Any remaining certificate problems?
- Support for EMI-1 dCache was extended. See this broadcast. Report any tickets that have not been updated.
- Jens has produced a page on Key Tokens. How do we want to use this now?
- GGUS have released a new page on using the system.
- There was an OMB meeting last Tuesday. (To be reviewed)
Monday 18th March
- The March GDB (agenda) minutes are available. See also the actions and the pre-GDB on Clouds agenda and summary pages.
- The next WLCG operations coordination planning meeting takes place this Thursday 21st March. (agenda)
- EMI-3 ARGUS has again shown an issue with email addresses in certificates. The UK CA can now issue certificates without these addresses, and it may be beneficial for sites to change their certificates sooner rather than later.
- EGI have been collecting information, by component, about problems found in the EMI-1 to EMI-2 transition. Those who can access it should check this page. If you cannot access the page, please email Jeremy with any problems you encountered so that he can check they have been captured.
- The final WLCG availability report for February is now online.
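For sites wanting to see whether a certificate is affected by the email-address issue noted above, a minimal sketch follows. It assumes the subject DN is available as an OpenSSL-style one-line string and that the email component uses one of the keys `emailAddress`, `Email` or `E`; the DNs in the usage note are made up for illustration, and the function names are hypothetical.

```python
# Keys that commonly label an email component in a subject DN (assumed list).
EMAIL_KEYS = {"emailaddress", "email", "e"}


def dn_components(dn):
    """Split an OpenSSL-style one-line DN ('/C=UK/O=.../CN=...') into (key, value) pairs.

    Simplification: a '/' inside a value (e.g. 'CN=host/name') is treated as a
    separator, which is good enough for spotting an email component.
    """
    parts = []
    for chunk in dn.strip().split("/"):
        if "=" in chunk:
            key, value = chunk.split("=", 1)
            parts.append((key.strip(), value.strip()))
    return parts


def dn_has_email(dn):
    """Return True if any component of the DN is an email address field."""
    return any(key.lower() in EMAIL_KEYS for key, _ in dn_components(dn))
```

For example, `dn_has_email("/C=UK/O=eScience/OU=Example/CN=host.example.ac.uk/emailAddress=admin@example.ac.uk")` flags the certificate, while the same DN without the final component does not.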
|
WLCG Operations Coordination - Agendas
|
Tuesday 2nd April
- A new task force on http proxy discovery is being formed (read more). They are looking for members.
- Minutes of the 21st March planning meeting are now available.
|
Tier-1 - Status Page
|
Tuesday 2nd April
- Generally quiet operations this last week.
- Investigations are ongoing into problems at batch job set-up.
|
Storage & Data Management - Agendas/Minutes
|
Monday 1st April
Wed 20 March 2013
- Ruminated over the agenda items from last week's GDB
- EMI roadmap (dCache, and other things)
- FTS support for HTTP - we knew this, but how do we make use of it now?
- Storage accounting records: these need an updated APEL.
- Work of storage group(s) on interfaces and protocols, and future furlongpebbles.
- RAL D1T0 evaluation.
- Seems to be settling on HDFS and CEPH which will be run anyway
- what about Lustre?
- Presentation to PMB next Monday, but no decision yet.
|
Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06
|
Tuesday 12th March
- APEL publishing stopped for Lancaster, QMUL and ECDF
Tuesday 12th February
- SL HS06 page shows some odd ratios. Steve says he now takes "HS06 cpu numbers direct from ATLAS" and his page does get stuck every now and then.
- An update of the metrics page has been requested.
|
Interoperation - EGI ops agendas
|
Tuesday 2nd April
- Minutes of the 20th March EGI ops meeting are available.
Monday 19th March
- The next EGI operations meeting (agenda) takes place this Wednesday 20th March.
Monday 4th March
- An EGI operations meeting agenda for today's meeting is now available.
- SR: Large number of updates in UMD2 and UMD1. In particular, in the UMD1 release, the DPM, LFC and L&B are security updates. The UMD2 WMS is _not_ backwards compatible without a workaround, as described in the release notes: https://wiki.egi.eu/wiki/UMD-2:UMD-2.4.0
- EMI-3 release expected 7th March, UMD-3 prioritisation underway
- Argus should be in the Site-BDII; it has had the information provider since the EMI-2 release, so it's probably a good plan to update EMI-1 Argus instances. (As should VOMS servers; they've had the information providers in all EMI releases.)
|
Monitoring - Links MyWLCG
|
Tuesday 5th February
- Task will focus on probes and sharing of useful tools - suggestions and comments welcome
Monday 2nd July
- DC has almost finished an initial ranking. This will be reviewed by AF/JC and discussed at 10th July ops meeting
Wednesday 6th June
- Ranking continues. Plan to have a meeting in July to discuss good approaches to the plethora of monitoring available.
- Glasgow dashboard now packaged and can be downloaded here.
|
On-duty - Dashboard ROD rota
|
Monday 1st April
- A new GOCDB field related to the ROD email address was not populated. Emails should now reach the team.
Tuesday 5th March
- Handling tickets related to EMI-1 probes - what to expect.
- Recommendation with respect to upgrading CE (drain first)
Tuesday 12th February
- Need all ROD members to complete availability survey for the rota.
|
Rollout Status WLCG Baseline
|
Tuesday 2nd April
- EMI-1 components should be out of production. Nagios probes will report critical this month. Services remaining (without special condition) beyond 30th April will need to be placed in downtime.
Monday 4th March
- EMI early adopters list by component.
- Do we have a Staged Rollout list for EMI3?
Tuesday 5th February
References
|
Security - Incident Procedure Policies Rota
|
Tuesday 2nd April
- Reminder about ptrace kernel issue (CVE-2013-0871)
- Thanks to all those sites that took part in the security challenge
Tuesday 5th March
- Two openafs vulnerabilities announced (CVE-2013-1794 and CVE-2013-1795). Further details available at http://www.openafs.org/security. Updated RPMS for SL5/6 available.
|
|
Services - PerfSonar dashboard | GridPP VOMS
|
Tuesday 2nd April
- Impending electrical work at Manchester - we need to commission the backup VOMS arrangement as soon as possible.
Monday 18th February
- PerfSonar tests to BNL reveal poor rates for several sites since upgrade
Tuesday 5th February
- NGS VOMS to be switched off this week
|
Tickets
|
Monday 1st April 20:00 BST
27 Open UK tickets, but we'll have to wait until next week for a full review of them all as Matt's on leave this week and sending his apologies for tomorrow's meeting - nothing's striking him as urgent although someone on the ROD/Ops team might want to look at https://ggus.eu/ws/ticket_info.php?ticket=92512 (Wahid has set it to waiting for reply, there might be some confusion over who needs to do the replying).
In the meantime, if you aren't on leave too then please have a gander at your site's tickets and see if there's aught that needs your attention:
http://tinyurl.com/cblj3ab
Otherwise he'll catch y'all next week, by then hopefully he will have stopped referring to himself in the third person again.
In other news:
EMI-3 StoRM is not production ready: https://ggus.eu/tech/ticket_show.php?ticket=92819
|
Tools - MyEGI Nagios
|
Tuesday 13th November
- Noticed two issues during the Tier-1 power cut. The SRM and direct CREAM submission tests use the top BDII defined in the Nagios configuration to query for the resource. These tests started to fail because the RAL top BDII was not accessible. The configuration doesn't use BDII_LIST, so I cannot define more than one BDII. I am looking into how to make it more robust.
- The Nagios web interface was not accessible to a few users because GOCDB was down. It is a bug in SAM-Nagios and I have opened a ticket.
Availability of sites has not been affected by this issue, because Nagios sends a warning alert when it cannot find a resource through the BDII.
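The single-BDII dependency described above could be made more robust by trying several top BDIIs in turn. The following is only a sketch of that idea, not how the SAM-Nagios probes work: it assumes a comma-separated `host:port` list (the exact BDII_LIST format is an assumption), defaults to port 2170, and takes the actual query as a caller-supplied function. All names here are hypothetical.

```python
DEFAULT_BDII_PORT = 2170  # usual top-BDII LDAP port


def parse_bdii_list(value):
    """Parse 'host1:2170,host2' into [(host, port), ...] (format is an assumption)."""
    endpoints = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.partition(":")
        endpoints.append((host, int(port) if port else DEFAULT_BDII_PORT))
    return endpoints


def query_with_failover(endpoints, query):
    """Call query(host, port) on each endpoint in order; return the first success.

    'query' is supplied by the caller (for instance a function wrapping an LDAP
    search) and is expected to raise an exception when the BDII is unreachable.
    """
    errors = []
    for host, port in endpoints:
        try:
            return query(host, port)
        except Exception as exc:  # any failure moves on to the next BDII
            errors.append("%s:%d (%s)" % (host, port, exc))
    raise RuntimeError("all BDIIs failed: " + "; ".join(errors))
```

With a list such as `"bdii1.example.org:2170,bdii2.example.org"`, a test would only fail once every listed BDII had failed, rather than as soon as the first one became unreachable.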
|
VOs - GridPP VOMS VO IDs Approved VO table
|
Tuesday 2 April 2013
Monday 4th March 2013
Monday 26th February 2013
- NGS VOMS server. Durham fixed. Last site is Glasgow, and I'm running tests now. Hopefully this should now be fixed https://ggus.eu/ws/ticket_info.php?ticket=90356 - note that this has taken 3 months to complete.
- SNO+ reports lcg-cp timeouts for large files. I suspect this is a problem with the UI.
- Issues with Proxy renewal.
- Certificate for RAL myproxy server doesn't match advertised hostname (how does this work at all?).
- Other myproxy issues as well. GGUS#99105 GGUS#9172
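The hostname mismatch noted above can be checked offline once the server certificate's names are known. The sketch below assumes the certificate is available in the dictionary form returned by Python's `ssl.SSLSocket.getpeercert()` and applies a simplified match (exact, or a `*` as the whole left-most label). It is not the check the myproxy clients themselves perform, and the hostnames in the usage note are invented.

```python
def cert_dns_names(cert):
    """Collect DNS names from a getpeercert()-style dict.

    subjectAltName DNS entries are preferred; the subject commonName is used
    only when no such entries are present.
    """
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    if not names:
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.append(value)
    return names


def name_matches(pattern, hostname):
    """Case-insensitive match; '*' may stand for the whole left-most label only."""
    p = pattern.lower().rstrip(".").split(".")
    h = hostname.lower().rstrip(".").split(".")
    if len(p) != len(h):
        return False
    if p[0] == "*":
        return p[1:] == h[1:]
    return p == h


def cert_matches_host(cert, hostname):
    """Return True if any name in the certificate matches the advertised hostname."""
    return any(name_matches(name, hostname) for name in cert_dns_names(cert))
```

For instance, a certificate issued for `myproxy-a.example.org` would not match a service advertised as `myproxy.example.org`, which is the kind of mismatch reported here.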
SNO+ Questions
- Jobs appear to fail, but have uploaded output and it is in LFC
- MC production
- Want 2-3 people managing this
- Shifters monitoring sites and filing tickets
- How best to manage certificates - currently upload two proxies to myproxy - one for jobs to renew and one for the UI to renew.
- How best to do this - should they use a robot cert?
|
|