General updates
|
Tuesday 15th January
- Failing dteam jobs - request for sites to check config with vomsSnooper
- EGI-Inspire have funds for mini-projects for new work in operations, communication, coordination... if you have ideas please share them.
- The January GDB agenda is available.
- There is a pre-GDB on Operations. See the agenda. The focus is Operations (e.g. SL6 and security).
- ATLAS and LHCb have a 30th April 2013 target date for their sites to have CVMFS.
Monday 7th January
- There is a GDB next week. The draft agenda is starting to form here.
- The pre-GDB will have an Operations Coordination Team focus: see the agenda.
Tuesday 18th December
- There is now an updated/final T2 availability/reliability report for November from WLCG.
- For those wanting a better insight into EGI operations priorities take a look at today's OMB agenda.
Monday 17th December
- Add yourself to the Janet UK community on the EVO system via this link if you have not already done so.
- There was a GDB last Wednesday. Matt's notes are in the wiki. The GDB meeting summary can also be referenced.
- There was an ATLAS T1/2/3 jamboree last Monday and Tuesday.
- A reminder that sites need to update their voms.gridpp.ac.uk voms setup for the NES/NGS VOs.
|
Tier-1 - Status Page
|
Tuesday 15th January
- Operationally a quiet week. We have seen some intermittent failures of Atlas SRM SUM tests (failure to delete the file). The cause of (at least) some of these is now understood, and we are tracking this issue.
- We are seeing some Top-BDII instability and are preparing a new version for roll-out (EMI2 on SL6).
- Other items:
- Ongoing investigation into asymmetric data rates seen to remote sites.
- Test instance of FTS version 3 now available and being tested by Atlas & CMS.
|
Storage & Data Management - Agendas/Minutes
|
Wednesday 5 Dec
- DPM EMI upgrades:
- Future DPM support now better understood (DMLite)
- Brunel still to try dCache migration
- ATLAS Jamboree next week, ATLAS want to change all their filenames... (by 2014)
- How we are doing Big Data(tm)
|
Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06
|
Tuesday 30th October
- Storage availability in SL pages has been affected by a number of sites being asked by ATLAS to retire the ATLASGROUPDISK space token while the SUM tests were still testing it as critical. The availability will be corrected manually once the month ends. Sites affected to different degrees are RHUL, CAM, BHAM, SHEF and MAN.
Friday 28th September
- Tier-2 pledges to WLCG will be made shortly. The situation is fine unless there are significant equipment retirements coming up.
- See Steve Lloyd's GridPP29 talk for the latest on the GridPP accounting.
Wednesday 6th September
- Sites should check the ATLAS page reporting the HS06 coefficient because, according to the latest statement from Steve, that is what is going to be used. ATLAS Dashboard coefficients are averages over time.
I am going to suggest using the ATLAS production and analysis numbers given in hs06 directly rather
than use cpu secs and try and convert them ourselves as we have been doing. There doesn't seem to be
any robust way of doing it any more and so we may as well use ATLAS numbers which are the ones they are
checking against pledges etc anyway. If the conversion factors are wrong then we should get them fixed in our
BDIIs. No doubt there will be a lively debate at GridPP29!
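As a rough illustration of why a wrong conversion factor matters, the sketch below converts CPU seconds to HS06-hours using a per-core HS06 coefficient and shows the relative error introduced when the published coefficient differs from the real one. The function names and the example numbers are assumptions made for illustration; this is not the method used by the ATLAS Dashboard or the GridPP accounting pages.

```python
def cpu_seconds_to_hs06_hours(cpu_seconds: float, hs06_per_core: float) -> float:
    """Convert CPU time to HS06-hours for a core rated at hs06_per_core."""
    if cpu_seconds < 0:
        raise ValueError("cpu_seconds must be non-negative")
    if hs06_per_core <= 0:
        raise ValueError("hs06_per_core must be positive")
    return cpu_seconds / 3600.0 * hs06_per_core


def relative_error(published_coeff: float, actual_coeff: float) -> float:
    """Fractional error in reported work when the published coefficient is wrong.

    The CPU time cancels out, so the error depends only on the two coefficients.
    """
    if actual_coeff <= 0:
        raise ValueError("actual_coeff must be positive")
    return (published_coeff - actual_coeff) / actual_coeff


if __name__ == "__main__":
    # Hypothetical figures: two CPU-hours on a core rated at 10 HS06.
    print(cpu_seconds_to_hs06_hours(7200, 10.0))
    # Hypothetical case: a site publishing 8 HS06 per core when the real value is 10.
    print(relative_error(8.0, 10.0))
```

Because the error scales directly with the coefficient, a site publishing too low a value under-reports its delivered work by the same fraction, which is why fixing the value in the BDII is preferable to correcting it downstream.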
|
Interoperation - EGI ops agendas
|
Tuesday 18th December
- Update coming from today's meeting....!
Monday 3rd December
Monday 5th November
- There was an EGI ops meeting today.
- UMD 2.3.0 in preparation. Release due 19 November, freeze date 12 November.
- EMI-2 updates: DPM/LFC and VOMS - bugfixes, and glue 2.0 in DPM.
- EGI have a list of sites considered unresponsive or having insufficient plans for the middleware migration. The one UK site mentioned has today updated their ticket again with further information.
- In general an upgrade plan cannot extend beyond the end of 2012.
- A dCache probe was being rolled into production yesterday; alarms should appear on the security dashboard in the next 24 hours.
- CSIRT is taking over from COD on migration ticketing. By next Monday the NGIs with problematic sites will be asked to contact the sites, asking them to register a downtime for their unsupported services.
- Problems with WMS in EMI-2 (update 4) - WMS version 3.4.0. Basically, it can get proxy interaction with MyProxy a bit wrong. The detail is at GGUS 87802, and there exist a couple of workarounds.
|
Monitoring - Links MyWLCG
|
Monday 2nd July
- DC has almost finished an initial ranking. This will be reviewed by AF/JC and discussed at the 10th July ops meeting.
Wednesday 6th June
- Ranking continues. Plan to have a meeting in July to discuss good approaches to the plethora of monitoring available.
- Glasgow dashboard now packaged and can be downloaded here.
|
On-duty - Dashboard ROD rota
|
Tuesday 15th January
- Main issue relates to COD tickets as mentioned last week.
Monday 7th January
- Several sites are in the red due to the middleware tickets being older than 30 days. We got a COD ticket for this. Despite the ticket being filed as top priority, COD did not answer, so I reset the priority to something sensible. We aren't the only ones hit by this problem.
- At the moment the security alerts don't seem to update on the dashboard - at least ceprod08 has not cleared all day.
|
Security - Incident Procedure Policies Rota
|
Tuesday 15th January
- There has been a recent Java patch. Sites should consider rolling out updates to client machines (especially those containing user certificates). The most current version is "Version 7 Update 11".
|
|
Services - PerfSonar dashboard | GridPP VOMS
|
Tuesday 20th November
- Reminder for sites to add perfSONAR services in GOCDB.
- VOMS upgraded at Manchester. No reported problems. Next step to do the replication to Oxford/Imperial.
Monday 5th November
- perfSONAR service types are now defined in GOCDB.
- Reminder that the gridpp VOMS will be upgraded next Wednesday.
Thursday 18th October
- VOMS sub-group meeting on Thursday with David Wallom to discuss the NGS VOs. Approximately 20 will be supported on the GridPP VOMS. The intention is to go live with the combined (upgraded) VOMS on 14th November.
- The Manchester-Oxford replication has been successfully tested. Imperial to test shortly.
|
Tickets
|
Monday 14th January 2013, 14.30 GMT
I'm sickly and grumpy and today I opened up GGUS to see 60 tickets. 60
Open UK tickets. And so I've given up even trying to do a proper review of
them this week. Sorry, don't have it in me.
So for an in-depth review see here:
http://tinyurl.com/a8jsjs3
and please check to see if your site has any tickets assigned to it that
need tending.
A good number of sites have got tickets concerning having not enabled
the gridpp voms server for the ngs.ac.uk VO; these should be easily
fixed. A number more have tickets from atlas regarding spacetoken
juggling, a lot of these are being kept open to aid sites in tracking
the space shuffle. A few sites (Glasgow, ECDF, UCL) have been ticketed
again by atlas for transfer performance, but the reasons seem unrelated.
DURHAM, LANCASTER, GLASGOW, IC, and ECDF have nagios security version
tickets. I know for three of them this is all down to the tarball (it
exists now, for SL5 at least. I might get an SL6 beta tarball out
today). Congrats to UCL for upgrading their SE and getting off the
out-of-date list.
Other than that there seems to have been a large mixed bag of tickets landing
at sites over the last week.
Normal Service will resume next week.
|
Tools - MyEGI Nagios
|
Tuesday 13th November
- Noticed two issues during the Tier-1 power cut. The SRM and direct CREAM submission tests use the top BDII defined in the Nagios configuration to query the resource. These tests started to fail because the RAL top BDII was not accessible. The configuration doesn't use BDII_LIST, so I cannot define more than one BDII. I am looking into how to make it more robust.
- The Nagios web interface was not accessible to a few users because GOCDB was down. It is a bug in SAM-nagios and I have opened a ticket.
Site availability has not been affected by this issue because Nagios sends a warning alert when it cannot find a resource through the BDII.
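One way to picture a more robust lookup is to try several top-BDII endpoints in order and use the first one that answers. The sketch below is only an illustration of that fallback idea: the comma-separated host:port list format, the function names and the example hostnames are assumptions, port 2170 is the usual BDII LDAP port, and nothing here reflects how SAM-Nagios actually resolves its BDII.

```python
import socket
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]


def parse_bdii_list(value: str, default_port: int = 2170) -> List[Endpoint]:
    """Parse an assumed 'host:port,host:port' string into (host, port) pairs."""
    endpoints: List[Endpoint] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        host, _, port = item.partition(":")
        endpoints.append((host, int(port) if port else default_port))
    return endpoints


def tcp_probe(endpoint: Endpoint, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to the endpoint succeeds within timeout."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(
    endpoints: List[Endpoint],
    probe: Callable[[Endpoint], bool] = tcp_probe,
) -> Optional[Endpoint]:
    """Return the first endpoint the probe accepts, or None if all fail."""
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    return None


if __name__ == "__main__":
    # Hypothetical hostnames, used only to show the call pattern.
    candidates = parse_bdii_list("top-bdii-1.example.org:2170, top-bdii-2.example.org")
    print(first_reachable(candidates))
```

A TCP connect only shows that the port is open, not that the BDII is returning fresh data, so a real check would probably also need an LDAP query against the chosen endpoint.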
Wednesday 17th October
Monday 17th September
Monday 10th September
- Discussion needed on which Nagios instance is reporting for the WLCG (metrics) view
|
VOs - GridPP VOMS VO IDs Approved VO table
|
Monday 14 January 2013
- NGS VOMS server: Please enable GridPP VOMS server
- Neiss.org.uk
- Now have VO-ID card in operations-portal (previously CIC portal)
- GridPP/NGS VOMS server issues
- NGS WMS hadn't enabled current CEs at QMUL and Lancs, so I've requested the GridPP WMSs enable it - as the VO is supported on GridPP sites.
- Would be a good use case for SARONGS - but they don't have the time to debug this.
Tuesday 9 January 2012
- Please can VOs report publications (there's now a section on the Quarterly report for them)
- Spring Cleaning of VO support
- GridPP VOMS server support - affects ngs.ac.uk and neiss.org.uk - expect tickets soon.
- Removal of VOs no longer in use - totalep and some others.
- Neiss.org.uk: NGS WMS hasn't enabled QMUL and Lancs CEs. Should we support it on the GridPP WMSs?
- T2k FTS transfers slowed due to copying files that already exist - T2k script now more robust.
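The T2K script itself is not shown here, so the following is only a minimal sketch of the general idea of skipping files that are already at the destination before submitting transfers. The function name and the file names are hypothetical, and how the destination listing is obtained is left out.

```python
from typing import Iterable, List


def pending_transfers(
    candidates: Iterable[str],
    existing_at_destination: Iterable[str],
) -> List[str]:
    """Return candidate files not yet at the destination, in order, without duplicates."""
    existing = set(existing_at_destination)
    seen = set()
    pending: List[str] = []
    for name in candidates:
        # Skip files already present remotely and repeats within the request.
        if name in existing or name in seen:
            continue
        seen.add(name)
        pending.append(name)
    return pending


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    wanted = ["run001.root", "run002.root", "run001.root", "run003.root"]
    already_there = ["run002.root"]
    print(pending_transfers(wanted, already_there))
```

Comparing names alone will not catch a partial or corrupt copy at the destination, so a size or checksum comparison may be worth adding where that information is available.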
Mon 17th December
Tue 4th December
Thursday 29 November
Tuesday 27 November
- VOs supported at sites page updated
- now lists number of sites supporting a VO, and number of VOs supported by a site.
- Linked to by Steve Lloyd's pages
Tuesday 23 October
- A local user wants to get on the grid and set up his own UI. Do we have instructions?
|
Site Updates
|
Monday 14th January
- Site updates at Tuesday's meeting.
|
|