|
General updates
|
Monday 13th October
- The [https://indico.cern.ch/event/304944/call-for-abstracts/ CHEP 2015 abstract deadline] has moved to 25th October. Who has submitted or will be submitting something?
- [http://indico.cern.ch/event/320819/timetable/#all.detailed HEPiX is taking place this week] in Nebraska.
- [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20141008 Minutes] and [https://twiki.cern.ch/twiki/bin/view/LCG/GDBActionInProgress actions] from the [http://indico.cern.ch/event/272778/ October GDB] are now available.
Tuesday 7th October
- A reminder to please share your work with everyone via blog posts. In the core-ops meeting it was suggested that there be an incentive... we'll consider that!
- Ewan will take a closer look at the middleware package reporter (the Pakiti contender... or ally).
- Matt will trial following up on VO Nagios errors from the GridPP Nagios.
- There is an IPv6 quarterly meeting this week.
- There is a GDB tomorrow 8th October.
- GridPP collaboration meeting now scheduled for April 28th to 30th 2015
Tuesday 30th September
|
WLCG Operations Coordination - Agendas
|
Tuesday 14th October
Tuesday 7th October
- There was a WLCG ops coordination meeting last Thursday. (Agenda: Minutes). Some notes follow...
- News: HEP_OSlibs-7.0.0-0.el7.cern.x86_64.rpm for CentOS7 has been released; CHEP 2015 15th October [https://indico.cern.ch/event/304944/call-for-abstracts/ abstract deadline] approaching; comments on Shellshock.
- MW baselines: New version of the UI and WN estimated for the next UMD release at the end of October; the dCache 2.2.x decommissioning deadline is 31-10-2014.
- MW issues: The xroot package deployed with ROOT 6 breaks access to dCache storage, affecting LHCb; a fix is coming. CREAM, WMS, L&B, UI and WN cannot be installed at the moment because the classads package (a dependency of all of them) was declared an orphan in EPEL!
- T0 & T1 updates: Mainly SE upgrades
- Oracle: Upgrade plans updated.
- T0 news: WMSes decommissioned 1st October. Lxplus5 will be stopped in October; AFS UI (removal) discussion ongoing.
- T1 feedback: NTR
- T2 feedback: NTR
- ALICE: Investigation of job failure rates and inefficiencies; HLT farm running as an ALICE site since Sep 24.
- ATLAS: DC14 ongoing. Multi-core recommendation: 16GB physical memory per job. Serial production tasks in future will be limited. ARC-CE tests in ATLAS-CRITICAL from 1st October.
- CMS: Scale testing of HTCondor and GlideinWMS by OSG - various issues. Reminder: Participate in space monitoring; Update xrootd fallback configuration.
- LHCb: dCache storage sites broken when accessed by ROOT6/xrootd; new stripping campaign is currently being prepared; testing new VOMS.
- glexec: NTR
- Machine job features: NTR
- MW readiness: Meeting on 1st October. DPM, CREAM and BDII verification exercises. MW package reporter development. Next meeting 19th November.
- Multi-core: 50M events/daily for ATLAS. Continue deployment.
- SHA-2: Testing new VOMS for each experiment.
- WMS decommissioning: with the deployment of the Condor SAM probes nothing is using WMS anymore. Machines off. WG will end.
- IPv6: LHCbDIRAC tested and working
- Squid monitoring/HTTP proxy: NTR
- Network & Transfer Metrics WG: Shellshock & perfSONAR news. PS 3.4 coming.
|
Tier-1 - Status Page
|
Tuesday 7th October
- Access for all VOs to our CREAM CEs has been stopped (apart from ALICE and SNO+).
- We are currently experiencing a problem with a disk array that holds the Castor databases. Castor performance may be degraded and we await an engineer to fix the faulty array.
|
Storage & Data Management - Agendas/Minutes
|
Wedn 01 Oct
- Summary of all the exciting events in Amsterdam last week - EUDAT, EGI big data, RDA
- DPM 1.8.9 early testing, and (separately) xroot4 early-ish testing. Supporting multiple VOs in one xroot server.
Wedn 17 Sept
- iRODS - what it is and why it should choose to collapse on Betelgeuse 7.
- Technical problems with Vidyo
Wedn 10 Sept.
- High load at Liverpool causing low throughput - how to throttle xroot transfers (and is the load necessary or a bug?)
- Still testing WebFTS
- Prep for DPM workshop
|
Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06
|
Tuesday 7th October
- GridPP metrics need updating for CMS. Any comments on the metrics page at the moment?
- APEL issues for Birmingham and Sussex, and the portal appears to stop at 1st October (being followed-up).
Tuesday 30th September
- Slight delay for Birmingham and Sussex.
Tuesday 23rd September
- Slight APEL delay for Birmingham.
|
Documentation - KeyDocs
|
See the worst KeyDocs list for documents needing review now and the names of the responsible people.
Tuesday 7th October
- Keydocs were reviewed at the core-ops meeting last week. The situation with updates is improving.
- Main GridPP website expected to use Wordpress with a plug-in to cover the gridsite aspects.
|
Interoperation - EGI ops agendas
|
Monday 6th October
There was a meeting today - link: https://wiki.egi.eu/wiki/Agenda-06-10-2014
- EMI-WN 3.1.0 in SR: if anyone is running this in production please get in touch to help get this past rollout
- MySQL 5.0 noted to be under Oracle Lifetime Sustaining Support (for some time now).
- See agenda for guidance on middleware consequences
- classads "retired" from EPEL repos
- SL/SLC/CentOS 5 Support Lifetime
- This was highlighted, though not suggested to be urgent
|
Monitoring - Links MyWLCG
|
Tuesday 30th September
- Monitoring meeting last Friday. Link: https://indico.cern.ch/event/341748/ ; minutes: https://indico.cern.ch/event/341748/material/minutes/minutes.html
- Of note, we had identified a couple of differences between SAM2 and SAM3, where entries appeared in SAM3 that had not been in SAM2. This is because they were being picked up from the VOfeed. From the minutes: "Pablo and Maarten confirm that the VOfeed is the only authoritative source of topology; agreed that all services in the VOfeed will be tested and their availability will be calculated; agreed to add a new attribute to VOfeed to flag which services should be excluded from the official reports."
Tuesday 23rd September
- Next (replacement) meeting this Friday to continue discussions.
|
On-duty - Dashboard ROD rota
|
Monday 29th September
Monday 22nd September
- Quiet week - little to report.
- EGI is looking for people to join the ops portal review and testing TF.
Tuesday 2nd September
- Sussex is back in business - kept closing their low availability alarm wrt the GGUS ticket.
- The UCL ticket is now finally receiving some attention.
- Ongoing problems at RAL.
Tuesday 26th August
- RAL: Nagios jobs staying in the queue for a long time - to be investigated.
- Sussex: Matt needs help, probably from some SGE experts.
- UCL: No acknowledgement from the site (ticket escalated to second level).
- 100IT: There is an alarm from the EGI federated cloud - this needs discussion.
- Durham: Availability alarms require constant closing with some comments. A ticket with the devs is open.
Tuesday 12th August
- Last week was quiet.
- Still one or two responses needed for the next rota allocations.
|
Security - Incident Procedure Policies Rota
|
Tuesday 14th October
- Shellshock updates and follow-up
- Banning challenge status
Tuesday 30th September
- Shellshock - advice and follow-up.
- Note particularly the advisories from WLCG/EGI.
Tuesday 23rd September
- High priority vs critical tests in Pakiti.
- FAX update
|
|
Services - PerfSonar dashboard | GridPP VOMS
|
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).
Monday 13th October
- perfSONAR 3.4 has been released. Clear documentation on what to do (clean reinstall) coming this week together with information on mesh updates. See the GDB presentation slides 13 and 14.
- RIPE have sent a reminder to connect probes that were handed out some weeks ago. Please could the following sites check their status: Lancaster, Brunel, Sussex and ECDF. Probe 20599 at RAL has never properly connected (DHCP issue?).
Tuesday 7th October
Tuesday 23rd September
|
Tools - MyEGI Nagios
|
Tuesday 16th Sep
- The multi-VO Nagios maintained at Oxford has been upgraded to add ARC CE tests.
- https://vo-nagios.physics.ox.ac.uk/nagios/
- It is currently monitoring gridpp, pheno, t2k.org, snoplus.snolab.ca, vo.southgrid.ac.uk
- Should we start monitoring it more actively and open tickets for sites failing tests?
|
VOs - GridPP VOMS VO IDs Approved VO table
|
Monday 11th August
- Steve J sent an email to hyperk on 7th regarding "software directory for Hyperk (CVMFS)" and entries in the VO ID card.
Monday 14th July 2014
- HyperK.org will initially use remote storage (iRODS at QMUL) - so CPU resources would be appreciated.
Monday 30th June 2014
- HyperK.org request for support from other sites
- 2TB storage requested.
- CVMFS required
- Cernatschool.org
- WebDAV access to storage: world read works at QMUL.
- Ideally will configure federated access with DFC as LFC allows.
|
Site Updates
|
Tuesday 9th September
- Intel announced the new generation of Xeon based on Haswell.
Tuesday 20th May
- Various sites but notably Oxford have ARGUS problems. 100s of requests seen per minute. Performance issues have been noted after initial installation at RAL, QMUL and others.
|
|