General updates
|
Monday 14th July
- Workshop - CVMFS monitoring feedback
- ATLAS DC14 13TeV simulation starting - note Alessandra's recommendation regarding Nikhef scripts and multicore running (for torque/maui sites).
- topBDII caching and errors
- ILC VOMS changes
- Sites with ARC CEs who want to support LHCb need to make a few configuration changes. This is to ensure that there is an environment variable available to jobs which specifies the name of the queue.
- EGI A/R report for June
- Did anyone else see kernel problems like those at Liverpool (see blog)?
Tuesday 1st July
- HyperK can now make use of additional resources and a general request for enablement was circulated. The request includes 1-2TB disk which triggered discussion about default space tokens.
- Michel J confirms that the 9th September pre-GDB will be on clouds.
- There was an EGI OMB meeting last Thursday. Main points:
- EGI review – 2nd & 3rd July
- GFAL/lcg_util not supported after October
- OLAs for core services now in wiki
- Main NGI updates question: what proportion of resources will be cloud vs grid accessible?
- 17 sites provided responses via eGrant pools. 4 now active (1 from UK = Brunel).
- There was a UK CA notification last week. They plan to close the legacy OpenCA interface in mid-July. Users should now use (and in fact are already using) the CA portal or CertWizard.
- We were going to revisit this week the issue of high load on the squid-frontier servers at Liverpool and Glasgow.
- The agenda for the WLCG workshop taking place next week is now final.
Monday 23rd June
|
WLCG Operations Coordination - Agendas
|
Tuesday 1st July
Monday 23rd June
- Minutes from last Thursday's meeting. Highlights....
- A page is available listing current known middleware issues affecting WLCG.
- Baselines: Storm 1.11.4 released in EMI containing several bug fixes. Baseline update with UMD release.
- 3 issues affected some sites after the latest EMI update of Cream and LB. The problems are under investigation by the PTs.
- CVMFS: Starting from July, sites not compliant with the 2.1.19 version will be notified with a GGUS ticket (noted that the upgrade just requires an update of the RPM and a restart of CVMFS).
- T0: The OPS VO now runs in voms-admin instead of VOMRS, after the migration done on June 17th
- Tier-1/Tier-2 feedback: NTR!
- ALICE: successful campaign for users to move away from old ROOT versions. T0 job efficiency issues ongoing.
- ATLAS: DC14 expected to start in approximately 2 weeks from now. Panda/Jedi is now fully ready for user analysis.
- CMS: Started to remove individual release tags from CEs. After the introduction of disk/tape separation at the T1 sites, CMS now must site readiness measures for T1 sites
- LHCb: Recommend CVMFS 2.1.19. General request: ensure that downtimes, including unscheduled outages, accurately reflect the specific services which are unavailable.
- FTS3: Monitoring the auto-tuning algorithm closely and adjusting various monitoring tools of FTS3.
- glexec: 10 sites have yet to enable it. ARGUS instabilities being investigated.
- Machine/job features: PBS/torque and LSF implemented. SLURM pending. SGE and HTCondor in progress.
- MW readiness: ATLAS and CMS DPM setups in progress. Monitoring prototype being deployed at test sites.
- Multicore: CMS stable flow. Gathering reports for July workshop. ATLAS MC jobs on-hold pending new software release.
- SHA-2: New VOMS fix for CERN instances requires sites to update ARGUS, UI, CREAM and WN instances.
- WMS decommissioning: Progress with SAM Condor validation. ARC-CE WN tests failing for some CMS sites (incl. Imperial).
- IPv6: NTR
- HTTP proxy discovery: Task overview table updated.
- Network and transfers metrics: Mesh leaders developed. Kick off in July.
- AOB: OSG plan to migrate to HTCondor CEs by October.
|
Tier-1 - Status Page
|
Tuesday 1st July
- The LHCb Castor Stager upgrade was carried out successfully last Thursday. The final update is to the Atlas Castor instance stager, which is planned for Tuesday 1st July.
- There is a UPS/Generator load test tomorrow morning (Wed 2nd July) and the site has been declared At Risk (warning) in the GOC DB from 10 to 11 local time.
- We are looking at how to end the FTS2 service, now that FTS3 is becoming widely used.
- The software server used by the small VOs will be withdrawn from service. Its use as a software server is very limited (possibly only SNO+) although a few VOs use it for uploading files to the CVMFS repository.
|
Storage & Data Management - Agendas/Minutes
|
Wednesday 2 July
- Guidance and policies for "small" VOs: how to get them started with stuff, without preventing them later growing bigger.
Tuesday 1st July
Tuesday 17th June
- Advances with CEPH at RAL will be reported to the Storage Meeting. It is hoped to set up a regular update contribution.
Tuesday 10th June
- The DPM Collaboration agreement has been updated.
|
Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06
|
Tuesday 1st July
- There are no SL6 HS06 entries in our wiki for UCL and EFDA.
- Are there any observations from the latest GridPP metrics tables? (Does anything need addressing or correcting?).
- APEL is not up-to-date for: RHUL, Manchester and Durham.
Tuesday 24th June
- APEL is not up-to-date for: RHUL, Manchester, Durham and Sussex.
|
Documentation - KeyDocs
|
See the worst KeyDocs list for documents needing review now and the names of the responsible people.
Monday 16th June
- A review is starting of old and obsolete pages within the GridPP website - there are many! Please review sections that you have created and update them if necessary.
Tuesday 6th April
- KeyDocs are going to be reviewed (in next 4 weeks) as the system is not working (or not adding anything) in some areas.
|
Interoperation - EGI ops agendas
|
Monday 14th July
- URT: see agenda for details
- SR: In verification: gfal2 v. 2.5.5
- active: globus-info-provider-service v. 0.2.1 cream v. 1.16.3
- Ready to be released: storm v. 1.11.4 lb v. 11.1 wms v. 3.6.5 dcache v. 2.6.28
- DMSU report: CREAM CLI/GridSite SegFaults at Long-Lived Proxies solved
- Migration of Central SAM services: Note to make sure that if being reinstalled that patches are applied
- EMI-2/APEL-2 - Looks like UCL is still publishing with APEL-2 publisher
- Hoped that gr.net issues resolved on Monday. Summary of discussion to be in minutes.
- Next meeting placeholder 28th July, but may not happen (OMD depending)
Tuesday 1st July
- Today's ops meeting cancelled - partly due to forthcoming 4th EGI annual review.
- EMI-2 decommissioning: The situation is followed by COD (GGUS 106354). "Please remember that we passed the decommissioning deadline and after today - Sites still deploying unsupported service end-points risk suspension, unless documented technical reasons prevent a Site Admin from updating these end-points" (source PROC16).
- There is STILL use of UMD2/EMI2 APEL clients to send accounting data. As of today there are 20 sites (see latest list) still using UMD2/EMI2 APEL clients
|
On-duty - Dashboard ROD rota
|
Tuesday 1st July
- Quiet week. Sussex emi2 ticket is still open. UCL also has an open ticket regarding some problem with storage.
Tuesday 24th June
- Very quiet shift. Dashboard downtime on Tuesday seemed to go ok.
|
Rollout Status WLCG Baseline
|
Tuesday 18th March
Tuesday 11th February
- 31st May has been set as the deadline for EMI-2 decommissioning. There may be an issue for dCache (related to 3rd party/enstore component).
References
|
Security - Incident Procedure Policies Rota
|
Monday 14th July
- EGI CSIRT ADVISORY [EGI-ADV-20140625]
Tuesday 1st July
- There was a very useful security challenge debrief last week. Thanks to Heiko.
- There may be a site contacts challenge in the coming months. Please could every site review their site security contact details and ensure that the GOCDB entry is up-to-date and working.
- EGI indicates that site ARGUS instances can now be hooked up with the regional instances.
- There was one EGI amber final report last week.
- Next team meeting 16th July.
Monday 23rd June
- CVE-2014-3153 - but no public exploit.
- This kernel vulnerability has been patched in errata released last week.
- PerfSonar/Cacti updates.
- New IGTF CA release 1.58 - the EGI release is due on 30th June.
|
|
Services - PerfSonar dashboard | GridPP VOMS
|
- This includes notifying of (inter)national services that will have an outage in the coming weeks or will be impacted by work elsewhere. (Cross-check the Tier-1 update).
Tuesday 17th June
- The GridPP VOMS server was updated on 11/06/2014 - no issues reported.
|
Tools - MyEGI Nagios
|
Tuesday 1st July
- There was a monitoring problem on 26th June. All ARC CEs were using storage-monit.phyics.ox.ac.uk for replicating files as part of the Nagios testing. storage-monit was updated but not re-yaimed until later, so it was broken for the morning, leading to all ARC SRM tests failing.
Tuesday 24th June
- An update from Janusz on DIRAC:
- We had a stupid bug in Dirac which affected the gridpp VO and storage. Now it is fixed and I was able to successfully upload a test file to Liverpool and register the file with the DFC.
- The async FTS is still under study; there are some issues with this.
- I have a link to software to sync user database from a VOMS server, haven’t looked into this in detail yet.
|
VOs - GridPP VOMS VO IDs Approved VO table
|
Monday 30 June 2014
- HyperK.org request for support from other sites
- 2TB storage requested.
- CVMFS required
- Cernatschool.org
- WebDAV access to storage: world read works at QMUL.
- ideally will configure federated access with DFC as LFC allows.
Monday 16 June 2014
- CVMFS
- Snoplus almost ready to move to CVMFS - waiting on two sites. Will use symlinks in existing software
- VOMS server: Snoplus has problems with some of the VOMS servers - see GGUS 106243 - may be related to update.
Tuesday 15th April
|
Site Updates
|
Tuesday 20th May
- Various sites but notably Oxford have ARGUS problems. 100s of requests seen per minute. Performance issues have been noted after initial installation at RAL, QMUL and others.
|
|