Line 28:
  <!-- *********************************************************** ----->
  <!-- ***********************Start General text*********************** ----->'''
+ '''Tuesday 6th May'''
+ * WLCG workshop - responses considered as part of a list. If you notified Jeremy last week, please now go ahead and submit a visit notice as usual and book early.
+ * WLCG A/R reports [http://sam-reports.web.cern.ch/sam-reports/2014/201404/wlcg/ for April] are now available.
+
  '''Tuesday 29th April'''
  * There is an [https://indico.cern.ch/event/289680/ LHCOPN/LHCONE meeting] at CERN - yesterday and today.
Line 44: / Line 48:
  ** EMI-2 decommissioning update.
  * Pete Clarke circulated the final network forward look document update.
−
− '''Tuesday 22nd April'''
− * Note that CERN are requesting all users to update their passwords within 2 weeks.
− * An EGI update on the openSSL vulnerability for users was circulated last week. See the [https://operations-portal.egi.eu/broadcast/archive/id/1127 broadcast for users], or the earlier [https://wiki.egi.eu/wiki/EGI_CSIRT:Alerts/OpenSSL-2014-04-08 one for sites and service providers].
− * The final WLCG T2 availability/reliability figures for March have been added [https://espace.cern.ch/WLCG-document-repository/ReliabilityAvailability/2014/march-14/ here].
−
− '''Tuesday 15th April'''
− * [https://twiki.cern.ch/twiki/bin/view/LCG/GDBMeetingNotes20140409 Summary notes] from [https://indico.cern.ch/event/272620/ April's GDB] are available. The [https://twiki.cern.ch/twiki/bin/view/LCG/GDBActionInProgress actions] have also been updated.
− * A new GOCDB role has been requested for the use case where a user's DN is to be associated with the site, allowing other systems (nagios for example) to read the list of user DNs that are linked to that Site and take subsequent authorisation decisions.
− * There has been an [https://operations-portal.egi.eu/broadcast/archive/id/1123 advance notification] of an extended GOCDB service OUTAGE starting 07:00 to 14:00 (BST) on 29th April.
− * A reminder: The WLCG T2 March availability/reliability figures were made available two weeks ago. Please could sites below the 90% targets write with details of issues encountered. [http://sam-reports.web.cern.ch/sam-reports/2014/201403/wlcg/WLCG_All_Sites_ALICE_Mar2014.pdf ALICE], [http://sam-reports.web.cern.ch/sam-reports/2014/201403/wlcg/WLCG_All_Sites_ATLAS_Mar2014.pdf ATLAS], [http://sam-reports.web.cern.ch/sam-reports/2014/201403/wlcg/WLCG_All_Sites_CMS_Mar2014.pdf CMS], and [http://sam-reports.web.cern.ch/sam-reports/2014/201403/wlcg/WLCG_All_Sites_LHCB_Mar2014.pdf LHCb]. The EGI availability/reliability [https://wiki.egi.eu/wiki/Availability_and_reliability_monthly_statistics figures for March] are available.
General updates
|
Tuesday 6th May
- WLCG workshop - responses considered as part of a list. If you notified Jeremy last week, please now go ahead and submit a visit notice as usual and book early.
- WLCG A/R reports for April are now available.
Tuesday 29th April
- There is an LHCOPN/LHCONE meeting at CERN - yesterday and today.
- A reminder that there is a GOCDB service OUTAGE today 06:00 to 13:00 UTC (07:00 to 14:00 BST). After that it will be at risk. During the outage a read-only fail over service will be in use.
- Planning for the May pre-GDB on Data Access and GDB is almost done.
- Please email Jeremy if you have an interest in attending the WLCG workshop in July.
- There was an EGI OMB meeting last Thursday. See the agenda here. Topics covered were:
  - Operations updates;
  - EGI Competence Centres Call;
  - Update on SAM migration;
  - Migration of 1st and 2nd level support;
  - Status of other core tasks;
  - Security updates;
  - New features of the accounting portal;
  - CVMFS task for update;
  - EMI-2 decommissioning update.
- Pete Clarke circulated the final network forward look document update.
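The GOCDB outage window above is quoted in both UTC and BST. As a quick check of that conversion, here is a minimal sketch using Python's standard library. The date (29th April 2014) is taken from the entry itself; the helper name `to_uk_local` is illustrative only, and the fixed-offset fallback is an assumption for systems without a time zone database.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo, ZoneInfoNotFoundError

try:
    UK = ZoneInfo("Europe/London")
except ZoneInfoNotFoundError:
    # Fallback (assumption): no tz database installed, so use a fixed BST offset.
    UK = timezone(timedelta(hours=1), "BST")


def to_uk_local(utc_dt):
    """Convert a timezone-aware UTC datetime to UK local time."""
    return utc_dt.astimezone(UK)


# Outage window as announced: 06:00 to 13:00 UTC on 29th April 2014.
outage_start_utc = datetime(2014, 4, 29, 6, 0, tzinfo=timezone.utc)
outage_end_utc = datetime(2014, 4, 29, 13, 0, tzinfo=timezone.utc)

start_local = to_uk_local(outage_start_utc)
end_local = to_uk_local(outage_end_utc)

print(start_local.strftime("%H:%M"), "to", end_local.strftime("%H:%M"), "UK local time")
```

Quoting the window in UTC first, as the reminder does, avoids ambiguity for readers outside the UK.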
|
WLCG Operations Coordination - Agendas
|
Tuesday 22nd April
- There was a WLCG operations coordination planning meeting last Thursday. The minutes are now available. Also see the agenda.
- There was a request to add xrootd endpoints of your site in GOCDB. Alessandra provided this status summary link.
Tuesday 15th April
Tuesday 8th April
- Registration for the next WLCG workshop opens this week.
- WLCG [https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions baselines] have been updated. gLiteWMS to be checked.
- Various Tier-0/1 storage updates - see table in minutes
- Various Oracle updates have been completed at CERN.
- Job efficiency report Meyrin vs Wigner being compiled.
- Some delays in the use of VOMS-admin, due to some bugs to be fixed and some features that need further understanding or changing (because their behaviour differs from that of VOMRS).
- CERN batch capacity migrated to SLC6 was at 65% last week.
- ALICE: Steady activities in preparation for Quark Matter 2014 (May 19-24, GSI Darmstadt)
- ATLAS: Rucio commissioning: commissioning of the various Rucio services started in the last few days. Data transfer issues: a few links with "slow transfers" (of order 0.5 MB/s) were observed, including 3 UK sites. CVMFS cache issue: an ATLAS file is 2.2 GB, but the default shared cache was set to 2 GB.
- CMS: DBS2 will be switched off April 7th. CVMFS switch at CERN: Monday, April 14th.
- LHCb: Incremental stripping campaign almost finished. Future VOMS2 server added to the VO card.
- Tools: GGUS new version released on 26 March: multiple site notification, CMS specific SU and forms.
- FTS3: New version deployed as pilot - in production in 2-3 weeks if no issues.
- glexec: 79 tickets closed and verified, 16 still open (no change)
- Machine/JF: detailed plan for bare metal, cloud, client and bi-directional developments has been discussed and agreed within the TF
- Middleware readiness: process agreed. Volunteer sites to be agreed by 15th April.
- Multi-core: Various reviews done. Next: review experience at sites shared by CMS and ATLAS when handling multicore jobs from both VOs.
- perfSONAR: Deadline for perfSONAR installation has passed (April 1st). 9 sites missing out of 111. No UK sites listed - thank you! But some firewall issues to resolve.
- SHA-2: EGI Operations Portal VO cards for the experiments have been updated with the details of the future VOMS servers
- WMS decommissioning: CERN WMS instances for experiments are being drained as of 13:53 CEST on April 1
- xrootd: no update
- IPv6: Some new test sites. Panda Dev instances are being made dual stack.
- http proxy discovery: no update.
Tuesday 1st April
|
Tier-1 - Status Page
|
Tuesday 6th May
- Network intervention last Tuesday completed successfully.
- New testing CVMFS client 2.1.19.
- In process of scheduling Castor 2.1.14 upgrade.
- The software server used by the small VOs will be withdrawn from service (aiming for June).
|
Storage & Data Management - Agendas/Minutes
|
Tuesday 22nd April
- A DPM collaboration meeting is being planned for the coming week(s). Are there any site comments or feedback on DPM as a product (e.g. speed of new feature development) and the support it receives?
Wednesday 2nd April 2014
- All metrics green for the past quarter!
- Performance issues being pursued - Brian is testing/coordinating
- Report from GridPP32: "big" VOs, "small" VOs. See blog.
- Report from ISGC2014: dCache, DIRAC, new countries. See blog.
Tuesday 18th March
- Chris noticed some of Steve's tests failing. At IC this related to a full spacetoken. Bristol is not working as there is no SCRATCHDISK spacetoken. Durham fails with a "no space left on device" error message.
March 2014
- How would we move data between DiRAC and GridPP?
|
Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06
|
Tuesday 29th April
- Glasgow looks slightly delayed with recent accounting data publishing.
Tuesday 15th April
- The APEL accounting system has been undergoing database maintenance to improve performance and reliability. Networking problems at the RAL site have delayed completion of the operation. Sites may see nagios alerts warning them that they have not published accounting data for 7 days - these will stop after the maintenance work completes.
|
Documentation - KeyDocs
|
See the worst KeyDocs list for documents needing review now and the names of the responsible people.
Tuesday 15th April
Tuesday 1st April
- Keydocs action needed by Jens J; Rob H/Security T; Alessandra F; Wahid B; David C and Matt D.
- We need to reassign Mark M's documents on Core Grid Services
Tuesday 18th March
- Keydocs action needed by: Mark M; Jens J; Rob H/Security T; Alessandra F; Wahid B; David C and Matt D.
|
Interoperation - EGI ops agendas
|
Tuesday 29th April
- Just a note that the next meeting is Monday 5th May
|
On-duty - Dashboard ROD rota
|
Tuesday 29th April
- Ongoing problems with the dashboard. Issue escalated to EGI.
- There was an update of the Dashboard last Thursday which solved a few issues. It no longer shows warning alarms, which is good. There were some other improvements as well.
- EMI-2 deadline is approaching. The following need attention and are about to be escalated:
  - RHUL (cream2.ppgrid1.rhul.ac.uk). Last update in ticket: 16/4
  - Sussex (grid-cream-01.hpc.susx.ac.uk, grid-bdii.hpc.susx.ac.uk; plus SHA-2 compliance for grid-cream-01.hpc.susx.ac.uk). Last update in ticket: 16/4
  - Bristol (lcgce03.phy.bris.ac.uk, lcgce04.phy.bris.ac.uk, lcgbdii.phy.bris.ac.uk). Last update in ticket: 23/4
  - ECDF (info2.glite.ecdf.ed.ac.uk, ce7.glite.ecdf.ed.ac.uk). Last update in ticket: 28/4
  - Durham (ce1.dur.scotgrid.ac.uk, se01.dur.scotgrid.ac.uk). Last update in ticket: 23/4
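The "last update" dates listed for the EMI-2 tickets can be checked against the seven-day update window requested in the ticket roundup. The following is only a sketch: the dates are copied from the list above with the year 2014 assumed, the reference date is this report's date (29th April), and the names `LAST_UPDATE` and `stale_sites` are hypothetical helpers, not part of any dashboard or GGUS tooling.

```python
from datetime import date

# Last ticket updates as listed above (day/month; year 2014 is assumed).
LAST_UPDATE = {
    "RHUL": date(2014, 4, 16),
    "Sussex": date(2014, 4, 16),
    "Bristol": date(2014, 4, 23),
    "ECDF": date(2014, 4, 28),
    "Durham": date(2014, 4, 23),
}


def stale_sites(last_update, today, max_age_days=7):
    """Return sites whose ticket was last updated more than max_age_days ago."""
    return sorted(
        site
        for site, updated in last_update.items()
        if (today - updated).days > max_age_days
    )


# Evaluated on the date of this report.
print(stale_sites(LAST_UPDATE, today=date(2014, 4, 29)))
# prints ['RHUL', 'Sussex']
```

On these assumptions only RHUL and Sussex fall outside the seven-day window; Bristol and Durham were last updated six days before the report.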
Tuesday 22nd April
- There were ongoing problems with the dashboard last week. Several bugs, possibly including one related to the email function, have been fixed.
|
Rollout Status WLCG Baseline
|
Tuesday 18th March
Tuesday 11th February
- 31st May has been set as the deadline for EMI-2 decommissioning. There may be an issue for dCache (related to 3rd party/enstore component).
References
|
Security - Incident Procedure Policies Rota
|
Tuesday 29th April
- The changes to the regional dashboard make the on-duty task harder. Need to rely on Pakiti again.
Tuesday 15th April
- Update on the OpenSSL status.
- The discussion list members have been updated. Anyone missing?
|
|
Services - PerfSonar dashboard | GridPP VOMS
|
Tuesday 29th April
- It was mentioned several weeks ago that the perfsonar meshes were being sorted by host name and that sorting by site name would be available soon. This is now the case. You can see the familiar GridPP site sorting here and the large WLCG mesh here. Note the square of GridPP sites towards the bottom right. Red squares represent throughput of less than 500 Mb/s.
Tuesday 15th April
- New LiveCD and LiveUSB images are now available containing the latest openssl packages (see email of 11th April).
Tuesday 8th April
- Some discrepancies found in VOMS ports and listings between VOMSsnooper and the dashboard for ops (15009 vs 15002).
- Also noted WLCG VOMS changes. New VOMS servers are being introduced as notified in this broadcast.
|
Tickets
|
Monday 28th April 2014, 16.30 BST
I'm afraid the ticket roundup is incredibly light and not in the usual (or any) format.
EMI upgrade tickets:
ECDF, Bristol, RHUL, Durham, EFDA-JET, Glasgow, Sussex, UCL and RALPP all have open EMI upgrade tickets. Can everyone with an open ticket please update it this week (preferably by the first) if they haven't done so in the last 7 days (or if you have but have made progress since then). It's a lot easier for the Person on Duty to extend tickets when there are site updates to validate their actions.
(RALPP have submitted https://ggus.eu/index.php?mode=ticket_info&ticket_id=104839 in response to an argus problem they were seeing post upgrade).
UCL have another Nagios error ticket: https://ggus.eu/index.php?mode=ticket_info&ticket_id=104824
Interesting One:
https://ggus.eu/index.php?mode=ticket_info&ticket_id=104937
Manchester received a ticket from Steve Traylen regarding a lot of connections to the CVMFS stratum 1. Andrew confirms these are VAC machines (unless I've misread something). It looks like the local squid cache was being ignored; Andrew is on the case.
Afraid that's it from me. Next week's will be better (because it's the first Monday of the month... that came around quickly!).
|
Tools - MyEGI Nagios
|
Monday 17th March
Tuesday 26th November
- Regional Nagios updated to release 22. It is a glite to UMD update and it required a fresh installation.
- There have been some internal changes in SAM-Nagios. Test probes are now the responsibility of product team. Some test names have been changed as a result of this reorganization. For example the org.sam.CREAMCE-DirectJobSubmit test has become emi.cream.CREAMCE-DirectJobSubmit. This does not affect the operational activities.
- Could all site admins please look at the services associated with their site and mail Kashif if anything odd is noticed. Site admins can reschedule tests for their sites, and it would be helpful if most functionalities are tested.
- Also, look at myegi which can be useful with links to the Dashboard, GSTAT, Accounting Portal and GGUS.
|
VOs - GridPP VOMS VO IDs Approved VO table
|
Tuesday 15th April
Monday 17 February 2014
- Proxy renewal
- All RAL WMSs now renew proxies with 1024 bits. This looks like the end of this (at last).
Tuesday 11 February 2014
- Proxy renewal
- lcgwms06 at RAL has been upgraded and works
- Both Imperial's WMSs work
- Glasgow's will still need to be upgraded (unless they have been since Friday).
|
Site Updates
|
Tuesday 8th April
- Steve noted that Liverpool are having a problem with the CVMFS clients on their worker nodes. "...in short, VO/CVMFS admin for na62 and mice are publishing stale .cvmfswhitelist and repos cannot be mounted on new systems. I expect this to spread to other systems and VOs as local cache dates expire."
|
|