Operations Bulletin 060114

From GridPP Wiki

Bulletin archive


Week commencing 30th December 2013
Task Areas
General updates

  • Next HEPSYSMAN meeting is on Monday 11th in Birmingham (details).

Tuesday 17th December

  • The December GDB agenda is here. Official notes are not yet available but will be placed in a summary on this page.
  • Details of, and talks from, the DPM workshop that took place in Edinburgh last Friday (13th) can be found here.

Tuesday 10th December

  • There is a pre-GDB today on Identity Federation in WLCG. It will discuss existing federation work around the community and set a WLCG direction. Join by Vidyo if you wish to contribute!
  • There is a GDB tomorrow. The agenda will cover security; provisioning of EGI core services; SHA-2 readiness; ops coordination updates; an update on networking; and a report from HEPiX.
  • A meeting of the middleware readiness working group will take place on Thursday afternoon.
  • Minutes from the GridPP technical meeting on Friday are available.
  • The draft Tier-2 availability/reliability report was circulated last week. Corrections due by 15th December. Also please check the VO reports and the EGI/NGI report!
  • Note: LAL reports that VOs running SAM tests under a regular account are hitting fair-share limits, which show up in the results and subsequently affect the A/R figures.
  • There are plans for a January HEPSYSMAN at Birmingham.
  • The Sussex GGUS access issue was resolved. For future reference, GGUS support access can be applied for via this page.


WLCG Operations Coordination - Agendas

Tuesday 17th December

  • The next WLCG ops coordination meeting is on Thursday 19th (agenda).


Tuesday 10th December

  • Confirmation of the multi-core task force with this mandate. Some concerns about overlaps with the machine/job features TF.
  • Discussion of experiment Christmas plans
  • Update of the baseline versions (https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions). The BDII update is important for SAM BDII nodes at CERN.
  • The issue of Tier-1 WNs on the OPN is now being tracked here.
  • ALICE - MC will continue over break. Best efforts approach appreciated.
  • ATLAS - plans for ramp up of MC production. Repro and analysis ramp up also expected in coming weeks.
  • CMS - Run2 MC samples prep starting. "Appreciate all support from the sites we can get, but don’t expect normal levels of support, especially for T2 sites"
  • LHCb: Distributed grid resources will be used mainly for Monte Carlo productions. Surveillance by the operations team will be on a best-effort basis. Also note a new CVMFS dashboard for LHCb.
  • Christmas plans summary: "All experiments will run activities over christmas at non negligible scale. They do not require special effort from sites or WLCG in general, while best effort support is highly appreciated"
  • WMS decommissioning: WMS usage by CMS appears to be decreasing, but it is variable.
  • glexec: 31 tickets remain open. Status tracked here.
  • FTS3: testing ongoing
  • Tracking tools: An engineer will be on-call for GGUS over the vacation period.
  • perfSONAR: Code maintenance is an issue given the BNL funding cuts. Looking at OSG and ESnet options. 3.3.2 out soon. See Status & Plans update. Sites are asked to make the perfSONAR main page (https://<hostname>/toolkit) accessible for the central operations activity. Plans are for OSG to host the perfSONAR-PS central service; the BNL dashboard is not all correct.

  • IPv6: request from CMS to have IPv6 supported on SLC5 at CERN. Alistair D is taking on the ATLAS role for IPv6 testing.
  • Middleware readiness: Meeting planned for 12th December.
  • Machine/job features: Discussion of the current implementation versus a proposed route for minimizing draining waste (MDW) CPU time for multi-core pilots.
  • SHA-2: some updates at sites are still ongoing (>10 sites). "by mid January the WLCG infrastructure is expected to be essentially ready". OSG plans to move in mid-January.
  • VOMRS: VOMS-Admin still in testing.
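
The perfSONAR item above asks sites to expose the toolkit main page at https://<hostname>/toolkit. As a minimal sketch of how the URLs for a set of hosts could be built before checking them in a browser, the Python helper below fills in that pattern. The host names are hypothetical placeholders, and the helper name toolkit_url is an assumption made for illustration; it is not part of perfSONAR.

```python
def toolkit_url(hostname: str) -> str:
    """Return the perfSONAR toolkit main-page URL for a bare hostname."""
    host = hostname.strip()
    # Reject empty input or anything that already looks like a URL or path.
    if not host or "/" in host:
        raise ValueError(f"not a bare hostname: {hostname!r}")
    return f"https://{host}/toolkit"


# Hypothetical hosts, for illustration only.
hosts = ["ps-latency.example.ac.uk", "ps-bandwidth.example.ac.uk"]
urls = [toolkit_url(h) for h in hosts]
for url in urls:
    print(url)
```

The sketch only constructs the addresses; whether each page is actually reachable from outside the site still has to be checked separately.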

Tuesday 3rd December

Tier-1 - Status Page

Tuesday 17th December

  • Rolling updates to Worker nodes (applying kernel/errata updates, updating Condor version and slightly reducing memory overcommit) ongoing. Other updates to Grid Services being applied.
  • Checking systems ahead of the Christmas break. During the holiday we will have our usual out-of-hours cover, supplemented by a brief daily check of systems.

Storage & Data Management - Agendas/Minutes

Monday 9th December

  • Spacetokens for non-LHC VOs - recommendations.

Tuesday 8th October

  • The DPM workshop agenda and registration page will appear here.

Monday 30th September

  • A DPM workshop is being organised in Edinburgh for 13th December. GridPP PMB anticipated covering travel for around 10 UK sysadmins for this event. Interest should be indicated during the storage group meeting.



Accounting - UK Grid Metrics HEPSPEC06 Atlas Dashboard HS06

Tuesday 26th November

Tuesday 5th November

  • A reminder to keep an eye on the SL HS06 page for odd ratios. Steve takes HS06 CPU numbers directly from ATLAS and the page does get stuck every now and then.
  • The metrics page has been updated.

Tuesday 13th August

Documentation - KeyDocs

See the worst KeyDocs list for documents needing review now and the names of the responsible people.

Tuesday 17th December

  • A number of documents have gone into the warning state. Could those with responsibilities here please review their documents; the server will be emailing you with a reminder.


Monday 11 November

  • The plan for the adoption of backup servers continues to evolve. Please see the latest version here. The new version contains details of tests and concluding operations for site and VO admins.
  • The approved VOs page continues to be updated with the newest data from the operations portal.

Note: T2K now requires liblockfile-devel.

Tuesday 5th November

  • Document states will be reviewed at the core ops meeting this coming Thursday.

Tuesday 1st October

  • The approved VOs page has been updated with the newest data from the operations portal. Note that the VOMS records for LondonGrid now contain some alternative VOMS servers. The migration plan for use of these backup servers is now documented here.

Interoperation - EGI ops agendas

Tuesday 17th December

  • The next meeting will be on Thursday combined with the EGI OMB.

Tuesday 3rd December

  • Additional notes:
    • The 2.6.16 version of dCache mentioned has a serious bug in the migration module; 2.6.17 has this fixed, so it should be used in preference. The possibility of skipping 2.6.16 in the overall release of EMI-3 is being discussed.
    • Note that the CREAM updates mentioned in this meeting contain security updates and so are recommended.
    • Looking for a CREAM/LSF plugin staged rollout, but don't believe there are any such sites in the UK.
    • SHA-2: 17 sites remain in EGI that are publishing SHA-2 and alarming; I don't think that any such sites in the UK (there are just a couple) are unaccounted for, i.e. not previously documented.
    • It was asked when CAs would start issuing SHA-2 certs only (the UK noted that it is planning to from January).
  • Next meeting: (last for 2013) 16th December
gLite support calendar.


Monitoring - Links MyWLCG

Tuesday 10th December

  • Feedback transmitted and discussed by consolidation group; next meeting is now in January.

Tuesday 26th November

  • As noted by Alessandra, if possible we'd like site feedback on the consolidated monitoring prototype before the next meeting a week on Friday, to report back to the group (with thanks to everyone who has already contributed).
  • Some notes to form a wiki page on Graphite can be found here: https://www.gridpp.ac.uk/wiki/MonitoringTools. These are still under development; if there are areas that people would find useful to have expanded, please let David know.
  • The Glasgow dashboard is now packaged and can be downloaded here.

On-duty - Dashboard ROD rota

Tuesday 17th December

  • Quiet week with no UK wide problems.
  • A few sites (EFDA, Sussex, Brunel) have tickets which haven't made any visible progress, partly because of waiting for fixes/help. The other tickets are hopefully transient problems that the sites will fix next week.

Monday 9th December

  • UCL tickets closed.


Rollout Status WLCG Baseline

Tuesday 29th October

  • Yesterday the first staged rollout request in months (for the CREAMCE) came through. I've updated the Stage of the Nation page.

Tuesday 8th October

  • There were updates to EMI2 and 3 yesterday, but no new request for Staged Rollout. There is a problem with dcap-libs: GGUS 97805.


Security - Incident Procedure Policies Rota

Tuesday 19th November

  • There was a team meeting last Friday 15th November. Next meeting on 29th.
  • Just a couple of site issues showing up in Pakiti.
  • Looking at ARGUS server for UK NGI.

Tuesday 29th October

  • There was a team meeting on Friday 25th.
  • A couple of critical warnings are appearing in Pakiti and being followed up.


Services - PerfSonar dashboard | GridPP VOMS

Tuesday 26th November

  • The main perfSONAR issues this week affect Manchester and Sussex.

Tuesday 19th November

  • There is a new dashboard. Feedback is welcome.
  • Manchester, Durham, Glasgow and Sussex show problems across the board.

Tuesday 1st October

  • perfSONAR latency hosts configured to use the WLCG meshes should now have a traceroute measurement archive (MA) accessible from the GUI under 'Service Graphs' --> 'Traceroute'. Here is an example.

Tuesday 17th September

  • Upgrading/re-installing hosts to v3.3.1/mesh is only making slow progress.
  • There is a new view of the status between sites.
  • An outage at Manchester due to central switch maintenance means that VOMS is not going to be contactable for a period this morning. It is clear that we need the backup VOMS instances fully available to VOs - please can someone take a lead?

Tickets

http://tinyurl.com/cblj3ab

Merry Christmas and a Happy New Year!


Tools - MyEGI Nagios

Tuesday 26th November

  • Regional Nagios updated to release 22. It is a gLite to UMD update and it required a fresh installation.
  • There have been some internal changes in SAM-Nagios. Test probes are now the responsibility of the product teams. Some test names have been changed as a result of this reorganization. For example, the org.sam.CREAMCE-DirectJobSubmit test has become emi.cream.CREAMCE-DirectJobSubmit. This does not affect operational activities.
  • Could all site admins please look at the services associated with their site and mail Kashif if anything odd is noticed. Site admins can reschedule tests for their sites and it would be helpful if most functionalities are tested.
  • Also, look at MyEGI, which can be useful, with links to the Dashboard, GSTAT, Accounting Portal and GGUS.
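
For anyone with local scripts or notes that still refer to the old SAM-Nagios test names, the renaming described above can be handled with a simple lookup table. The Python sketch below is illustrative only: it contains the single mapping given in this bulletin, the helper name current_test_name is an assumption, and the second test name in the example is a hypothetical placeholder. Any further renamed tests would need to be confirmed before being added.

```python
# The only old-to-new mapping stated in this bulletin; other entries
# are unknown here and would need to be confirmed before being added.
RENAMED_TESTS = {
    "org.sam.CREAMCE-DirectJobSubmit": "emi.cream.CREAMCE-DirectJobSubmit",
}


def current_test_name(name: str) -> str:
    """Return the new name of a renamed test, or the name unchanged."""
    return RENAMED_TESTS.get(name, name)


# The second entry is a hypothetical name used to show the fall-through case.
old_names = ["org.sam.CREAMCE-DirectJobSubmit", "org.example.Unchanged-Test"]
new_names = [current_test_name(n) for n in old_names]
print(new_names)
```

Names that are not in the table pass through unchanged, so the helper is safe to apply to a mixed list of old and new test names.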

VOs - GridPP VOMS VO IDs Approved VO table

Tuesday 9 December 2013

  • Backup VOMS server
    • VO managers still need to check sites. The Scotgrid, northgrid, southgrid, londongrid and gridpp VOs were going first, but have not yet updated their status.

Monday 2nd December 2013


Monday 25th November 2013

  • CVMFS progress - but not quite there yet.
  • 6 VOs (cern@school, gridpp, na62, pheno, sno+, t2k.org) have updated their VOID card entries and updated the wiki.
  • Storage
    • Gfal2 - GGUS 99043, 99044, 99055, 99067 - not performant, but very interesting functionality
    • WebDAV now enabled on LFC@RAL and ports free from firewall - needs testing

Tuesday 19 November 2013

Site Updates

Actions


Meeting Summaries
Project Management Board - Members Minutes Quarterly Reports

Empty

GridPP ops meeting - Agendas Actions Core Tasks

Empty


RAL Tier-1 Experiment Liaison Meeting (Wednesday 13:30) Agenda EVO meeting

Wednesday 11th December

  • Operations report
  • There was a successful UPS/Generator load test this morning (Wed 11th Dec.)
  • A number of members of the Castor team attended the Castor face-to-face meeting earlier this week.

WLCG Grid Deployment Board - Agendas MB agendas

Empty



NGI UK - Homepage CA

Empty

Events

Empty

UK ATLAS - Shifter view News & Links

Empty

UK CMS

Empty

UK LHCb

Empty

UK OTHER
  • N/A

To note

  • N/A