GDB 14th October 2009

From GridPP Wiki

Introduction

GDB meetings in 2010 will be on the second Wednesday of each month. Pre-GDB: the only topic so far is virtualisation (likely to be Nov or Jan). Are there any other issues?

LHC still on track for mid-Nov.

EGEE SLA compliance - sites with <50% availability for 3 consecutive months will be suspended.
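A minimal sketch of the suspension rule as stated, assuming availability is recorded as one fraction (0.0 to 1.0) per month in chronological order, and that "below 50%" is a strict comparison. The function name and data layout are hypothetical and are not the EGEE tooling.

```python
def should_suspend(monthly_availability, threshold=0.50, run_length=3):
    """Hypothetical check: True if availability stayed below `threshold`
    for `run_length` consecutive months anywhere in the series."""
    consecutive = 0
    for availability in monthly_availability:
        if availability < threshold:
            consecutive += 1
            if consecutive >= run_length:
                return True
        else:
            consecutive = 0  # a compliant month resets the run
    return False


print(should_suspend([0.90, 0.40, 0.45, 0.30]))  # True: three bad months in a row
print(should_suspend([0.40, 0.90, 0.40, 0.40]))  # False: the run is broken
```

A single good month resets the count, so only an unbroken run of three months below the threshold triggers suspension.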

Security patching - 2 sites suspended by EGEE.

Authorisation Services

Current plan is to deploy SCAS and glexec. Are we good to go? No: a security patch is still required in glexec_WN on SL5.

Argus: certification still in progress, including detailed performance testing.

SCAS - tested and released at some sites but not tested in as much detail as Argus.

Is there any reason not to deploy SCAS? Answer: no technical reason. It is left to sites to decide which one to deploy. The plan is to ask a few sites in each ROC to deploy Argus. Argus has more functionality; SCAS builds on lcmaps.

Argus - multi-user pilot jobs, command line use, global banning lists (if activated), Nagios plug-ins etc.

"Argus was always considered as the long term solution for authorization in gLite". It is part of the EMI proposal. Argus developers are now looking for sites willing to deploy Argus for the gLExec-on-WN use case and to use the OSCT global banning lists.

Middleware news

SL5 migration: the WLCG MB endorses the move to SL5. A meta-rpm is available, though some CentOS users have had problems installing it. There have been fewer releases than planned (EGEE meeting).

gLite 3.2 upcoming releases: glexec_wn is basically ready, apart from the RPMs needing updating for the security issue. CREAM: some problems were found; a patch is ready for certification. Also upcoming: DPM_disk, Argus, MPI_utils.

Tier-1 dCache stability for data taking: John went through each site, summarising its status. There is a draft of baseline services: http://indico.cern.ch/materialDisplay.py?contribId=8&sessionId=2&materialId=slides&confId=45480

Installed Capacity and BDII

Steve went through the current state of information publishing at sites, pointing out some of the issues. Sites should check what they are publishing and correct any errors.

ROSCOE

Robust Scientific Communities for EGI. Several communities, some new. Start date June 2010 (around EGI start date), 3 year project.

Experiments

LHCb Simulation currently running. Data access issues and instabilities of services are still the main problem.

An SLS-based alarming system has been in place for about a month for LHCb storage operations. An email is sent to the site if all three conditions hold: (1) free space < 4 TB, (2) free/total < 15%, and (3) total < pledged.
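The alarm condition can be sketched as a single predicate. This is a hypothetical illustration, not the actual SLS implementation; it assumes all three quantities are given in TB and that the site's total capacity is non-zero.

```python
def storage_alarm(free_tb: float, total_tb: float, pledged_tb: float) -> bool:
    """Hypothetical check mirroring the three LHCb alarm conditions.

    Returns True when an email would be sent to the site.
    """
    if total_tb <= 0:
        raise ValueError("total capacity must be positive")
    low_free = free_tb < 4.0                  # (1) less than 4 TB free
    low_fraction = free_tb / total_tb < 0.15  # (2) less than 15% free
    under_pledge = total_tb < pledged_tb      # (3) total below pledge
    return low_free and low_fraction and under_pledge


# 3 TB free out of 30 TB, with 50 TB pledged, triggers the alarm.
print(storage_alarm(3.0, 30.0, 50.0))  # True
```

Because all three conditions must hold, a site that already provides its full pledge is not alerted even when its space is nearly full.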

Shared area: still a plague at many sites. It may be the most important service at a site, and it has to scale with the number of job slots. Tier-2s are now most of the problem!

CMS projects: Data Ops, Facilities Ops and a new Analysis Ops project. Meetings: Data Ops, Facilities Ops and WLCG Ops.

Availability is tested by CMS-specific SAM tests (by workflow) and other metrics such as the job robot; together these determine the site readiness status.

Note list of site responsibilities and CMS expectations (slide 11).

ATLAS: the last slide of Graeme's talk (http://indico.cern.ch/materialDisplay.py?contribId=3&sessionId=3&materialId=slides&confId=45480) summarises ATLAS priorities for sites.