GDB 10th June 2009

From GridPP Wiki

notes: Duncan

Agenda

Introduction (Convener: Michel Jouvin (LAL / IN2P3))

Michel summarised yesterday's storage pre-GDB meeting.

Overview Board Summary (Convener: Ian Bird (CERN))

CERN will lead the formation of an EGI Specialised Support Centre (SSC) for HEP (possibly also covering astroparticle physics), together with other partners.


STEP09 Progress Summary (Convener: Jamie Shiers (CERN))

STEP09 is the Scale Test for the Experiment Program '09. The priority is to improve site readiness now. A proper post-mortem must be done. There are some concerns regarding T1-T1 connectivity.

Site readiness: sites should produce a service incident report after every major incident.

Jamie emphasised the need to maintain service levels. He proposed a monthly face-to-face (F2F) WLCG operations meeting, fitted into the existing pre-GDB and GDB schedule.

He reminded us of the existing wikis and the daily WLCG operations meeting.

Summary: there are still too many significant service degradations, about a third of which could have been avoided. However, significant improvements in WLCG operations and service delivery have also been made. A similar step up in service level is needed if we are to offer an acceptable level of service for data taking.

ASGC gave a brief informal report on its recovery from the fire.

Middleware Update (Convener: Andreas Unterkircher (CERN))

gLite 3.1: the next release (15 June) fixes the lcg-cr -t bug but introduces a new bug.

Next releases: DPM 1.7.2, BDII 5.0, WMS 3.2 (needs a staged release), SCAS, dCache 1.9.7-1 (server).

gLite 3.2: the SL5 UI is to be released on 15 June. DPM and LFC are next.

There was a discussion on porting gLite to Debian.

CREAM CE (Nick Thackray)

See the slides for the status URL. Condor-G submission is still being worked on by US-CMS. MB milestones: one CREAM CE at each European Tier-1, plus TRIUMF and CERN; five Tier-2s supporting ALICE. By 1 August, at least two Tier-2s for each experiment: this is getting there.

OSG Security Drill (Convener: Mine Altunay (FNAL) / Ruth Pordes)

These are similar to the EGEE security drills. OSG carried out the drills and both sites responded well. OSG will look at Tier-2s in the coming months.

Chimera Migration (Michel Jouvin)

Chimera is critical for dCache performance at Tier-1s. The migration is non-trivial and irreversible, so Tier-1s are not queueing up to be the first to move. The migration must be done before data taking. A long discussion followed.

HEPiX Report (Michel Jouvin)

HEPiX is extending and is trying to go to a new place for each meeting; next spring's meeting is in Lisbon. The main focus of the Umea (Sweden) meeting was virtualisation. Other topics included site reports, Scientific Linux, data centres and storage (the File Systems Working Group, Lustre). Virtualisation will be a track at each meeting: virtualised WNs, integration with batch systems, CernVM, and a discussion about VO-maintained images. In the File Systems Working Group evaluation, Lustre outperformed the other systems by a factor of two. CERN has joined the Lustre evaluation.

Conclusion: a very useful forum, open to any site. The next meeting is in Berkeley, 26-30 October.

SL5 for Worker Nodes

As soon as the issues with SELinux are fixed and the compatibility libraries are deployed, the experiments will be able to use SL5/x86_64 resources with SL4 binaries. The experiments are also making their first native builds of software for SL5/gcc43 and are starting to validate them.

SL5 decisions to be made (Oliver Keeble (CERN))

Experiment software has dependencies. Many can be satisfied from the OS; a meta-package is proposed for the rest. Decision needed: is this useful for sites? How will it be advertised? The proposal is to use a GlueHostApplicationSoftwareRunTimeEnvironment tag. All WLCG sites are encouraged to move rapidly to SL5 after STEP09 (end of July). The experiments will be shipping their own gcc.

Publishing installed capacity (Convener: Steve Traylen (CERN))

Steve is pushing for consistency in what sites publish.

Step one: publish non-zero logical CPUs. Step two: publish HEP-SPEC06 values; this is waiting for a YAIM patch.

Consistent publishing is also required for storage.

Pilot Jobs Update (Convener: Maarten Litmaath (CERN))

ALICE has found some manpower to work on gLExec and has a prototype pilot job.

gLExec: two open problems have been identified. One is that when gLExec is called, the job runs in a reduced environment and so loses its original environment. A patch is available.

Lancaster has joined the testing of gLExec/SCAS.

User Analysis Support (Convener: Massimo Lamanna (CERN))

How best to support users? The aim is to avoid disrupting developers while also communicating well with sites. SAM is an essential tool, as are job submission tests, e.g. the CMS job robots and the ATLAS hammertest. If these systems do not converge, they should at least present similar output for comparison purposes. LHCb is also putting a Ganga-based hammertest into production.