GDB 9th March 2011
March GDB
Wednesday 09 March 2011 at CC-IN2P3, Lyon
http://indico.cern.ch/conferenceDisplay.py?confId=126130
Contents
- 1 Introduction (John Gordon)
- 2 ACE - availabilities based on CREAM (Wojciech Lapka)
- 3 CREAM status (John Gordon)
- 4 glexec and MUPJ update (Maarten Litmaath)
- 5 LHC Open Network Environment (LHCONE) (John Shade)
- 6 File-systems Efficiency (Xavier Canehan)
- 7 LHCb (Stefan Roiser)
- 8 ATLAS (Stephan Jezequel)
- 9 CMS (Daniele Bonacorsi)
Introduction (John Gordon)
July meeting cancelled due to WLCG workshop.
forthcoming events:
- HEPIX Darmstadt 2-6 May.
- WLCG meeting DESY 11-13th July.
- EGI Technical Forum 19-23rd Sept.
News:
- R-GMA registry closed on 1 March.
- Sites publishing through glite-MON will no longer work.
- CPU installed capacity - 13 sites not publishing.
- 59 sites not publishing shares for LHC VOs.
CernVM-FS (CVMFS):
- CERN IT support being finalised.
- Security audit ready to report (positively).
- Replication/mirroring process is working: the mirror at RAL updates every hour. Working on updates triggered by CERN whenever changes occur, for use with RAL WNs. BNL progressing.
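CVMFS replication is pull-based: a mirror periodically compares the upstream repository revision with its own and fetches only when something new has been published, so an hourly update run is cheap when nothing has changed. A minimal conceptual sketch of that loop (the revision/fetch functions are simplified stand-ins, not the actual CVMFS tooling):

```python
# Conceptual sketch of pull-based mirror replication, as used for CVMFS
# mirrors. The revision/fetch logic is a simplified stand-in for the
# real CVMFS snapshot tooling.

def upstream_revision(state):
    """Revision currently published at the master server."""
    return state["upstream"]

def local_revision(state):
    """Revision last replicated to this mirror."""
    return state["local"]

def replicate(state):
    """Pull new content only if the upstream revision has advanced.

    Returns True if a snapshot was taken, False if already up to date.
    """
    if upstream_revision(state) <= local_revision(state):
        return False  # nothing new; the hourly run is a no-op
    # In reality: download the changed catalogs and file chunks, then
    # atomically switch the mirror over to the new revision.
    state["local"] = state["upstream"]
    return True

state = {"upstream": 42, "local": 41}
print(replicate(state))  # mirror was behind: snapshot taken
print(replicate(state))  # already current: no-op
```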
ACE - availabilities based on CREAM (Wojciech Lapka)
- CREAM nagios probe; everything OK for Tier-1 sites.
- Differences for ~30 sites - including Imperial and Brunel.
- Request to sites: please check that your services are correctly declared in GOCDB.
- New FCR mechanism is being tested by CMS.
CREAM status (John Gordon)
- A few sites in UK not supporting CREAM CE.
- Correct availability calculation should be available at end of March.
- WLCG pushing to decommission LCG-CE.
glexec and MUPJ update (Maarten Litmaath)
- Most Tier-1s now passing glexec test.
- test URL: https://samnag023.cern.ch/nagios/cgi-bin/status.cgi?servicegroup=SERVICE_CE&style=detail
ATLAS
- glexec tests now OK at various UK Tier-2's (and BNL)
- target proxies not accepted by PanDA
CMS
- USCMS use glexec in production at a few OSG sites (setuid mode)
- other T1/T2 sites being tested
- the glexec location ($GLEXEC_LOCATION/sbin) needs to be defined explicitly (bug opened)
- issue with the tarball WN installation still not clear
LHCb running Nagios glexec tests as well
- CREAM not yet tested
ALICE
- integrating glexec into AliEn
Next big push is to roll out glexec at the Tier-2s.
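For context on the items above: a multi-user pilot job hands each payload to glexec, which maps the payload owner's proxy to a local account and runs the command as that account; this is also why the $GLEXEC_LOCATION/sbin path matters to the frameworks. A dry-run sketch of the handoff (paths and payload command are illustrative; the environment variable names are per the glexec documentation as best recalled):

```python
import os

# Sketch of how a multi-user pilot job hands a payload to glexec for
# identity switching. Paths and the payload command are illustrative;
# the environment variable names follow the glexec documentation.

def build_glexec_command(payload_proxy, payload_cmd,
                         glexec_location="/opt/glite"):
    """Return (env, argv) for running a payload under glexec.

    glexec reads the payload owner's credentials from the environment,
    maps them to a local account, and runs the command as that account.
    """
    env = dict(os.environ)
    env["GLEXEC_CLIENT_CERT"] = payload_proxy   # proxy of the payload owner
    env["GLEXEC_SOURCE_PROXY"] = payload_proxy  # proxy handed to the payload
    glexec = os.path.join(glexec_location, "sbin", "glexec")
    return env, [glexec] + payload_cmd

# Dry run: print the command a pilot would execute. We do not actually
# exec it here, since glexec only works on a configured worker node.
env, argv = build_glexec_command(
    "/tmp/x509up_payload", ["/bin/sh", "-c", "echo hello from payload"])
print(" ".join(argv))
```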
LHC Open Network Environment (LHCONE) (John Shade)
The problem:
- LHC data models are evolving - more dynamic, less pre-placement of data
- network usage will increase and be more dynamic
- desire not to swamp general-purpose R&E networks with LHC traffic (although LHCONE allows R&E networks to be used if preferred)
The constraints:
- don't break LHCOPN
- distributed management
- designed for agility and expandability
- must appeal to funding agencies
3 levels of Tier-2
proposed solution:
- exchange points: "Exchange points will be built in carrier-neutral facilities so that any connector can connect with their own fiber or using circuits provided by any telecom provider."
- with LHCONE T2 and T3 will be able to obtain data from any T1 or T2
"LHCONE provides connectivity directly to T1s, T2s, and T3s, and to various aggregation networks, such as the European NRENs, GÉANT, and North American RONs, Internet2, ESnet, CANARIE, etc."
Next steps
- looking for feedback
- build a prototype at CERN
- refine, esp. monitoring
File-systems Efficiency (Xavier Canehan)
- CC-IN2P3 presented some results on file-system testing they have done
- seeing problems with latency on some jobs
Lunch
LHCb (Stefan Roiser)
- CVMFS in production use at some Tier-1s (NIKHEF, PIC) and in testing at RAL & CNAF
- Now running production on T0,T1 as well as T2 (up to 30k jobs running)
- Data consistency checks being done - developing new tools to automate this
- Setup of runtime environment sometimes times out (esp. at sites with AFS software areas), caused by a high number of file operations; fixed at IN2P3 but ongoing at CERN
- 'sawtooth pattern' in site usage, i.e. pulsing of usage
Tier-2s
- how to inform the Tier-2s of important upgrades?
- e.g. the CREAM bug
In conclusion: no major problems and good response from sites
ATLAS (Stephan Jezequel)
Operations over last quarter:
- ATLAS is 'gradually breaking cloud model'
- Consolidation of ATLAS and sites in preparation for data taking
- 34% of analysis being done by US
- DATADISK and MCDISK merged with DDM (FTS transfers/central deletion)
- Sites have to clean up dark data after migration
Space token shares for 2011: See https://twiki.cern.ch/twiki/bin/view/Atlas/StorageSetUp
ATLAS wants to break the cloud model to gain more flexibility. Obvious constraint: this should match the network connectivity between sites.
Actions:
- Prepare LFC consolidation at CERN
- Some T2s running G4 simulation for different T1s
- Direct transfers between some T2s and all T1s
In the future:
- Promote 'good' T2s to host primary replicas (only in T1s today)
Cross-cloud production
Reason:
- allocate more CPU resources for urgent simulation
Some big T2 sites already associated with many Tier-1s
- T2 connectivity to all T1s
- Select good T2 sites which will always transfer from/to all T1s: called T2Ds
- A long list of sites 'in probation'
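The T2D selection described above, picking the T2s that reliably transfer from/to all T1s, amounts to a filter on per-link transfer quality. A hypothetical sketch (site names, efficiencies, and the 90% cut are illustrative, not ATLAS policy):

```python
# Hypothetical sketch of selecting 'T2D' sites: Tier-2s whose transfer
# efficiency to/from every Tier-1 exceeds a threshold. Site names,
# efficiencies, and the 90% cut are illustrative only.

EFFICIENCY_CUT = 0.90

# per-T2 map of Tier-1 name -> observed transfer success rate
transfer_eff = {
    "T2_Good":      {"T1_A": 0.97, "T1_B": 0.95, "T1_C": 0.92},
    "T2_Probation": {"T1_A": 0.96, "T1_B": 0.70, "T1_C": 0.91},
}

def select_t2d(eff_map, cut=EFFICIENCY_CUT):
    """Promote a T2 to T2D only if it meets the cut against ALL T1s."""
    return sorted(
        t2 for t2, per_t1 in eff_map.items()
        if all(e >= cut for e in per_t1.values())
    )

print(select_t2d(transfer_eff))  # only the site good against every T1
```

A site failing the cut against even one T1 stays "in probation", which matches the long probation list mentioned above.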
ATLAS wants to reduce raw file size by compressing (zipping) files at Tier-0, for roughly a factor-2 reduction.
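The achievable factor depends entirely on the data. A quick way to estimate it on a sample is shown below; the synthetic, highly repetitive bytes used here compress far better than real raw detector data would, so the printed ratio is only illustrative:

```python
import gzip

# Estimate the compression ratio gzip achieves on a data sample.
# The sample here is synthetic and repetitive, so it compresses far
# better than real raw detector data; run this on a real file to
# estimate the ~2x figure quoted above.

def compression_ratio(data: bytes) -> float:
    """Uncompressed size divided by gzip-compressed size."""
    compressed = gzip.compress(data)
    return len(data) / len(compressed)

sample = b"event-header " * 10000  # stand-in for a raw data sample
print(f"compression ratio: {compression_ratio(sample):.1f}x")
```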
CMS (Daniele Bonacorsi)
Not many major issues in CMS Computing Operations since last time we met
- analysis, analysis, analysis
- preparing for 2011 data-taking
Site readiness: 40/50 Tier-2s consistently 'ready'
- CMS is monitoring the transition to CREAM; Brunel and Imperial are failing the test
glexec on the WN and ARGUS: "Initial tests indicate we have a long way to go for full deployment"
Conclusion: 'CMS Computing Operations OK since last time we met'