GDB 9th March 2011

March GDB

Wednesday 09 March 2011 at CC-IN2P3, Lyon

http://indico.cern.ch/conferenceDisplay.py?confId=126130

Introduction (John Gordon)

July meeting cancelled due to WLCG workshop.

Forthcoming events:

  • HEPIX Darmstadt 2-6 May.
  • WLCG meeting DESY 11-13th July.
  • EGI Technical Forum 19-23rd Sept.

News:

  • R-GMA registry closed on 1 March.
  • Publishing via gLite-MON will no longer work.
  • CPU installed capacity: 13 sites are not publishing it.
  • 59 sites are not publishing shares for the LHC VOs.

CernVM-FS:

  • CERN IT support is being finalised.
  • Security audit ready to report (positively).
  • Replication/mirroring process is working - the mirror at RAL updates every hour. Work is ongoing on updates triggered by CERN whenever changes occur - used on RAL WNs. BNL is progressing. (A minimal revision-check sketch follows.)
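
As a rough illustration of the replication check implied above, the sketch below fetches the .cvmfspublished manifest of a repository from the source and from a mirror and compares their revision numbers. The repository name, the server URLs and the assumption that the manifest's 'S' line carries the catalogue revision are illustrative only, not taken from the talk.

    import urllib.request

    REPO = "example.cern.ch"  # hypothetical repository name
    SERVERS = {
        "source": "http://cvmfs-stratum-zero.example.org/cvmfs",   # placeholder URL
        "mirror": "http://cvmfs-stratum-one.example.ac.uk/cvmfs",  # placeholder URL
    }

    def published_revision(server_url, repo):
        """Fetch .cvmfspublished and return the revision number (assumed 'S' line)."""
        url = "%s/%s/.cvmfspublished" % (server_url, repo)
        raw = urllib.request.urlopen(url, timeout=10).read()
        header = raw.split(b"--")[0].decode("ascii", "replace")  # text header before the signature
        for line in header.splitlines():
            if line.startswith("S"):  # assumption: 'S' prefixes the catalogue revision
                return int(line[1:])
        raise ValueError("no revision line found in manifest")

    revisions = {name: published_revision(url, REPO) for name, url in SERVERS.items()}
    print(revisions)
    if revisions["mirror"] < revisions["source"]:
        print("mirror has not yet picked up the latest changes")
    else:
        print("mirror is up to date")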


ACE - availabilities based on CREAM (Wojciech Lapka)

  • CREAM nagios probe; everything OK for Tier-1 sites.
  • Differences for ~30 sites - including Imperial and Brunel.
  • Request to sites: please check that your services are declared in GOCDB (a query sketch follows this list).
  • New FCR mechanism is being tested by CMS.
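
For sites wanting to script the GOCDB check requested above, something like the sketch below could work. The PI endpoint URL and the XML element names are assumptions made here for illustration; the GOCDB documentation is the authoritative reference.

    import urllib.request
    import xml.etree.ElementTree as ET

    SITE = "UKI-LT2-IC-HEP"  # example GOCDB site name (placeholder)
    # Assumed public PI endpoint and method name; check the GOCDB documentation.
    URL = ("https://goc.egi.eu/gocdbpi/public/"
           "?method=get_service_endpoint&sitename=" + SITE)

    with urllib.request.urlopen(URL, timeout=30) as resp:
        root = ET.fromstring(resp.read())

    # Element names below are assumptions about the returned XML layout.
    for endpoint in root.findall("SERVICE_ENDPOINT"):
        host = endpoint.findtext("HOSTNAME")
        service = endpoint.findtext("SERVICE_TYPE")
        monitored = endpoint.findtext("NODE_MONITORED")
        print("%-20s %-40s monitored=%s" % (service, host, monitored))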

CREAM status (John Gordon)

  • A few sites in the UK are not yet supporting the CREAM CE.
  • Correct availability calculation should be available by the end of March.
  • WLCG pushing to decommission LCG-CE.

glexec and MUPJ (multi-user pilot jobs) update (Maarten Litmaath)

ATLAS

  • glexec tests now OK at various UK Tier-2s (and BNL)
  • target proxies not accepted by Panda

CMS

  • USCMS use glexec in production at a few OSG sites (setuid mode)
  • other T1/T2 sites being tested
  • issue with the location of glexec: $GLEXEC_LOCATION/sbin needs to be defined explicitly (bug opened) - see the invocation sketch at the end of this section
  • the issue with the tarball WN is still not clear

LHCb

  • running Nagios glexec tests as well

  • CREAM not yet tested

ALICE

  • integrating glexec into AliEn

The next big push is the roll-out at Tier-2s.
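
As background to the deployment items above, the sketch below shows the usual shape of a pilot framework handing a payload to glexec: the pilot points GLEXEC_CLIENT_CERT / GLEXEC_SOURCE_PROXY at the payload owner's proxy and runs the payload through $GLEXEC_LOCATION/sbin/glexec. The default location, proxy path and payload command are placeholders, not details from the talk.

    import os
    import subprocess

    def run_payload_via_glexec(owner_proxy, payload_cmd):
        """Hand a payload command to glexec so it runs under the proxy owner's account."""
        # $GLEXEC_LOCATION/sbin/glexec, as in the CMS item above; the
        # default location used here is only a placeholder.
        glexec_location = os.environ.get("GLEXEC_LOCATION", "/opt/glite")
        glexec = os.path.join(glexec_location, "sbin", "glexec")

        env = os.environ.copy()
        # Credentials of the payload owner; glexec uses them to decide which
        # local account to switch to before executing the command.
        env["GLEXEC_CLIENT_CERT"] = owner_proxy
        env["GLEXEC_SOURCE_PROXY"] = owner_proxy

        return subprocess.call([glexec] + payload_cmd, env=env)

    if __name__ == "__main__":
        # Placeholder proxy path and payload command.
        rc = run_payload_via_glexec("/tmp/payload_owner_proxy.pem",
                                    ["/bin/sh", "-c", "id && echo payload ran"])
        print("glexec exit code:", rc)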

LHC Open Network Environment (LHCONE) (John Shade)

The problem:

  • LHC data models are evolving - more dynamic, less pre-placement of data
  • network usage will increase and become more dynamic
  • desire not to swamp R&E networks with LHC traffic (although LHCONE still allows use of the general R&E networks if preferred)

The constraints:

  • don't break LHCOPN
  • distributed management
  • designed for agility and expandability
  • must appeal to funding agencies

3 levels of Tier-2

proposed solution:

  • exchange points: "Exchange points will be built in carrier-neutral facilities so that any connector can connect with their own fiber or using circuits provided by any telecom provider."
  • with LHCONE, T2s and T3s will be able to obtain data from any T1 or T2

"LHCONE provides connectivity directly to T1s, T2s, and T3s, and to various aggregation networks, such as the European NRENs, GEÃÅANT, and North American RONs, Internet2, ESnet, CANARIE, etc."

Next steps

  • looking for feedback
  • build a prototype at CERN
  • refine, esp. monitoring

File-systems Efficiency (Xavier Canehan)

  • CC-IN2P3 presented some results on the file-system testing they have done
  • seeing problems with latency on some jobs (a minimal timing sketch follows)
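
The kind of latency referred to above can be probed with a trivial timing loop over small file operations in a software area; the sketch below (the path is a placeholder) is a crude micro-test of that pattern, not the benchmark CC-IN2P3 actually ran.

    import os
    import time

    SOFTWARE_AREA = "/path/to/experiment/software"  # placeholder path

    def time_metadata_ops(top, limit=1000):
        """Time a burst of stat() calls, the small-file pattern that exposes latency."""
        start = time.time()
        done = 0
        for dirpath, dirnames, filenames in os.walk(top):
            for name in filenames:
                os.stat(os.path.join(dirpath, name))
                done += 1
                if done >= limit:
                    return done, time.time() - start
        return done, time.time() - start

    ops, elapsed = time_metadata_ops(SOFTWARE_AREA)
    if ops:
        print("%d stat() calls in %.2fs (%.2f ms per call)"
              % (ops, elapsed, 1000.0 * elapsed / ops))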

Lunch

LHCb (Stefan Roiser)

  • CVMFS in production use at some Tier-1s: NIKHEF and PIC, with testing at RAL & CNAF
  • Now running production on T0 and T1s as well as T2s (up to 30k jobs running)
  • Data consistency checks being done - new tools are being developed to automate this (see the sketch after this list)
  • Setup of the runtime environment sometimes times out (especially at sites with AFS software areas) - caused by a high number of file operations; fixed at IN2P3 but ongoing at CERN
  • 'sawtooth pattern' in site usage, i.e. pulsing of usage
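
The automated consistency tools mentioned above usually boil down to comparing a catalogue dump against a storage-element namespace dump; the sketch below shows that core set comparison with hypothetical input files, not LHCb's actual tooling.

    def load_paths(dump_file):
        """Read one path per line from a dump file into a set."""
        with open(dump_file) as fh:
            return {line.strip() for line in fh if line.strip()}

    # Placeholder dump files: one from the file catalogue, one from the
    # storage element namespace. Real tools also compare sizes/checksums.
    catalogue = load_paths("lfc_dump.txt")
    storage = load_paths("se_namespace.txt")

    dark_data = storage - catalogue    # on storage but unknown to the catalogue
    lost_files = catalogue - storage   # registered but missing on storage

    print("dark data candidates: %d" % len(dark_data))
    print("missing replicas:     %d" % len(lost_files))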

Tier-2s

  • how to inform the Tier-2s of important upgrades?
  • e.g. the CREAM bug

In conclusion: no major problems and good response from sites

ATLAS (Stephan Jezequel)

Operations over last quarter:

  • ATLAS is 'gradually breaking cloud model'
  • Consolidation of ATLAS and sites in preparation for data taking
  • 34% of analysis being done by the US
  • DATADISK and MCDISK merged with DDM (FTS transfers/central deletion)
  • Sites have to clean up dark data after migration

Space token shares for 2011: See https://twiki.cern.ch/twiki/bin/view/Atlas/StorageSetUp

ATLAS wants to break the cloud model to get more flexibility. Obvious constraint: this should match the network connectivity between sites.

Actions:

  • Prepare LFC consolidation at CERN
  • Some T2s running G4 simulation for different T1s
  • Direct transfers between some T2s and all T1s

In the future:

  • Promote 'good' T2s to host primary replicas (only in T1s today)

Cross-cloud production

Reason:

  • allocate more CPU resources for urgent simulation

Some big T2 sites are already associated with many Tier-1s

  • T2 connectivity to all T1s
  • Select good T2 sites which will always transfer from/to all T1s: called T2Ds
  • A long list of sites 'in probation'

ATLAS wants to reduce raw file sizes by zipping them (factor 2) at Tier-0 (see the sketch below).
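
The quoted 'factor 2' is simply the compression ratio measured on the file; a trivial way to check such a ratio is sketched below. The file name is a placeholder and gzip is used only as a stand-in, since these notes do not say which compressor Tier-0 would adopt.

    import gzip
    import os
    import shutil

    RAW = "raw_data_file.bin"  # placeholder input file
    COMPRESSED = RAW + ".gz"

    # Compress the file, then report original size / compressed size.
    with open(RAW, "rb") as src, gzip.open(COMPRESSED, "wb") as dst:
        shutil.copyfileobj(src, dst)

    factor = os.path.getsize(RAW) / float(os.path.getsize(COMPRESSED))
    print("compression factor: %.2f" % factor)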

CMS (Daniele Bonacorsi)

Not many major issues in CMS Computing Operations since last time we met

  • analysis, analysis, analysis
  • preparing for 2011 data-taking

Site readiness: 40/50 Tier-2s consistently 'ready'

  • CMS is monitoring the transition to CREAM - Brunel and Imperial are failing the test

glexec on the WN and ARGUS: "Initial tests indicate we have a long way to go for full deployment"

Conclusion: 'CMS Computing Operations OK since last time we met'