GDB 12th May 2010

Introduction (John Gordon)

First meeting since data collection started. Ian Bird: Data is flowing and things are going smoothly. Computing is holding up at significant data rates, which is positive. A large number of people are doing analysis; some limits are starting to show, but nothing has broken yet. LHCC etc. are happy with WLCG.

WLCG operations meeting (Jamie Shiers)

Wednesday - service issues. Thursday - experiment Jamborees. Friday - Summaries, outlook etc.

Book early.

Data Management and Storage review (Andrea Sciaba)

dCache: problems with 1.9.5-18, do not upgrade. dcap++ coming soon. Building a testbed for NFS 4.1 to run ATLAS and CMS applications. Kerberos security in NFS 4.1 coming soon. Developers' meeting last week: more consistent identity system and improved release mechanism.

DPM 1.7.4-6 certified. Periodic cleanup of the request database. Fixes free space reporting in dpm-listspaces. Allows setting of the RFIO buffer size on the client side. Various bug fixes.

ATLAS would like to perform non-regional Tier-1 to Tier-2 transfers. How should the FTS channels be configured?

26th IEEE MSS conference:

"We are somewhat unique in having a proven multi-PB distributed solution. However, we are not where industry is / is going, but rather where it was going. Our requirements do not appear unique."

Evolution of Storage and Data Management (Ian Bird)

Experiments expressed concerns about performance and scalability of data access. 3-day workshop in Amsterdam to discuss these issues.

glexec update (Maarten Litmaath)

glexec works for ops at 5 sites, and 7 sites publish that they support glexec.

Virtualisation (Tony Cass)

Draft policy document for trusted virtual machines. Generation. Transmission: how to get images to and from sites, and how do you know which images have been endorsed? Possible for a site to decide which images to accept. Local repository of images which can be instantiated on WNs. Expiry and revocation. Contextualisation. Support for multiple hypervisors.

Now have a skeleton of a scheme that will enable a site to treat VM images exactly as normal worker nodes. Active involvement of VOs is now highly desirable.

Experiment session

LHCb: Some Tier-2s ('T2-LAC') will be used for analysis. Also finding problems with file access during analysis; adopting ATLAS-style staging to WN disk and HammerCloud testing. The shared area is a critical issue at Tier-2s.

CMS: Things working quite well.

ATLAS: Lots of reprocessing ongoing, with the aim of reprocessing 100% of all data of interest. CREAM: there is a fix for the lease bug in a recent Condor, but it is not yet released. Suffering from site instability; at the moment a large manpower effort is required to keep sites running well, with no sign of it decreasing. Need to stabilise current tools and only add new functions which reduce workload.

OSG

Problems with the top-level BDII affecting OSG operations. This is an example of an out-of-hours operational issue: what is the procedure, and what is the SLA?