GDB 11th April 2012


GDB summary (Jeremy's notes, based on reviewing the slides after the meeting, so they contain none of the discussion outcomes)


Proposals for the Future (Michel Jouvin) Thanks to John (5yrs). Continue with 2nd Wed in each month. Agenda in advance. Taking notes to be shared? TEGs/Working groups in pre-GDB slots possible. EGI/EMI end in next few years – model based on public clouds supported in EU.

TEG futures: Data Management & Storage (Dirk Duellmann) Process started with questionnaire. Final report April. Data placement layer diagram – placement (user – reliable push) and archive (service – dependable). Introduced idea of placement with federation layer (end user pull & opportunistic). Review SE components and implementations. Future protocol layers could include http and webdav. Only current placement option xrootd. Working groups looking at areas. GridFTP must remain in medium term. Managed transfers – only current option FTS. FTS3 plan update with http; use of replicas and stage from archive. LFC not needed in medium term. Storage quotas can be experiment handled. Experiments will split archive (tape) and cache (disk pools). SRM ubiquitous, needed in short-term but has performance concerns and functionality mismatch. Storage Operations: keep sites in protocol development/testing; check data loss expectations; policies for data handling; improve activity monitoring. POOL dev not needed in medium term. Security concerns being jointly addressed with sec TEG (back doors, permissions, ownership issues).
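The "2nd Wed in each month" rule for GDB dates mentioned in the proposals can be checked with a few lines of Python. This is only an illustrative sketch; the function name `second_wednesday` is made up for this note and is not part of any GDB tooling.

```python
from datetime import date, timedelta

def second_wednesday(year, month):
    """Return the date of the second Wednesday of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, so Wednesday is 2.
    days_to_first_wed = (2 - first.weekday()) % 7
    # The second Wednesday is one week after the first one.
    return first + timedelta(days=days_to_first_wed + 7)

# This meeting (11th April 2012) fits the rule.
print(second_wednesday(2012, 4))
```

Looping over the months of a year gives the full list of expected meeting dates under that rule.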

Operations Tools (Maria and Jeff) Stated aspirations and principles. https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTEGOperations#Documents. Awaiting global assessment and prioritization of tasks. Recommendations for: Monitoring: converge on one system and a common multi-VO tool for performance. Deploy WLCG wide network monitoring system. Deployment: Adopt CVMFS. Information System: BDII caching then fully review BDII. Operations: Address T2 specific comms needs. Common site admin training. Establish core team for testing new services. Middleware: review services on procedures, scalability, functionality and documentation. Deployment: Endorse via EPEL, Application Area repository, fix on request options. Create WLCG monitoring coordination body. Support tools GGUS and further dev. Broadcast system. Expand opportunities for pre-release pilots and implement recognition model. SAM expands to include metrics from DIRAC etc. Look again at SSB.

PerfSonar (McKee) Overview and possible use in WLCG. Need a standardized way to monitor and locate problems. US ATLAS 2008 – deployed 2010 – LHCOPN choice June 2011. Baseline to see change in move to LHCONE. Main purpose – aid network diagnosis. Long-term network monitoring for LHCONE TBD! Locate instances close to storage resources at site. Instances – latency (10 pps to targets) and bandwidth (20-60s test per defined pair every 4 hrs). Configurations suggested. Primary problems found: firewalls and congested GPN links. Modular dashboard highly configurable. Alerting (who?) still to be understood.
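To get a feel for the test load implied by the figures quoted above (10 packets per second for latency; a 20-60s bandwidth test per defined pair every 4 hrs), here is a rough back-of-the-envelope sketch. The function names are invented for this note, and the full-mesh pair count is an assumption about how pairs might be defined, not something stated in the slides.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def latency_packets_per_day(pps=10):
    """Latency probe packets sent to one target per day at the given rate."""
    return pps * SECONDS_PER_DAY

def bandwidth_seconds_per_day(test_seconds, interval_hours=4):
    """Seconds per day one pair spends in bandwidth tests."""
    tests_per_day = 24 // interval_hours
    return tests_per_day * test_seconds

def mesh_pairs(n_sites):
    """Number of unordered site pairs in a full mesh (assumed topology)."""
    return n_sites * (n_sites - 1) // 2

print(latency_packets_per_day())       # packets per target per day
print(bandwidth_seconds_per_day(20))   # lower bound, seconds per pair per day
print(bandwidth_seconds_per_day(60))   # upper bound, seconds per pair per day
print(mesh_pairs(5))                   # pairs in a hypothetical 5-site mesh
```

Under these assumptions each pair spends between two and six minutes per day on bandwidth tests, so the bandwidth load stays small even for a mesh of several sites.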

Workload Management (Davide Salomoni) gLExec to be standard way of handling pilot jobs; extend CE for streamed submissions; bring submission frameworks closer (e.g. glideinWMS); extend CE/jdl for whole node and multi-core jobs; evaluate CPU pinning; extend CE to tag jobs CPU or I/O bound; m/w to support virtual CE; recommend end of WMS; simplify information system; share virtualization experience, adopt HEPiX-virt recommendations – dynamic provisioning of resources (clouds).

Security (Romain Wartel) Details here: https://twiki.cern.ch/twiki/bin/view/LCG/WLCGSecurityTEG Risk analysis waiting on MB feedback. Need for fine-grained traceability – discussion lacked key stakeholders. Top risks are misused identities, attack propagation between WLCG sites and exploitation of a serious OS vulnerability. Still many areas to be addressed: e.g. ownership of traceability information. Data ownership issues identified. More by June.

Databases (Dave Dykstra) https://twiki.cern.ch/twiki/bin/view/LCG/WLCGTEGDatabase COOL to continue. ATLAS reviewing conditions volume and complexity. Frontier service usage increasing. Squid monitoring should be a WLCG activity and job discovery of squids should follow a standard. Further use of Active Data Guard. PVSS works for all Online databases – no change. NoSQL databases evaluated by expts. but more to come.

Middleware: EMI-1 status (Cristina Aiftimiei) General links. Latest update 14 – 16.03.2012. BDII core update. Revision for Gridsite and VOMS. Update 15 (20th April) revision to BLAH and WMS (inc. wildcards in FQANs); minor changes to DPM/LFC (v1.8.3) including optimized sample my.cnf and http/dav frontend (ro); GFAL/lcg_util goes to 1.12.0; Proxyrenewal fix for expired VOMS extensions; VOMS Oracle and VOMS-admin bug fixes. WMS v3.3.6. gLite security updates stop on 30.04.2012 except for gLite-WN and gLite-UI where it is extended to 30th September. EMI-2 progress: RC4 build on SL5/x86_64 98% and SL6/x86_64 95%. Debian 6 UI. Estimated release date is 7th May 2012.

UMD and WLCG (Michel Drescher) Hydra not yet released as part of EMI. UMD will only contain products from Technology Providers (for m/w) that have signed MoU and SLA. Product updates are verified and rejected from UMD if they fail tests (e.g. WMS). QC verification: https://documents.egi.eu/public/ListBy?topicid=37 Staged Rollout: https://documents.egi.eu/public/ListBy?topicid=38. EMI tests seek elimination of bugs. EGI tests seek continuous service delivery.

SHA2 and RFC proxies (Maarten Litmaath) Update. Problem: IGTF wants CAs to move from SHA-1 to SHA-2 signatures ASAP (start Jan 2013… then takes 395 days for SHA-1 to completely disappear). For this WLCG needs to use RFC proxies instead of Globus legacy proxies. We have 8 months. Looking at SHA-1 risks. dCache & BestMan support RFC but are not ready for SHA-2. ARGUS, CREAM, WMS, DIRAC … should work with SHA-2 but are not tested and need RFC support. EMI-2 will be ready April/May (but time lag for UMD plus bug resolutions)… Looking at late Autumn 2012 target for all deployed SW to support RFC proxies … with a full switch in Jan 2013. Then all SW to support SHA-2 by Spring 2013. Plan-B would be to introduce SHA-2 CAs.
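The timeline arithmetic above can be sketched as follows. The notes only say "start Jan 2013", so the exact start day of 1 January is an assumption made for illustration, and `sha1_sunset` is a name invented here.

```python
from datetime import date, timedelta

def sha1_sunset(start, max_lifetime_days=395):
    """Date by which SHA-1 signed certificates issued just before `start` have expired."""
    return start + timedelta(days=max_lifetime_days)

meeting = date(2012, 4, 11)   # date of this GDB
start = date(2013, 1, 1)      # assumed day; the notes only give "Jan 2013"

print((start - meeting).days)  # days left to prepare, roughly the "8 months" quoted
print(sha1_sunset(start))      # when SHA-1 would have fully disappeared
```

With these assumed dates, SHA-1 would disappear around the end of January 2014, which gives deployed software about a year after the RFC proxy switch to pick up full SHA-2 support.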

gLExec deployment (Maarten Litmaath) Nagios tests now regional. Each EGI site supporting gLExec needs to update the GOCDB with a “gLExec” flag for each supporting CE. Results for UK: http://tinyurl.com/cwef84h. Deployment instructions: https://twiki.cern.ch/twiki/bin/view/LCG/GlexecDeployment. Experiments also starting to probe status – CMS enabling in GlideinWMS, ATLAS GlideinWMS back-end for PanDA, ALICE revising approach and LHCb checking current usage.

OSG Software Update (Brian Bockelman) OSG3 overview. Big evolution in distribution (use OS packaging tools over pacman), packaging (RPM format via yum using Koji) and included software (reuse existing packages, e.g. from EPEL, where possible). Pacman still needed for Debian. Looking at relocatable tarball options.