Jeremy Coles
Essential to share expertise and tasks (e.g. monitoring), along with good communication, a central help-desk and expert groups (e.g. storage). Any big changes should be co-ordinated between sites.
Constraints/problems: manpower; sites are independent (technically and in manpower), so site configurations differ.
Feedback is important, e.g. from PPS, sharing their test results.
The main way of improving is sharing manpower.

Peter Gronbech
Issues with the gLite install not being flexible enough to take advantage of generic eScience clusters later on.

Olivier / London
The London T2 model is based on three main points:
- Identity (common monitoring, planning, calendars, mailing list, set of supported VOs). Forming a team.
- Shared support (trust between people, regular meetings, cross-site support/intervention, common YAIM files).
- Liaison with experiments: well-identified communication channels with identified people in the experiments (GGUS not reliable enough).

Discussion
Other T2s generally agree. Obviously easier in London, with sites geographically close; not easy in other T2s. NorthGrid: get together only if major problems surface. SouthGrid: do not have manpower at all sites. Currently there are different models for the 4 T2s.
Peter Watkins: Phone meetings are also OK if distance is a problem. T2s are (currently) doing quite well with their different models. How will this measure up to the new challenges posed by the start of data taking, e.g. transferring data from the T1 to (distributed) T2s *and* for many experiments at once? It will be essential to share storage, and contacts with the VOs will also be vital (at a different level than GGUS).

Peter Gronbech
It helps enormously if VO members are at the site, for example when special sets of RPMs need to be installed or upgraded, or when specific customization is needed.

Jeremy Coles
Still ramping up, so still need to go directly to the VO. In future, GGUS should be sufficient.
Discussion: highlighted that, e.g. when preparing a site for an SC, direct contact is the way to go.

Peter Love (???)
Current models seem to work. But sites within a T2 need to consolidate resources: how workable is this? (e.g. sharing dCache between 2 sites, or one site providing the CE and another providing the SE...)

Discussion
Questioned whether it is practical to make distributed sites look completely like one site. There might be issues with, e.g., bandwidth between 2 sites not being good enough. Currently each site works quite well on its own, but there is a good chance of running into manpower problems at some point.

Graeme Stewart
Warned of the danger of T2 sites using exotic deployment models. These might look fancy in the short term, but could run into increasing problems and costs in the long run.
The intermediate hierarchy from the whole of the UK down to individual sites is proving successful.
Next: share tasks and cross-site support to improve reliability. ScotGrid is also moving in this direction. At present each site has its own home-brewed configuration. It is easy to fix straightforward problems for another site, but close to impossible to fix more complex ones: one should move to more homogeneous installations.

Discussion
Models in other countries? Generally no distributed/federated T2s, mostly individual T2s; it was felt they are generally behind the UK. Models differ from country to country, reflecting the funding scheme of that country. Some federations do exist.
Pointed out that eventually the CPU load on T2s will be equivalent to the load on T1s, but more difficult to manage, partly due to chaotic analysis.
Everyone satisfied that communication is OK within the UK.

Jeremy Coles
Another point is how to meet the MoU requirements for T2s.

Duncan Rand
Accounting is not yet well in place: APEL accounts for CPU usage, but what about success/failure rates? Should investigate efficiencies and rates of wasted CPU.

Discussion
There is no clear-cut definition of what a failure is. It would be good to have a breakdown of exit codes by VO to better understand the rate of wasted CPU. It was argued that this should be set up by the VOs, who should look at and understand the rates, then feed back to sites. It was also argued that efficiencies are not the highest priority yet: the VOs' code is not yet perfect. Later on, when contention for CPU becomes an issue, they will hurry to fix their code.

Conclusion
The London T2 model currently seems to be a good model to emulate.
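
As an illustration of the per-VO exit-code breakdown raised in the accounting discussion, a minimal Python sketch follows. Everything specific in it is an assumption made for illustration: the record fields (`vo`, `exit_code`, `cpu_seconds`), the placeholder VO names and sample values, and the rule that any non-zero exit code counts as a failure. The meeting noted that there is no clear-cut definition of a failure, so that rule is passed in as a parameter, and nothing here is meant to reflect how APEL actually stores or exposes its records.

```python
from collections import defaultdict


def breakdown_by_vo(jobs, is_failure=lambda code: code != 0):
    """Summarise CPU time per VO and per exit code.

    jobs: iterable of dicts with the hypothetical keys
          'vo', 'exit_code' and 'cpu_seconds'.
    is_failure: assumed failure rule (non-zero exit code by default);
                a VO could supply its own definition instead.
    """
    summary = {}
    for job in jobs:
        entry = summary.setdefault(
            job["vo"],
            {"total_cpu": 0.0, "wasted_cpu": 0.0, "by_exit_code": defaultdict(float)},
        )
        entry["total_cpu"] += job["cpu_seconds"]
        entry["by_exit_code"][job["exit_code"]] += job["cpu_seconds"]
        if is_failure(job["exit_code"]):
            entry["wasted_cpu"] += job["cpu_seconds"]

    for entry in summary.values():
        total = entry["total_cpu"]
        # Fraction of CPU time spent on jobs classed as failures.
        entry["wasted_fraction"] = entry["wasted_cpu"] / total if total else 0.0
        entry["by_exit_code"] = dict(entry["by_exit_code"])
    return summary


# Made-up sample records, for illustration only.
SAMPLE_JOBS = [
    {"vo": "vo_a", "exit_code": 0, "cpu_seconds": 3000.0},
    {"vo": "vo_a", "exit_code": 1, "cpu_seconds": 1000.0},
    {"vo": "vo_b", "exit_code": 0, "cpu_seconds": 500.0},
]

if __name__ == "__main__":
    for vo, entry in sorted(breakdown_by_vo(SAMPLE_JOBS).items()):
        print(vo, entry["by_exit_code"], entry["wasted_fraction"])
```

On the made-up sample, `vo_a` ends up with a wasted fraction of 0.25 (1000 of 4000 CPU seconds). Passing a different `is_failure` function would let each VO apply its own definition of failure before feeding the rates back to sites.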