DN - Dave Newbold
JS - Jamie Shiers
JG - John Gordon
DC - Dave Colling
TC - Tony Cass
ML - Mark Leese
PC - Pete Clarke
RJ - Roger Jones
TD - Tony Doyle
Comment - Audience

DN
Summary of last year's discussion.

JS
Need to get residual services together.
Focus so far has been on T0 > T1 rates.
The other rates aren't great.
If T1 > T2 rates aren't up, that site/Tier can only do Monte Carlo.
Need more input from the GridPP Database team.

JG
GridPP's T1 > T2 rates are better than others.
VOMS needs to be worked on (JS disagrees).
SRM is quite new so is still to be tested.
SL3 works, so elements of this in SL4 should be OK.
The implementation of these new services will affect tests.

JC
Loads need to be tested continuously, not just every six months.
Services need to be stabilised; there are complaints of sites being down.
Security challenges need to be stepped up.
A lot to do.

DC
The CMS service challenge took 25% of the expected data, transferred from T0 > T1 > T2.
Tests went very well in the UK, getting 250MB/s, ten times what was expected.
CASTOR was important.
The most important element was the link between CMS and sites.
Communication was very good, with daily meetings at their peak.

JG
Can ATLAS do this type of collaboration?

RJ
Yes.

DN
Can we move from "heroic" to 24/7?

JS
CMS had a good process, checking every element and link months before the actual service challenge and chasing individual problems systematically.

TC
SRM 2.2 is not a problem.
CASTOR is great.
Storage organisation discussions need to take into account experiments' needs and network setup.
Agrees with JG that VOMS needs to be worked on.
We need to move to 24/7, not "heroic" mode.
Certificates are an issue for some systems.
Average CERN site reliability is 70% with a soft target of 88%; this is a problem.
Capacity levels will be easy to reach in theory, but we need to do it now rather than leave it and end up with new machines which won't work.

ML
We need to identify the bottlenecks, which we now have data for.
Working closely with UKERNA is very useful.
It is a better situation, but networking reps need to be told about problems.

Comment
Network problems should move from a local to a global level until solved.

JC
Identifying whether a network problem is local or not.

Comment
Need network expertise available at every site.

DN
Easy to blame the network; a lot of the time it isn't a network issue.

PC
There shouldn't be a problem if you involve network guys from the start.

JC
Wondered whether the info actually filters up from local to global.

DC
Involving network guys can be a good thing or a bad thing; sometimes they like seeing the service challenges, others don't.

Comment
Need network collaboration.

JG
CERN had full involvement till it came to paying for links.

Comment
All tests will need to be re-run with SuperJANET5.

DN
Do people at the sites realise the experiments' requirements?

JC
More dialogue is needed.

RJ
Need to pin down numbers; the UK is currently just dealt with as one large lump.

JS
Need to balance the system.
CMS has numbers indicating what they want from T2s etc.
Repeated that with no 1Gb/s link between T1 > T2, a T2 is useless and will be left with just Monte Carlo.

DN
That could be a problem in the UK.

JG
Do CMS expect all sites to be up to CMS's T2 standards?

Comment
Query on how many MANs each T2 had and how we measure bandwidth with the T2s being geographically distributed.

Comment
Shouldn't be an issue.
Any bandwidth problems are more connected to the institute's connection to the local network.

RJ
SuperJANET5 will allow dedicated links à la UKLight.

PC
If we need LHCOPN links for T2s, we can ask for them.

JG
Not needed by the current model.
How can we keep things running?
EGEE sites are monitored constantly to check for sites going down; can we get this for WLCG?

JS
When stuff breaks it can be hard to pinpoint a problem, and we need a system to identify problems earlier.
SAM is OK but we need other independent tools.

DN
Experiments should be more involved in T1 > T2 tests. Is this happening?

JC
CMS is good but there are problems that better collaboration could solve.

DC
London T2 is good but coercion was needed to get people involved.

JG
T0 > T1 is being scrutinised now, but the spotlight will move to T1 > T2 once that is at 90%.

JC
Transparency of tests could be better.

JG
Better than it was.

TC
Need to test/measure even if you disagree with how/what is done.
Need milestones.

TD
CERNProd efficiency is to be 90%; is that what we expect the T1 to meet on the SAM tests?

JS
Yes.

JC
When a test fails it should be investigated immediately, not at the end of the service challenge.

Comment
ScotGrid is attempting to implement this.
Would like an RSS feed for the SAM tests.

TC
Need to start work on T2s before the spotlight switches to them.

JC
Sites do report problems and guess reasons.

JS
Hope everyone is using UTC.

JC
How can we improve experiment/deployment team interaction?

DC
Need someone from every experiment monitoring VOs in each T2.

ACTION - Identify someone for every T2 for each experiment

JC
Wants an individual, NOT a service/email address.

RJ
ATLAS would prefer a team/service.

Comment
LHCb would sort it out for them.

Comment
LHCb rates sites; this information should be fed back to sites so they can improve their rating etc.