GridPP 15 Panel Discussion Session
==================================

What is really needed by the LHC?
=================================

Roger Jones (ATLAS): A very long list. Overall, a stable service with robust components. CPU is OK, and data we’ve heard about. We need to exercise the data models and really need an analysis of Tier-2 CPU/disk activity. We need to change the balance of disk/CPU at Tier-1. We need scaled tests of the computing models. It is known and acknowledged that we need better file management. It is a pity that high level data management is done by the clients despite commonality. It would be better for smaller experiments otherwise. This is a reason for putting on pressure. We need to fully integrate VOMS into all aspects, e.g. group based accounting. The LHC VOs are very large and at the moment individuals can do what they want. We need to break this down.

Nick Brook (LHCb): Would reiterate what Roger said. The emphasis from the LHC experiments so far has been production on the Grid, and analysis is only just beginning to involve general users. We have to get general users to use the Grid. This will be far more chaotic and harder to understand and debug, and the users will be less sympathetic and less computer literate. It is very important to have a robust system, even if it means maybe sacrificing functionality. In security, VOMS has to be there for file protection etc. We need a way to understand the information out there, e.g. what is a disk storage element and what is Mass Storage. The information has to be correct. At the moment the Glue schema fields are filled inconsistently. CASTOR2 will be a problem for RAL. It has a unique front end, a lot needs to be implemented in SRM, and inconsistencies need to be cleared up. File transfer problems have been seen in SC3 from Tier-0 to Tier-1. File transfer should not just be for production managers. General users want to move their files round the Grid as well.
We need to understand the performance of file catalogues, optimise them, add authentication etc.

Dave Newbold (CMS): Has a common view regarding Service Challenges. Top priorities are:
1. To sacrifice features for functionality. It causes problems for users if everything changes. Need stability on 6 month timescales.
2. Address the largest gap: high level data management. The experiments have gone off and done their own thing. In some cases this is healthy and in some not. We need to discuss and find commonality.
3. Are we going to have the resources? Are the Tier-1 and Tier-2s going to have the resources? There are already problems for BaBar. If the Grid scales, we have to make sure there is something to scale on.

Glenn Patrick (LHCb): Mostly covered already. For LHCb it is particularly important to have a fully resourced Tier-1. There is a mismatch between Tier-1 and Tier-2. Data management is a big issue. Pleased that we are moving to CASTOR, but this introduces uncertainties as well.

Peter Hobson (CMS): Stress user communities making use of Tier-2s.

Phil Clark (LHCb): If someone is on multiple experiments it is not clear which VO they are in. This should be cured by VOMS. Distributed analysis has not really been tried yet. Should try stripping at different institutes.

Nick Brook: On day one we want something robust and reliable. We do not see that at Tier-2 yet. Until proven, play safe and ensure data is available for physicists.

Dave Newbold: There have been discussions this week on how to use Tier-2s.

Phil Clark: Stripping could be done at Tier-2s. Getting analysis jobs to run at institutes is key at the moment.

Audience Comments: Users need to summarise succinctly what they need. There are communication problems in LCG as a whole. What the experiments need is not taken notice of. But experiments’ requirements also change and these changes are not communicated by the experiments. It’s traditional. We want everything stable and everything changed.
Stability is most important. At the start of GridPP we wrote down scale, robustness and functionality. Base services must be stable. Need fewer releases, and the missing functionality needs to be well mapped out. The plan is consistent with everything said. For some things stability is more important, but for some things the functionality is not sufficient, so we need a clear plan as to how to get there. In a year stability needs to be the priority. Maybe a couple of years later we can add more functionality. We’ve given up on the dreams and are now in the real world.

What is really needed by the Other Experiments
==============================================

Fergus Wilson (BaBar): The unique thing about non LHC experiments is that they are taking data. The emphasis and scale are different. Success is not running a job on a Grid but a physics paper. Users want to run their jobs and don’t want to know about the Grid. They have complex databases that need to be correct. They already have working distributed systems. We have to convince them the Grid is better. They already have working software. They have fewer people in general. We need to target tasks carefully and remember that there are other Grids as well. Need feedback on stable sites that are good. Collaboration managements want firm commitments. Most experiments are happy with the basic LCG but are not using all the functionality. Support is OK if you can find it. Better communication is needed between management and users. Users will use a single site if they can. They request a national policy on VO support. They want Tier-2 stability and are happy with small Tier-2s. They want a realistic roadmap for Tier-2s, including support. Firewalls cause problems outside people’s control. They want easier access to install experiment specific software and stability in the provided software. They are trying to solve data management problems over and over again. This needs better coordination.
The LHC experiments will solve these problems, so why should others have to as well? There is limited manpower to use the Portal. They need more system management support. The UB questionnaire is still valid. There should be a common description of what a Tier-2 is in terms of hardware, software and people. VOs need named people at Tier-2s. Preferably a single Tier-2 as a guinea pig for all experiments, then clone if similar. Experiments need to document their requirements. Need face to face discussions, a non LHC forum to exchange views and a way to find out what the LHC experiments are doing on the data management side.

Dave Sankey (H1): HERA’s last data will be taken in summer 2007. What we have now is what we will end with. Primary use of the Grid is for Monte Carlo. Are never going to use data management on the Grid. Interested in using Tier-2s if they are similar and can be cloned. Have been running on the Tier-1 for 10 years. On the Tier-1 we know who to talk to and vice versa. Moving to Tier-2s is more amorphous. Now moving beyond the ones where there are local contacts. There is no mechanism for non LHC VOs to know who to talk to. [It was pointed out that direct contact between VO and sites is not the Grid model. Should talk to the deployment team, or not even them.] There is no right model as it was not designed for non LHC experiments. [Also pointed out that Tier-2s are not the same.] But the interfaces should be the same. Actually find that LHC and non LHC are not that different. Data is not completely diffuse. Some sites are more different than others. Individuals need direct contact at some sites. Monte Carlo can be done anywhere. Large experiments have many people at the Tier-1, but smaller LHC and non LHC experiments don’t have the manpower.

In discussion it was pointed out that the forum for these requirements is now the User Board. So far there has been little input. There is no fixed representation. Users don’t know about it. It was suggested that Fergus should track his questionnaire at the UB.
Deployment has representation on the UB. Maybe we should go back to explicit representation. Dave Newbold said he would go back to the UB remit. He had taken on board that we need to change the mode of operation of the UB.

Gidon Moont (Portal): The main users, Mice, Calice and T2K, all have users at Imperial. Like H1 they just want computing. They want to use one or two big sites but have to use the Grid. Want it to be easy. They want automatic retries and do not want to come back and see failures 2 days later. Need good documentation for VOs etc. Want phone contact. Sites round the UK have to support their VO. Any site with a physicist in the VO should support that VO. Gidon said that his job is just to show it can be done, not to do any development. Users want to send the job and get the data back. If you tell them exactly what to do they will do it, but they won’t read general documentation.

Alessandra Forti (NorthGrid): Are setting up a VOMS server in Manchester and can help with setting up a VO. They will offer structure to enable VOs at other sites. Most sites don’t want to do work for a VO they have no connection with. Tier-2s should really support all VOs but not provide half an FTE to develop it or provide help.

Audience Comments: Users want to know who to talk to. Most sites have generic email addresses that do reply. This won’t scale though. Sites don’t actually know what they are supposed to provide, e.g. C compiler, Java runtime? Can’t assume all sites are the same. VO information should go on the CIC Portal, i.e. software, hardware requirements etc.

What is required overall?
=========================

Dave Britton summarised that we need more resources, more stability and more functionality, but it needs to be done in an organised way.