GDB November 2008

12 November 2008

Meeting summary/report

GGUS Ticket Distribution

A summary of the meaning of team and alarm tickets. More interesting for T1s and experiment shifters than for T2s at the moment; however, since team tickets will be extended to T2s in the future, it's worth a look.

Personally I do not like the scheme that requires sending a WARNING email to the site, warning that the ROC is about to assign it the ticket, in order to avoid the latency of the GGUS pyramid: it generates a heavier load of email traffic. I'd prefer the ticket to be assigned directly to the site, with an email sent to the ROC about it.
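For illustration, a toy sketch of the two routing schemes; none of these functions correspond to the real GGUS or ROC interfaces, and the ticket and site names are made up:

  # Toy comparison of the two ticket-routing schemes discussed above; nothing
  # here is the real GGUS interface, it only shows who gets notified and when.

  def notify(recipient, message):
      # Stand-in for an email notification.
      print(f"mail to {recipient}: {message}")

  def assign(ticket_id, assignee):
      # Stand-in for the GGUS assignment step.
      print(f"ticket {ticket_id} assigned to {assignee}")

  def route_with_warning(ticket_id, site, roc):
      # Proposed scheme: WARNING mail to the site first, then the ROC assigns.
      notify(site, f"WARNING: ticket {ticket_id} is about to be assigned to you")
      assign(ticket_id, site)

  def route_direct(ticket_id, site, roc):
      # Scheme preferred here: assign straight to the site, just inform the ROC.
      assign(ticket_id, site)
      notify(roc, f"FYI: ticket {ticket_id} was assigned to {site}")

  if __name__ == "__main__":
      route_with_warning("12345", "SOME-T2-SITE", "UK/I ROC")
      route_direct("12345", "SOME-T2-SITE", "UK/I ROC")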

LHCOPN

Networking operations. I had a hard time following what the speaker was saying, and it was mostly relevant to T1s.

Accounting

Long discussion about how to account for local usage of resources, i.e. jobs submitted directly to the batch system. Cristina has a technical solution for APEL. Most of the audience (me included) disagreed with allowing users to use the resources via direct job submission. Lyon, however, has an LHC experiment use case in which the grid cannot be used because the application doesn't allow it; nevertheless the experiment is using the resources, which should be accounted for. It was also pointed out that we should account for local usage even if the grid interface is used. This can be done for users not affiliated to any VO (see the gridpp VO usage from Glasgow engineers and Manchester astronomers) but is more difficult when users belong to a VO (LHC or otherwise).
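For illustration only (this is not how APEL does it), a rough sketch of splitting batch accounting records into grid and local usage; the record fields and the pool-account prefixes are invented for the example, and the ambiguity discussed above appears as soon as a VO member also submits locally:

  # Toy classification of batch accounting records into grid vs local usage.
  # The record format and pool-account prefixes are assumptions.

  GRID_POOL_PREFIXES = ("atlas", "lhcb", "cms", "dteam", "gridpp")   # illustrative only

  def classify(record):
      # Return 'grid' or 'local' for one batch accounting record.
      if record.get("grid_dn"):                  # job arrived with a grid identity
          return "grid"
      if record["local_user"].startswith(GRID_POOL_PREFIXES):
          return "grid"                          # pool account without a recorded DN
      return "local"                             # e.g. astronomers submitting directly

  records = [
      {"local_user": "atlas012", "grid_dn": "/C=UK/O=eScience/CN=Some User", "cpu_s": 3600},
      {"local_user": "jbloggs",  "grid_dn": None,                            "cpu_s": 7200},
  ]

  usage = {"grid": 0, "local": 0}
  for r in records:
      usage[classify(r)] += r["cpu_s"]
  print(usage)   # {'grid': 3600, 'local': 7200}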

Pilot jobs

Maarten gave one slide with the experiment frameworks approval process: most LHC experiments are OK. Then Oliver gave a status of SCAS certification. Basically it has been tested by the developers at NIKHEF, but it needs more testing and the error messages are poor (I wonder about the logs that the sysadmins are supposed to parse). Not in view for another couple of months. LHCb gave a report on the status of their pilot job framework, saying they are ready, and ready to use glexec.

Finally ATLAS presented its plan for testing user pilots without glexec. A pilot role will be created to separate user analysis from production, and system administrators can now insert a batch system job ID into the PanDA monitoring server and get back *all* the information about the job, which frankly is easier than parsing the local log files, finding out which resource broker was used, contacting the RB site and getting from them the name of the executable... and it requires the same level of trust. Objections: the users' executables have access to the pilot proxy, which, being different from the production one, might have a more limited scope; and it is difficult to trace which jobs wrote which files on a multicore WN, as they all run under the same UID. However, I'm not sure that introducing glexec would make things easier from this point of view; of course it depends on how easy it is to read the log files, or whether any tool to extract the information is supplied. There was also a discussion on possible repercussions on the accounting when using a pilot job framework, as the pilot might occupy a CPU without using it at all.
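To make the ATLAS point concrete, here is a sketch of the kind of lookup described: hand the monitoring server a local batch system job ID and get the job record back. The URL and the parameter name are placeholders, not the real PanDA monitor interface:

  # Placeholder lookup of a job record by local batch job id; the endpoint and
  # the query parameter are invented, only the idea matches what was presented.

  from urllib.parse import urlencode
  from urllib.request import urlopen

  PANDA_MONITOR = "https://panda-monitor.example.org/jobinfo"   # placeholder URL

  def lookup_batch_job(batch_id):
      # Return whatever the monitor reports for this batch job id.
      query = urlencode({"batchID": batch_id})                  # placeholder parameter
      with urlopen(f"{PANDA_MONITOR}?{query}") as response:
          return response.read().decode()

  if __name__ == "__main__":
      print(lookup_batch_job("1234567.pbs01"))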

Automated Collection of Resource Provision

A description of how to provide management with information about installed capacity (per VO) and the experiments with information about resource usage (per VO). Nothing much new on this front, as all of this should be published in the IS. A document is underway for system administrators on how to do this for CPUs; for storage, the info providers supplied by the developers should be used: the DPM one from Michel is being inserted in the release, the StoRM one should be ready for December, the dCache one is in 1.9.2, and a CASTOR one is deployed at RAL.
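As a reminder of what "published in the IS" means in practice, a rough sketch of pulling the advertised capacity back out of a top-level BDII with ldapsearch; the BDII host is just an example, and the attribute names are from the GLUE 1.x schema as I understand it:

  # Query a top-level BDII for published CPU and storage capacity.
  # The host name is an example; the attributes are GLUE 1.x.

  import subprocess

  BDII = "ldap://lcg-bdii.cern.ch:2170"    # example top-level BDII
  BASE = "o=grid"

  def ldap_query(ldap_filter, *attributes):
      # Run ldapsearch against the BDII and return the raw LDIF output.
      cmd = ["ldapsearch", "-x", "-LLL", "-H", BDII, "-b", BASE, ldap_filter, *attributes]
      return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

  if __name__ == "__main__":
      # Published CPU capacity per subcluster...
      print(ldap_query("(objectClass=GlueSubCluster)",
                       "GlueSubClusterLogicalCPUs", "GlueChunkKey"))
      # ...and the space reported by the SE info providers mentioned above.
      print(ldap_query("(objectClass=GlueSA)",
                       "GlueSAStateAvailableSpace", "GlueSAStateUsedSpace", "GlueChunkKey"))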

pre-GDB summary

The pre-GDB was an interesting overview of the storage situation, although most of the day was dedicated to dCache even though it is being abandoned by quite a few sites, and the discussion on DPM issues was cut short although ~195 sites are using it. StoRM looked really interesting and I had to shut my ears because this is not the time to change again (or maybe it is?). For me, the summary given at the GDB doesn't convey most of the interesting points from the talks; sites should read the pre-GDB slides of the SRM implementation they are interested in to learn more.

CERN Data Management Strategy

Interesting for the experiments and perhaps T1s, not for T2s.

Middleware

  • Parallel versions of WN software: the new mechanism is to install parallel versions through relocatable RPMs. The software will be installed in directories containing the WN version, and the Information System will have to advertise the WN version, most likely as is done for the experiment software tags using GlueHostApplicationSoftwareRunTimeEnvironment (see the sketch after this list). The proposal is summarised at https://twiki.cern.ch/twiki/bin/view/EGEE/ParallelMWClients
  • CREAM: it's not ready; there is a bug and the WMS currently cannot talk to CREAM... (everybody laughed). To be put into production CREAM has to pass 13 criteria (see the talk). Frankly I don't see CREAM arriving soon. Perhaps in time for CCRC09?
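The sketch below shows how a job wrapper might pick one of several parallel WN client installations; the base directory and the environment variable are assumptions loosely based on the proposal linked above, not the final scheme:

  # Pick a WN client version from parallel installations, assuming one
  # subdirectory per version under an (assumed) install prefix.

  import os

  WN_BASE = "/opt/glite/wn"    # assumed install prefix, one subdirectory per version

  def available_versions():
      # WN client versions found on this node.
      return sorted(os.listdir(WN_BASE)) if os.path.isdir(WN_BASE) else []

  def select_version(requested=None):
      # Use the requested version if installed, otherwise the last in sorted order.
      versions = available_versions()
      if not versions:
          raise RuntimeError(f"no WN client installations under {WN_BASE}")
      if requested in versions:
          return requested
      return versions[-1]

  if __name__ == "__main__":
      version = select_version(os.environ.get("REQUESTED_WN_VERSION"))   # hypothetical variable
      os.environ["PATH"] = f"{WN_BASE}/{version}/bin:" + os.environ.get("PATH", "")
      print(f"using WN client {version}")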

UK input/issues

Follow up actions