GDB 11th November 2009

From GridPP Wiki

Latest revision as of 09:43, 17 November 2009

http://indico.cern.ch/conferenceDisplay.py?confId=45481


John Gordon

Since the previous meeting there has been a lot of activity on EGI, ROSCOE, EMI, etc. Steven Newhouse has been asked to come to the next meeting. There was an RRB meeting. A summary of HEPiX comes later. Rumours of beams soon.

The next GDB is on 2nd Dec, but the OSG-WLCG-EGEE event on 10-11 Dec is a meeting, not a workshop.

For next year we have tried to book the 2nd Wednesday of each month; there are some clashes around March/April. No pre-GDB meetings are planned as yet; virtualisation, perhaps?

FP7 calls close on 24th Nov; SC09 is next week; EGI Council Meeting on 3rd Dec; EGI-DS workshop in Stockholm on 3rd-4th Dec; ISGC in Taipei in March; EGEE User Forum 12-14 April.

Issues: middleware, pilot jobs (glexec/SCAS), HEPiX, virtualisation, GStat 2.

JG reported to the MB about CREAM, glexec and SCAS.

There have been security alerts recently; now there is another one.

There are 141 domains out there with unpatched kernels; only ~15% are fixed according to Romain's testing.
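
As a back-of-envelope illustration of that kind of testing, a site could compare each node's running kernel against a minimum patched version (the version string below is an assumption for the sketch, not the actual fix for this advisory):

```shell
# Sketch: flag kernels older than a minimum patched version.
# MIN is a placeholder, not the real fix version for this advisory.
check_kernel() {
    MIN="2.6.18-164.6.1.el5"
    CUR="$1"
    # GNU sort -V orders version strings; if MIN is not the lowest of
    # the pair, the running kernel is older than the patched one.
    if [ "$(printf '%s\n' "$MIN" "$CUR" | sort -V | head -n1)" != "$MIN" ]; then
        echo "UNPATCHED: $CUR"
    else
        echo "OK: $CUR"
    fi
}

check_kernel "2.6.18-128.el5"   # an older, vulnerable kernel
```

Run per node as `check_kernel "$(uname -r)"`; anything printing UNPATCHED still needs the upgrade and reboot.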

Is there an SLA? Sites are responsible for their own security, but reminders and a push are necessary.

The latest security exploit will be discussed at the Operations meeting next Monday, so more pressure is likely to be heading our way if we have not updated the kernel. Please make the upgrades as soon as you can, preferably this week. If CERN security scans don't show that many sites have upgraded compared to the current 15% level, the level of persuasion will escalate. It is not necessary to kill jobs: just mark some nodes offline and reboot with the new kernel in batches.

Sites that have difficulty in patching quickly due to vendor requirements should contact the security group (Romain Wartel) for advice.

The LHC schedule is still on track for mid-November. Security group: last time two sites were suspended.


Middleware.

Middleware has moved to staged rollout: updates are sent to selected production sites first.

Product teams are responsible for producing quality-assured middleware releases; they are responsible for all stages and accountable for the products they provide.

Services will have separate repositories, allowing, say, a new version of DPM to be released without having to wait for all the other components to be certified against the new versions of the rpms they depend on.
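
As a sketch of what that could look like on a site (repository names and URLs here are purely illustrative, not the real gLite layout), each service would get its own yum repo that can be enabled independently:

```ini
# Illustrative per-service repos: enabling only [dpm-updates] lets a
# site pick up a new DPM release without touching anything else.
[dpm-updates]
name=DPM updates (illustrative URL)
baseurl=http://repository.example.org/glite/dpm/sl5/x86_64/
enabled=1
gpgcheck=1

[lfc-updates]
name=LFC updates (illustrative URL)
baseurl=http://repository.example.org/glite/lfc/sl5/x86_64/
enabled=0
gpgcheck=1
```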

No new releases on SL5 since the last GDB. There is a fix for the gLite security issue mentioned last time, and a Torque update (v2.3.6). Further glexec patches have been added and are in certification, along with CREAM, LFC (Oracle), DPM disk, VOMS server, dCache clients, Argus and MPI_utils.

MyProxy server? Needs looking into.

CREAM and glexec are top priority.

New versions of DPM disk, LFC and the clients have been released. The head node was certified first and some configuration changes were needed, which delayed the release.


SL4 releases.

FTS 2.2, VOMS admin server.

Rollback of failing BDII update.

An rpm dependency was missing that was never submitted by the developers. The question is why this was not seen in PPS: it was tested on an unclean site, and certification appeared to work because of dependencies on other services and the order of restarting. The post-installation script failed to restart the BDII.

Rollback: the method prevents many more sites from updating to the broken release, which is totally understandable. Sites that have already upgraded have to go through the pain of removing rpms manually.

Why can the old rpms not be recreated? Because of the complex way ETICS is used by the project; it is not correct to just blame ETICS.


Glexec/SCAS

SCAS and the lcg-CE have to share the gridmap directory, to guarantee that SCAS gives the same range of pool accounts on the WNs.

The lcg-CE is known for its load problems (M. Litmaath).

J. Gordon went over the feedback he has received, including the UK point that glexec is not yet available for SL5 WNs.

LHCb and ATLAS are already using multi-user pilot jobs; there was a big heated discussion ('Come on, data taking starts in two weeks!'). LHCb jobs do not use glexec.


It seems the experiments don't care about glexec; it is the security implications that are the issue. Sites may also say they don't care, but the security policy suggests that this is unacceptable, as the risks could affect the whole grid. JG to take this to the MB.


GStat: the old pages were cluttered and tied into SAM, so the aim was to consolidate the code base, remove the dependency on GOCDB, etc.

It uses Django, a web-based framework; it monitors the BDIIs via Nagios and checks them for validation.

The installed-capacity document has been integrated, so a lot of sites are failing at the moment. Feedback is wanted at project-grid-info-support@cern.ch. Submit a GGUS ticket to the site if its info is wrong.

It would be nice to have a VO view (says CMS)

A VO view is on the to-do list.


FTS 2.2 has been released and is ready to deploy, at least to test sites.

HEPIX

Virtualisation is becoming very important; benchmarking is still active; also monitoring tools.

Virtualization

File systems

HEPSPEC06

There was a question of renaming the HEPSPEC06 benchmark.

Misc.

iSCSI evaluation: it can be an inexpensive way to get redundancy, using cheap storage boxes.

Quattor at RAL talk.

Quattor workshop in Brussels last week.

The virtualisation presentation from the September GDB was repeated at HEPiX.

There was a warmer reception to the idea of running remotely generated images than in Umeå. They would like to see this become a reality by the end of next year. They want people to join the group who can actually involve their site, not just random people.

Verifiability/trust of image generation and transmission; this may be CernVM.

Contextualisation; efficiency of transmission; multi-hypervisor support.

Many T2s are concerned about the overhead, especially if they don't control all the resources. It cannot be assumed that all the WNs will be virtual. Progress is needed on ways to express resource requirements: acceptable environments (real or virtual), number of cores, other?

Currently a number-of-cores requirement is used by MPI, for example, but it is not easy to say that the cores have to be on one node.
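
In gLite JDL terms the request looks roughly like this (a sketch only: attribute names such as CpuNumber and SMPGranularity follow the gLite parallel-job JDL, and support varies by middleware version):

```
// Hypothetical JDL sketch: ask for 8 CPUs, and use SMPGranularity to
// say they must all be on one node - exactly the constraint that is
// hard to express portably today.
[
  Executable     = "my_mpi_job.sh";
  CpuNumber      = 8;
  SMPGranularity = 8;
]
```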

LHCb would like to be able to ask for multi-core: if there are 8 cores then fine, but 4 is also OK. LHCb asked why they should do it.

What sites don't like is jobs that start, see there is no software, and begin installing it.

There was debate over whether VMs should contain experiment software or not. Tony Cass: we have to compare the different methods and see which provides the best cost-benefit.


CREAM

Features talk: when they will be set to 'done', etc.

CMS testing talk.

US CREAM-Condor talk.