Topic Notes

From GridPP Wiki

Latest revision as of 13:40, 25 July 2012

Site Admin Notes

This page contains the notes on each of the topics that site admins are expected to keep up to date.


Topic: HEPSPEC06

HEPSPEC06 Benchmark Results

This topic exists for sites to tabulate HEPSPEC06 benchmarks, especially if they have benchmarked across a wide range of kernels - 32bit vs 64bit, or SL4 vs SL5, for example.
The official HEPSPEC06 benchmark value to be used for accounting purposes must be run with the mandatory flags '-O2 -pthread -fPIC -m32', which are the default in the CERN config file 'linux32-gcc_cern.cfg'. This means the applications are compiled in 32bit mode even on a 64bit OS.

Running the benchmark in 64bit mode is known to give better results and can be published for information only. Likewise running on an OS release above that used in production such as SL6 (when SL5 is currently the standard WN OS) is only for information.

The value used for accounting should be run on the WN hardware with the configuration seen by the jobs, so things like Hyper-Threading on/off, Turbo mode, number and type of disks and memory, kernel version etc. should be noted and should be the same as will be seen by a grid job.
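As a rough illustration of how the benchmark feeds into accounting, a site's published capacity is the per-node HEPSPEC06 score multiplied up over the worker nodes, and the per-slot figure is the node score divided by its job slots. The sketch below uses entirely hypothetical node types and scores - substitute your own benchmark results.

```python
# Illustrative sketch (hypothetical numbers): deriving a site's accounted
# HEPSPEC06 capacity from per-node benchmark results.

def hs06_per_slot(node_hs06: float, job_slots: int) -> float:
    """HEPSPEC06 per job slot for one worker-node type."""
    return node_hs06 / job_slots

def site_hs06(node_types: list) -> float:
    """Total accounted HEPSPEC06 across all worker-node types."""
    return sum(n["node_hs06"] * n["count"] for n in node_types)

# Hypothetical cluster: two WN generations, each benchmarked with the
# mandatory 32-bit flags on the production OS/kernel configuration.
cluster = [
    {"node_hs06": 65.2, "count": 40, "slots": 8},    # e.g. older dual quad-core
    {"node_hs06": 120.5, "count": 20, "slots": 12},  # e.g. newer generation
]

total = site_hs06(cluster)                    # 65.2*40 + 120.5*20 = 5018.0
per_slot = [hs06_per_slot(n["node_hs06"], n["slots"]) for n in cluster]
```

The per-slot figure is what a single grid job effectively sees; the total is what the site would publish for accounting.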

Results in green are valid results, i.e. run on a 64bit SL5 OS but with 32bit gcc. Dark green represents CPUs in use in the grid cluster at the site.

More information on running the benchmark is available on the How to run HEPSPEC06 page.

Topic: Middleware_transition

gLite to UMD/EMI middleware transition 


This is to collate information about site status and plans in moving from the gLite middleware stack to EMI-x/UMD-x. There is also information on the technical processes of the EMI/UMD middleware transition, including an example.

Topic: Protected_Site_networking


Topic for tracking site status and plans

This topic has been created to provide a central GridPP reference page that allows the project to understand the status and plans of each site with regard to their internal networks. It can be updated by the site administrators concerned or the Tier-2 (deputy) coordinators.


Topic: Resiliency_and_Disaster_Planning


Summary
A major theme of GridPP22 was resiliency and disaster planning, with topics ranging from the loss of a site through to the tasks faced every day by system administrators. This topic has been created to collate information about resiliency and disaster planning on a site-by-site basis. This should generate discussion of what preparations and precautions are being taken at each site.

Topic: SL4_Survey_August_2011


SL4 usage survey 


This is to collate information about the usage of SL4. This is an EGI initiative to determine the current usage of SL4, and the barriers to adopting more recent software.

Topic: Site_information

Topic for gathering site feedback and information

There are two areas to be completed in October 2008. The first is to provide information about batch system memory limits. The second is to give an update on networking issues that can feed into a GridPP networking review.

Background details for the batch system memory request:

This concerns ATLAS in that jobs have been killed off by batch systems even though they exceed the initially requested memory per job slot only during very short spikes. And Graeme said ... "The point is that the final memory is used only for a short time, so even on our 4 core nodes we don't see swapping or other problems. Second point is that nothing in the information system lets you know what the site policy is re. killing jobs off - and this is extremely important in knowing if a job can be safely run on a particular site. What I would suggest is really wanted is that sites provide information on

1. Real physical memory per job slot.

2. Real memory limit beyond which a job is killed.

3. Virtual memory limit beyond which a job is killed.

4. Number of cores per WN. If you have several configurations please list each one.

This can be per-queue if you have different limits (e.g., RAL have 2G and 3G queues)".
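The four items above can be sketched as a simple per-queue record, together with the check that matters to ATLAS: whether a job's short memory spike stays under the site's kill limits. All names and limit values below are hypothetical, chosen to echo the 2G/3G queue example.

```python
# Illustrative sketch (all names and limits hypothetical): the four pieces
# of per-queue information requested above, plus a check of whether a job's
# short memory spike would trip the site's kill limits.

from dataclasses import dataclass

@dataclass
class QueueLimits:
    physical_mem_per_slot_mb: int  # 1. real physical memory per job slot
    rss_kill_limit_mb: int         # 2. real-memory limit beyond which a job is killed
    vmem_kill_limit_mb: int        # 3. virtual-memory limit beyond which a job is killed
    cores_per_wn: int              # 4. number of cores per WN

def job_survives(q: QueueLimits, peak_rss_mb: int, peak_vmem_mb: int) -> bool:
    """True if the job's peak memory use stays under both kill limits."""
    return (peak_rss_mb <= q.rss_kill_limit_mb
            and peak_vmem_mb <= q.vmem_kill_limit_mb)

# Hypothetical queues in the style of the RAL 2G/3G example.
queue_3g = QueueLimits(2048, 3072, 4096, 8)
queue_2g = QueueLimits(2048, 2048, 3072, 8)

# A job that briefly spikes to 2.5 GB RSS survives on the 3G queue ...
ok = job_survives(queue_3g, peak_rss_mb=2560, peak_vmem_mb=3500)
# ... but would be killed on the 2G queue.
killed = not job_survives(queue_2g, peak_rss_mb=2560, peak_vmem_mb=3500)
```

The point of publishing these numbers is exactly this check: with the limits known, the experiments can predict in advance which queues a spiky job can safely target.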

So, in what follows for each site there are three areas to complete with this information along with a comment prompt should you feel the need. We will then work this information into a table which you can edit if the situation at your site changes.

Network

Pete Clarke wishes to develop an overview of current network problems and issues, together with a forward look from sites.
Jeremy's summary of the driving discussion: "We agreed the best approach is to gather input from the individual Tier-2 sites and cross-check their status against the statements made in the readiness review (especially about problems to be overcome). Following this the feedback will be shared with the UK experiment representatives to gain their input (have they seen any problems?). Once all views have been gathered a network status and forward look document will be compiled with an initial draft being ready for early December". To gather your site input the following sections are available to complete:

1. WAN problems experienced in the last year. Please include here any connectivity problems, bandwidth capping issues or other factors that led (or may have led) to the WAN becoming a bottleneck for data transfers to/from your site.

2. Problems/issues seen with site networking. If there are any internal networking issues with which you are dealing that are likely to impact user analysis or prevent the full capacity of your CPU/storage resources from being realised, please mention them here.

3. Forward look. Is your site planning any changes to its LAN or WAN connectivity over the coming 12 months? If so please give details.


Topic: Site_status_and_plans


Topic for tracking site status and plans

This topic has been created to provide a central GridPP reference page that allows the project to understand the status and plans of each site for pending and future middleware upgrades. It can be updated by the site administrators concerned or the Tier-2 (deputy) coordinators.

SL5 worker nodes

Put here the percentage of your cluster on SL5 and/or an indication of when nodes will be moved to SL5.

SRM upgrades

Record in this area the current version of the SRM and dates of any expected upgrades.