Testing an installed site

From GridPP Wiki

After an LCG site has been installed, a number of tests must be run on it to ensure that all components are operating correctly. Several testsuites, and recommended sets of test commands, exist to assist with this.

LCG Guide: "How to test an LCG2 site"

What is it?

An LCG document listing a collection of basic CLI commands that can be run to check that a site is set up correctly. These test the functionality required from a User Interface (UI), Compute Element (CE), Worker Node (WN) and Storage Element (SE).

A site which passes all the functionality tests here will not necessarily be correctly configured, but it will be sufficiently well configured for more advanced tests and monitoring to be possible.

A number of the WN tests invoke the replica manager. These will fail until the site has been included in the TestZone BDII.

How should it be used?

After a site has been newly installed, the SysAdmin should run through all the tests in the document. Once every test except the replica manager tests has passed, the site should apply to join the TestZone BDII. After inclusion in the BDII, the replica manager tests should be run.

More information

The document is available in HTML and PDF formats.

TSTG Certification Testsuite

What is it?

The TSTG Certification Testsuite was developed for use on the Certification and Testing testbed, to assist in the integration and debugging of new middleware before release.

How should it be used?

With care. The documentation provided is poor and the inline help is virtually non-existent. The Testsuite was intended to test the integration of services on an entire testbed, rather than the configuration of a single deployed site. Passing the wrong parameters may lead to an attempt to test the entire grid! It is also unclear exactly which functionality is being tested, so when a service appears to fail it is hard to know how to act on that information.

It is also unclear from the documentation provided if the TSTG suite is valid for versions LCG-2_X where X > 0.

More information

Information on obtaining and using a local copy of this can be found here.

TestZone Suite

What is it?

The suite of tests run by GOC Monitoring, which forms the basis of the Gstat GIIS Monitor (see Site Monitoring and TestZone Reports).

The suite is based upon the tests in the LCG Guide but is more automated and extensive. In addition to running the tests, the suite builds up a time series of results, and can produce an HTML-formatted summary, similar to the Gstat pages.

Unfortunately the tests, at the moment, essentially give 'pass/fail' type information, and do not attempt more complex diagnostics when failure occurs.

How should it be used?

Download the testsuite to a suitable UI and run the tests, with your site as an input parameter, on a regular basis. If your site fails a GOC monitoring test, this is the best suite to use to confirm that the problem has been fixed.

There is a single script:
lcg2/tztests/bin/tztests
which is used to run the suite. Calling this script with no options gives the list of all options. The testsuite is designed to run on multiple sites, but can be used to monitor a single site quite easily.
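Because the suite is meant to be run regularly, it can help to keep the output of each run. The following is a minimal sketch and not part of the TestZone suite: `run_and_keep` and the output directory are hypothetical names, and the command to run is passed through unchanged because the tztests options are not listed on this page.

```shell
#!/bin/sh
# Sketch: run a command and keep its output in a timestamped file.
# run_and_keep is a hypothetical helper, not part of tztests.
run_and_keep() {
    outdir=$1; shift                      # directory for saved output
    mkdir -p "$outdir" || return 1
    stamp=$(date -u +%Y%m%dT%H%M%SZ)      # UTC timestamp, one file per second
    out="$outdir/run-$stamp.log"
    status=0
    "$@" > "$out" 2>&1 || status=$?       # capture stdout and stderr together
    echo "exit status: $status" >> "$out"
    echo "$out"                           # print the name of the saved file
    return "$status"
}
```

For example, `run_and_keep "$HOME/tztests-runs" ./lcg2/tztests/bin/tztests` would save the option listing that the script prints when called with no options. The same helper could be called from cron once the options for your site are known.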

More information

To obtain a local copy of the TestZone suite:

export CVSROOT=:pserver:anonymous@lcgdeploy.cvs.cern.ch:/cvs/lcgdeploy
export CVS_RSH=ssh
cvs co lcg2/tztests

See: http://goc.grid.sinica.edu.tw/gocwiki/SFT_Client_installation
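After the checkout it may be worth confirming that the driver script is where the text above says it should be. This is a small sketch under that assumption; `check_tztests` is a hypothetical helper name, and only the path lcg2/tztests/bin/tztests comes from this page.

```shell
#!/bin/sh
# Sketch: confirm that the checkout produced the TestZone driver script.
# check_tztests is a hypothetical helper, not part of the suite.
check_tztests() {
    root=${1:-.}                              # directory where "cvs co" was run
    script="$root/lcg2/tztests/bin/tztests"   # script path quoted on this page
    if [ ! -f "$script" ]; then
        echo "missing: $script" >&2
        return 1
    fi
    if [ ! -x "$script" ]; then
        echo "not executable: $script" >&2
        return 2
    fi
    echo "found: $script"
}
```

Run from the directory containing the checkout, `check_tztests . && ./lcg2/tztests/bin/tztests` would then print the list of options, since calling the script with no options lists them.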

Troubleshooting Guide: What to do if things go wrong

There are a number of packages and websites that can be used to help monitor a grid site. If a site fails a test or a monitoring programme reveals a problem, you will be sent an email by the GOC monitoring team. This will probably give some advice on the nature of the problem. Examining the TestZone Reports for your site should give a further indication of what is wrong, and examining the TestZone testsuite should show which grid operation failed.

Please note that if a problem is found at your site, there is a procedure by which the monitoring team interacts with your ROC to ensure it is fixed.


The GOC Wiki

What is it?

The GOC Wiki is a collection of known failures of grid sites, together with the reasons for the failures and suggestions for fixing them.

How should it be used?

When GOC Monitoring reports a failure to you, they will ideally point to an entry in the GOC Wiki, which will have helpful advice on what may be causing the problem and where to fix it.

More information

GOC Wiki

GridPP FAQ

What is it?

A webpage maintained by the GridPP deployment team.

How should I use it?

Search through it to see if it has answers to your problems.

If there is a problem not solved by the FAQ, you may suggest a new entry.

More information

The old version is here.

A more up-to-date version is being developed here.

TB-Support

What is it?

A mailing list for UK sites in GridPP. All site admins under the coverage of the RAL ROC should be subscribed to this. This mailing list is also monitored by the RAL ROC team, and is the preferred method of requesting site support in the UK. Past emails are archived and available on-line.

How should I use it?

If the GOC Wiki does not answer your question, you may find it useful to search the archives of TB-Support. If this still does not answer your question, you can email the mailing list, requesting more detailed, personal support.

More information

Archives

LCG-Rollout

What is it?

Technically, a mailing list for the LCG deployment team to pass information out to participating sites (such as for announcements of new releases). All sites participating in LCG should subscribe their site administrators to this list.

How do I use it?

LCG-Rollout is *not* intended to be a support email list, although to some extent it is used as one. Search the online archives to see whether a problem you have encountered is already known or solved. However, you should avoid emailing the list with troubleshooting queries: direct your queries to the ROC instead.

More information

Archives

About this page

This page is maintained by Olivier van der Aa.