ATLAS Site Availability and Performance (ASAP)

From GridPP Wiki

Revision as of 14:21, 7 January 2015

Since December 2013, a new way of computing availability has been in use. It is called ASAP, ATLAS Site Availability and Performance, or ATLAS_AnalysisAvailability, depending on where you look. From what I have gleaned, ASAP is a metric intended to replace ADCD site status; it is not related to the SAM tests. ASAP uses the status of the "PandaResource" of a site's analysis queue: a site is considered unavailable while its analysis queue is in test mode. This document briefly describes some of the implications of this.
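As a rough illustration of the rule above, the sketch below computes an availability fraction for one analysis queue from a timeline of its PandaResource status. It is a simplification under stated assumptions: the input format (a list of start, end, status tuples) and the status string "test" are assumptions for illustration, and every state other than test mode is counted as available. The real ASAP calculation may differ.

```python
from datetime import datetime, timedelta


def asap_availability(intervals):
    """Return the fraction of time a queue was NOT in test mode.

    intervals: list of (start, end, status) tuples covering the period of
    interest. This input format is hypothetical; "test" is assumed to be
    the status string for test mode, and all other statuses count as
    available in this simplified sketch.
    """
    total = timedelta(0)
    unavailable = timedelta(0)
    for start, end, status in intervals:
        duration = end - start
        total += duration
        if status == "test":
            unavailable += duration
    if total == timedelta(0):
        raise ValueError("intervals cover no time")
    # timedelta / timedelta gives a float in Python 3
    return 1 - unavailable / total


# Example: a queue that spends the last 6 hours of a day in test mode.
day = datetime(2015, 1, 7)
intervals = [
    (day, day + timedelta(hours=18), "online"),
    (day + timedelta(hours=18), day + timedelta(hours=24), "test"),
]
print(asap_availability(intervals))  # 0.75
```

The point of the example is that, under this rule, even a few hours in test mode show up directly as lost availability for the day, regardless of why the queue was set to test.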

How to get the important alerts

Unfortunately, at the moment, when a queue is set to test mode, the notification email is sent to cloud support and does not go to our site admins. Some site admins may be able to subscribe themselves to the list (atlas-support-cloud-uk@cern.ch) at e-groups.cern.ch. Admins without the necessary security credentials can request to be subscribed; ask Elena Korolkova, Alessandra Forti or another GridPP representative of ATLAS.

Once you are getting the alerts, it is usually easy to set up filters that pick out the messages for your site by searching the subject field for the names of the site's queues. A list of all Panda queues can be found here: dashb-atlas-ssb.cern.ch.
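A minimal sketch of such a filter in Python, for admins who process mail with scripts rather than client-side rules. The queue names are hypothetical placeholders (take the real ones from the Panda queue list), and the sketch assumes only that a queue name appears somewhere in the alert's Subject header; the exact subject format of the alerts is not documented here.

```python
from email import message_from_string

# Hypothetical queue names; replace with your site's real Panda queues.
SITE_QUEUES = ("ANALY_EXAMPLE", "EXAMPLE_PROD")


def is_site_alert(raw_message, queues=SITE_QUEUES):
    """Return True if the message's Subject mentions one of the given queues.

    Matching is a case-insensitive substring search, which mirrors a
    simple "subject contains" mail-client rule.
    """
    msg = message_from_string(raw_message)
    subject = (msg.get("Subject") or "").upper()
    return any(q.upper() in subject for q in queues)


raw = "From: someone@example.org\nSubject: ANALY_EXAMPLE set to test\n\nbody"
print(is_site_alert(raw))  # True
```

The same substring match can usually be configured directly as a rule in a mail client; a script is only worth it if you want to forward or log the alerts.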

Where to check ASAP site status

You can check your site's ASAP status here: wlcg-mon.cern.ch. Use the buttons on that page to select sites, timescales, etc.

How to find ASAP related HC detailed status records

How to use HC detailed status records to debug common scenarios

Blah Blah Blah