How can a PBS queue be flagged as out of action?
If you change the status of the PBS queue using the PBS qmgr command, MDS will pick this up and no further jobs will be submitted to your queue:
qmgr -c 'set queue %queuename% enabled=false'
qmgr -c 'set queue %queuename% started=false'
MDS will pick up this information and automatically start reporting
GlueCEStateStatus: Closed
for the queue. Jobs will then stop being assigned to the queue by the RB. Allow the queue to drain of jobs and make whatever configuration changes you need. Then, when you are happy the system is operational:
qmgr -c 'set queue %queuename% enabled=true'
qmgr -c 'set queue %queuename% started=true'
The MDS will then return the status to
GlueCEStateStatus: Production
and it will magically spring back into the list of queues available for jobs from the RB.
This is the recommended method of smoothly taking a PBS queue offline for maintenance.
Please note the Grid Operations Centre MUST be notified 24 hours in advance of any site going offline. Notification should be via the LCG-ROLLOUT mailing list. Notification of less than 24 hours will be counted as unscheduled downtime, i.e. as if your site had crashed. Notification must be for a specified period of downtime. If you exceed the specified period, the overrun is likely to be treated as unscheduled downtime.
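
The close and reopen steps above could be wrapped in a small shell helper so that both attributes are always changed together. The following is only a sketch: the function names queue_cmds and set_queue, the RUN variable and the queue name "short" are hypothetical and not part of PBS or LCG. By default it only prints the qmgr commands; it runs them only when RUN=1 is set, which assumes qmgr is available on the PBS server and that you have manager rights.

```shell
#!/bin/sh
# Hypothetical helper (not part of PBS): builds the qmgr commands that
# close or reopen a queue by setting both "enabled" and "started".

queue_cmds() {
    # $1 = queue name, $2 = "close" or "open"
    case "$2" in
        close) value=false ;;
        open)  value=true ;;
        *) echo "usage: queue_cmds QUEUE close|open" >&2; return 1 ;;
    esac
    for attr in enabled started; do
        echo "qmgr -c 'set queue $1 $attr=$value'"
    done
}

set_queue() {
    # Print the commands (dry run) unless RUN=1 is set in the environment.
    cmds=$(queue_cmds "$1" "$2") || return 1
    if [ "${RUN:-0}" = "1" ]; then
        printf '%s\n' "$cmds" | sh
    else
        printf '%s\n' "$cmds"
    fi
}
```

For example, "set_queue short close" prints the two commands for a queue called short, and "RUN=1 set_queue short open" would actually reopen it. On a typical PBS installation, "qstat -Q short" shows how many jobs are still queued or running, which may help in judging when the queue has drained.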