RAL Tier1 Resources Review 20101201

From GridPP Wiki

Agenda

Overview of purpose of meeting, review agenda and meeting mechanics (Andrew Sansum)

Tier-1 deployed capacity and near term projections (Matt Hodges)

Experiment resource usage (Andrew Lahiff)

Experiment requests and proposed allocations (Glenn Patrick)

Available before the meeting (v2):

Available after the meeting (v3):

Summary of MoU commitments

  • current status
  • do we meet the pledges (for CPU and disk, see here)
  • planned to commit

Financial situation and longer term capacity planning (Dave Britton)

Attendees

Andrew Lahiff, John Gordon, Henry Nebrensky, Dave Britton, Marcel Stanitzki, Andrew Sansum, Matt Hodges, Tim Folkes, Matt Viljoen, Dave Sankey, David Colling, Sarah Pearce.

Notes

Overview of purpose of meeting, review agenda and meeting mechanics

The purpose of the meeting is to review how we are delivering resources and our future projections, and to reconcile these with our commitments.

Tier-1 deployed capacity and near term projections

Matt Hodges began by introducing the first public interface of the new capacity planning system. We currently cannot deliver 10% of our CPU capacity to non-LHC VOs. The scheduler is configured so that LHC VOs receive the greater of the WLCG pledge and 90% of available capacity, which ensures that WLCG pledges are always met. New capacity is due to be deployed in April 2011; after that we will be able to meet the WLCG pledges and also deliver 10% of CPU capacity to non-LHC VOs. For disk, there are no problems meeting the commitments to both LHC and non-LHC VOs. The change in December reflects the recent withdrawal of SL08 disk and deployment of SL09 disk, which gives a net increase of 1.2 PB; the contingency disk is therefore able to cover the loss of the SL08 hardware. A further 1.3 PB from the 2010 procurements is scheduled to come online in April 2011. For tape, there is a deficit of allocatable capacity in 2011, covering the period between when the 2011 pledges come into effect and when T10KC tapes are expected to become allocatable, in July 2011.
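Assuming the scheduler rule means that LHC VOs receive the greater of their WLCG pledge and 90% of available capacity (the reading that guarantees pledges are always met), the CPU split can be sketched as below. This is a minimal illustration only; the function name `split_cpu`, its parameters and the figures in the usage note are hypothetical and are not taken from the capacity planning system.

```python
def split_cpu(capacity: float, lhc_pledge: float, lhc_fraction: float = 0.9) -> tuple[float, float]:
    """Return (lhc_share, non_lhc_share) of the available CPU capacity.

    Hypothetical sketch: LHC VOs get whichever is larger, their WLCG pledge
    or lhc_fraction of capacity (capped at the total capacity). Non-LHC VOs
    get whatever remains.
    """
    lhc_share = min(capacity, max(lhc_pledge, lhc_fraction * capacity))
    return lhc_share, capacity - lhc_share
```

For example, with a hypothetical capacity of 100 units and an LHC pledge of 95, `split_cpu(100, 95)` gives the LHC VOs 95 and leaves only 5 (5%) for non-LHC VOs. If capacity grows to 130 with the same pledge, the 90% term dominates and non-LHC VOs receive about 13, i.e. the full 10%.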

Experiment resource usage

Usage figures for CPU, disk and tape were presented for each VO. None gave cause for concern. A large increase in disk and tape usage since the CASTOR upgrade was noted.

Total tape usage increased almost linearly from 1600 TB to 2800 TB over the past 12 months. There is no sign that we will reach the total available tape capacity any time soon, unless tape writing by the VOs accelerates rapidly. Despite the tape capacity deficit, no problems are expected before T10KC tapes become available.
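The tape figures imply an average growth of roughly 100 TB per month ((2800 - 1600) TB over 12 months). A minimal sketch of a linear projection follows, assuming growth continues at a constant rate. The function names are hypothetical, and the capacity value in the usage note is an illustrative assumption, since the notes do not state the total available tape capacity.

```python
def monthly_rate(start_tb: float, end_tb: float, months: float) -> float:
    """Average linear growth rate in TB/month between two usage readings."""
    return (end_tb - start_tb) / months


def months_until_full(current_tb: float, capacity_tb: float, rate_tb_per_month: float) -> float:
    """Months until usage reaches capacity, assuming constant linear growth.

    Returns infinity if usage is not growing, and 0.0 if capacity is
    already reached.
    """
    if rate_tb_per_month <= 0:
        return float("inf")
    return max(0.0, (capacity_tb - current_tb) / rate_tb_per_month)
```

Here `monthly_rate(1600, 2800, 12)` gives 100.0 TB/month; with a hypothetical total capacity of 4000 TB, `months_until_full(2800, 4000, 100.0)` gives 12.0 months. A faster writing rate shortens this directly, which is why an acceleration in VO tape writing is the main risk.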

Experiment requests and proposed allocations

Requests from experiments have all been met as far as Andrew Sansum can tell. There is no shortfall in capacity, and we can meet all the commitments that Glenn proposed to allocate (apart from the recognized problem with tape capacity). Glenn maintains headroom in both CPU and disk.

There was some discussion about the disk-0 caches for ATLAS and LHCb, which are currently not included in their normal allocation. It was agreed that disk requests from experiments should include all the disk they require.

2012 running will likely cause replanning by the LHC experiments.

Summary of MoU commitments

For CPU, WLCG commitments have been met. Slightly more disk than the pledges has been deployed to LHC VOs, but only when the disk-0 pools for ATLAS and LHCb are taken into account. The intention is to deliver the full WLCG pledges in addition to the disk-0 pools, which are not part of the allocation. For ATLAS, the recent high-water mark was in October, when the non-disk-0 deployment missed the pledge by less than the capacity of one server, and a further 180 TB of disk-0 was deployed. Since then there has been a loss in capacity, and the removal of SL08 disk servers has further complicated things. CMS deployed disk capacity is slightly under the pledge. Andrew Sansum suggested either deploying slightly more than the pledge to give some headroom, or finding another way to ensure that commitments are met while disk servers are in intervention. Overall there are no concerns with any LHC VO.
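Whether a VO meets its disk pledge depends on whether disk-0 pools are counted, and on how much headroom is wanted to cover servers in intervention. A minimal sketch of that check, assuming deployed disk can be split into an allocated part and a disk-0 part; the class, function and parameter names and the pledge and allocation figures are hypothetical illustrations, not actual ATLAS or CMS numbers.

```python
from dataclasses import dataclass


@dataclass
class DiskDeployment:
    pledge_tb: float        # WLCG pledge for the VO
    allocated_tb: float     # deployed disk counted against the normal allocation
    disk0_tb: float = 0.0   # disk-0 cache deployed outside the allocation


def meets_pledge(d: DiskDeployment, include_disk0: bool = False, headroom_tb: float = 0.0) -> bool:
    """True if deployed disk covers the pledge plus any required headroom.

    With include_disk0=False only the allocated disk counts, matching the
    intention of delivering the pledge in addition to the disk-0 pools.
    """
    deployed = d.allocated_tb + (d.disk0_tb if include_disk0 else 0.0)
    return deployed >= d.pledge_tb + headroom_tb
```

For a hypothetical VO with a 1000 TB pledge, 980 TB allocated and 180 TB of disk-0, `meets_pledge(d)` is False while `meets_pledge(d, include_disk0=True)` is True, which is the pattern described for ATLAS. Setting `headroom_tb` to roughly one disk server's capacity is one way to express the suggestion that commitments should still be met with a server in intervention.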

Financial situation and longer term capacity planning

Dave Britton said that the only news since the last resource meeting is the outcome of the CSR exercise. The outcome was better than many had feared, but there are still unknowns about how the overall CSR settlement will filter down to STFC and finally to GridPP. There are still concerns on the capital side, but there have been reassurances. We are in a regime where problems are likely to occur, but they will probably be manageable. 2012 running is a potentially big problem, but it is a problem shared across all of WLCG.

Additional documents

http://www.gridpp.ac.uk/eb/UKlongrange/tier1expts2011-v3.xls