Computing Requirements

Introduction

This page has been moved to the E-Science Twiki, which can be found here. It was moved to allow easier editing of the requirements spreadsheets. This page remains as a short summary of the various computing requirements of the main experiments at the Tier 1 site at RAL, which is roughly correct as of January 2010.

ATLAS

You can download the ATLAS requirement model from here. The computing model was based on the slide produced by Graeme Stewart, which can be found here. The input assumptions at the top of the document come from a presentation made at the ATLAS Tier 1/2/3 Jamboree held on 13/10/09; the agenda can be found here. The current model does not take into account how the computing model may be "broken" in the first year, with early data.

The ATLAS computing model has been divided into three different activities: Data Export, Simulation and Re-processing.

  • Data Export: This includes RAL's involvement in the entire ATLAS data distribution process. Under the current model RAL receives 10% of the real RAW, ESD and AOD data directly from CERN. The RAW data will be archived to tape, while the ESD and AOD will be stored on disk. RAL receives an additional 10% of the ESD from other Tier 1s. It will also receive an entire copy of the AOD, which will be distributed to the Tier 2 sites around the UK Cloud (a rough storage sketch is given after this list).
  • Simulation: This includes all Monte Carlo simulation. This is a fairly complex process, as data has to be transferred between the Tier 2s in the UK Cloud and RAL throughout. Large numbers of small HITS files are produced at the Tier 2s, which are then transferred to RAL to be merged and converted into ESD and AOD. These files are then stored on tape as well as being sent back out to the Tier 2s.
  • Re-processing: Re-processing is expected to happen 2-3 times a year. During re-processing the RAW data that is on tape will need to be run over again in order to produce new ESD and AOD. The ESD then needs to be copied to another Tier 1, while the AOD needs to be copied to all the other Tier 1s. RAL will also receive a similar amount of data from the other Tier 1s. The re-processing rate is expected to be significantly faster than the data-taking rate.
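
As a rough illustration of how the Data Export fractions above translate into storage at RAL, the short Python sketch below applies the quoted shares (10% of RAW/ESD/AOD from CERN, a further 10% of the ESD from the other Tier 1s, and a full AOD copy) to placeholder yearly data volumes. The volumes are invented for illustration only; the real input numbers are in the spreadsheet linked above.

# Back-of-the-envelope sketch of the ATLAS Data Export shares described above.
# The yearly RAW/ESD/AOD volumes are placeholders, not figures from the ATLAS
# model; only the share fractions come from the text.

yearly_volume_tb = {"RAW": 3200.0, "ESD": 1000.0, "AOD": 200.0}  # placeholder totals

ral_share_tb = {
    # 10% of the RAW from CERN, archived to tape
    "RAW to tape": 0.10 * yearly_volume_tb["RAW"],
    # 10% of the ESD from CERN plus a further 10% from other Tier 1s, kept on disk
    "ESD to disk": (0.10 + 0.10) * yearly_volume_tb["ESD"],
    # a full AOD copy on disk, for distribution to the UK Cloud Tier 2s
    "AOD to disk": 1.00 * yearly_volume_tb["AOD"],
}

for item, volume in ral_share_tb.items():
    print(f"{item}: {volume:.0f} TB/year (illustrative only)")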

CMS

The computing model was based on a presentation given on 18 Jun 2009 by M. C. Sawley (an earlier version of this presentation can be found here). All numbers in the spreadsheet in column D with a grey background are input parameters, which were obtained from the previously mentioned presentation and from a presentation given on 17 Sep 2009 by C. Grandi & D. Bonacorsi.

The CMS computing model spreadsheet has been divided into four different activities: Data Taking, AOD Sync, Re-processing and Simulation.

  • Data Taking: Each Tier 1 receives a share of the real RAW and RECO data based on the relative CPU capacities of the sites (see the sketch after this list). RAL receives 11% of the real RAW and RECO data directly from CERN, which is archived to tape. The RAW data is automatically pre-staged on disk.
  • AOD Sync: Each Tier 1 stores a full copy of the AOD data. The AOD data produced at RAL is copied to all other Tier 1s, and RAL receives all the AOD data produced at other Tier 1s (note that this step is not being carried out yet, as the AOD data is not yet ready to be used by scientists).
  • Re-processing: Re-processing is expected to happen 3 times per year. During re-processing the RAW data that is on tape will need to be run over again in order to produce new RECO and AOD.
  • Simulation: Monte Carlo production takes place mainly at the Tier 2s, with some exceptions at Tier 1s. Tier 2s devote half of their computing time to MC production throughout the year. MC data from the Tier 2s in the UK is transferred to RAL for storage.
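
The share mechanism in the Data Taking bullet (each Tier 1's fraction follows its relative CPU capacity) can be illustrated with the short sketch below. The site capacities are hypothetical numbers chosen only so that RAL comes out at roughly the 11% quoted above; the real inputs are in the CMS spreadsheet.

# Minimal sketch of the CPU-capacity-based share described above. The
# capacities are hypothetical, chosen so that RAL works out at about 11%.

tier1_cpu = {
    "RAL": 2200.0,
    "FNAL": 6000.0,
    "CNAF": 2400.0,
    "IN2P3": 2600.0,
    "KIT": 2800.0,
    "PIC": 1400.0,
    "ASGC": 2600.0,
}

total_cpu = sum(tier1_cpu.values())
for site, cpu in tier1_cpu.items():
    print(f"{site}: {cpu / total_cpu:.1%} of the RAW/RECO data")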

LHCb

The full details of the LHCb computing model are available at https://twiki.cern.ch/twiki/pub/LHCb/ComputingModel/Dataflows.pdf

As a brief overview, within LHCb at present there are two classes of sites: ones which have data (CERN and the Tier-1s) and ones which do not (the Tier-2s). CERN and all the Tier-1s perform essentially identical actions, with the exception that CERN is the primary data repository and so has a copy of *all* the raw data.

Jobs (whether production or user) can go to any available grid site supporting LHCb if the job does not need input data. If the job needs to run on input data, the sites where the data is available are chosen after consulting the LHCb LFC, and the first site that picks up the job gets it (a simple sketch of this matching is given below).
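
A minimal sketch of that data-driven matching, assuming a toy replica catalogue in place of the real LHCb LFC (all site names and file paths below are hypothetical):

# A job with no input data may run at any LHCb-enabled site; a job with input
# data is restricted to the sites holding replicas of every input file.

lhcb_sites = {"CERN", "RAL", "CNAF", "IN2P3", "GRIDKA", "NIKHEF", "PIC"}

replica_catalogue = {  # hypothetical LFN -> sites holding a replica
    "/lhcb/data/run1234/raw_001": {"CERN", "RAL"},
    "/lhcb/data/run1234/raw_002": {"CERN", "RAL", "CNAF"},
}

def candidate_sites(input_lfns):
    """Return the sites eligible to run a job with the given input files."""
    if not input_lfns:
        return set(lhcb_sites)                      # no input data: any site will do
    sites = set(lhcb_sites)
    for lfn in input_lfns:
        sites &= replica_catalogue.get(lfn, set())  # keep only sites with every file
    return sites

print(candidate_sites([]))                          # all sites
print(candidate_sites(["/lhcb/data/run1234/raw_001",
                       "/lhcb/data/run1234/raw_002"]))   # {'CERN', 'RAL'}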

All LHCb jobs come to the grid through DIRAC, which implements a complete set of tools to receive and run jobs and return their output after completion. A set of job prioritisation rules is also in place, with the ability to modify these priorities as needed, so there is no need to worry about this at the site level. The jobs are run through "Generic Pilots", which are capable of running multiple jobs in series if they find that there is enough time left in the pilot job slot on the worker node (see the sketch below).
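
A minimal sketch of that pilot behaviour, using a toy job queue and runtime estimates (none of the names below are DIRAC API calls):

import time
from collections import deque
from dataclasses import dataclass

@dataclass
class Job:                      # toy payload job, not a DIRAC object
    name: str
    estimated_runtime_s: float

    def run(self):
        print(f"running {self.name} (~{self.estimated_runtime_s:.0f}s)")

def run_pilot(job_queue, slot_length_s, safety_margin_s=600.0):
    """Run payload jobs in series while enough time remains in the batch slot."""
    start = time.time()
    while job_queue:
        remaining = slot_length_s - (time.time() - start)
        job = job_queue[0]
        if job.estimated_runtime_s + safety_margin_s > remaining:
            break               # not enough slot time left for this payload
        job_queue.popleft()
        job.run()               # execute the payload, then look for more work

run_pilot(deque([Job("reco_0001", 4 * 3600), Job("user_ana_42", 2 * 3600)]),
          slot_length_s=48 * 3600)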

Processing of data (whether from the detector or from Monte Carlo simulation) at CERN and the Tier-1s involves two stages before the data is available for user analysis:

  • Reconstruction - where the individual particles and some other event-specific quantities are reconstructed from the raw data (d0t1) and written out to a file that is uploaded to the SE at the end of the job (d1t1 or d1t0, as the case may be). This requires access to the raw data, which is usually on d0t1. There are typically 2-4 reconstruction passes on the data each year.
  • Stripping - where the output of the reconstruction is filtered with selection algorithms to remove the obvious background events. This step can perform re-reconstruction of the events, and thus the inputs come from the d1t1/d1t0 and d0t1 service classes. The output goes to the d1t0/d1t1 service class. There can be 4-6 stripping passes each year (the two stages are summarised in the sketch after this list).
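
The inputs, outputs and pass rates of the two stages can be written down compactly, as in the sketch below. The storage-class labels are read in the usual dNtM way (N disk copies, M tape copies); the structure itself is only an illustration of the text above.

# Compact summary of the two LHCb processing stages described above.
lhcb_processing = {
    "Reconstruction": {
        "input_classes":  ["d0t1"],                  # raw data, usually tape-backed
        "output_classes": ["d1t1", "d1t0"],          # uploaded to the SE at end of job
        "passes_per_year": (2, 4),
    },
    "Stripping": {
        "input_classes":  ["d1t1", "d1t0", "d0t1"],  # may re-reconstruct events
        "output_classes": ["d1t0", "d1t1"],
        "passes_per_year": (4, 6),
    },
}

for stage, info in lhcb_processing.items():
    low, high = info["passes_per_year"]
    print(f"{stage}: {info['input_classes']} -> {info['output_classes']}, "
          f"{low}-{high} passes per year")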

Note here that the reconstruction and stripping of the data can (and indeed does) take place at all the Tier-1s and at CERN. Tools are in place to ensure that a given file is processed only once in a given pass, and failed processing passes are recovered automatically.

Other

The other experiments make up only 10% of the computing requirements at the UK Tier 1 site.