Setting up a new Virtual Organisation (VO)

From GridPP Wiki

This page is intended to become an example of how a model VO gets up and running on the grid.

You are strongly advised to discuss this with your friendly local sysadmin.

VO Setup

  • Choose a name and request that the VO be created
    • This must be of the format of a DNS name e.g
    • The VO must have control of the domain name referenced in the VO name
    • We would advise that the DNS name reflect the scope/ownership of the VO i.e. a domain would not be suitable for an international experiment - unless it is UK led.
  • Request that the VO be enabled at sites.
    • We recommend starting with a small number of sites, getting things working, and then expanding.
  • Get certificates for users who need them. Certificates are personal and must not be shared, but it is sometimes possible to get permission to run jobs with a "portal" certificate.
    • Certificates are required for every user who will submit directly to the grid.
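
As a rough illustration of the naming rule above, the following sketch checks that a proposed VO name is at least syntactically a DNS name. It is only a sanity check under stated assumptions: the function name and the example name `myexperiment.example.org` are hypothetical, the lower-case-only restriction is an assumption, and the check says nothing about whether the VO actually controls the domain.

```python
import re

# One DNS label: 1-63 characters drawn from letters, digits and hyphens,
# with no leading or trailing hyphen. Lower case only is an assumption here.
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def looks_like_dns_name(vo_name: str) -> bool:
    """Return True if vo_name is syntactically a multi-label DNS name."""
    if len(vo_name) > 253:
        return False
    labels = vo_name.split(".")
    # A bare word such as "myexperiment" is not a domain-style name,
    # so require at least two labels.
    return len(labels) >= 2 and all(_LABEL.fullmatch(label) for label in labels)


if __name__ == "__main__":
    # Hypothetical candidate names, for illustration only.
    for candidate in ("myexperiment.example.org", "MyExperiment", "-bad.example.org"):
        print(candidate, looks_like_dns_name(candidate))
```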

Job and data management

Bookkeeping and managing failures are what seem to cause VOs the most difficulty.

Job Management

Several software packages may help manage sets of jobs.

  • Ganga
  • Dirac
  • PanDA (currently ATLAS only)

Use MyProxy to let jobs renew their proxy certificate and so run for longer than 24 hours.

Set up privileged users for priority tasks.

Software deployment

  • CVMFS is now the recommended method for new VOs to deploy their experimental software.
    • RAL provide a CVMFS server for EGI VOs
  • Classic sgm jobs
    • Via GitHub (see SNO+ example)

Data management

See also Data Management for further details; it is mainly aimed at sites but may also be of interest to (new) VOs.

Bookkeeping is a major headache.

  • ATLAS's Rucio may help, but is not yet available.
    • Dirac may help here too
  • Keep multiple copies of data you care about, to protect against:
    • Catastrophic failure of a site (e.g. fire)
    • Failure of a tape
    • Accidental deletion (by you or the site)
    • Corruption
    • Unavailability due to site downtime
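
One way to keep an eye on the "multiple copies" rule is to list the files that are held at too few distinct sites. The sketch below assumes you can export (file name, site) pairs from whatever catalogue you use; the function name, the input format, and the site and file names are hypothetical.

```python
from collections import defaultdict


def under_replicated(replicas, min_sites=2):
    """Return the sorted names of files held at fewer than min_sites distinct sites.

    replicas is an iterable of (file_name, site_name) pairs, for example
    exported from a file catalogue (the export format is an assumption).
    """
    sites_by_file = defaultdict(set)
    for file_name, site_name in replicas:
        # A set ignores duplicate entries for the same file at the same site.
        sites_by_file[file_name].add(site_name)
    return sorted(
        name for name, sites in sites_by_file.items() if len(sites) < min_sites
    )


if __name__ == "__main__":
    # Hypothetical catalogue export.
    example = [
        ("/myvo/raw/run001.dat", "SITE-A"),
        ("/myvo/raw/run001.dat", "SITE-B"),
        ("/myvo/raw/run002.dat", "SITE-A"),
    ]
    print(under_replicated(example))  # prints ['/myvo/raw/run002.dat']
```

Counting distinct sites rather than replicas matters here: two copies at the same site do not protect against a site-wide failure or downtime.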

  • Checksum data
    • On transfer
    • Ask supporting sites to checksum files to catch silent data corruption (see CERN paper)
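
A minimal sketch of checksumming a file before and after transfer, using only the Python standard library. Adler-32 is chosen here as an assumption; use whichever algorithm your storage and transfer tools actually report. The function names are hypothetical.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 checksum of a file as an 8-digit lower-case hex string.

    The file is read in chunks so that large data files do not have to
    fit in memory.
    """
    value = 1  # Adler-32 starting value
    with open(path, "rb") as handle:
        while chunk := handle.read(chunk_size):
            value = zlib.adler32(chunk, value)
    return f"{value & 0xFFFFFFFF:08x}"


def checksums_match(source_path, copy_path):
    """Return True if two local files have the same Adler-32 checksum."""
    return adler32_of_file(source_path) == adler32_of_file(copy_path)
```

Recording the checksum alongside the file name at creation time gives you a reference value to compare against later, which is what makes silent corruption detectable.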

  • Use privileged users (so that normal users cannot delete vital data)
  • Use FTS to transfer large sets of data (but catch failures)
  • Register files in LFC?
    • Dirac has the Dirac File catalog as well
  • Use consistent mapping from LFC name to SURL (cf CMS trivial file catalogue)
    • This is enforced by the Dirac File catalogue
  • Check consistency between the LFC and the data actually held at sites
  • Federated access to storage via WebDAV may be possible in the future.
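
The consistent name mapping and the catalogue consistency check mentioned in the list above can be sketched as follows. This is an illustration under assumptions, not the behaviour of any particular catalogue: the rule "site prefix + logical name", the function names, and the example prefix `srm://se.example.org/pool/myvo` are all hypothetical.

```python
def lfn_to_surl(lfn, site_prefix):
    """Map a logical file name to a storage URL by a fixed rule: prefix + LFN."""
    if not lfn.startswith("/"):
        raise ValueError(f"logical file name must be absolute: {lfn!r}")
    return site_prefix.rstrip("/") + lfn


def compare_catalogue_with_site(catalogue_lfns, site_surls, site_prefix):
    """Compare what the catalogue expects at a site with what the site holds.

    Returns (missing, unregistered):
      missing      - expected from the catalogue but absent from the site listing
      unregistered - present at the site but unknown to the catalogue
    """
    expected = {lfn_to_surl(lfn, site_prefix) for lfn in catalogue_lfns}
    actual = set(site_surls)
    return sorted(expected - actual), sorted(actual - expected)


if __name__ == "__main__":
    prefix = "srm://se.example.org/pool/myvo"  # hypothetical site prefix
    catalogue = ["/raw/run001.dat", "/raw/run002.dat"]
    site_listing = [prefix + "/raw/run001.dat", prefix + "/raw/old_test.dat"]
    missing, unregistered = compare_catalogue_with_site(catalogue, site_listing, prefix)
    print("missing:", missing)
    print("unregistered:", unregistered)
```

Because the mapping is a pure function of the logical name, the expected storage path can always be recomputed, so a plain site file listing is enough to find both lost files and files that take up space without being registered.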

LHC-like data model

RAW data: copy at CERN (Tier-0), 2 copies at Tier-1 sites. Custodial storage on tape (but may have disk copy too).

Processed data: reprocessed every few months at Tier-1 sites (with newer versions of software), with copies kept at Tier-1. Distributed to Tier-2 sites for user analysis.

Simulation: Monte Carlo simulations are carried out at Tier-2 sites. Results are packaged up and archived at Tier-1, and copies are shipped out to Tier-2 sites.