Difference between revisions of "GridPP VO Incubator"

From GridPP Wiki

Revision as of 13:05, 20 February 2017

This page is for monitoring the progress of new(ish) GridPP VOs.

  • PoC - Point of Contact

All new VOs

These tasks will need to be completed for all new VOs.

Action ID Action description Owner Target date Status Date closed Notes
VOI-GEN-001 Deploy test software to RVO CernVM-FS repositories. Duncan, Daniela, Gareth, Alessandra, Ewan 2015-05-26 Closed 2016-08-23 New users in the Regional VOs will need to run test jobs using software in the RVO CernVM-FS repositories. This test software will need to be uploaded by the RVO admins. Instructions for doing this can be found here. Tested for vo.londongrid.ac.uk (--Daniela)
VOI-GEN-002 Write up the VO registration procedure Tom 2015-05-31 Closed 2016-08-23 Guide started here - comments and feedback appreciated. Use gridpp guide.

DEAP3600

PoC: Jeremy Coles (JC)

  • Update requested February 2016.
  • No response as of 21st March 2016
  • 24th May: Awaiting main local user at RHUL to begin activities.
  • 23rd Aug: DEAP3600 will generate around 10TB of (calibrated) data per year for 5 years, starting this year I think. The original (much larger) raw data are backed up on tape in Canada, but the calibrated data are not. For reasons of backup and access, we were hoping it would be possible to get these 50TB of calibrated data stored at the Tier0 at RAL.

The model would be to use RAL only for custodial data storage and to copy data as needed to Tier2 sites such as RHUL for analysis. There will also be around 60 TB (possibly x 2 generations) of MC, which will be kept only at a Tier2 because it can be regenerated if it is lost.
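The storage figures in the note above can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic: the constant and function names are illustrative, and the numbers are the ones quoted in the note.

```python
# Figures quoted in the DEAP3600 note; the constant names are illustrative.
CALIBRATED_TB_PER_YEAR = 10
DATA_TAKING_YEARS = 5
MC_TB_PER_GENERATION = 60
MAX_MC_GENERATIONS = 2  # "possibly x 2 generations"


def custodial_tb() -> int:
    """Calibrated data to be held custodially at RAL, in TB."""
    return CALIBRATED_TB_PER_YEAR * DATA_TAKING_YEARS


def mc_tb_range() -> tuple[int, int]:
    """Lower and upper bound on Monte Carlo volume kept at a Tier2, in TB."""
    return MC_TB_PER_GENERATION, MC_TB_PER_GENERATION * MAX_MC_GENERATIONS


if __name__ == "__main__":
    low, high = mc_tb_range()
    print(f"Custodial storage at RAL: {custodial_tb()} TB")
    print(f"MC at a Tier2: {low}-{high} TB")
```

The custodial total comes out at the 50TB mentioned in the note; the MC volume depends on whether one or two generations are kept.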

vo.DiRAC.ac.uk

PoC: Jens Jensen, Brian Davies

  • 03/05/16: Durham have now moved 940TB in 6 months. Expect ~2.5PB in total from Durham.
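As a rough guide to how long the remaining Durham transfer might take, the sketch below extrapolates from the figures in the update above. It assumes the average rate so far continues unchanged and that 1 PB = 1000 TB; both are assumptions, and the function name is illustrative.

```python
TB_PER_PB = 1000  # assumption: decimal units


def months_to_finish(moved_tb: float, elapsed_months: float, total_tb: float) -> float:
    """Months still needed if the average transfer rate so far is maintained."""
    rate_tb_per_month = moved_tb / elapsed_months
    remaining_tb = max(total_tb - moved_tb, 0.0)
    return remaining_tb / rate_tb_per_month


if __name__ == "__main__":
    # 940 TB moved in 6 months, ~2.5 PB expected in total (figures from the update).
    months = months_to_finish(940, 6, 2.5 * TB_PER_PB)
    print(f"Average rate: {940 / 6:.0f} TB/month")
    print(f"Estimated time remaining: {months:.1f} months")
```

The estimate is only as good as the constant-rate assumption; the total itself is quoted as approximate.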


Action ID Action description Owner Target date Status Date closed Notes
VOI-DIRAC-001 Set up VO in GridPP JJ 2015-04-30 Closed
VOI-DIRAC-002 Register DiRAC with EGI JJ 2015-05-31 Closed [1]
VOI-DIRAC-003 Write up DiRAC site setup document LH 2015-08-31 Closed Version 1.3 circulated to DIRAC-USERS
VOI-DIRAC-004 Re-evaluate data packaging method JJ 2015-11-30 Closed 2016-01-31 Lydia re-engineered the packaging, focusing on tar, after extensive testing with GridPP
VOI-DIRAC-005 Restart transfers with new method (VOI-DIRAC-004) LH 2015-12-10 Closed 2016-01-31 Successful large data transfer over new year 2016: http://gridpp-storage.blogspot.co.uk/2016/01/update-on-vodiracacuk-data-mopvement.html A slight error in the script (leading to some failures) is being fixed.
VOI-DIRAC-006 Get Leicester ready for transfer JJ 2015-11-30 Closed 2016-07-05 Currently (March '16) debugging script

05/07/16 Believe all work at Leicester is complete

VOI-DIRAC-007 Request or requirement for ACL between sites. BD 2015-11-30 Closed 2016-12-13 Waiting to hear whether keeping data access separate between DiRAC sites is a request or a requirement. If needed, new VOMS roles and CASTOR setup may need to be enabled.

03/05/16 update: The VO does indeed need separation. This requires voms-proxy usage on transfers and groups to be enacted in VOMS. The namespace configuration also needs setting to separate different users.

23/05/16 Now plan (if acceptable to VO) to explicitly amend the gridmapfile to assign specific DNs associated with each site to their own uid within CASTOR.

28/06/16 This is acceptable to VO.

05/07/16 Awaiting deployment by T1.

29/11/16 Still in progress. Further sites are closer to needing data transfer; other sites need to be implemented.

13/12/2016 Implemented.

VOI-DIRAC-008 Identify next (third) DiRAC site LH 2016-03-22 In Progress 28/06/16 LH contacted to find out who the contacts at other sites are (BD points out that these are Cambridge and ECDF, so may know them or their colleagues from GridPP).

07/02/17 Contact details update: Contact at Durham is Lydia Heck. Contact at Cambridge (HPC) is Stuart Rankin. Contact at Edinburgh (EPCC) is Linda Dewar. Contact at Leicester is Jon Wakelin. Contact for Cambridge (DAMTP) may well be Juha Jäykkä. (Need to confirm this second contact with LH.)

VOI-DIRAC-009 Renewal of Robot certificate at Durham LH 2016-08-05 Open Robot certificate needs to be renewed at Durham (due in one month). LH unsure of methodology.
VOI-DIRAC-010 Track data transfer from further sites BD 2016-11-29 Open Following the vo.dirac.ac.uk transfer working group meeting, DiRAC have identified initial data from non-Durham sites which need to be transferred (~700TB per site; this needs verifying).

07/02/17 First test transfers from Cambridge and Leicester now succeeding.
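The gridmapfile approach recorded under VOI-DIRAC-007 (each site's DNs mapped to their own uid) can be illustrated with a small parser. This is a minimal sketch that assumes the common grid-mapfile layout of a quoted DN followed by a local account name; the DNs and account names below are hypothetical and are not the ones deployed at the Tier-1.

```python
import shlex

# Hypothetical entries: one robot DN per DiRAC site, each with its own account.
SAMPLE_GRIDMAP = '''\
# example only, not the real Tier-1 configuration
"/C=UK/O=eScience/OU=Durham/CN=example robot" dirac_durham
"/C=UK/O=eScience/OU=Leicester/CN=example robot" dirac_leicester
'''


def parse_gridmap(text: str) -> dict[str, str]:
    """Return a DN -> local account mapping from grid-mapfile style text."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = shlex.split(line)  # handles the quoted DN containing spaces
        if len(parts) != 2:
            raise ValueError(f"unexpected grid-mapfile line: {raw!r}")
        dn, account = parts
        mapping[dn] = account
    return mapping


def accounts_are_separate(mapping: dict[str, str]) -> bool:
    """True if no two DNs share a local account, i.e. sites are kept apart."""
    accounts = list(mapping.values())
    return len(accounts) == len(set(accounts))


if __name__ == "__main__":
    site_map = parse_gridmap(SAMPLE_GRIDMAP)
    print(site_map)
    print("Sites separated:", accounts_are_separate(site_map))
```

A check like `accounts_are_separate` is one way to confirm that no two sites end up sharing a uid after the file is amended.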


EUCLID

PoC: Andrew Lahiff (AL)

Action ID Action description Owner Target date Status Date closed Notes
VOI-EUC-001 Enable /cvmfs/euclid.in2p3.fr on RAL worker nodes AL 2016-02-24 Open [2016-04-01] Now works on a test worker node. Unable to deploy changes to all worker nodes due to an issue with Aquilon. [2016-04-13] No longer have access to Edinburgh stratum 1. [2016-05-24] No change, but seems to be not needed yet.
VOI-EUC-002 Setup /cvmfs/euclid-uk.egi.eu repository CC 2016-02-26 Open [2016-05-25] Not needed yet.
VOI-EUC-003 Setup accounts on RAL UIs AL 2016-02-26 Closed 2016-05-05 All users who requested accounts have access to our UIs.
VOI-EUC-004 Run jobs at RAL AL 2016-02-26 Closed 2016-05-10 Tom Kitching has successfully run jobs.
VOI-EUC-005 Ganga + HTCondor on RAL Tier-1 AL/TW 2016-11-15 Closed 2016-11-17 Giuseppe Congedo (Edinburgh, EUCLID/GridPP) has, after an issue with running Ganga on the NFS-based Tier-1 cluster, got going with the UserGuide (now based on Ganga). Thanks to AL + Ganga team for support and prompt responses to ensure as smooth an operation as possible.

GalDyn

  • PoC: Tom Whyntie (TW)
  • UCLan: Adam Clarke (AC)
Action ID Action description Owner Target date Status Date closed Notes
VOI-GAL-001 Assist GalDyn users with CernVM creation and testing. TW 2015-02-18 Closed 2015-02-18 GalDyn users have successfully instantiated CernVMs for accessing the grid.
VOI-GAL-002 Assist GalDyn users with running test jobs on the Imperial DIRAC instance. TW 2015-02-23 Closed 2015-02-23 GalDyn users have successfully run test jobs on the Imperial DIRAC instance via a GridPP CernVM.
VOI-GAL-003 Assist GalDyn users with compiling user software on the CernVM. TW 2015-03-08 Closed 2016-02-15 The code compiles and runs, but needs to be put in a grid/CernVM-FS-friendly format.
VOI-GAL-004 Create the GalDyn CernVM-FS repository TW, CC 2015-05-05 Closed 2015-05-05 New CernVM-FS repository galdyn.egi.eu has been created on the RAL Stratum-1 for the GalDyn VO.
VOI-GAL-005 (Re)new grid certificate for AC AC 2015-02-22 Closed 2015-03-15 UK CA managed to renew the old certificate. Work on hold - user preparing for PhD viva!
VOI-GAL-006 Creating an account for a UCLan student on the Lancaster cluster TW/Robin Long (Lancaster) 2016-09-15 Open The student (visiting from China) will pick up Adam's work on grid deployment for an upcoming paper. They will have a UCLan computing account but an account on the Lancaster cluster would further speed things up. Under discussion. 2016-11-16: TW emailed Victor D (group head) for an update.

LIGO

PoC: Catalin Condurache (CC)
LIGO: Paul Hopkins (PH)
Other people: Andrew Lahiff (AL)

Action ID Action description Owner Target date Status Date closed Notes
VOI-LIGO-001 Create the LIGO CernVM-FS repository CC 2014-12-01 Closed 2015-02-15 New CernVM-FS repository ligo.egi.eu has been created on the RAL Stratum-1 for the LIGO VO.
VOI-LIGO-002 Assist LIGO users with using Condor + nordugrid to access ARC-CE@RAL. AL, CC 2015-12 Closed 2016-02-12 Test jobs submitted from LIGO Condor instance to ARC-CE service at RAL were successful.
VOI-LIGO-003 Plan to run proper analysis jobs with scientists' involvement PH, CC 2016-02-12 Open [2016-05-24] Still chasing scientists to run analysis jobs. Some promises.
VOI-LIGO-004 Get file storage working via the GridPP CernVM. PH, CC 2016-02-24 Closed 2016-03-08 PH managed to get file transfers working with the GridPP CernVM using bridged networking and getting the VM registered on the university network.

LOFAR

PoC: George Ryall (GR), from April 2016 - Alex Dibbo (AD)
21/03/16: LOFAR should be in a position to perform an analysis run on a limited number of VMs with real data in the next few weeks. (GR)
25/05/16: Note that LOFAR is a VO supported under 'STFC', not GridPP. Communication with SCD cloud users is good. A recent issue with the cloud storage/CEPH has led to less recent activity.

LSST

PoC: Alessandra Forti (AF)
Other people: Joe Zuntz (JZ), Andy Washbrook (AW), Ewan McMahon (EM), Steve Jones (SJ), Catalin Condurache (CC), Daniela Bauer (DB), Marcus Ebert (ME), Kashif Mohammad (KM), Dan Traynor (DT), Andrew Lahiff (AL), Gareth Roy (GR), Matt Doidge (MD)

Action ID Action description Owner Target date Status Last update Notes
VOI-LSST-001 Ganga direct job submission using Northgrid AF, JZ 2015-01-31 Closed 2015-12-11 Do direct job submission testing using northgrid infrastructure and ganga
VOI-LSST-002 Get European users on the LSST VOMS server at FNAL AF, JZ 2015-02-28 Closed 2015-12-11
VOI-LSST-003 Enable LSST at sites AF, AW, EM, SJ 2015-06-30 Closed 2015-12-11 Get the correct configuration. Sites get the info from Operations portal which has some obsolete information about VOMS. We should ask to fix it.
VOI-LSST-004 Find which LSST CVMFS stratum0 is usable by us AF 2015-08-10 Closed 2015-12-11 3 instances, in France, OSG and FNAL. Chosen FNAL
VOI-LSST-005 Get the repository at FNAL and replicated at RAL AF, CC 2015-08-25 Closed 2015-12-11 Repository automatically mounted at EGI sites as part of OSG EGI agreement, but was not replicated on any EGI stratum1. https://ggus.eu/index.php?mode=ticket_info&ticket_id=115335


VOI-LSST-006 Run jobs using LSST CVMFS using direct job submission and ganga JZ 2015-09-30 Closed 2015-12-11 Joe uploaded the software and used it to run jobs with direct job submission using lsst VO
VOI-LSST-007 Enable LSST on Dirac AF, DB, AW, EM, SJ 2015-09-30 Closed 2015-12-11 Got the Dirac pilot DN assigned to the pilot role on VOMS, enabled the VO on Dirac, enabled pilot at sites, tested submission and fixed misconfigured sites. JIRA
1) https://its.cern.ch/jira/browse/GRIDPP-22
2) https://ggus.eu/index.php?mode=ticket_info&ticket_id=117585
3) https://ggus.eu/index.php?mode=ticket_info&ticket_id=117586
VOI-LSST-008 Test Ganga Dirac setup AF Closed 2015-12-11 1) https://its.cern.ch/jira/browse/GRIDPP-22
2) https://github.com/ganga-devs/ganga/issues/45
3) https://its.cern.ch/jira/browse/GRIDPP-29

Since gfal_util doesn't work, instead of debugging it I've started to look into the dirac file catalogue clients, which have better compatibility chances on top of having the file catalogue. See VOI-LSST-014

VOI-LSST-009 voms-proxy-init EMI-3 not working AF, DB Closed 2015-04-05 FNAL upgraded the VOMS to a version compatible with EMI-3 clients. Ticket https://ggus.eu/index.php?mode=ticket_info&ticket_id=117587 closed. Ticket for VOMS developers: https://ggus.eu/index.php?mode=ticket_info&ticket_id=114044 closed.
VOI-LSST-010 Update the Operations portal with correct VOMS info AF Closed 2015-01-05 LSST US managers added voms1 and voms2 to the ops portal. http://operations-portal.egi.eu/vo/view/voname/lsst
VOI-LSST-011 Long lived proxies AF Closed 2016-04-05 Ticket with FNAL RITM0302478 was closed after we moved to dirac and we could run jobs longer than 24h with the renewal mechanism.
VOI-LSST-012 Check with NERSC how to use gridftp, authentication and authorisation mechanisms JZ, ME, AF Closed 2016-01-11 NERSC account required to do the transfers. Marcus now has an account.
VOI-LSST-013 Test gridftp transfers with NERSC JZ, ME Closed 2016-01-25 Data copied from NERSC with globus-url-copy
VOI-LSST-014 Investigate Dirac file catalogue usage AF, JZ Closed 2016-02-22 https://its.cern.ch/jira/browse/GRIDPP-30
More info in
https://groups.google.com/forum/#!topic/diracgrid-forum/sclcLrQBPFY.
Transferring files to Manchester with the correct directory naming scheme is working. See VOI-LSST-015.
VOI-LSST-015 Spread the LSST data on different sites AF Closed 2016-11-22 https://its.cern.ch/jira/browse/GRIDPP-31
This was completed in May
VOI-LSST-016 Move to use gridpp VO because GridPP management requested to run an analysis with any "means possible" AF, JZ, RF, DB, KM Closed 2016-04-05 We went back to LSST and dirac
VOI-LSST-017 Edinburgh LSST data access ME Closed 2016-02-08 The local time out problem with the data access was solved and data is available now for jobs and transfers without timing out
VOI-LSST-018 Adapt Joe's ganga and bash scripts to submit to dirac and use the dirac file catalog clients instead of gfal_utils. AF Closed 2016-04-05 AF adapted Joe's scripts and has been running most of the first part of the data. The new data will be processed with a completely revised ganga script after the involvement of ganga developers.
VOI-LSST-019 Run a larger sample of jobs (effectively 2500) AF, JZ Closed 2016-11-22 https://its.cern.ch/jira/browse/GRIDPP-33 Ran another batch of 5000 jobs, which mostly had a timeout problem when downloading the input. It is not clear if the problem is from the DFC or the storages. We hope the new ganga client can get better error codes and solve this more easily. Ticket for this https://github.com/ganga-devs/ganga/issues/343.
VOI-LSST-020 Implement JZ workflow in plain dirac cli ME Closed 2016-11-29 dirac cli is implemented in scripts and usable for job submission and getting OutputSandbox using a given input file list. A test running over 1000 input files was successful. Needs to be tested for overall analysis usage by JZ; it was tested to work some months ago and is available for future use if needed.
VOI-LSST-021 Copy new data from NERSC to one of the storage elements and register them in the DFC JZ, AF Closed 2016-04-30 Data have been copied to Liverpool and registered.
VOI-LSST-022 Test gfal and dirac utils on NERSC ME, JZ Closed 2016-04-29 Dirac UI can be installed in the user's $HOME on the data transfer machines at NERSC (RH6 OS installed) and then directly be used to transfer the data and register the files in the catalogue in a single step.
VOI-LSST-023 Enable LSST at RAL, QMUL, Glasgow AF, DT, AL, GR Closed 2016-11-22 Tickets opened for sites: Sites enabled LSST and were tested with LSST jobs.
https://ggus.eu/?mode=ticket_info&ticket_id=120352
https://ggus.eu/?mode=ticket_info&ticket_id=120351
https://ggus.eu/?mode=ticket_info&ticket_id=120350
VOI-LSST-024 Debug Lancaster failures AF, MD Closed 2016-11-22 More recent tests were successful.

LZ

PoC: David Colling (DC)

All Monte Carlo for the TDR was generated using Dirac and shell scripts. Only a few extra TDR jobs are being run to fix holes. After the TDR, the next step is to have full Gaudi simulations.

LZ has been successfully running large scale simulations at Imperial.
Currently the following CEs are enabled for LZ, but anything apart from Imperial needs testing (and Sheffield needs pilot roles):
Imperial: ceprod05.grid.hep.ph.ic.ac.uk:8443/cream-sge-grid.q, ceprod06.grid.hep.ph.ic.ac.uk:8443/cream-sge-grid.q, ceprod07.grid.hep.ph.ic.ac.uk:8443/cream-sge-grid.q, ceprod08.grid.hep.ph.ic.ac.uk:8443/cream-sge-grid.q
Brunel: dc2-grid-21.brunel.ac.uk:2811/nordugrid-Condor-default
RALPPD: heplnv146.pp.rl.ac.uk:2811/nordugrid-Condor-grid, heplnv147.pp.rl.ac.uk:2811/nordugrid-Condor-grid
Sheffield: lcgce1.shef.ac.uk:8443/cream-pbs-lz
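The CE strings listed above can be split into their parts when checking which endpoints still need testing. The sketch below assumes each string has the form host:port/flavour-batchsystem-queue, which matches the entries listed but is an assumption about the naming scheme rather than a definition of it; the class and function names are illustrative.

```python
from typing import NamedTuple


class ComputeElement(NamedTuple):
    host: str
    port: int
    flavour: str       # e.g. "cream" or "nordugrid"
    batch_system: str  # e.g. "sge", "pbs", "Condor"
    queue: str


def parse_ce(endpoint: str) -> ComputeElement:
    """Split 'host:port/flavour-batch-queue' into its components."""
    address, _, resource = endpoint.strip().partition("/")
    host, _, port = address.partition(":")
    # Split on the first two hyphens only, so queue names may contain hyphens.
    flavour, batch_system, queue = resource.split("-", 2)
    return ComputeElement(host, int(port), flavour, batch_system, queue)


if __name__ == "__main__":
    for ce in (
        "ceprod05.grid.hep.ph.ic.ac.uk:8443/cream-sge-grid.q",
        "dc2-grid-21.brunel.ac.uk:2811/nordugrid-Condor-default",
        "lcgce1.shef.ac.uk:8443/cream-pbs-lz",
    ):
        print(parse_ce(ce))
```

Hostnames containing hyphens (such as the Brunel CE) are handled because the host is separated at the colon before the resource part is split.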


23rd August (EK): Lancaster added. LZ running big production (Lancs, IC, Brunel, Manc, ....). Work ongoing on job submission interface.

dune

PoC: Elena Korolkova/Matt Robinson

Dune is now in production. Jobs are submitted through the gridpp dirac server. If you want to enable dune, please consult the "approved VOs" webpage.

PRaVDA

  • PoC: Mark Slater/Matt Williams (MS/MW)
  • End User: Tony Price (TP)
  • Update requested 3rd Feb., 19th Feb. 2016 by TW. TP replied 2016-03-21 - they have been busy building the actual device!
  • TP in the process of changing roles. Need to finalise the new end user.
  • 20th Feb: Asked TP about status - currently processing data so need of simulations reduced. Will pick up again when more simulation required.
Action ID Action description Owner Target date Status Date closed Notes
VOI-PRA-001 Get PRaVDA up and running with DIRAC and Ganga. MS/MW 2015-10-01 Closed 2016-03-21 TP has successfully got simulations running using DIRAC and Ganga.
VOI-PRA-002 Issues with DIRAC, Ganga and LFN names when copying data back. MS/MW 2015-03-21 In progress MS/MW assisting on the Ganga side.
VOI-PRA-003 TP changing roles. Need to make contact with new end user. MS 2016-05-23 In progress
VOI-PRA-004 Waiting on data processing to be completed and more simulations to be required. MS 2017-02-20 In progress

SKA Regional Centre

  • PoC: Andrew McNab (AM)
  • SKA: David Mulcahy (DM)
  • VO: skatelescope.eu

SNO+

  • PoC: Pete Gronbech (PG)
  • SNO+: David Auty (DA)
Action ID Action description Owner Target date Status Last update Notes
VOI-SNO+-001 Check on progress via GridPP-Support list. PG 2016-02-17 Closed 2016-03-10 See VOI-SNO+-003 - success, closing this for now.
VOI-SNO+-002 MM to join GridPP Storage meeting PG 2016-02-24 Closed 2016-02-24 MM joined the meeting to discuss requirements and various options. See minutes.
VOI-SNO+-003 MM to transfer files out of the SNO+ cavern via an FTP server. PG 2016-02-17 Closed 2016-03-10 Success after fantastic support/discussion on the GRIDPP-SUPPORT mailing list.
VOI-SNO+-004 Check on progress via GridPP-Support list (16thMay). PG 2016-05-31 On-going 2016-05-23 Help requested for setting up Condor submission tools. Dialogue started on the mailing list. See also: https://ggus.eu/index.php?mode=ticket_info&ticket_id=119167
  • 23rd August 2016: Condor installed on test server. Like other tools it is not contained - not versioned or quarantined so unable to run on production server. Not sure how to move to production. Solution is a hack and may not scale. No time to progress on IC support side. Good to resolve as US sites have this submission route.

SuperNEMO

  • PoC: Pete Gronbech (PG)
  • SuperNEMO: Ben Morgan (BM)
Action ID Action description Owner Target date Status Last update Notes
VOI-SuperNEMO-001 Check on progress via GridPP-Support list PG 2016-02-17 On going 2016-02-17
VOI-SuperNEMO-002 Resurrect the SuperNEMO VO. PG 2016-02-17 Open 2016-03-22
VOI-SuperNEMO-003 Establish the feasibility of using the Ganga interface to the Atlas Metadata Interface (AMI) PG, MS 2016-02-24 On going 2016-03-22
VOI-SuperNEMO-004 PG asked if more help required (10th April) PG 2016-05-31 On going 2016-05-23 UK SuperNEMO are having a meeting in May so will get back with an update then.

UKQCD

PoC: Jeremy Coles (JC)

  • Update requested February 2016.
  • Update 21st March: "Hoping to do more with the gridpp resources".
    • Preliminary result in conference proceedings - "Investigating Some Technical Improvements to Glueball Calculations" e-Print: arXiv:1511.09303.
  • 24th May: Will try and leverage some of the international lattice data grid stuff. Nothing immediate planned.
  • 22nd August: No recent activity or planned activities.