Present: Pete Gronbech (Chair), Dave Colling, Dave Kelsey, Claire Devereux, Tony Doyle, Andrew Sansum, Jeremy Coles, Gareth Smith, Andrew McNab, Roger Jones, Steve Lloyd, Pete Clarke, Tony Cass, Louisa Campbell (Minutes).
Apologies: Dave Britton, Tony Cass.
1. GridPP36 Agenda
PG and PC discussed themes for GridPP36 at the recent WLCG workshop. Evolution of Tier-2 and other sites is an obvious candidate, and further potential themes are being sought. Evolution in general has been suggested. UKT0 was also suggested, and possibly WLCG project outcomes (technological changes and working practices), since this will be the first meeting going into GridPP5. Agreement was reached to focus on the work we are doing that leads to work and employment in other communities. This will probably become clearer by the next GridPP meeting, recognising that some of the VOs have been slower than anticipated.
Theme suggested: “New methods of working in GridPP5”
Possible theme: “UKT0”
PC has asked for members to make suggestions for themes and email them round the PMB over the next few days for consideration.
PC has been in touch with Swindon to arrange the next oversight meeting and advised them to contact him; this will probably be pushed into May (3rd, 10th or 11th have been discussed as potential dates). A new person has taken over from Malcolm Booy in Swindon. PC advised that the initial suggestion is to concentrate on closing out GridPP4+ and leave GridPP5 until thereafter, but conversations will be taken up with Tony Medland when he returns from holiday. Updating new paperwork for GridPP5 while closing down GridPP4 would be an excessive workload, so agreement was reached that it is preferable to concentrate on GridPP4 for the moment. PC will discuss with DB on Friday to determine whether he prefers to take this forward.
ACTION 588.1: PG will discuss with DB and generate emails to update PMB later in the week.
ACTION 588.2: ALL to make email suggestions for GridPP36 themes over the next few days.
2. Researchfish Status
PG uploaded a quick PowerPoint presentation to provide an update. Page 1 is a snapshot containing common outputs (publications, collaboration, funding, etc). PG looked at it recently and several publications are due out. Tom kindly provided a recent list for ATLAS and GridPP publications. It was noted that Inspire lists 58 papers since 2014 but only 4 are listed on Researchfish; it is not clear why there is a discrepancy. In 2015 there were 70-odd. The BibTeX option is no longer available, but various search options are available on Researchfish, including Inspire, from which publications can be imported. SL will provide information on how to do this.
There have been other issues as some Sheffield Tier-2 grants are not contained on the list and some information is incomplete. The last email from STFC indicated all hardware grants should have a special code against them so that reports do not need to be produced, but this is not always happening. This is a recurring issue and becomes more complicated each year. If some staff grants are not input then a negative scoring appears against the site concerned so it is necessary to insert something to receive recognition. It is also not clear whether members can access the publication papers associated with each grant included in Researchfish. Members can sign in and click ‘GridPP Project Coordination’ then ‘Research Team’ which lists people and their associated grants, but this sometimes lists old grants that are now irrelevant – PG will email Ian Puller at STFC to update grant lists. Some consistency should be applied in the uploading of themed publications against individual grants, for example, the way PG links to GridPP, to demonstrate output.
ACTION 588.3: SL will provide PG with a script for importing lists of publications from Inspire.
ACTION 588.4: ALL to inform PG of any new roles and other items that need to be inserted into different categories and grants on Researchfish so that he can ensure all are included and circulate to PMB to check.
ACTION 588.5: PG will email Ian Puller at STFC to update the lists of current grants associated with PIs.
There have been issues, referred to previously, with engagement with VOs on the Astronomy side. This needs to be dealt with as ‘agile engagement’ so that new users can have a positive experience. This can be put onto the agenda for full discussion next week.
PC noted DB questioned a report on saturation on OPN. GS confirmed some issues had been experienced over the last couple of weeks, but the reason for this is not yet apparent. There was a lot of traffic in December, but it is not clear if the problem over the last couple of weeks relates to that. GS will investigate and report back.
ACTION 588.6: GS will investigate reasons for saturation on OPN and report back to the PMB with findings.
4. Standing Items
SI-0 Bi-Weekly Report from Technical Group (DC)
Last meeting was cancelled and no report submitted.
SI-1 Dissemination Report (SL)
## GridPP Dissemination Officer Notes for PMB
### New User Engagement Programme
* PRaVDA: TW has emailed Tony Price for an update on the proton therapy GEANT4 simulations that have been running on GridPP DIRAC.
* Climate Prediction: we have recently made contact with the Oxford team from http://www.climateprediction.net who have an interest in using GridPP resources for their simulations. They have been following the UserGuide successfully, and Ewan M has been liaising with them to establish parameters and possible use cases.
### GridPP website content requests
* If Collaboration members would like to suggest a change to the public-facing website content, it has been suggested that the GridPP JIRA group could be used to submit, discuss, and track requests. The group can be found here:
### GridPP UserGuide update
TW has added instructions for using the DIRAC File Catalog Command Line Interface to the UserGuide:
This covers using the DFC CLI to upload, manage, replicate and download data using the command line, providing the new user with a relatively gentle introduction to putting data on grid Storage Elements.
There is also a guide to some first steps with DIRAC’s metadata functionality:
Tom has agreed to add the usual half a dozen suspects from the PMB as Editors, so they can now edit the main website run by WordPress (not the Wiki part). You need to join the WordPress site at https://www.gridpp.ac.uk/wp-admin/ using the certificate in your browser (it’s automatic) and then mail me. Please don’t edit any pages that look like publicity pages though: please mail Tom about suggested changes to those, as it’s possible to break the formatting of some of the more complicated pages.
SI-2 ATLAS Weekly Review and Plans (RJ)
A Tier-1 CVMFS issue impacted ATLAS analysis and tests. Production issues are now resolved. Availability was not terribly good in January; this is being investigated and resolved. At Glasgow, for example, it is not correlated with storage issues. General issues are with availability and performance at the moment.
Quarterly report will be supplied to PG in the next couple of days.
SI-3 CMS Weekly Review and Plans (DC)
Nothing of significance to report.
SI-4 LHCb Weekly Review and Plans (PC)
Nothing much to report, but some load has been variable, e.g. now running at full capacity, but a few days ago this was much lower.
SI-5 Production Manager’s report (JC)
1. The 2016 WLCG Collaboration Workshop took place in Lisbon last week: https://indico.cern.ch/event/433164/other-view?view=standard. GridPP had about 12 people present. We will have a summary review of the event at our ops meeting tomorrow. The event was followed by a DPHEP workshop: https://indico.cern.ch/event/444264/.
2. EGI is conducting a survey about NGI services to support individual users (Long tail of science). The focus is on domain specific services that are run. The completion deadline is 12th February.
3. EGI now has a timeline for aligning Fedcloud sites to the EGI A/R procedures. Probe integration starts this month and will become critical in June.
4. EGI is preparing an “Impact Report”. GridPP has been asked to provide input which Tom has now done on our behalf by providing links to publications made possible by the infrastructure.
5. Support for our new VOs continues to show up issues. We have recently enabled some LSST users in the gridpp and made some VO changes to accommodate them. The latest issue concerning LSST is that the use of DIRAC requires a renaming of their catalogue naming schema.
6. CHEP 2016 has had a first bulletin released: http://chep2016.org/sites/default/files/bulletins/CHEP2016FirstBulletin.pdf. It is recognised as hugely expensive to attend, but costs can be cut by not booking the recommended hotels. Significantly cheaper hotels should be advertised in due course.
7. Last week there was an APEL synchronisation backlog issue that affected the global infrastructure (publication and sync tests for most sites started failing). It was fixed by the APEL team on 4th February.
JC has not yet received Tier-2 figures but will circulate when available.
SI-6 Tier-1 Manager’s Report (GS)
– There was a problem with one tape containing ATLAS data. 5898 files were lost. The files were from 2012 simulation. The tape is being sent off for analysis to help us understand why this occurred.
– This last couple of weeks has seen frequent saturation of the 10Gbit OPN link.
(I have attached a plot showing the traffic over the last month)
– The ARC CEs have been updated (to version 5.0.5) and a Condor update has also been applied. GS will look at availability for last few months and report to PMB next week.
– No change (Disk and CPU capacity orders in place).
585.1 GS to report on RAL job efficiencies before Christmas. Ongoing.
– The January CPU efficiencies have returned to normal, i.e. only the December CPU efficiencies were abnormally low.
– We were aware of batch problems during December – one of the symptoms was jobs being re-run on other worker nodes. Initially we could not see how these would affect the efficiency measurements. However, checking the Condor documentation shows that when a job is restarted like this the wallclock values are maintained across the restart but the CPUtime is reset. This would give lower efficiencies for these jobs. We will investigate how to correct this in the Condor configuration.
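As an illustration of the effect described above, the following is a minimal sketch of how resetting CPU time on a restart, while wallclock time accumulates, lowers the measured efficiency. The function names and the job figures are hypothetical and chosen only for illustration; they are not Condor accounting attributes or RAL measurements.

```python
def cpu_efficiency(cpu_seconds, wallclock_seconds):
    """CPU efficiency as the ratio of CPU time to wallclock time."""
    if wallclock_seconds <= 0:
        raise ValueError("wallclock time must be positive")
    return cpu_seconds / wallclock_seconds


def reported_efficiency(attempts):
    """Efficiency as it would be measured if wallclock time is kept
    across restarts but CPU time is reset at each restart.

    attempts: list of (wallclock_seconds, cpu_seconds), one per run attempt.
    """
    total_wallclock = sum(wall for wall, _ in attempts)
    last_cpu = attempts[-1][1]  # only the final attempt's CPU time survives
    return cpu_efficiency(last_cpu, total_wallclock)


def corrected_efficiency(attempts):
    """Efficiency when CPU time is summed over all attempts."""
    total_wallclock = sum(wall for wall, _ in attempts)
    total_cpu = sum(cpu for _, cpu in attempts)
    return cpu_efficiency(total_cpu, total_wallclock)


# Hypothetical job: restarted once, each attempt ran for 1 hour of
# wallclock time and used 54 minutes of CPU time.
attempts = [(3600, 3240), (3600, 3240)]
print(reported_efficiency(attempts))
print(corrected_efficiency(attempts))
```

In this made-up case the job is 90% efficient in each attempt, yet the reported figure is halved because the first attempt's CPU time is discarded while its wallclock time is still counted.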
587.3: GS to report back on the outcome of meetings on ClusterVision ’11. This will develop a plan on how to handle an increase caused by catastrophic failures since using old kit to replace existing kit is not effective.
– The meeting concluded there was a significant problem with the ClusterVision ’11 servers. Two main actions came out of it. The first, which was already underway, was to replace failing disk drives in this batch with drives from an older batch that had been decommissioned. These disks are from a different manufacturer and, according to our statistics, have shown a lower failure rate. The second was to update the firmware in all the RAID cards in this batch of servers. This does not reduce the disk failure rate but is expected to improve the behaviour of the system if/when a disk fails. Since that meeting it has also been agreed to purchase some disk servers for use in the tape-backed service classes. This effectively replaces these disk servers. Some concern about another batch of servers (Viglen ’12) was also noted. A follow-up meeting will take place in a week or so as we wish to maintain a close watch on these disk servers.
ACTION 588.7: GS to circulate to PMB availability for all 4 VOs for last 3 months then discuss at next week’s meeting. He currently produces a monthly report for DB.
ACTION 588.8: GS to report on ongoing disk server issues in general.
SI-7 LCG Management Board Report of Issues (DB)
Nothing to report.
REVIEW OF ACTIONS
582.4: DC to insert an update in the wiki page regarding communication with LZ. Ongoing.
585.1: GS to report on RAL job efficiencies before Christmas. Done.
585.2: DB and AM will determine who best to send to SSI Collaboration meeting and report back on outcomes. This will take place on 23 March. Ongoing.
586.1: DC will discuss proposed IT reorganisation at CERN with Tim Bell. Done.
586.2: AS will contact MoBrain and discuss resources for their EGI project. Next step is to make allocations. Ongoing.
587.1: PG will look at the EVAL information and seek a better solution for submitting the required information. He will forward to the PMB a detailed complaint from Birmingham to STFC highlighting some issues being raised at site level. Done.
587.2: AM will invite selected small, medium and large sites to contribute presentations at GridPP36 on their plans for site evolution over the next few years and construct a session around this. Ongoing.
587.3: GS to report back on the outcome of meetings on ClusterVision ’11. This will develop a plan on how to handle an increase caused by catastrophic failures since using old kit to replace existing kit is not effective. Done.
ACTIONS AS OF 08.02.16
582.4: DC to insert an update in the wiki page regarding communication with LZ. Ongoing.
586.2: AS will contact MoBrain and discuss resources for their EGI project. Next step is to make allocations. The ticket is now with Catalin – AS will check and report to the PMB next week. Ongoing.
587.2: AM will invite selected small, medium and large sites to contribute presentations at GridPP36 on their plans for site evolution over the next few years and construct a session around this.
588.1: PG will discuss with DB and generate emails to update PMB later in the week.
588.2: ALL to make email suggestions for GridPP36 themes over the next few days.
588.3: SL will provide PG with a script for importing lists of publications from Inspire.
588.4: ALL to inform PG of any new roles and other items that need to be inserted into different categories and grants on Researchfish so that he can ensure all are included and circulate to PMB to check.
588.5: PG will email Ian Puller at STFC to update the lists of current grants associated with PIs.
588.6: GS will investigate reasons for saturation on OPN and report back to the PMB with findings.
588.7: GS to circulate to PMB availability for all 4 VOs for last 3 months then discuss at next week’s meeting. He currently produces a monthly report for DB.
588.8: GS to report on ongoing disk server issues in general.