Monday 18th May 2015, 14.30 BST
Full review this week.
Other VO Nagios
At time of writing I see problems with test jobs at Brunel for pheno and Liverpool for a number of VOs (see Sno+ ticket for probable cause and fix at Liverpool).
22 Open UK Tickets this week. Going site-by-site:
APEL/NGI
113473 (4/5)
Missing accounting data for April for some sites. Raul is discussing things for Brunel in the ticket, although they have republished. I think it's only ECDF left to republish their April data. In progress (16/5)
OXFORD
113482 (26/4)
Loss of accounting data for Oxford needing an APEL republish. The Oxford guys republished, but there is some confusion over the resulting numbers. Discussion is ongoing; John G is currently looking at the records. In progress (14/5)
113650 (11/5)
CMS glideins failing at Oxford. The original problem was with a config tweak being left out of the cvmfs setup, but the ticket has been reopened citing problems persisting on the ARC CE (the CREAM appears to be fixed). Reopened (16/5)
GLASGOW
113095 (17/4)
ROD ticket about batch system BDII failures, left open to avoid unnecessary ticket filing. Gareth noted that the full migration to ARC and HTCondor, which should see the end of these issues, will hopefully be completed by the end of June. On Hold (12/5)
ECDF
95303 (31/7/13)
Somehow left this one out of the e-mail update. Edinburgh's glexec ticket, dependent on the tarball. I put in my tuppence worth today with my tarball hat on. On hold (18/5)
SHEFFIELD
113769 (18/5)
LHCb see a cvmfs problem at Sheffield. Elena has probably fixed the problem (restarted sssd); just waiting to see if it all pans out. In progress (18/5)
MANCHESTER
113744 (15/5)
For the VOMS rather than the site, Jens' request for the creation of the dirac VO, vo.dirac.ac.uk. In progress (18/5)
113692 (13/5)
A request from pheno to add support for their new cvmfs area at Manchester, and, as I understand it, to support them in a new "form" (pheno.egi.eu). In progress (13/5)
LIVERPOOL
113742 (15/5)
Sno+ noticed their nagios failures at Liverpool. Rob reckons this was a problem with the DPM BDII service certificate not being updated (that's bitten me too), and fixed things this morning. Let's see how that goes. In progress (18/5)
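For anyone bitten by the same thing, openssl's -checkend option is a quick way to spot a service certificate that has expired or is about to. A minimal sketch (the default path is an assumption - pass whatever path your DPM/BDII service certificate actually lives at):

```shell
# check_cert: warn if a service certificate is expired or expiring soon.
# First argument: path to the certificate (default path is an assumption,
# the usual spot on a grid service node). Second argument: warning window
# in days (default 14).
check_cert() {
    cert=${1:-/etc/grid-security/hostcert.pem}
    days=${2:-14}
    # -checkend takes seconds; exit status 0 means still valid then.
    if openssl x509 -noout -checkend $((days * 24 * 3600)) -in "$cert"; then
        echo "OK: $cert valid for at least $days more days"
    else
        echo "RENEW: $cert is expired or expires within $days days"
    fi
    # Print the exact expiry date for the ticket/logbook.
    openssl x509 -noout -enddate -in "$cert"
}
```

Dropping that into a cron job that mails on the RENEW branch would catch this class of failure before Nagios (or Sno+) does.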
LANCASTER
95299 (1/7/13!)
Lancaster's vintage glexec ticket. An update on this - after having a roundtuit session last week I was building glexec for different paths. It still needs some testing to make sure it works properly. However, there definitely won't be a one-size-fits-all tarball solution. On hold (15/5)
100566 (27/1/14)
Only the crustiest old tickets for us at Lancaster! Poor perfsonar performance. Sadly I didn't get a roundtuit on this one - we're pushing to get these nodes dual-stacked, as Ewan had pointed out that it would be interesting to see if IPv6 tests also saw this issue. On hold (18/5)
UCL
113721 (14/5)
The only UCL ticket, this is an EGI "low availability" ticket. However Daniela notes that the plots are on the rise, so things are looking alright. Probably want to "On Hold" it, but otherwise not much to be done. In progress (14/5)
IMPERIAL
113743 (15/5)
A ticket from Durham concerning their site's settings in the Dirac instance at Imperial. Daniela hopes to get it fixed soon. In progress (15/5)
100IT
112948 (10/4)
CA certificate update at 100IT leading to a discussion of other authentication based failures. David has asked for voms information after posting his configs. In progress (13/5)
TIER 1
113035 (14/4)
Ticket tracking the decommissioning of the Tier 1 CREAM CEs. I think things are just about done now, so this ticket can soon be closed. In progress (11/5)
109694 (28/10/14)
Sno+ gfal-copy ticket. Brian reports that the Tier 1 is upgrading gfal2 on their WNs, and notes that there's a lot of active debugging work going on in the area. As he eloquently puts it "situation is quite fluid". In progress (13/5)
108944 (1/10/14)
CMS AAA tests failing at the Tier 1. There's been a lot of work on this, deploying and then trying to get the new xrootd redirector configured. New problems have cropped up, and are under investigation. In progress (11/5)
112721 (28/3)
Atlas transfer failures ("failed to get source file size"). Tracked to an odd double transfer error, possibly introduced in one of the recent "upgrades". Brian has been declaring these files as bad, and a workaround or solution is being thought about. In progress (14/5)
113705 (13/5)
Atlas transfer failures from RAL tape. Checksum failures, which Brian tracked down to the checksums not being of a type Castor supports. Brian has asked if this can be changed at the CERN FTS or in rucio. Waiting for reply (14/5)
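For context: Castor stores adler32 checksums, so a transfer arriving with a different checksum *type* (e.g. md5) fails verification even when the data is fine. A minimal sketch of computing the adler32 value a storage element would compare, using only the Python standard library (the chunked read is just so large files don't need to fit in memory):

```python
import zlib

def adler32_hex(path, chunk=1 << 20):
    """Compute the adler32 of a file as the zero-padded 8-digit hex
    string grid transfer tools typically compare."""
    value = 1  # adler32's defined starting seed
    with open(path, "rb") as f:
        while True:
            data = f.read(chunk)
            if not data:
                break
            # zlib.adler32 accepts a running value, so we can stream.
            value = zlib.adler32(data, value)
    return f"{value & 0xFFFFFFFF:08x}"
```

Handy for a quick sanity check of whether a "checksum mismatch" in the FTS logs is a corrupt file or just the two ends speaking different checksum dialects.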
113748 (16/5)
Another atlas transfer ticket, but as the error indicates no space left at the Brunel space token being transferred to, Elena has noted that this isn't a site problem and told the submitter to put in a JIRA ticket instead. Waiting for reply, but probably can just be closed (16/5)
112866 (7/4)
Lots of cms job failures at RAL. This has been traced to some super-hot files; mitigation is being looked into. A candidate for perhaps On Holding, depending on the time frame of a workaround. In progress (13/5)
113320 (27/4)
CMS data transfer issues. I'm not actually too sure what's going on. There are files that need invalidating, which seems to be the root of the evil befalling transfers. The issue is being actively worked on though. In progress (18/5)