Difference between revisions of "RAL Tier1 weekly operations castor 09/12/2016"

Revision as of 11:30, 9 December 2016

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. Castor 2.1.15
  2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB


Operation problems

gdss650 (LHCbUser) failed on Saturday morning, 3rd Dec. It was returned to service on 6th Dec. A disk had failed, and the replacement for that disk also failed. During the RAID rebuild a further disk drive started reporting problems and was also swapped.

gdss701 (LHCbDst) was taken out of service on Saturday (3rd Dec) after it reported FSProbe errors when a disk was replaced. It was returned to service on the 5th Dec.

There was a problem on one of the Power Distribution Units to a rack in the UPS room during the early hours of Monday morning (5th Dec). This affected two network switches, which in turn affected some core services.

Operation news

Five CV11 disk servers from aliceDisk (gdss613, gdss614, gdss615, gdss616, gdss617) have been drained and decommissioned RT176040

LHCbUser and LHCbDst disk pools have now been merged. LHCbUser is in read-only mode and will be drained of data next week RT181921

Plans for next week

CV13 disk server firmware upgrade. We need to decide on an intervention plan that minimizes service disruption and is efficient enough to complete in a reasonable time.

Initiate the 2.1.15 upgrade path on preprod

Long-term projects

Castor 2.1.15 upgrade has been postponed until January 2017

First draft of castor tapeserver features completed and published for review.

Actions

Create new tape pools for dirac and update the SRM grid-map file accordingly RT1660227

RA to talk to AL about merging old CMS tape pools

Start gathering tape recall stats for ATLAS RT177612

Delete empty dirs from CASTOR (prompted by BD)

Consider moving puppetdev to new hardware or a VM (suggested by Kashif) RT177712

Staffing

Rob on A/L on Tue, Wed, Thu next week

GP on call