Difference between revisions of "RAL Tier1 weekly operations castor 09/12/2016"

From GridPP Wiki
Revision as of 10:08, 9 December 2016

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. Castor 2.1.15
  2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB


Operation problems

gdss650 (LHCbUser) failed on Saturday morning, 3rd Dec. It was returned to service on 6th Dec. A disk had failed, and the replacement for that disk also failed. During the RAID rebuild a further disk drive started reporting problems and was also swapped.

gdss701 (LHCbDst) was taken out of service on Saturday (3rd Dec) after it reported FSProbe errors when a disk was replaced. It was returned to service on 5th Dec.

There was a problem on one of the Power Distribution Units feeding a rack in the UPS room during the early hours of Monday morning (5th Dec). This affected two network switches, which in turn affected some core services.

Operation news

The LHCbUser and LHCbDst disk pools have now been merged.