Difference between revisions of "RAL Tier1 weekly operations castor 16/11/2018"

From GridPP Wiki

Revision as of 14:40, 16 November 2018

Standing agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

5. Special topics

6. Actions

7. Review Fabric tasks

  1.   Link

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

gdss736 (lhcbDst) crashed and was removed from prod; it is now back in production.

The /etc/cron.d/check_tape_pools.ncm-cron.cron file was missing from the WLCGTape headnodes. As a result, the tape pools were not topped up with free tapes and a large backlog of ATLAS canbemigrs built up (RT218153, https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=218153). This was fixed in Aquilon on Mon 12/11 and the backlog is now clearing.

Operation news

 * na62 has moved to WLCGTape
 * Repack upgraded to SL7/2.1.17-35

Plans for next few weeks

  * Decommission disk servers from ATLAS d1t0.
  * Move all needed disk servers from ATLAS d0t1 to WLCGTape (gdss893, gdss894, gdss895)
  * Move the rest of the Gen VOs to WLCGTape
  * Proceed with the cmsDisk decommissioning
  * Decommission xrootd-cms-manager

Long-term projects

  * New CASTOR WLCGTape instance. Outstanding work: create a separate xrootd redirector for ALICE

Actions

Staffing

  * RA out until a week Monday.
  * GP on call