Difference between revisions of "RAL Tier1 weekly operations castor 23/11/2018"

From GridPP Wiki
(Created page with "== Standing agenda == 1. Problems encountered this week 2. Upgrades/improvements made this week 3. What are we planning to do next week? 4. Long-term project updates (if n...")
 
(Operation news)
 
(6 intermediate revisions by one user not shown)
Line 25: Line 25:
  
 
== Operation problems ==
 
 
  * gdss736 (lhcbDst) crashed and was removed from production; it is now back in service
 
 
  * The /etc/cron.d/check_tape_pools.ncm-cron.cron file was missing from the WLCGTape headnodes; as a result the tape pools were not
 
    topped up with free tapes and a large backlog of ATLAS canbemigrs built up [https://helpdesk.gridpp.rl.ac.uk/Ticket/Display.html?id=218153 RT218153].
 
    This was fixed in Aquilon on Mon 12/11 and the backlog is now clearing.
 
  
 
== Operation news ==
 
 
 
  * Decommissioned all disk servers from ATLAS atlasStripInput and atlasTape
 
  
   * Moved all needed disk servers from atlasTape to wlcgTape (gdss893, gdss894, gdss895)
+
   * Neptune and Pluto DB patching completed on Tue
  
   * Allocated lcgcts27 and lcgcts28 to WLCGTape
+
   * Continue with deleting CMS files on cmsDisk
  
   * Migration of the Gen VOs (except Alice) to WLCGTape
+
   * Recovery of more na62 files
 
+
  * fdsdss20 and fdsdss21 were removed from Facilities facD0T1 pool and decommissioned
+
  
 
== Plans for next few weeks ==
 
Line 49: Line 39:
  
 
   * Decommission xrootd-cms-manager
 
 +
 +
  * Decommission ATLAS headnodes
 +
 +
  * Complete kernel patching on CASTOR hosts
 +
 +
  * Oracle/kernel patching for CASTOR Facilities DB
 +
 +
  * Deploy new disk servers for Facilities
  
 
== Long-term projects ==
 
Line 62: Line 60:
 
== Staffing ==
 
  
   * RA out from Thu 22/11
+
   * RA out until 10/12

Latest revision as of 10:54, 23 November 2018

Standing agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

5. Special topics

6. Actions

7. Review Fabric tasks

  1.   Link

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

Operation news

 * Neptune and Pluto DB patching completed on Tue
 * Continue with deleting CMS files on cmsDisk
 * Recovery of more na62 files

Plans for next few weeks

  * Proceed with the cmsDisk decommissioning
  * Decommission xrootd-cms-manager
  * Decommission ATLAS headnodes
  * Complete kernel patching on CASTOR hosts
  * Oracle/kernel patching for CASTOR Facilities DB
  * Deploy new disk servers for Facilities

Long-term projects

  * New CASTOR WLCGTape instance. Still to do: create a separate xrootd redirector for ALICE
  * CASTOR disk server migration to Aquilon: gdss742 has been compiled with a draft Aquilon profile,
    but there are problems with the SL7 installation (RT216885)

Actions

Staffing

  * RA out until 10/12