Difference between revisions of "RAL Tier1 weekly operations castor 23/09/2016"

From GridPP Wiki

Revision as of 15:01, 27 September 2016

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. Castor 2.1.15
  2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

Unrouted files to tape (CMS): Andrew intervened and the problem was fixed.

Operation news

Long-term projects

Stress testing on Castor 2.1.15 continues.

Development effort to migrate Castor tape servers to Aquilon continues.

Actions

RA: disk servers requiring RAID update - locate servers and plan the update with Fabric

Stress test Castor 2.1.15 on the vCert nameserver

Follow up the impact of the new WAN parameters deployed on CMS disk servers

Talk to AL about the issue with unrouted files to tape

Andrey to create a wiki page to capture the details of the DB problem that caused problems in Castor 2.1.15 draining

RA to find a machine with SL6 to be used as a spare head node

GP to come up with a procedure to deal with a failed head node

Staffing

GP on call this week, with RA as backup