Difference between revisions of "RAL Tier1 weekly operations castor 23/09/2016"

From GridPP Wiki

Revision as of 15:00, 27 September 2016

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

  1. Castor 2.1.15
  2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

Unrouted files to tape (CMS). Andrew intervened and the problem was fixed.

Operation news

Long-term projects

Stress testing on Castor 2.1.15 continues.

Development effort to migrate the Castor tape servers to Aquilon continues.

Actions

RA: disk servers requiring a RAID update - locate the servers and plan the update with Fabric

Stress test Castor 2.1.15 on the vCert nameserver

Follow up the impact of the new WAN parameters deployed on CMS disk servers

Talk to AL about the issue with unrouted files to tape

Andrey to create a wiki page to capture the details of the DB problem that affected draining in Castor 2.1.15

RA to find a backup head node with SL6 to be used as a spare head node

GP to come up with a procedure to deal with a failed head node

Staffing

GP on call this week, with RA as backup