RAL Tier1 weekly operations castor 25/11/2016

From GridPP Wiki
Revision as of 12:16, 25 November 2016 by George Patargias

Draft agenda

1. Problems encountered this week

2. Upgrades/improvements made this week

3. What are we planning to do next week?

4. Long-term project updates (if not already covered)

1. Castor 2.1.15
2. SL7 upgrade on tape servers

5. Special topics

6. Actions

7. Anything for CASTOR-Fabric?

8. AoTechnicalB

9. Availability for next week

10. On-Call

11. AoOtherB

Operation problems

gdss750 (lhcbDst) failed due to fsprobe errors and was removed from production

gdss651 (preProd) is still down RT177006

Complication with the renewal of the Gen SRM host certificates: each certificate needs to include the alternative hostnames for the different VOs. This requires talking to Jens before the certificate request is approved. Whenever a new VO is added to Gen, the certificates need to be re-issued.
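A minimal sketch of how the re-issue condition above could be checked, assuming the alternative names already in a certificate and the per-VO hostnames are both available as plain lists. The function name and the hostnames in the usage note are hypothetical and are not taken from the Gen configuration.

```python
def missing_alt_names(cert_alt_names, required_hostnames):
    """Return the required hostnames that the certificate does not list.

    Both arguments are iterables of DNS names. The comparison is
    case-insensitive because DNS names are case-insensitive.
    """
    present = {name.lower() for name in cert_alt_names}
    required = {name.lower() for name in required_hostnames}
    # Anything required but not present means the certificate must be re-issued.
    return sorted(required - present)


def needs_reissue(cert_alt_names, required_hostnames):
    """True if at least one required hostname is absent from the certificate."""
    return bool(missing_alt_names(cert_alt_names, required_hostnames))
```

For example, with hypothetical names, `missing_alt_names(["srm-vo-a.example.org"], ["srm-vo-a.example.org", "srm-vo-b.example.org"])` reports `srm-vo-b.example.org` as missing, i.e. adding that VO would require a re-issue.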

Service alarm: Castor functional test lhcbUser on host lcgcadm05 177667

gdss784 appears on Ganglia under its IP address rather than its host name

Operation news

All CV14 disk servers have been deployed into full production in lhcbDst 176041

Started draining and decommissioning the CV11 disk servers in aliceDisk 176040

Long-term projects

Castor 2.1.15 upgrade has been postponed until January 2017

First draft of castor tapeserver features almost complete

Special topics

Remake transfer rate plots for larger files (> 0.5 GB) and covering longer time periods: these requirements have been implemented in the script. The script still needs to be modified to ignore transfers that finished on the day after they started.
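The selection described above could be sketched as follows. This is not the actual plotting script: the record layout (`size_bytes`, `start`, `end`) and the function names are assumptions made for illustration, and 0.5 GB is taken here as 0.5 × 10^9 bytes.

```python
from datetime import datetime

# Assumption: the 0.5 GB threshold is decimal (0.5e9 bytes).
MIN_SIZE_BYTES = 0.5 * 10**9


def select_transfers(transfers, min_size_bytes=MIN_SIZE_BYTES):
    """Keep transfers above the size threshold that start and finish
    on the same calendar day.

    Each transfer is a dict with hypothetical keys:
    'size_bytes' (number), 'start' and 'end' (datetime objects).
    """
    selected = []
    for t in transfers:
        if t["size_bytes"] <= min_size_bytes:
            continue  # too small for the plot
        if t["start"].date() != t["end"].date():
            continue  # finished on a later day than it started
        selected.append(t)
    return selected


def rate_mb_per_s(transfer):
    """Average rate in MB/s, or None if the duration is not positive."""
    duration = (transfer["end"] - transfer["start"]).total_seconds()
    if duration <= 0:
        return None
    return transfer["size_bytes"] / 1e6 / duration
```

Comparing calendar dates rather than durations means a short transfer that happens to cross midnight is also dropped, which matches the requirement as noted in the minutes.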


Create new tape pools for dirac and update accordingly the SRM grid-map file 160227

Start gathering tape recall stats for ATLAS 177612

Present AL two alternatives to choose from: 1) Create generic fileclass/tapepool 2) Remove the "unroutable file to tape" callout to working hours

Discuss with Khash about the urgency of RAID upgrade on CV13 ds and plan the intervention

Delete empty dirs from CASTOR (prompted by BD)

Test DB upgrade to CASTOR 2.1.15

Schedule with AL a CASTOR upgrade of preprod from scratch

RA to talk to AL about merging old CMS tape pools


GP on call next week

RA away on A/L

Miguel, the new DBA, has started