RAL Tier1 weekly operations castor 15/07/2016
Latest revision as of 13:25, 15 July 2016
Minutes from the previous meeting
Operation problems
High load on one of the cmsTape servers - gdss676(?)
gdss730 and gdss654 went out of production
Draining of gdss748 is complete. The server is out of castor and handed over to the fabric team to swap back drives with gdss755
Operation news
The tape system is now fixed and is back to normal operation with all drives included. Preventive maintenance of the two robots will be carried out on a date to be agreed.
DB load was not excessive, but we need to find out why the atlas stager caused load peaks. This needs some focussed effort; perhaps what we need to do is ensure we have enough space for the logs from the primary to the backup.
The draining script is ready
Long-term projects
Work on the 2.1.15 upgrade continues, in liaison with CERN. Need to find the license under which CASTOR is distributed for the new users.
Migration to aquilon and SL7 upgrade
Staffing
GS out Tuesday
AS out Friday
RA on call (assumed, TBC)
Actions
CASTOR TEAM Durham / Leicester Dirac data - need to create separate tape pools / uid / gid
RA disk servers requiring RAID update - locate servers and plan for update with fabric
RA to decide what to do with the persistent data (for daily test) that is still on GenScratch
RA to update the doc for xroot certificates
GP to review with RA the mailing lists he is on
GP/RA to look at the stress test results for gdss596 and evaluate the WAN tuning parameters
Complete testing of the SRM DB duplicates removal script written by RA
Operation problems
CMS external xroot test is failing
gdss619 showed hardware problems and had to be set to read-only mode for RAID verify tests.
No route to tape issues for CMS due to the way file classes are set up
A number of facilities tape drives were down
An LHCb tape containing 800 files has been physically lost. Tim is chasing this up.
Operation news
Tape system library is stable
Deployment of the Dell 2015 tape buffers has started. Three of them have been deployed to the atlasNonProd service class.
Long-term projects
Not much success with fixing the 2.1.15 installation in liaison with CERN
Migration to aquilon and SL7 upgrade. Intermediate step: configure a VM as a tape server.
Staffing
CP on call next week
RA may take some time off in lieu
GP may leave earlier on certain days
Actions
CASTOR TEAM Durham / Leicester Dirac data - need to create separate tape pools / uid / gid
RA disk servers requiring RAID update - locate servers and plan for update with fabric
RA to decide what to do with the persistent data (for daily test) that is still on GenScratch
RA to update the doc for xroot certificates
GP to review with RA the mailing lists he is on
GP/RA to look at the stress test results for gdss596 and evaluate the WAN tuning parameters