RAL Tier1 weekly operations castor 29/02/2016
From GridPP Wiki
Revision as of 10:41, 26 February 2016 by Alison Packer
Operations News
- No disk server issues this week
- glibc updates applied and all CASTOR systems rebooted. Initial issues with head nodes: 7 failed to reboot due to their build history. ACTION: their Quattor build needs to be revisited so that this does not recur.
- Main CIP system failed; we have failed over to the test CIP machine. The hardware failure will be fixed, then we will fail back to the production system.
- The 11.2.0.4 DB client update had to be rescheduled and should go ahead on Monday 29th. It has been running in pre-prod for a considerable amount of time and should be transparent.
- CASTOR 2.1.15 update:
  - Name server upgrade 29 Feb - 3 March; downtime for all VOs.
  - Stager upgrade for one VO in the week commencing 21/3/16.
- Repack updated to 2.1.14-15
- 2.1.15 works on preprod (RAL xroot RPM build) but has not yet been put under stress.
- CASTOR 2.1.16 coming soon: SRM integration into the CASTOR code base.
- ATLAS gSOAP errors: JK (advised by SdW) restarted the SRM front ends.
- CMS AAA still an issue
- LHCb upload still problematic
- VO DiRAC people from Leicester are coming online.
- The 2.1.15 upgrade had its first airing at change control; 2.1.15 is currently not working for us.
- New tape-backed disk servers for the Tier1 to replace CV11; recommendation made to Martin.
- Wiki page on merging tape pools created by Shaun.
- 2.1.15 name server tested
- New SRM on vcert2
- New SRM (SL6) with bug fixes available; needs testing.
- gfal-cat command failing for ATLAS reading of nsdumps from CASTOR: https://ggus.eu/index.php?mode=ticket_info&ticket_id=117846. Developers looking to fix it under: https://ggus.eu/index.php?mode=ticket_info&ticket_id=118842
- LHCb batch jobs failing to copy results into CASTOR; changes made seem to have improved the situation but not fixed it (Raja). The number of connections to the NS DB is being increased (more threads).
- BD is looking at porting the persistent tests to Ceph.