RAL Tier1 weekly operations castor 08/08/2011
From GridPP Wiki
Revision as of 10:56, 5 August 2011 by Matt viljoen
Operations News
- Repack deployed into production with 6 disk servers
- New Facilities databases successfully stress tested with preprod. Up to 1000 db transactions per minute recorded - double the typical production peaks on Tier1 instances
- Tier1 dashboard now switched from old Castormon to new Nagios Castor monitoring replacement
- Cron on ATLAS now changed to run at 1300 rather than at night, to reduce the number of low-space-related callouts
Operations Problems
- 7 LHCb files from Sep/Oct 2010 were found to be present in NS but physically missing. Cause of their loss unknown.
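A check of this kind (entries present in the name server but with no physical copy) can be illustrated as a set difference between the two listings. The following is only a minimal sketch: the function name, the example paths, and the idea that both listings are available as plain lists of paths are assumptions made for illustration, not a description of how the missing files were actually found.

```python
from typing import Iterable, Set


def find_missing_files(namespace_paths: Iterable[str],
                       on_disk_paths: Iterable[str]) -> Set[str]:
    """Return paths recorded in the namespace that have no physical copy.

    Both arguments are hypothetical plain listings of file paths; how such
    listings would be obtained from a real system is not covered here.
    """
    return set(namespace_paths) - set(on_disk_paths)


if __name__ == "__main__":
    # Made-up example data, not taken from the report.
    namespace = ["/example/lhcb/file1", "/example/lhcb/file2", "/example/lhcb/file3"]
    on_disk = ["/example/lhcb/file1", "/example/lhcb/file3"]
    for path in sorted(find_missing_files(namespace, on_disk)):
        print(path)
```

A set difference only reports what is missing; it says nothing about the cause of the loss, which would still need to be investigated separately.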
Blocking Issues
none
Planned, Scheduled and Cancelled Interventions
Entries in/planned to go to GOCDB: none
Advanced Planning
- Move Tier1 instances to new Database infrastructure with a Dataguard backup instance in R26
- Move Facilities DB instance to new Database hardware running 10g
- Upgrade SRMs to 2.11, which incorporates VOMS support
- Start migrating from T10KA to T10KC media later this year
- Certify 2.1.11 and evaluate the new LSF replacement
- Quattorization of remaining SRM servers
- Hardware upgrade, Quattorization and Upgrade to SL5 of Tier1 CASTOR headnodes
Staffing
- Castor on Call person: Chris
- Staff absence/out of the office:
- Matthew (A/L)