RAL Tier1 weekly operations castor 11/01/2010

From GridPP Wiki

Latest revision as of 15:15, 11 January 2010

Summary of Previous Week

  • Restarted Castor services after UPS intervention (Castor Team)
  • Investigating RMmaster problem on GEN instance (Chris, Eter)
  • Cleaning up stager GEN DB to resolve the problem with RMmaster (Shaun, Chris, Eter)
  • SRM development (Shaun)
  • Investigating BigID on Atlas (DB Team)
  • Generated list of corrupted files on gdss70 and gdss79 (Chris)
  • Problem with gc on repack instance (Castor Team)
  • Set up ipmi bios configs on database servers (Cheney)
  • Updating of twiki for list of servers & ssh sigs (Cheney)
  • Writing of techwatch newsletter (Cheney)
  • Set up Vulcan testing of EMC kit (Cheney)
  • Set up database multipath ahead of EMC return to use (Cheney)

Developments for this week

  • Test new kernel on certification before implementing it during next week's intervention (Chris)
  • Test restricting user access on disk servers (Jonathan, Chris)
  • Investigating BigID on Atlas (DB Team)
  • Work on PreProduction instance (Richard, Chris, DB Team)
  • Continue investigation to find out why gc doesn't work on repack instance (Castor Team)
  • Return EMC kit to use on production servers (Cheney)
  • Build replacement database server (Cheney)
  • Install memory upgrade on castor databases (Cheney)
  • Config bios for ipmi on castor head nodes (Cheney)

Ongoing work

  • Investigate lhcbUser D2D copy problems (Matthew)

Operations Issues

  • SCSI errors kept appearing on rack nodes connected to Overland. Power related? The errors have not reappeared since last week's reboot.

Blocking issues

  • Lack of Quattor configuration files for SLC4.8 is stopping us evaluating Quattor alongside CASTOR 2.1.8. Preprod setup will initially proceed with a Kickstart-based deployment.
  • Preprod DB can only be delivered after EMC testing is done (2nd week after Jan'10)

Planned, Scheduled and Cancelled Interventions

  • 13/14 January - migrate the Castor DBs back to the EMC disk arrays [NOT READY YET]
  • 19/20 January
      - FSCK Disk servers and pick up new kernels.
      - Add IPMI to Castor Head Nodes.
      - Replace cdbc08 and add new DB archive log destination.
      - Install NameServer CheckSum Trigger
      - Upgrade of memory to DB nodes
  • The following have not been folded into the above schedule. These can be fitted around as they are, at worst, an ‘At Risk’.
      - SRM Castor Client upgrade
      - Update fetch-crl rpm on disk servers
      - Restrict user login on disk servers

Advanced Planning

  • Gen upgrade to 2.1.8 in 2010Q1
  • Install/enable gridftp-internal on Gen (This year/before 2.1.8 upgrade)

Staffing

  • Castor on Call person: Shaun
  • A/L: Matt - Monday