RAL Tier1 weekly operations castor 31/08/2009

From GridPP Wiki

Summary of Previous Week

  • CIP testing (Jens)
  • Deployment of new LSF archiving script (Chris/Cheney)
  • Installation and restart of all CASTOR services following security patch (All)
  • ATLAS data reconciliation (Brian)
  • Improved database metric monitoring (Eter/Carmine/Rich)
  • Development of SRM monitoring (Shaun)
  • Deployment of new SRM on certification (Shaun)

Developments for this week

  • CIP development (Jens)
  • Installation of new CASTOR servers (Chris/Cheney)
  • Analysis of database performance tuning results (Eter/Carmine/Rich)
  • Certification of 2.1.8
  • Schedule deployment of SRM 2.8

Ongoing

  • CastorMon monitoring graphs for Gen instance (Brian)
  • Setting up Preproduction (Matt, Chris)

Operations Issues

  • Some problems seen with disk servers following kernel upgrade. Fabric team investigating.

Blocking issues

  • Problems with the Ganglia check on the GEN instance are delaying work on monitoring (in hand)

Scheduled and Cancelled Down Times

none

Changes to Production Milestones

none

Advanced Planning

  • CIP upgrade to include nearline publishing (Sept)
  • SRM 2.8 upgrade (Sept)
  • Work with Fabric to add extra RAID card in remaining Viglen'06 disk servers (Second half of August)
  • Upgrade nameserver to 2.1.8 (Possibly during September)
  • Black and White lists? (Possibly during September)
  • Improve resilience of central services (This year)

Staffing

  • CASTOR on-call person: Shaun