RAL Tier1 weekly operations castor 07/09/2009

From GridPP Wiki

Latest revision as of 14:56, 7 September 2009

Summary of Previous Week

  • CIP testing (Jens)
  • Deployment of new LSF archiving script (Chris/Cheney)
  • Improved database metric monitoring (Eter/Carmine/Rich)
  • Development of SRM monitoring (Shaun)
  • Installation of new CASTOR servers (Chris/Cheney)
  • Disaster recovery document (Matt)
  • 2.1.8 upgrade planning (Matt)

Developments for this week

  • CIP development (Jens)
  • Installation of new CASTOR servers (Chris/Cheney)
  • Finalizing plans for database performance tuning (Eter/Carmine/Rich)
  • Certification of 2.1.8 NS (Chris)
  • Schedule deployment of SRM 2.8 (Shaun)

Ongoing

  • CastorMon monitoring graphs for Gen instance (Brian)
  • Setting up Preproduction (Matt, Chris)

Operations Issues

none

Blocking issues

  • Problems with the Ganglia check on the Gen instance are delaying work on monitoring (in hand)

Planned, Scheduled and Cancelled Down Times

Entries in, or planned to go into, GOCDB

Description                                  | Start        | End          | Type     | Affected VO(s)
Upgrade Gen SRM to 2.8                       | 9/9/09 1000  | 9/9/09 1200  | Downtime | Gen
Upgrade LHCb SRM to 2.8                      | 14/9/09 1000 | 14/9/09 1200 | Downtime | LHCb
Nameserver upgrade and database optimization | 15/9/09 0900 | 15/9/09 1300 | Downtime | All
Update kernels on database servers           | 15/9/09 1300 | 16/9/09 1700 | At Risk  | All
Upgrade CMS SRM to 2.8                       | 16/9/09 1000 | 16/9/09 1200 | Downtime | CMS
Upgrade ATLAS SRM to 2.8                     | 21/9/09 1000 | 21/9/09 1200 | Downtime | ATLAS
Suspend CASTOR during R89 UPS test           | 22/9/09 0800 | 22/9/09 1000 | Downtime | All

Changes to Production Milestones

none

Advanced Planning

  • CIP upgrade to include nearline publishing (Sept)
  • SRM 2.8 upgrade (Sept)
  • Upgrade nameserver to 2.1.8 (Sept)
  • Black and White lists? (Possibly during Sept)
  • Improve resiliency to central services (This year)

Staffing

  • CASTOR on-call person: Chris