RAL Tier1 weekly operations castor 26/10/2009


Latest revision as of 15:40, 2 November 2009

Summary of Previous Week

  • Building Quattor templates for preprod (Richard)
  • Deployment and draining training for Alistair (Brian)
  • Established locations for ATLAS disk deployment (Brian)
  • SRM debugging of disk copy problem - fix in 2.1.8-2 (Shaun)
  • Developed DB fix to allow checksumming to work on 2.1.7 (Shaun)
  • Deployed new disk servers for LHCb, ATLAS, CMS (Chris)
  • Deployed disk servers to nonprod (Tiju, Richard, Alistair, AndrewL)
  • Setting up repack (Chris)
  • Attending LTUG (Tim)
  • All tape drives now up and running (Tim)
  • Testing various combinations of EMC kit versus power supply (Cheney)
  • Regenerated Nagios config for disk servers (Cheney)
  • Build spare tape robot controller (Cheney)
  • Build replacement DB server (Cheney)
  • Fixed Nagios callout problem (Cheney)
  • CASTOR/Fabric work transfer proposal (All)
  • Wrote script to bump up the unique file IDs of files with reused IDs (Matthew)
  • Making ATLAS file lists for comparison to LFC (Matthew)
  • Disaster management of recent data loss (Matthew)

Developments for this week

  • SRM 2.8-2 deployment on all instances (Shaun)
  • Working on puppet manifest for polymorphic central servers (Chris)
  • Set up 2.1.9 on repack server (Chris, Tim)
  • Testing Quattor templates on preprod servers (Richard)
  • Techwatch newsletter (Cheney)
  • Chasing up strategic objectives (Matthew)
  • Reviewing preprod plans (Matthew)
  • Disaster management of recent data loss (Matthew)
  • Deploying 3 new disk servers for repack server (Matthew, Shaun)

Ongoing

  • Improving resilience on central servers (Chris, Shaun)
  • CastorMon monitoring graphs for Gen instance (Brian)
  • Disaster recovery document (Matthew)

Operations Issues

none

Blocking issues

none

Planned, Scheduled and Cancelled Down Times

Entries in/planned to go to GOCDB

Description           Start          End            Type     Affected VO(s)
Upgrade SRM to 2.8-2  26/10/09 1000  26/10/09 1200  At Risk  ATLAS and LHCb
Upgrade SRM to 2.8-2  27/10/09 1000  27/10/09 1200  At Risk  CMS and Gen

Changes to Production Milestones

none

Advanced Planning

  • Black and white lists will be tested and introduced on ATLAS
  • Install/enable gridftp-internal on Gen (this year)

Staffing

  • Cheney away (Thurs)
  • CASTOR on-call person: Shaun