RAL Tier1 weekly operations castor 27/07/2009

From GridPP Wiki

Summary of Previous Week

  • SRM development. Testing now underway at CERN (Shaun)
  • Setting up BDII host for Certification for CIP testing (Cheney)
  • c2probe work (Cheney)
  • Investigating low data rates between BNL and RAL. The suspected cause is the maximum TCP window size (Brian)
  • Cleaning ATLAS namespace (Brian)
  • Added final 2 tape drives to CASTOR; all are now online (Tim)
  • Deployed 2.1.8-8 on 4 more tape servers (Chris)
  • Setting up SRM on certification to work with production for CMS (Shaun)
  • Rewriting and improving puppet restarter (Shaun)
  • CIP development on certification (Jens)
  • CASTOR Disaster Recovery document (Matt)
  • Increased lhcbDst total ROOTD slots from 100 to 400 (Matt)
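
Note on the BNL-RAL item above: a single TCP stream cannot move more than one window of data per round trip, so its rate is bounded by window size divided by round-trip time. The sketch below shows that arithmetic only. The function names are hypothetical, and the 64 KiB window and 100 ms round-trip time in the usage note are illustrative assumptions, not values measured on the BNL-RAL path.

```python
import math


def max_tcp_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-stream TCP throughput, in megabits per second.

    At most one full window can be in flight per round trip, so the
    ceiling is window / RTT. Loss and protocol overhead are ignored.
    """
    if window_bytes <= 0 or rtt_seconds <= 0:
        raise ValueError("window_bytes and rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds / 1e6


def window_for_rate_bytes(rate_mbps: float, rtt_seconds: float) -> int:
    """Window size (bandwidth-delay product) needed to sustain rate_mbps."""
    if rate_mbps <= 0 or rtt_seconds <= 0:
        raise ValueError("rate_mbps and rtt_seconds must be positive")
    return math.ceil(rate_mbps * 1e6 * rtt_seconds / 8)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not measurements).
    assumed_window = 64 * 1024   # 64 KiB, the limit without window scaling
    assumed_rtt = 0.100          # 100 ms round trip
    ceiling = max_tcp_throughput_mbps(assumed_window, assumed_rtt)
    needed = window_for_rate_bytes(1000, assumed_rtt)
    print(f"Per-stream ceiling: {ceiling:.2f} Mbit/s")
    print(f"Window needed for 1 Gbit/s: {needed} bytes")
```

With these assumed values the per-stream ceiling is roughly 5 Mbit/s, and sustaining 1 Gbit/s would need a window of about 12.5 MB. If the real path looks similar, larger TCP buffers or more parallel streams would be the usual things to check.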

Developments for this week

  • CIP development on certification (Jens)
  • Investigate 2.1.8 NS client on 2.1.7 NS DB (Chris)
  • Deploy 2.1.8-8 on remaining tape servers (Chris)
  • Configuring disk servers to use gridftp-internal (Chris)
  • ATLAS bulk deletion on files in CASTOR but not in LSF (Brian)
  • Tier1 Production Manager (Matt)
  • CASTOR Disaster Recovery document (Matt)
  • Interviewing for CASTOR Pre Production Service Manager post (Matt)
  • Establish with experiments when to intervene with hardware on D0T1 disk servers (Matt)
  • Write CASTOR status for oversight committee (Matt)
  • Find out more about CERN's virtualized certification setup (Chris, Matt)
  • Python 3 training course (Cheney)

Ongoing

  • Fix missing graphs on castormon: ATLAS tape migration and aggregated Gen monitoring (Brian)
  • Cleaning up database for a future 2.1.8 upgrade (Shaun)
  • Setting up Preproduction (Matt)

Operations Issues

none

Blocking issues

none

Scheduled and Cancelled Down Times

none

Changes to Production Milestones

none

Advanced Planning

  • Work with Fabric to add extra RAID card in remaining Viglen'06 disk servers (July/August)
  • SRM 2.8 upgrade (sometime during August)
  • CIP upgrade to include nearline publishing (July/August)
  • Improve resiliency to central services (This year)
  • Upgrade nameserver to 2.1.8 (Possibly during September)
  • Database optimization tasks (September)
  • Black and white lists? In discussion with ATLAS to establish their requirements

Staffing

  • CASTOR on-call person: Chris
  • Shaun on A/L
  • Matt on TOIL Mon morning, A/L Thurs