RAL Tier1 weekly operations Fabric 20100726

From GridPP Wiki

Developments

  • All:
  • Martin:
    • Abortive multipath update and consequent sort-out
    • Disk ITT evaluation
    • Yet more spend planning
  • Ian:
    • Got first Hyper-V VMs working
    • Provided initial VMs for GST testbed
    • Two days a/l
  • Tim:
    • Tape library microcode updated
    • Tweaks to repack system (different scheduling policies, extra disks)
    • DMF data removal for some second copies
    • CMS non-migration investigation
  • Jonathan:
    • arranged disposal of redundant servers
    • wrote archive tapes for several old experiment filesystems
    • created new pool accounts for CMS and then fixed related NIS problem
    • assisted user with AFS and reset his password
    • removed Tier1 userid
    • reset password for Tier1 user
    • 1 Nagios update
  • James A:
    • Focussing on new Quattor server
    • General assistance where needed
    • Continued learning about BIND and DNS.
  • James T:
    • Configured rsyslog to log to central loggers
    • Re-cabled the Streamline 2009 disk servers with James A (thanks to James A)
    • Started acceptance tests on Streamline 2009 kit
    • Created a CASTOR 2.1.9 disk server build in Quattor
    • Helped Kash with the "shrinking" of pre-prod disk servers
    • Read through some of the tender responses
    • Two disk servers for Repack
  • Cheney:
    • swapped in replacement robot controller
    • rebooted disk arrays on preprod
    • reset preprod disk arrays after hard lockup
    • tweaked tsbn to pick up data from changed tape servers
    • wrote script to automate restore of database backups for testing
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems. (R 27)
    • gdss78 passed acceptance test and given back to Castor team.
    • 10 drives in Streamline 2009 (Test) disk servers replaced by Gareth (Streamline).
    • gdss207 crashed again. (Intervention)
    • gdss486 received back from Streamline. (Testing)
    • gdss105 and gdss106 assigned to Tim for testing.
    • gdss187 fsprobe errors. (Intervention)
    • Hardware failure stats/graphs.
    • gdss536 and gdss537: Adaptec cards replaced with LSI cards. (Gareth, Streamline)
    • Preparing Viglen 2006 disk servers with new RAID configuration for Castor Preprod.
    • Streamline/Areca disk servers crashed due to single faulty drive. (ongoing)

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)

Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All:
  • Martin:
    • Disk ITT evaluation
  • Ian:
    • Further development of services virtualisation testbed
    • Support for Castor Quattor configuration
    • Planning cernvm-fs testing
  • Tim:
    • Facilities Castor planning
    • ADS futures planning
  • Cheney:
    • set up quatted castor core servers
  • Jonathan:
    • On leave Tuesday - Thursday, so out all week
  • James T:
    • Away on Scout Camp all week
  • James A:
    • Finalise and test new Quattor server
    • Planning of CVMFS load testing
    • Learning about errata updates in Quattor
    • Continue learning about BIND and DNS.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • gdss207 received wrong RAID card; reported again.
    • gdss380: run 7-day acceptance test.
    • Look after Streamline 2009 disk server testing in absence of James T.
    • Continue decommissioning old batch systems. (R 27)

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)
  • Jonathan on leave Tuesday - Thursday
  • James T on special leave all week
  • Kashif on annual leave on Tuesday.

Fabric On-Call

  • Ian primary on-call all week

Advanced Warning of Requirements and Blocking issues

Services Issues

