RAL Tier1 weekly operations Fabric 20100607

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
    • Preparing for CRISTAL 2 course.
    • Working on Virtualisation testbed.
  • Tim:
    • Air quality problems
    • CMS T10KB migration
    • HEPSysMan
  • Cheney:
    • look for problem with SLS availability stats
    • investigate infosec on web server logs
    • fix tape server down problem
    • investigate array reset problem
    • set up Samba for Technology Dept
  • Jonathan:
    • showed Production Team how to solve atlasbackup problems
    • checked for root mail configuration
    • renewed host certificate for scrooge
    • renamed AFS userid and increased quota
    • helped user with access problems to AFS cell
    • created AFS area for PSCSG with 10 GB space limit
    • added new AFS userids
    • with James A prepared and sent DVD copy of /rutherford/robin-atlas1 to owner
    • issued new version of RPM tier1-sudo-config
    • 5 Nagios configuration updates
  • James A:
    • Migrated ATLAS software server to newer hardware
    • Propped up t1pg0373 (overwatch & myactions)
  • James T:
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R 27).
    • Streamline 2009 disk servers testing.
    • gdss67: replaced 4 x 1 GB memory with James T and John. (Intervention)
    • gdss423: moved back into rack. (Fixed)
    • gdss469, 473, 474 and 476: faulty RAID controller cards. (Intervention)
    • C2certdb: replaced drive with Chris.
    • gdss207: needs to be reinstalled.
    • gdss390: memory replaced by Ian and James T, borrowed from gdss368.
    • Streamline/areca disk servers c

Absences

  • Monday (31st) was Bank Holiday
  • Jonathan on partial retirement (not in on Monday and Friday)

Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All:
  • Martin:
  • Ian:
    • On CRISTAL 2 course.
    • Working on Virtualisation testbed.
  • Tim:
    • Facilities CASTOR: plan install
  • Cheney:
    • Sort out DMF space problems
    • track down Quattor problem with service checking
  • Jonathan:
    • start regular check restores of home filesystem
    • continue investigating how to set up an AFS directory as the ATLAS software server
    • Nagios configuration updates
  • James T:
    • On Leave.
  • James A:
    • Fixing t1pg0373.
    • Cleaning up after ATLAS software server migration.
    • Getting to grips with multi-path issues.
    • HEPSysMan.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Continuing to decommission old batch systems (R 27).
    • Streamline 2009 disk servers testing.

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)
  • James T on leave 7th to 11th June

Fabric On-Call

Advance Warning of Requirements and Blocking Issues

Services Issues



Category:RAL_Tier1