RAL Tier1 weekly operations Fabric 20100315

From GridPP Wiki

Summary of week gone

Developments

  • All:
  • Martin:
    • Finalised C300 procurement
    • Team stuff
    • Decommissioning Compusys04 disk arrays
  • Ian:
    • Further work on second installation server
    • Organised initial meetings of T1 Virtualisation strategy working group
    • Further work on Quattorisation of CASTOR servers with Chris
    • Worked with Catalin on LFC
  • James T:
    • ATLAS WAN Tuning on disk servers that were missing it (and checked that kickstarts were OK)
    • Tweaks to Quattor disk server build
    • Tier1 tour prep
    • Security meeting with Chris and Tiju
    • Work on testing Viglen 09 kit
  • Jonathan:
    • Sorted out atlasbackup problems for several servers
    • Responded to query about missing SGM userids
    • Tested Nagios process on nagios06; worked on configuration problems encountered during test
  • James A:
    • Started migration of Worker Nodes to SL5.4
    • Started installation of latest Worker Node purchase.
    • Various certificate updates (e.g. overwatch).
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R 27).
    • gdss203: replaced 8x1GB memory and returned to CASTOR (fixed).
    • Cabling in R89 HPD room with James A.
    • Worked with Streamline engineer (Gareth).
    • gdss347: replaced 4x2GB memory; fixed and back in production.
    • CASTOR servers (cdbc13/cdbd03): still working (intervention).

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)
  • Jonathan: ½ day Annual Leave (10/3 pm), 1 day sick leave (11/3)

Operational Issues and Incidents

Index | Description | Start | End | Severity | Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB

Component | Description | Start | End | Affected VO(s) | Type

Development priorities

  • All:
  • Martin:
    • Team reorganisation stuff
    • Dell delivery + installation
    • A5L Services racks installations
  • Ian:
    • Attending Quattor workshop
    • Work on second installation server
    • Status reports on strategy tasks (Quattor & Virtualisation)
  • James T:
    • Viglen 09 testing
    • Tier1 tour prep
    • Work on quattorising other generations of disk server (if possible).
  • Jonathan:
    • Continue reconfiguration of nagios06
    • Work on disposal of old kit from A1 Upper machine room
  • James A:
    • Cabling CASTOR racks for Data and IPMI.
    • Continuing Worker Node update to SL5.4.
    • Monitoring acceptance testing on new Worker Nodes.
    • Setting up new Quattor server.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Continuing decommissioning of old batch systems (R 27).

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)
  • Ian attending Quattor workshop Tuesday - Friday

Fabric On-Call

James T Monday - Thursday

James A Friday - Saturday

Ian Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues
