RAL Tier1 weekly operations Fabric 20100412

From GridPP Wiki

Developments

  • All:
  • Martin:
  • Ian:
    • On leave last week. Week before:
    • Developed prototype SRM machine type in Quattor
    • Helped ChrisK apply new LSF licenses
    • Work on Virtualisation Platform
  • Tim:
    • Putting new hardware into production
    • configuring more disk on DMF service
    • T10K testing etc
  • Cheney:
    • patching
    • wrote extensive documentation on the wiki (ADS, DMF, CASTOR)
    • fixed atlasbackups after ADS crash
    • edited tsbn and sls scripts for changed tape pools
  • James T:
    • A/L Tuesday
    • Chasing up Streamline disk
    • Viglen 09 disk testing
    • SL5 disk server
    • Worked on fixing kickstart issues (ongoing)
  • Jonathan:
    • sorted out atlasbackup problems on 45 nodes
    • configured sapphire to Tier1 standards
    • updated RPMs on central servers
    • Nagios configuration updates
    • updated RPMs on Nagios slave server and rebooted for new kernel
    • updated documentation for Nagios database callouts to add sapphire
    • updated configuration of nagios06 and restarted server
  • James A:
    • Finished upgrade of WNs to SL5.4.
    • Benchmarked three WNs of each generation with HEP-SPEC06 and calculated new scaling factors for the farm.
    • Viglen 2009 WNs ready for production.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R 27).
    • install01: replaced heatsink fan. (Fixed)
    • gdss274: replaced 3 drives and given back to CASTOR.
    • gdss318: given back to CASTOR.
    • lcg1235: CPU/motherboard replaced by HP engineer. (Fixed)
    • ccse03: faulty PSU. (Intervention)
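
James A's item above mentions calculating new scaling factors from benchmark results for three WNs of each generation. The notes do not say how this was done, so the following is only a minimal sketch of one plausible approach: average the benchmark score per generation, divide by core count, and normalise to a chosen reference generation. The generation names, scores, core counts and the choice of reference are all hypothetical placeholders, not measurements from the farm.

```python
from statistics import mean

# Hypothetical benchmark totals per node, three nodes per generation.
# Placeholder values only; not real results from the RAL farm.
hs06_per_node = {
    "gen_a": [60.0, 62.0, 61.0],
    "gen_b": [90.0, 93.0, 90.0],
}

# Hypothetical number of cores per node for each generation.
cores_per_node = {"gen_a": 8, "gen_b": 8}


def scaling_factors(hs06, cores, reference_generation):
    """Return per-core score of each generation relative to the reference.

    Assumption: the scaling factor is the ratio of mean per-core benchmark
    score to that of the reference generation. The batch system may use a
    different convention.
    """
    per_core = {gen: mean(scores) / cores[gen] for gen, scores in hs06.items()}
    reference = per_core[reference_generation]
    return {gen: value / reference for gen, value in per_core.items()}


if __name__ == "__main__":
    for gen, factor in scaling_factors(hs06_per_node, cores_per_node, "gen_a").items():
        print(f"{gen}: {factor:.3f}")
```

With this convention the reference generation always has a factor of 1, and faster generations have factors above 1.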

Absences

  • Jonathan on partial retirement (not in on Monday and Friday)
  • Tim out Wednesday morning

Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All
  • Martin:
  • Ian:
    • Further work on Quattorised SRM with Shaun
    • Preparation for HEPiX
    • Quattor medium term planning
    • Work on virtualisation platform
  • Tim:
    • Job plans
    • T10K install
    • Facilities castor installation
  • Cheney:
    • patching
    • job plan tasks
  • James T:
    • GridPP storage workshop Mon/Tues
    • GridPP 24 Wed/Thurs
    • Fix kickstart issues ASAP
    • Keep an eye on the last few days of Viglen 09 disk testing
  • Jonathan:
    • on leave all week
  • James A:
    • Develop reliable test for Tier 1 Internet connectivity.
    • Benchmark all Viglen 2009 WNs to verify performance.
    • Prepare Quattor for Streamline 2009 WNs.
    • Fix faulty ARTEMIS unit in LPD room.
    • Provide network cabling for new ADS rack.
    • Create BMS object in Nagios.
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Continuing decommissioning of old batch systems (R 27).

Absences

  • Jonathan on leave all week
  • Kashif A/L (Tuesday)
  • James T at GridPP until Thursday

Fabric On-Call

  • Ian fabric on call Monday - Saturday

Advanced Warning of Requirements and Blocking issues

Services Issues

