RAL Tier1 weekly operations Fabric 20100412
From GridPP Wiki
Latest revision as of 14:03, 12 April 2010, by Tim folkes
Developments
- All:
- Martin:
- Ian:
- On leave last week. Week before:
- Developed prototype SRM machine type in Quattor
- Helped ChrisK apply new LSF licenses
- Work on Virtualisation Platform
- Tim:
- Putting new hardware into production
- configuring more disk on DMF service
- T10K testing etc
- Cheney:
- patching
- wrote extensive documentation on the wiki (ADS, DMF, castor)
- fixed atlasbackups after ADS crash
- edited tsbn and sls scripts for changed tape pools
- James T:
- A/L Tuesday
- Chasing up of Streamline disk
- Viglen 09 disk testing
- SL5 disk server
- Worked on fixing kickstart issues (ongoing)
- Jonathan:
- sorted out atlasbackup problems on 45 nodes
- configured sapphire to Tier1 standards
- updated RPMs on central servers
- Nagios configuration updates
- updated RPMs on Nagios slave server and rebooted for new kernel
- updated documentation for Nagios database callouts to add sapphire
- updated configuration of nagios06 and restarted server
- James A:
- Finished upgrade of WNs to SL54.
- Benchmarked three WNs of each generation with HEPSPEC2006 and calculated new scaling factors for the farm.
- Viglen 2009 WNs ready for production.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems. (R 27)
- install01: replaced heatsink fan. (Fixed)
- gdss274: replaced 3 drives and given back to castor.
- gdss318: given back to castor.
- lcg1235: CPU/motherboard replaced by HP engineer. (Fixed)
- ccse03: faulty PSU. (Intervention)
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Tim out Wednesday morning
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All
- Martin:
- Ian:
- Further work on Quattorised SRM with Shaun
- Preparation for HEPiX
- Quattor medium term planning
- Work on virtualisation platform
- Tim:
- Job plans
- T10K install
- Facilities castor installation
- Cheney:
- patching
- job plan tasks
- James T:
- GridPP storage workshop Mon/Tues
- GridPP 24 Wed/Thurs
- Fix kickstart issues ASAP
- Keep an eye on the last few days of Viglen 09 disk testing
- Jonathan:
- on leave all week
- James A:
- Develop reliable test for Tier 1 Internet connectivity.
- Benchmark all Viglen 2009 WNs to verify performance.
- Prepare Quattor for Streamline 2009 WNs.
- Fix faulty ARTEMIS unit in LPD room.
- Provide network cabling for new ADS rack.
- Create BMS object in Nagios.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continuing decommissioning of old batch systems. (R 27)
Absences
- Jonathan on leave all week
- Kashif A/L (Tuesday)
- James T at GridPP until Thursday
Fabric On-Call
- Ian fabric on call Monday - Saturday