RAL Tier1 weekly operations Fabric 20100322
Developments
- All:
- Martin:
- Chasing minor procurement receipting
- Networking planning
- Discussions surrounding Atlas software area and use of AFS for it
- Decommissioning CV04 kit
- Installing new Dell hardware in rack
- HEPiX bookings
- Management issues
- Ian:
- Attended Quattor workshop
- Set up gLite update 62 and contributed it back to QWG
- Got test version of Aquilon running on system at RAL
- James T:
- Quattor installation of Viglen 09 kit for testing
- Testing of Viglen 09 kit
- Started testing
- Monitoring for faults
- Tier1 tour preparation
- Disk server deployment
- Documented procedure for deployment using Quattor
- Allocated disk servers for deployment into Atlas NonProd with Quattor
- Jonathan:
- Fixed atlasbackup problems on several nodes
- Fixed ntpd process problem on lcgfts0423
- Investigated issues around setting up software area for Atlas VO
- Added AFS userid
- Nagios configuration updates
- Built local up-to-date 64-bit version of nagios-plugins RPM
- James A:
- Completed SL54 upgrade on first five racks of WNs (270 Nodes, ~49% of Farm)
- Started acceptance testing on Viglen 2009 WNs.
- Provisioned network and power cabling for CASTOR Rack G.
- Worked on OPB and open day tours with JIT.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R27).
- Moved Viglen twin system to logistics for collection.
- gdss126 given back to castor.
- Unpacked and moved new Dell servers in the UPS room with MJB.
- Added switches and cable bars to the rack in R27 A5 Lower with MJB.
- Worked with HP engineer (Graham).
- Partitioned and re-installed gdss211.
- Managed to install Linux on Castor server cdbd03 (working OK).
- Castor server Fakecdb13 still working (intervention).
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All
- Martin:
- Minor procurement issues (receipting, invoicing)
- Open day talk
- Catch up with Cheney and Tim
- Drafting various change control notices
- Ian:
- Help James T with 64-bit Castor disk server
- Help James A with new Quattor server
- Finalise bringing the new software install server into production
- James T:
- SL5.4 x86_64 + XFS disk server build for CASTOR testing
- Keep an eye on Viglen 2009 testing
- Tier1 tour prep
- Jonathan:
- Continue work on setting up AFS storage for Atlas software
- Continue reconfiguration of nagios06
- Continue work on disposal of old kit from A1 Upper machine room
- James A:
- Monitoring acceptance testing.
- Starting acceptance testing on Streamline 2009 WNs.
- Continue SL54 upgrade, aiming for four to five more racks (112-148 nodes, another ~20-27% of the farm) by the end of the week.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continue decommissioning old batch systems (R27).
- Continue working with HP engineer.
- Re-pack the new disk servers' rack sliders for return (wrong sliders).
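
As a cross-check on the SL54 upgrade figures quoted by James A, the short sketch below estimates the farm size from "270 nodes, ~49% of farm" and converts the 112-148 node target into percentages. The variable names are illustrative only, and the farm size is an estimate derived from the rounded 49% figure, not a number taken from the report.

```python
# Figures quoted in the report (the 49% is approximate).
upgraded_nodes = 270
upgraded_fraction = 0.49

# Estimated farm size implied by those two figures (an assumption,
# since the report does not state the total number of WNs).
farm_size = round(upgraded_nodes / upgraded_fraction)


def percent_of_farm(nodes: int) -> float:
    """Return the given node count as a percentage of the estimated farm."""
    return 100.0 * nodes / farm_size


# Planned range for the coming week: 112 to 148 more nodes.
low_pct = percent_of_farm(112)
high_pct = percent_of_farm(148)

# Cumulative share upgraded if the low or high target is met.
total_low_pct = percent_of_farm(upgraded_nodes + 112)
total_high_pct = percent_of_farm(upgraded_nodes + 148)

print(f"Estimated farm size: {farm_size} nodes")
print(f"Planned this week: {low_pct:.0f}-{high_pct:.0f}% of the farm")
print(f"Cumulative after this week: {total_low_pct:.0f}-{total_high_pct:.0f}%")
```

Under these assumptions the planned range matches the quoted ~20-27%, and meeting the target would bring roughly 69-76% of the farm onto SL54.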
Absences
- Jonathan on partial retirement (not in on Monday and Friday)
- Kashif on annual leave (Tuesday)
Fabric On-Call
Ian Primary on call Monday-Sunday