Summary of Previous Week
Developments
- All
- Tier1 relocation
- Sporadic attendance at HEPSysMan
- Martin
- Managing Tier1 relocation
- Network management
- Procurements
- Ian
- Monitoring network etc during move
- Work with MattH and Derek on gLite/Quattor
- Gave HEPSysMan talk and attended some sessions
- James T
- Re-initialized arrays on Streamline 2008 kit as they were configured differently.
- Presentation on verifies at HEPSysMan
- Jonathan
- updated DHCP server to set new MAC address for puppetdev (for Chris K)
- worked on list of Fabric Team documentation (for Martin/Gareth)
- worked with James A on network cabling for Datastore
- updated iptables on lcgsql0363 to correct netmask for some rules
- allowed pat to handle mail for Castor DB systems (request from Cheney)
- corrected atlasbackup problems for 6 nodes (old tapes not deleted)
- updated Nagios configuration on netnag (temporary Nagios server for R89 migration)
- disabled check for nagios process on master server from nagios01/2/5
- prepared and released updated RPM tier1-nrpe-config with additional servers (nagger, netnag)
- worked on installation of Nagios 3 on nagger
- James A
- Started up all batch capacity (old & new).
- Started simple load testing on all WNs in R89 to test air-con and begin acceptance testing of new systems.
- Assisted where necessary with cabling of various racks and systems.
- Laid cables for ADS shelves with JFW.
- Kash
- Drive replacement.
- Fixing broken WNs.
- gdss156 ready for production.
- Moved srm servers from R27 to R89 with MJB.
- Replaced new memory in gdss192, 207, 192, 226 and 357.
- Working on gdss73, 192, 196, 198, 102, 128, 266, 121, 135, 150, 243.
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity
Tier1 Move | 18 June | 6 July | All | Severe
Plans for Week(s) Ahead
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity
Network access off site will be down due to software and board updates to the site routers. | ~07:30 7 July | ~10:30 7 July | All | Severe
Development Priorities
- Martin
- Procurement preparation
- Updating of network switch software (Tuesday)
- Ian
- Fabric Working group
- Start work on production Quattor server
- Quattor FP7 bid preparation
- James T
- Polish off move (xrootd/NFS servers, remaining interventions)
- Acceptance testing new disk hardware
- Jonathan
- Nagios configuration updates as required
- restart normal Nagios service with callouts
- complete list of Fabric Team documentation
- complete adding simple Nagios configuration documentation to wiki
- continue configuration work on nagger
- resurrect plan to move home filesystem to new server
- create SL5 version of tier1-sendmail-config RPM
- continue to plan AFS migration from Kerberos 4 to Kerberos 5
- James A
- Startup of batch system.
- Join Ian's work on Quattor.
- Updating IPMI card firmware on various systems.
- Kash
- Drive replacement.
- Fixing broken WNs.
- Working with Viglen Engineer.
- Continue working on gdss73, 192, 196, 198, 102, 128, 266, 121, 135, 150, 243.
Absences
Fabric On-Call
Advanced Warning of Requirements and Blocking issues
Service Issues
- RT# 38567 - Dedicated WN for Alice (SW area + gridftp area)
- RT# 40180 - Resurrect PPS hardware
- RT# 44835 - Non-capacity HW for testing (Services)