RAL Tier1 weekly operations Fabric 20100419
From GridPP Wiki
Revision as of 10:02, 21 April 2010 by Jonathan wheeler
Developments
- All:
- Martin:
- Ian:
- Tim:
- Installing new ADS servers (three done, one to go)
- Configuring new disk on DMF service.
- Put new, enlarged DMF area into production to improve system stability
- Investigating CMS migration problems
- Cheney:
- Racked up the Infortrend units
- Wrote some documentation on CASTOR
- Started on documentation for DR
- Wrote up next year's job plan
- Planned a new backup server for the database
- Wrote change-control documentation
- James T:
- GridPP storage workshop Monday and Tuesday
- GridPP 24 Wednesday and Thursday
- Worked on SL5 disk server build with Ian
- Experimented with the Hadoop Distributed File System (HDFS)
- Jonathan:
- On leave
- James A:
- Swapped RAM between gdss370 and gdss405 to expedite gdss405's return to service.
- Updated ARTEMIS room images.
- Ran HEP-SPEC06 ten times on all new Viglen 2009 WNs to verify performance.
- Generated Weathermap video for Gareth.
- Wrote a Nagios plugin to check for web connectivity.
- Prepared QUATTOR for Streamline 2009 WNs.
- Switched ARTEMIS temperature monitoring to the XML feed, increasing precision 100-fold.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Decommissioning old batch systems (R 27).
- gdss405 kernel panic (faulty memory); fixed by James A.
- propod1 reported a single PSU failure (Transtec).
- lcg1235/1236 BIOS settings updated by Streamline engineer.
- ccse03 faulty PSU reported (Intervention).
Absences
- Jonathan on leave all week
- Kashif on annual leave Tuesday
- James T at GridPP until Thursday
- Tim working at home Monday
Operational Issues and Incidents
Index | Description | Start | End | Severity | Affected VO(s) |
---|---|---|---|---|---|
Summary of plans for the week ahead
Scheduled and Cancelled Down Times
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB
Component | Description | Start | End | Affected VO(s) | Type |
---|---|---|---|---|---|
Development priorities
- All:
- Martin:
- Ian:
- Tim:
- Finish installing ADS servers.
- Finish configuring new ADS servers (Nagios, Ganglia, etc.)
- Continue investigating CMS migration problems
- Investigate ATLAS recall problems
- APR/job plans
- DB Backup procedures
- SSC finance training Thursday
- Cheney:
- Last year's APR/job plan
- Evaluate management utilities for the SRB team
- Write more documentation
- Install Nagios on new ADS servers
- James T:
- Fix disk server kickstart issues
- Quattor disk server build tweaks
- Follow up with the vendor on problems with the new disk servers
- APR and job plan
- SSC finance training Tuesday morning
- Jonathan:
- Catch up after being away
- APR and job plan
- SSC finance training Tuesday morning
- James A:
- Sort out flaky ARTEMIS base unit in LPD room.
- Begin acceptance testing of Streamline 2009 WNs once they are handed over.
- Establish Viglen 2009 WN compatibility with SL4.
- Reclaim Nortel 5510 from CASTOR rack.
- Put Viglen 2009 WNs into production.
- Investigate cool-sounding PBS idle WN control.
- Kash:
- Drive replacement.
- Fixing broken WNs.
- Continue decommissioning old batch systems (R 27).
Absences
- Jonathan on partial retirement (not in on Monday and Friday)