https://www.gridpp.ac.uk/w/index.php?title=RAL_Tier1_weekly_operations_Fabric_20091116&feed=atom&action=history
RAL Tier1 weekly operations Fabric 20091116 - Revision history
2024-03-28T19:13:07Z
https://www.gridpp.ac.uk/w/index.php?title=RAL_Tier1_weekly_operations_Fabric_20091116&diff=2269&oldid=prev
James thorne at 15:50, 16 November 2009
2009-11-16T15:50:28Z
<p><b>New page</b></p><div>== Summary of week gone ==<br />
<br />
=== Developments ===<br />
* All:<br />
<br />
* Martin:<br />
** Completed disk procurement evaluation<br />
** More work on EMC array problems<br />
** CPU ITT evaluation<br />
<br />
* Ian:<br />
** Work on Quest FP7 bid<br />
** Rolling out kernel security update on quattor system<br />
** First look at disk failure stats<br />
<br />
* James T:<br />
** Updated ganglia configs for Storage_LHCb and Services_Grid<br />
** Viglen disk server problems<br />
** TOASTER prep<br />
<br />
* Jonathan:<br />
** reconfigured NIS servers to allow access to shadow map from any port<br />
** checked AFS servers for contacts from the compromised Manchester system<br />
** BIOS update for sv-08-06 (to be lcgcc-s3-06)<br />
** sorted out problems with atlasbackup for many nodes<br />
** sorted out ntp configuration problem on t1pg0373<br />
** Nagios configuration updates<br />
** updated tier1-nagios-plugins to version 2.0-58<br />
** gave talks about Nagios to the Production Team etc.<br />
<br />
* James A:<br />
** A/L<br />
<br />
* Kash:<br />
** Drive replacement.<br />
** Fixing broken WNs.<br />
** gdss262: replaced 8x1GB memory; fixed and back in production.<br />
** gdss67: needs to run a 7-day test.<br />
** gdss125: given back to Castor.<br />
** gdss413: replaced 4x2GB memory (ready for deployment).<br />
** sl4sys32-sl4sys64: replaced PSU.<br />
** Working on 2008 disk servers and worker nodes.<br />
** Working on gdss67, 163 and 282.<br />
<br />
=== Operational Issues and Incidents ===<br />
<br />
{| border=1 align=center<br />
|- bgcolor="#7c8aaf"<br />
! Index<br />
! Description<br />
! Start<br />
! End<br />
! Severity<br />
! Affected VO(s)<br />
|-<br />
| <br />
| EMC arrays serving 3D/LFC/FTS databases made unstable by attempts to stabilise the Castor EMC arrays<br />
| Tuesday 6/Oct am<br />
| not in sight<br />
| Catastrophic<br />
| All<br />
|-<br />
|}<br />
== Summary of plans for week ahead ==<br />
<br />
=== Scheduled and Cancelled Down Times ===<br />
<br />
Entries of type Down, At Risk or Cancelled that are in, or planned to go into, GOCDB<br />
<br />
{| border=1 align=center<br />
|- bgcolor="#7c8aaf"<br />
! Component<br />
! Description<br />
! Start<br />
! End<br />
! Affected VO(s)<br />
! Type<br />
|-<br />
|}<br />
<br />
=== Development priorities ===<br />
* All<br />
** Work on evacuating A1 Upper (Castor admin and LSF systems)<br />
<br />
* Martin:<br />
** complete CPU ITT evaluation<br />
** testing sample hardware<br />
** install database test boxes<br />
<br />
* Ian:<br />
** Further Quattor FP7 work (last week)<br />
** Finish roll out new kernels on Quattor managed machines<br />
** Kernels on SL4 batch workers<br />
** Work on CPU procurements<br />
** Castor Quattor tutorial<br />
<br />
* James T:<br />
** Viglen disk server problems<br />
** CRISTAL2 preparation<br />
** Catch up on helpdesk tickets and other actions<br />
** Disk server kernel updates<br />
<br />
* Jonathan:<br />
** Set up regular checks of backups for home filesystem, AFS volumes and MySQL databases<br />
** Quattor implementation for Nagios slave<br />
** update environment for SL5 systems<br />
** updates to farm to allow Babar functional userids to migrate home filesystem<br />
** Nagios configuration updates <br />
<br />
* James A:<br />
** A/L<br />
<br />
* Kash:<br />
** Drive replacement.<br />
** Fixing broken WNs.<br />
** gdss67: rebuild from scratch and move into the HPD room.<br />
** Continue working on 2008 disk servers and worker nodes.<br />
** Continue working on gdss67, 163 and 282.<br />
<br />
=== Absences ===<br />
<br />
* James A<br />
** Annual Leave (Mon 9th - Fri 20th).<br />
<br />
=== Fabric On-Call ===<br />
<br />
* Mon-Sun: Ian<br />
<br />
=== Advance Warning of Requirements and Blocking issues ===<br />
<br />
=== Services Issues ===<br />
<br />
* Various requests for hardware.<br />
<br />
[[:Category:RAL_Tier1]]<br />
<br />
[[RAL Tier1 weekly operations fabric]]</div>