https://www.gridpp.ac.uk/w/index.php?title=RAL_Tier1_weekly_operations_Fabric_20110228&feed=atom&action=history
RAL Tier1 weekly operations Fabric 20110228 - Revision history
2024-03-29T00:52:21Z
https://www.gridpp.ac.uk/w/index.php?title=RAL_Tier1_weekly_operations_Fabric_20110228&diff=2766&oldid=prev
Kashif hafeez at 14:15, 7 March 2011
2011-03-07T14:15:51Z
<p><b>New page</b></p><div>== Developments ==<br />
* All:<br />
<br />
* Martin:<br />
** <br />
<br />
* Ian:<br />
** iSCSI performance testing/tuning<br />
** Building production cvmfs mirror/replica<br />
** Virtualisation<br />
<br />
* Tim:<br />
** <br />
<br />
* James A:<br />
** <br />
<br />
* James T:<br />
** Applied new WAN tuning to all remaining CASTOR instances<br />
** Documentation (Loggers, Ganglia)<br />
** V10/SL10<br />
** SL08<br />
<br />
* Cheney:<br />
** DMF disaster recovery testing<br />
** Set up rsync for Greg Matthews<br />
** Tinker with backups for Nick H<br />
** Help Johnathn Churchill with his fibre<br />
** Investigate problems with tape controller<br />
** <br />
<br />
* Kash:<br />
** Drive replacement.<br />
** Fixing broken WNs.<br />
** Decommissioning old batch systems (R 27).<br />
** gdss380: added new MAC address in DHCP; needs re-install.<br />
** Started adding correct hotspare in SL09 and SL10.<br />
** gdss496: started verify fix (intervention).<br />
** gdss115: multiple drive failures (out of production).<br />
** Reported faulty memory in new Dell system in UPS room.<br />
** Updating firmware on Jetstor systems (ongoing); updated on two so far.<br />
** Checked all SL09 and SL10 disk servers for failed stripes.<br />
** Test room review. (Every Monday morning)<br />
** Check Clustervision new batch systems. (Testing)<br />
** Replaced drive in system in EMC rack (MTI).<br />
** Replaced drives in loggers1 & 2.<br />
** SL08 testing continues.<br />
** Pack Viglen switch and cables for return.<br />
<br />
<br />
<br />
=== Operational Issues and Incidents ===<br />
<br />
{| border=1 align=center<br />
|- bgcolor="#7c8aaf"<br />
! Index<br />
! Description<br />
! Start<br />
! End<br />
! Severity<br />
! Affected VO(s)<br />
|-<br />
|}<br />
<br />
== Summary of plans for week ahead ==<br />
<br />
=== Scheduled and Cancelled Down Times ===<br />
<br />
Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB<br />
<br />
{| border=1 align=center<br />
|- bgcolor="#7c8aaf"<br />
! Component<br />
! Description<br />
! Start<br />
! End<br />
! Affected VO(s)<br />
! Type<br />
|-<br />
|}<br />
<br />
=== Development priorities ===<br />
<br />
* All:<br />
<br />
* Martin:<br />
**<br />
<br />
* Ian:<br />
** Finalising production cvmfs mirror/replica<br />
** Switch WNs to use on-site cvmfs replica<br />
** Further iSCSI research<br />
** Plan deployment of management network hardware<br />
<br />
<br />
* Tim:<br />
** <br />
<br />
* Cheney:<br />
** DMF disaster recovery<br />
** Backups<br />
** Rsync<br />
<br />
* James T:<br />
** Documentation<br />
** Preparation for handover<br />
<br />
* James A:<br />
** <br />
<br />
* Kash:<br />
** Drive replacement.<br />
** Fixing broken WNs.<br />
** Correct hotspare configuration in SL09 disk servers.<br />
** Hardware failure metrics continue.<br />
** SL08 testing.<br />
** Continue decommissioning old batch systems (R 27).<br />
<br />
=== Absences ===<br />
*<br />
** Cheney on leave Tuesday, Wednesday and possibly Thursday<br />
** James A on leave Monday<br />
** Ian out Wednesday<br />
<br />
=== Fabric On-Call ===<br />
<br />
* Kashif Monday - Sunday<br />
<br />
=== Advanced Warning of Requirements and Blocking issues ===<br />
<br />
<br />
=== Services Issues ===<br />
<br />
<br />
----<br />
[[RAL Tier1 weekly operations fabric]]<br />
<br />
[[:Category:RAL_Tier1]]</div>