RAL Tier1 weekly operations castor 26/03/2012 - Revision history
Matt viljoen at 08:26, 28 March 2012
<div>== Operations News ==<br />
* (Tue) Latest errata and kernel applied to all SRMs, which were then rebooted.<br />
* 2.1.12 installed; it is being prepared and tested for the repack upgrade.<br />
<br />
== Operations Problems ==<br />
* Still seeing occasional failed requests on the ATLAS SRM, but no more crashes since the 2.11-1 upgrade. CERN have now seen this problem too, which indicates a genuine bug in the CGSI_gSOAP package. Too low-level to pursue for now, so this is parked.<br />
* The xroot client libraries use the wrong checksum algorithm; they need to be upgraded on the disk servers.<br />
<br />
== Blocking Issues ==<br />
* None<br />
<br />
== Planned, Scheduled and Cancelled Interventions ==<br />
'''Entries in/planned to go to GOCDB'''<br />
{| border=1 align=center<br />
|- bgcolor="#7c8aaf"<br />
! Description<br />
! Start<br />
! End<br />
! Type<br />
! Affected VO(s)<br />
! Led by<br />
|- <br />
| CIP 2.2.0 upgrade (STC)<br />
| TBD<br />
| TBD<br />
| <span style="background:#ffff00">At-risk</span><br />
| All<br />
| Matthew<br />
|}<br />
<br />
== Advanced Planning ==<br />
'''Tasks'''<br />
* Test and re-apply CIP upgrade (Jens, Matthew)<br />
* Test and certify 2.1.12-4 and 2.1.11-9 (Matthew, Chris)<br />
* Stress testing of Transfer Manager (TM) (Shaun, All)<br />
* Ganglia monitoring for TM (Rob, Chris)<br />
* Re-instantiate certification on VMs using Quattor+Puppet (Rob)<br />
* Stress testing of CV11 generation disk servers on preprod (Rob, Matthew)<br />
* Selection of disk-only prototype solution (Shaun, Rob, Brian, James)<br />
'''Interventions'''<br />
* Upgrade repack to 2.1.12-4 (Apr)<br />
* Switch from LSF to TM after the 2.1.11-8 upgrade. TM first needs more thorough stress testing on preprod with more disk servers. (Apr)<br />
* Switch to Tape Gateway (TG) once it has been tested on repack (May)<br />
* Upgrade Castor Facilities and Tier1 instances to 2.1.11-9 (Jun)<br />
* Upgrade Oracle to 11g (Jun)<br />
* Upgrade Tier1 instances to 2.1.12 once we are happy with TM and TG performance (Jul)<br />
<br />
== Staffing ==<br />
* CASTOR on-call person: Matthew<br />
* Staff absence/out of the office: <br />
** Chris (annual leave)<br />
[[Category:RAL Tier1]]<br />
[[Category:CASTOR]]</div>