RAL Tier1 weekly operations Fabric 20110328


Developments

  • All:
  • Martin:
  • Ian:
    • Planning rollout of latest CVMFS client
    • Rolled out CVMFS upgrade on 1 cluster to test
    • Set up FreeNAS and configured iSCSI; tested live migration and failover, both working with Hyper-V


  • Tim:
  • James A:
    • Receiving handover from James T
    • Tour for SuperB representatives.
    • Preparing for worker node deployment.
  • James T:
    • Handover
    • Documentation
    • Tours
  • Cheney:
    • DMF DR
    • research alternatives to DMF
    • created DMF DR documentation even though I can't get it to work...
    • fix database backups
    • fix zora access
    • fix dmf half-dead disk
    • fix hinode webserver down
    • fix hinode webstats out of date
    • set up nfs for greg matthews
    • set up another virtual machine for tessella testing
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Decommissioning old batch systems (R 27).
    • Test room review. (Every Monday morning)
    • gdss496: created RAID1 arrays.
    • Configure StorMan on SL09 disk servers.
    • lcg0851-852 sent to Clustervision for fix.
    • Updating firmware on Jetstor systems (ongoing); updated on three so far.
    • logger01: re-created RAID10 array after replacing 3 drives.
    • gdss150 and gdss460 given back to Castor team.
    • Disk handover with James T and A.
    • SL08 testing stopped due to IP change.
    • Labelling racks and systems in UPS and HPD room.


Operational Issues and Incidents

Index Description Start End Severity Affected VO(s)

Summary of plans for week ahead

Scheduled and Cancelled Down Times

Type=Down/At Risk/Cancelled entries in/planned to go to GOCDB

Component Description Start End Affected VO(s) Type

Development priorities

  • All:
  • Martin:
  • Ian:
    • GridPP 26
    • Storage workshop
    • Prep for ATLAS software week
    • Further work on services virtualisation


  • Tim:
  • Cheney:
    • DMF DR
    • job plan tasklets
  • James T:
    • Help with CASTOR 2.1.10-0 upgrade
    • Handover
    • Covering for Kash in his absence on Wednesday/Thursday
    • Any last little bits of documentation
  • James A:
    • Beginning roll-out of new worker nodes into production.
    • GridPP 26 in Sussex (Tuesday to Thursday).
  • Kash:
    • Drive replacement.
    • Fixing broken WNs.
    • Hardware failure metrics continue.
    • Continue SL08 testing.
    • Continue decommissioning old batch systems (R 27).
    • Continue labelling racks and systems in UPS and HPD room.

Absences

  • Ian at GridPP 26 & Storage Workshop Tuesday-Thursday
  • James A at GridPP 26 & Storage Workshop Tuesday-Thursday
  • Kash A/L Wednesday-Thursday

Fabric On-Call

  • Ian Fabric on-call Monday - Sunday

Advanced Warning of Requirements and Blocking issues

Services Issues

