RAL Tier1 weekly operations Grid 20091116

From GridPP Wiki
Revision as of 15:49, 16 November 2009 by Matt hodges (Talk | contribs)

Summary of Previous Week

Developments

  • Alastair
    • Continued with various training/Tutorials
    • Produced first draft of experiment requirements
    • Produced code to access checksums on DDM faster
    • Finished UB schedule
  • Andrew
    • deployed gdss383 to cmsFarmRead
    • completed scripts for automated generation of UB schedule spreadsheet
    • changed YAIM configuration files for change of VO name (t2k)
    • Installed MySQL client etc on lcgui02 using Quattor
    • attended CMS UK computing meeting (IC)
    • Training: R89 machine room
  • Catalin
    • Kernel and gLite upgrades
    • SL5 VOBOX for ALICE in production
    • A few FronTier tests with Squids at T2s
    • Review of glite-services on Nagios
  • Derek
    • Kernel updates
    • Wrote profile for test batch server
    • Writing document about CE information system
    • Deployed SCAS but having difficulty testing
  • Matt
    • Kernel updates (problems on top-level BDIIs)
    • CIP monitoring on site BDII
  • Richard
    • Attended (via EVO) the GDB meeting
    • Updated one of the BDII RPMs to add a crontab entry that was omitted from the previous release
    • CASTOR activities: (i) requested that the user accounts and groups used by CASTOR be entered into NIS; (ii) rearranged the PPS Quattor templates to allow three levels of conditionality (server type, instance and service class)
  • Mayo
    • Metric system: fixed bug where IE was submitting duplicate records
    • Metric system: added page for users to view the whole month's metric results
    • Worked on automated spreadsheet project
    • Worked on importing Nagios alarm data into svn

Operational Issues and Incidents

Description | Start | End | Affected VO(s) | Severity | Status

Plans for Week(s) Ahead

Plans

  • Alastair
    • Away
  • Andrew
    • complete VO name change (t2k to t2k.org)
    • FTS channel adjustments for CMS
    • learn more about the CMSSW framework
    • apply kernel upgrades to csflnx414
  • Catalin
    • Ready to start deployment of 2nd ALICE SL5 VOBOX (waiting for HW)
    • Ready to start deployment of LHCb SL5 VOBOX (waiting for "Quattor ready to go")
  • Derek
    • Test SCAS
    • Testbed proposal
    • Work on helpdesk end-to-end restore
  • Matt
    • Caching CIP information on site BDII
    • Disaster recovery planning
  • Richard
    • CASTOR activities: Complete the new set of pre-production Quattor templates
    • Apply the recent Quattor experience to completing the Quattor config/build for BDII servers
  • Mayo
    • Annual leave Monday-Tuesday
    • Continue working on new spreadsheet system
    • Continue working on automated spreadsheet project
    • Continue working on importing Nagios alarm data into svn

Resource Requests

Downtimes

Description | Hosts | Type | Start | End | Affected VO(s)
Kernel upgrades | FTS | Scheduled Downtime | Tuesday 17 Nov 07:00 | Tuesday 17 Nov 09:00 | all
Kernel upgrades | MyProxy | Scheduled Downtime | Tuesday 17 Nov 08:00 | Tuesday 17 Nov 09:00 | all

Requirements and Blocking Issues

Description | Required By | Priority | Status
Hardware for 2nd ALICE SL5 64bit VOBOX | 16 Nov 2009 | High | Request to re-deploy lcg0614 (ALICE SW WN) as SL5 VOBOX (using quattor or not) - RT#53338
Hardware for LHCb SL5 64bit VOBOX | 25 Nov 2009 | Medium | Request for HW allocation (RT#53392)
Hardware for testing LFC/FTS resilience | | High | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue
Hardware for PPS | | High | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this.
Hardware for Grid Services testbed | | Medium |

OnCall/AoD Cover

  • Primary OnCall: Catalin (Mon, Tue, Thu-Sun)
  • Grid OnCall:
  • AoD: