RAL Tier1 weekly operations Grid 20090907

From GridPP Wiki

Summary of Previous Week

Developments

  • Andrew
    • Adding August data for UB spreadsheet
    • Deleted some CMS "dark" data
  • Catalin
    • glite-WMS upgrade on lcgwms03
    • glite-LB upgrade on lcglb01, lcglb02
    • glite-VOBOX upgrade on Alice, LHCb VOBOXes
    • glite-MON upgrade on lcgmon01
  • Derek
    • SL5 Migration planning
  • Matt
    • Deployed and tested latest PPS BDII.
    • Reviewed Fabric queue ticket priorities.
  • Richard
    • Developed initial version of a Grid Services dashboard within the RT helpdesk system
    • Developed a Perl module and associated scripts for interfacing with RT so that custom reports can be generated. One of these scripts is now in production and provides daily notifications for Grid Services.

Operational Issues and Incidents

Description | Start | End | Affected VO(s) | Severity
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | ATLAS | Low

Plans for Week(s) Ahead

Development Priorities

  • Andrew
    • Test File Transfer Service
    • Monitor job efficiencies
    • Familiarise with the helpdesk and triage Grid Services tickets
    • Test restoring the helpdesk from backups
  • Catalin
    • glite-WMS upgrade on lcgwms01
    • GridPP23 - Tue, Wed
    • glite-VOBOX upgrade on second Alice VOBOX
  • Derek
    • GridPP23 + Deployment Board
    • Install and configure lcgce08 and lcgce06
  • Matt
    • Disaster recovery planning
    • Check status of disk deployment process
    • Review Grid Services documentation
    • Check FTS2.2 configuration
  • Richard
    • Investigating BDII
    • Investigating Quattor
    • Generate reports for Grid Services tickets on helpdesk

Resource Requests

Downtimes

Description | Type | Start | End | Affected VO(s)
glite-WMS upgrade on lcgwms01 | At risk Scheduled | Sep 7 (12:00) | Sep 7 (16:00) | LHC VOs

Requirements and Blocking Issues

Description | Required By | Priority | Status
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment
Hardware for lcgce08 | | High | Requirement for SL5 migration
Non-capacity HW for testing | | Medium | Still using the old HW
Hardware for PPS | | Medium | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this.
lfc0448 disk failures | | Low | Disk replacement needed


OnCall/AoD Cover

  • Primary OnCall: Catalin (Thu-Sun)
  • Grid OnCall: Matt (Mon-Wed)
  • AoD: