RAL Tier1 weekly operations Grid 20091019

From GridPP Wiki

Summary of Previous Week

Developments

  • Alastair
    • ATLAS jamboree
    • Training: Manual Handling
  • Andrew
    • Investigated missing CMS files
    • Completed UB schedule for September
    • Investigated & resolved problem with PhEDEx mss-remove agent
    • Deleted CMS "dark" data
    • Started deploying disk servers for CMS
    • FTS adjustments: increased number of streams on STAR-FIHIPT2; added .gr & .su to CLOUDCMSCERN
    • Training: Oracle self-service, fire-fighting
  • Catalin
    • Oracle SSC training
    • Prepared SL5 VOBOX kickstart
    • Investigated FronTier/squid deployment for ATLAS
  • Derek
    • Catching up
    • Updated CREAM CE to fix a security vulnerability and reconfigured it to pass CREAM SAM tests
    • Investigated gstat2 errors with CE publishing
    • Updated maui on lcgui01 and lcgce6-8 to fix diagnose
    • Metrics report
    • Listened to GDB
    • Applied fix for LD_LIBRARY_PATH issue on WNs
    • Started updating CE documentation
  • Matt
    • Met with Cristina to discuss publishing of HEPSPEC-06 in APEL
  • Richard
    • Catching up after return from sick leave
    • Began work on deploying a disk server to Atlas NonProd
  • Mayo
    • Training: Oracle Self Service
    • Worked on the new Metrics Gathering System Input Form
    • Finished the new Metrics Gathering System user authentication scheme

Operational Issues and Incidents

Description | Start | End | Affected VO(s) | Severity | Status

Plans for Week(s) Ahead

Plans

  • Alastair
    • Perform Security Audit
    • Learn how to deploy disk servers for ATLAS
    • Discuss Job Plan with Matt
    • Discuss allocation of ATLAS disk space with Brian Davies and Stephen Burke
    • Go to Shared Service training
  • Andrew
    • Complete deployment of disk servers for CMS
    • Fix unpublished CREAM CE records in APEL & check consistency with PBS accounting
    • CMS skimming testing (continued)
  • Catalin
    • Ready to deploy SL5 VOBOX for Alice (waiting for HW)
    • Ready to deploy FronTier/squid for ATLAS (waiting for HW)
    • Assist the LFC ATLAS cleaning operation
    • Alice disk server installation
    • CRISTAL Level 1 training - Thursday
  • Derek
    • Test helpdesk restore
    • SSC and Cristal Level 1 training
    • Update CE documentation
  • Matt
    • Determine LHCb service class requirements for new allocation
    • Disk deployment meeting
    • Disaster recovery planning
  • Richard
    • Finish deployment of disk server to Atlas NonProd
    • SSC Training
    • Roll out new BDII connection throttling script
    • Roll out new BDII monitoring script
  • Mayo
    • New Metrics Gathering System - Report Functionality
    • Start IPMI power control project

Resource Requests

Downtimes

Description | Hosts | Type | Start | End | Affected VO(s)

Requirements and Blocking Issues

Description | Required By | Priority | Status
HW for Squid deployment | ATLAS | High | Request made via RT Fabric queue
HW for FronTier deployment | ATLAS | High | Request made via RT Fabric queue
HW for SL5 64-bit VOBOX | Alice | High | Request made via RT Fabric queue
Non-capacity HW for testing | | Medium | Still using the old HW
Hardware for PPS | | Medium | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this.
Hardware for testing LFC/FTS resilience | | Medium | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue

OnCall/AoD Cover

  • Primary OnCall: Catalin (Fri - Sun)
  • Grid OnCall:
  • AoD: Catalin (Wed)