RAL Tier1 weekly operations Grid 20091019
Summary of Previous Week
Developments
- Alastair
- ATLAS jamboree
- Training: Manual Handling
- Andrew
- Investigated missing CMS files
- Completed UB schedule for September
- Investigated & resolved problem with PhEDEx mss-remove agent
- Deleted CMS "dark" data
- Started deploying disk servers for CMS
- FTS adjustments: increased number of streams on STAR-FIHIPT2; added .gr & .su to CLOUDCMSCERN
- Training: Oracle self-service, fire-fighting
- Catalin
- Oracle SSC training
- Prepared SL5 VOBOX kickstart
- Investigated FronTier/squid deployment for ATLAS
- Derek
- Catching up
- Updated CREAM CE to fix a security vulnerability & reconfigured it to pass CREAM SAM tests
- Investigated gstat2 errors with CE publishing
- Updated maui on lcgui01 and lcgce6-8 to fix the diagnose command
- Metrics report
- Listened to GDB
- Applied fix for LD_LIBRARY_PATH issue on WNs
- Started updating CE documentation
- Matt
- Met with Cristina to discuss publishing of HEPSPEC-06 in APEL
- Richard
- Catching up after return from sick leave
- Began work on deploying a disk server to ATLAS NonProd
- Mayo
- Training: Oracle Self Service
- Worked on the new Metrics Gathering System Input Form
- Finished the new Metrics Gathering System user authentication scheme
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Plans for Week(s) Ahead
Plans
- Alastair
- Perform Security Audit
- Learn how to deploy disk servers for ATLAS
- Discuss Job Plan with Matt
- Discuss allocation of ATLAS disk space with Brian Davies and Stephen Burke
- Go to Shared Service training
- Andrew
- Complete deployment of disk servers for CMS
- Fix unpublished CREAM CE records in APEL & check consistency with PBS accounting
- CMS skimming testing (continued)
- Catalin
- Ready to deploy SL5 VOBOX for ALICE (waiting for HW)
- Ready to deploy FronTier/squid for ATLAS (waiting for HW)
- Assist the LFC ATLAS cleaning operation
- ALICE disk servers installation
- Cristal Level 1 training - Thursday
- Derek
- Test helpdesk restore
- SSC and Cristal Level 1 training
- Update CE documentation
- Matt
- Determine LHCb service class requirements for new allocation
- Disk deployment meeting
- Disaster recovery planning
- Richard
- Finish deployment of disk server to ATLAS NonProd
- SSC Training
- Roll out new BDII connection throttling script
- Roll out new BDII monitoring script
- Mayo
- New Metrics Gathering System - Report Functionality
- Start IPMI power control project
Resource Requests
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
HW for Squid deployment | ATLAS | High | request made via RT Fabric queue |
HW for FronTier deployment | ATLAS | High | request made via RT Fabric queue |
HW for SL5 64-bit VOBOX | ALICE | High | request made via RT Fabric queue |
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
Hardware for testing LFC/FTS resilience | | Medium | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue |
OnCall/AoD Cover
- Primary OnCall: Catalin (Fri - Sun)
- Grid OnCall:
- AoD: Catalin (Wed)