From GridPP Wiki
Summary of Previous Week
Developments
- Catalin
- Work on LFC streaming
- WMS service draining
- Tests on SL5 WNs
- Derek
- YII Objectives
- Wrote Nagios test for CREAM CE issue
- Investigating possible solutions to LCG CE file limit problem
- Deployed test Quattor configuration for site and top-level BDIIs
- Matt
- Completed R89 Rack Migration templates (whole team).
- Migrated MyProxy service to non-migrating rack.
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity
Production pool account at 32k subdirectory limit | 2009-06-03 | Ongoing | ATLAS | High
LB01 RAID failure | 2009-06-17 | Ongoing | All | Low
lfc0448 - SMART errors detected | 2009-06-15 | Ongoing | All | Low
ce.ngs - SAN problems | 2009-06-16 - 17 | Done | egee test vo | Low
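The 32k subdirectory limit listed above can be watched for before it is hit. The following is a minimal sketch only, assuming the pool account directory lives on an ext3 filesystem (where a directory can hold at most 31998 subdirectories). The function names, the threshold fraction and the idea of returning a Nagios-style status string are illustrative assumptions, not the test or fix actually deployed.

```python
import os

# Assumption: ext3 caps an inode at 32000 hard links, so a directory
# can hold at most 31998 subdirectories. Adjust for other filesystems.
EXT3_SUBDIR_LIMIT = 31998


def count_subdirs(path):
    """Count the immediate subdirectories of path (symlinks are not followed)."""
    with os.scandir(path) as entries:
        return sum(1 for entry in entries if entry.is_dir(follow_symlinks=False))


def check_subdir_usage(path, limit=EXT3_SUBDIR_LIMIT, warn_fraction=0.9):
    """Return (status, count) where status is OK, WARNING or CRITICAL.

    Hypothetical helper: CRITICAL once the limit is reached, WARNING once
    the count reaches warn_fraction of the limit, otherwise OK.
    """
    count = count_subdirs(path)
    if count >= limit:
        status = "CRITICAL"
    elif count >= warn_fraction * limit:
        status = "WARNING"
    else:
        status = "OK"
    return status, count
```

A check like this could be run against the home directory of the affected pool account; the actual path is site-specific and is not given here.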
Plans for Week(s) Ahead
Downtimes
Description | Start | End | Affected VO(s)
WMS drain ahead of R89 move | 2009-06-17 10:00 | 2009-06-26 12:00 | All
Development Priorities
- Catalin
- support the R89 move (if needed)
- finalise recovery documentation
- debug the LFC streaming (with Carmine)
- Derek
- Continue investigating solutions to the LCG CE 32k file limit
- Refine YII Objectives
- Quattorise test LFC
- Matt
- Plan SL4 to SL5 migration.
- Migrate MyProxy service to R89 CPU rack.
- R89 late rota cover Thu/Fri.
Requirements and Blocking Issues
Description | Required By | Priority | Status
SL5 Worker Node Kickstart | | High | Post-kickstart configuration needed; not yet suitable for bulk deployment
LB01 RAID failure | | Medium | Disk replacement needed
lfc0448 disk failures | | Medium | Disk replacement needed
Non-capacity HW for testing | | Medium | Still using the old HW
Hardware for PPS | | Medium | May need to deploy imminently
OnCall/AoD Cover
- Primary OnCall
- Grid OnCall
- AoD