From GridPP Wiki
Summary of Previous Week
Developments
- Alastair
- Andrew
- Added documentation for adding a VO to LFC and WMS; improvements to YAIM
- Completed tests on CMS efficiencies/network traffic for different read/cache hints and different numbers of simultaneous running jobs
- Updated CMS VOBOX SLA
- Started work on CMS computing model spreadsheet
- Started preparing plan for KSI2K-HEPSPEC06 migration
- Catalin
- worked on systems audit - 1 (backup, recovery)
- worked on systems audit - 2 (90-day log retention policy)
- t2k -> t2k.org migration on LFC, WMS
- Frontier checks (java, squid)
- Frontier issues on slow ATLAS 3D DB access
- Derek
- Continuing work on quattorising helpdesk frontend
- Updated lcgce02 for T2K name change
- Examined gstat2 errors for RAL-LCG2 CEs
- Matt
- Richard
- CASTOR activities: Worked with CK and d/b folk to be able to script database setup for new pre-prod instance; also looking at using custom ncm- components for configuration
- Built two 64-bit flavours of a top BDII server (for different revisions of gLite)
- Mayo
- Worked on New Metrics system: added new features in preparation for November results entry
- Wrote a report on the feasibility and possible issues of extending the new Metrics system to include GridPP users
- Worked on automating tape robot spreadsheet project
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity | Status
WMS Jobdirs full | Wed 18 Nov | Thu 19 Nov | All | Medium | Resolved
FroNTier crash | Wed 11 Nov | Fri 20 Nov | ATLAS | Low | Resolved
Plans for Week(s) Ahead
Plans
- Alastair
- Add check_world_writable.sh to Nagios
- Make wiki page for Computing requirements
- Run tests for user analysis at RAL.
- Andrew
- Complete November accounting (& update docs where necessary); apply December fairshares
- Complete CMS computing model spreadsheet
- Add checksum checking into PhEDEx for files being migrated to tape
- Catalin
- fix Frontier issue on slow ATLAS 3D DB access
- continue working on backup/recovery
- install 2nd ALICE SL5 VOBOX
- ready to start deployment on LHCb SL5 VOBOX (waiting for "Quattor ready to go")
- Derek
- Test SCAS
- Working on helpdesk end to end restore
- Change control process via RT
- Matt
- Richard
- CASTOR activities: Setting up Quattor templates for SLC 4.8 plus misc updates to pps templates
- Quattor template(s) for a production CIP server
- Mayo
- Collect feedback on recent changes to new Metric system
- Work on possible extension of system to include GridPP
- Continue working on automated spreadsheet project
- Continue working on importing Nagios alarm data into svn
Resource Requests
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s)
Requirements and Blocking Issues
Description | Required By | Priority | Status
LHCb SL5 64bit VOBOX deployment using Quattor | 25 Nov 2009 | Medium | Quattor recipe not yet available (RT#53392)
Hardware for testing LFC/FTS resilience | | High | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue
Hardware for PPS | | High | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this.
Hardware for Grid Services testbed | | Medium |
OnCall/AoD Cover
- Primary OnCall:
- Grid OnCall:
- AoD: