From GridPP Wiki
Summary of Previous Week
Developments
- Alastair
- Andrew
  - CMS PhEDEx Ganglia monitoring
  - FTS channels: adjustments to STAR-UKILT2BRUNEL, STAR-UKILT2ICHEP, STAR-UKISTHGRIDRALPP for CMS
  - Updated kernel on csflnx414
  - Deleted old CMS files from /store/unmerged
  - Completed automatic generation of UB schedule CPU & disk emails
  - Started work on CMS computing model spreadsheet
  - Training: attended Nagios training session
  - Out sick 1 day
- Catalin
  - **No** progress on remaining SL5 VOBOXes
  - Started work on backup and recovery (machines audit)
  - Dealt with FronTier following Java update
  - Sorted out the WMS ICE issue
- Derek
  - Metric report
  - Testbed proposal
  - Adding SL53 i386 to Quattor for dev helpdesk
- Matt
  - Kernel updates for FTS/MyProxy
  - Caching CIP provider script (not deployed)
  - Disaster recovery planning
  - Backup/recovery planning
  - Checked batch system for signs of SL4/SL5 crosstalk and other job allocation problems; appears clean since restart of pbs_server daemon
- Richard
  - CASTOR activities: finished the new structure for the family of pre-production Quattor templates
  - Built a 32-bit version of a BDII server and updated template to place log files etc. in the RAL-preferred location
  - Took 2 RT tickets on BDII server configs
- Mayo
  - Annual leave Monday and Tuesday
  - Worked on new metrics system
  - Exported data from new metrics gathering system to enable Derek to produce the monthly report
  - Worked on automating tape robot spreadsheet project
Operational Issues and Incidents
| Description | Start | End | Affected VO(s) | Severity | Status |
|---|---|---|---|---|---|
| WMS Jobdirs full | Wed 18 Nov | Thu 19 Nov | All | Medium | Resolved |
| FroNTier crash | Wed 11 Nov | Fri 20 Nov | ATLAS | Low | Resolved |
Plans for Week(s) Ahead
Plans
- Andrew
  - CMS computing model spreadsheet
  - t2k to t2k.org VO name change
- Catalin
  - Start deployment on 2nd ALICE SL5 VOBOX (HW made available on Monday)
  - Ready to start deployment on LHCb SL5 VOBOX (waiting for "Quattor ready to go")
  - Implement Nagios checks for FronTier
  - Continue working on systems audit (backup, recovery)
- Derek
  - Test SCAS
  - Fix problems with CE information system
  - Working on helpdesk end-to-end restore
- Matt
- Richard
  - CASTOR activities: working with CK and database folk to be able to script database setup for new pre-prod instance; also looking at using custom ncm- components for configuration
  - Building and testing a 64-bit version of BDII server
- Mayo
  - Implement feedback into second version of metrics gathering system in preparation for November metrics
  - Continue working on automated spreadsheet project
  - Continue working on importing Nagios alarm data into svn
Resource Requests
Downtimes
| Description | Hosts | Type | Start | End | Affected VO(s) |
|---|---|---|---|---|---|
| | | | | | |
Requirements and Blocking Issues
| Description | Required By | Priority | Status |
|---|---|---|---|
| LHCb SL5 64bit VOBOX deployment using Quattor | 25 Nov 2009 | Medium | HW allocated but Quattor recipe not yet available (RT#53392) |
| Hardware for testing LFC/FTS resilience | | High | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue |
| Hardware for PPS | | High | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
| Hardware for Grid Services testbed | | Medium | |
OnCall/AoD Cover
- Primary OnCall: Catalin (Mon-Thu)
- Grid OnCall:
- AoD: