RAL Tier1 weekly operations Grid 20091214
From GridPP Wiki
Matt hodges (Talk | contribs) |
Latest revision as of 17:02, 16 December 2009
Summary of Previous Week
Developments
- Alastair
- Deployed 2 Disk servers.
- Contacted Panda/Ganga developers to improve error information for ATLAS jobs at RAL.
- Tested poweruser analysis at RAL, found problem with CERN WMS.
- Andrew
- Completed November accounting
- Updated trace-job.pl to run on CREAM CE
- Updated FTS configuration due to a missing endpoint
- Continued work on CMSSW TTreeCache & read coalescing patches IO testing
- Testing of PhEDEx dev instance on lcgvo0599
- Deleted CMS data from /store/unmerged; carried out PhEDEx storage consistency check
- Completed preliminary CMS computing model spreadsheet
- Catalin
- worked on MySQL migration plan
- worked on LHCb VOBOX quattorising
- had discussions on FronTier issues
- still waiting for feedback from LFC@CERN on recovery and consistency checks
- Derek
- Implemented Change Control process on dev helpdesk
- Matt
- Prepared Tier-1 review presentation
- Added caching CIP plugin on site BDIIs
- Richard
- Attended Cheney's NRPE training
- CASTOR activities:
- Completed the "data configurator" tool for sending config files to quattorised CASTOR servers
- Continued activity on SLC 4.8 templates
- Wrote a script to complete the post-install setup of CASTOR machines in new pps instance
- Mayo
- Created admin UI for metric system and wrote system user documentation
- Created user account for Sarah Pearce to enable testing with regard to the possible GridPP extension
- Attended Cheney's NRPE training
- Worked on automating tape robot spreadsheet project
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Plans for Week(s) Ahead
Plans
- Alastair
- Try to fix Poweruser issues
- Look into "slow" FTS rates in UK Cloud.
- Andrew
- Continue CMSSW TTreeCache IO & read coalescing patch testing
- Attend PPD Christmas lunch
- Catalin
- continue work on MySQL migration
- LHCb VOBOX
- decommission old SL4 ALICE VOBOXes
- Derek
- Rollout change control process on production helpdesk
- Test, implement and document proposed disaster mitigation for lcgcenfs
- Matt
- Test new production CIP on test site BDII
- Tier-1 Review
- GridPP4 input
- Richard
- Write detailed plan for proposed BDII changes during January
- CASTOR activities:
- Add in configuration data to pps machines built via kickstart scripts
- Set up database connections on new pps machines
- Continue activity on SLC 4.8 templates
- Mayo
- Work on Metric system: adding change password feature for users / report printing features
- Work on possible extension of system to include GridPP
- Continue working on automated spreadsheet project
Resource Requests
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
LHCb SL5 64bit VOBOX deployment using Quattor | 25 Nov 2009 | Medium | Quattor recipe not yet available (RT#53392) |
Hardware for testing LFC/FTS resilience | | High | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue |
Hardware for PPS | | High | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
Hardware for Grid Services testbed | | Medium | |
OnCall/AoD Cover
- Primary OnCall: Catalin (Mon, Wed-Sun)
- Grid OnCall:
- AoD: