RAL Tier1 weekly operations Grid 20100111
From GridPP Wiki
Latest revision as of 13:28, 12 January 2010
Summary of Previous Week
Developments
- Alastair
- Prepared slides and ran Hammer Cloud test for ATLAS UK meeting in Cambridge.
- Away at ATLAS UK meeting 6th - 8th January.
- Andrew
- Improved CMS monitoring; wrote scripts checking JobRobot, SAM tests, production jobs, pre-staging of RAW data and proxy on VOBOX, plus 5 data transfer checks
- Deleted CMS files in /store/unmerged
- Investigated various CMS transfer problems (debug instance) & jobs with low CPU efficiencies
- Completed December accounting; updated the spreadsheet-generating Perl script to handle multiple years
- Planning KSI2K to HEP-SPEC06 migration
- Catalin
- SL5 LHCb VOBOX installation (with Ian)
- followed up issue with t2k.org 'zero size' LFC entries
- decommissioned old SL4 ALICE VOBOXes
- atlasbackup 'exclude files' fix
- Derek
- Updated WordPress
- Published fairshares in information system
- Configured GlExec on an SL4 WN
- Tested CE reinstallation instructions
- Matt
- Richard
- Finished plan for BDII changes
- Continued writing discussion document for DNS proposal
- Continued work on the CASTOR pre-prod instance
- Built a test machine as a BDII server to test quattor templates
- Worked with JK and GS on a script to check CASTOR checksums
- Mayo
- Encrypted passwords within the Metric system
- Added a change-password feature to the Metric system
- Fixed a bug within the Metric system
- Worked on tape statistics spreadsheet project: converting Excel charts to HTML
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Plans for Week(s) Ahead
Plans
- Alastair
- Look into results of Hammer Cloud test to try to understand slow Frontier performance (run more tests if necessary).
- Update RAL PP twiki with feedback from ATLAS meeting.
- Find out whether there is still a need for a Pacman mirror.
- Andrew
- Include checksum checking in PhEDEx production instance
- Write Nagios plugin to check for recently-migrated files with incorrect checksums
- Catalin
- finalise SL5 LHCb VOBOX deployment
- work on t2k.org 'zero size' LFC entries issue
- work on tidying up LFC schemas (with Carmine)
- exercise ALICE xrootd (manager + peer) re-installation (on old SL4 VOBOXes)
- Derek
- Testing helpdesk restore
- Verifying CREAM CE installation instructions
- Matt
- Test R-GMA recovery (Flexible Archiver component)
- Richard
- Finish discussion document for DNS proposal
- Continue working on CASTOR pre-prod instance
- Further work on the Quattor templates for BDII server
- Re-do existing STP time bookings and enter EGEE timesheets back to starting date
- 2 days A/L
- Mayo
- Automating Metric report system
- Adding charts to the metric system
- Web interface and script to fetch data for Tape robot statistics spreadsheet project
Resource Requests
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
Hardware for testing LFC/FTS resilience | | High | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue |
Hardware for PPS | | High | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
Hardware for Grid Services testbed | | Medium | |
OnCall/AoD Cover
- Primary OnCall:
- Grid OnCall: Derek (Mon-Fri)
- AoD: