RAL Tier1 weekly operations Grid 20091116
From GridPP Wiki
Summary of Previous Week
Developments
- Alastair
- Continued with various training/Tutorials
- Produced first draft of experiment requirements
- Produced code for faster access to checksums on DDM
- Finished UB schedule
- Andrew
- Deployed gdss383 to cmsFarmRead
- Completed scripts for automated generation of the UB schedule spreadsheet
- Changed YAIM configuration files for the change of VO name (t2k)
- Installed the MySQL client etc. on lcgui02 using Quattor
- Attended the CMS UK computing meeting (IC)
- Training: R89 machine room
- Catalin
- Kernel and gLite upgrades
- SL5 VOBOX for ALICE in production
- A few FroNTier tests with Squids at T2s
- Review of gLite services in Nagios
- Derek
- Kernel updates
- Wrote a profile for the test batch server
- Writing a document about the CE information system
- Deployed SCAS but having difficulty testing it
- Matt
- Kernel updates (problems on top-level BDIIs)
- CIP monitoring on site BDII
- Richard
- Attended the GDB meeting (via EVO)
- Updated one of the BDII RPMs to add a crontab entry that was omitted from the previous release
- CASTOR activities: (i) requested that the user accounts and groups used by CASTOR be entered into NIS; (ii) re-arranged the PPS Quattor templates to allow three levels of conditionality (server type, instance and service class)
- Mayo
- Metric system: fixed a bug where IE was submitting duplicate records
- Metric system: added a page for users to view the whole month's metric results
- Worked on automated spreadsheet project
- Worked on importing Nagios alarm data into svn
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Plans for Week(s) Ahead
Plans
- Alastair
- Away
- Andrew
- Complete the VO name change (t2k to t2k.org)
- FTS channel adjustments for CMS
- Learn more about the CMSSW framework
- Apply kernel upgrades to csflnx414
- Catalin
- Ready to start deployment of the 2nd ALICE SL5 VOBOX (waiting for hardware)
- Ready to start deployment of the LHCb SL5 VOBOX (waiting for "Quattor ready to go")
- Derek
- Test SCAS
- Testbed proposal
- Working on helpdesk end-to-end restore
- Matt
- Caching CIP information on site BDII
- Disaster recovery planning
- Richard
- CASTOR activities: Complete the new set of pre-production Quattor templates
- Apply recent Quattor experience to completing the Quattor configuration/build for the BDII servers
- Mayo
- Annual leave Monday to Tuesday
- Continue working on new spreadsheet system
- Continue working on automated spreadsheet project
- Continue working on importing Nagios alarm data into svn
Resource Requests
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Kernel upgrades | FTS | Scheduled Downtime | Tuesday 17 Nov 07:00 | Tuesday 17 Nov 09:00 | all |
Kernel upgrades | MyProxy | Scheduled Downtime | Tuesday 17 Nov 08:00 | Tuesday 17 Nov 09:00 | all |
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
Hardware for 2nd ALICE SL5 64bit VOBOX | 16 Nov 2009 | High | Request to re-deploy lcg0614 (ALICE SW WN) as SL5 VOBOX (using Quattor or not) - RT#53338 |
Hardware for LHCb SL5 64bit VOBOX | 25 Nov 2009 | Medium | Request for HW allocation (RT#53392) |
Hardware for testing LFC/FTS resilience | | High | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue |
Hardware for PPS | | High | We have made a commitment to test PPS pre-releases and have no hardware dedicated to this. |
Hardware for Grid Services testbed | | Medium | |
OnCall/AoD Cover
- Primary OnCall: Catalin (Mon, Tue, Thu-Sun)
- Grid OnCall:
- AoD: