RAL Tier1 weekly operations Grid 20091102
From GridPP Wiki
Latest revision as of 10:42, 2 November 2009
Summary of Previous Week
Developments
- Alastair
- Finished security audit
- Deployed disk servers from non-prod to prod
- Went through most of the CASTOR training when Shaun wasn't too busy
- Learnt about Tier 2 data storage allocation from Brian
- Learnt how to make changes with quattor and updated twiki
- Updated PPD twiki
- Andrew
- Completed consistency check of Aug 09 APEL & PBS; resolved problems with Oct 09 pbsjobs MySQL
- Writing Perl script to generate UB Schedule spreadsheet
- Attended CMS Offline and Computing Workshop, CERN
- Obtained CMS production role
- Meeting with a member of CMS data ops about ProdAgent
- Deleted 190,000 CMS files in /store/unmerged
- Training: CERN level 1 & 2 safety
- Catalin
- Finished deploying the ALICE disk servers
- Deployed and tested the FronTier/squid server for ATLAS
- Installed the SL5 VOBOX for ALICE
- Started the drain operation for WMS03
- Derek
- Deployed updated vo config in quattor
- Fixed quattor directory creation on WNs
- Writing RAL talk for Quattor workshop
- Documenting CE information system setup
- Matt
- Deployed gLite 3.2 SL5 VOBOX
- Checked priorities for deploying the Viglen 08 kit once it passes acceptance tests (to meet shortfalls in the ATLAS and LHCb pledges)
- Richard
- DSE Training
- Deployed 5 disk servers into AtlasSimStrip
- Packaged RT helpdesk scripts plus their associated cron entries as an RPM using DR's layout
- Repackaged the gmetric-bdii-top.pl and tier1-bdii-top-config RPMs using DR's layout
- Updated the log analysis Perl scripts in the gmetric-bdii-top.pl and tier1-bdii-top-config RPMs to improve performance: one shows a ~15x improvement, the other ~10x.
- CASTOR activities: continued development of quattor templates for servers in pre-prod instance; also DNS changes
- Mayo
- Rolled out first prototype of the new Metric Gathering System
- Collected some feedback on the new Metric Gathering System prototype
- Resolved SVN access issues
Operational Issues and Incidents
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Plans for Week(s) Ahead
Plans
- Alastair
- Go through gLite training
- Finish CASTOR training
- Update CPU efficiencies
- Test UK Frontier/Squid using Athena release 15.5.1
- Test prod/poweruser0/user permissions at the Tier 1.
- Continue updating the PPD twiki on ATLAS software
- Andrew
- Continue work on automated generation of UB Schedule spreadsheet
- Deploy a spare service node as a VOBOX using Quattor; install & setup ProdAgent; run a test production job
- Catalin
- Finish deployment of the SL5 VOBOX for ALICE
- Reinstall WMS03 (hot-swapping)
- Integrate FronTier into the ATLAS Frontier/squid network
- Derek
- Attend quattor workshop (Brussels)
- Investigate/deploy SCAS
- Matt
- Disaster recovery planning
- Richard
- Update Job Plan
- Complete quattor config/build for BDII servers
- CASTOR activities: Continue work on new pre-prod instance
- Mayo
- Collect more feedback on the prototype system
- Begin working on additional functionality for future releases of the Metric System
- Work on phase two of the on-call documentation project
- Write the design specification for the IPMI project
Resource Requests
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
WMS03 hotswappable | lcgwms03.gridpp.rl.ac.uk | Scheduled Outage | Oct 30 (09:00) | Nov 05 (16:00) | non-LHC |
Requirements and Blocking Issues
Description | Required By | Priority | Status |
---|---|---|---|
HW for Squid deployment | ATLAS | High | request made via RT Fabric queue; used reserved hardware |
HW for FronTier deployment | ATLAS | High | request made via RT Fabric queue; used reserved hardware |
HW for SL5 64-bit VOBOX | ALICE | High | request made via RT Fabric queue; used reserved hardware |
Hardware for testing LFC/FTS resilience | | High | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue |
Non-capacity HW for testing | | Medium | Still using the old HW |
Hardware for PPS | | Medium | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
OnCall/AoD Cover
- Primary OnCall: Catalin (Mon-Thu)
- Grid OnCall:
- AoD: