RAL Tier1 weekly operations Grid 20091026

Summary of Previous Week

Alastair
- Perform Security Audit
- Learn how to deploy disk servers for ATLAS
- Discuss Job Plan with Matt
- Discuss allocation of ATLAS disk space with Brian Davies and Stephen Burke
- Go to Shared Service training
Andrew
- FTS channel adjustments: timeouts doubled for STAR-FIHIPT2 & RALLCG2-CLOUDCMSITALY
- Disk server deployment (5 servers to cmsFarmRead)
- APEL & PBS comparisons for CREAM CE
- Correcting PBS jobs MySQL table for October
- Resolved problem with PhEDEx mss-remove agent
- Upgraded PhEDEx to 3.2.9
- Completed CMS "dark" data removal
- Investigating consistency between missing files lists from PhEDEx & CASTOR team
Catalin
- CRISTAL 1 course
- Finished kickstarts for FronTier and SL5 VOBOX; waiting for HW
- Assisted with the ATLAS LFC cleaning operation
- Disk server deployment for ALICE
Derek
- Updating VO configuration in Quattor
- Testing helpdesk backup
- Cristal level 1
- SSC Training
- Out sick 1 day
Matt
- Determine LHCb service class requirements for new allocation
- Disk deployment meeting
Richard
- ORACLE SSC Training
- Further disk server deployments into Atlas NonProd (including updates to the TWiki instructions)
- Continued work on BDII/Quattor task
- CASTOR activities: Read through SDW's training slides; work on new pre-prod instance
Mayo
- Worked on the new Metrics Gathering System
- Thought Bubble website now in operation
- Initial research into IPMI power control project

Description	Start	End	Affected VO(s)	Severity	Status

Plans for Coming Week

Alastair
- Finish security audit (if not already finished)
- Go through gLite training
- Go through CASTOR training slides
- Learn about FTS and outputs that I will take over from Brian
- Update CPU efficiencies
Andrew
- Attend CMS Offline & Computing Workshop, CERN
Catalin
- Ready to deploy SL5 VOBOX for ALICE (waiting for HW)
- Ready to deploy FronTier/Squid for ATLAS (waiting for HW)
- Finish ALICE disk server deployment
- Start WMS03 drain
Derek
- Test helpdesk restore
- Updating Quattor VO configuration
- Update CE documentation
Matt
- Check priorities for deploying Viglen 08 kit after it passes acceptance tests
- VO requirements capture
- Disaster recovery planning
Richard
- RPM packaging and installation for new BDII connection throttling script
- RPM packaging and installation for new BDII monitoring script
- Complete Quattor config/build for BDII servers
- CASTOR activities: Continue work on new pre-prod instance
Mayo
- Continue work on the new Metrics Gathering System
- Begin Stage 2 of on call documentation project
- Continue research into IPMI power control project

Description	Hosts	Type	Start	End	Affected VO(s)
WMS03 hotswappable	lcgwms03.gridpp.rl.ac.uk	Scheduled Outage	Oct 30 (09:00)	Nov 05 (16:00)	non-LHC

Description	Required By	Priority	Status
HW for Squid deployment	ATLAS	High	request made via RT Fabric queue
HW for FronTier deployment	ATLAS	High	request made via RT Fabric queue
HW for SL5 64-bit VOBOX	ALICE	High	request made via RT Fabric queue
Hardware for testing LFC/FTS resilience		High	DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue
Non-capacity HW for testing		Medium	Still using the old HW
Hardware for PPS		Medium	We have made a commitment to test PPS pre-releases but have no hardware dedicated to this.