From GridPP Wiki
Summary of Previous Week
Developments
- Alastair
- Deployed 2 Disk servers.
- Contacted Panda/Ganga developers to improve error information for ATLAS jobs at RAL.
- Tested poweruser analysis at RAL, found problem with CERN WMS.
- Andrew
- Submitted change request document for FTS channel timeout adjustment; applied change
- Updated check_pbs_efficiencies.pl Nagios script to allow automatic killing of low-efficiency jobs for selected VOs
- Resolved failing SRMv2-user CMS SAM test
- Tested TTreeCache and read-coalescing I/O on reco and skimming jobs
- Added Ganglia monitoring of CMS tape migrations (from PhEDEx logs, not CASTOR)
- Investigated various CMS issues
- Catalin
- Worked on old/new ALICE VOBOXes
- No progress on quattorising the LHCb VOBOX
- Still waiting for feedback from LFC@CERN on recovery and consistency checks
- Some work on MySQL migration
- Attended various meetings
- Derek
- Moved change control system from dev helpdesk to prod helpdesk
- Produced metrics report
- Implemented cron jobs to back up lcgcenfs files to CEs
- Matt
- Tested new production CIP on test site BDII
- Tier-1 Review
- Richard
- Continued plan for proposed BDII changes during January
- Wrote a script to dump our DNS domain to simplify "which machine is that" type queries arising from monitoring alerts/emails
- CASTOR activities:
- Defined disk and tape servers to use with new pre-prod instance
- Mayo
- Created admin UI for metric system and wrote system user documentation
- Created user account for Sarah Pearce to enable testing with regard to the possible GridPP extension
- Attended Cheney's NRPE training
- Worked on automating tape robot spreadsheet project
Operational Issues and Incidents
| Description | Start | End | Affected VO(s) | Severity | Status |
Plans for Week(s) Ahead
Plans
- Alastair
- Try to fix Poweruser issues
- Look into "slow" FTS rates in UK Cloud.
- Andrew
- Catalin
- Continue work on MySQL migration
- Follow up issue with t2k.org 'zero size' LFC entries
- Address minor issues with central monitoring of ALICE VOBOXes
- Decommission old SL4 ALICE VOBOXes
- Derek
- Document process for coping with catastrophic failure of lcgcenfs
- Document process for breaking helpdesk mail loops
- Matt
- Switch Site BDIIs to new CIPs
- GridPP4 input
- R-GMA Registry recovery testing
- Investigate APEL publishing problems (lcgbatch01)
- Richard
- Finish off the plan for proposed BDII changes during January
- Work with MB on getting a DNS zone delegated to Tier1
- Work with JA/DR on placing a link to "DNS dump" script on Tier1 web page
- CASTOR activities:
- Rebuild disk servers to be used in new pre-prod instance
- Update the software on tape server for new pre-prod instance
- Continue activity on SLC 4.8 templates
- Mayo
- Work on metric system: add a change-password feature for users and report-printing features
- Work on possible extension of the system to include GridPP
- Continue working on automated spreadsheet project
Resource Requests
Downtimes
| Description | Hosts | Type | Start | End | Affected VO(s) |
Requirements and Blocking Issues
| Description | Required By | Priority | Status |
| LHCb SL5 64bit VOBOX deployment using Quattor | 25 Nov 2009 | Medium | Quattor recipe not yet available (RT#53392) |
| Hardware for testing LFC/FTS resilience | | High | DataServices want to deploy a DataGuard configuration to test LFC/FTS resilience; request for HW made through RT Fabric queue |
| Hardware for PPS | | High | We have made a commitment to test PPS pre-releases, and have no hardware dedicated for this. |
| Hardware for Grid Services testbed | | Medium | |
OnCall/AoD Cover
- Primary OnCall:
- Grid OnCall:
- AoD: