From GridPP Wiki
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status
Job status monitoring from CREAMCE | 2-Feb-2010 | | CMS | medium | [10-Feb-2010] WMS patch available soon; CREAMCE new version available soon [07-Apr-2010] CMS tests have shown that WMS patches resolve the problem; still waiting for patch to be installed on the production WMSs in Italy [13-Jul-2010] CNAF WMSs have been updated; testing using backfill is in progress [19-Jul-2010] So far everything looks good
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s)
gLite-WMS update + maintenance | lcgwms02 | | Thu 9 Sep 15:00 | Thu 16 Sep 15:00 | LHC
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status
HW needed to test Dataguard technology for LFC/FTS | 19 May 2010 | 15 June 2010 | Medium | [24-05-2010] HW available; needs to be deployed by Fabric and then handed over to Dataservices
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Alastair
- Investigated unavailable files on gdss81.
- Testing gdss5xx series disk server for deployment into production.
- Working on ATLAS software server, testing CVMFS
- 825 test jobs have been run.
- lcg0805 has been set up for production-style testing; the queue still needs to be added to the ATLAS system.
- Writing script to graph transfer times for FTS transfers
- Working on HammerCloud test of CASTOR 2.1.9
- Analysis queue setup
- Need to copy DBrelease into pre-prod and replicate
Andrew
- CMS CASTOR 2.1.9 testing
- Transfers from RAL to Imperial using gridftp internal [Ongoing]
- cmsFarmRead stress testing with xroot [Ongoing]
- Tested PhEDEx on lcgvo-s3-02
- VO Support Survey
- Preparing presentation for Monday 4pm meeting
- Updated Ganglia high-time-resolution job CPU efficiency plots [Done]
- glite-APEL [Ongoing]
Catalin
- add new frontends to non-LHC LFC alias [done]
- add new frontends to LHCb LFC alias [done]
- gLite updates WMS01 LHC [done]
- gLite updates WMS02 LHC [ongoing]
- improve WMS monitoring [ongoing]
- add new frontends to ATLAS LFC alias
- work on improving Ganglia monitoring for Grid Services [ongoing]
- work on Helpdesk MySQL database migration [ongoing]
Derek
- Catching up
- CREAM CE quattor profile [ongoing]
- Investigating CREAM CE instability [ongoing]
- Deployed quattorised sudo config
- Refactored quattorised atlasbackup configuration
- Intervened on lcgce01 over the weekend (11-12) to resolve a job submission issue
Matt
- Further testing of Quattorised FTS FEs. [Ongoing]
- Rework FTS change control; factor out ATLAS power off. [Ongoing]
- Quattorisation of MyProxy nodes (write up Change Control). [Ongoing]
- Develop Ganglia service metrics [Done]
- Capacity Tracking plan/meeting [Done]
Richard
- Preparing an update to RAL top-level BDIIs to avoid file space problems
- Adding a Nagios check for the number of entries in BDII servers
- Working on the "team status page" being developed as an action from the team awayday [ongoing]
- Reviewing G/S process documentation [ongoing]
- CASTOR items:
  - Helped Cheney with Quattor issues building head nodes for facilities instance
VO Reports
ALICE
ATLAS
CMS
LHCb
OnCall/AoD Cover
OnCall Rota
- Primary OnCall: Catalin (Fri-Sun)
- Grid OnCall: Derek (Mon-Thu)