RAL Tier1 weekly Operations Grid 20121126
Revision as of 16:50, 26 November 2012 by Orlin alexandrov
Operational Issues
Description | Start | End | Affected VO(s) | Severity | Status |
---|---|---|---|---|---|
Downtimes
Description | Hosts | Type | Start | End | Affected VO(s) |
---|---|---|---|---|---|
Blocking Issues
Description | Requested Date | Required By Date | Priority | Status |
---|---|---|---|---|
Developments/Plans
Highlights for Tier-1 Ops Meeting
Highlights for Tier-1 VO Liaison Meeting
Detailed Individual Reports
Andrew
- Last week:
- Upgraded MyProxy to UMD-2
- Restoring services after electrical problems
- Ran a small test using Condor + StratusLab in which worker nodes are created on demand
- CMS processing
- Coming week:
- Capacity signoff meeting + preparations
- Upgrade lcgui02 to EMI-2
- Start testing APEL upgrade to EMI-2
- CMS processing
Catalin
- Last week
- dealing with power cut disruptions
- work on CVMFS for MICE and NA62
- This week
- more work on CVMFS
Ian
- Last week:
- Dealing with Power event
- Coming week:
- FedCloud F2F Amsterdam
- Work on StratusLab Cloud
- Aquilon
- Follow up on the synthetic Ethernet Quattor config issue
James
- Last Week
- Fixing things that blew up after over-voltage event.
- This Week
- Fixing things that blew up after over-voltage event.
- Debugging system healthcheck problems.
Orlin
- Check & start some grid services after Tier1 power failure [done]
- Upgrade WNs to SL5 EMI2 [ongoing]
- Implement logging to syslog & export the logs to central server [ongoing]
- Assign some production WNs to authenticate with EMI2/SL6 Argus Server [to do]
- Prepare & Submit change-control for EMI2/SL6 Argus Server [to do]
- Test High Availability & failover for Argus server with Corosync/RGManager/CMAN [to do]
- Bring the Testbed back in order, check the list of services [to do]
- Quattorise, Install & Test EMI2/SL6 WNs on the gridTest queue [to do]
- Test the possibility of an EMI2/SL6 WN as a preinstalled cloud image with a batch client [to do]
- Test and compare jobs running on cloud/hypervisor against jobs running on physical hardware [to do]
- Test & implement Extra monitoring tools for CREAMCEs (if necessary) [to do]
- Grid certificates and elastic FTS [to think about]
VO Reports
ALICE
ATLAS
CMS
LHCb
OnCall/AoD Cover
- Primary OnCall: Catalin (Mon-Sun)
Absences
- James: A/L Tuesday
- Ian: EGI FedCloud F2F Tuesday/Wednesday