Search results

  • ...ng filer migration. No dates yet but it will affect ARGUS, all CEs and all batch worker nodes (glExec) running GridJobs. The downtime is foreseen to last 1h * Lydia's document - Setup a system to do data archiving using FTS3
    41 KB (5,000 words) - 04:11, 1 September 2015
  • ...ng filer migration. No dates yet but it will affect ARGUS, all CEs and all batch worker nodes (glExec) running GridJobs. The downtime is foreseen to last 1h * Lydia's document - Setup a system to do data archiving using FTS3
    43 KB (5,351 words) - 16:49, 6 September 2015
  • ...ng filer migration. No dates yet but it will affect ARGUS, all CEs and all batch worker nodes (glExec) running GridJobs. The downtime is foreseen to last 1h * Lydia's document - Setup a system to do data archiving using FTS3
    44 KB (5,604 words) - 10:22, 15 September 2015
  • ...ng filer migration. No dates yet but it will affect ARGUS, all CEs and all batch worker nodes (glExec) running GridJobs. The downtime is foreseen to last 1h * Lydia's document - Setup a system to do data archiving using FTS3
    44 KB (5,552 words) - 22:25, 19 September 2015
  • ..., following the application of the updated FTS3 software to the production system last week a memory leak was introduced which also caused a set of problems * Updating the first batch of the remaining Castor disk servers (those in tape-backed service classes)
    14 KB (1,604 words) - 12:01, 23 September 2015
  • * Lydia's document - Setup a system to do data archiving using FTS3 ...mblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool
    45 KB (5,699 words) - 08:31, 28 September 2015
  • * Lydia's document - Setup a system to do data archiving using FTS3 ...mblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool
    46 KB (5,818 words) - 10:21, 19 October 2015
  • * Lydia's document - Setup a system to do data archiving using FTS3 ...mblyness with the CEs. However, I understand much of this is caused by the batch farm being busy. There are low-availability tickets 'on hold' for Liverpool
    52 KB (6,786 words) - 16:22, 12 October 2015
  • * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail * Lydia's document - Setup a system to do data archiving using FTS3
    47 KB (6,004 words) - 20:08, 25 October 2015
  • * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail * Lydia's document - Setup a system to do data archiving using FTS3
    48 KB (6,098 words) - 09:45, 2 November 2015
  • * We have been investigating the behaviour of some batch jobs as there is a low level of failures that are not understood. * gdss664 (AtlasTape - D0T1) was removed from service on the 28th Oct. The system was having some problems running some network commands which were resolved
    12 KB (1,234 words) - 14:05, 11 November 2015
  • ...ng a disk replacement and updating the firmware in the disk controller, the system was re-run through the acceptance testing for 5 days before being returned ...d battery replacement and updating the firmware in the disk controller, the system was re-run through the acceptance testing for 5 days before being returned
    12 KB (1,253 words) - 10:56, 4 November 2015
  • * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail * Lydia's document - Setup a system to do data archiving using FTS3
    54 KB (7,032 words) - 12:38, 8 November 2015
  • * LHCb batch jobs failing to copy results into Castor - changes made seem to have impro ...e that it is just not possible to simulate the behaviour on the pre-production system. ACTIONS: RA to ensure the procedure for dealing with any recurrence of thi
    5 KB (850 words) - 11:33, 27 November 2015
  • * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail * Lydia's document - Setup a system to do data archiving using FTS3
    48 KB (6,095 words) - 12:37, 16 November 2015
  • * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail * Lydia's document - Setup a system to do data archiving using FTS3
    48 KB (6,163 words) - 16:24, 29 November 2015
  • ...- also a possible improvement in configuration: frequent writes to an Oracle system log are slow and can be improved by writing to a dedicated area with a diffe * LHCb batch jobs failing to copy results into Castor - changes made seem to have impro
    6 KB (1,018 words) - 12:25, 4 December 2015
  • ...repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced]. * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail
    47 KB (5,899 words) - 20:49, 6 December 2015
  • * Approach for configuring batch systems (e.g. setting up mem limits). ...nical_Documents/WLCGFutureISUseCases_1.6.pdf PDF]). Looking at information system owned by WLCG (an interesting idea). Starting to prepare a Roadmap to GLUE
    47 KB (5,834 words) - 10:11, 11 January 2016
  • ...- also a possible improvement in configuration: frequent writes to an Oracle system log are slow and can be improved by writing to a dedicated area with a diffe * LHCb batch jobs failing to copy results into Castor - changes made seem to have impro
    7 KB (1,141 words) - 15:00, 11 December 2015
  • ...nical_Documents/WLCGFutureISUseCases_1.6.pdf PDF]). Looking at information system owned by WLCG (an interesting idea). Starting to prepare a Roadmap to GLUE ...repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced].
    44 KB (5,454 words) - 14:43, 17 December 2015
  • ...nical_Documents/WLCGFutureISUseCases_1.6.pdf PDF]). Looking at information system owned by WLCG (an interesting idea). Starting to prepare a Roadmap to GLUE ...repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced].
    53 KB (6,852 words) - 11:31, 21 December 2015
  • ... This problem was initially reported at the last meeting as a high rate of batch job failures seen by LHCb since around the 9th December. ...nd all canbemigr files migrated to tape. A faulty disk drive was replaced. System returned to production on Christmas Day!
    16 KB (1,824 words) - 12:29, 6 January 2016
  • ...nical_Documents/WLCGFutureISUseCases_1.6.pdf PDF]). Looking at information system owned by WLCG (an interesting idea). Starting to prepare a Roadmap to GLUE ...repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced].
    53 KB (6,920 words) - 15:52, 4 January 2016
  • | Setting of alarms in the GridLoad system. |Various discussions indicated that the only flexible system is for sites to raise events on a case by case basis. Sites should do this
    68 KB (11,032 words) - 13:08, 16 September 2016
  • BIRMINGHAM ticket, regarding small VOs and their batch system. Daniela had to reopen the ticket, which I think has meant it snuck by Mark. Small VO ACLs on the Birmingham batch system; Mark is just getting round to looking at this too. In progress (29/11)
    150 KB (23,740 words) - 12:54, 9 January 2017
  • ...rver failures we have reviewed the situation - particularly looking at one batch of systems which show very high drive failure rates. ...f service. One disk was showing a lot of errors. That was replaced and the system returned to service the following day (20th Jan).
    13 KB (1,350 words) - 10:02, 27 January 2016
  • ...- also a possible improvement in configuration: frequent writes to an Oracle system log are slow and can be improved by writing to a dedicated area with a diffe * LHCb batch jobs failing to copy results into Castor - changes made seem to have impro
    7 KB (1,085 words) - 16:11, 18 January 2016
  • * Approach for configuring batch systems (e.g. setting up mem limits). ...nical_Documents/WLCGFutureISUseCases_1.6.pdf PDF]). Looking at information system owned by WLCG (an interesting idea). Starting to prepare a Roadmap to GLUE
    51 KB (6,516 words) - 23:59, 18 January 2016
  • So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.) ...15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups but the rebuild was n
    46 KB (5,834 words) - 12:12, 28 November 2016
  • configuration: frequent writes to an Oracle system log are slow and can be improved by writing to * LHCb batch jobs failing to copy results into Castor - changes made seem to have impro
    7 KB (1,203 words) - 17:47, 23 January 2016
  • * Approach for configuring batch systems (e.g. setting up mem limits). * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail
    47 KB (5,867 words) - 21:18, 31 January 2016
  • ... (AtlasScratchDisk - D1T0) Failed on Monday 18th Jan with a read-only file system. On investigation three disks in the RAID set had problems. Following a lot ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    15 KB (1,664 words) - 09:48, 10 February 2016
  • * Approach for configuring batch systems (e.g. setting up mem limits). * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and they sometimes fail
    46 KB (5,812 words) - 23:00, 8 February 2016
  • * We are working on a refresh of the database system behind the LFC. * WLCG Information System Evolution Task Force is drafting refined definitions for LOG_CPU and PHYS_C
    54 KB (7,071 words) - 09:10, 15 February 2016
  • ... Limits on concurrent batch system jobs.
    14 KB (1,457 words) - 10:30, 26 April 2017
  • * EGI now has a timeline for deployment of the ARGO central monitoring system. ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots.
    42 KB (5,278 words) - 01:55, 20 March 2016
  • ...etc) were also rebooted at this time and there was a confusion that led to batch jobs not being re-allowed to start until later that evening. ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    12 KB (1,283 words) - 21:43, 22 March 2016
  • * EGI now has a timeline for deployment of the ARGO central monitoring system. ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots.
    44 KB (5,455 words) - 02:03, 29 March 2016
  • * GDSS620 (GenTape - D0T1) Reported a read-only file system on the 15th March and was taken out of production. Two T2K files that were ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    13 KB (1,394 words) - 11:01, 30 March 2016
  • * EGI now has a timeline for deployment of the ARGO central monitoring system. ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots.
    44 KB (5,446 words) - 23:24, 3 April 2016
  • ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots. ...las jobs fail due to a lost heartbeat. Alessandra's digging revealed batch system memory restrictions as the likely culprit, but we can chat about it if it d
    46 KB (5,853 words) - 07:32, 9 May 2016
  • ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots. ...las jobs fail due to a lost heartbeat. Alessandra's digging revealed batch system memory restrictions as the likely culprit, but we can chat about it if it d
    46 KB (5,853 words) - 07:33, 9 May 2016
  • ...ring that afternoon full tape access (read & write) was restored. The tape system was left "at risk" over the weekend. ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    14 KB (1,576 words) - 08:37, 25 May 2016
  • ...squids are multiple-use this had a knock-on effect on CVMFS clients on the batch worker nodes. ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    16 KB (1,868 words) - 12:28, 19 October 2016
  • ...ch jobs were un-paused. In order to minimise load through the night no new batch jobs were started until the following morning. See blog post at: http://www ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    15 KB (1,734 words) - 10:46, 11 May 2016
  • ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when | At risk on tape system overnight following problem mounting tapes.
    13 KB (1,454 words) - 07:42, 7 June 2016
  • ...CG regarding the [https://indico.cern.ch/event/517084/ use of information system] (vs GOCDB)| [https://indico.cern.ch/event/517084/contributions/2151002/att ...hing. We have put in place various mitigations (e.g. a re-starter) and the system has worked through the weekend. The vendor is coming in tomorrow (Wed) to f
    44 KB (5,505 words) - 16:19, 3 June 2016
  • ... - this includes four Tier1 drives physically located in that library. The system ran stably during last night - with this very limited Tier1 tape capacity. ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    13 KB (1,445 words) - 10:54, 15 June 2016
  • ... weekend, although the control software (which has been running on a spare system) has been crashing a few times per day. Yesterday (Tuesday 7th June) we mov ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    14 KB (1,602 words) - 11:11, 8 June 2016
  • ...CG regarding the [https://indico.cern.ch/event/517084/ use of information system] (vs GOCDB)| [https://indico.cern.ch/event/517084/contributions/2151002/att ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots.
    51 KB (6,664 words) - 22:21, 13 June 2016
  • * Batch system monitoring HEPiX working group - contact A Lahiff. ...CG regarding the [https://indico.cern.ch/event/517084/ use of information system] (vs GOCDB)| [https://indico.cern.ch/event/517084/contributions/2151002/att
    44 KB (5,489 words) - 22:21, 19 June 2016
  • * AL: a batch system used entirely by non-LHC users? * Batch system monitoring HEPiX working group - contact A Lahiff.
    43 KB (5,335 words) - 16:28, 3 July 2016
  • * Batch system monitoring HEPiX working group - contact A Lahiff. ...CG regarding the [https://indico.cern.ch/event/517084/ use of information system] (vs GOCDB)| [https://indico.cern.ch/event/517084/contributions/2151002/att
    45 KB (5,697 words) - 11:17, 27 June 2016
  • ...m and we have worked closely with the vendor (Oracle). Since that date the system has been stable - with no crashes at all for a week. We do have a reduced n ...n Tuesday 21st June. It is being drained ahead of sorting out the re-named system.
    15 KB (1,605 words) - 09:56, 29 June 2016
  • * AL: a batch system used entirely by non-LHC users? ...ng problems with the tape library control software: We are able to run the system stably but with a reduced number of the Tier1 tape drives enabled. The prob
    50 KB (6,425 words) - 22:25, 11 July 2016
  • ...ape drives. Initial results suggest this enables us to stably run the full system, with all tape drives in use, ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    12 KB (1,248 words) - 12:21, 6 July 2016
  • * AL: a batch system used entirely by non-LHC users? ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots.
    45 KB (5,664 words) - 20:46, 17 July 2016
  • ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots. ...er asks if LHCb can take a look as the jobs are consistently hitting batch system limits and wasting CPU resources because of this. Waiting for reply (18/7)
    46 KB (5,862 words) - 06:50, 25 July 2016
  • * The LSST VO has been enabled on the batch system. * Note: Upgrade of Database System behind the LFC on Monday (1st August).
    42 KB (5,278 words) - 19:59, 1 August 2016
  • ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when * The 2009 worker nodes are being drained from the batch system ahead of their use as test systems before final decommissioning.
    12 KB (1,328 words) - 15:58, 9 August 2016
  • * The LSST VO has been enabled on the batch system. * Note: Upgrade of Database System behind the LFC on Monday (1st August).
    42 KB (5,192 words) - 21:21, 7 August 2016
  • * The LSST VO has been enabled on the batch system. * The Database System behind the LFC was upgraded (to new hardware) at the start of last week (Mo
    45 KB (5,779 words) - 09:19, 22 August 2016
  • * The LSST VO has been enabled on the batch system. * The Database System behind the LFC was upgraded (to new hardware) at the start of last week (Mo
    50 KB (6,392 words) - 07:47, 16 August 2016
  • * GDSS776 (LHCbDst - D1T0) failed with a read-only file system on Thursday 1st September. It was put back in service the following day - i ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    13 KB (1,414 words) - 14:58, 7 September 2016
  • * Atlas reported a problem with the batch system last Friday (9th Sep). It turned out that there was a problem on one partic ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    12 KB (1,290 words) - 11:29, 14 September 2016
  • ...w on use the [ https://operations-portal.egi.eu/downtimes/subscription new system]. ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots.
    49 KB (6,409 words) - 00:39, 26 September 2016
  • ...w on use the [ https://operations-portal.egi.eu/downtimes/subscription new system]. ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots.
    54 KB (7,110 words) - 14:59, 10 October 2016
  • So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.) ...15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups but the rebuild was n
    46 KB (5,756 words) - 13:12, 30 January 2017
  • So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.) ...15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups but the rebuild was n
    47 KB (6,026 words) - 09:22, 21 November 2016
  • ...nt - and attempted to move VMs to other nodes. It took a few hours for the system to recover. This affected a number of services including BDIIs, FTS nodes a
    15 KB (1,598 words) - 12:18, 15 March 2017
  • ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when * There was a short (one to two hour) interruption to tape mounts while the system that runs the tape library control software was swapped on Tuesday morning
    12 KB (1,270 words) - 10:53, 17 November 2016
  • ...esponse a number of services were stopped. In essence we stopped the batch system on Monday (24th Oct). Storage (Castor) was able to continue running. At the ...day evening (19th Oct). This had a knock-on effect on CVMFS clients on the batch worker nodes and for some hours reduced the number of worker nodes availabl
    14 KB (1,523 words) - 13:14, 26 October 2016
  • * US: PNNL LHCONE system outage planned October 17-21 * Some changes were made to increase the number of CMS batch jobs that we run in order to bring the number more into line with the pledg
    50 KB (6,401 words) - 08:48, 31 October 2016
  • ...e others that were more exposed - were stopped since the Monday. The batch system and most of the others were brought back up by the end of Wednesday afterno ... modules were swapped over. On re-test the fault had cleared. However, the system crashed on Friday 28th Oct. It was returned to service yesterday (1st Nov).
    14 KB (1,581 words) - 17:12, 2 November 2016
  • ... Limits on concurrent batch system jobs.
    15 KB (1,509 words) - 10:08, 11 October 2017
  • ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when * There was an intervention on the ECHO Ceph system last week to enable a reconfiguration of its underlying network.
    13 KB (1,436 words) - 14:51, 23 November 2016
  • * We are seeing a high rate of reported disk problems on the OCF '14 batch of disk servers. In some of the cases the vendor finds no fault in the driv
    16 KB (1,833 words) - 13:11, 7 June 2017
  • ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when * There was a restart test of the ECHO Ceph system yesterday. This was to understand how best to do this and set up appropriat
    13 KB (1,400 words) - 14:23, 30 November 2016
  • * We need to carry out firmware updates on a particular batch of servers - which are in use by Atlas, CMS and LHCb. Will arrange when thi So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.)
    44 KB (5,462 words) - 10:47, 6 December 2016
  • * We need to carry out firmware updates on a particular batch of servers - which are in use by Atlas, VMS and LHCb. Will arrange when thi ...nning HTCondor jobs in Vac/Vcycle compatible VMs: adapted for ATLAS, local batch, and now being tested for ALICE at Manchester using a pool of HTCondor jobs
    45 KB (5,745 words) - 13:04, 12 December 2016
  • So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.) ...15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups but the rebuild was n
    49 KB (6,219 words) - 11:44, 9 January 2017
  • ...ednesday (14th) we will carry out rolling firmware updates on a particular batch of servers - which are in use by Atlas, CMS and LHCb. ...nning HTCondor jobs in Vac/Vcycle compatible VMs: adapted for ATLAS, local batch, and now being tested for ALICE at Manchester using a pool of HTCondor jobs
    46 KB (5,848 words) - 09:12, 20 December 2016
  • The webdav/xroot ticket - after rebuilding the system from scratch and getting help from Dan it looks like xroot still isn't play ...info&ticket_id=130537 130537]) there's an invitation to the VO to test the system. Waiting for reply (13/9)
    121 KB (19,081 words) - 12:04, 23 January 2018
  • ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when ...) Firmware updates were applied to the RAID cards in the Clustervision '13 batch of disk servers.
    14 KB (1,569 words) - 14:30, 21 December 2016
  • ...ednesday (14th) we will carry out rolling firmware updates on a particular batch of servers - which are in use by Atlas, CMS and LHCb. ...nning HTCondor jobs in Vac/Vcycle compatible VMs: adapted for ATLAS, local batch, and now being tested for ALICE at Manchester using a pool of HTCondor jobs
    49 KB (6,317 words) - 09:33, 3 January 2017
  • ...ers but not much activity. At the end of the afternoon the number of ALICE batch jobs was cut back (to 500) as a temporary measure to reduce the load on the ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    14 KB (1,561 words) - 14:46, 18 January 2017
  • * GDSS665 (LhcbRawRdst - D0T1) failed on Saturday 31st Dec. Two disks in the system were replaced and it was returned to service on Friday 6th Jan. ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when
    14 KB (1,531 words) - 18:01, 17 January 2017
  • So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.) ...15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups but the rebuild was n
    51 KB (6,530 words) - 08:56, 16 January 2017
  • So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.) ...15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups but the rebuild was n
    47 KB (5,971 words) - 09:04, 23 January 2017
  • ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when * GDSS780 (LHCbDst - D1T0) crashed at around 8am this morning (Wed 25th Jan). System under investigation.
    15 KB (1,614 words) - 14:30, 25 January 2017
  • So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.) ...15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups but the rebuild was n
    44 KB (5,446 words) - 15:00, 8 February 2017
  • So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.) ...15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups but the rebuild was n
    49 KB (6,270 words) - 11:57, 13 February 2017
  • ... Limits on concurrent batch system jobs.
    14 KB (1,425 words) - 14:24, 22 February 2017
  • * The number of Atlas batch jobs being run is lower than expected. The batch (Condor) scheduling will be looked at to try and understand and improve thi
    17 KB (1,714 words) - 14:41, 3 January 2018
  • * Steve: ARC sites are getting a beating from the ARGO monitoring system. Why? * Tests ongoing with some batch jobs for the LHC VOs running in SL6 containers on worker nodes running SL7.
    44 KB (5,413 words) - 16:17, 27 February 2017
  • ... Limits on concurrent batch system jobs.
    15 KB (1,577 words) - 14:44, 1 March 2017
  • ... Limits on concurrent batch system jobs.
    16 KB (1,690 words) - 13:58, 16 August 2017
  • * Steve: ARC sites are getting a beating from the ARGO monitoring system. Why? * Tests ongoing with some batch jobs for the LHC VOs running in SL6 containers on worker nodes running SL7.
    44 KB (5,399 words) - 12:16, 6 March 2017
  • *** Durham: Batch system upgrade led to one outage and a University-wide internet connection loss le * Steve: ARC sites are getting a beating from the ARGO monitoring system. Why?
    42 KB (5,126 words) - 10:15, 13 March 2017
