Search results

Operations Bulletin 141215

...nical_Documents/WLCGFutureISUseCases_1.6.pdf PDF]). Looking at information system owned by WLCG (an interesting idea). Starting to prepare a Roadmap to GLUE ...repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced].

44 KB (5,454 words) - 14:43, 17 December 2015
Operations Bulletin 211215

...nical_Documents/WLCGFutureISUseCases_1.6.pdf PDF]). Looking at information system owned by WLCG (an interesting idea). Starting to prepare a Roadmap to GLUE ...repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced].

53 KB (6,852 words) - 11:31, 21 December 2015
Tier1 Operations Report 2016-01-06

... This problem was initially reported at the last meeting as a high rate of batch job failures seen by LHCb since around the 9th December. ...nd all canbemigr files migrated to tape. A faulty disk drive was replaced. System returned to production on Christmas Day!

16 KB (1,824 words) - 12:29, 6 January 2016
Operations Bulletin 281215

...nical_Documents/WLCGFutureISUseCases_1.6.pdf PDF]). Looking at information system owned by WLCG (an interesting idea). Starting to prepare a Roadmap to GLUE ...repository/Technical_Documents/WLCGFutureISUseCases_1.3.pdf An Information System future use cases document has been produced].

53 KB (6,920 words) - 15:52, 4 January 2016
Deployment Team Completed Actions

Setting of alarms in the GridLoad system. Various discussions indicated that the only flexible system is for sites to raise events on a case by case basis. Sites should do this

68 KB (11,032 words) - 13:08, 16 September 2016
Past Ticket Bulletins 2016

BIRMINGHAM ticket, regarding small VOs and their batch system. Daniela had to reopen the ticket, which I think has meant it snuck by Mark Small VO acls on the Birmingham batch system, Mark is just getting round to look at this too. In progress (29/11)

150 KB (23,740 words) - 12:54, 9 January 2017
Tier1 Operations Report 2016-01-27

...rver failures we have reviewed the situation - particularly looking at one batch of systems which show very high drive failure rates. ...f service. One disk was showing a lot of errors. That was replaced and the system returned to service the following day (20th Jan).

13 KB (1,350 words) - 10:02, 27 January 2016
RAL Tier1 weekly operations castor 15/01/2016

...- also a possible improvement in configuration, frequent write of a Oracle system log is slow and can be improved by writing to a dedicated area with a diffe * LHCb batch jobs failing to copy results into castor - changes made seems to have impro

7 KB (1,085 words) - 16:11, 18 January 2016
Operations Bulletin 180116

* Approach for configuring batch systems (e.g. setting up mem limits). ...nical_Documents/WLCGFutureISUseCases_1.6.pdf PDF]). Looking at information system owned by WLCG (an interesting idea). Starting to prepare a Roadmap to GLUE

51 KB (6,516 words) - 23:59, 18 January 2016
Operations Bulletin 281116

So far, methods exist for ARC CE, and Torque batch system. Method for VAC still rough and being worked out by (e.g.) ...15th a partial but significant database corruption occurred on the signing system for the CA. Data was restored from (offline) backups but the rebuild was n

46 KB (5,834 words) - 12:12, 28 November 2016
RAL Tier1 weekly operations castor 25/01/2016

configuration, frequent write of a Oracle system log is slow and can be improved by writing to * LHCb batch jobs failing to copy results into castor - changes made seems to have impro

7 KB (1,203 words) - 17:47, 23 January 2016
Operations Bulletin 010216

* Approach for configuring batch systems (e.g. setting up mem limits). * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and the sometimes fail

47 KB (5,867 words) - 21:18, 31 January 2016
Tier1 Operations Report 2016-02-10

... (AtlasScratchDisk - D1T0) Failed on Monday 18th Jan with a read-only file system. On investigation three disks in the RAID set had problems. Following a lot ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when

15 KB (1,664 words) - 09:48, 10 February 2016
Operations Bulletin 080216

* Approach for configuring batch systems (e.g. setting up mem limits). * We are investigating why LHCb batch jobs sometimes fail to write results back to Castor (and the sometimes fail

46 KB (5,812 words) - 23:00, 8 February 2016
Operations Bulletin 150216

* We are working on a refresh of the database system behind the LFC. * WLCG Information System Evolution Task Force is drafting refined definitions for LOG_CPU and PHYS_C

54 KB (7,071 words) - 09:10, 15 February 2016
Tier1 Operations Report 2017-04-26

... Limits on concurrent batch system jobs.

14 KB (1,457 words) - 10:30, 26 April 2017
Operations Bulletin 210316

* EGI now has a timeline for deployment of the ARGO central monitoring system. ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots.

42 KB (5,278 words) - 01:55, 20 March 2016
Tier1 Operations Report 2016-03-23

...etc) were also rebooted at this time and there was a confusion that led to batch jobs not being re-allowed to start until later that evening. ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when

12 KB (1,283 words) - 21:43, 22 March 2016
Operations Bulletin 280316

* EGI now has a timeline for deployment of the ARGO central monitoring system. ... LHCb pilot scripts tested in VMs: same pilot scripts can be used on VM or batch sites in multiprocessor slots.

44 KB (5,455 words) - 02:03, 29 March 2016
Tier1 Operations Report 2016-03-30

* GDSS620 (GenTape - D0T1) Reported a read-only file system on the 15th March and was taken out of production. Two T2K files that were ...y LHCb of a low but persistent rate of failure when copying the results of batch jobs to Castor. There is also a further problem that sometimes occurs when

13 KB (1,394 words) - 11:01, 30 March 2016
