December Summary

Dashboard Snapshots

  • Production snapshot:

Production snapshot

  • Production errors:

Production errors snapshot


  • EXEPANDA_JOBDISPATCHER_HEARTBEAT (945): power cut at CERN. Jobs lost contact with the panda server.
  • EXEPANDA_JOBKILL_SIGTERM (383): job killed by the panda server. The cause might be a condor bug, but updating the software on the factory hasn't solved the problem. The factories have also been centralized, so if this were an overload problem, that should have solved it. Only certain tasks seem to suffer heavily from this error.
  • EXEPANDA_JOBKILL (101): jobs from task 95282 that were suffering from the EXEPANDA_JOBKILL_SIGTERM error were killed by the operator.
  • EXEPANDA_DQ2PUT_FILECOPYTIMEOUT (28): input files being copied from SE to local disk timed out. Under investigation.
  • EXEPANDA_JOBKILL_BYPILOT (23): looping jobs (it is not yet clear what this means in athena terms) are killed by the pilot job.