Thursday, November 09, 2006

SAM, NGS

  • Yesterday we found out why we had Aborted SAM tests. The reason was simple: the SAM jobs seem to use a lot of virtual memory, and they were killed by our lovely SGE. We were unlucky, since all the tests could be published and were showing OK; only the RB state was showing an Abort with a Maradona error. The key to finding why the jobs were failing was to run the SAM tests ourselves with the information found here.
  • Concerning NGS, the setup is done at LeSC, but we have a problem with the home directories not mounting properly on the worker nodes. Keith is looking into it.
  • Another useful piece of information: we do not have to set up a specific UI for the NGS. NGS users can submit from their own UI and use the stage-in switch (-s) of globus-job-submit (globus-job-submit mars-ce2.mars.lesc.doc.ic.ac.uk:2119/jobmanager-sge -q 10min -s `pwd`/test.sh)
  • APEL accounting: Duncan and Giuseppe are chasing problems with the data not being published.
  • The LHCb installation at Imperial failed; Joel is trying again.
  • Today we reached more than 1.8k jobs in London. Thanks, everyone, and welcome to the new clusters. Brunel is running at full steam and Imperial is catching up.