Monday, February 19, 2007

Wrestling with our RB

Jobs are still taking a very long time (several minutes) to go from the Waiting state to the Scheduled state. This means the network server of the RB is accepting the jobs, but the workload manager is running out of steam to process them and do the matchmaking.
  • I monitored the RB by counting the entries in the workload manager input queue (/var/log/edgwl/workload_manager/input.fl) that match the regular expression "g$".
  • I plotted the number of entries waiting to be accepted by the workload manager as a function of time. The result is here.

The left scale (blue dots) is the number of jobs waiting to be matched. The right scale (red dots) is the number of jobs submitted per 10-minute interval.
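The two quantities in the plot can be reproduced with a small script. This is a minimal sketch, not the script used for the plot: it assumes one queue entry per line in input.fl, and it assumes submission times are available as Unix timestamps from some other source (the post does not say where they came from). The helper names (count_waiting, sample_queue, bin_submissions) are hypothetical.

```python
import re
import time
from collections import Counter

# Queue file and pattern taken from the post; treating each line as one
# entry is an assumption about the input.fl layout.
QUEUE_FILE = "/var/log/edgwl/workload_manager/input.fl"
WAITING = re.compile(r"g$")


def count_waiting(lines, pattern=WAITING):
    """Count entries (lines) that match the "waiting" pattern."""
    return sum(1 for line in lines if pattern.search(line.rstrip("\n")))


def sample_queue(path=QUEUE_FILE):
    """Take one snapshot: return (unix_time, number_of_waiting_entries)."""
    with open(path, errors="replace") as f:
        return time.time(), count_waiting(f)


def bin_submissions(timestamps, width=600):
    """Group submission times (seconds) into fixed-width buckets.

    The default width of 600 s gives the per-10-minute rate plotted
    on the right scale. Returns {bucket_start: submissions}.
    """
    counts = Counter(int(t // width) * width for t in timestamps)
    return dict(sorted(counts.items()))
```

Calling sample_queue() every few minutes (for example from cron) and appending the result to a file gives the blue series. Passing the submission times to bin_submissions() gives the red one.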

  • There is a clear drop at the end of the x range. I think this is because I reduced the number of threads for the network server and increased the number of threads for the workload manager. The file to look at is /opt/edg/etc/edg_wl.conf.
    • For the NetWorkServer:
      • MasterThreads = 4;
      • DispatcherThreads = 6;
    • For the WorkLoadManager:
      • NumberOfWorkerThreads = 10;
I will keep monitoring it overnight because the drop is not fully understood. It may also be that the CMS production has slowed down and is giving the RB some breathing room.
