Friday, July 20, 2007

Imperial SE - dCache removed ~30TB of CMS data

As requested by CMS users, this week we cleaned up around 30TB of orphaned CMS files from the IC dCache. We need to understand why so many orphaned files are being generated in dCache.

Brunel SE running DPM 1.6.5

We were having problems with the storage element at Brunel so I upgraded it to DPM version 1.6.5 (via 1.6.3) this week. The upgrade didn't go totally smoothly but now things seem a lot better. Thanks to Greig for his usual excellent support.

Brunel running an SL4 cluster

The worker nodes of dgc-grid-40 are now running the gLite worker node release on SL4. They are passing the ops SAM tests and the VO tests that have run recently. There was a problem with LHCb production jobs trying to use edg-brokerinfo rather than glite-brokerinfo; I reported it and it has now been fixed. CMS production jobs have also completed successfully. Steve Lloyd's ATLAS tests pass apart from the 'New Package' part. Steve's comment was "My tests are still running release 12.0.6 for which the requirement is SL3 so they shouldn't really go into SL4 machines...this problem will go away when I switch to release 13.0.X as that's supposed to work on SL4". ATLAS production jobs seem to run OK, but there appears to be a problem copying the output files back.

RHUL accounting problem

There was a problem with the APEL accounting at RHUL this week:

ZoneInfo: /usr/java/j2sdk1.4.2_12/jre/lib/zi/ZoneInfoMappings (Too many open files)
Thu Jul 19 00:35:06 GMT 2007: apel-pbs-log-parser - WARNING - Exception opening file /var/spool/PBS/server_priv/accounting/20070713
java.io.FileNotFoundException: /var/spool/PBS/server_priv/accounting/20070713 (Too many open files)

We solved it by moving some of the files out of /var/spool/PBS/server_priv/accounting.
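The post doesn't say which files were moved. As a rough sketch, assuming the accounting files are named YYYYMMDD and that anything older than a retention window has already been parsed and published, a small Python script could report how many files the directory holds against the open-file soft limit and move the older ones aside. The archive path, the 30-day window and the function names are hypothetical, and the limit reported is the one seen by this script, which is only a rough guide to what the parser's own process gets.

```python
"""Hypothetical helper: move old PBS accounting files aside so a log parser
that opens many files in the directory stays under the open-file limit.

Assumptions (not from the original post): files are named YYYYMMDD, files
older than KEEP_DAYS have already been published, and ARCHIVE_DIR is a
site-chosen location.
"""
import datetime as dt
import os
import resource
import shutil

ACCOUNTING_DIR = "/var/spool/PBS/server_priv/accounting"
ARCHIVE_DIR = "/var/spool/PBS/accounting-archive"  # hypothetical location
KEEP_DAYS = 30  # hypothetical retention window


def files_to_archive(names, today, keep_days):
    """Return the YYYYMMDD-named files older than keep_days, oldest first."""
    cutoff = today - dt.timedelta(days=keep_days)
    old = []
    for name in names:
        # Only consider names that look exactly like a daily accounting file.
        if len(name) != 8 or not name.isdigit():
            continue
        try:
            day = dt.datetime.strptime(name, "%Y%m%d").date()
        except ValueError:
            continue  # eight digits but not a valid date
        if day < cutoff:
            old.append(name)
    return sorted(old)


def main():
    if not os.path.isdir(ACCOUNTING_DIR):
        print(f"{ACCOUNTING_DIR} not found; nothing to do")
        return
    names = os.listdir(ACCOUNTING_DIR)
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"{len(names)} files in {ACCOUNTING_DIR}; open-file soft limit is {soft}")
    old = files_to_archive(names, dt.date.today(), KEEP_DAYS)
    if not old:
        return
    os.makedirs(ARCHIVE_DIR, exist_ok=True)
    for name in old:
        shutil.move(os.path.join(ACCOUNTING_DIR, name),
                    os.path.join(ARCHIVE_DIR, name))
        print("archived", name)


if __name__ == "__main__":
    main()
```

Moving files rather than deleting them keeps the raw records available in case they need to be re-parsed later.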