Software Update: HTCondor 8.8.5


The HTCondor Team at the University of Wisconsin-Madison has released version 8.8.5, a new stable version of its workload management system HTCondor. HTCondor focuses on managing compute-intensive tasks and can distribute them across multiple connected nodes. Users submit their tasks to HTCondor, which runs them according to configured policies and the availability of connected resources, and finally returns the results to the user. HTCondor can, for example, control a dedicated Beowulf cluster, but also regular desktops that would otherwise sit idle for a while. At SC16, Google, Fermilab and the HTCondor Team demonstrated a 160k-core cloud-based elastic compute cluster. The release announcement reads as follows:

HTCondor 8.8.5 released!

The HTCondor team is pleased to announce the release of HTCondor 8.8.5. This stable series release contains significant bug fixes.

New Features:

  • Added configuration parameter MAX_UDP_MSGS_PER_CYCLE, which controls how many UDP messages a daemon will read per DaemonCore event cycle. The default value of 1 maintains the behavior in previous versions of HTCondor. Setting a larger value can help the condor_schedd and condor_collector daemons handle heavy loads. (Ticket #7149)
  • Added configuration parameter MAX_TIMER_EVENTS_PER_CYCLE, which controls how many internal timer events a daemon will dispatch per event cycle. The default value of 3 maintains the behavior in previous versions of HTCondor. Changing the value to zero (meaning no limit) could help the condor_schedd handle heavy loads. (Ticket #7195)
  • Updated condor_gpu_discovery to recognize NVIDIA Volta and Turing GPUs. (Ticket #7197)
  • By default, HTCondor will no longer collect general usage information and forward it back to the HTCondor team. (Ticket #7219)
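The two new per-cycle parameters trade how long a daemon stays in one event-loop cycle against how quickly it drains a burst of work. The toy Python model below is not HTCondor's DaemonCore code, and `run_cycle` and `cycles_needed` are hypothetical names; it only illustrates why a larger cap drains the same backlog in fewer cycles. It assumes, as the MAX_TIMER_EVENTS_PER_CYCLE description states, that 0 means no limit; the release notes do not say whether 0 means the same for MAX_UDP_MSGS_PER_CYCLE.

```python
from collections import deque


def run_cycle(pending: deque, limit: int) -> list:
    """Handle queued events for one (simulated) event-loop cycle.

    Assumption: a limit of 0 means "no limit", mirroring the documented
    meaning of 0 for MAX_TIMER_EVENTS_PER_CYCLE. A positive limit caps how
    many events are handled before control returns to the loop.
    """
    if limit < 0:
        raise ValueError("limit must be >= 0")
    handled = []
    while pending and (limit == 0 or len(handled) < limit):
        handled.append(pending.popleft())
    return handled


def cycles_needed(n_events: int, limit: int) -> int:
    """Count the simulated cycles needed to drain n_events queued events."""
    pending = deque(range(n_events))
    cycles = 0
    while pending:
        run_cycle(pending, limit)
        cycles += 1
    return cycles


if __name__ == "__main__":
    # Compare the old default cap (1) with larger caps and with "no limit" (0).
    for limit in (1, 3, 10, 0):
        print(f"limit={limit}: {cycles_needed(25, limit)} cycles for 25 events")
```

In this model a cap of 1 needs 25 cycles for 25 queued events, while a cap of 0 handles them all in one cycle; the cost, which the model does not capture, is that other work waits longer while a single cycle runs.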

Bugs Fixed:

  • Fixed a bug that would sometimes result in the condor_schedd on Windows becoming slow to respond to commands after a period of time. The slowness would persist until the condor_schedd was restarted. (Ticket #7143)
  • HTCondor daemons will no longer sit in a tight loop consuming the CPU when a network connection closes unexpectedly on Windows systems. (Ticket #7164)
  • Fixed a packaging error that caused the Java universe to be non-functional on Debian and Ubuntu systems. (Ticket #7209)
  • Fixed a bug where Singularity jobs with SINGULARITY_TARGET_DIR set would not have the job’s environment set properly. (Ticket #7140)
  • Fixed a bug that caused incorrect values to be reported for the time taken to upload a job’s files. (Ticket #7147)
  • HTCondor will now always use TCP to release slots claimed by the dedicated scheduler during shutdown. This prevents some slots from staying in the Claimed/Idle state after a condor_schedd shutdown when running parallel jobs. (Ticket #7144)
  • Fixed a bug that caused the condor_schedd to not write a core file when it crashes on Linux. (Ticket #7163)
  • Fixed a bug in the condor_schedd that caused submit transforms to always reject submissions with more than one cluster id. This bug was particularly easy to trigger by attempting to queue more than one submit object in a single transaction using the Python bindings. (Ticket #7036)
  • Fixed a bug that prevented new jobs from materializing when jobs changed to the running state and a max_idle value was specified. (Ticket #7178)
  • Fixed a bug that caused condor_chirp to crash when the getdir command was used for an empty directory. (Ticket #7168)
  • Fixed a bug that caused GPU utilization to not be reported in the job ad when an encrypted execute directory is used. (Ticket #7169)
  • Integer values in hexadecimal or octal format in HTCondor ClassAds are now rejected. Previously, they were read incorrectly. (Ticket #7127)
  • Fixed a bug in the condor_dagman parser which caused it to crash when certain commands were missing tokens. (Ticket #7196)
  • Fixed a bug in condor_dagman that caused it to fail when retrying a failed node with late materialization enabled. (Ticket #6946)
  • Made a minor change to the Python bindings to work around a bug in the third-party collectd program on Linux that caused a crash when loading the HTCondor Python module. (Ticket #7182)
  • Fixed a bug that could cause a daemon’s log file to be created with the wrong owner. This would prevent the daemon from operating properly. (Ticket #7214)
  • Fixed a bug in condor_submit where it would require a match to a machine with GPUs when a job requested 0 GPUs. (Ticket #6938)
  • Fixed a bug in condor_qedit which was causing it to report an incorrect number of matching jobs. (Ticket #7119)
  • Fixed a bug where the annex-ec2 service would be disabled on Enterprise Linux systems when upgrading the HTCondor packages. (Ticket #7161)
  • Fixed an issue where condor_ssh_to_job would fail on Enterprise Linux systems when the administrator changed or deleted HTCondor’s default configuration file. (Ticket #7116)
  • By default, HTCondor now updates its default configuration file on Enterprise Linux systems. Previously, if the administrator had modified the default configuration file, the new file would appear as /etc/condor/condor_config.rpmnew. (Ticket #7183)
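The max_idle fix concerns late materialization, where the condor_schedd creates jobs from a submit description gradually instead of all at once. The arithmetic below is a simplified Python model, not the condor_schedd's actual logic, and `jobs_to_materialize` is a hypothetical name. It shows the case the bug broke: once some idle jobs start running, headroom under max_idle opens up and more jobs should be materialized.

```python
def jobs_to_materialize(idle: int, materialized: int, total: int, max_idle: int) -> int:
    """Return how many more jobs a (simplified) job factory may create now.

    Idle jobs may not exceed max_idle, and the factory never creates more
    jobs than the submit description describes in total.
    """
    if min(idle, materialized, total, max_idle) < 0 or materialized > total:
        raise ValueError("invalid counts")
    headroom = max(0, max_idle - idle)
    return min(headroom, total - materialized)


if __name__ == "__main__":
    total, max_idle = 10, 3
    idle = materialized = 0

    # First pass: the factory fills the idle budget.
    new = jobs_to_materialize(idle, materialized, total, max_idle)
    idle += new
    materialized += new

    # Two idle jobs start running, leaving the idle set.
    idle -= 2

    # Headroom under max_idle has reopened, so more jobs should appear.
    print(jobs_to_materialize(idle, materialized, total, max_idle))
```

Running the script prints 2: after two of the three idle jobs start running, the model allows two new jobs so that the idle count returns to max_idle.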

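The ClassAd change means that only plain decimal integer literals are accepted. The standalone Python validator below illustrates that rule. It is not HTCondor's ClassAd parser, `parse_classad_int` is a hypothetical name, and treating every multi-digit literal with a leading zero as octal is an assumption about how the new check classifies such values.

```python
import re

# Assumed rule: optional sign, then either "0" or a decimal number without a
# leading zero. "0x..." (hexadecimal) and "0..." (octal-style) do not match.
_DECIMAL_INT = re.compile(r"[+-]?(0|[1-9][0-9]*)")


def parse_classad_int(text: str) -> int:
    """Parse an integer literal, rejecting hexadecimal and octal forms."""
    literal = text.strip()
    if not _DECIMAL_INT.fullmatch(literal):
        raise ValueError(f"not a decimal integer literal: {text!r}")
    return int(literal)


if __name__ == "__main__":
    for sample in ("42", "-7", "0", "0x1F", "017"):
        try:
            print(sample, "->", parse_classad_int(sample))
        except ValueError as err:
            print(sample, "-> rejected:", err)
```

Rejecting such literals outright, rather than guessing a base, surfaces the problem at parse time instead of silently producing a wrong number.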
Version number: 8.8.5
Release status: stable
Operating systems: Windows 7, Linux, BSD, macOS, Solaris, UNIX, Windows Server 2012, Windows 8, Windows 10, Windows Server 2016
Website: HTCondor
License type: Conditions (GNU/BSD/etc.)