Software Update: Xapian / Omega 1.0.11

Xapian is an open source information retrieval library written in C++ that can serve as the engine behind a search engine. It includes its own database format, APIs to edit and search these databases, tools to check the databases, and bindings for other languages such as Java, Ruby, PHP, and Python. Omega is an application that runs on top of Xapian as a search engine for Xapian databases. Omega also includes some tools for populating databases with data. The development team at The Xapian Project released version 1.0.11 of Xapian and Omega a few days ago. The lists of changes for the various components are as follows:

Xapian core 1.0.11:

API:

  • Enquire::get_mset():
    • Now throws UnimplementedError if there’s a percentage cutoff and sorting is primarily by value – this has never been correctly supported and it’s better to warn people than give incorrect results.
    • No longer needlessly copies the results internally.
    • When searching multiple databases, now recalculates the maximum attainable weight after each database, which may allow it to terminate earlier. (ticket#336)
    • Fix inconsistent percentage scores when sorting primarily by value, except when a MatchDecider is also being used; document this remaining problem case. (ticket#216)
  • Enquire::set_sort_by_value() (and similar methods): Rename the wrongly named “ascending” parameter to “reverse”, and note that its value should always be explicitly given since defaulting to “reverse=true” is confusing and the default will be deprecated in 1.1.0. (ticket#311)
  • Database::allterms_begin(): Fix memory leak when iterating all terms from more than one database.
  • Query::get_terms_begin(): Don’t return “” from the TermIterator (happened when the query contained or was Query::MatchAll).
  • Add QueryParser::FLAG_DEFAULT to make it easier to add flags to those set by default.

test suite:

  • The testsuite now reports problems detected by valgrind with newer valgrind versions. Drop support for running the testsuite under valgrind < 3.3.0 (well over a year old) as this greatly simplifies the configure tests.
  • Fix usage message for options which take arguments in --help output from test programs – “-x=foo” doesn’t work, the correct syntax is “-x foo”.
  • If comparing MSet percentages fails, report the differing percentages if in verbose mode.
  • Add test that backends don’t truncate total document length to 32 bits.
  • Disable lockfileumask1 (regression testcase added in 1.0.10) on Cygwin and on OS/2.
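
The “-x foo” versus “-x=foo” distinction can be illustrated with Python's standard getopt module. This is only a sketch: it assumes the test programs parse short options in the usual getopt style, and the option letter and value are placeholders. Under that assumption, “-x=foo” is still accepted, but the argument becomes “=foo” rather than “foo”.

```python
import getopt


def parse(argv):
    """Parse argv with one short option, -x, which takes an argument."""
    opts, rest = getopt.getopt(argv, "x:")
    return dict(opts), rest


# The documented syntax: option and argument as separate words.
separate = parse(["-x", "foo"])

# With "=", everything after the option letter is taken as the argument,
# so the value silently gains a leading "=".
with_equals = parse(["-x=foo"])
```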

flint back end:

  • The configure test for pread() and pwrite() got accidentally disabled in 0.8.4 and we’ve always been using lseek() followed by read() or write() since then. The configure test is now fixed, and gives a slight speedup (3% measured for searching).
  • The child process used to implement WritableDatabase locking now changes directory to / so that it doesn’t block unmounting of any partitions. It also closes any open file descriptors which aren’t related to locking, so that if those files are closed by our parent and deleted, the disk space is released right away.
  • We now reuse the same zlib zstream structures rather than using a fresh one for each operation. This doesn’t make a measurable difference in our own tests on Linux but reportedly is measurably faster on some systems. (ticket #325)
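
The difference between the two I/O styles mentioned in the pread()/pwrite() item can be sketched in Python, whose os module wraps the same POSIX calls. This is an illustration rather than Xapian's code: the block size and file contents are made up, and os.pread() is only available on Unix-like systems.

```python
import os
import tempfile

BLOCK_SIZE = 8  # hypothetical block size, kept tiny for the demo


def read_block_seek(fd, block_number):
    """Two system calls: move the file offset, then read from it."""
    os.lseek(fd, block_number * BLOCK_SIZE, os.SEEK_SET)
    return os.read(fd, BLOCK_SIZE)


def read_block_pread(fd, block_number):
    """One system call: read at an explicit offset; the file offset is untouched."""
    return os.pread(fd, BLOCK_SIZE, block_number * BLOCK_SIZE)


with tempfile.TemporaryFile() as f:
    f.write(b"block-0.block-1.block-2.")  # three 8-byte blocks
    f.flush()
    fd = f.fileno()
    via_seek = read_block_seek(fd, 1)
    via_pread = read_block_pread(fd, 2)
```

Both helpers return the same kind of result; the second simply saves one system call per block, which is in line with the modest speedup the changelog reports.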

quartz back end:

  • The pread()/pwrite() fix also speeds up quartz.

remote backend:

  • Avoid copying Query::Internal objects needlessly when unserializing Query objects.

in-memory backend:

  • Store the (non-normalised) document lengths as Xapian::termcount (unsigned int) rather than Xapian::doclength (double) which saves 4 bytes per document.

build system:

  • configure: The output of g++ --version changed format (again) with GCC 4.3 which meant configure got “g++” for the version. Instead use the (hopefully) more robust technique of using g++ -E to pull out __GNUC__ and __GNUC_MINOR__.
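
The g++ -E technique can be sketched as follows. The real check lives in the configure script; this Python version only mirrors the idea, and the marker word gcc_version and both function names are invented for the example. The preprocessor expands the two macros, so the version can be read from the output without depending on the wording of the version banner.

```python
import subprocess

# Hypothetical probe: a marker word followed by the two predefined macros.
PROBE = "gcc_version __GNUC__ __GNUC_MINOR__\n"


def parse_probe_output(preprocessed):
    """Return (major, minor) from preprocessor output, or None if not found."""
    for line in preprocessed.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == "gcc_version":
            # If the compiler is not GCC, the macros stay unexpanded.
            if fields[1].isdigit() and fields[2].isdigit():
                return int(fields[1]), int(fields[2])
    return None


def gcc_version(compiler="g++"):
    """Run `compiler -E` over the probe and parse the expanded macros."""
    result = subprocess.run(
        [compiler, "-E", "-x", "c++", "-"],
        input=PROBE, capture_output=True, text=True, check=True,
    )
    return parse_probe_output(result.stdout)
```

Calling gcc_version() requires the compiler to be installed; parse_probe_output() can be exercised on its own with captured output.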

documentation:

  • API documentation:
    • WritableDatabase::flush() can’t throw DatabaseLockError.
    • WritableDatabase’s constructor can throw at least DatabaseCorruptError or DatabaseLockError.
    • Document how to get all matches from Enquire::get_mset().
    • Other minor improvements.
  • docs/sorting.html: Clarify meaning.

portability:

  • Fix “#line” directives in generated file queryparser/queryparser_internal.cc to give a relative path – previously they had a full path when generated by a VPATH build (as release tarballs are), and this confused GCC 2.95 and depcomp.
  • Fix for compiling with Sun’s compiler (untested as we no longer have access to it).

Omega 1.0.11:

documentation:

  • cgiparams.html: Note the technique of using a stub database file to allow a default of searching over multiple databases.

indexers – omindex:

  • Add support for indexing Microsoft Office 2007 formats and XPS files (bug#290).
  • Fix the extraction of metadata from OpenDocument formats.
  • Fix “-l” which would previously always cause a segmentation fault if used (“--depth-limit” wasn’t affected).

build system:

  • configure: The output of g++ --version changed format (again) with GCC 4.3 which meant configure got “g++” for the version. Instead use the (hopefully) more robust technique of using g++ -E to pull out __GNUC__ and __GNUC_MINOR__.
  • configure: Turn on _FORTIFY_SOURCE where available (as we do in xapian-core).

portability:

  • Fix to compile when RLIMIT_AS isn’t available (as on NetBSD and OpenBSD). Instead use RLIMIT_VMEM or RLIMIT_DATA if either is available, else don’t try to limit the memory the filter process can use.
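
The fallback order described above can be sketched with Python's resource module, which exposes whichever of these limits the platform provides. This is not Omega's code (which is C++); the function names are made up for the example.

```python
import resource


def pick_memory_limit():
    """Return the name of the first available memory rlimit, or None."""
    for name in ("RLIMIT_AS", "RLIMIT_VMEM", "RLIMIT_DATA"):
        if hasattr(resource, name):
            return name
    return None


def limit_memory(max_bytes):
    """Lower the soft memory limit of the current process where possible.

    Meant to run in a child process just before it executes a filter
    program. Returns the name of the limit used, or None if none applies.
    """
    name = pick_memory_limit()
    if name is None:
        return None  # no suitable limit on this platform: run unlimited
    which = getattr(resource, name)
    _soft, hard = resource.getrlimit(which)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # the soft limit may not exceed the hard one
    resource.setrlimit(which, (max_bytes, hard))
    return name
```

One way to apply this would be to pass limit_memory as the preexec_fn of subprocess.Popen, so that only the filter process is restricted.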

Xapian bindings 1.0.11:

documentation:

  • README: Note that 1.0.x doesn’t (and isn’t currently planned to) support Python 3, and possible current issues with Ruby 1.9.

portability:

  • Merge fixes from Cygwin Ports, so bindings should build out of the box on Cygwin.

Python:

  • python/docs/examples/: Use str(obj) rather than obj.get_description() (the latter is deprecated, and support was removed in 1.0.0).
  • Add support for using the new name (“reverse”) for the second argument of set_sort_by_key() and set_sort_by_value() and friends as a named parameter. The old name (“ascending”) is still supported, but will be deprecated in 1.1.0.
  • Keep Python references to Sorter, Stopper, and ValueRangeProcessor objects which get set on other objects to avoid segmentation faults if they go out of scope before the object they are set on does. (ticket#341)
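
The reference-keeping fix can be illustrated in plain Python. The classes below are stand-ins invented for this sketch, not the bindings' actual code: the point is only that an object which borrows a helper should also hold a strong reference to it, so the helper cannot be freed while it is still in use.

```python
import gc
import weakref


class Stopper:
    """Stand-in for a helper object that is handed to another object."""


class Parser:
    """Stand-in for a wrapper around a native object that borrows helpers."""

    def __init__(self):
        self._stopper = None

    def set_stopper(self, stopper):
        # Hold a strong reference for as long as this parser lives.
        self._stopper = stopper


parser = Parser()
stopper = Stopper()
probe = weakref.ref(stopper)  # lets us observe whether the helper is alive
parser.set_stopper(stopper)

del stopper                   # the caller's own reference goes away
gc.collect()
still_alive = probe() is not None  # kept alive by the parser

del parser
gc.collect()
released = probe() is None         # freed once the parser is gone
```

Without the assignment in set_stopper(), the helper would be collected as soon as the caller dropped it, which in a binding to native code is what leads to a segmentation fault.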

Ruby:

  • Fixes for Ruby 1.9 compatibility (ticket#323). The test harness currently fails so “make check” doesn’t pass, but code using the bindings should work.

Version number: 1.0.11
Release status: Final
Operating systems: Windows 2000, Linux, BSD, Windows XP, macOS, OS/2, Solaris, UNIX, Windows Server 2003, Windows Vista, Windows Server 2008
Website: The Xapian Project
License type: GPL