

Boolean expression finder

From: Marc Tardif
Subject: Boolean expression finder
Date: Fri, 11 May 2001 00:01:05 -0400 (EDT)

What is it?

GNU Bool is a utility for finding files that match a boolean expression.

The boolean operators supported are AND, OR and NOT.  Also supported is
the NEAR operator for locating two expressions within a short distance
of each other.
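To make the operators concrete, here is a minimal Python sketch of evaluating such an expression over the words of one text.  It is not GNU Bool's code.  The nested-tuple expression format, the function names and the 10-word NEAR window are all assumptions for illustration, because the announcement only says "a short distance".

```python
import re

# Assumption: the announcement only says "a short distance", so this
# 10-word window for NEAR is a hypothetical value.
NEAR_WINDOW = 10


def tokenize(text):
    """Lower-case the text and split it into words."""
    return re.findall(r"\w+", text.lower())


def positions(words, term):
    """Return every index at which the single word `term` occurs."""
    term = term.lower()
    return [i for i, w in enumerate(words) if w == term]


def evaluate(expr, words):
    """Evaluate a hypothetical nested-tuple expression against `words`.

    A term is a plain string. An operator is a tuple such as
    ("AND", a, b), ("OR", a, b), ("NOT", a) or ("NEAR", a, b).
    In this sketch NEAR only accepts two single-word terms.
    """
    if isinstance(expr, str):
        return bool(positions(words, expr))
    op, *args = expr
    if op == "AND":
        return all(evaluate(a, words) for a in args)
    if op == "OR":
        return any(evaluate(a, words) for a in args)
    if op == "NOT":
        return not evaluate(args[0], words)
    if op == "NEAR":
        left = positions(words, args[0])
        right = positions(words, args[1])
        # True if some pair of occurrences lies within the window.
        return any(abs(i - j) <= NEAR_WINDOW for i in left for j in right)
    raise ValueError(f"unknown operator: {op}")
```

For example, evaluate(("NEAR", "afternoon", "warm"), tokenize("The afternoon sun was warm")) returns True, since the two words are three positions apart.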

The text processing algorithm understands that a newline does not always
mean the end of a sentence.  Therefore, the string "afternoon sun" matches
"afternoon\nsun" (notice the newline) because adjacent lines are assumed
to be in the same context.  On the other hand, the string does not match
across two consecutive newlines, because a blank line normally indicates
a new paragraph and therefore a different context.  A hyphen at the end of
a line is also recognized as splitting a word, so the string "afternoon sun"
would match "after-\nnoon sun".
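The rules above can be sketched in Python as follows.  This illustrates the idea under stated assumptions and is not GNU Bool's implementation.  The names paragraphs() and phrase_matches() are hypothetical.  The sketch treats a blank line as the end of a paragraph and drops every hyphen that comes right before a newline.  As a result, it would wrongly join a genuine compound such as "well-\nknown" into "wellknown".

```python
import re


def paragraphs(text):
    """Split `text` into contexts, one per paragraph (hypothetical helper).

    Assumptions: a blank line ends a paragraph, a hyphen directly before
    a newline joins the two halves of a word, and any other newline is
    treated as an ordinary space.
    """
    result = []
    for para in re.split(r"\n[ \t]*\n", text):
        para = re.sub(r"-\n[ \t]*", "", para)  # "after-\nnoon" -> "afternoon"
        para = re.sub(r"\s+", " ", para)       # single newlines become spaces
        result.append(para.strip())
    return result


def phrase_matches(phrase, text):
    """True if `phrase` occurs as whole words inside one paragraph."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return any(re.search(pattern, para) for para in paragraphs(text))
```

For example, phrase_matches("afternoon sun", "in the after-\nnoon sun") is true, while the same phrase split by a blank line is not found.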

The HTML processing algorithm understands many features of the HTML 4.01
standard.  Character entities such as &eacute; are supported, as are
numeric character references in decimal (&#233;) or hexadecimal (&#xE9;).
Elements also retain their structural meaning, so the string "daytime"
matches "<b>day</b>time" because the bold text style does not separate
words.  On the other hand, "<p>day</p><p>time</p>" does not match because
paragraphs separate context.
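A rough equivalent of this HTML behavior can be built on Python's standard html.parser module.  When convert_charrefs is true, that module decodes named and numeric character references.  The set of block-level tags below is an assumption for this sketch, and so are the class and function names.  GNU Bool's own list of context-breaking elements is not given here.

```python
from html.parser import HTMLParser

# Assumption for this sketch: only these tags break context; every other
# element (b, i, em, span, ...) is treated as inline.
BLOCK_TAGS = {"p", "div", "br", "li", "td", "th", "tr", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class ContextExtractor(HTMLParser):
    """Collect the text of an HTML document as a list of contexts."""

    def __init__(self):
        # convert_charrefs=True decodes &eacute;, &#233; and &#xE9;
        # before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.contexts = [""]

    def _break(self):
        # Start a new context unless the current one is still empty.
        if self.contexts[-1]:
            self.contexts.append("")

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._break()

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._break()

    def handle_data(self, data):
        # Inline tags add no separator, so "<b>day</b>time" stays "daytime".
        self.contexts[-1] += data


def html_contexts(markup):
    """Return the non-empty text contexts of `markup`."""
    parser = ContextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return [c for c in parser.contexts if c]
```

For example, html_contexts("<p>day</p><p>time</p>") returns ["day", "time"], so a search for "daytime" finds nothing.  By contrast, "<b>day</b>time" yields the single context "daytime".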

What next?

GNU Bool aims to support many other file formats.  In the near future, I
will be working on the mbox format and various compression algorithms.
Afterwards, I intend to work on more involved text formats such as TeX
and PostScript.

Where is it?

For more information, contact me:
Marc Tardif <address@hidden>
