Getting Started
You should have the following installed:
Pike 7.8
Fins framework
Xapian full text library
Public.Xapian module
To start the server, first look at the settings in the config/dev.cfg file to
make sure that everything looks good. Once you've done that, set the FINS_HOME
environment variable so that it points to the location you've installed the
Fins framework. Once you've done that, you should be able to run the start
script, which is located in the bin/ directory:
FINS_HOME=/path/to/fins
export FINS_HOME
cd /path/to/fulltext
bin/start.sh
(or "bin/start.sh -d", optionally once you've verified everything's working)
FTAdmin
Fins/Xapian provides a script that performs certain administrative functions. This
script is located within the bin directory and performs the following functions:
Create a new index:
bin/ftadmin.sh new indexname
Grant access to an index (prints out the newly granted auth code):
bin/ftadmin.sh grant indexname
Revoke access to an index for an auth code:
bin/ftadmin.sh grant indexname authcode
Shut down the server (optionally after a delay):
bin/ftadmin.sh shutdown [seconds]
Note that in order for the script to work, the server must be running on the local
host.
Security
The FullText application supports 2 levels of security: standard and simplified. You
may choose either based on your particular needs, however the "standard" model is
enabled by default.
When using the standard security model, there are administrative authorization codes
that are used to create new indices as well as to grant or revoke access to a given
index. The administrative authorization codes are placed in the "auth" section of the
application configuration file, and multiple administrative authorization codes may
be enabled at one time. These codes are read at start up time and the application must
be shut down in order to flush existing codes.
In order to search or update an index while running in standard security mode, a client
must provide a valid index authorization code. A given code is specific for a particular
index and may be obtained by using the administrative client. Similarly, codes may
be revoked using the administrative client. Codes may be granted and accessed at any
time, without restarting the FullText application.
When using the simplified mode, during search or update operations, the FullText
application simply validates the authorization code provided by a client against its
list of administrative authorization codes. This can simplify management of
authorization codes for certain scenarios, such as developement or other small scale
installations at the expense of giving each user "the keys to the castle".
You may enable the simplified security mechanism by setting the "use_simple_security"
flag in the "auth" section of the application configuration file. When running in
the simplified mode, the grant and revoke functionality is disabled.
In either case, if a valid administrative access code is not present in the application
configuration file on startup, one will be created and enabled. A message will be
displayed in the application log along with the new administration authorization code.
Client Example
import FullText;
string index = "myFTIndex";
// change to '1' if you want to create the index if it doesn't exist.
int create_if_new = 0;
string authcode = "1234567890"; // see the security section for details on auth codes.
// if we're running the FullText application on http://localhost:8124,
// we can use the default url.
object a = AdminClient(0, authcode);
if(!a->exists(index))
{
a->new(index);
werror("new auth code for index: %O\n",
authcode = a->grant_access(index));
}
object u = UpdateClient(0, index, authcode);
// now, let's add some content
string content = "mary had a little lamb, its fleece was white as snow.";
string title = "mary and her lamb"; // the title of the content, stored and returned with searches
string handle = "/rhymes/mary"; // a (hopefully) unique identifier for this bit of content
u->add(title, Calendar.now()->seconds(), content, handle, 0, "text/plain");
// ok, now that we've added, we can search:
object s = SearchClient(0, index, authcode);
foreach(s->search("lamb");; mapping doc)
werror("found a hit: %O, rating: %O, handle: %O", doc->title, doc->score, doc->handle);
Indexing support for various file formats
The indexer has built in support for plain text files and HTML. You may add support for
additional file formats by telling the engine about programs that can convert other formats
to HTML or plain text.
Some examples that have been successfully tested:
PDF
http://pdftohtml.sourceforge.net/
Install pdftohtml and then add the following to your FullText config file:
[transform_pdf]
type=converter
mimetype=application/pdf
command=/usr/local/bin/pdftohtml -stdout -q %f
RTF
http://sourceforge.net/projects/rtf2html-lite/
The free tool, rtf2html, can be used to process rtf files. However, out of the box,
this tool does not behave as either a filter or converter. A simple script is included
in the extras folder which can be used to make the rtf2html utility behave in a compatible
mannter.
Install rtf2html, edit the rtf2html_converter script appropriately, and then add the
following to your FullText config file:
[transform_rtf]
type=converter
mimetype=text/rtf
command=/path/to/extras/rtf2html_converter %f
DOC/DOCX/ODT/ABW
AbiWord can load various Word/OpenOffice formats and includes a tool called "AbiCommand"
that can be used to read a file and convert it into HTML format. The actual implementation
is left as an exercise to for the reader, however, the following page includes almost
everything a user might need to make this happen. Hint: start with the RTF filter above.
http://www.abisource.com/wiki/AbiCommand
OTHERS
Apache Tika seems like it could be a useful tool, it includes support for a number
of popular file formats and has an out of the box command line utility.
Drawbacks include:
- written in java, so not exactly nimble
If you download the Tika jar from the Apache Tika website, you can use the following
config section to handle pdf, doc and various other formats:
[transform_tika]
type=converter
command=java -jar /home/hww3/Fins/FullText/extras/tika-app-1.0.jar -h %f
mimetype=application/pdf
mimetype=text/rtf
mimetype=application/rtf
mimetype=application/msword
mimetype=application/vnd.openxmlformats-officedocument.wordprocessingml.document
mimetype=application/vnd.oasis.opendocument.text
mimetype=application/x-vnd.oasis.opendocument.text