This repository provides a build system for learner corpora which generates ANNIS data from EXB files created with EXMARaLDA’s Dulko tools.
In addition, this repository contains shell scripts for batch processing EXB files with the Dulko XSLT stylesheets, contained in the main EXMARaLDA JAR file (/opt/exmaralda/lib/app/exmaralda_nb.jar in a standard Linux installation):
exb2exb.sh
exb2html.sh
exb2text.sh
exb2metadata.sh
For usage info, call them with -h.
makeDulko requires a Unix-like system such as Linux, macOS, or the Windows Subsystem for Linux with the following prerequisites:
make
sed
openjdk-21-jre
rsync
zip
In order to run ANNIS locally with the generated data, you will have to install ANNIS Desktop (available at https://corpus-tools.org/annis/).
makeDulko has been tested with OpenJDK Java runtime 21.0.2, Pepper 3.6.0, and ANNIS 4.7.0.
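Before building, it can be useful to confirm that the prerequisites are on the PATH. The following is a minimal sketch, not part of makeDulko: the function name missing_tools is hypothetical, and it assumes that java is the command installed by the openjdk-21-jre package.

```shell
#!/bin/sh
# missing_tools TOOL...: print every TOOL that is not found on the PATH.
# (Hypothetical helper; not shipped with makeDulko.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: "java" is the command provided by openjdk-21-jre.
missing=$(missing_tools make sed java rsync zip)
if [ -n "$missing" ]; then
    printf 'Missing prerequisites:\n%s\n' "$missing" >&2
else
    echo "All prerequisites found."
fi
```

The check only tests for the presence of the commands; it does not verify their versions.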
Open a terminal and cd to src/.
Create the path src/exmaralda/corpus/ by running mkdir -p exmaralda/corpus.
Copy or link your EXMARaLDA sources to src/exmaralda/corpus/.
Run make or make all in order to generate ANNIS data in annis/ from your EXMARaLDA sources in src/exmaralda/corpus/.
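The steps up to this point can be sketched as a small shell function, called from the repository root. The function name stage_sources and the folder ~/my-exb-files are hypothetical and only serve as an illustration; the sketch copies the sources rather than linking them.

```shell
#!/bin/sh
# stage_sources EXB_DIR [SRC_DIR]: create SRC_DIR/exmaralda/corpus/ and
# copy the contents of EXB_DIR into it. SRC_DIR defaults to "src".
# (Hypothetical helper; not shipped with makeDulko.)
stage_sources() {
    exb_dir=$1
    src_dir=${2:-src}
    mkdir -p "$src_dir/exmaralda/corpus" &&
        cp -R "$exb_dir"/. "$src_dir/exmaralda/corpus/"
}

# Example, assuming your EXB files live in ~/my-exb-files:
#   stage_sources ~/my-exb-files src && make -C src
```

Here make -C src has the same effect as changing to src/ first and then running make.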
This presupposes that EXMARaLDA is installed in /opt/exmaralda, containing the main EXMARaLDA JAR file lib/app/exmaralda_nb.jar and the Saxon JAR file lib/app/lib/saxon9.jar.
If EXMARaLDA is installed in another <directory>, run make EXMARALDADIR=<directory> instead. In order to change the path to the JAR files, run make EXMARALDA=<EXMARaLDA JAR file> SAXON=<Saxon JAR file>.
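If the build cannot find EXMARaLDA, a quick check of the installation directory may help. The sketch below assumes that a custom installation directory uses the same relative JAR paths as the default one; the function name check_jars and the directory $HOME/exmaralda are hypothetical.

```shell
#!/bin/sh
# check_jars [DIR]: succeed only if both JAR files named above exist
# under DIR (default: /opt/exmaralda).
# (Hypothetical helper; not shipped with makeDulko.)
check_jars() {
    dir=${1:-/opt/exmaralda}
    for jar in lib/app/exmaralda_nb.jar lib/app/lib/saxon9.jar; do
        [ -f "$dir/$jar" ] || { echo "missing: $dir/$jar" >&2; return 1; }
    done
}

# Example with a hypothetical installation directory:
#   check_jars "$HOME/exmaralda" && make EXMARALDADIR="$HOME/exmaralda"
```

If the JAR files are located elsewhere, set EXMARALDA and SAXON explicitly as described above instead.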
The default corpus name is MyCorpus and the default version is 1.0. Run make CORPUS=<corpus> VERSION=<version> to set them to <corpus> and <version>, respectively.
Run make dist in order to generate a ZIP file in dist/ with the ANNIS data in annis/. Running make src will generate an additional ZIP file with the build system and the EXMARaLDA sources.
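Building the ANNIS data and the ZIP file for a named corpus can be combined in one call. This is a sketch under assumptions: the function name build_release and the example values LearnerCorpus and 2.1 are hypothetical, and it assumes that CORPUS and VERSION should also be passed when running the dist target. Run it from src/.

```shell
#!/bin/sh
# build_release CORPUS VERSION: run the "all" and "dist" targets with the
# given corpus name and version.
# Set MAKE=echo for a dry run that only prints the arguments passed to make.
# (Hypothetical helper; not shipped with makeDulko.)
build_release() {
    ${MAKE:-make} CORPUS="$1" VERSION="$2" all dist
}

# Example with hypothetical values:
#   build_release LearnerCorpus 2.1
```

Listing all before dist is a cautious choice, since make processes the goals in the order given.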
Run ANNIS Desktop, click on "Administration" and upload the generated ZIP file in dist/.
Optionally, run make clean in order to remove intermediate build files. Running make distclean also removes ANNIS data and ZIP files, if any.
Andreas Nolda (andreas@nolda.org)