A build system for learner corpora annotated with the EXMARaLDA (Dulko) tools.

clone

read-only
https://hg.sr.ht/~nolda/makedulko
read/write
ssh://hg@hg.sr.ht/~nolda/makedulko

#makeDulko

This repository provides a build system for learner corpora which generates ANNIS data from EXB files created with EXMARaLDA’s Dulko tools.

In addition, this repository contains shell scripts for batch-processing EXB files with the Dulko XSLT stylesheets, which are contained in the main EXMARaLDA JAR file (/opt/exmaralda/lib/app/exmaralda_nb.jar in a standard Linux installation):

  • exb2exb.sh
  • exb2html.sh
  • exb2text.sh
  • exb2metadata.sh

For usage info, call them with -h.
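
As a minimal sketch, the usage information of all four scripts can be printed in one go. The function name show_script_usage is hypothetical, and the directory holding the scripts is an assumption that has to be passed as an argument (it defaults to the current directory):

```shell
# Hypothetical helper, not part of makeDulko: print the usage information
# of each batch-processing script found in the given directory.
show_script_usage() {
    script_dir=${1:-.}   # assumed location of the scripts
    for script in exb2exb.sh exb2html.sh exb2text.sh exb2metadata.sh; do
        if [ -x "$script_dir/$script" ]; then
            echo "== $script =="
            "$script_dir/$script" -h
        else
            echo "$script: not found or not executable in $script_dir" >&2
        fi
    done
}
```

Calling show_script_usage without an argument looks for the scripts in the current directory.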

#Prerequisites

makeDulko requires a Unix-like system such as Linux, macOS, or the Windows Subsystem for Linux, with the following software installed:

  • GNU make (provided by the Debian package make)
  • GNU sed (provided by the Debian package sed)
  • a Java runtime with version > 17 (as provided by the Debian package openjdk-21-jre)
  • rsync (provided by the Debian package rsync)
  • zip (provided by the Debian package zip)
  • EXMARaLDA (available at https://exmaralda.org)
  • Pepper (available at https://corpus-tools.org/pepper/)

In order to run ANNIS locally with the generated data, you will have to install ANNIS Desktop (available at https://corpus-tools.org/annis/).

makeDulko has been tested with OpenJDK Java runtime 21.0.2, Pepper 3.6.0, and ANNIS 4.7.0.
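
Before building, it may help to check that the command-line prerequisites are on the PATH. The following is only a sketch: the function name missing_tools is hypothetical, and the check neither verifies that make and sed are the GNU variants nor inspects the Java version, the EXMARaLDA installation, or Pepper.

```shell
# Hypothetical helper, not part of makeDulko: print the name of every
# given command that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the command-line tools listed in the prerequisites above.
missing=$(missing_tools make sed java rsync zip)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose to print the names on one line.
    echo "Missing prerequisites:" $missing >&2
else
    echo "All command-line prerequisites found."
fi
```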

#Usage

  1. Open a terminal and cd to src/.

  2. Create the path src/exmaralda/corpus/ by running mkdir -p exmaralda/corpus.

  3. Copy or link your EXMARaLDA sources to src/exmaralda/corpus/.

  4. Run make or make all in order to generate ANNIS data in annis/ from your EXMARaLDA sources in src/exmaralda/corpus/.

    This assumes that EXMARaLDA is installed in /opt/exmaralda, which contains the main EXMARaLDA JAR file lib/app/exmaralda_nb.jar and the Saxon JAR file lib/app/lib/saxon9.jar.

    If EXMARaLDA is installed in another <directory>, run make EXMARALDADIR=<directory> instead. To change the paths to the JAR files themselves, run make EXMARALDA=<EXMARaLDA JAR file> SAXON=<Saxon JAR file>.

    The default corpus name is MyCorpus and the default version is 1.0. Run make CORPUS=<corpus> VERSION=<version> to set them to <corpus> and <version>, respectively.

  5. Run make dist in order to generate a ZIP file in dist/ with the ANNIS data in annis/. Running make src will generate an additional ZIP file with the build system and the EXMARaLDA sources.

  6. Run ANNIS Desktop, click on "Administration" and upload the generated ZIP file in dist/.

  7. Optionally, run make clean in order to remove intermediate build files. Running make distclean also removes ANNIS data and ZIP files, if any.
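
Steps 1 to 5 can be sketched as a small shell function. The function name build_corpus and its arguments are hypothetical. The sketch assumes a default EXMARaLDA installation in /opt/exmaralda and assumes that CORPUS and VERSION should be passed to make dist as well as to make all. The MAKE variable is only an illustrative hook for substituting another command, for example echo for a dry run.

```shell
# Hypothetical helper, not part of makeDulko: copy EXMARaLDA sources into
# the build tree, generate ANNIS data, and package it as a ZIP file.
# Usage: build_corpus <repository root> <source directory> <corpus> <version>
# The source directory should be given as an absolute path.
build_corpus() {
    repo=$1
    sources=$2
    corpus=$3
    version=$4
    make_cmd=${MAKE:-make}   # defaults to "make"
    (
        # Run in a subshell so that the caller's working directory is kept.
        cd "$repo/src" &&
        mkdir -p exmaralda/corpus &&
        cp -R "$sources"/. exmaralda/corpus/ &&
        "$make_cmd" CORPUS="$corpus" VERSION="$version" all &&
        "$make_cmd" CORPUS="$corpus" VERSION="$version" dist
    )
}
```

A call such as build_corpus "$HOME/makedulko" "$HOME/my-exb-files" MyCorpus 1.0 (both paths are placeholders) would then run the build with the default corpus name and version. If EXMARaLDA is installed elsewhere, the make calls additionally need EXMARALDADIR=<directory>, as described in step 4.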

Andreas Nolda (andreas@nolda.org)