This repository used to provide EXMARaLDA (Dulko) – a toolset for the EXMARaLDA Partitur-Editor for the annotation of learner data in learner corpora. From 2016 to 2021, it was developed separately from EXMARaLDA mainline by Andreas Nolda for the Dulko learner-corpus project. In 2018, this work was awarded the Innovation Prize in the engineering category by the University of Szeged.
Since July 2023, the Dulko toolset has been an integrated part of the release version of the EXMARaLDA Partitur-Editor. Development and support of the Dulko tools continue in the EXMARaLDA GitHub repository.
Any existing user of EXMARaLDA (Dulko) is strongly encouraged to use the current EXMARaLDA release with the integrated Dulko tools instead. Installation and configuration are much easier now, and various Dulko components have been improved and generalised for the annotation of data beyond German learner corpora.
The original README below is provided for reference only.
EXMARaLDA (Dulko) is a set of tools for the EXMARaLDA Partitur-Editor with transformation scenarios (actually, XSLT 2.0 stylesheets) for the annotation of learner data in learner corpora, supporting tokenisation, part-of-speech tagging, lemmatisation, sentence-span computation, editing target hypotheses, detection of differences between target hypotheses and the learner text, error analysis, and metadata management (Hirschmann and Nolda 2019, Nolda 2019). It has been developed for the Dulko learner-corpus project at the University of Szeged.
This repository provides the sources of EXMARaLDA (Dulko). The latest release is available as a ZIP archive `exmaralda-dulko-<VERSION>.zip`, which contains, in particular, an executable for Microsoft Windows (`exmaralda-dulko.exe`) as well as start-up scripts for MacOS (`exmaralda-dulko.command`) and Linux (`exmaralda-dulko.sh`).
Unless already installed, install a Java runtime environment (JRE) or Java development kit (JDK), e.g. Oracle Java[^1] or Amazon Corretto; on Linux, you can also use OpenJDK. Note that currently, Java version 8 is required.
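The Java 8 requirement can be checked from a terminal on MacOS or Linux. The following is a minimal sketch, not part of EXMARaLDA (Dulko): the function names `java_major_version`, `installed_java_version`, and `check_java` are chosen for this illustration, and the sketch assumes the usual `java -version` output, whose first line contains `version "<number>"`.

```shell
#!/bin/sh
# Extract the major Java version from a version string such as
# "1.8.0_372" (Java 8 and earlier) or "17.0.2" (Java 9 and later).
java_major_version() {
    case "$1" in
        1.*) v=${1#1.}; printf '%s\n' "${v%%.*}" ;;
        *)   printf '%s\n' "${1%%[.+-]*}" ;;
    esac
}

# `java -version` prints to stderr; the first line is assumed to look like
#   openjdk version "1.8.0_372"
installed_java_version() {
    java -version 2>&1 | sed -n '1s/.*version "\([^"]*\)".*/\1/p'
}

# Report whether the Java runtime found on PATH is version 8.
check_java() {
    if ! command -v java >/dev/null 2>&1; then
        echo "No Java runtime found on PATH." >&2
        return 1
    fi
    major=$(java_major_version "$(installed_java_version)")
    if [ "$major" = 8 ]; then
        echo "Java 8 found."
    else
        echo "Java $major found, but Java 8 is required." >&2
        return 1
    fi
}
```

After sourcing these definitions, calling `check_java` reports whether the runtime on `PATH` meets the requirement.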
Unless already installed, download TreeTagger and install it into some directory `<DIR1>`.
On Microsoft Windows, extract the downloaded ZIP archive into `C:\Program Files\TreeTagger` or another directory. Note this directory for future reference.
On MacOS, there should be a directory called `tree-tagger-MacOSX-<version>` or similar in the Downloads folder. Drag this directory into the `Applications` folder or onto the desktop and rename it to `tree-tagger`.
On Linux, extract the downloaded TAR.GZ archive into `/opt/tree-tagger` or `$HOME/tree-tagger`.
After the installation, there should be a directory `<DIR1>/bin` with the binary `tree-tagger.exe` on Microsoft Windows or `tree-tagger` on MacOS and Linux.
Create a subdirectory `lib` in the TreeTagger directory `<DIR1>`.
Download the German parameter file for TreeTagger.
On Microsoft Windows, uncompress this GZ file with 7-Zip or another tool, rename it to `german-utf8.par`, and copy or move this file to `<DIR1>\lib`.
On MacOS, there should be a file called `german.par` in the `Downloads` folder. Rename it to `german-utf8.par` and drag it into `<DIR1>/lib`.
On Linux, uncompress the GZ file, rename it to `german-utf8.par`, and copy or move it into `<DIR1>/lib`.
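The TreeTagger layout described above (a binary in `<DIR1>/bin` and the parameter file in `<DIR1>/lib`) can be verified with a small shell function. This is an illustrative sketch only; the function name `check_treetagger_layout` is an assumption of this example and is not provided by TreeTagger or EXMARaLDA (Dulko).

```shell
#!/bin/sh
# Check that a TreeTagger directory has the layout described above:
#   <DIR1>/bin/tree-tagger (or tree-tagger.exe) and <DIR1>/lib/german-utf8.par
# Returns 0 if both files exist, 1 otherwise.
check_treetagger_layout() {
    dir=$1
    status=0
    if [ ! -f "$dir/bin/tree-tagger" ] && [ ! -f "$dir/bin/tree-tagger.exe" ]; then
        echo "Missing TreeTagger binary in $dir/bin" >&2
        status=1
    fi
    if [ ! -f "$dir/lib/german-utf8.par" ]; then
        echo "Missing parameter file $dir/lib/german-utf8.par" >&2
        status=1
    fi
    return "$status"
}
```

For example, `check_treetagger_layout /opt/tree-tagger` prints a message for each missing file and returns a non-zero status if the installation is incomplete.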
Unless already installed, download the release version of EXMARaLDA (1.6.1) corresponding to your system and install it into some directory `<DIR2>`. Note that EXMARaLDA (Dulko) no longer works with older versions of EXMARaLDA.
On Microsoft Windows, it is recommended to use the default path for `<DIR2>` (typically, `C:\Program Files\EXMARaLDA`).
If you are running MacOS and have Oracle Java installed on your system, you only need the Partitur-Editor disk image for Oracle Java, which you can install by dragging the icon called `PartiturEditor_OJ` in `PartiturEditor_OJ.dmg` into the `Applications` folder or onto the desktop.
On Linux, install EXMARaLDA into `/opt/exmaralda` or `$HOME/exmaralda`.
Download EXMARaLDA (Dulko) and install it into some directory `<DIR3>`.
On Microsoft Windows, you can do this with the setup program `exmaralda-dulko-<VERSION>-setup.exe`, which is included in the downloaded ZIP archive. Please note that on this system, the installation directory `<DIR3>` must be a sibling directory of `<DIR2>`, which is the setup program’s default (typically, `C:\Program Files\EXMARaLDA (Dulko)`).
On MacOS, there should be a directory called `exmaralda-dulko-<VERSION>` in the Downloads folder. While you may run EXMARaLDA (Dulko) from there, it is recommended to drag the directory into the `Applications` folder or onto the desktop and rename it to `exmaralda-dulko`.
On Linux, extract the ZIP archive to `/opt/exmaralda-dulko`, `$HOME/exmaralda-dulko`, or another directory of your choice.
On Microsoft Windows, search for `SystemPropertiesAdvanced` and create a system environment variable with the name `TREETAGGER_HOME` whose value is the path to the TreeTagger directory `<DIR1>` that you noted during the installation of TreeTagger.
On MacOS and Linux, the environment variable `TREETAGGER_HOME` is set by the start-up script `exmaralda-dulko.command` or `exmaralda-dulko.sh` in `<DIR3>` (unless already set by the environment). If you have installed TreeTagger into one of the directories recommended in the installation instructions above, nothing needs to be done. If you have installed it into a non-standard directory `<DIR1>`, open the start-up script with a text editor and set the variable `TREETAGGER_HOME` to `<DIR1>`.
If you have installed EXMARaLDA into a non-standard directory `<DIR2>` on MacOS or Linux, set the variable `EXMARALDADIR` in the start-up script `exmaralda-dulko.command` or `exmaralda-dulko.sh` to `<DIR2>`.
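The behaviour described for `TREETAGGER_HOME` on MacOS and Linux (keep a value that is already set, otherwise fall back to a recommended directory) can be sketched as follows. This is not the code of the actual start-up scripts; the function name `resolve_treetagger_home` and the way candidates are passed in are assumptions made for illustration.

```shell
#!/bin/sh
# Print the TreeTagger directory to use: the current value of
# TREETAGGER_HOME if it is already set, otherwise the first candidate
# directory (given as arguments) that exists. Returns 1 if none is found.
resolve_treetagger_home() {
    if [ -n "${TREETAGGER_HOME:-}" ]; then
        printf '%s\n' "$TREETAGGER_HOME"
        return 0
    fi
    for candidate in "$@"; do
        if [ -d "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Example with the Linux locations recommended above:
#   TREETAGGER_HOME=$(resolve_treetagger_home /opt/tree-tagger "$HOME/tree-tagger") \
#       && export TREETAGGER_HOME
```

With this pattern, a variable exported in the user's environment always takes precedence, which is why a non-standard `<DIR1>` only needs to be set in one place.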
Run EXMARaLDA (Dulko).
In order to run EXMARaLDA (Dulko) on Microsoft Windows, click on the `EXMARaLDA (Dulko)` icon on the desktop or run it from the EXMARaLDA submenu in the start menu.
On MacOS, run the start-up script `exmaralda-dulko.command` in `<DIR3>`. If the script cannot be run with a double click, right-click on it and open it with the terminal.
On Linux, run the start-up script `exmaralda-dulko.sh` in `<DIR3>`. If you add `export PATH=<DIR3>:$PATH` to `/etc/profile` or `$HOME/.profile` and copy or move the desktop file `exmaralda-dulko.desktop` from `<DIR3>` to `/usr/local/share/applications` or `$HOME/.local/share/applications`, you can also run EXMARaLDA (Dulko) from your desktop’s application menu.
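For a per-user setup on Linux, the two steps just described can be scripted. The sketch below is only an illustration under the assumption that `<DIR3>` contains `exmaralda-dulko.desktop`; the helper names `install_desktop_file` and `profile_path_line` are hypothetical and not part of the distribution.

```shell
#!/bin/sh
# Copy the desktop file from the EXMARaLDA (Dulko) directory ($1) into
# the per-user applications directory, or into the directory given as $2.
install_desktop_file() {
    src_dir=$1
    target_dir=${2:-$HOME/.local/share/applications}
    mkdir -p "$target_dir" &&
        cp "$src_dir/exmaralda-dulko.desktop" "$target_dir/"
}

# Print the line to append to $HOME/.profile so that the start-up
# script in the given directory is found on PATH.
profile_path_line() {
    printf 'export PATH=%s:$PATH\n' "$1"
}

# Example for an installation in $HOME/exmaralda-dulko:
#   install_desktop_file "$HOME/exmaralda-dulko"
#   profile_path_line "$HOME/exmaralda-dulko" >> "$HOME/.profile"
```

The change to `$HOME/.profile` typically takes effect at the next login.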
Open the annotation panel (‘View’ > ‘Annotation panel’) and open the file `<DIR3>/annotation-panel.xml`.
Optionally, open the preferences (‘Edit’ > ‘Preferences’), switch to the ‘Stylesheets’ tab, and set the ‘Transcription to format table’ stylesheet to `<DIR3>/format-table.xsl`.
Open `<DIR3>/dulko.template.exb` in EXMARaLDA (Dulko) (‘File’ > ‘Open’) and save it under a new name (‘File’ > ‘Save as’).[^2]
Open the metainformation dialog (‘Transcription’ > ‘Metainformation’) and edit general metadata.
Open the speakertable (‘Transcription’ > ‘Speakertable’) and edit the speaker metadata.[^3]
In the main window, write or paste the learner text into one or several cells of the first tier. You can also start with just a part of the learner text (e.g. the first sentence) and add further parts later on.
Apply the transformation scenario ‘Dulko: word-Spur (Lernertext)’ (‘Transcription’ > ‘Transformation’), which tokenises the learner text and normalises punctuation marks.
If you want to annotate editorial changes by the learner, apply the transformation scenario ‘Dulko: orig-Spur (Lernertext)’, which adds a tier for the original, unchanged learner text. When editing this tier, you can use the symbols `¶`, `|`, `-`, and `_` for marking paragraph breaks, line breaks, hyphenations, and omissions, respectively.[^4]
Apply the transformation scenario ‘Dulko: S-, pos- und lemma-Spuren (Lernertext)’ for part-of-speech tagging, lemmatisation, and sentence-span identification of the learner text.[^5]
If you have added a tier for the original learner text in step 6, apply the transformation scenario ‘Dulko: Diff-Spur (Lernertext)’, which detects editorial changes.
If you have used some of the symbols `¶`, `|`, `-`, or `_`, mentioned above in step 6, on the tier for the original learner text, apply the transformation scenario ‘Dulko: Layout-Spur (Lernertext)’, which automatically tags those symbols.
Optionally, apply the transformation scenario ‘Dulko: Graph-Spur (Lernertext)’, which adds a tier on which you can tag graphical renditions of the learner text by means of the annotation panel.
Apply the transformation scenario ‘Dulko: trans-Spur (Lernertext)’ in case the learner text is a translation. Write or paste the text translated by the learner into the cells of the new tier.
Apply the transformation scenario ‘Dulko: ZH- und Fehler-Spuren (1. Zielhypothese)’, which adds tiers for a target hypothesis and for error analysis. Edit the target hypothesis, and tag errors by means of the annotation panel.
Apply the transformation scenario ‘Dulko: ZHS-, ZHpos- und ZHlemma-Spuren (1. Zielhypothese)’ for part-of-speech tagging, lemmatisation, and sentence-span identification of the target hypothesis.
Finally, apply the transformation scenario ‘Dulko: ZHDiff-Spur (1. Zielhypothese)’, which detects differences between the target hypothesis and the learner text.[^6]
In order to annotate further target hypotheses, apply the transformation scenarios for ‘2. Zielhypothese’, ‘3. Zielhypothese’, or ‘weitere Zielhypothese’. These transformation scenarios do not operate on the learner text but on the preceding target hypothesis.
Note that you can re-apply any of the above transformation scenarios in case you want to update the corresponding tiers, e.g. in order to revise the annotations or annotate further parts of the learner text.[^7]
If required, additional timeline items can be inserted by clicking on the next timeline item and choosing ‘Timeline’ > ‘Insert timeline item’. The transformation scenario ‘Dulko: Zeitachse’, in turn, removes unused timeline items.
Apply the transformation scenario ‘Dulko: HTML-Version’ to export the table sentence by sentence into an HTML file, which can be viewed and printed with your favourite browser.
Run ‘Transcription’ > ‘Export segmented transcription’ for exporting the table to an EXS file, which can be used in COMA and EXAKT.[^8]
Apply the transformation scenarios ‘Dulko: ANNIS-kompatible Version’ and ‘Dulko: Pepper-kompatible Metadaten-Liste’ before exporting the final EXMARaLDA file to ANNIS via Pepper. The former transformation scenario deletes redundant annotations and adds namespace prefixes like `ZH1` and `ZH2` to the target-hypothesis and error tiers; those namespace prefixes are needed for properly ordering the tiers in ANNIS. The latter transformation scenario outputs an attribute-value list with corpus-level metadata for Pepper (cf. Pepper’s customisation property `pepper.before.readMeta`).[^9]
Andreas Nolda (andreas@nolda.org)
[^1]: A user of Microsoft Windows 8.1 reported that the installation program of the Oracle Java runtime environment does not set the system environment variable `JAVA_HOME` to the JRE installation path, which prevented EXMARaLDA from running. Cf. the configuration instructions in this README on how to set such variables. Alternatively, you can install the Oracle Java development kit or Amazon Corretto, which both appear to set this variable properly.
[^2]: Alternatively, you may start from a blank table (‘File’ > ‘New’). Metadata can be imported from `<DIR3>/dulko.template.exb` by applying the transformation scenario ‘Dulko: Metadaten’.
[^3]: Part of the speaker metadata (viz. the value of the ‘Abbreviation’ field) is used to generate tier names. If changed, the tier names can be updated by means of the transformation scenario ‘Dulko: Spurnamen’.
[^4]: In order to mark a hyphenation in the learner text, the corresponding word on the tier for the original learner text has to be split into three events consisting of the first part of the word, the symbol `-`, and the second part of the word, respectively. Optionally, you can add a further event with the symbol `|` after `-` as an explicit line-break mark.
[^5]: The stylesheet for sentence-span tiers (Satzspannen) automatically identifies sentence spans ending in a punctuation character that TreeTagger tags as `$.` or ending in an abbreviation followed by a capitalised version of a non-noun. Sentence spans with different endings have to be tagged manually by splitting the corresponding sentence-span event inserted by the stylesheet; the sentence-span names can then be regenerated with the transformation scenario ‘Dulko: Satzspannen’.
[^6]: The stylesheet for difference tiers (Differenz-Spuren) tries hard to detect movement source and target pairs, which are tagged with `MOV[EMENT]S[OURCE]` and `MOV[EMENT]T[ARGET]`, respectively. If unsure, it tags potential movement sources and targets with the tags `MOVS/DEL` and `MOVT/INS`, which have to be manually disambiguated (e.g. by means of the annotation panel).
[^7]: The only exception is the transformation scenario ‘Dulko: ZH- und Fehler-Spuren (weitere Zielhypothese)’, which always creates new tiers.
[^8]: In EXMARaLDA (Dulko), this menu entry runs the XSLT stylesheet `exb2exs.xsl` on the current EXB file.
[^9]: A build system for generating ANNIS data from EXMARaLDA sources annotated with EXMARaLDA (Dulko) is available at makeDulko.