#Midgard

#License

Files without an explicit license or source notice (e.g. data and config files, short helper scripts) are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/4.0/. Files with an explicit source notice remain under the copyright stated in that notice and are subject to the licenses the copyright holder places upon them.

#Required Software

If you only wish to generate the figures and documentation from the Linguistic Vanguard article, then you only need Python, R and the listed R packages. Additional Python packages, OpenSesame and RStudio are only necessary if you wish to run the experiment or otherwise do further exploration.

Please note that in our previous work, we used "old" lme4. The optimizers and convergence checks in the newer versions are generally much better, but they are also more sensitive to issues of scaling. In our older work, the variables were generally only centered; here, they are both scaled and centered (i.e., they are $z$-transformed), but the model fits are nearly identical. (Individual estimates may differ in some of the less significant digits.)
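
The difference between centering and $z$-transforming can be illustrated with a minimal Python sketch. This is not code from the repository (the actual analysis is done in R); the function names `center` and `z_transform` and the sample values are made up for illustration.

```python
import statistics


def center(values):
    """Subtract the mean, leaving the original scale untouched."""
    mean = statistics.mean(values)
    return [v - mean for v in values]


def z_transform(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]


# Hypothetical predictor values, for illustration only.
raw = [2.0, 4.0, 6.0, 8.0]
centered = center(raw)     # mean 0, original units
scaled = z_transform(raw)  # mean 0, standard deviation 1
```

After the transform every predictor has mean 0 and standard deviation 1, which puts all predictors on a comparable scale for the optimizer.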

#Sample Run

#Use Your Own Data

  1. python load_data.py
  2. Run experiment via OpenSesame. The experiment starts with a few prompts then continues on to test sentences.
    1. Choose which key will correspond to the left option.
    2. Choose which key will correspond to the right option.
    3. Pay close attention to each sentence and afterwards choose the option (left or right) that best fills the blank in the comprehension question.
  3. python pickle2csv.py result.pickle
  4. Examine your data
    • Interactively with the same analysis used in the article: R --vanilla < regression.R
    • Generate a quick custom report (lots of pictures) for your own weights: knit individual_report.Rmd with a recent version of RStudio

#Using one of the supplied sample data sets

  1. python pickle2csv.py sample01a.pickle
  2. R --vanilla < regression.R

#Generating Stimuli

The file load_data.py generates stimuli by extracting the nouns and verbs from the stimuli provided by Alexander Dröge (standarddeutsch_items.csv, citation to come). Because the source stimuli consist of items with a clear semantic directionality (both in terms of individual semantic features and world knowledge), the subjects, objects and verbs are mixed at random to produce a new set of items. Moreover, the original stimuli have been expanded so that each "subject" and "object" is present in accusative singular, nominative singular and plural (which is always case-ambiguous in German), and these extra variants are used in the generation of new stimuli.

For each new item, two NPs are chosen at random, one each from the "subject" and the "object" pools. Morphological case for each NP is also chosen at random, thus allowing for items where morphological case is not a reliable cue, either due to ambiguity or ungrammaticality. The verb is chosen similarly, with a weak constraint that the verb always agrees in number with the NP taken from the subject pool, even if that NP is now accusative. It is thus possible that a sentence has an accusative NP that agrees in number with the verb and a nominative NP that doesn't.
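
The sampling procedure described above might look roughly like the following sketch. It is not the code in load_data.py: the miniature pools, the form labels (`nom_sg`, `acc_sg`, `pl`), the dictionary keys and the function `make_item` are all assumptions made for illustration, and the real pools are derived from standarddeutsch_items.csv.

```python
import random

# Hypothetical miniature pools; each NP maps a form label to a surface string.
SUBJECT_POOL = [
    {"nom_sg": "der Lehrer", "acc_sg": "den Lehrer", "pl": "die Lehrer"},
    {"nom_sg": "der Maler", "acc_sg": "den Maler", "pl": "die Maler"},
]
OBJECT_POOL = [
    {"nom_sg": "der Hund", "acc_sg": "den Hund", "pl": "die Hunde"},
    {"nom_sg": "der Freund", "acc_sg": "den Freund", "pl": "die Freunde"},
]
VERBS = [
    {"sg": "besucht", "pl": "besuchen"},
    {"sg": "sieht", "pl": "sehen"},
]
FORMS = ("nom_sg", "acc_sg", "pl")


def make_item(rng):
    """Draw one NP per pool, a random form for each, and an agreeing verb."""
    subj_np = rng.choice(SUBJECT_POOL)
    obj_np = rng.choice(OBJECT_POOL)
    subj_form = rng.choice(FORMS)  # case is random, so it may be accusative
    obj_form = rng.choice(FORMS)
    verb = rng.choice(VERBS)
    # Weak constraint: the verb agrees in number with the subject-pool NP,
    # whatever case that NP ended up with.
    number = "pl" if subj_form == "pl" else "sg"
    return {
        "subject_pool_np": subj_np[subj_form],
        "subject_pool_form": subj_form,
        "object_pool_np": obj_np[obj_form],
        "object_pool_form": obj_form,
        "verb": verb[number],
        "verb_number": number,
    }


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
items = [make_item(rng) for _ in range(5)]
```

Because the two forms are drawn independently, some generated items have two nominatives, two accusatives or two ambiguous plurals, which is what makes case an unreliable cue in part of the stimulus set.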

The list of stimuli is permuted for order and then serialized to disk via pickling. The pickle serves as input for the experiment.
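
A minimal sketch of the shuffle-and-pickle step, assuming the stimuli are held in a plain Python list. The helper names `save_stimuli` and `load_stimuli`, the file name `stimuli.pickle` and the item layout are hypothetical and not taken from load_data.py.

```python
import pickle
import random
import tempfile
from pathlib import Path


def save_stimuli(items, path, seed=None):
    """Shuffle a copy of the items and serialize it to disk with pickle."""
    rng = random.Random(seed)
    shuffled = list(items)  # copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    with open(path, "wb") as handle:
        pickle.dump(shuffled, handle)
    return shuffled


def load_stimuli(path):
    """Read the pickled list back, as the experiment would."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


# Hypothetical items, for illustration only.
items = [{"id": i, "sentence": f"item {i}"} for i in range(10)]
out_path = Path(tempfile.mkdtemp()) / "stimuli.pickle"
written = save_stimuli(items, out_path, seed=1)
restored = load_stimuli(out_path)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.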

#Stimulus Dictionary

#Running an Experiment

opensesame

#Post-Processing the Experimental Data

convert detailed output pickle to minimal csv file
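
As a rough sketch of what such a conversion could look like, assuming the pickle holds a list of per-trial dictionaries: the column names in `KEEP`, the function `pickle_to_csv` and the demo data are hypothetical, and the actual pickle2csv.py may store and select fields differently.

```python
import csv
import pickle
import tempfile
from pathlib import Path

# Hypothetical minimal set of columns to keep in the CSV.
KEEP = ("item", "response", "response_time")


def pickle_to_csv(pickle_path, csv_path, columns=KEEP):
    """Write the selected columns of each pickled trial to a CSV file."""
    with open(pickle_path, "rb") as handle:
        trials = pickle.load(handle)
    with open(csv_path, "w", newline="") as handle:
        # extrasaction="ignore" silently drops keys not listed in columns.
        writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(trials)
    return len(trials)


# Demo with made-up trial records that carry extra detail.
workdir = Path(tempfile.mkdtemp())
demo_trials = [
    {"item": 1, "response": "left", "response_time": 812, "debug": "x"},
    {"item": 2, "response": "right", "response_time": 655, "debug": "y"},
]
with open(workdir / "result.pickle", "wb") as handle:
    pickle.dump(demo_trials, handle)

n_rows = pickle_to_csv(workdir / "result.pickle", workdir / "result.csv")
with open(workdir / "result.csv", newline="") as handle:
    rows = list(csv.DictReader(handle))
```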

#Analysing the Data and Examining the Parameters

models and plots