DFKI-LT - iREAD

Personalised Reading Apps for Primary School Children

The overarching aim of the iREAD project is to develop a software infrastructure of personalised, adaptive technologies and a diverse set of applications that support the learning and teaching of reading skills.


iREAD Services

MunderLine - The Multilingual Universal Dependency and Relation Pipeline

MunderLine provides a multilingual pipeline that applies the following processing steps to an input text:

tokenization
part-of-speech tagging
morphology tagging
named entity recognition
dependency parsing

Tokenization (including sentence boundary detection) is done by a simple tokenizer. Part-of-speech tagging and named entity recognition are performed by two separate instances of the GNTagger, each using a statistical model. Dependency parsing is performed by the MDParser, also using a statistical model.

Currently, MunderLine supports English, German, Spanish and Greek; for Greek, named entity recognition is not available.

MunderLine returns its results in CoNLL format. Here is the result for the sentence "Angela Merkel works in Berlin.":

1    Angela    _    NNP    _    Number=Sing                                              2    NAME    _    _    B-PER    _
2    Merkel    _    NNP    _    Number=Sing                                              3    SBJ     _    _    L-PER    _
3    works     _    VBZ    _    Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin    0    ROOT    _    _    O        _
4    in        _    IN     _    _                                                        3    LOC     _    _    O        _
5    Berlin    _    NNP    _    Number=Sing                                              4    PMOD    _    _    U-LOC    _
6    .         _    $.     _    _                                                        3    PUNC    _    _    O        _

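The first column holds the token number, the second the token form, the fourth the part-of-speech tag, and the sixth the morphological features. Columns seven and eight hold the dependency structure: the number of the token's head (0 for the root of the sentence) and the label of the dependency relation. The eleventh column holds the named entity tag in the BILOU scheme: B- and L- mark the first and last token of a multi-token entity (I- marks tokens in between), U- marks a single-token entity, and O marks tokens outside any entity.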
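Because the output is plain column text, standard tools can post-process it. The sketch below assumes whitespace-separated columns as in the example above, blank lines between sentences (the usual CoNLL convention, assumed rather than confirmed for MunderLine), and token numbers running 1, 2, 3, ... within each sentence. The function names extract_entities and dependency_triples are hypothetical: the first joins BILOU-tagged tokens into entities, the second prints each token's dependency as relation(head, dependent).

```shell
# Sketch: post-process MunderLine CoNLL output with awk.
# Assumed layout: $1 token number, $2 form, $7 head, $8 relation, $11 NE tag.
# Assumed: blank lines separate sentences; token numbers run 1..n per sentence.

# Print "TYPE<TAB>entity text" for every named entity (BILOU tags).
extract_entities() {
  awk '
    NF == 0 { ent = ""; next }            # blank line: sentence boundary
    NF < 11 { next }                      # not a token row
    {
      tag = $11
      if (tag == "O") { ent = ""; next }  # outside any entity
      prefix = substr(tag, 1, 1)          # B, I, L or U
      type = substr(tag, 3)               # e.g. PER, LOC
      if (prefix == "U") { print type "\t" $2; ent = ""; next }
      if (prefix == "B") { ent = $2; next }
      ent = ent " " $2                    # I or L: extend the open entity
      if (prefix == "L") { print type "\t" ent; ent = "" }
    }
  ' "$@"
}

# Print relation(head, dependent) for every token; head 0 is shown as ROOT.
dependency_triples() {
  awk '
    function flush(   i, h) {
      for (i = 1; i <= n; i++) {
        h = head[i]
        print rel[i] "(" (h == "0" ? "ROOT" : form[h]) ", " form[i] ")"
      }
      n = 0
    }
    NF == 0 { flush(); next }             # blank line: emit finished sentence
    NF < 8  { next }                      # not a token row
    { n++; form[$1] = $2; head[$1] = $7; rel[$1] = $8 }
    END { flush() }                       # last sentence may lack a blank line
  ' "$@"
}
```

Both functions read a file argument or standard input, so MunderLine output can be piped straight into them. For the example sentence, extract_entities prints PER Angela Merkel and LOC Berlin, and dependency_triples prints lines such as SBJ(works, Merkel) and ROOT(ROOT, works).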

Usage

MunderLine runs as a REST service that accepts form data in the body of an HTTP POST request sent to http://iread.dfki.de/munderline/<lang>. The text to analyze is passed in the input field.

Replace <lang> with one of the following language IDs to use language-specific models:

English: en
German: de
Spanish: es
Greek: el

Replace <lang> with one of the following language IDs to use Universal Dependencies models:

English: en_ud
German: de_ud
Spanish: es_ud
Greek: el_ud
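Since the endpoint is just the base URL plus one of the language IDs listed above, a small helper can reject a mistyped ID before any request is sent. This is a minimal sketch; the function name munderline_url and its optional ud argument are hypothetical conveniences, not part of the service.

```shell
# Sketch: build the MunderLine endpoint URL for a language ID.
# Usage: munderline_url <lang> [ud]   (hypothetical helper)
munderline_url() {
  # Only the language IDs documented for MunderLine are accepted.
  case "$1" in
    en|de|es|el) ;;
    *) echo "unsupported language: $1" >&2; return 1 ;;
  esac
  if [ "$2" = "ud" ]; then
    # Universal Dependencies models use the <lang>_ud suffix.
    printf 'http://iread.dfki.de/munderline/%s_ud\n' "$1"
  else
    # Language-specific models use the bare language ID.
    printf 'http://iread.dfki.de/munderline/%s\n' "$1"
  fi
}
```

For example, munderline_url el ud yields http://iread.dfki.de/munderline/el_ud, and munderline_url fr fails with an error message.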

The service can be tested with the curl command-line tool.

To analyze a short text with MunderLine, run

curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8" \
--data-urlencode input="This is a test" \
http://iread.dfki.de/munderline/en

To analyze a plain text file with MunderLine, run

curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8" \
--data-urlencode input@my_input_text.txt \
http://iread.dfki.de/munderline/en

To analyze a plain text file that contains one sentence per line, add the linewise=true parameter and run

curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8" \
--data-urlencode input@my_sentences.txt \
--data "linewise=true" \
http://iread.dfki.de/munderline/en
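The same request can be looped over a directory of input files. The sketch below assumes every *.txt file already holds one sentence per line and writes the CoNLL result next to it as a .conll file. The function name munderline_dir, the MUNDERLINE_LANG variable (defaulting to en) and the CURL override (useful for a dry run with a stub command) are hypothetical; --fail makes curl exit with an error on an HTTP error status instead of saving the error page.

```shell
# Sketch: analyze every *.txt file in a directory linewise with MunderLine.
# Hypothetical knobs: MUNDERLINE_LANG (default en), CURL (default curl).
munderline_dir() {
  dir=$1
  lang=${MUNDERLINE_LANG:-en}
  for f in "$dir"/*.txt; do
    [ -e "$f" ] || continue   # glob matched nothing: no .txt files
    # Same options as the linewise curl command above, plus error handling.
    ${CURL:-curl} --silent --show-error --fail -X POST \
      -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8" \
      --data-urlencode "input@$f" \
      --data "linewise=true" \
      "http://iread.dfki.de/munderline/$lang" > "${f%.txt}.conll" || return 1
  done
}
```

For example, MUNDERLINE_LANG=de munderline_dir corpus would send every corpus/*.txt file to the German models and stop at the first failed request.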