DFKI-LT - iREAD
Personalised Reading Apps for Primary School Children
The overarching aim of the iREAD project is to develop a software infrastructure of personalised, adaptive technologies and a diverse set of applications for supporting learning and teaching of reading skills.
iREAD Services
MunderLine - The Multilingual Universal Dependency and Relation Pipeline
MunderLine provides a multilingual pipeline that applies the following processing steps on an input text:
tokenization
part-of-speech tagging
morphology tagging
named entity recognition
dependency parsing
Tokenization (including sentence boundary detection) is performed by a simple tokenizer. Part-of-speech tagging and named entity recognition are performed by two instances of the GNTagger, each using a statistical model. Dependency parsing is performed by the MDParser, likewise using a statistical model.
Currently, MunderLine supports English, German, Spanish and Greek. Named entity recognition is not available for Greek.
MunderLine results are produced in CoNLL format. Here is the result for the sentence "Angela Merkel works in Berlin.":
1 Angela _ NNP _ Number=Sing 2 NAME _ _ B-PER _
2 Merkel _ NNP _ Number=Sing 3 SBJ _ _ L-PER _
3 works _ VBZ _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 ROOT _ _ O _
4 in _ IN _ _ 3 LOC _ _ O _
5 Berlin _ NNP _ Number=Sing 4 PMOD _ _ U-LOC _
6 . _ $. _ _ 3 PUNC _ _ O _
The first column holds the token number, the second column holds the token form, the fourth column holds the part-of-speech tag, and the sixth column holds the morphological features. Columns seven and eight hold the dependency structure: column seven gives the token number of the head (0 for the root) and column eight gives the dependency relation. The eleventh column holds the named entity tag.
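The column layout described above can be read programmatically. The following Python sketch is illustrative only and is not part of MunderLine: the names Token, parse_conll and named_entities are hypothetical, the columns are assumed to be separated by whitespace, and the named entity tags are assumed to follow a BILOU-style scheme (B = begin, I = inside, L = last, U = single-token unit, O = outside), which is inferred from the B-, L- and U- prefixes in the sample output. The I- prefix does not occur in the sample and is an assumption.

```python
from typing import List, NamedTuple, Tuple

# Sample output copied from the example sentence above.
SAMPLE = """\
1 Angela _ NNP _ Number=Sing 2 NAME _ _ B-PER _
2 Merkel _ NNP _ Number=Sing 3 SBJ _ _ L-PER _
3 works _ VBZ _ Mood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin 0 ROOT _ _ O _
4 in _ IN _ _ 3 LOC _ _ O _
5 Berlin _ NNP _ Number=Sing 4 PMOD _ _ U-LOC _
6 . _ $. _ _ 3 PUNC _ _ O _
"""


class Token(NamedTuple):
    index: int    # column 1: token number
    form: str     # column 2: token form
    pos: str      # column 4: part-of-speech tag
    morph: str    # column 6: morphological features
    head: int     # column 7: token number of the head, 0 for the root
    deprel: str   # column 8: dependency relation
    ner: str      # column 11: named entity tag


def parse_conll(text: str) -> List[Token]:
    """Parse MunderLine-style CoNLL lines (assumed whitespace-separated)."""
    tokens = []
    for line in text.splitlines():
        cols = line.split()
        if len(cols) < 11:
            continue  # skip blank lines and anything that is not a token row
        tokens.append(Token(int(cols[0]), cols[1], cols[3], cols[5],
                            int(cols[6]), cols[7], cols[10]))
    return tokens


def named_entities(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Group tokens into (label, text) spans, assuming BILOU-style tags."""
    entities, current, label = [], [], None
    for tok in tokens:
        prefix, _, kind = tok.ner.partition("-")
        if prefix == "U":                      # single-token entity
            entities.append((kind, tok.form))
        elif prefix == "B":                    # start of a multi-token entity
            current, label = [tok.form], kind
        elif prefix in ("I", "L") and current:
            current.append(tok.form)
            if prefix == "L":                  # last token closes the span
                entities.append((label, " ".join(current)))
                current, label = [], None
    return entities
```

For the sample sentence, parse_conll(SAMPLE) yields six tokens, and named_entities groups "Angela Merkel" as PER and "Berlin" as LOC. The sketch does not separate sentences; for multi-sentence output, the token numbers restart at 1 and the rows would need to be split accordingly.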
Usage
MunderLine runs as a REST service that accepts form data in the body of an HTTP POST request sent to http://iread.dfki.de/munderline/<lang>.
Replace <lang> with one of the following language ids to use language-specific models:
English: en
German: de
Spanish: es
Greek: el
Replace <lang> with one of the following language ids to use universal dependency models:
English: en_ud
German: de_ud
Spanish: es_ud
Greek: el_ud
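The two sets of language ids differ only in the _ud suffix, so the endpoint URL can be assembled from a language code and a flag. The following Python sketch is illustrative; the names BASE_URL, LANGUAGES and endpoint are hypothetical and not part of the service.

```python
# Base address and language ids as listed above.
BASE_URL = "http://iread.dfki.de/munderline"
LANGUAGES = ("en", "de", "es", "el")


def endpoint(lang: str, universal: bool = False) -> str:
    """Return the service URL for a language id.

    With universal=True, the id of the universal dependency model
    (for example en_ud) is used instead of the language-specific one.
    """
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language id: {lang!r}")
    suffix = "_ud" if universal else ""
    return f"{BASE_URL}/{lang}{suffix}"
```

For example, endpoint("de") gives http://iread.dfki.de/munderline/de and endpoint("el", universal=True) gives http://iread.dfki.de/munderline/el_ud.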
The server can be tested using the curl tool.
To analyze a short text with MunderLine, run:
curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8" \
--data-urlencode input="This is a test" \
http://iread.dfki.de/munderline/en
To analyze a plain text file with MunderLine, run:
curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8" \
--data-urlencode input@my_input_text.txt \
http://iread.dfki.de/munderline/en
To analyze a plain text file that contains one sentence per line with MunderLine, run:
curl -X POST -H "Content-Type: application/x-www-form-urlencoded; charset=utf-8" \
--data-urlencode input@my_sentences.txt \
--data "linewise=true" \
http://iread.dfki.de/munderline/en
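The same requests can be sent from a program. The following Python sketch mirrors the curl calls above using only the standard library. It is a minimal illustration, not an official client: the function names build_request and analyze are hypothetical, and decoding the response as UTF-8 is an assumption. The form fields input and linewise are the ones used in the curl commands.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_request(url: str, text: str, linewise: bool = False) -> Request:
    """Build the HTTP POST request without sending it."""
    fields = {"input": text}
    if linewise:
        # Treat every line of the input as one sentence.
        fields["linewise"] = "true"
    body = urlencode(fields).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded; charset=utf-8"
    }
    return Request(url, data=body, headers=headers, method="POST")


def analyze(url: str, text: str, linewise: bool = False,
            timeout: float = 30.0) -> str:
    """Send the text to the service and return the response body.

    Assumes the service answers with UTF-8 encoded text.
    """
    request = build_request(url, text, linewise)
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

A call such as analyze("http://iread.dfki.de/munderline/en", "This is a test") corresponds to the first curl command. To process a file with one sentence per line, read its contents and pass linewise=True.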