13.11.09

Talking about SRX in LT during LTC

Jarek Lipski and I gave a talk on using the SRX segmentation standard in LanguageTool during LTC 2009. We were asked a couple of times where the file is available, so I'm putting the link to our free SRX file here. It is the current version from our CVS, and at the time of writing it supports Polish, English, Dutch, Romanian, Russian, Icelandic, Slovak and Slovenian. It is licensed under the LGPL, so you can freely reuse it. There are also some SRX segmentation hints on our LanguageTool wiki. Especially important is the fact that there is a free (as in speech) editor, Ratel, which helps with writing and testing the rules.

The draft version of the paper is available online here. In case you want to cite it, here is the full reference:

Marcin Miłkowski, Jarosław Lipski, 2009. Using SRX standard for sentence segmentation in LanguageTool, in: Human Language Technologies as a Challenge for Computer Science and Linguistics, ed. by Zygmunt Vetulani, Poznań: Wydawnictwo Poznańskie, Fundacja Uniwersytetu im. A. Mickiewicza, pp. 556-560.

2 comments:

Anonymous writes...

I wonder what new things the project will bring. From time to time it would be worth mentioning the more interesting achievements in the development version. I like to see life and progress in the projects I care about.

Marcin Miłkowski writes...

So far there is not much new for Polish in the new version, but it will bring support for Belarusian and Malayalam.