Jarek Lipski and I gave a talk on using the SRX segmentation standard for LanguageTool during LTC 2009. We were asked a couple of times where the file is available, so I'm putting the link to our free SRX file here - it's the current version from our CVS, and at the time of writing it supports Polish, English, Dutch, Romanian, Russian, Icelandic, Slovak and Slovenian. It is licensed under the LGPL, so you can freely reuse it. There are also some SRX segmentation hints on our LanguageTool wiki. Especially important is the fact that there is a free (as in speech) editor, Ratel, which helps you write and test the rules.
The draft version of the paper is available online here. In case you want to cite it, here is the complete record:
Marcin Miłkowski, Jarosław Lipski, 2009. Using SRX standard for sentence segmentation in LanguageTool, in: Human Language Technologies as a Challenge for Computer Science and Linguistics, ed. by Zygmunt Vetulani, Poznań: Wydawnictwo Poznańskie, Fundacja Uniwersytetu im. A. Mickiewicza, pp. 556-560.