(As this is of interest not only to the Polish-speaking community, this post is in English.) Recently, after some discussions on the lingucomponent list at OpenOffice.org on the method of finding frequent typos, I did some experiments on the revision history logs. Background. The developers of grammar checkers, and autocorrect lists, have hard times with finding relevant corpora. Revision history is an excellent source about native speakers perception of linguistic norms. Frequently revised typos are perceived as errors that need to be corrected, so using these typos on autocorrect lists is justified. The same goes for style, grammar and usage errors. Method . Experiments involved three steps: Clean the history dump (??wiki-latest-pages-meta-history.xml), to get only relevant parts of the dump. Using XML tools isn't recommended (I tried XSLT, forget it). Using a simple awk script, I was able to clean the > 30GB dump in an hour or so, and got a >17 GB file. The script is simp...