From 1370074bd345fa54297b79d726dc7fa37453ec3d Mon Sep 17 00:00:00 2001
From: Laura Orvokki Kursula
Date: Sat, 24 Aug 2024 14:32:53 +0200
Subject: Filter unsupported words from nn.wl for cleaner compilation.

---
 README | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

(limited to 'README')

diff --git a/README b/README
index 60d9e94..d147b6d 100644
--- a/README
+++ b/README
@@ -1,7 +1,15 @@
 This is a GNU aspell dictionary for Nynorsk. The wordlist is adopted
-unchanged from the Norwegian Language Bank, licenced under CC BY
-4.0[1], and may be downloaded from the National Library of Norway's
-website[2].
+from the Norwegian Language Bank, licenced under CC BY 4.0[1], and may
+be downloaded from the National Library of Norway's website[2]. It has
+been modified to remove words and phrases unsupported by aspell. The
+file nn.wl is produced from the Language Bank's fullformer_2012.txt as
+follows:
+
+cat norsk_ordbank/fullformer_2012.txt | iconv -f ISO-8859-1 -t UTF-8\
+| cut -f3
+
+sed -E -i .bak -e '/ /d' -e '/^-/d' -e '/-$/d' -e '/[^a-zA-Z.-]/d' -e\
+ '/[.-]{2,}/d' nn.wl
 
 The metadata files and configure script are adapted from Morten Bo
 Johansen's aspell-da[3].
--
cgit v1.2.3
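
Review note: the five sed expressions added to the README can be tried on a handful of words before running them against nn.wl. The sketch below is only an illustration. The function name filter_words and the sample words are made up for this note, and it reads from standard input instead of editing a file in place (it omits the README's -i .bak). The samples are ASCII-only so that the result should not depend on the locale.

```shell
# Hypothetical helper: apply the README's five delete rules to stdin.
filter_words() {
    sed -E \
        -e '/ /d' \
        -e '/^-/d' \
        -e '/-$/d' \
        -e '/[^a-zA-Z.-]/d' \
        -e '/[.-]{2,}/d'
    # Rules, in order: drop lines containing a space, lines starting with
    # a hyphen, lines ending with a hyphen, lines with any character
    # outside a-z, A-Z, '.' and '-', and lines with two or more
    # consecutive '.' or '-' characters.
}

# Made-up sample input; only "hus" and "e-post" pass all five rules.
printf '%s\n' 'hus' 'i dag' '-laus' 'halv-' 'a4' 'nord--sud' 'e-post' \
    | filter_words
# prints:
# hus
# e-post
```

Each sample word that is dropped fails exactly one rule, so removing a single -e expression from the sketch shows which class of entries that rule is responsible for.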