CLARIN Tool Portal

The CLASSLA-StanfordNLP model for named entity recognition of standard Slovenian 1.0

3 resources

This model for named entity recognition of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1204).

EVALD 2.0 for Foreigners

3 resources

EVALD 2.0 for Foreigners is software for the automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.

EVALD 4.0 for Foreigners – Evaluator of Discourse

3 resources

EVALD 4.0 for Foreigners is software for the automatic evaluation of surface coherence (cohesion) in Czech texts written by non-native speakers of Czech.

Byte-Level Neural Error Correction Model for Icelandic - Yfirlestur (22.09)

7 resources

This Byte-Level Neural Error Correction Model for Icelandic is a fine-tuned byT5-base Transformer model for error correction in natural language. It acts as a machine translation model in that it “translates” from deficient Icelandic to correct Icelandic. The model is trained on parallel synthetic error data and on real error data from the iceErrorCorpus (IceEC, http://hdl.handle.net/20.500.12537/73) and three specialised error corpora (L2: http://hdl.handle.net/20.500.12537/131, dyslexia: http://hdl.handle.net/20.500.12537/132, child language: http://hdl.handle.net/20.500.12537/133). The synthetic error data (35M lines of parallel data) was created by filtering the Icelandic Gigaword Corpus (IGC, http://hdl.handle.net/20.500.12537/192) and then scrambling it with a variety of error patterns to simulate real grammatical and typographical errors. The pretrained byT5 model was first trained on the synthetic data and then fine-tuned on the real error data from IceEC. It can correct a variety of textual errors, even in texts containing many errors, such as those written by people with dyslexia. Measured on the IceEC test data, the model scores 0.862917 on the GLEU metric (a modification of BLEU for grammatical error correction) and 0.06% on TER (translation error rate).
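
As a rough illustration of what "byte-level" means for a byT5-style model, the sketch below maps text to UTF-8 byte ids. It is not code from the model's distribution. It assumes the usual byT5 convention that ids 0 to 2 are reserved for padding, end-of-sequence and unknown tokens, so each byte value is shifted by 3; the function names are hypothetical.

```python
# Sketch of byte-level encoding in the style of byT5 (assumption: ids 0-2 are
# reserved for <pad>, </s>, <unk>, so byte value b becomes id b + 3).
SPECIAL_OFFSET = 3
EOS_ID = 1


def encode_bytes(text: str) -> list[int]:
    """Turn text into byte-level ids and append the end-of-sequence id."""
    return [b + SPECIAL_OFFSET for b in text.encode("utf-8")] + [EOS_ID]


def decode_bytes(ids: list[int]) -> str:
    """Invert encode_bytes, skipping special ids and anything above the byte range."""
    raw = bytes(i - SPECIAL_OFFSET for i in ids if SPECIAL_OFFSET <= i < 256 + SPECIAL_OFFSET)
    return raw.decode("utf-8", errors="ignore")


sample = "Þetta er próf"
ids = encode_bytes(sample)
# Non-ASCII letters such as Þ and ó take two bytes each, so the id sequence
# is longer than the character count.
print(len(sample), len(ids))
print(decode_bytes(ids))
```

Because the vocabulary is just bytes, misspelled or scrambled words never fall outside it, which is one reason a byte-level model can be a natural fit for heavily erroneous input.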

BinPackage 0.4.4 (22.10)

2 resources

BinPackage is a Python package that embeds the vocabulary of the Database of Modern Icelandic Inflection (DMII; in Icelandic BÍN, Beygingarlýsing íslensks nútímamáls, https://bin.arnastofnun.is) and offers various lookups and queries of the data. The database, maintained by The Árni Magnússon Institute for Icelandic Studies, contains over 6.5 million entries, over 3.1 million unique word forms, and about 300,000 distinct lemmas. The database has been encapsulated in an easy-to-install Python package and compressed from a 400+ megabyte CSV file to an indexed binary structure of about 80 megabytes. More information at: https://github.com/mideind/BinPackage
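
The idea of turning a flat CSV of inflected forms into an indexed lookup can be sketched with the standard library alone. This is only a toy illustration, not BinPackage's API or storage format: the four-column layout, the English tag labels and the function name `build_index` are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict

# Hypothetical miniature inflection table: lemma;word class;word form;tag.
# The real DMII data uses its own column layout and tag set.
SAMPLE_CSV = """hestur;noun;hestur;nom sg
hestur;noun;hest;acc sg
hestur;noun;hesti;dat sg
hestur;noun;hests;gen sg
"""


def build_index(csv_text: str) -> dict[str, list[tuple[str, str, str]]]:
    """Index rows by word form so that a lookup does not scan the whole table."""
    index = defaultdict(list)
    reader = csv.reader(io.StringIO(csv_text), delimiter=";")
    for lemma, word_class, form, tag in reader:
        index[form].append((lemma, word_class, tag))
    return dict(index)


index = build_index(SAMPLE_CSV)
print(index["hesti"])
```

A dictionary keeps everything in memory as Python objects; a compact binary structure such as the one the package description mentions trades that convenience for a much smaller footprint.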

Service for querying dependency treebanks Drevesnik 1.0

2 resources

Drevesnik (https://orodja.cjvt.si/drevesnik/) is an online service for querying syntactically parsed corpora of Slovenian annotated with the Universal Dependencies scheme. It combines an easy-to-use query language with user-friendly graph visualizations. It is based on the open-source dep_search tool (https://github.com/TurkuNLP/dep_search), which was localized and modified to also support querying by JOS morphosyntactic tags, random distribution of results, and filtering by sentence length. The source code and the documentation for the search backend and the web user interface are publicly available in the CLARIN.SI GitHub repository https://github.com/clarinsi/drevesnik. This submission corresponds to release 1.0: https://github.com/clarinsi/drevesnik/releases/tag/1.0.
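
To give a sense of what a dependency query with a sentence-length filter involves, here is a minimal sketch over CoNLL-U text. It does not use Drevesnik's or dep_search's query language; the sample sentence, the function names and the `max_len` parameter are assumptions made for illustration.

```python
# Hypothetical one-sentence treebank in CoNLL-U format (tab-separated columns).
CONLLU = """\
# text = Pes laja.
1\tPes\tpes\tNOUN\t_\t_\t2\tnsubj\t_\t_
2\tlaja\tlajati\tVERB\t_\t_\t0\troot\t_\t_
3\t.\t.\tPUNCT\t_\t_\t2\tpunct\t_\t_
"""


def parse_sentences(text):
    """Split CoNLL-U text into sentences, each a list of token dicts."""
    sentences, tokens = [], []
    for line in text.splitlines():
        if not line.strip():
            if tokens:
                sentences.append(tokens)
                tokens = []
        elif not line.startswith("#"):
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multiword ranges and empty nodes
                tokens.append({"id": int(cols[0]), "form": cols[1], "upos": cols[3],
                               "head": int(cols[6]), "deprel": cols[7]})
    if tokens:
        sentences.append(tokens)
    return sentences


def find_dependents(sentences, deprel, head_upos, max_len=None):
    """Return (dependent, head) form pairs, optionally only from short sentences."""
    hits = []
    for sent in sentences:
        if max_len is not None and len(sent) > max_len:
            continue
        by_id = {t["id"]: t for t in sent}
        for t in sent:
            head = by_id.get(t["head"])
            if t["deprel"] == deprel and head and head["upos"] == head_upos:
                hits.append((t["form"], head["form"]))
    return hits


sentences = parse_sentences(CONLLU)
print(find_dependents(sentences, "nsubj", "VERB"))
```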

GreynirCorrect4LT (1.0)

2 resources

This is a slightly adapted version of Miðeind's spell and grammar checker GreynirCorrect (http://hdl.handle.net/20.500.12537/174). This version is implemented for use in a text pre-processing pipeline for text-to-speech, but it also includes guidelines for quickly adapting GreynirCorrect to other use cases in language technology applications, where the requirements may differ from those of spell and grammar checking for general users.

Biaffine-based UD Parser for Icelandic 22.12

6 resources

This Universal Dependencies parser for Icelandic was trained with Diaparser [1]. This version was trained on v2.11 of UD_Icelandic-IcePaHC [2] and UD_Icelandic-Modern [3]. (Note that the texts in UD_Icelandic-Modern [3] labeled RUV_TGS_2017 and RUV_ESP_2017 were not included, as they were originally parsed with COMBO-based UD Parser 22.10 [4] and the output subsequently corrected.) The parser utilizes information from an ELECTRA language model [5]. Its UAS (unlabeled attachment score) is 89.58 and its LAS (labeled attachment score) is 86.46.

[1] Diaparser: https://github.com/Unipisa/diaparser
[2] UD_Icelandic-IcePaHC: https://github.com/UniversalDependencies/UD_Icelandic-IcePaHC/
[3] UD_Icelandic-Modern: https://github.com/UniversalDependencies/UD_Icelandic-Modern/
[4] COMBO-based UD Parser 22.10: http://hdl.handle.net/20.500.12537/272
[5] electra-base-igc-is: https://huggingface.co/jonfd/electra-base-igc-is
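
For readers unfamiliar with the two scores reported for this parser, the sketch below shows how UAS and LAS are commonly computed from gold and predicted (head, relation) pairs. It is a simplified illustration, not the evaluation script used for this parser; the function name and the toy data are assumptions, and details such as punctuation handling are ignored.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    Each argument is a list of (head_index, relation_label) pairs, one per token.
    UAS counts tokens whose head is correct; LAS also requires the correct label.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("gold and predicted must be non-empty and the same length")
    total = len(gold)
    heads_ok = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    both_ok = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * heads_ok / total, 100.0 * both_ok / total


# Toy four-token sentence: one wrong label (token 3) and one wrong head (token 4).
gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj"), (3, "punct")]

uas, las = attachment_scores(gold, pred)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")  # UAS = 75.00, LAS = 50.00
```

LAS can never exceed UAS, since every token counted for LAS must also have the correct head.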

Yfirlestur 1.0.0 (22.06)

3 resources

Yfirlestur.is is a public website where you can enter or submit your Icelandic text and have it checked for spelling and grammar errors. The tool also gives hints on words and structures that might not be appropriate, depending on the intended audience for the text. The core spelling and grammar checking functionality of Yfirlestur.is is provided by the GreynirCorrect engine, by the same authors. This software is licensed under the MIT License. More information at https://github.com/mideind/Yfirlestur.

Biaffine-based UD Parser 22.10

2 resources

This Universal Dependencies parser for Icelandic was trained with Diaparser [1] on IcePaHC [2] and UD_Icelandic-Modern [3]. The latter was revised before training to remove some duplicate sentences. The parser utilizes information from an ELECTRA language model [4]. Its UAS (unlabeled attachment score) is 89.52 and its LAS (labeled attachment score) is 86.23.
