CLARIN Tool Portal

Liner2.5

2 resources

Generic framework for information extraction tasks, including recognition of named entities, temporal expressions, spatial expressions and events.

Morphological Analyzer for Shipibo-Konibo

2 resources

This tool is the first morphological analyzer for this language. The analyzer is an FST (finite-state transducer) that works word by word, producing all possible segmentations and tag sequences for each word.

Universal Dependencies 1.2 Models for Parsito

2 resources

Parsing models for all Universal Dependencies 1.2 treebanks, created solely from UD 1.2 data (http://hdl.handle.net/11234/1-1548). To use these models, you need the Parsito binary, which you can download from http://hdl.handle.net/11234/1-1584.

The Model latinpipe-evalatin24-240520 for LatinPipe 2024

2 resources

The latinpipe-evalatin24-240520 is a PhilBerta-based model for LatinPipe 2024 <https://github.com/ufal/evalatin2024-latinpipe>, performing tagging, lemmatization, and dependency parsing of Latin, based on the winning entry to the EvaLatin 2024 <https://circse.github.io/LT4HALA/2024/EvaLatin> shared task. It is released under the CC BY-NC-SA 4.0 license.

WMT21 Marian translation model (ca-oc multi-task)

1 resource

Marian NMT model for Catalan-to-Occitan translation. It is a multi-task model that also produces a phonemic transcription of the Catalan source. The model was submitted to the WMT'21 Shared Task on Multilingual Low-Resource Translation for Indo-European Languages as a CUNI-Contrastive system for Catalan to Occitan.

CorPipe 23 multilingual CorefUD 1.1 model (corpipe23-corefud1.1-231206)

2 resources

The `corpipe23-corefud1.1-231206` is a `mT5-large`-based multilingual model for coreference resolution, usable in CorPipe 23 (https://github.com/ufal/crac2023-corpipe). It is released under the CC BY-NC-SA 4.0 license. The model is language-agnostic (no _corpus id_ on input), so it can be used to predict coreference in any `mT5` language (for zero-shot evaluation, see the paper). Note, however, that empty nodes must already be present in the input; they are not predicted (the same setting as in the CRAC23 shared task).

CorpoGrabber-Desktop: The Toolchain to Automatic Acquiring and Extraction of the Website Content

3 resources

Desktop version of CorpoGrabber CLI

The CLASSLA-StanfordNLP model for lemmatisation of standard Bulgarian 1.0

2 resources

The model for lemmatisation of standard Bulgarian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the BulTreeBank training corpus (http://hdl.handle.net/11495/D93F-C6E9-65D9-2) and using the Bulgarian inflectional lexicon (Popov, Simov, and Vidinska 1998). The estimated F1 of the lemma annotations is ~98.8.

The CLASSLA-Stanza model for morphosyntactic annotation of spoken Slovenian 2.2

3 resources

This model for morphosyntactic annotation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/UD_Slovenian-SST) combined with the SUK training corpus (http://hdl.handle.net/11356/1959) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1791) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The model simultaneously produces UPOS, FEATS and XPOS (MULTEXT-East) labels. The estimated F1 of the XPOS annotations is ~96.76.

The CLASSLA-Stanza model for lemmatisation of spoken Slovenian 2.2

2 resources

This model for lemmatisation of spoken Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SST treebank of spoken Slovenian (https://github.com/UniversalDependencies/UD_Slovenian-SST) combined with the SUK training corpus (http://hdl.handle.net/11356/1959) and using the CLARIN.SI-embed.sl word embeddings (http://hdl.handle.net/11356/1791) that were expanded with the MaCoCu-sl Slovene web corpus (http://hdl.handle.net/11356/1517). The estimated F1 of the lemma annotations is ~99.23.

