A set of basic language tools for the Polish language. Z4.2a: Improving the quality of recognition of relations between events using Transformer-type deep networks.
This archive contains files generated from the recipe in kaldi-speaker-diarization/v5/. Its contents should be placed in a directory of the same type, with symbolic links to diarization/, sid/, steps/, etc. It was created when Kaldi's master branch was at git commit 321d3959dabf667ea73cc98881400614308ccbbb.
v5
These models were trained on the Althingi Parliamentary Speech corpus, available on malfong.is. The recipe uses MFCCs, x-vectors, PLDA, and AHC. It uses the Icelandic Rúv-di corpus as two held-out sets for tuning parameters. The Rúv-di corpus is currently not publicly available.
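The placement described above (archive contents in a v5-style recipe directory, with symlinks to diarization/, sid/, steps/, etc.) can be sketched as follows. This is a hypothetical sketch: the value of KALDI_ROOT, the archive file name, and the exact source paths of the symlinked directories are assumptions based on common Kaldi recipe layout, not taken from the archive itself.

```shell
# Hypothetical layout sketch -- adjust KALDI_ROOT and the archive name to your setup.
KALDI_ROOT=/opt/kaldi                                  # assumed Kaldi checkout (at the commit noted above)
mkdir -p v5 && cd v5
tar -xzf ../kaldi-speaker-diarization-v5.tar.gz        # archive name is an assumption

# Symlinks to shared recipe directories, as in other Kaldi v2-style diarization recipes:
ln -sf "$KALDI_ROOT/egs/callhome_diarization/v1/diarization" diarization
ln -sf "$KALDI_ROOT/egs/sre08/v1/sid" sid
ln -sf "$KALDI_ROOT/egs/wsj/s5/steps" steps
ln -sf "$KALDI_ROOT/egs/wsj/s5/utils" utils
```

With the links in place, the recipe's scripts can resolve the shared diarization, speaker-ID, and utility code without copying it into the archive directory.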
The model for lemmatisation of standard Serbian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) combined with the Serbian non-standard training corpus ReLDI-NormTagNER-sr (http://hdl.handle.net/11356/1794) and using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). The estimated F1 of the lemma annotations is ~98.02.
The difference from the previous version is that this version was trained on a combination of the standard (SETimes.SR) and non-standard (ReLDI-NormTagNER-sr) Serbian training corpora.
A web-based system for natural language processing of texts in Polish. It allows running complex workflows of language-processing and machine-learning tools, which are made available via REST web services.
TimeAssign is a program that recognizes temporal expressions and assigns TimeML labels to words in Polish text using a Bi-LSTM-based neural network and word-form embeddings.
The monolingual Slovene RoBERTa (A Robustly Optimized BERT Pretraining Approach) model is a state-of-the-art model representing words/tokens as contextually dependent word embeddings, used for various NLP tasks. Word embeddings can be extracted for every word occurrence and then used in training a model for an end task, but typically the whole RoBERTa model is fine-tuned end-to-end.
The SloBERTa model is closely related to the French CamemBERT model (https://camembert-model.fr/). The corpora used for training the model contain 3.47 billion tokens in total. The subword vocabulary contains 32,000 tokens. The scripts and programs used for data preparation and for training the model are available at https://github.com/clarinsi/Slovene-BERT-Tool.
Compared with the previous version (1.0), this version was trained for a further 61 epochs (v1.0: 37 epochs, v2.0: 98 epochs), for a total of 200,000 iterations/updates.
The released model is a PyTorch neural network model, intended for use with the transformers library (https://github.com/huggingface/transformers; sloberta.2.0.transformers.tar.gz) or the fairseq library (https://github.com/pytorch/fairseq; sloberta.2.0.fairseq.tar.gz).
The model for semantic role labeling of standard Slovenian was built with the CLASSLA-Stanza tool (https://github.com/clarinsi/classla) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1434). The estimated F1 of the semantic role annotations is ~77.2.
The model for lemmatisation of standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200) and using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). The estimated F1 of the lemma annotations is ~97.9.
The difference from the previous version of the model is that it was trained with the lemmatiser padding bug removed; cf. https://github.com/stanfordnlp/stanfordnlp/issues/143.
The model for lemmatisation of standard Slovenian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the ssj500k training corpus (http://hdl.handle.net/11356/1210) and using the Sloleks inflectional lexicon (http://hdl.handle.net/11356/1230). The estimated F1 of the lemma annotations is ~99.0.
The difference from the previous version of the model is that it was trained with the lemmatiser padding bug removed; cf. https://github.com/stanfordnlp/stanfordnlp/issues/143.
The version of the Tool Portal that you are currently using records the behaviour of its users for testing purposes. By pressing "Continue" below, you agree to the recording of your actions while using this site. If you do not wish to agree to this, please navigate away from this site.