CLARIN Tool Portal

NeMo Neural Machine Translation service RSDO-DS4-NMT-API 1.0

2 resources

Neural Machine Translation service for NeMo AAYN Base models. For more details about building such models, see the official NVIDIA NeMo documentation (https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/main/nlp/machine_translation/machine_translation.html) and NVIDIA NeMo GitHub (https://github.com/NVIDIA/NeMo). A model for the language pair SL-EN can be downloaded from http://hdl.handle.net/11356/1736. The service accepts the source language, the target language, and either a single string or a list of strings to be translated. The result is returned in the same format as the request, either as a single string or as a list of strings. The maximum accepted text length is 5000 characters. Note that translating one 5000-character text block on a CPU will use all available cores, consume up to 3 GB of RAM, and may take about 200 s (on a system with 24 vCPUs). See the service README.md for further details.
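The request shape described above (source language, target language, and a string or list of strings, capped at 5000 characters) might be assembled roughly as follows. This is only a sketch: the field names `src_language`, `tgt_language`, and `text` are hypothetical, as is the helper `build_translation_request`, and the limit is assumed to apply to each string separately. The service README.md is the authority on the actual request schema and endpoint.

```python
import json
from typing import List, Union

# Maximum accepted text length stated in the service description.
MAX_CHARS = 5000


def build_translation_request(
    src_language: str,
    tgt_language: str,
    text: Union[str, List[str]],
) -> str:
    """Build a JSON request body for a translation call.

    The field names are assumptions made for illustration; they are
    not taken from the service documentation.
    """
    # Normalise to a list only for validation; the payload keeps the
    # caller's original shape, since the service is described as
    # answering in the same format as the request.
    items = [text] if isinstance(text, str) else list(text)
    for item in items:
        if len(item) > MAX_CHARS:
            raise ValueError(
                f"text block has {len(item)} characters; limit is {MAX_CHARS}"
            )

    payload = {
        "src_language": src_language,  # hypothetical field name
        "tgt_language": tgt_language,  # hypothetical field name
        "text": text,                  # hypothetical field name
    }
    return json.dumps(payload, ensure_ascii=False)


# Example: a list of strings in, so a list of strings would be expected back.
body = build_translation_request("sl", "en", ["Dober dan.", "Hvala."])
```

Checking the length on the client side avoids sending a request that the service would reject, which matters given how long a single large block can take on CPU.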

MAFIA (Match-Finder Aligner): A speech/text aligning tool (22.06)

2 resources

MAFIA is the acronym for Match-Finder Aligner. The MAFIA aligner is a software tool designed to automatically create ASR corpora from speech files and scripts reflecting what is spoken in those files. If the text of the scripts is not completely accurate, MAFIA will infer a transcription using automatic speech recognition.

Public License Selector

3 resources

A customizable tool that helps users select the right open license for their data or software.

Slovenian commonsense reasoning model SloMET-ATOMIC 2020

2 resources

SloMET-ATOMIC 2020 is a Slovene commonsense reasoning model that predicts commonsense descriptions in natural language for a given input sentence. The model is an adaptation of the Slovene GPT-2 model (https://huggingface.co/cjvt/gpt-sl-base) that has been fine-tuned on the SloATOMIC 2020 corpus (http://hdl.handle.net/11356/1724), which consists of 1.33M everyday inference knowledge tuples about entities and events. The released model is a PyTorch neural network model, intended for use with the transformers library (https://github.com/huggingface/transformers).

CEC6-Converter

2 resources

This software converts *.cec6.gz files into 24 formats that are commonly used in corpus linguistics / NLProc. It runs on all modern operating systems (Windows, Linux, macOS). The binary files have been compiled for the x64 architecture. If you are using a processor (CPU) with x86 or ARM architecture, please use the instructions for "other operating systems or x86 / ARM / ARM64".

Slowal (2018-06-29)

2 resources

Slowal is a web tool designed for creating, editing and browsing valence dictionaries. So far, it has mainly been used for creating the Polish Valence Dictionary (Walenty). Slowal supports the process of creating the dictionary; it also facilitates access by making it possible to browse the dictionary using an advanced built-in filtering system, covering both syntactic and semantic phenomena. Slowal also gives control over the work of the lexicographers involved in creating the dictionary, for instance by using predefined lists of values, which prevents spelling errors and enforces consistency, as well as by imposing strict validation rules. Finally, the created dictionary can be exported from Slowal in various formats: plain text, TeX, PDF, and TEI XML. This version was adapted for creating the semantics of nouns and adjectives.

Punctuation model (20.09)

9 resources

A Python package that punctuates Icelandic text. The input is unpunctuated text, and punctuated text is returned. The user can choose between two punctuation models: a BERT-based Transformer and a bidirectional RNN ([Punctuator 2](www.github.com/ottokart/punctuator2)) in TensorFlow 2.

Yfirlestur Word 22.10

2 resources

Yfirlestur Word is the source code for a spelling and grammar correction add-on for Icelandic, for use with Microsoft Word. The plugin provides error annotation and replacement, based on user interaction. The source code is intended for third-party development and can be installed and tested locally using Node.js. The plugin requires third-party correction software for its functionality. For development and testing, the open-access Yfirlestur.is API produced by Miðeind was used (see https://github.com/icelandic-lt/Yfirlestur), but that API is not intended for production use. This software is licensed under the MIT License. More information at https://github.com/icelandic-lt/Yfirlestur-Word.

GreynirSeq - A Natural Language Processing Toolkit for Icelandic (v0.2.0)

2 resources

GreynirSeq is a natural language parsing toolkit for Icelandic focused on sequence modeling with neural networks. The modeling part (nicenlp) of GreynirSeq is built on top of Fairseq from Meta (which is itself built on top of PyTorch). Interfaces for POS tagging, NER tagging and machine translation are included in this version (v0.2.0). For updated versions of the software, please refer to https://github.com/mideind/GreynirSeq

WebRICE - An Open Source Web Reader (21.06)

2 resources

WebRICE (Web Reader ICE) is an open source web reader in development at Reykjavik University. We hope that Icelandic developers will add this free software to their websites to enable Icelandic audiences to listen to the web instead of reading it. For users, we also have the WebRICE browser extension.
