CLARIN Tool Portal

Tokenizer for Icelandic text (2.3.1)

3 resources

Tokenizer is a compact pure-Python (2.7 and 3) executable program and module for tokenizing Icelandic text. It converts input text into a stream of tokens, where each token is a separate word, punctuation mark, number or amount, date, e-mail address, URL/URI, etc. It also segments the token stream into sentences, handling corner cases such as abbreviations and dates in the middle of sentences. More information at: https://github.com/mideind/Tokenizer
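The idea of typed tokens plus sentence segmentation can be illustrated with a small standard-library sketch. This is not the Tokenizer package or its API: the regular expressions, the abbreviation list, and the function names `tokenize` and `split_sentences` are assumptions made up for this illustration, and the real package handles far more cases.

```python
import re

# Hypothetical abbreviation list for this sketch only.
ABBREVIATIONS = ["o.s.frv.", "t.d.", "þ.e."]

# Alternatives are tried in order, so more specific token kinds come first.
TOKEN_RE = re.compile(
    "|".join([
        r"(?P<abbrev>" + "|".join(re.escape(a) for a in ABBREVIATIONS) + r")",
        r"(?P<url>https?://[^\s]*[^\s.,;:!?])",
        r"(?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)",
        r"(?P<date>\d{1,2}\.\d{1,2}\.\d{4})",
        r"(?P<number>\d+(?:[.,]\d+)*)",
        r"(?P<word>[^\W\d_]+)",
        r"(?P<punct>[^\w\s])",
    ])
)


def tokenize(text):
    """Return a list of (kind, text) pairs; whitespace is skipped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]


def split_sentences(tokens):
    """Group tokens into sentences, ending one at each '.', '!' or '?' token.

    Dots inside dates, numbers and abbreviations never end a sentence,
    because those dots are part of a single non-punctuation token.
    """
    sentences, current = [], []
    for kind, text in tokens:
        current.append((kind, text))
        if kind == "punct" and text in ".!?":
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    sample = "Hún kom 17.6.2024 t.d. með 3,5 kg. Sendu póst á jon@example.is!"
    for sentence in split_sentences(tokenize(sample)):
        print(sentence)
```

In this sketch the date `17.6.2024` and the abbreviation `t.d.` stay inside the first sentence because their dots never appear as standalone punctuation tokens.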

Yfirlestur 1.0.1 (22.10)

2 resources

Yfirlestur.is is a public website where you can enter or submit Icelandic text and have it checked for spelling and grammar errors. The tool also gives hints on words and structures that might not be appropriate, depending on the intended audience for the text. The core spelling and grammar checking functionality of Yfirlestur.is is provided by the GreynirCorrect engine, by the same authors. This software is licensed under the MIT License. More information at https://github.com/icelandic-lt/Yfirlestur.

EduPo: Analysis and Generation of Czech Poetry, v0.5

2 resources

A suite of tools for the analysis and generation of Czech poetry. This is a snapshot of the public GitHub repository at https://github.com/ufal/edupo: the beta version of the tool suite, released together with a scientific paper at the NLP4DH 2025 conference.

UDConverter 22.01

2 resources

UDConverter is a tool for converting constituency treebanks in the format of PPCHE (Penn Parsed Corpora of Historical English) to dependency treebanks following the Universal Dependencies framework. The tool is specifically configured to convert treebanks in the IcePaHC format. This version achieves a labeled attachment score (LAS) of 81.39.
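For readers unfamiliar with the metric, the labeled attachment score is the percentage of tokens whose predicted head and dependency label both match the gold annotation. A minimal sketch follows; the function name and the toy data are illustrative assumptions and are not taken from UDConverter or its evaluation setup.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as percentages.

    Each argument is a list with one (head_index, deprel) pair per token.
    UAS counts tokens with the correct head; LAS additionally requires
    the correct dependency label.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    head_hits = sum(1 for g, p in zip(gold, predicted) if g[0] == p[0])
    full_hits = sum(1 for g, p in zip(gold, predicted) if g == p)
    total = len(gold)
    return 100.0 * head_hits / total, 100.0 * full_hits / total


if __name__ == "__main__":
    # Toy four-token sentence: the third token gets the right head
    # but the wrong label in the prediction.
    gold = [(2, "nsubj"), (0, "root"), (2, "obj"), (2, "punct")]
    pred = [(2, "nsubj"), (0, "root"), (2, "obl"), (2, "punct")]
    uas, las = attachment_scores(gold, pred)
    print(uas, las)  # 100.0 75.0
```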

KAMOKO-Digitalizer

2 resources

This editor was developed especially for the needs of the KAMOKO project (https://lindat.mff.cuni.cz/repository/xmlui/handle/11372/LRT-3261). The editor allows the quick entry of example sentences and sentence variants as well as the corresponding speaker ratings.

Generator of Czech lyrics according to structure

3 resources

Fine-tuned versions of the Czech TinyLlama model (https://huggingface.co/BUT-FIT/CSTinyLlama-1.2B) and the Czech GPT2 small model (https://huggingface.co/lchaloupsky/czech-gpt2-oscar) that generate lyrics for song sections from the provided syllable counts, keywords and rhyme scheme. The TinyLlama-based model yields better results; the GPT2-based model, however, can run locally. Both models are discussed in the bachelor thesis "Generation of Czech Lyrics to Cover Songs".

EFCL Channelizer

3 resources

Extremely fast digital audio channelizer implementation, usable as a building block for experimental ASR front-ends or signal denoising applications. Its high throughput also makes it applicable in software-defined radios. It comes in the form of a C/C++ library and an example executable that reads an input stream, splits it into equidistant frequency channels, and emits their data to the output. Features:
(1) Hand-tuned SIMD-aware assembly for x86 (SSE) and x86-64 (AVX) as well as for ARM (NEON) processors.
(2) Generic non-SIMD C++ implementation for other architectures.
(3) Capable of taking advantage of multicore CPUs.
(4) Fully configurable number of channels and output decimation rate.
(5) User-supplied FIR for the channel separation filter, which lets the user specify the width of the channels and whether they overlap or are separated.
(6) Input and output signal samples are treated as complex numbers.
(7) Speed of over 750 complex MS/s achieved on a Core i7 4710HQ @ 2.5 GHz when channelizing into 72 output channels with a FIR length of 1152 samples, using 3 computing threads.
(8) Runs under Linux.
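What "splitting into equidistant frequency channels" means can be shown with a deliberately slow reference sketch: each channel is mixed down to baseband, filtered with the supplied FIR, and decimated. This says nothing about how EFCL is implemented internally; the function name `channelize`, its parameters, and the toy moving-average filter are assumptions chosen for illustration only.

```python
import cmath


def channelize(samples, num_channels, taps, decimation):
    """Split complex samples into num_channels equidistant channels.

    For channel k the input is shifted down by k / num_channels cycles per
    sample, filtered with the FIR taps, and every decimation-th output is
    kept. The filter is applied as a sliding dot product, which equals
    convolution for symmetric taps such as the ones used below.
    """
    outputs = []
    for k in range(num_channels):
        mixed = [
            s * cmath.exp(-2j * cmath.pi * k * n / num_channels)
            for n, s in enumerate(samples)
        ]
        channel = []
        for start in range(0, len(mixed) - len(taps) + 1, decimation):
            window = mixed[start:start + len(taps)]
            channel.append(sum(t * w for t, w in zip(taps, window)))
        outputs.append(channel)
    return outputs


def channel_energies(channels):
    """Total energy (sum of squared magnitudes) per channel."""
    return [sum(abs(v) ** 2 for v in ch) for ch in channels]


if __name__ == "__main__":
    num_channels = 4
    # Complex tone sitting exactly on the centre frequency of channel 2.
    tone = [cmath.exp(2j * cmath.pi * 2 * n / num_channels) for n in range(64)]
    taps = [1.0 / 8] * 8  # toy low-pass filter: 8-sample moving average
    energies = channel_energies(channelize(tone, num_channels, taps, 4))
    print(energies.index(max(energies)))  # index of the strongest channel
```

With this toy input nearly all energy ends up in channel 2, since the moving average cancels the tone in every other channel after mixing.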

Talrómur Utils

13 resources

This is a collection of utilities for text-to-speech (TTS) development using the Talrómur corpus. The package contains everything needed to develop and run speech synthesizers built with Talrómur. The collection includes:
- Alignments for all the voices in Talrómur, created with the Montreal Forced Aligner
- Train, evaluation and test splits for all the voices in Talrómur
- Two baseline TTS models and vocoder models

Service for querying dependency treebanks Drevesnik 1.1

2 resources

Drevesnik (https://orodja.cjvt.si/drevesnik/) is an online service for querying Slovenian corpora parsed with the Universal Dependencies annotation scheme. It combines an easy-to-use query language with user-friendly graph visualizations. It is based on the open-source dep_search tool (https://github.com/TurkuNLP/dep_search), which was localized and modified to also support querying by JOS morphosyntactic tags, random distribution of results, and filtering by sentence length. The source code and the documentation for the search backend and the web user interface are publicly available in the CLARIN.SI GitHub repository https://github.com/clarinsi/drevesnik. This submission corresponds to release 1.1: https://github.com/clarinsi/drevesnik/releases/tag/1.1, which brings improved architecture, documentation and branding in comparison to release 1.0.

Dependency tree extraction tool STARK 2.0

2 resources

STARK is a Python-based command-line tool for extracting dependency trees from parsed corpora, aimed at corpus-driven linguistic investigations of syntactic and lexical phenomena of various kinds. It takes a treebank in the CoNLL-U format as input and returns a list of all relevant dependency trees with frequency information and other useful statistics, such as the strength of association between the nodes of a tree, or its significance in comparison to another treebank. For installation, execution and the description of various user-defined parameter settings, see the official project page at: https://github.com/clarinsi/STARK. In comparison with v1, this version introduces several new features and improvements, such as the option to set parameters in the command line, compare treebanks, or visualise results online.
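The simplest case of this kind of extraction, counting two-node trees (a head and one dependent) in a CoNLL-U treebank, can be sketched with the standard library alone. This is not STARK's code, query language, or output format: the function names, the `HEAD >deprel DEPENDENT` string notation, and the sample sentences are assumptions for illustration.

```python
from collections import Counter


def read_conllu(text):
    """Yield sentences as lists of token dicts from CoNLL-U text.

    Comment lines, multiword-token ranges (IDs like 1-2) and empty nodes
    (IDs like 1.1) are skipped; only basic dependencies are read.
    """
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue
        sentence.append({
            "id": int(cols[0]),
            "upos": cols[3],
            "head": int(cols[6]),
            "deprel": cols[7],
        })
    if sentence:
        yield sentence


def count_head_dependent_pairs(text):
    """Count two-node trees, written as 'HEAD_UPOS >deprel DEP_UPOS'."""
    counts = Counter()
    for sentence in read_conllu(text):
        upos_by_id = {tok["id"]: tok["upos"] for tok in sentence}
        for tok in sentence:
            if tok["head"] == 0:  # the root has no head token
                continue
            head_upos = upos_by_id[tok["head"]]
            counts[f"{head_upos} >{tok['deprel']} {tok['upos']}"] += 1
    return counts


# Toy treebank; columns are written with spaces here and converted to tabs.
SAMPLE = "\n".join("\t".join(row.split()) for row in """
1 The the DET _ _ 2 det _ _
2 dog dog NOUN _ _ 3 nsubj _ _
3 barks bark VERB _ _ 0 root _ _
4 . . PUNCT _ _ 3 punct _ _

1 A a DET _ _ 2 det _ _
2 cat cat NOUN _ _ 3 nsubj _ _
3 sleeps sleep VERB _ _ 0 root _ _
""".splitlines())

if __name__ == "__main__":
    for tree, freq in count_head_dependent_pairs(SAMPLE).most_common():
        print(freq, tree)
```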
