419 record(s) found

Search results

  • Icelandic TTS for Android (24.04.)

    The Símarómur application provides Icelandic TTS for the Android TTS service, with access to one on-device voice. The app is developed with the needs of the visually impaired in mind, i.e. the voice is lightweight and very fast. Símarómur also includes a user dictionary that allows users to define their own pronunciation of words and abbreviations.
  • Morfeusz 2

    Morfeusz 2 is a dictionary-based morphological analyser and generator for Polish. This version of the program is decoupled from the dictionary; two dictionaries of Polish developed within other projects, SGJP and Polimorf, are distributed with Morfeusz 2.
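The decoupling of engine and dictionary can be illustrated with a toy sketch (hypothetical code, not the actual Morfeusz 2 API): the analyser only looks surface forms up in whatever dictionary data it is given, so SGJP, Polimorf, or any other dictionary can be swapped in.

```python
# Toy illustration of dictionary-based morphological analysis in the
# spirit of Morfeusz 2 (NOT its real API): the analyser is decoupled
# from the dictionary, which is supplied as plain data.

def analyse(token, dictionary):
    """Return (lemma, tag) readings for a surface form; forms may be ambiguous."""
    # Unknown words get the conventional 'ign' (ignotum) reading.
    return dictionary.get(token, [(token, "ign")])

# A miniature stand-in for a real dictionary such as SGJP or Polimorf.
TOY_DICT = {
    "mam": [("mama", "subst:pl:gen:f"), ("mieć", "fin:sg:pri:imperf")],
    "kota": [("kot", "subst:sg:gen:m2")],
}

readings = analyse("mam", TOY_DICT)  # two readings: ambiguous form
```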
  • Dependency tree extraction tool STARK 1.0

    STARK is a Python-based command-line tool for extracting dependency trees from parsed corpora, aimed at corpus-driven linguistic investigation of syntactic phenomena of various kinds. It accepts the CoNLL-U format (https://universaldependencies.org/format.html) as input and returns a list of all relevant dependency trees, their frequencies, and other associated information as a tab-separated .tsv file. For installation, execution and the description of the various user-defined parameter settings, see the official project page: https://gitea.cjvt.si/lkrsnik/STARK. This entry corresponds to commit 421f12cac6 in the Git repository.
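The CoNLL-U word lines that a tool like STARK consumes are plain tab-separated columns, with the syntactic head in column 7 and the dependency relation in column 8. A minimal reader (a sketch for illustration, not part of STARK itself):

```python
# Minimal reader for CoNLL-U sentences (https://universaldependencies.org/format.html).
# Extracts (id, form, head, deprel) per word line; a sketch, not STARK's own code.

def read_conllu_sentence(lines):
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comment lines and sentence boundaries
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        rows.append((int(cols[0]), cols[1], int(cols[6]), cols[7]))
    return rows

sent = [
    "# text = Anna runs",
    "1\tAnna\tAnna\tPROPN\t_\t_\t2\tnsubj\t_\t_",
    "2\truns\trun\tVERB\t_\t_\t0\troot\t_\t_",
]
rows = read_conllu_sentence(sent)
```

From these (id, head, deprel) triples the dependency tree of each sentence can be reconstructed and matched against query patterns.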
  • ABLTagger (PoS) - 2.0.0

    A Part-of-Speech (PoS) tagger for Icelandic. This submission contains ABLTagger v2.0.0, a PoS tagger that works with the revised tagset and achieves an accuracy of 96.95% on MIM-Gold (cross-validation). For additional details, error analysis and categorization of this tagger and other taggers (including a previous version of ABLTagger), see the I4 report for M4 (2021) in the Language Technology Programme for Icelandic 2019-2023. For installation, usage, and other instructions see https://github.com/cadia-lvl/POS/releases/tag/m4 and check there whether a newer version is out (see README.md, versions). On CLARIN: model files and a Docker image, version 2.0.0.
  • THEaiTRobot 2.0

    The THEaiTRobot 2.0 tool lets the user interactively generate scripts for individual theatre play scenes. The previous version of the tool (http://hdl.handle.net/11234/1-3507) was based on the GPT-2 XL generative language model, used without any fine-tuning, as we found that with a prompt formatted as part of a theatre play script, the model usually generates a continuation that retains the format. The current version also uses vanilla GPT-2 by default, but can instead use a GPT-2 medium model fine-tuned on theatre play scripts (as well as film and TV series scripts). Apart from the basic "flat" generation, which uses a theatrical starting prompt and the script model, the tool also features a second, hierarchical variant: in the first step, a play synopsis is generated from its title using a synopsis model (GPT-2 medium fine-tuned on synopses of theatre plays, as well as film, TV series and book synopses); the synopsis is then used as input for the second stage, which uses the script model. The choice of models is made by setting the MODEL variable in start_server.sh and start_syn_server.sh. THEaiTRobot 2.0 was used to generate the second THEaiTRE play, "Permeation/Prostoupení".
  • Icelandic TTS for Android (22.10)

    The Símarómur application provides Icelandic TTS for the Android TTS service, e.g. for use with a screen reader. It provides access to network voices via the Tiro TTS API and to on-device voices bundled as assets, covering most of the voices developed within the Icelandic Language Technology Programme at this time. The voices were trained at Reykjavik University, and the TTS web service was developed by Tiro ehf. (see e.g. http://hdl.handle.net/20.500.12537/268).
  • Slovene Text Denormalizator RSDO-DS2-DENORM 1.0

    This text denormaliser converts Slovene spoken-form text into written-form text. It is typically used as a post-processing step in automatic speech recognition, which traditionally outputs spoken-form text. As input it accepts text as a string, a list of tokens, or a list of dictionaries with a mandatory "text" field; the output is a dictionary. Example of use:

    denormalize("Danes, osmega sedmega dva tisoč dvaindvajset, je lep sončen dan, saj je zunaj prijetnih petindvajset stopinj Celzija.")

    {'denormalized_content': [
        {'text': 'Danes', 'index': [0]}, {'text': ',', 'index': [1]},
        {'text': '8.', 'index': [2]}, {'text': '7.', 'index': [3]},
        {'text': '2022', 'index': [4, 5, 6]}, {'text': ',', 'index': [7]},
        {'text': 'je', 'index': [8]}, {'text': 'lep', 'index': [9]},
        {'text': 'sončen', 'index': [10]}, {'text': 'dan', 'index': [11]},
        {'text': ',', 'index': [12]}, {'text': 'saj', 'index': [13]},
        {'text': 'je', 'index': [14]}, {'text': 'zunaj', 'index': [15]},
        {'text': 'prijetnih', 'index': [16]}, {'text': '25', 'index': [17]},
        {'text': '°C', 'index': [18, 19]}, {'text': '.', 'index': [20]}],
     'denormalized_string': 'Danes, 8. 7. 2022, je lep sončen dan, saj je zunaj prijetnih 25 °C.'}
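The behaviour shown in the example can be sketched with a toy rule-based denormalizer (hypothetical code, not the actual RSDO-DS2-DENORM implementation). Note how each written-form token records the indices of the spoken-form tokens it replaces, mirroring the 'index' field above:

```python
# Toy spoken-form -> written-form denormalizer; a sketch only.
# RULES maps tuples of spoken-form tokens to their written form.
RULES = {
    ("petindvajset",): "25",            # "twenty-five" -> 25
    ("stopinj", "celzija"): "°C",       # "degrees Celsius" -> °C
    ("osmega",): "8.",                  # ordinal "eighth" -> 8.
}

def denormalize_tokens(tokens):
    out, i = [], 0
    while i < len(tokens):
        for pattern, written in RULES.items():
            window = tuple(t.lower() for t in tokens[i:i + len(pattern)])
            if window == pattern:
                # one written token covering several spoken tokens
                out.append({"text": written, "index": list(range(i, i + len(pattern)))})
                i += len(pattern)
                break
        else:
            out.append({"text": tokens[i], "index": [i]})  # pass through unchanged
            i += 1
    return out

result = denormalize_tokens(["zunaj", "petindvajset", "stopinj", "Celzija"])
```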
  • MSTperl parser (2015-05-19)

    MSTperl is a Perl reimplementation of the MST parser of Ryan McDonald (http://www.seas.upenn.edu/~strctlrn/MSTParser/MSTParser.html). The MST parser (Maximum Spanning Tree parser) is a state-of-the-art natural language dependency parser, a tool that takes a sentence and returns its dependency tree. Only some of its functionality is implemented in MSTperl; the limitations include the following: the parser is non-projective, currently with no possibility of enforcing projectivity of the parse trees; only first-order features are supported, i.e. no second-order or third-order features are possible; and the implementation of MIRA is a single-best MIRA with a closed-form update instead of quadratic programming. On the other hand, the parser supports several advanced features: parallel features, i.e. enriching the parser input with a word-aligned sentence in another language; large-scale information, i.e. a feature set enriched with features corresponding to the pointwise mutual information of word pairs in a large corpus (CzEng); weighted/unweighted parser model interpolation; combination of several instances of the MSTperl parser (through the MST algorithm); and combination of several existing parses from any parsers (through the MST algorithm). The MSTperl parser is tuned for parsing Czech. Trained models are available for Czech, English and German. We can train the parser for other languages on demand, or you can train it yourself; the guidelines are part of the documentation. The parser, together with detailed documentation, is available on CPAN (http://search.cpan.org/~rur/Treex-Parser-MSTperl/).
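The core idea behind an MST parser can be sketched by brute force over a tiny sentence (the real MSTperl uses efficient MST algorithms and MIRA-learned edge weights; the scores below are made up for illustration). Every word picks a head (0 is an artificial root); among all assignments that form a tree rooted at 0, the one with the highest total edge score wins.

```python
# Brute-force maximum spanning arborescence over head assignments.
# Illustrates the MST parsing objective only; not MSTperl's algorithm.
from itertools import product

def _reaches_root(d, heads):
    """Follow head links from word d; True iff we reach root 0 without a cycle."""
    seen = set()
    while d != 0:
        if d in seen:
            return False
        seen.add(d)
        d = heads[d - 1]
    return True

def best_tree(n, score):
    """n words (1..n); score(head, dep) -> float. Returns the head of each word."""
    best, best_heads = float("-inf"), None
    for heads in product(range(0, n + 1), repeat=n):
        if any(heads[d - 1] == d for d in range(1, n + 1)):
            continue  # no self-loops
        if not all(_reaches_root(d, heads) for d in range(1, n + 1)):
            continue  # every word must reach the root: reject cycles
        total = sum(score(heads[d - 1], d) for d in range(1, n + 1))
        if total > best:
            best, best_heads = total, list(heads)
    return best_heads

# Hypothetical first-order edge scores for "John saw Mary" (1=John, 2=saw, 3=Mary).
SCORES = {(0, 2): 10, (2, 1): 9, (2, 3): 9, (0, 1): 2, (0, 3): 2,
          (1, 2): 1, (3, 2): 1, (1, 3): 1, (3, 1): 1}
heads = best_tree(3, lambda h, d: SCORES.get((h, d), 0))
```

With these scores the best tree attaches "John" and "Mary" to "saw", which attaches to the root. Enumerating all head assignments is exponential; a real parser computes the same optimum with the Chu-Liu/Edmonds algorithm.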
  • Slovene Conformer CTC BPE E2E Automated Speech Recognition model RSDO-DS2-ASR-E2E 2.0

    This Conformer CTC BPE E2E Automated Speech Recognition model was trained following the NVIDIA NeMo Conformer-CTC recipe (for details see the official NVIDIA NeMo ASR documentation, https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/asr/intro.html, and the NVIDIA NeMo GitHub repository, https://github.com/NVIDIA/NeMo). It transcribes Slovene speech to text. The training, development and test datasets were based on the Artur dataset and consisted of 630.38, 16.48 and 15.12 hours of transcribed speech in standardised form, respectively. The model was trained for 200 epochs and reached a WER of 0.0429 on the development set and 0.0558 on the test set.
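The WER figures reported above follow the standard definition: word-level Levenshtein distance (substitutions, insertions, deletions) divided by the number of reference words. A self-contained sketch of that computation (not the NeMo evaluation code itself):

```python
# Word error rate: edit distance over words / reference length.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)

# one substitution out of four reference words -> WER 0.25
rate = wer("danes je lep dan", "danes je len dan")
```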