CLARIN Tool Portal

Rule-based g2p for Icelandic

2 resources

Manually developed grapheme-to-phoneme (g2p) transcription rules for Icelandic, written in Thrax grammar syntax. The rules cover standard Icelandic pronunciation as well as the northern (harðmæli), north-eastern (raddaður framburður) and southern (hv-framburður) pronunciation variants. The package also contains a command-line tool written in C++.

Corpus extraction tool LIST 1.2

2 resources

The LIST corpus extraction tool is a Java program for extracting lists from text corpora at the levels of characters, word parts, words, and word sets. It supports the VERT and TEI P5 XML formats and outputs CSV files that can be imported into Microsoft Excel or similar statistical processing software. Version 1.2 adds support for Gigafida 2.0 in XML format and fixes a bug that prevented the extraction of character-level n-grams from normalized forms in the GOS 1.0 corpus.
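To illustrate what a character-level n-gram list exported as CSV might look like, here is a minimal Python sketch. It is not the LIST implementation (LIST is a Java program, and its corpus readers for VERT and TEI P5 XML are not reproduced here); the function names `char_ngrams` and `to_csv`, the column headers and the sample words are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def char_ngrams(words, n):
    """Count character n-grams inside each word (n-grams never span two words)."""
    counts = Counter()
    for word in words:
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def to_csv(counts):
    """Serialize counts as CSV text, most frequent first, ties in alphabetical order."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ngram", "frequency"])  # hypothetical column names
    for ngram, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([ngram, freq])
    return buffer.getvalue()


# Toy input standing in for word forms read from a corpus.
sample_counts = char_ngrams(["banana", "bandana"], 2)
print(to_csv(sample_counts))
```

The resulting text can be written to a file and opened in a spreadsheet program, which mirrors the workflow the description mentions.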

GreynirPackage 3.1.0

3 resources

GreynirPackage is a Python 3 package for working with Icelandic natural language text. Greynir can parse text into sentence trees, find lemmas, inflect noun phrases, assign part-of-speech tags and much more. Greynir's sentence trees can be used, among other things, to extract information from text, for instance about people, titles, entities, facts, actions and opinions. Greynir uses the Tokenizer package, by the same authors, to tokenize text. More information is available at https://github.com/mideind/GreynirPackage and detailed documentation at https://greynir.is/doc/.

IceParser 1.5.0

2 resources

IceParser is a shallow parser for Icelandic. The parser comprises a sequence of finite-state transducers, which incrementally add syntactic information to the input text. The input to IceParser is part-of-speech (PoS) tagged text, and its output includes annotation of both constituent structure and syntactic functions. The distributed file contains the entirety of IceNLP, a toolkit of various NLP tools for processing and analysing Icelandic. The current version of IceParser in IceNLP has been updated to annotate input tagged with the revised Icelandic PoS tagset.

ALEXIA: Lexicon Acquisition Tool for Icelandic (Orðtökutólið Alexía) 3.0 (21.08)

2 resources

ALEXIA is a command-line corpus tool for comparing a given vocabulary to that of a larger corpus or corpora. To maintain lexicons, dictionaries and terminologies, it is necessary to go systematically through large amounts of text considered representative of the language or category in question in order to find potential gaps in the data. ALEXIA provides an easy way to generate such candidate lists. To run ALEXIA, the user must run main.py. This script offers two language options, Icelandic and English. It guides the user through a series of options, including the necessary set-up of SQL databases. After the set-up is completed, the user is offered the option of continuing to the actual program. The user is greeted with a welcome message and asked whether to create the default databases for the demo version of the program or to provide their own lexicon files. If the default set-up is chosen, the user must indicate whether to use the Database of Icelandic Morphology (DIM) or A Dictionary of Contemporary Icelandic (DCI), whose vocabulary is then compared to that of the Icelandic Gigaword Corpus (IGC). A number of filters are applied to reduce noise in the results. The linked video includes a detailed description of the tool's use.
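The idea of generating a candidate list of words missing from a lexicon can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ALEXIA's own code: ALEXIA works with SQL databases and its own set of filters, whereas the hypothetical `candidate_list` function below uses in-memory collections, a simple alphabetic filter and a minimum-frequency threshold (`min_freq`) invented for this example.

```python
from collections import Counter


def candidate_list(lexicon, corpus_tokens, min_freq=2):
    """Return (word, frequency) pairs for corpus words missing from the lexicon."""
    known = {entry.lower() for entry in lexicon}
    counts = Counter(
        token.lower()
        for token in corpus_tokens
        if token.isalpha()  # simple filter: drop numbers and punctuation
    )
    missing = [
        (word, freq)
        for word, freq in counts.items()
        if word not in known and freq >= min_freq  # frequency filter
    ]
    # Most frequent candidates first, ties in alphabetical order.
    return sorted(missing, key=lambda pair: (-pair[1], pair[0]))


# Toy data standing in for a lexicon and a tokenized corpus.
lexicon = ["hestur", "hundur"]
corpus = "hestur og hundur og köttur sáu kött , köttur 2021".split()
print(candidate_list(lexicon, corpus))
```

In this toy run the function word "og" surfaces as a candidate because the toy lexicon lacks it, which hints at why a real tool needs additional filters such as stop-word lists or lemmatization.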

Models for automatic g2p for Icelandic (20.10)

2 resources

Grapheme-to-phoneme (g2p) models for Icelandic, trained with an encoder-decoder LSTM neural network. The models are delivered with scripts for automatic transcription of Icelandic in the standard pronunciation and in the northern, north-eastern and southern pronunciation variants. To run the scripts, the user needs to install Fairseq (see the Readme in the project repository).

XLM-RoBERTa events recognition

3 resources

Event recognition models for the Polish language, based on the XLM-RoBERTa language model.

MOSI: TTS evaluation tool (22.01)

2 resources

MOSI is a text-to-speech (TTS) evaluation platform focused on listening tests. Organizers can upload audio clips to be evaluated using mean opinion score (MOS), AB or ABX setups. The platform allows organizers to arrange and plan evaluations, customize the setup, send invitation links to participants, and view and download the results. A detailed setup description can be found in README.md and a user guide in HOW_TO_USE.md.
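As a rough illustration of how results from a MOS listening test might be summarized after download, the following Python sketch averages listener ratings per system. It is not part of MOSI; the function name `mean_opinion_scores`, the 1 to 5 rating scale and the sample data are assumptions, and the format of MOSI's actual result files is not reproduced here.

```python
from statistics import mean, stdev


def mean_opinion_scores(ratings):
    """Map each system to (MOS, sample standard deviation, number of ratings)."""
    summary = {}
    for system, scores in ratings.items():
        # Assumed scale: 1 (bad) to 5 (excellent).
        if any(not 1 <= s <= 5 for s in scores):
            raise ValueError(f"ratings for {system!r} must lie in 1..5")
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[system] = (mean(scores), spread, len(scores))
    return summary


# Hypothetical ratings from four listeners for two TTS voices.
ratings = {"voice_a": [4, 5, 4, 3], "voice_b": [2, 3, 3, 2]}
for system, (mos, spread, n) in mean_opinion_scores(ratings).items():
    print(f"{system}: MOS={mos:.2f} (sd={spread:.2f}, n={n})")
```

Reporting the spread and the number of ratings alongside the mean makes it easier to judge whether a difference between two systems is meaningful.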

EWBST tests for english

4 resources

This submission contains tests generated for the EWBST test of English word embedding models. The tests were created with Princeton WordNet and plWN English synsets.

GreynirCorrect (1.0.2)

3 resources

GreynirCorrect is a Python 3 package and command-line tool for checking and correcting various types of spelling and grammar errors in Icelandic text. GreynirCorrect relies on the Tokenizer package, by the same authors, to tokenize text. More information can be found at https://github.com/mideind/GreynirCorrect, and detailed documentation at https://yfirlestur.is/doc/.
