Active filters:

  • Resource type: Unspecified
419 record(s) found

Search results

  • The CLASSLA-StanfordNLP model for lemmatisation of non-standard Croatian 1.0

    The model for lemmatisation of non-standard Croatian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the hr500k training corpus (http://hdl.handle.net/11356/1210), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/) and the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), using the hrLex inflectional lexicon (http://hdl.handle.net/11356/1232). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~97.54.
  • The CLASSLA-StanfordNLP model for lemmatisation of non-standard Serbian 1.0

    The model for lemmatisation of non-standard Serbian was built with the CLASSLA-StanfordNLP tool (https://github.com/clarinsi/classla-stanfordnlp) by training on the SETimes.SR training corpus (http://hdl.handle.net/11356/1200), the ReLDI-NormTagNER-sr corpus (http://hdl.handle.net/11356/1240), the ReLDI-NormTagNER-hr corpus (http://hdl.handle.net/11356/1241), the hr500k training corpus (http://hdl.handle.net/11356/1210) and the RAPUT corpus (https://www.aclweb.org/anthology/L16-1513/), using the srLex inflectional lexicon (http://hdl.handle.net/11356/1233). These corpora were additionally augmented for handling missing diacritics by repeating parts of the corpora with diacritics removed. The estimated F1 of the lemma annotations is ~97.62.
  • Voice control and question answering (22.10)

    The goal of this work package was to develop Kaldi recipes for voice control and question answering systems for Icelandic. We defined six tasks and either generated or gathered data for each, normalized the data, and trained Kaldi language models. Included in this submission are six ASR language models, an acoustic model, the training data for the language models, and all the code used to generate the data and create the models. For further information, see the file README.md.
  • Kaldi Recipe for Faroese

    The "Kaldi Recipe for Faroese" is a code recipe intended to show how to use the corpus "Ravnursson Faroese Speech and Transcripts" [1] to create automatic speech recognition systems using the Kaldi toolkit [2].

    [1] Hernández Mena, Carlos Daniel; Simonsen, Annika. "Ravnursson Faroese Speech and Transcripts". Available at: http://hdl.handle.net/20.500.12537/276
    [2] Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., ... & Vesely, K. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society.
  • CUBBITT Translation Models (en-cs) (v1.0)

    CUBBITT En-Cs translation models, exported via TensorFlow Serving, available in the Lindat translation service (https://lindat.mff.cuni.cz/services/translation/). Models are compatible with Tensor2tensor version 1.6.6. For details about the model training (data, model hyper-parameters), please contact the archive maintainer. Evaluation on newstest2014 (BLEU), measured with multeval (https://github.com/jhclark/multeval): en->cs 27.6; cs->en 34.4.