419 record(s) found

Search results

  • THEaiTRobot 1.0

    The THEaiTRobot 1.0 tool allows the user to interactively generate scripts for individual theatre play scenes. The tool is based on the GPT-2 XL generative language model, used without any fine-tuning, as we found that with a prompt formatted as part of a theatre play script, the model usually generates a continuation that retains the format. We encountered numerous problems when generating scripts this way; we managed to tackle some of them with various adjustments, but others remain to be solved in a future version. THEaiTRobot 1.0 was used to generate the first THEaiTRE play, "AI: Když robot píše hru" ("AI: When a robot writes a play").
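    The prompting trick described above can be sketched as follows; the exact prompt format and the dialogue are invented for illustration, not taken from the THEaiTRobot code.

```python
# Illustrative sketch only: the concrete prompt format used by THEaiTRobot is
# an assumption. The idea is that GPT-2, given a prompt that already looks
# like a play script, tends to continue in the same "CHARACTER: line" format
# without fine-tuning.

def make_script_prompt(scene):
    """Render (character, line) pairs as a play-script-style prompt."""
    return "\n".join(f"{character}: {line}" for character, line in scene) + "\n"

prompt = make_script_prompt([
    ("ROBOT", "I have been waiting here for a long time."),
    ("MAN", "Why did you not come home?"),
])
# `prompt` would then be passed to the generative model (model invocation not
# shown; the concrete setup is not described in this record).
```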
  • Tokenizer for Icelandic text (2.0.3)

    Tokenizer is a compact pure-Python (2 and 3) executable program and module for tokenizing Icelandic text. It converts input text to streams of tokens, where each token is a separate word, punctuation sign, number/amount, date, e-mail, URL/URI, etc. It also segments the token stream into sentences, considering corner cases such as abbreviations and dates in the middle of sentences.
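    As a rough illustration of the token-stream idea, here is a toy regex sketch; it is not the actual Tokenizer package, which recognizes many more token kinds and handles the sentence-segmentation corner cases mentioned above.

```python
import re

# Toy sketch of a token stream: words, numbers/amounts (keeping "3,5"
# together), and punctuation become separate tokens. The real Tokenizer
# package also classifies dates, e-mails, URLs, etc.
TOKEN_RE = re.compile(r"\d+[.,]?\d*|\w+|[^\w\s]")

def toy_tokenize(text):
    for match in TOKEN_RE.finditer(text):
        yield match.group(0)

print(list(toy_tokenize("Hún keypti 3,5 kg af eplum.")))
# → ['Hún', 'keypti', '3,5', 'kg', 'af', 'eplum', '.']
```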
  • Slavic Forest, Norwegian Wood (scripts)

    Tools and scripts used to create the cross-lingual parsing models submitted to the VarDial 2017 shared task (https://bitbucket.org/hy-crossNLP/vardial2017), as described in the linked paper. The trained UDPipe models themselves are published in a separate submission (https://lindat.mff.cuni.cz/repository/xmlui/handle/11234/1-1971).

    For each source (SS, e.g. sl) and target (TT, e.g. hr) language, you need to add the following into this directory:
    - treebanks (Universal Dependencies v1.4): SS-ud-train.conllu, TT-ud-predPoS-dev.conllu
    - parallel data (OpenSubtitles from Opus): OpenSubtitles2016.SS-TT.SS, OpenSubtitles2016.SS-TT.TT (if they are originally called ...TT-SS... instead of ...SS-TT..., you need to symlink, move, or copy them)
    - target tagging model: TT.tagger.udpipe
    All of these can be obtained from https://bitbucket.org/hy-crossNLP/vardial2017.

    You also need to have:
    - Bash
    - Perl 5
    - Python 3
    - word2vec (https://code.google.com/archive/p/word2vec/); we used rev 41 from 15th Sep 2014
    - udpipe (https://github.com/ufal/udpipe); we used commit 3e65d69 from 3rd Jan 2017
    - Treex (https://github.com/ufal/treex); we used commit d27ee8a from 21st Dec 2016

    The most basic setup is the sl-hr one (train_sl-hr.sh):
    - normalization of deprels
    - 1:1 word alignment of parallel data with the Monolingual Greedy Aligner
    - simple word-by-word translation of the source treebank
    - pre-training of target word embeddings
    - simplification of morphological features (use only Case)
    - and finally, training and evaluating the parser

    Both da+sv-no (train_ds-no.sh) and cs-sk (train_cs-sk.sh) add some cross-tagging, which seems to be useful only in specific cases (see the paper for details). Moreover, cs-sk also adds more morphological features, selecting those that seem to be very often shared in parallel data. The whole pipeline takes tens of hours to run and uses several GB of RAM, so make sure to use a powerful computer.
  • Q-CAT Corpus Annotation Tool 1.2

    The Q-CAT (Querying-Supported Corpus Annotation Tool) is a computational tool for manual annotation of language corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application, which runs on the Windows operating system. Version 1.1 enables the automatic attribution of token IDs and personalized font adjustments. Version 1.2 supports the CoNLL-U format and working with UD POS tags.
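    The CoNLL-U support mentioned for version 1.2 refers to the Universal Dependencies file format, whose token lines have ten tab-separated columns. A minimal parsing sketch (Q-CAT itself is a .NET application, so this Python snippet is purely illustrative, and the example token is invented):

```python
# The ten columns of a CoNLL-U token line, per the Universal Dependencies
# specification.
CONLLU_FIELDS = ("ID", "FORM", "LEMMA", "UPOS", "XPOS",
                 "FEATS", "HEAD", "DEPREL", "DEPS", "MISC")

def parse_token_line(line):
    """Map one tab-separated CoNLL-U token line to a field dict."""
    return dict(zip(CONLLU_FIELDS, line.rstrip("\n").split("\t")))

token = parse_token_line("1\tpes\tpes\tNOUN\t_\tCase=Nom\t0\troot\t_\t_")
print(token["UPOS"])  # → NOUN
```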
  • Q-CAT Corpus Annotation Tool 1.0

    The Q-CAT (Querying-Supported Corpus Annotation Tool) is a computational tool for manual annotation of language corpora, which also enables advanced queries on top of these annotations. The tool has been used in various annotation campaigns related to the ssj500k reference training corpus of Slovenian (http://hdl.handle.net/11356/1210), such as named entities, dependency syntax, semantic roles and multi-word expressions, but it can also be used for adding new annotation layers of various types to this or other language corpora. Q-CAT is a .NET application, which runs on the Windows operating system.
  • OCR Post-Processing Transformer Model 23.04

    During the project L11 - Error models for OCR of the Language Technology Programme 2019-2023, various OCR post-processing models were trained. This is the best-performing one. On texts from the 19th century to the early 20th century, it reduces the word error rate from 6.49% to 3.08% and the character error rate from 1.39% to 0.73%. On modern texts, it reduces the word error rate from 5.52% to 3.60% and the character error rate from 1.17% to 1.0%. More information, such as how to use the model for inference, can be found in the accompanying README. (The record's original Icelandic description states the same information.)
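    The figures above are word and character error rates: WER is the word-level Levenshtein edit distance divided by the number of reference words, and CER is the same computation over characters. A minimal sketch with invented example strings:

```python
# Word error rate: Levenshtein edit distance over word sequences, divided by
# the number of words in the reference. CER is identical but over characters.
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution / match
        prev = cur
    return prev[-1]

def wer(reference, hypothesis):
    ref = reference.split()
    return edit_distance(ref, hypothesis.split()) / len(ref)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 edits / 6 words
```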
  • Prep for Adventure: A game for the acquisition of English prepositions

    The presented game is designed to teach the six most frequent English prepositions (to, of, in, for, on, and with) at the A1 to A2 levels of proficiency. Prep for Adventure is a single-player game comprising five separate tasks – a jumping puzzle, cooking, a town maze, lighting the goblets, and banter with a classmate. Their mechanics are then combined in the final task (The Final Fight) to elicit the correct responses from the subject. The language used in the game is adjusted to the subjects’ level of proficiency; the game is fully voiced and offers a degree of customization. All tasks are based on the gap-filling type of exercise, where subjects have to complete a sentence with a missing word, either by typing it in or via different kinds of multiple-choice formats. The game is designed to advance the subjects’ performance with prepositional structures by exposing players to as much input as possible. The length of an average playthrough is approximately 30-45 minutes.

    The game was created in the RPG Maker MV engine, where RPG stands for role-playing game, a genre in which the player adopts the role of a fictional character in a (partly or fully) invented setting. The game story: The Grammar School of Witchcraft has been taken over by the Evil Preposition Magician, and the player is trying to win the school back alongside a young witch named Morphologina (the player’s guide).
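    The gap-filling mechanic described above can be sketched as a simple answer check; the helper and its logic are invented for illustration, not taken from the game.

```python
# The six prepositions the game teaches (from the description above); the
# checking logic itself is a hypothetical illustration of the gap-fill
# exercise, covering both typed and multiple-choice answers.
PREPOSITIONS = ("to", "of", "in", "for", "on", "with")

def check_gap_fill(answer, correct):
    """Accept an answer for a sentence with a missing preposition."""
    answer = answer.strip().lower()
    return answer in PREPOSITIONS and answer == correct

print(check_gap_fill("for", "for"))  # True
print(check_gap_fill("at", "for"))   # False ("at" is not among the six)
```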