Grants

Provider: Digital Europe Programme
Duration Provider Grant ID PI Area
OpenEuroLLM: Open European Family of Large Language Models 36 months Digital Europe Programme 101195233 Jan Hajič
Provider: HE
Duration Provider Grant ID PI Area
EVERSE: European Virtual Institute for Research Software Excellence 2024-2027 HE 101129744 Pavel Straňák
ATRIUM: Advancing FronTier Research In the Arts and hUManities 2024 - 2027 HE 101132163 Pavel Straňák
InCroMin: Interactive Crosslingual Minutes 2024 HE 101070631 Ondřej Bojar
RES-Q Plus: Comprehensive solutions of healthcare improvement based on the global Registry of Stroke Care Quality 2022-2026 HE 101057603 Pavel Pecina
MEMORISE: Virtualisation and Multimodal Exploration of Heritage on Nazi Persecution 2022-2026 HE 101061016 Pavel Pecina
HPLT: High Performance Language Technologies 2022-2025 HE 101070350 Jan Hajič Corpora, Data, Machine Learning, Machine Translation, Monolingual, Multilingual
Provider: Social Sciences and Humanities Research Council of Canada
Duration Provider Grant ID PI Area
DACT: Digital Analysis of Chant Transmission 2023-2029 Social Sciences and Humanities Research Council of Canada 895-2023-1002 Jan Hajič jr. Corpora, Data, Information Retrieval, Linked data, Machine Learning, Multi-modality, Tools
Provider: ETF UK
Duration Provider Grant ID PI Area
The Anthropology of Artificial Intelligence: Ethics, Understanding, Human Nature 2023-2024 ETF UK 247002 Rudolf Rosa Tools
Provider: Horizon Europe, ERC
Duration Provider Grant ID PI Area
NG-NLG: Next-Generation Natural Language Generation 2022-2027 Horizon Europe, ERC 101039303 Ondřej Dušek Dialog, Linked data, Machine Learning, Semantics
Provider: PPPA (EU)
Duration Provider Grant ID PI Area
ELE 2: European Language Equality 2 2022-2023 PPPA (EU) LC-01884166 (Project 101075356) Jan Hajič

MŠMT - velké infrastruktury [Ministry of Education, Youth and Sports - Large Infrastructures]

Duration Provider Grant ID PI Area
LINDAT/CLARIN: Centre for Language Research Infrastructure in the Czech Republic 2016 - 2019 MŠMT - velké infrastruktury LM2015071 Jan Hajič Annotations, Coreference, Corpora, Data, Dialog, Discourse, Lexicons, Linked data, Machine Learning, Machine Translation, Morphology, Multi-modality, Parsers, Publications, Semantics, Speech Recognition, Taggers, Tools, Valency
LINDAT/CLARIAH-CZ: LINDAT/CLARIAH-CZ Language Resources and Digital Arts and Humanities Research Infrastructure (2016-)2023-2026 MŠMT - velké infrastruktury LM2023062 Jan Hajič Annotations, Coreference, Corpora, Data, Dialog, Discourse, Information Structure, Lexicons, Linked data, Machine Learning, Machine Translation, Monolingual, Morphology, Multi-modality, Multilingual, Multiword Expressions, Parsers, Publications, Semantics, Speech Recognition, Speech Retrieval, Spellcheckers, Syntax, Taggers, Tools, Valency
Provider: MPO
Duration Provider Grant ID PI Area
CEDMO 2.0 NPO 1.9. 2024 - 30. 4. 2026 MPO MPO 60273/24/21300/21000 Ondřej Bojar Data, Information Retrieval, Information Structure, Multi-modality
Provider: EC Digital Europe Programme (DIGITAL)
Duration Provider Grant ID PI Area
CEDMO 2.0 EU: Central European Digital Media Observatory 2.0 1.1.2024-31.10.2026 EC Digital Europe Programme (DIGITAL) 101158609 Václav Moravec
Provider: MŠMT - OP JAK
Duration Provider Grant ID PI Area
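HumanAId: AI zaměřená na člověka pro udržitelnou a adaptabilní společnost [Human-Centred AI for a Sustainable and Adaptable Society] 1. 3. 2025 - 31. 12. 2028 MŠMT - OP JAK CZ.02.01.01/00/23_025/0008691 Barbora Vidová Hladká
Jazykověda, umělá inteligence a jazykové a řečové technologie: od výzkumu k aplikacím [Linguistics, Artificial Intelligence, and Language and Speech Technologies: From Research to Applications] 1. 1. 2025 - 31. 12. 2028 MŠMT - OP JAK CZ.02.01.01/00/23_020/0008518 Jan Hajič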

Institutional support for research at the Charles University

Duration Provider Grant ID PI Area
Multilingual Lens: Investigating Large Text Corpora from Different Methodological Perspectives 2024 - 2029 UK UNCE/24/SSH/009 Zdeněk Žabokrtský Annotations, Corpora, Data, Discourse, Information Structure, Multilingual
Language Neutral and Culturally Aware Multilingual Neural Sentence Representations 2023-2026 UK PRIMUS/23/SCI/023 Jindřich Libovický Machine Learning, Multi-modality, Multilingual

Horizon 2020 - European Commission

Duration Provider Grant ID PI Area
CLS Infra: Computational Literary Studies Infrastructure 2021-2025 H2020 101004984 Silvie Cinková Annotations, Corpora, Data, Multilingual, Parsers, Semantics, Taggers, Teaching, Tools
WELCOME: Multiple Intelligent Conversation Agent Services for Reception, Management and Integration of Third Country Nationals 2020-2023 H2020 870930 Pavel Pecina Annotations, Data, Dialog, Linked data, Machine Translation, Multi-modality, Multilingual, Parsers, Semantics, Speech Recognition
HumanE-AI-Net: HumanE AI Network 1. 9. 2020 - 31. 8. 2024 H2020 952026 Jan Hajič

EU ERASMUS MUNDUS

Duration Provider Grant ID PI Area
LCT: European Masters Program Language and Communication Technologies IX.2007-VIII.2013, IX.2013-VIII.2019, IX.2019-VIII.2025 EU ERASMUS MUNDUS 610622-EPP-1-2019-1-DE-EPPKA1-JMD-MOB Vladislav Kuboň Teaching

MŠMT - OP VVV

Duration Provider Grant ID PI Area
OP VVV LINDAT: LINDAT/CLARIN - Research infrastructure for language technologies – extension of the repository and its computational power 2017–2019 MŠMT - OP VVV CZ.02.1.01/0.0/0.0/16_013/0001781 Jan Hajič Annotations, Corpora, Data, Tools
LangTech: Modernizace oboru Matematická lingvistika [Modernization of the Mathematical Linguistics Field of Study] MŠMT - OP VVV CZ.02.2.69/0.0/0.0/16_018/0002373 Zdeněk Žabokrtský Machine Learning, Multilingual, Teaching

Technology Agency (Czech Republic)

Duration Provider Grant ID PI Area
PONK: Asistent přístupné úřední komunikace [Assistant for Accessible Official Communication] 9/2023-12/2025 TAČR TQ01000526 Barbora Vidová Hladká Annotations, Corpora, Machine Learning
EdUKate: Promoting digital education of foreign-language children through machine translation 2023-2026 TAČR TQ01000458 Lucie Poláková Data, Machine Translation, Multi-modality, Multilingual
MASAPI: Multilingual assistant for searching, analysing and processing information and decision support 2021-2024 TAČR FW03010656 Pavel Pecina Information Retrieval, Information Structure, Machine Learning, Machine Translation, Semantics
CZDEMOS4AI: Prospěšný multiagentní AI avatar v malé demokratické společnosti [A Beneficial Multi-Agent AI Avatar in a Small Democratic Society] 09/2024-12/2029 TAČR TQ12000040 Martin Popel Dialog, Information Retrieval, Machine Learning, Monolingual, Multi-modality
EduPo: Generování české poezie v edukačním a multimediálním prostředí [Generating Czech Poetry in Educational and Multimedia Environments] 09/2023 - 11/2026 TAČR TQ01000153 Rudolf Rosa Annotations, Corpora, Monolingual, Teaching, Tools
EDU-AI: AI asistent pro žáky a učitele [AI Assistant for Pupils and Teachers] 04/2021-12/2023 TAČR TL05000236 Ondřej Dušek Dialog, Information Retrieval

Czech Science Foundation

Duration Provider Grant ID PI Area
Better Tokenization for Multilingual Language Models and Machine Translation 3 years GAČR 25-16242S Jindřich Libovický Machine Translation, Multilingual
AIAI: AI: Authorship and Interpretation 2025-2027 GAČR 25-14501L Rudolf Rosa
NomVallex-Denom: Czech non-verbal predicates motivated by nouns and their syntactic behavior 2025-2027 GAČR 25-16716S Veronika Kolářová Annotations, Data, Lexicons, Linked data, Monolingual, Semantics, Syntax, Valency
HVar: Disagreement in corpus annotation and variation of human understanding of text 2024-2026 GAČR 24-11132S Šárka Zikánová Annotations, Data, Psycholinguistics, Semantics
SEEM-CZ: Epistemic and Evidential Markers in Czech 2023-2025 GAČR 23-05240S Barbora Štěpánková Annotations, Corpora, Data, Lexicons, Semantics
ForFun2: Functions and Forms of Circumstantial Modifications 2023-2025 GAČR 23-05238S Marie Mikulová Annotations, Semantics, Syntax
Identification and Prevention of Unwanted Gender Bias in Neural Language Models 2023-2024 GAČR 23-06912S David Mareček
NomVallexDer: Word-formation Relations Reflected in Noun Valency: The Case of Czech Deverbal and Deadjectival Nouns 2022-2024 GAČR 22-20927S Veronika Kolářová Annotations, Corpora, Lexicons, Monolingual, Syntax, Valency
RapiDisc: Metody pro rychlou diskurzní anotaci ve vybraných korpusech [Methods for Rapid Discourse Annotation in Selected Corpora] 2022-2024 GAČR 22-03269S Jiří Mírovský Annotations, Corpora, Data, Discourse, Parsers
LUSyD: Language Understanding: from Syntax to Discourse 2020–2024 GAČR GX20-16819X Jan Hajič Coreference, Machine Learning, Machine Translation, Parsers, Semantics, Syntax, Valency
Global Coherence: Global Coherence of Czech Texts in the Corpus-Based Perspective 2020 - 2023 GAČR 20-09853S Lucie Poláková Annotations, Corpora, Data, Discourse, Semantics
NEUREM3: Neuronové reprezentace v multimodálním a mnohojazyčném modelování (Neural Representations in Multi-modal and Multi-lingual Modelling) 2019-2023 GAČR 19-26934X Ondřej Bojar Machine Learning, Multi-modality, Multilingual

Ministry of Education, Youth and Sports (Czech Republic)

Duration Provider Grant ID PI Area
Uniform Meaning Representation (UMR) 1.3.2023 - 30.9.2027 MŠMT LUAUS23283 Jan Hajič Corpora, Data, Lexicons, Linked data, Multilingual, Multiword Expressions, Semantics, Syntax, Valency
Improving stomach examinations with Artificial Intelligence: A deep learning approach for assisted gastroscopy 1. 7. 2024 - 31. 12. 2026 MŠMT LUABA24136 Pavel Pecina

Ministry of Culture

Duration Provider Grant ID PI Area
Automatické hodnocení mluveného projevu v češtině [Automated Speech Scoring in Czech] 2023–2027 NAKI DH23P03OVV037 Kateřina Rysová Corpora, Data, Discourse, Monolingual, Tools
OmniOMR: Optical music recognition using machine learning for digital libraries 2023-2027 NAKI DH23P03OVV008 Jan Hajič jr. Annotations, Data, Machine Learning

Program START (UK - OP VVV)

Duration Provider Grant ID PI Area
Babel Octopus: Robust Multi-Source Speech Translation 2021-2023 START START/SCI/089 Peter Polák Machine Translation, Multilingual, Speech Recognition
A data-based approach to competition in word-formation: selected semantic categories across seven languages 2021-2023 START START/HUM/010 Annotations, Data, Lexicons, Morphology, Multilingual, Semantics

Grant Agency of the Charles University

Duration Provider Grant ID PI Area
Modeling Morpheme Flow among Languages Jan 2024- Dec 2026 GAUK 101924 Abishek Stephen Lexicons, Morphology, Multilingual
Reliable and Explainable Large Language Models for Text Generation 2025-2027 GAUK 252986 Patrícia Schmidtová Machine Learning
Coreference Resolution and Representation in Deep Universal Dependencies 2025 - 2027 GAUK 105124 Dima Taji Coreference
Adapting Uniform Meaning Representation (UMR) for the Italic/Romance languages 2024-2026 GAUK 104924 Federica Gamba Data, Semantics
Mashcima: Synthetic training data generation and other methods for handwritten music recognition 2023-2025 GAUK 289623 Jiří Mayer Data, Machine Learning, Tools
Methods for improving neural machine translation of diverse texts 2023-2025 GAUK 244523 Josef Jon
Using Auxiliary Subtasks for Learning Constraints in NLP 2023-2025 GAUK 272323 Dávid Javorský Coreference, Machine Learning, Machine Translation, Semantics
Morphological complexity of the verbal lexicon in four languages: Quantitative research based on corpus data 2023-2025 GAUK 246723 Hana Hledíková Corpora, Morphology, Multilingual
Compound Identification and Splitting in Four Languages: A Deep Learning Approach 2022-2024 GAUK 128122 Emil Svoboda Machine Learning, Morphology, Multilingual, Tools
ECSS: Evaluation of conversational speech synthesis 2022-2024 GAUK 40222 Ondřej Plátek Data, Dialog, Machine Learning