
Phraseo-MWE 2026

 1st Workshop

Multi-word expressions and phraseology – corpus-based and computer-processed 

Workshop date: 29 September 2026, 09:00–14:00

Workshop within the XXII EURALEX International Congress 2026

 Vienna, Austria

Call for Papers

Key Words: Phraseology, Multiword Expressions (MWEs), Corpus, Lexicography, Computer Science

Phraseology (Cowie 1998; Granger & Meunier 2008; Mitkov 2017; Mel’čuk 2023; Polguère 2002, 2014; Mejri 2018; Chen 2021) and multiword expressions (MWEs) (Savary 2008; Constant 2012) occupy a central position in linguistic description, language use, and language learning. Idioms, collocations, lexical bundles, and fixed or semi-fixed expressions constitute a significant part of natural language, yet they remain challenging to model, describe, and represent in lexicographic resources (Pecina 2010; Mel’čuk 2011; Polguère 2014; Chen 2025). With the rapid development of corpus linguistics, Natural Language Processing (NLP), and Artificial Intelligence (AI), phraseology and lexicography are currently undergoing a profound methodological and conceptual transformation.

From a modeling perspective, large-scale corpora (Mitkov 2017), both general and specialized, enable the systematic identification of MWEs through statistical, distributional, and syntactic approaches. Methods such as n-gram extraction, association measures (PMI, t-score), syntactic patterning, and embedding-based similarity allow researchers to capture degrees of fixedness, semantic compositionality, and contextual variability. These advances open new perspectives for representing phraseological knowledge in structured models, including ontologies, lexical networks, and standards such as OntoLex-Lemon (McCrae et al. 2017; Bosque-Gil et al. 2017; Bosque-Gil et al. 2019).
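As a purely illustrative sketch of the association measures mentioned above, the following Python snippet scores adjacent word pairs with PMI, taken here as log2(O/E), and the t-score, taken here as (O − E)/√O, where O is the observed bigram frequency and E the frequency expected under independence. The function name `bigram_association`, the toy corpus, the whitespace tokenization, and the use of the token count as sample size are all assumptions made for the example and are not prescribed by the workshop.

```python
import math
from collections import Counter


def bigram_association(tokens, min_count=2):
    """Return {bigram: (observed, pmi, t_score)} for adjacent word pairs.

    Hypothetical helper for illustration only.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)  # simplification: token count used as sample size
    scores = {}
    for (w1, w2), observed in bigrams.items():
        if observed < min_count:
            continue  # skip rare pairs, whose scores are unreliable
        # Frequency expected if w1 and w2 co-occurred by chance
        expected = unigrams[w1] * unigrams[w2] / n
        pmi = math.log2(observed / expected)
        t_score = (observed - expected) / math.sqrt(observed)
        scores[(w1, w2)] = (observed, pmi, t_score)
    return scores


# Toy corpus (invented for the example), tokenized on whitespace
text = (
    "he kicked the bucket . she kicked the bucket . "
    "the cat saw the dog . the dog saw the cat ."
)
tokens = text.split()
scores = bigram_association(tokens)

# Rank candidate bigrams by PMI, highest first
for bigram, (count, pmi, t) in sorted(scores.items(), key=lambda kv: -kv[1][1]):
    print(bigram, count, round(pmi, 2), round(t, 2))
```

On realistic data, PMI tends to favor rare pairs while the t-score favors frequent ones, which is one reason the two measures are often reported together; a frequency threshold such as `min_count` is a common, though not universal, safeguard.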

Technological tools have also profoundly reshaped lexicographic practices (Atkins & Rundell 2008; Granger & Paquot 2012). Digital corpora, web data, and annotation platforms facilitate the semi-automatic extraction and validation of phraseological units (Mitkov 2017; Evert 2008; Gries 2008; Constant et al. 2017). New-generation lexical resources integrate rich metadata, usage examples, frequency information, and semantic relations, making them more dynamic and interoperable (Polguère 2014; Cimiano et al. 2016; McCrae et al. 2017). Computational approaches also support the construction of multilingual and contrastive resources, which are essential for studying phraseological variation across languages and cultures (Paquot 2015; Mel’čuk 2011).

New technologies play a crucial role in the design of pedagogical dictionaries and learning-oriented resources (Bogaards & van der Kloot 2002; Lew 2012). Phraseology is often a major obstacle for language learners, as MWEs cannot always be interpreted compositionally (Wray 2002; Howarth 1998; Granger 1998). Corpus-based examples, learner-oriented definitions, and adaptive digital interfaces can significantly enhance the accessibility of phraseological information. AI-driven tools and LLM-assisted lexicography offer promising avenues for generating contextualized examples and learner-adapted usage notes (Heift & Schulze 2007; Godwin-Jones 2023; Bender & Koller 2020).

In translation lexicography, phraseological units pose well-known challenges due to their idiomaticity and phraseocultural specificity (Chen 2022a, 2022b). Parallel corpora, alignment tools, and machine translation systems provide valuable data for identifying translation equivalents and strategies (Chen et al. 2024). Digital translation dictionaries can now incorporate cross-linguistic mappings, semantic annotations, and attested examples, bridging lexicographic description and translational practice.

Finally, computational analysis of existing dictionaries (Béjoint 2010; Lew 2013) offers new insights into lexicographic traditions. Large-scale comparison of entries, coverage, and microstructure contributes to understanding how dictionaries evolve in response to technological and societal changes (Hausmann 1989; Tarp 2012).

The intersection of phraseology, MWEs, lexicography, and new technologies constitutes a fertile research domain, fostering innovative models, richer resources, and more effective tools for analysis, learning, and translation.

Topics of Interest (Non-Exhaustive)

  • Corpus-based identification and classification of phraseological units and MWEs
  • Lexicographic description of fixed expressions, variation, and idiomaticity
  • Phraseology in digital and computational lexicography
  • Ontological and knowledge-graph modeling (OntoLex-Lemon, SKOS, RDF)
  • Phraseological units in bilingual and multilingual dictionaries
  • Phraseology and neology
  • Contrastive and cross-cultural phraseology (e.g. French–Chinese, low-resource languages)
  • NLP & AI approaches to MWEs: extraction, tagging, disambiguation, evaluation
  • Use of LLMs for phraseological description and example generation
  • Integration of phraseology into dictionary micro- and macrostructures
  • Evaluation of phraseological coverage in digital dictionaries

Bibliography

  • Atkins, B. T. S., & Rundell, M. (2008). The Oxford guide to practical lexicography. Oxford University Press.
  • Bender, E. M., & Koller, A. (2020). Climbing towards NLU: On meaning, form, and understanding in the age of data. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 5185–5198. https://doi.org/10.18653/v1/2020.acl-main.463
  • Béjoint, H. (2010). The lexicography of English: From origins to present. Oxford University Press.
  • Bogaards, P., & van der Kloot, W. (2002). The use of grammatical information in learners’ dictionaries. International Journal of Lexicography, 15(2), 97–121. https://doi.org/10.1093/ijl/15.2.97
  • Bosque-Gil, J., & Gracia, J. (2019). The OntoLex Lemon Lexicography Module. Final Community Group Report, W3C.
  • Chen, L. (2021). Analyse comparative des expressions idiomatiques en chinois et en français (Doctoral dissertation, CY Cergy Paris Université).
  • Chen, L. (2022a). Phraseoculture in the construction of the corpus of the DiCoP: The treatment of the phraseographic microstructure. Short papers of EUROPHRAS (pp. 17–25).
  • Chen, L. (2022b). Phraséoculturologie : Une sous-discipline moderne indispensable de la phraséologie. SHS Web of Conferences, 138, 04011. https://doi.org/10.1051/shsconf/202213804011
  • Chen, L. (2025). Modeling and structuring of a bilingual French–Chinese phraseological dictionary. In eLex 2025 (pp. 830–851).
  • Chiarcos, C., Apostol, E.-S., Kabashi, B., & Truică, C.-O. (2022). Modelling Frequency, Attestation, and Corpus-Based Information with OntoLex-FrAC. COLING 2022.
  • Cimiano, P., McCrae, J. P., & Buitelaar, P. (2016). Lexicon model for ontologies: The OntoLex model. In P. Cimiano et al. (Eds.), The semantic web: ESWC 2016 satellite events (pp. 107–113). Springer. https://doi.org/10.1007/978-3-319-47602-5_23
  • Constant, M. et al. (2017). Multiword expression processing: A survey. Computational Linguistics, 43(4), 837–892.
  • Cowie, A. P. (1998). Phraseology: Theory, analysis, and applications. Clarendon Press.
  • Evert, S. (2008). Corpora and collocations. In Lüdeling & Kytö (Eds.), Corpus linguistics handbook.
  • Fuertes-Olivera, P. A., & Bergenholtz, H. (2011). Lexicography, information science, and information needs. Lexikos, 21.
  • Godwin-Jones, R. (2023). Generative AI and language learning. Language Learning & Technology, 27(1).
  • Granger, S., & Meunier, F. (2008). Phraseology: An interdisciplinary perspective. John Benjamins.
  • Gries, S. T. (2008). Phraseology and linguistic theory. In Granger & Meunier (Eds.).
  • Hartmann, R. R. K. (2007). Interlingual lexicography. Niemeyer.
  • Heift, T., & Schulze, M. (2007). Errors and intelligence in CALL. Routledge.
  • Howarth, P. (1998). Phraseology and second language proficiency. Applied Linguistics, 19(1).
  • Lew, R. (2015). User-generated content in online dictionaries. Lexicography, 2(2).
  • McCrae, J. et al. (2017). The OntoLex-Lemon model. eLex 2017.
  • Mel’čuk, I. A. (2023). General phraseology: Theory and practice. John Benjamins.
  • Mejri, S. (2018). La phraséologie : Cotexte, contexte et contenus culturels.
  • Mitkov, R. (2017). Computational and corpus-based phraseology. Springer.
  • Paquot, M. (2015). Lexicography and phraseology.
  • Pecina, P. (2010). Lexical association measures. Language Resources and Evaluation.
  • Polguère, A. (2014). Principes de modélisation systémique des réseaux lexicaux.
  • Ramisch, C. et al. (2010). mwe-toolkit. LREC 2010.
  • Rundell, M. (2023). Automating the creation of dictionaries.
  • Savary, A. (2008). Computational inflection of multi-word units.
  • Smadja, F. (1993). Retrieving collocations from text. Computational Linguistics.
  • Tarp, S. (2008). Lexicography in the borderland between knowledge and non-knowledge.
  • Vincze, V. et al. (2011). Multiword expressions and named entities.
  • Wray, A. (2002). Formulaic language and the lexicon. Cambridge University Press.

Important Dates

March 8, 2026: Abstract submission
April 2, 2026: Abstract acceptance notification
April 17, 2026: Deadline for confirmation of workshop attendance
July 10, 2026: Early Bird registration deadline for EURALEX
July 24, 2026: Submission of camera-ready abstracts
September 1, 2026: End of registration for EURALEX
September 29, 2026: Phraseo-MWE 2026 at the Austrian Academy of Sciences, Vienna

 

Workshop Chairs

Dr. CHEN Lian 陈恋, Laboratoire LLL, University of Orléans; CRLAO – CNRS – INALCO, France, https://lianchen.fr/

Dr. KABASHI Besim, Eberhard Karls Universität Tübingen, Germany, https://www.besim-kabashi.net/
