Empirical methodology

The first step: a list of potential widespread idioms
This list comprised more than 1,000 idioms found in five or more European languages. It was first compiled on the basis of preliminary knowledge and later completed through systematic, large-scale research covering many publications on phraseology. From the very beginning, the list drew not only on some major languages of Central Europe (English, German, French) but also on the geographically and genetically distant languages Finnish, Greek and Russian.

The second step: preliminary tests
These potential widespread idioms were sent for further examination to experts in geographically and genetically diverse languages. They were pre-tested for Spanish, Romanian, Bulgarian, Croatian, Polish, Latvian, Hungarian and Estonian. This left a core set of about 650 actual "widespread idiom candidates", which then had to be reviewed by experts in many other languages.

The third step: a network of competent collaborators
The next step was to build up a network of competent collaborators for as many languages as possible. Ten questionnaires with a total of 500 "widespread idiom candidates" were sent by e-mail to experts in many languages, asking them to answer the questions based both on their own competence with regard to idioms and on discussions within their circle of colleagues. Questionnaire seven (in a revised version) and questionnaires eight and nine were placed on the website only recently; questionnaire ten will follow later. See questionnaire.

The fourth step: filling in the questionnaires
The project has been reliably supported by a number of competent colleagues and institutions. The questionnaires were filled in carefully and often came with additional comments on individual idioms. Several collaborators went to great lengths to verify their information through Internet searches or corpus analyses. They frequently pointed out shortcomings in dictionaries, where quite familiar idioms had not been included. For several smaller and minority languages, neither Internet texts nor phraseological dictionaries were available for reference. In these cases, the informants' answers are therefore a unique and particularly valuable source.

Current state of the questionnaire surveys
To date, we have received data for more than 90 languages and dialects. The Indo-European languages are represented almost exhaustively, with 54 of those accessible to research. The same holds for Maltese, a Semitic language, and Basque, a language isolate. Unfortunately, coverage of the Finno-Ugric and Turkic languages spoken in Europe is less comprehensive and still needs substantial support. Apart from Hungarian, Finnish and Estonian (and, in part, Karelian, Vepsian, Udmurt, Mari, Komi-Zyrian, Moksha Mordvin, Erzya Mordvin and Inari Saami) on the one hand, and Turkish, Tatar, Karaim, Kazakh and Azerbaijani on the other, these phyla are not yet represented. There is currently no access to the idioms of a large number of languages in the northeast and southeast of Europe. Of the Caucasian languages, only Georgian is included in the project. Data for Yiddish and Esperanto are also available.

The project aims to include as many European languages as possible (to the extent that they are accessible to linguistic research; see examples with maps). It therefore relies on the assistance of just as many native speakers and linguists interested in phraseology.

Every kind of collaboration is welcome, especially for the languages still missing in the project!


The second volume of the "Lexicon of Common Figurative Units" will be published in 2015. All participants will, of course, be acknowledged with thanks in this book.

