The example of multilingual societies in India and the Americas led to the emergence of the concept of “minor language” in sociolinguistics in the 1960s. A set of criteria, modified several times, was meant to facilitate systematic comparisons between speakers of different languages. Among these criteria were the number of speakers, recognition as an official language, use as a medium of instruction, and classification as a vernacular, standard or classical language, pidgin or creole.
From a historiographical perspective, the concept of “minor” language is noteworthy with regard to the self-perception of individual actors, especially during the 19th and 20th centuries: The perception of “their” language as marginalized and/or threatened was often an important driving force for their national or regional commitment.
However, language activists often had to struggle with one main problem: usually, there was no binding standard for “minor” languages, i.e., no consensus had yet been reached on a dialect or dialect group that could serve as a basis for linguistic development and codification. These ambiguities at the morphological and syntactic levels were in turn reflected in the heterogeneous spelling of “minor” languages, which, depending on regional or sociocultural affiliation, often borrowed from neighboring “major” languages. This could result in the parallel use of several alphabets, as in the case of Belarusian and Ukrainian, for which both “Polish” and “Russian” spellings can be observed during the 19th (and early 20th) century.
This brings us to the source-related, research-practical dimension that will also be part of the workshop. AI-supported text recognition for print (OCR) and manuscripts (HTR) plays a role in the humanities whose importance can hardly be overestimated. However, it is often overlooked that the existing technologies have been developed using “major” languages such as English, French, Spanish or German, and consequently deliver deeply unsatisfactory results for “minor” languages. How can solutions be developed for the problems connected with OCR/HTR for “minor” languages? Frequent switching between different languages and vernaculars is characteristic of many handwritten documents; sometimes even “mixed” texts can be found, as the well-known example of the Belarusian-Russian mixed form of speech (Trasianka) suggests. Matters are complicated further by the fact that historians have so far paid little attention to several “minor” languages, even in “analogue” sources, so that Romani or Armeno-Kipčak, which is documented for the Polish-Lithuanian Union State, has hardly come to the attention of developers. But even for the better-known “minor” languages, HTR model training is still in its infancy, as Yiddish manuscripts show.
The workshop, which is aimed at historians, linguists, digital humanists and scholars with a general interest in cultural history, offers a forum for discussing how to deal with (formerly) non-standardized “minor” languages in OCR and/or HTR. Discussion seems possible from two angles. On the one hand, from the perspective of the languages themselves: Which “minor” languages do historians work with, and in what types of sources do they appear? What language-based difficulties that go beyond an ordinary “comprehension problem” arise when evaluating sources? What role does the context of “major” languages play, for example in the classification of interferences? To what extent is the coexistence of several “minor” languages in one source significant? On the other hand, from the perspective of Digital Humanities: as part of the workshop, a hands-on session is planned to which interested participants can contribute their own source material.
The regional focus of the workshop is on Central and Eastern Europe, i.e., on the European region that has been particularly marked by the concerns of “small” groups of speakers about the dominance of “major” ones. However, examples from other regions are highly welcome.
If you are interested, please send a short abstract of your planned presentation (100-150 words) in German or English, together with a short CV, by June 20, 2023, to: martina.niedhammer@collegium-carolinum.de. Selected presenters will receive feedback by June 27, 2023.
Working languages are English and German. Travel and accommodation expenses will be covered after consultation with the organizers.