The final workshop of the project OstData (funded by the German Research Foundation from 2019 to 2022) focused on aims, topics, and experiences revolving around building research data infrastructures from a pan-European perspective, with a close look at projects and initiatives in Central and Eastern Europe. Observing that the future of Area Studies also depends on the digital accessibility of data, the workshop set out to discuss strategies for strengthening the inclusion of research communities in Central, Eastern and Southeastern Europe in efforts to build pan-European research data infrastructures, and to present ideas for increasing the recognition of research data publication in the humanities and social sciences.
The first session discussed measures to promote and strengthen the publication culture of research data. All presenters mentioned shortcomings of the traditional way of publishing research results, but also pointed out the challenges of new forms of scholarly publishing and collaboration. JESSIE LABOV (Budapest) opened her talk on online research collaboration by asking how the pandemic affects academic networking and mobility. She called for a “fundamental rethinking of our collaboration practices” in order to establish new forms of online collaboration and to retain them after the pandemic. Labov proposed three tools for productive online cooperation: parallel Archive as a repository and distribution platform, Ranke.2 with its focus on digital source criticism, and nodegoat as a collaborative data modeling environment. Furthermore, she argued for the importance of well-designed data preservation in research projects to better enable the re-use of data across projects. She illustrated this point with a case study in which data from an earlier project was repurposed in the NEP4DISSENT project, thus “upcycling” legacy data through networking, data sharing, and collaborative data curation.
KARL GROSSNER (Pittsburgh) presented the World Historical Gazetteer (WHG) as a possible platform for new ways of publishing historical geodata. Referring to the WHG’s tagline, “linking knowledge about the past via place”, he showed typical use cases and singled out the publication of smaller, specialized gazetteer datasets as especially relevant for establishing a new publication culture. As Grossner noted, there is already strong interest in datasets covering Eastern Europe in the WHG; to strengthen their inclusion further, the WHG is currently building a so-called focus domain dedicated to collecting datasets on Russia, Eastern Europe, and Eurasia.
LARS WIENEKE (Luxembourg) presented the new Journal of Digital History. He pointed out that traditional journals are interested in publishing the findings of historical research, but neither in methodological reflections on data preparation and analysis nor in the resulting data itself. The new digital journal therefore aims to make data both visible and accessible. To combine data and code with methodological reflection and the actual text in a new publication format, the Journal of Digital History interconnects a narrative, a hermeneutic, and a data layer. Wieneke reported that the use of Jupyter notebooks to integrate code and data into the more traditional text met with much greater acceptance in the community than the project team had initially expected. However, challenges remain for the journal’s peer-review process, namely the complex publication format and the long-term preservation of the runtime environment needed to keep the code running five years after publication and beyond.
In a roundtable, participants discussed different strategies for meeting the challenges of creating digital research infrastructures at the national and European levels. They presented different approaches to the question of centralized versus decentralized and bottom-up versus top-down strategies for developing and strengthening research data infrastructures, but generally emphasized the necessity of cross-border cooperation between existing and emerging projects in order to achieve an interconnected pan-European digital research infrastructure.
RENÉ BUCH (Brussels) stressed the importance of broad European collaboration for competing scientifically at a global level and pointed to the Nordic countries as a best-practice example of how collaboration can be a way to “punch above one’s own weight”. Similarly, regional cross-border cooperation could also help Central and Eastern European states, e.g. the Baltic or the Visegrád States, to build up competitive digital research infrastructures.
PETER HASLINGER (Marburg) gave an insight into the Nationale Forschungsdateninfrastruktur (NFDI), the National Research Data Infrastructure in Germany. He described it as the result of a “painful learning process”, based on the experience that many earlier research (data) infrastructures were funded for only a short time and failed to establish themselves permanently. Haslinger also observed that the collaborative application process of the NFDI consortium NFDI4memory already reflected a growing awareness of the importance of research data within the community of historians, and that it was beginning to close the existing gap between traditionally working historians and those familiar with digital methods and tools.
ANA PROYKOVA (Sofia) underlined the importance of pooling efforts, given the high costs and long timeframes involved in installing and maintaining digital research infrastructures, especially pan-European ones. Proykova further emphasized the necessity of creating high-quality, reproducible data in order to facilitate its frequent re-use, now and by later generations. To this end, publication cultures and incentive systems in all scientific disciplines should change, for instance through additional credits or funding for researchers who share their data, and through greater recognition of data publications.
Four research data repositories with transregional and interdisciplinary profiles presented their specific approaches to data sharing in a marketplace session. Although the projects differed in regional and disciplinary scope, all speakers pointed out the necessity of active and collaborative data re-use and data sharing, especially with the wider public, to ensure the longevity of research data and of their respective infrastructures. Conversely, research data infrastructures can be a means to facilitate and amplify transregional and interdisciplinary research.
ARNOŠT ŠTANZEL (Munich) presented the OstData project. With a modular, subject-specific network structure for the long-term archiving, publication, and discovery of research data, OstData plans to open up on a Europe-wide scale by aggregating research data and metadata from Germany, making them available to interested services in Europe and beyond, and harvesting metadata from repositories in Central, Eastern and Southeastern Europe.
KAREL BERKHOFF (Amsterdam) presented the transnational European Holocaust Research Infrastructure (EHRI). EHRI’s goal is to advance the integration and accessibility of Holocaust archives and collections by linking them, thereby overcoming the fragmentation of resources and accelerating the digital transformation of Holocaust research. To bring the platform to life, EHRI is actively building new networks of transnational research and archival communities through so-called regional hubs, which already exist for the Baltic States, Eastern Europe, Russia, Central Europe, and Southern Europe. Berkhoff stressed that this “in-person interaction” is as important as, or even more important than, building the digital infrastructure itself.
GIJS KESSLER (Amsterdam) presented the Electronic Repository of Russian Historical Statistics (RiStat), a bilateral Russian-Dutch collaboration. He showed how standardized datasets on Russian demographics, production, and output for the period from 1800 to 2000 are made available. The project’s starting point was the observation that Russian demographic data is of high quality but difficult to access, and that sharing the data can initiate new research projects on Russian history with regional or transnational perspectives. One successful step towards boosting the re-use of the data was to provide data and documentation bilingually, in English and Russian.
Finally, TOMASZ PANECKI and BOGUMIŁ SZADY (Warsaw) presented Atlas Fontium of the Polish Academy of Sciences. The repository is a space for storing, gathering, and sharing historical spatial data on early modern Poland. The presenters stressed that data should be open to everybody, but curated through peer review to ensure high quality. Future prospects include an HGIS Geoportal of Poland with maps and source editions, and an extension of the regional focus to Central Europe.
The second session dealt with the development of a pan-European data culture and comprised presentations on projects from Slovenia, Poland, and Czechia. All talks pointed out the necessity of building national research data infrastructures that aggregate data and metadata and thereby contribute to European initiatives.
JANEZ ŠTEBE (Ljubljana) presented the Slovenian Social Science Data Archives (ADP), which defines itself as an aggregation infrastructure for social science research data, intended for potential reuse in national and international contexts. Drawing on his experience, Štebe criticized the current data sharing principles in the European scientific community, since researchers still receive no personal credit or reputation for this kind of publication. He also addressed repositories that may be too selective about how they want research data to be published (e.g. with respect to metadata standards or required authority files).
LARISSA SAAR and PATRICK PIEL (Bonn) gave insights into the pan-European project “OPERAS” and their work to improve open scholarly communication for the social sciences and humanities in the European research area. Both pointed out that upgrading existing publishing services is a cornerstone for integrating national perspectives into the European Open Science Cloud (EOSC). To give practical insights into the OPERAS structure in East European countries, MAGDALENA WNUK (Warsaw) joined the talk and emphasized the need for information on Open Access and its paradigms in the Polish scientific community, where such information is still less available than in Western Europe. This observation holds for most countries in Central, Eastern and Southeastern Europe.
Finally, PAVEL STRAŇÁK (Prague) talked about LINDAT/CLARIAH-CZ’s services for research data and digital tools in the humanities, especially linguistics. Using the example of UDPipe, a tool for analyzing natural-language text, Straňák highlighted the important role of APIs in integrating LINDAT services into other European infrastructures and in making the services easily accessible and usable for scholars.
The workshop presented examples of successful research data infrastructures on and in Central and Eastern Europe. Yet it also pointed to the huge challenges lying ahead. For example, the long-term archiving of research data (including software) is still an open issue. Even more importantly, as the participants stressed, building digital infrastructures involves not only servers and code but, essentially, the contribution of the communities involved.
Session 1: Measures to promote and strengthen the publication culture of research data
Chair: Gudrun Wirtz (Munich)
Jessie Labov (Budapest): Online Research Collaboration: from Workaround to Best Practice
Karl Grossner / Susan Grunewald (Pittsburgh): World Historical Gazetteer: Research Geodata Infrastructure
Lars Wieneke (Luxembourg) / Rabea Rittgerodt (Berlin): From Prototype to Infrastructure. Data Driven Publishing and the Journal of Digital History
Roundtable: A Digital Eastern Partnership? The Role of Transnational Research Infrastructures
Chair: Ulf Brunnbauer (Regensburg)
Discussants: René Buch (Brussels), Peter Haslinger (Marburg), Ana Proykova (Sofia)
Marketplace
Chair: Tilmann Tegeler (Regensburg)
Arnošt Štanzel (Munich): OstData
Karel Berkhoff (Amsterdam): European Holocaust Research Infrastructure
Gijs Kessler (Amsterdam) / Andrei Markevich (Moscow): RiStat
Tomasz Panecki / Bogumił Szady (Warsaw): Atlas Fontium
Session 2: Developing and Integrating a Pan-European Data Culture
Chair: Maren Röger (Leipzig)
Janez Štebe (Ljubljana): The Integration of Slovenian Social Science Data Archives (ADP) into European Data Space
Larissa Saar / Patrick Piel (Bonn): Open Scholarly Communication in the Social Sciences and Humanities – the OPERAS Infrastructure
Pavel Straňák (Prague): Building the Czech CLARIN Centre LINDAT/CLARIAH-CZ. Development of a Pan-European Research Data Infrastructure from the Czech Perspective
http://www.parallelarchive.org/; https://ranke2.uni.lu/; https://nodegoat.net/.