SlaVaComp-Metaglossar

Technology
Data genesis

Basic information on the creation and technical background of the glossary is available from TextGrid and SlavDok:

  • TextGrid: COMPutergestützte Untersuchung von VAriabilität im KirchenSLAvischen [ppt]
  • TextGrid: Auf dem Weg zu einem kirchenslawischen Meta-Glossar [ppt]
  • SlavDok: Abschlussbericht für das BMBF-Projekt »SlaVaComp – COMPutergestützte Untersuchung von VAriabilität im KirchenSLAvischen« (2016) [pdf]

Data revival 2023

In 2023, the data was transformed from object-oriented TEI-XML to relationally-oriented SOLR-XML, which allowed dynamic indexing with SOLR.

TEI-XML conversion to SOLR-XML:
  • The data corpus contains 23825 lemma entries, 6542 variants, 21500 lemma citations, 23818 hyperlemmata, and 4635 variant citations, plus information on grammar and sources [September 2023]
  • Conversion from TEI-XML to SOLR-XML - overview: [github]
  • Conversion from TEI-XML to SOLR-XML - conversion script: [github]
  • New/old data structure: [jpg], [gv]
  • Example of converted files: from TEI-XML-test to SOLR-XML-test.
  • "managed-schema" (SOLR) for dynamic indexing: [xml] and the Solr character mapping file.
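
The general idea of the conversion, flattening nested TEI entries into flat Solr documents, can be sketched as follows. This is only an illustrative sketch and not the project's conversion script: the sample TEI markup, the function name tei_to_solr, and the Solr field names (id, lemma, variant, pos) are all assumptions made for the example. Only the TEI namespace and the Solr <add><doc><field name="…"> update format are standard.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"
NS = {"tei": TEI_NS}

# Hypothetical TEI input; the project's real markup may differ.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <entry xml:id="lemma-1">
      <form type="lemma"><orth>богъ</orth></form>
      <form type="variant"><orth>бъ</orth></form>
      <gramGrp><pos>noun</pos></gramGrp>
    </entry>
  </body></text>
</TEI>"""


def tei_to_solr(tei_xml: str) -> str:
    """Flatten each nested TEI <entry> into one flat Solr <doc>."""
    root = ET.fromstring(tei_xml)
    add = ET.Element("add")  # root element of a Solr XML update message
    for entry in root.iterfind(".//tei:entry", NS):
        doc = ET.SubElement(add, "doc")

        def field(name, value):
            # Append <field name="...">value</field> to the current doc.
            el = ET.SubElement(doc, "field", name=name)
            el.text = value

        # xml:id of the entry becomes the document id (assumed mapping).
        field("id", entry.get(f"{{{XML_NS}}}id"))
        for form in entry.iterfind("tei:form", NS):
            orth = form.findtext("tei:orth", default="", namespaces=NS)
            # Assumed field names, taken from @type: "lemma", "variant".
            field(form.get("type"), orth)
        pos = entry.findtext("tei:gramGrp/tei:pos", namespaces=NS)
        if pos:
            field("pos", pos)
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(tei_to_solr(SAMPLE_TEI))
```

Each TEI entry yields one Solr document whose repeated fields (for example several variant values) would need to be declared as multi-valued in the Solr schema.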