Technology
Data genesis
Basic information on the creation and technical background of the glossary can be found in the following TextGrid and SlavDok resources:
- TextGrid: COMPutergestützte Untersuchung von VAriabilität im KirchenSLAvischen [ppt]
- TextGrid: Auf dem Weg zu einem kirchenslawischen Meta-Glossar [ppt]
- SlavDok: Abschlussbericht für das BMBF-Projekt »SlaVaComp – COMPutergestützte Untersuchung von VAriabilität im KirchenSLAvischen« (2016) [pdf]
Data revival 2023
In 2023, the data was transformed from object-oriented TEI-XML to relationally oriented SOLR-XML, which allowed dynamic indexing with SOLR.
Conversion from TEI-XML to SOLR-XML:
- The data corpus contains 23825 lemma entries, 6542 variants, 21500 lemma citations, 23818 hyperlemmata, 4635 variant citations, and information on grammar and source [September 2023]
- Conversion from TEI-XML to SOLR-XML - overview: [github]
- Conversion from TEI-XML to SOLR-XML - conversion script: [github]
- New/old data structure: [jpg], [gv]
- Example of converted files: from TEI-XML-test to SOLR-XML-test.
- "managed-schema" (SOLR) for dynamic indexing: [xml] and the Solr character mapping file.
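The general idea of the conversion listed above can be illustrated with a minimal Python sketch. It is not the project's conversion script: the sample TEI entry, the function name tei_to_solr, and the field names (lemma_s, variant_ss, pos_s) are assumptions made for illustration only. The suffixes follow the common Solr dynamic-field convention (*_s for a single string, *_ss for multi-valued strings), which may differ from the project's actual "managed-schema".

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical TEI entry; the real glossary markup may be structured differently.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <entry xml:id="lemma-1">
      <form type="lemma"><orth>богъ</orth></form>
      <form type="variant"><orth>бъ</orth></form>
      <gramGrp><pos>noun</pos></gramGrp>
    </entry>
  </body></text>
</TEI>"""


def add_field(doc, name, value):
    """Append <field name="...">value</field> to a Solr <doc> element."""
    field = ET.SubElement(doc, "field", {"name": name})
    field.text = value


def tei_to_solr(tei_xml):
    """Flatten nested TEI entries into flat Solr <add><doc> records."""
    root = ET.fromstring(tei_xml)
    add = ET.Element("add")
    for entry in root.iterfind(".//tei:entry", TEI_NS):
        doc = ET.SubElement(add, "doc")
        add_field(doc, "id", entry.get(XML_ID, ""))
        for form in entry.iterfind("tei:form", TEI_NS):
            orth = form.findtext("tei:orth", default="", namespaces=TEI_NS)
            if form.get("type") == "lemma":
                add_field(doc, "lemma_s", orth)      # assumed field name
            elif form.get("type") == "variant":
                add_field(doc, "variant_ss", orth)   # assumed field name
        pos = entry.findtext("tei:gramGrp/tei:pos", namespaces=TEI_NS)
        if pos:
            add_field(doc, "pos_s", pos)             # assumed field name
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(tei_to_solr(SAMPLE_TEI))
```

Each nested TEI entry becomes one flat Solr document, so every lemma, variant, and grammatical value is addressable as its own indexed field. With dynamic fields, new field names of this kind can be indexed without declaring each one in the schema beforehand.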