Cultural and scientific data cannot be understood without knowledge of their provenance (origin, context, or history). Provenance provides a critical foundation for assessing authenticity, enabling trust, and allowing reproducibility. Provenance metadata describe objects, people, places, and times that are causally related by events. They are event-centric and must be recorded in historical order to ensure that there are no references to non-existent (non-recorded) events or objects.

Type: Framework
Property: Data management
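
The historical-order requirement described above can be illustrated with a minimal sketch. It assumes a simplified event record in which each event lists the identifiers it refers to and the identifiers it produces; the names `Event`, `inputs`, `outputs`, and `dangling_references` are hypothetical and are not taken from the framework itself.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A simplified, hypothetical provenance event."""
    event_id: str
    inputs: list[str] = field(default_factory=list)   # ids this event refers to
    outputs: list[str] = field(default_factory=list)  # ids this event produces


def dangling_references(events):
    """Walk the events in recorded order and return (event_id, missing_id)
    pairs for every reference to an id that has not been recorded yet."""
    known = set()
    missing = []
    for ev in events:
        for ref in ev.inputs:
            if ref not in known:
                missing.append((ev.event_id, ref))
        # Only after the check do this event and its products become known.
        known.add(ev.event_id)
        known.update(ev.outputs)
    return missing


log = [
    Event("excavation", outputs=["vase"]),
    Event("restoration", inputs=["vase"], outputs=["vase_restored"]),
    Event("exhibition", inputs=["vase_restored", "loan_agreement"]),
]
print(dangling_references(log))  # [('exhibition', 'loan_agreement')]
```

In this toy log the exhibition event refers to a loan agreement that no earlier event recorded, so it is reported. Reordering the log so that the restoration precedes the excavation would also be reported, which is the situation the historical ordering is meant to prevent.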

D4.9 Guidelines on the use of Translation Memories in survey translation

Task 4.3 in WP4 (Innovations in Data Production) of the SSHOC project is dedicated to Applying Computer-Assisted Translation tools in Social Surveys. A key activity of this task is to incorporate newly created Translation Memories (TMs), built from a corpus developed in Task 4.2 (Preparing tools for the use of Computer Assisted Translation), into an open-source computer-assisted translation (CAT) environment. In addition, this report lays out a test case to demonstrate the feasibility of using TMs within a CAT environment.

D4.7 Code for data exchange between TMT and open-source CAT software

This document is a report accompanying the SSHOC D4.7 Code for data exchange between TMT and open-source CAT software. The team has explored possibilities for data exchange between TMT and CAT tools, specifically MateCat and MyMemory, and found three areas where such a connection would be worth developing. As a first exploration, the team has focussed on TMT and MyMemory for single-segment translation suggestions, resulting in the development of a demo tool.

The Automatic Verification Tool (AVT) enables the user to verify translations using Bilingual Word Embeddings and to report to the translators a set of translated questions to be re-checked.

The AVT imports the questions and makes use of a trained Bilingual Word Embeddings model. It generates the 10 best foreign-language translations of each English word.

Type: Demonstrator
Property: Processing & analysis
Accessible at: github
Demo
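
The idea of ranking candidate translations with bilingual word embeddings can be sketched as follows. This is an illustration only, not the AVT's code: it assumes that both languages share one vector space, uses tiny hand-made vectors instead of a trained model, and the re-check rule in `needs_recheck` is a hypothetical simplification. All function names are invented for the example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_translations(word_vec, target_vectors, k=10):
    """Return the k target-language words closest to word_vec."""
    ranked = sorted(
        target_vectors.items(),
        key=lambda item: cosine(word_vec, item[1]),
        reverse=True,
    )
    return [word for word, _ in ranked[:k]]


def needs_recheck(word_vec, translated_words, target_vectors, k=10):
    """Assumed rule: flag the item if none of the k best candidates
    appears among the words of the human translation."""
    candidates = set(top_k_translations(word_vec, target_vectors, k))
    return candidates.isdisjoint(translated_words)


# Toy vectors standing in for a trained bilingual model.
house = [1.0, 0.0]
german = {"Haus": [0.9, 0.1], "Hund": [0.0, 1.0], "Heim": [0.7, 0.3]}

print(top_k_translations(house, german, k=2))            # ['Haus', 'Heim']
print(needs_recheck(house, ["das", "Haus"], german, 2))  # False
print(needs_recheck(house, ["der", "Hund"], german, 2))  # True
```

A real model would hold vectors with hundreds of dimensions for each vocabulary entry, and the ranking would normally be done with a vectorised library rather than a Python loop.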

MS17 Open source CAT TM software selected

This report documents the selection criteria for an open-source Computer-Assisted Translation tool with Translation Memory functionality, which will be used in the translation research activities of Task 4.3 of the SSHOC project. The Task team describes the role of the milestone in the Task and the means of verification.

D4.3 Survey specific parallel corpora

This document describes the Multilingual Corpus of Survey Questionnaires (MCSQ), a database of survey questionnaire texts. The report summarizes technical information about Version 1.0 (Ada Lovelace) of the MCSQ, dated June 2020. It links to the repository containing the code and files that generate the database.

D4.19 Mapping of two indicative selected standards to the SSHOCro

This report documents the work undertaken within Task 4.7 (Modeling the SSHOC data life cycle) and describes the process of mapping the social science research metadata standards DDI Codebook and CMDI to the SSHOC Reference Ontology (SSHOCro). The resulting mapping rules are also documented.