This repository contains the demonstrator of the vocabulary hub created for the deployEMDS project.
It provides the following interfaces:
- Data Portal: Loads DCAT-AP feeds and the datasets they contain, and enables their mapping into RDF using YARRRML.
- RDF Portal: Displays the dataset distributions available as RDF, their linked profiles, and the option to export them according to a certain profile using the alignment pipelines.
- Alignment Pipelines: Displays the current alignment pipelines based on SPARQL Construct queries available in the system, as well as the option to add additional pipelines by providing a SPARQL Construct query to the system.
- Dataset Profile Registry: Provides an overview of the loaded dataset profiles, and their connected datasets and pipelines in the system.
Prior to running the demo, we need to set up some supporting services via Docker.
To run the local YARRRML mapping service, run the following Docker Compose file as `docker-compose.yml`:

```yaml
services:
  yarrrml-map:
    image: ghcr.io/dexagod/yarrrml-to-rml-service-docker:latest
    ports:
      - "3000:3000"
    environment:
      - PORT=3000
      - DEFAULT_SERIALIZATION=nquads
```

To run the local Oxigraph service, run the following Docker Compose file as `docker-compose.yml`:
```yaml
services:
  oxigraph:
    image: oxigraph/oxigraph:latest
    container_name: oxigraph
    ports:
      - "7878:7878"
    command: ["serve", "--bind", "0.0.0.0:7878", "--location", "/tmp/oxigraph", "--cors"]
    restart: "no"
```

To add an (LDES) DCAT-AP feed to the system:
- Navigate to the Data Portal page
- Click the "Add Feed" button
- Add the URL of the (LDES) DCAT-AP feed
- The system will automatically discover and list available datasets
For the demo, the feed at https://pod.rubendedecker.be/scholar/projects/deployEMDS/feeds/results-feed was added, for which the appropriate mappings have been pre-filled in the input fields.
Notes:
For the demo, added feeds and pipelines are stored internally in the webpage, and will need to be re-loaded when re-launching the application. When adding a feed, please select "Traffic Counting DCAT-AP Feed" as the target feed, and reload the webpage after doing the mapping; there is a small loading issue that I still intend to fix.
Once feeds are added, you can:
- Search datasets by title, description, or publisher
- Filter the datasets on keywords
- Select datasets for mapping
The loaded datasets that are not yet published in an RDF format can now be mapped to RDF using a YARRRML -> RML -> RDF pipeline in the Dataset RML Mapping component.
- Select the dataset(s) to map in the Dataset Browser component.
- Add a YARRRML mapping
- Select a mapping service to perform the YARRRML -> RML -> RDF conversion for your chosen dataset(s)
- Add a mapping target location, where the mapped RDF dataset should be POSTed.
- Select the feed to which this new distribution of the dataset should be added.
- Either select a profile, or create a new profile, under which the resulting RDF document is published
- After updating the feed, it is refreshed to load the new dataset.
Notes:
The current implementation treats every input resource as the source "data.json". This is for demo purposes; a solution still needs to be found for automatically mapping loaded resources to the sources defined in available mappings.
The mapping service can be found at https://github.com/Dexagod/yarrrml-to-rml-service-docker. You can run this locally and use the default URL.
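To illustrate, a minimal YARRRML mapping for a JSON source named "data.json" could look like the following sketch; the prefixes, iterator, and field names (`id`, `count`, `location`) are invented for this example and must be adapted to the actual dataset:

```yaml
prefixes:
  ex: "http://example.org/"
  sosa: "http://www.w3.org/ns/sosa/"

mappings:
  observation:
    sources:
      # iterate over every item in the (hypothetical) observations array
      - ['data.json~jsonpath', '$.observations[*]']
    s: ex:observation/$(id)
    po:
      - [a, sosa:Observation]
      - [sosa:hasSimpleResult, $(count)]
      - [ex:location, $(location)]
```

The mapping service translates such a YARRRML document into RML rules and executes them over the selected dataset(s).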
Similarly to the Data Portal page, datasets can be filtered using:
- Search on title, description, or publisher
- The feed component, to filter by feed
- The used keywords
- Selection of the resulting datasets in the dataset browser component
Exporting the chosen datasets is done with the Export Datasets component. Here, the selected datasets are exported by:
- loading the selected datasets
- (optional) mapping the resulting datasets into the target profile using the available pipelines
- loading the resulting datasets and mappings into the target graph store (the default URL is set up for a local Oxigraph service hosted in Docker)
- the target named graph in which the resulting datasets should be loaded can be changed, or left on the default graph
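The loading step can be sketched with plain HTTP against the local Oxigraph service, assuming Oxigraph's default endpoints (the Graph Store Protocol at `/store` and SPARQL queries at `/query`); the file name and graph IRI below are placeholders:

```shell
# Load a Turtle file into a named graph via the Graph Store Protocol
curl -X POST 'http://localhost:7878/store?graph=http://example.org/graphs/exported' \
  -H 'Content-Type: text/turtle' \
  --data-binary @exported-dataset.ttl

# Verify the load by counting the triples in that named graph
curl -X POST http://localhost:7878/query \
  -H 'Content-Type: application/sparql-query' \
  -H 'Accept: application/sparql-results+json' \
  --data 'SELECT (COUNT(*) AS ?triples) WHERE { GRAPH <http://example.org/graphs/exported> { ?s ?p ?o } }'
```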
To add a new alignment pipeline:
- Select the "Add pipeline" button in the Pipeline Sources component
- Enter a name for the new pipeline
- Select a source and target profile between which the pipeline performs an alignment
- Select the pipelines feed to which the new pipeline should be added
- Add relevant keywords
- Add the SPARQL Construct query that performs the profile alignment
- Select "Add pipeline" to finish the process
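As an illustration, a minimal alignment query could look as follows; the mapping from `schema:name` to `dct:title` is an invented example, not one of the demo's actual pipelines:

```sparql
PREFIX schema: <http://schema.org/>
PREFIX dct: <http://purl.org/dc/terms/>

# Align a source profile that uses schema:name
# to a target profile that expects dct:title
CONSTRUCT {
  ?dataset dct:title ?title .
}
WHERE {
  ?dataset schema:name ?title .
}
```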
The Pipeline Browser component enables browsing the available pipelines in the vocabulary hub feeds. This component shows the source and target profiles of each pipeline, as well as the SPARQL Construct query used to perform the alignment.
Notes:
Since the alignment is performed through a Docker container at the client or a data space service, methods other than SPARQL Construct can also be employed for this alignment.
This page keeps track of the dataset profiles used in the published datasets and alignment pipelines. The concept of a Dataset Profile is used to provide a comprehensive description of a dataset, based on the availability of both the ontologies used and the SHACL shapes assigned to the contents of a dataset.
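For example, a dataset profile could combine an ontology reference with a SHACL shape along the following lines; the shape below is a hypothetical sketch (the class and property choices are invented):

```turtle
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix ex: <http://example.org/> .

# Every observation in the dataset must carry exactly one result value
ex:ObservationShape
    a sh:NodeShape ;
    sh:targetClass sosa:Observation ;
    sh:property [
        sh:path sosa:hasSimpleResult ;
        sh:minCount 1 ;
        sh:maxCount 1 ;
    ] .
```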
The Vocabulary Hub operates at a point between the client and the server. The server component of the Vocabulary Hub keeps track of a set of "feeds" that are persisted, maintained and updated on the server. This includes the tracked DCAT-AP feeds, dataset profile alignment pipeline feeds, and any other data that should be persisted at ecosystem level.
Based on the availability of semantic data, dataset profile metadata and alignment pipelines, alignments can be performed both at the edge by the client, and by distributed services available in the data ecosystems.
The resulting resources of RML mapping or Semantic Alignment mapping processes can be re-published to the data space as alternative distributions of the same datasets using DCAT.
The role of the Vocabulary Hub is to work in tandem with the existing data space actors to facilitate the publishing and integration of semantically rich data in the data space. This demonstrator centralizes different parts of this process into a single Web interface, which can be separated into different components in the data space.
- Data portal: The data portal interface loading the DCAT-AP feeds in the ecosystem represents the role of the data catalog in the data spaces ecosystem. Here, datasets are added, shared and published. The mapping service represents a data publisher (or an automated service in the data catalog) performing a semantic mapping of a published dataset and publishing this mapped semantic dataset as a DCAT distribution of the original dataset, while including information about the semantic mapping as a dataset profile. This profile can either be pushed directly to a vocabulary hub component, or pulled indirectly by the vocabulary hub from the catalog publishing this metadata.
- RDF portal: The RDF portal interface provides an overview of the semantically enriched datasets available in the vocabulary hub, linking their used dataset profiles, and allowing the export of the available datasets based on a target profile description and the available alignment pipelines. This represents the combined role of multiple components: the data catalog storing the dataset metadata from which distributions of relevant datasets in an RDF format are retrieved, the vocabulary hub where the dataset profiles and alignments (in this case SPARQL Construct queries) between these profiles are stored, and the data consumer that imports the datasets and executes the alignments according to their data requirements.
- Alignment pipelines: The alignment pipelines interface provides an overview of the alignment information registered in the vocabulary hub. The execution of these pipelines takes the form of the data consumer retrieving the SPARQL Construct queries used to convert from a source to a target profile, executing them over the source inputs from the data catalog according to the pipeline's source and target profiles, and inserting the resulting RDF, in the aligned profile, into their local graph store.
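Under the assumption of a local Oxigraph store with its default endpoints (`/query` for SPARQL, `/store` for the Graph Store Protocol), this execution flow could be sketched as two HTTP calls; the query file and graph IRI are placeholders:

```shell
# 1. Run the pipeline's CONSTRUCT query over the loaded source data,
#    requesting the result as Turtle
curl -X POST http://localhost:7878/query \
  -H 'Content-Type: application/sparql-query' \
  -H 'Accept: text/turtle' \
  --data-binary @pipeline-alignment.rq \
  -o aligned.ttl

# 2. Insert the aligned RDF into the consumer's target named graph
curl -X POST 'http://localhost:7878/store?graph=http://example.org/graphs/aligned' \
  -H 'Content-Type: text/turtle' \
  --data-binary @aligned.ttl
```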
- Dataset profile registry: The dataset profile registry gives an overview of the profiles used in the dataset metadata and pipelines available in the data space. These can be persisted in the vocabulary hub service, or discovered ad hoc by processing the dataset and alignment metadata available in the vocabulary hub and data catalog.