diff --git a/pyst_client/example/Simple client library guide.ipynb b/pyst_client/example/Simple client library guide.ipynb new file mode 100644 index 0000000..91ab020 --- /dev/null +++ b/pyst_client/example/Simple client library guide.ipynb @@ -0,0 +1,1063 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6e9ec945-0fd6-4c10-a926-6ca79dda6ac1", + "metadata": {}, + "source": [ + "# PyST Simple Client\n", + "\n", + "The `pyst_client.simple` is a Python client library for simplifying the use of [PyST servers](https://docs.pyst.dev/), which use JSON-LD as their transport serialization format. This format is both verbose and bureaucratic, and `pyst_client.simple` does a lot of the paperwork for you." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "id": "4360c28f-a8b2-4129-a196-4791e45f5fc0", + "metadata": {}, + "outputs": [], + "source": [ + "from pyst_client.simple import *\n", + "import httpx, csv, io\n", + "from tqdm import tqdm" + ] + }, + { + "cell_type": "markdown", + "id": "4b63e360-61d3-4454-b3c7-e007b5dc3196", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "We first need to setup the client. This means we need to specify the following:\n", + "\n", + "* PyST server URL base path: `settings.set_server_url()`, e.g. `settings.set_server_url(\"https://vocab.bonsai.uno\")`. Trailing slash is optional.\n", + "* PyST API authentication key: `settings.set_api_key(`, e.g. `settings.set_api_key(\"supersecret\")`.\n", + "* Default creation language. This must be a [RFC 3987](https://datatracker.ietf.org/doc/html/rfc3987) language code, and should be one of the PyST server configured languages. All multilingual strings without language codes will use this language: `settings.set_language()`, e.g. `settings.set_language(\"es\")`.\n", + "* Creation base URL. The URL used as a base path for your object's IRIs when using automatic IRI generation: `settings.set_creation_base_url()`, e.g. `settings.set_creation_base_url(\"https://awesome.namespace.com\")`\n", + "* Default creator IRI. Will be used as a fallback default for all created objects when `creator` is not specific: `settings.set_creator()`, e.g. `settings.set_creator(\"https://valentin.stargazer\")`.\n", + "\n", + "## Data creation\n", + "\n", + "We follow the [PyST data model](https://docs.pyst.dev/data-model/), and won't go into detail on the structure or individual attributes here.\n", + "\n", + "The API code is pretty readable - refer to it for more information on each method. For most object classes, you can do the following:\n", + "\n", + "* `object_class.create(args)`\n", + "* `object_class.get_one(args)`\n", + "* `object_class.get_many(args)`\n", + "* `object_instance.save()`\n", + "* `object_instance.delete()`\n", + "\n", + "### Concept schemes\n", + "\n", + "The `ConceptScheme.create` method will create a `ConceptScheme` in memory, but *won't persist it to the server*." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "55515f82-afb0-441d-ae60-3fea674691e4", + "metadata": {}, + "outputs": [], + "source": [ + "cs = ConceptScheme.create(\n", + " pref_labels=[\"Central Product Classification\"],\n", + " version=\"2.1\",\n", + " notations=[\"CPCv2.1\"],\n", + " definitions=[\"CPC constitutes a comprehensive classification of all goods and services. CPC presents categories for all products that can be the object of domestic or international transactions or that can be entered into stocks. It includes products that are an output of economic activity, including transportable goods, non-transportable goods and services. CPC, as a standard central product classification, was developed to serve as an instrument for assembling and tabulating all kinds of statistics requiring product detail. Such statistics may cover production, intermediate and final consumption, capital formation, foreign trade or prices. They may refer to commodity flows, stocks or balances and may be compiled in the context of input/output tables, balance of payments and other analytical presentations. The CPC classifies products based on the physical characteristics of goods or on the nature of the services rendered. CPC was developed primarily to enhance harmonization among various fields of economic and related statistics and to strengthen the role of national accounts as an instrument for the coordination of economic statistics. It provides a basis for recompiling basic statistics from their original classifications into a standard classification for analytical use.\"],\n", + " creators=[{\"@id\": \"https://unstats.un.org/\"}]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "e88f9b1a-b16f-4ff7-a625-88718c5e6866", + "metadata": {}, + "source": [ + "Our IRI is automatically generated based on the creation base URL. My creation base URL was `https://ninja.space`:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "8faa1a66-726b-472c-be0b-d9283399a07a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'https://ninja.space/CPCv2.1'" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cs.id_" + ] + }, + { + "cell_type": "markdown", + "id": "bfc76fdb-a645-4818-8d76-a8bea30433cf", + "metadata": {}, + "source": [ + "If you want to create IRIs yourself, use `.create(..., id_=my_iri)`.\n", + "\n", + "The `.save()` method will try to create the object on the PyST server, and then switch to updating it if it already exists. It you know it exists already, you can use `.save(already_exists=True)` to save a bit of time." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "b9581b17-5d30-4670-8b4a-5e4513c5c6d7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[2m2025-05-07 08:29:57\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mServer URL http://192.168.1.137:8000 successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 08:29:57\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDefault language `en` successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 08:29:57\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mServer URL `http://192.168.1.137:8000` is healthy and reachable\u001b[0m\n", + "\u001b[2m2025-05-07 08:29:57\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAPI key successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 08:29:57\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCreation base URL successfully loaded from secrets directory\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cs.save()" + ] + }, + { + "cell_type": "markdown", + "id": "f9a11e8e-aca1-4bc0-8475-cc5a76e664b6", + "metadata": {}, + "source": [ + "We can change the `ConceptScheme` data and save the changed object:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "3f274b8d-a78b-4e45-9b9d-11e63ea8807e", + "metadata": {}, + "outputs": [], + "source": [ + "cs.notations = [\n", + " {\n", + " '@value': 'CPCv2.1',\n", + " '@type': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral'\n", + " },\n", + " {\n", + " '@value': 'CPC-latest',\n", + " '@type': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral' # Required for notations\n", + " },\n", + "]" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "e836eef4-425e-4fcd-a396-7212347072f7", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cs.save(already_exists=True)" + ] + }, + { + "cell_type": "markdown", + "id": "cf332e04-b09c-40bb-939c-6ee972c99482", + "metadata": {}, + "source": [ + "### Concepts\n", + "\n", + "We can now populate our `ConceptScheme` with `Concept` objects, following the same pattern as with `ConceptScheme`:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "9ca23d36-c2a1-400b-8964-98d0878682dc", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'https://ninja.space/CPCv2.1/0'" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "concept_0 = Concept.create(\n", + " concept_scheme=cs,\n", + " pref_labels=[\"Agriculture, forestry and fishery products\"],\n", + " notations=[\"0\"],\n", + " extra={\"http://rdf-vocabulary.ddialliance.org/xkos#depth\": 1},\n", + " top_concept=True,\n", + ")\n", + "concept_0.id_" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "42da8b90-bab0-4d41-9b40-4a6b2e8bc159", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "concept_0.save()" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "ad8920a3-dc52-4fda-b111-0d32f2aa8489", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'https://ninja.space/CPCv2.1/01'" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "concept_01 = Concept.create(\n", + " concept_scheme=cs,\n", + " pref_labels=[\"Products of agriculture, horticulture and market gardening\"],\n", + " notations=[\"01\"],\n", + " extra={\"http://rdf-vocabulary.ddialliance.org/xkos#depth\": 2}\n", + ")\n", + "concept_01.id_" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "74fa6929-0317-4128-9d49-54962810c151", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "concept_01.save()" + ] + }, + { + "cell_type": "markdown", + "id": "b9158ffb-60d1-4f0e-bb85-1ed883c32f5c", + "metadata": {}, + "source": [ + "We can now ask the `ConceptScheme` about its concepts:" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "27c25b88-a0db-475c-bcba-f37a2426d114", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Concept(id_='https://ninja.space/CPCv2.1/0', types=['http://www.w3.org/2004/02/skos/core#Concept'], pref_labels=[{'@value': 'Agriculture, forestry and fishery products', '@language': 'en'}], status=[{'@id': 'http://purl.org/ontology/bibo/status/accepted'}], notations=[{'@type': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral', '@value': '0'}], definitions=[], change_notes=[], history_notes=[{'http://purl.org/dc/terms/issued': [{'@type': 'http://www.w3.org/2001/XMLSchema#date', '@value': '2025-05-07T00:00:00'}], 'http://purl.org/dc/terms/creator': [{'@id': 'https://chris.mutel.org/'}], 'http://www.w3.org/1999/02/22-rdf-syntax-ns#value': [{'@value': 'Created by pyst-client version 1', '@language': 'en'}]}], editorial_notes=[], extra={'http://rdf-vocabulary.ddialliance.org/xkos#depth': 1}, schemes=[{'@id': 'https://ninja.space/CPCv2.1'}], alt_labels=[], hidden_labels=[], top_concept_of=[{'@id': 'https://ninja.space/CPCv2.1'}]),\n", + " Concept(id_='https://ninja.space/CPCv2.1/01', types=['http://www.w3.org/2004/02/skos/core#Concept'], pref_labels=[{'@value': 'Products of agriculture, horticulture and market gardening', '@language': 'en'}], status=[{'@id': 'http://purl.org/ontology/bibo/status/accepted'}], notations=[{'@type': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#PlainLiteral', '@value': '01'}], definitions=[], change_notes=[], history_notes=[{'http://purl.org/dc/terms/issued': [{'@type': 'http://www.w3.org/2001/XMLSchema#date', '@value': '2025-05-07T00:00:00'}], 'http://purl.org/dc/terms/creator': [{'@id': 'https://chris.mutel.org/'}], 'http://www.w3.org/1999/02/22-rdf-syntax-ns#value': [{'@value': 'Created by pyst-client version 1', '@language': 'en'}]}], editorial_notes=[], extra={'http://rdf-vocabulary.ddialliance.org/xkos#depth': 2}, schemes=[{'@id': 'https://ninja.space/CPCv2.1'}], alt_labels=[], hidden_labels=[], top_concept_of=[])]" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cs.concepts()" + ] + }, + { + "cell_type": "markdown", + "id": "fd82ded5-7902-43a4-b6d2-28a489d705f4", + "metadata": {}, + "source": [ + "We can also see the data on the server:" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "id": "f78a3cba-fd65-45f6-84c0-b4649c7453b9", + "metadata": {}, + "outputs": [], + "source": [ + "cs.open_new_tab()" + ] + }, + { + "cell_type": "markdown", + "id": "21a99a37-55cb-4948-9e7e-807db3551442", + "metadata": {}, + "source": [ + "### Relationships\n", + "\n", + "The two concepts we created are related - `concept_0` is broader than `concept_01`. We can create this relationship on the server.\n", + "\n", + "Note that *relationships* are directly persisted to the server. Relationships can't be modified, only created, and deleted." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "c12e25f8-8db7-4c53-9100-e2b0f6cee36b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[Relationship(source='https://ninja.space/CPCv2.1/01', target='https://ninja.space/CPCv2.1/0', predicate=)]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Relationship.create_many(\n", + " sources=[concept_01],\n", + " targets=[concept_0],\n", + " verbs=RelationshipVerbs.broader,\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "e3d2b326-0fb7-4736-b066-d902cba9a926", + "metadata": {}, + "source": [ + "We can now see the relationship on the server:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "id": "fda7e26c-65ef-4a94-95ca-3e9daa689ebc", + "metadata": {}, + "outputs": [], + "source": [ + "concept_01.open_new_tab()" + ] + }, + { + "cell_type": "markdown", + "id": "b384995f-7013-4be8-948a-40887ec6bf1a", + "metadata": {}, + "source": [ + "## Mass import\n", + "\n", + "Let's do a mass data import. We will take data from the GTDR/BONSAI project. We can start with their data on CPC, as we already have the `ConceptScheme`." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "5c7e33f9-8293-4339-a8f9-63b605941b5c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'code': '0113', 'name': 'Rice', 'parent_code': '011', 'level': '4'}" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "url = \"https://gitlab.com/bonsamurais/bonsai/util/classifications/-/raw/main/src/classifications/data/flow/flowobject/tree_cpc_2_1.csv?inline=false\"\n", + "response = httpx.get(url)\n", + "data = list(csv.DictReader(io.StringIO(response.text)))\n", + "data[10]" + ] + }, + { + "cell_type": "markdown", + "id": "010c286a-cf01-40c7-86ea-74acd6adeb84", + "metadata": {}, + "source": [ + "We first create the `Concept` objects. The `pyst_simple_client` library is synchronous, and does one task at a time. \n", + "\n", + "The `pyst_client` library is asyncronous and can have much better performance, but requires a bit more work as it has fewer helper classes. You would also need to work with the existing event loop if running code in a Jupyter notebook." + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "7ba66a83-56ed-4771-abd7-ec734452d69d", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████████████████████████████████████████████████████| 4597/4597 [04:11<00:00, 18.26it/s]\n" + ] + } + ], + "source": [ + "for row in tqdm(data):\n", + " Concept.create(\n", + " concept_scheme=cs,\n", + " pref_labels=[row[\"name\"]],\n", + " notations=[row[\"code\"]],\n", + " extra={\"http://rdf-vocabulary.ddialliance.org/xkos#depth\": int(row[\"level\"])},\n", + " top_concept=row[\"level\"] == \"0\",\n", + " ).save()" + ] + }, + { + "cell_type": "markdown", + "id": "41021cc5-9065-4c7c-8a81-82b750904187", + "metadata": {}, + "source": [ + "We can bulk-create the relationships. We don't have all the `Concept` object instances, but don't need them - `Relationship.create_many` can take IRI strings.\n", + "\n", + "We already defined one relationship from this data, but our helper function will remove this duplicate. Note that duplicate detection happens on the server, and each request only removes one duplicate, so this isn't efficient for many duplicate entries.\n", + "\n", + "You could also chunk the input data for this function." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "92a69963-ef88-4ba6-8cd7-fe9c129eab32", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[2m2025-05-07 09:48:44\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mSkipping existing relationship between `https://ninja.space/CPCv2.1/0` and `https://ninja.space/CPCv2.1/total`\u001b[0m\n", + "\u001b[2m2025-05-07 09:48:52\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mSkipping existing relationship between `https://ninja.space/CPCv2.1/01` and `https://ninja.space/CPCv2.1/0`\u001b[0m\n", + "\u001b[2m2025-05-07 09:49:00\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mSkipping existing relationship between `https://ninja.space/CPCv2.1/011` and `https://ninja.space/CPCv2.1/01`\u001b[0m\n", + "\u001b[2m2025-05-07 09:49:07\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mSkipping existing relationship between `https://ninja.space/CPCv2.1/0111` and `https://ninja.space/CPCv2.1/011`\u001b[0m\n", + "\u001b[2m2025-05-07 09:49:17\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mSkipping existing relationship between `https://ninja.space/CPCv2.1/01111` and `https://ninja.space/CPCv2.1/0111`\u001b[0m\n", + "\u001b[2m2025-05-07 09:49:27\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mSkipping existing relationship between `https://ninja.space/CPCv2.1/01121` and `https://ninja.space/CPCv2.1/0112`\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "4560" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sources = [Concept.generate_iri(concept_scheme=cs, notations=[row[\"code\"]]) for row in data if row[\"parent_code\"]]\n", + "targets = [Concept.generate_iri(concept_scheme=cs, notations=[row[\"parent_code\"]]) for row in data if row[\"parent_code\"]]\n", + " \n", + "responses = Relationship.create_many(\n", + " sources=sources,\n", + " targets=targets,\n", + " verbs=RelationshipVerbs.broader,\n", + " timeout=120\n", + ")\n", + "len(responses)" + ] + }, + { + "cell_type": "markdown", + "id": "60141c09-f8a7-478b-a64a-6b6a11da1b8e", + "metadata": {}, + "source": [ + "We can do the same for another `ConceptScheme` - this time the set of BONSAI flow objects:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "c1c07346-5e2a-4aa5-a79a-7613f0478130", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'code': 'fi_01121',\n", + " 'parent_code': 'fi_0112',\n", + " 'in_final_sut': '',\n", + " 'name': 'Maize (corn), seed',\n", + " 'level': '5',\n", + " 'default_unit': 'tonnes',\n", + " 'alias_code': '',\n", + " 'comment': ''}" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "url = \"https://gitlab.com/bonsamurais/bonsai/util/classifications/-/raw/main/src/classifications/data/flow/flowobject/tree_bonsai.csv?inline=false\"\n", + "response = httpx.get(url)\n", + "data = list(csv.DictReader(io.StringIO(response.text)))\n", + "data[10]" + ] + }, + { + "cell_type": "markdown", + "id": "bba4e130-10ea-4765-9df9-a4c722e39da1", + "metadata": {}, + "source": [ + "This has additional attributes. We can just add these to the `extra` section. They could be defined with reference to a proper RDF ontology, but don't need to be." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "id": "08bad665-5e64-4c4f-a49a-194308d211ea", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[2m2025-05-07 12:27:59\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mServer URL http://192.168.1.137:8000 successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 12:27:59\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDefault language `en` successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 12:27:59\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mServer URL `http://192.168.1.137:8000` is healthy and reachable\u001b[0m\n", + "\u001b[2m2025-05-07 12:27:59\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mAPI key successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 12:27:59\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mCreation base URL successfully loaded from secrets directory\u001b[0m\n" + ] + }, + { + "data": { + "text/plain": [ + "'https://ninja.space/BONSAI2025.1'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "cs_bonsai = ConceptScheme.create(\n", + " pref_labels=[\"BONSAI Flow Object classification\"],\n", + " version=\"2025.1\",\n", + " notations=[\"BONSAI2025.1\"],\n", + " definitions=[\"BONSAI is a work output of the GTDR project. This classification extends EXIOBASE with much more detail on individual products. Original data from https://gitlab.com/bonsamurais/bonsai/util/classifications/-/blob/main/src/classifications/data/flow/flowobject/tree_bonsai.csv\"],\n", + " creators=[\n", + " {\"@id\": \"https://gitlab.com/mabudz\"},\n", + " {\"@id\": \"https://gitlab.com/matdelpierre\"},\n", + " {\"@id\": \"https://gitlab.com/Albertkwame\"},\n", + " {\"@id\": \"https://gitlab.com/SanderNielen\"},\n", + " ]\n", + ")\n", + "cs_bonsai.save()\n", + "cs_bonsai.id_" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "a04d4a99-9592-4b84-a967-480aad4b5573", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████████████████████████████████████████████████████| 7440/7440 [07:15<00:00, 17.10it/s]\n" + ] + } + ], + "source": [ + "from tqdm import tqdm\n", + "\n", + "for row in tqdm(data):\n", + " Concept.create(\n", + " concept_scheme=cs_bonsai,\n", + " pref_labels=[row[\"name\"]],\n", + " notations=[row[\"code\"]],\n", + " extra={\n", + " \"http://rdf-vocabulary.ddialliance.org/xkos#depth\": int(row[\"level\"]),\n", + " \"in_final_sut\": row[\"in_final_sut\"] == \"True\",\n", + " \"default_unit\": row[\"default_unit\"],\n", + " \"alias_code\": row[\"alias_code\"],\n", + " \"comment\": row[\"comment\"],\n", + " },\n", + " top_concept=row[\"level\"] == \"0\",\n", + " ).save()" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "c4cf3bb7-11ea-4f78-914c-51d2961367c2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "7430" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sources = [Concept.generate_iri(concept_scheme=cs_bonsai, notations=[row[\"code\"]]) for row in data if row[\"parent_code\"]]\n", + "targets = [Concept.generate_iri(concept_scheme=cs_bonsai, notations=[row[\"parent_code\"]]) for row in data if row[\"parent_code\"]]\n", + " \n", + "responses = Relationship.create_many(\n", + " sources=sources,\n", + " targets=targets,\n", + " verbs=RelationshipVerbs.broader,\n", + " timeout=240\n", + ")\n", + "len(responses)" + ] + }, + { + "cell_type": "markdown", + "id": "49e0a78b-1652-4c6f-b6f9-20bf38bfdc2b", + "metadata": {}, + "source": [ + "### Correspondence\n", + "\n", + "We can now define a `Correspondence` between the two `ConceptScheme` objects." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "2ebc0254-536d-4dc9-b3f4-855fde690800", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'https://ninja.space/CPCv2.1-BONSAI2025.1'" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "correspondence = Correspondence.create(\n", + " compares=[cs, cs_bonsai],\n", + " pref_labels=[\"BONSAI Flow Object classification\"],\n", + " version=\"2025.1\",\n", + " definitions=[\"\"],\n", + " creators=[\n", + " {\"@id\": \"https://gitlab.com/mabudz\"},\n", + " {\"@id\": \"https://gitlab.com/SanderNielen\"},\n", + " ]\n", + ")\n", + "correspondence.save()\n", + "correspondence.id_" + ] + }, + { + "cell_type": "markdown", + "id": "1d34349f-e95e-4629-a1c9-8db3105bd087", + "metadata": {}, + "source": [ + "### Associations\n", + "\n", + "`Correspondence` objects are made of concept `Association`. We can read the association data from the BONSAI Gitlab:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "75e7e12f-7aee-4bd0-85aa-4bcb5126b6b4", + "metadata": {}, + "outputs": [], + "source": [ + "import httpx, csv, io" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "c35a0bab-2623-4b18-ba0d-e136d831e752", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'flowobject_from': 'fi_01121',\n", + " 'flowobject_to': '01121',\n", + " 'classification_from': 'bonsai',\n", + " 'classification_to': 'cpc_2_1',\n", + " 'comment': 'one-to-one correspondence',\n", + " 'skos_uri': 'http://www.w3.org/2004/02/skos/core#exactMatch'}" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "url = \"https://gitlab.com/bonsamurais/bonsai/util/classifications/-/raw/main/src/classifications/data/flow/flowobject/conc_bonsai_cpc_2_1.csv?inline=false\"\n", + "response = httpx.get(url)\n", + "data = list(csv.DictReader(io.StringIO(response.text)))\n", + "row = data[10]\n", + "row" + ] + }, + { + "cell_type": "markdown", + "id": "9fd6de61-bbca-49f0-9f49-1658ead0b99b", + "metadata": {}, + "source": [ + "Let's create a single `Association` to show its pattern. Note that `Association` objects can't be updated once created - they need to be deleted and re-created instead." + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "7b68eea5-a6f1-4c2e-8b8e-23480e7a0728", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'https://ninja.space/CPCv2.1-BONSAI2025.1/6fbf961723fd4195a44b3149b6d16add'" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "assoc = Association.create(\n", + " correspondence=correspondence,\n", + " source_concepts=[\n", + " # Can also be a `Concept` instance\n", + " {\"@id\": Concept.generate_iri(concept_scheme=cs_bonsai, notations=[row[\"flowobject_from\"]])}\n", + " ], \n", + " target_concepts=[\n", + " {\"@id\": Concept.generate_iri(concept_scheme=cs, notations=[row[\"flowobject_to\"]])}\n", + " ],\n", + " extra={\n", + " \"mapping_type\": row[\"skos_uri\"],\n", + " \"comment\": row[\"comment\"],\n", + " }\n", + ")\n", + "assoc.save()\n", + "assoc.id_" + ] + }, + { + "cell_type": "markdown", + "id": "d53c5a9b-8b5b-42d6-bff6-7d5f0bc7a4b4", + "metadata": {}, + "source": [ + "Creating an `Association` adds it automatically to the corresponding `Correspondence` object. We need to reload this from the server as our version has stale data:" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "id": "3802b82e-ff44-4286-9993-c6b0e4b9d99e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[{'@id': 'https://ninja.space/CPCv2.1-BONSAI2025.1/3584a98600994a23a90f13c1651f2458'},\n", + " {'@id': 'https://ninja.space/CPCv2.1-BONSAI2025.1/6fbf961723fd4195a44b3149b6d16add'}]" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "correspondence = Correspondence.get_one(correspondence.id_)\n", + "correspondence.made_ofs" + ] + }, + { + "cell_type": "markdown", + "id": "f57b389a-1400-4504-bae6-09d54aa9c5f5", + "metadata": {}, + "source": [ + "We can delete this association instance and mass import the file:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "id": "9a62dda8-8d56-4c1c-9b7a-644751cc2d81", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "assoc.delete()" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "id": "b276d1d5-e7ae-4089-9f7c-6581ebbe8180", + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "100%|██████████████████████████████████████████████████████████| 6194/6194 [04:14<00:00, 24.32it/s]\n" + ] + } + ], + "source": [ + "for row in tqdm(data):\n", + " Association.create(\n", + " correspondence=correspondence,\n", + " source_concepts=[\n", + " # Can also be a `Concept` instance\n", + " {\"@id\": Concept.generate_iri(concept_scheme=cs_bonsai, notations=[row[\"flowobject_from\"]])}\n", + " ], \n", + " target_concepts=[\n", + " {\"@id\": Concept.generate_iri(concept_scheme=cs, notations=[row[\"flowobject_to\"]])}\n", + " ],\n", + " extra={\n", + " \"mapping_type\": row[\"skos_uri\"],\n", + " \"comment\": row[\"comment\"],\n", + " }\n", + " ).save()" + ] + }, + { + "cell_type": "markdown", + "id": "0b5643be-b1e1-4c00-8672-c5ea13d4e7af", + "metadata": {}, + "source": [ + "## Using the server data\n", + "\n", + "Once data has been uploaded to the server, we can query it during out scripts, and let our users browse the classification data. You might want to load the server webpage to see the data which is now there.\n", + "\n", + "The main way we will interact with the server is via `Concept` objects. You can get a single `Concept` with:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "6005c966-12a7-49c2-aebc-281734a9afeb", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[2m2025-05-07 12:06:19\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mServer URL http://192.168.1.137:8000 successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 12:06:19\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDefault language `en` successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 12:06:19\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mServer URL `http://192.168.1.137:8000` is healthy and reachable\u001b[0m\n" + ] + } + ], + "source": [ + "my_concept = Concept.get_one(concept_01.id_)" + ] + }, + { + "cell_type": "markdown", + "id": "05ba6c07-1052-41c0-9572-e00c8320cf89", + "metadata": {}, + "source": [ + "We can ask our concepts about their hierarchical relationships:" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "id": "834cfc4f-818c-4aa3-a3e0-f7684451ef99", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[Relationship(source='https://ninja.space/CPCv2.1/01', target='https://ninja.space/CPCv2.1/0', predicate=)]]" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_concept.relationships()" + ] + }, + { + "cell_type": "markdown", + "id": "441bbe7b-85b9-4ea6-9fb1-ea30f0b58125", + "metadata": {}, + "source": [ + "By default `.relationships()` only looks for `Relationship` objects where the originating `Concept` is the source, but we can also look for targets with `.relationships(target=True)`" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "id": "f249d641-579d-449e-8cc9-f3d94548b12f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[[Relationship(source='https://ninja.space/CPCv2.1/01', target='https://ninja.space/CPCv2.1/0', predicate=)],\n", + " [Relationship(source='https://ninja.space/CPCv2.1/011', target='https://ninja.space/CPCv2.1/01', predicate=)]]" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_concept.relationships(target=True)" + ] + }, + { + "cell_type": "markdown", + "id": "bfc167e0-91dc-42b8-ae6c-6424d8b714d9", + "metadata": {}, + "source": [ + "We can also ask about associations:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "fc8c16f9-1300-4bf1-99b7-17de29c82ff9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[2m2025-05-07 12:28:16\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mServer URL http://192.168.1.137:8000 successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 12:28:16\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mDefault language `en` successfully loaded from secrets directory\u001b[0m\n", + "\u001b[2m2025-05-07 12:28:16\u001b[0m [\u001b[32m\u001b[1minfo \u001b[0m] \u001b[1mServer URL `http://192.168.1.137:8000` is healthy and reachable\u001b[0m\n", + "Source: [Association(id_='https://ninja.space/CPCv2.1-BONSAI2025.1/88a1b03df77e4e769ce26c6967b72f55', types=['http://rdf-vocabulary.ddialliance.org/xkos#ConceptAssociation'], source_concepts=[{'@id': 'https://ninja.space/BONSAI2025.1/fi_12'}], target_concepts=[{'@id': 'https://ninja.space/CPCv2.1/12'}], kind=, extra={})]\n", + "Target: []\n" + ] + }, + { + "data": { + "text/plain": [ + "[Association(id_='https://ninja.space/CPCv2.1-BONSAI2025.1/88a1b03df77e4e769ce26c6967b72f55', types=['http://rdf-vocabulary.ddialliance.org/xkos#ConceptAssociation'], source_concepts=[{'@id': 'https://ninja.space/BONSAI2025.1/fi_12'}], target_concepts=[{'@id': 'https://ninja.space/CPCv2.1/12'}], kind=, extra={})]" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "Concept.get_one(\n", + " Concept.generate_iri(concept_scheme=cs_bonsai, notations=[\"fi_12\"])\n", + ").associations(target=True)" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.13.2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/pyst_client/simple/__init__.py b/pyst_client/simple/__init__.py new file mode 100644 index 0000000..39722b1 --- /dev/null +++ b/pyst_client/simple/__init__.py @@ -0,0 +1,19 @@ +__all__ = ( + "settings", + "ConceptScheme", + "Concept", + "Correspondence", + "Relationship", + "RelationshipVerbs", + "AssociationKind", + "Association", +) + +from py_semantic_taxonomy.domain.constants import AssociationKind, RelationshipVerbs + +from pyst_client.simple.classes.association import Association +from pyst_client.simple.classes.concept import Concept +from pyst_client.simple.classes.concept_scheme import ConceptScheme +from pyst_client.simple.classes.correspondence import Correspondence +from pyst_client.simple.classes.relationship import Relationship +from pyst_client.simple.settings import settings diff --git a/pyst_client/simple/classes/__init__.py b/pyst_client/simple/classes/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/pyst_client/simple/classes/api_base.py b/pyst_client/simple/classes/api_base.py new file mode 100644 index 0000000..304aca2 --- /dev/null +++ b/pyst_client/simple/classes/api_base.py @@ -0,0 +1,71 @@ +import webbrowser +from urllib.parse import urljoin + +import orjson +from py_semantic_taxonomy.domain.url_utils import get_full_api_path + +from pyst_client.simple.client import APIError, get_read_client, get_write_client +from pyst_client.simple.settings import settings + + +class APIBase: + @property + def api_url(self) -> str: + return urljoin(settings.server_url, self.api_path) + + @property + def web_url(self) -> str: + return urljoin(settings.server_url, self.web_path) + + def open_new_tab(self) -> None: + webbrowser.open_new_tab(self.web_url) + + def json(self) -> bytes: + return orjson.dumps(self.to_json_ld()) + + def save(self, already_exists: bool = False) -> None: + client = get_write_client() + if not already_exists: + try: + return client.post( + self.api_path, + data=self.json(), + ) + except APIError as err: + if err.status_code != 409: + raise err + return client.put( + self.api_path, + data=[self.json()], + ) + + def delete(self) -> None: + client = get_write_client() + return client.delete( + self.api_path, + ) + + @classmethod + def _get_many( + cls, + url_label: str, + params: dict = {}, + timeout: int = 5, + ) -> list[dict]: + return get_read_client().get( + get_full_api_path(url_label), + params=params, + timeout=timeout, + ) + + @classmethod + def _get_one( + cls, + url_label: str, + iri: str, + timeout: int = 5, + ) -> dict: + return get_read_client().get( + get_full_api_path(url_label, iri=iri), + timeout=timeout, + ) diff --git a/pyst_client/simple/classes/association.py b/pyst_client/simple/classes/association.py new file mode 100644 index 0000000..4931951 --- /dev/null +++ b/pyst_client/simple/classes/association.py @@ -0,0 +1,159 @@ +import uuid + +import orjson +import structlog +from py_semantic_taxonomy.adapters.routers import request_dto as req +from py_semantic_taxonomy.domain import entities +from py_semantic_taxonomy.domain.constants import XKOS, AssociationKind +from py_semantic_taxonomy.domain.url_utils import get_full_api_path + +from pyst_client.simple.classes.api_base import APIBase +from pyst_client.simple.classes.concept import Concept +from pyst_client.simple.classes.correspondence import Correspondence +from pyst_client.simple.client import get_write_client +from pyst_client.simple.settings import can_write, settings + +logger = structlog.get_logger("pyst-client") + + +class Association(entities.Association, APIBase): + def open_new_tab(self) -> None: + raise NotImplementedError + + @property + def api_path(self) -> str: + return get_full_api_path("association", iri=self.id_) + + @classmethod + def get_one( + cls, + iri: str, + timeout: int = 5, + ) -> "Association": + return cls.from_json_ld( + orjson.loads(cls._get_one("association", iri, timeout=timeout).text) + ) + + @classmethod + def get_many( + cls, + correspondence: Correspondence | str | None = None, + source_concept: Concept | str | None = None, + target_concept: Concept | str | None = None, + kind: AssociationKind | None = None, + timeout: int = 5, + ) -> list["Concept"]: + params = { + "correspondence_iri": ( + correspondence.id_ + if isinstance(correspondence, Correspondence) + else correspondence + ), + "source_concept_iri": ( + source_concept.id_ + if isinstance(source_concept, Concept) + else source_concept + ), + "target_concept_iri": ( + target_concept.id_ + if isinstance(target_concept, Concept) + else target_concept + ), + "kind": kind or AssociationKind.simple, + } + return [ + cls.from_json_ld(obj) + for obj in orjson.loads( + cls._get_many( + "association_all", + params={ + key: value for key, value in params.items() if value is not None + }, + timeout=timeout, + ).text + ) + ] + + @classmethod + @can_write + def create( + cls, + *, + correspondence: Correspondence, + source_concepts: list[Concept | dict], + target_concepts: list[Concept | dict], + extra: dict | None = None, + id_: str | None = None, + ) -> "Concept": + if id_ is None: + id_ = "{}{}{}/{}".format( + settings.creation_base_url, + "" if settings.creation_base_url.endswith("/") else "/", + correspondence.notations[0]["@value"], + uuid.uuid4().hex, + ) + extra = extra or {} + extra[f"{XKOS}Correspondence"] = correspondence.id_ + assoc = cls( + id_=id_, + types=[f"{XKOS}ConceptAssociation"], + source_concepts=[ + ({"@id": obj.id_} if isinstance(obj, Concept) else obj) + for obj in source_concepts + ], + target_concepts=[ + ({"@id": obj.id_} if isinstance(obj, Concept) else obj) + for obj in target_concepts + ], + kind=( + AssociationKind.simple + if len(source_concepts) == 1 + else AssociationKind.conditional + ), + extra=extra or {}, + ) + req.Association(**assoc.to_json_ld()) + return assoc + + def save(self) -> None: + client = get_write_client() + client.post( + self.api_path, + data=self.json(), + ) + # Add association to correspondence + if f"{XKOS}Correspondence" in self.extra: + client.post( + get_full_api_path("made_of"), + data=orjson.dumps( + entities.MadeOf( + id_=self.extra[f"{XKOS}Correspondence"], + made_ofs=[{"@id": self.id_}], + ).to_json_ld() + ), + ) + else: + logger.warning( + "Unable to automatically add this association to a `Correspondence` object" + ) + + def delete(self) -> None: + client = get_write_client() + return client.delete( + self.api_path, + ) + if f"{XKOS}Correspondence" in self.extra: + client.request( + url=get_full_api_path("made_of"), + method="DELETE", + content=orjson.dumps( + entities.MadeOf( + id_=self.extra[f"{XKOS}Correspondence"], + made_ofs=[{"@id": self.id_}], + ).to_json_ld() + ), + ) + else: + logger.warning( + "Unable to automatically remove this association from a `Correspondence` object" + ) diff --git a/pyst_client/simple/classes/concept.py b/pyst_client/simple/classes/concept.py new file mode 100644 index 0000000..4b00945 --- /dev/null +++ b/pyst_client/simple/classes/concept.py @@ -0,0 +1,217 @@ +import datetime +from urllib.parse import quote + +import orjson +from py_semantic_taxonomy.adapters.routers import request_dto as req +from py_semantic_taxonomy.adapters.routers.validation import ( + DateTime, + DateTimeType, + MultilingualString, + Notation, + Status, + StatusChoice, +) +from py_semantic_taxonomy.domain import entities +from py_semantic_taxonomy.domain.constants import SKOS, AssociationKind +from py_semantic_taxonomy.domain.url_utils import get_full_api_path + +from pyst_client.simple.classes.api_base import APIBase +from pyst_client.simple.classes.concept_scheme import ConceptScheme +from pyst_client.simple.settings import can_write, settings + + +class Concept(entities.Concept, APIBase): + @property + def api_path(self) -> str: + return get_full_api_path("concept", iri=self.id_) + + @property + def web_path(self) -> str: + return ( + "web/concept/" + + quote(self.id_, safe="") + + "?language=" + + settings.default_language + ) + + @classmethod + def get_one( + cls, + iri: str, + timeout: int = 5, + ) -> "Concept": + return cls.from_json_ld( + orjson.loads(cls._get_one("concept", iri, timeout=timeout).text) + ) + + @classmethod + def get_many( + cls, + concept_scheme_iri: str | None = None, + top_concepts_only: bool = False, + timeout: int = 5, + ) -> list["Concept"]: + return [ + cls.from_json_ld(obj) + for obj in orjson.loads( + cls._get_many( + "concept_all", + params={ + "concept_scheme_iri": concept_scheme_iri, + "top_concepts_only": top_concepts_only, + }, + timeout=timeout, + ).text + ) + ] + + @classmethod + @can_write + def generate_iri( + cls, *, concept_scheme: ConceptScheme, notations: list[str | Notation] = [] + ): + if not concept_scheme.notations: + raise ValueError( + "No concept scheme `notations` provided; automatic id generation requires the at least one concept scheme notation" + ) + if not notations: + raise ValueError( + "No `notations` provided; automatic id generation requires at least one concept notation" + ) + return "{}{}{}/{}".format( + settings.creation_base_url, + "" if settings.creation_base_url.endswith("/") else "/", + concept_scheme.notations[0]["@value"], + notations[0], + ) + + @classmethod + @can_write + def create( + cls, + *, + concept_scheme: ConceptScheme, + pref_labels: list[str | MultilingualString], + alt_labels: list[str | MultilingualString] = [], + hidden_labels: list[str | MultilingualString] = [], + notations: list[str | Notation] = [], + definitions: list[str | MultilingualString] = [], + status: Status = Status(**{"@id": StatusChoice.accepted}), + top_concept: bool = False, + extra: dict | None = None, + creation_message: str | None = None, + id_: str | None = None, + ) -> "Concept": + if id_ is None: + id_ = cls.generate_iri(concept_scheme=concept_scheme, notations=notations) + + concept = cls( + id_=id_, + schemes=[{"@id": concept_scheme.id_}], + types=[f"{SKOS}Concept"], + status=[status.model_dump(by_alias=True)], + notations=[ + ( + Notation(**{"@value": obj}) if isinstance(obj, str) else obj + ).model_dump(by_alias=True) + for obj in notations + ], + pref_labels=[ + ( + {"@value": obj, "@language": settings.default_language} + if isinstance(obj, str) + else obj.model_dump(by_alias=True) + ) + for obj in pref_labels + ], + alt_labels=[ + ( + {"@value": obj, "@language": settings.default_language} + if isinstance(obj, str) + else obj.model_dump(by_alias=True) + ) + for obj in alt_labels + ], + hidden_labels=[ + ( + {"@value": obj, "@language": settings.default_language} + if isinstance(obj, str) + else obj.model_dump(by_alias=True) + ) + for obj in hidden_labels + ], + definitions=[ + ( + {"@value": obj, "@language": settings.default_language} + if isinstance(obj, str) + else obj.model_dump(by_alias=True) + ) + for obj in definitions + ], + top_concept_of=[{"@id": concept_scheme.id_}] if top_concept else [], + extra={} if extra is None else extra, + history_notes=[ + { + "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": [ + { + "@language": ( + "en" + if not creation_message + else settings.default_language + ), + "@value": creation_message + or "Created by pyst-client version 1", + } + ], + "http://purl.org/dc/terms/creator": [{"@id": settings.creator}], + "http://purl.org/dc/terms/issued": [ + DateTime( + **{ + "@type": DateTimeType.date, + "@value": datetime.date.today(), + } + ).model_dump(by_alias=True) + ], + } + ], + ) + req.Concept(**concept.to_json_ld()) + return concept + + def relationships( + self, source: bool = True, target: bool = False, timeout: int = 5 + ) -> list: + from pyst_client.simple.classes.relationship import Relationship + + return Relationship.get_many(concept=self, source=source, target=target) + + def associations( + self, + correspondence=None, + source: bool = True, + target: bool = False, + kind: AssociationKind | None = None, + timeout: int = 5, + ) -> list: + from pyst_client.simple.classes.association import Association + + results = [] + if source: + results.extend( + Association.get_many( + correspondence=correspondence, + source_concept=self, + kind=kind, + timeout=timeout, + ) + ) + if target: + results.extend( + Association.get_many( + correspondence=correspondence, + target_concept=self, + kind=kind, + timeout=timeout, + ) + ) + return results diff --git a/pyst_client/simple/classes/concept_scheme.py b/pyst_client/simple/classes/concept_scheme.py new file mode 100644 index 0000000..535a396 --- /dev/null +++ b/pyst_client/simple/classes/concept_scheme.py @@ -0,0 +1,156 @@ +import datetime +from urllib.parse import quote + +import orjson +from py_semantic_taxonomy.adapters.routers import request_dto as req +from py_semantic_taxonomy.adapters.routers.validation import ( + DateTime, + DateTimeType, + MultilingualString, + Notation, + Status, + StatusChoice, + VersionString, +) +from py_semantic_taxonomy.domain import entities +from py_semantic_taxonomy.domain.constants import SKOS +from py_semantic_taxonomy.domain.url_utils import get_full_api_path + +from pyst_client.simple.classes.api_base import APIBase +from pyst_client.simple.settings import can_write, settings + + +class ConceptScheme(entities.ConceptScheme, APIBase): + def concepts(self, top_concepts_only: bool = False) -> list: + from pyst_client.simple.classes.concept import Concept + + return Concept.get_many( + concept_scheme_iri=self.id_, top_concepts_only=top_concepts_only + ) + + @property + def api_path(self) -> str: + return get_full_api_path("concept_scheme", iri=self.id_) + + @property + def web_path(self) -> str: + return ( + "web/concept_scheme/" + + quote(self.id_, safe="") + + "?language=" + + settings.default_language + ) + + @classmethod + def get_one( + cls, + iri: str, + timeout: int = 5, + ) -> "ConceptScheme": + return cls.from_json_ld( + orjson.loads(cls._get_one("concept_scheme", iri, timeout=timeout).text) + ) + + @classmethod + def get_many( + cls, + timeout: int = 5, + ) -> list["ConceptScheme"]: + return [ + cls.from_json_ld(obj) + for obj in orjson.loads( + cls._get_many("concept_scheme_all", timeout=timeout).text + ) + ] + + @classmethod + @can_write + def create( + cls, + *, + pref_labels: list[str | MultilingualString], + version: str | VersionString = "1.0", + status: Status = Status(**{"@id": StatusChoice.accepted}), + notations: list[str | Notation] = [], + creation_message: str | None = None, + definitions: list[str | MultilingualString] = [], + id_: str | None = None, + extra: dict | None = None, + creators: list[dict] = [], + ) -> "ConceptScheme": + if id_ is None: + if not notations: + raise ValueError( + "No concept scheme `notations` provided; automatic id generation requires the at least one concept scheme notation" + ) + id_ = "{}{}{}".format( + settings.creation_base_url, + "" if settings.creation_base_url.endswith("/") else "/", + notations[0], + ) + if isinstance(version, str): + version = [VersionString(**{"@value": version}).model_dump(by_alias=True)] + if not creators: + creators = [{"@id": settings.creator}] + + scheme = cls( + id_=id_, + types=[f"{SKOS}ConceptScheme"], + status=[status.model_dump(by_alias=True)], + notations=[ + Notation(**{"@value": obj} if isinstance(obj, str) else obj).model_dump( + by_alias=True + ) + for obj in notations + ], + pref_labels=[ + ( + {"@value": obj, "@language": settings.default_language} + if isinstance(obj, str) + else obj.model_dump(by_alias=True) + ) + for obj in pref_labels + ], + definitions=[ + ( + {"@value": obj, "@language": settings.default_language} + if isinstance(obj, str) + else obj.model_dump(by_alias=True) + ) + for obj in definitions + ], + extra=extra or {}, + history_notes=[ + { + "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": [ + { + "@language": ( + "en" + if not creation_message + else settings.default_language + ), + "@value": creation_message + or "Created by pyst-client version 1", + } + ], + "http://purl.org/dc/terms/creator": [{"@id": settings.creator}], + "http://purl.org/dc/terms/issued": [ + DateTime( + **{ + "@type": DateTimeType.date, + "@value": datetime.date.today(), + } + ).model_dump(by_alias=True) + ], + } + ], + creators=creators, + version=version, + created=[ + DateTime( + **{"@type": DateTimeType.date, "@value": datetime.date.today()} + ).model_dump(by_alias=True) + ], + ) + req.ConceptScheme(**scheme.to_json_ld()) + return scheme diff --git a/pyst_client/simple/classes/correspondence.py b/pyst_client/simple/classes/correspondence.py new file mode 100644 index 0000000..ccfe30d --- /dev/null +++ b/pyst_client/simple/classes/correspondence.py @@ -0,0 +1,143 @@ +import datetime + +import orjson +from py_semantic_taxonomy.adapters.routers import request_dto as req +from py_semantic_taxonomy.adapters.routers.validation import ( + DateTime, + DateTimeType, + MultilingualString, + Notation, + Status, + StatusChoice, + VersionString, +) +from py_semantic_taxonomy.domain import entities +from py_semantic_taxonomy.domain.constants import XKOS +from py_semantic_taxonomy.domain.url_utils import get_full_api_path + +from pyst_client.simple.classes.api_base import APIBase +from pyst_client.simple.classes.concept_scheme import ConceptScheme +from pyst_client.simple.settings import can_write, settings + + +class Correspondence(entities.Correspondence, APIBase): + @property + def api_path(self) -> str: + return get_full_api_path("correspondence", iri=self.id_) + + def open_new_tab(self) -> None: + raise NotImplementedError + + @classmethod + def get_one( + cls, + iri: str, + timeout: int = 5, + ) -> "Correspondence": + return cls.from_json_ld( + orjson.loads(cls._get_one("correspondence", iri, timeout=timeout).text) + ) + + @classmethod + def get_many( + cls, + timeout: int = 5, + ) -> list["Correspondence"]: + return [ + cls.from_json_ld(obj) + for obj in orjson.loads( + cls._get_many("correspondence_all", timeout=timeout).text + ) + ] + + @classmethod + @can_write + def create( + cls, + *, + pref_labels: list[str | MultilingualString], + compares: list[ConceptScheme], + version: str | VersionString = "1.0", + status: Status = Status(**{"@id": StatusChoice.accepted}), + notations: list[str | Notation] = [], + creation_message: str | None = None, + definitions: list[str | MultilingualString] = [], + id_: str | None = None, + extra: dict | None = None, + creators: list[dict] = [], + ) -> "Correspondence": + if not notations: + notations = ["-".join([cs.notations[0]["@value"] for cs in compares])] + if id_ is None: + id_ = "{}{}{}".format( + settings.creation_base_url, + "" if settings.creation_base_url.endswith("/") else "/", + notations[0], + ) + if isinstance(version, str): + version = [VersionString(**{"@value": version}).model_dump(by_alias=True)] + if not creators: + creators = [{"@id": settings.creator}] + + scheme = cls( + id_=id_, + compares=[{"@id": obj.id_} for obj in compares], + types=[f"{XKOS}Correspondence"], + status=[status.model_dump(by_alias=True)], + notations=[ + Notation(**{"@value": obj} if isinstance(obj, str) else obj).model_dump( + by_alias=True + ) + for obj in notations + ], + pref_labels=[ + ( + {"@value": obj, "@language": settings.default_language} + if isinstance(obj, str) + else obj.model_dump(by_alias=True) + ) + for obj in pref_labels + ], + definitions=[ + ( + {"@value": obj, "@language": settings.default_language} + if isinstance(obj, str) + else obj.model_dump(by_alias=True) + ) + for obj in definitions + ], + extra=extra or {}, + history_notes=[ + { + "http://www.w3.org/1999/02/22-rdf-syntax-ns#value": [ + { + "@language": ( + "en" + if not creation_message + else settings.default_language + ), + "@value": creation_message + or "Created by pyst-client version 1", + } + ], + "http://purl.org/dc/terms/creator": [{"@id": settings.creator}], + "http://purl.org/dc/terms/issued": [ + DateTime( + **{ + "@type": DateTimeType.date, + "@value": datetime.date.today(), + } + ).model_dump(by_alias=True) + ], + } + ], + creators=creators, + version=version, + created=[ + DateTime( + **{"@type": DateTimeType.date, "@value": datetime.date.today()} + ).model_dump(by_alias=True) + ], + ) + req.Correspondence(**scheme.to_json_ld()) + return scheme diff --git a/pyst_client/simple/classes/relationship.py b/pyst_client/simple/classes/relationship.py new file mode 100644 index 0000000..2b3b8e2 --- /dev/null +++ b/pyst_client/simple/classes/relationship.py @@ -0,0 +1,122 @@ +import itertools + +import orjson +import structlog +from py_semantic_taxonomy.adapters.routers import request_dto as req +from py_semantic_taxonomy.domain import entities +from py_semantic_taxonomy.domain.constants import RelationshipVerbs +from py_semantic_taxonomy.domain.url_utils import get_full_api_path + +from pyst_client.simple.classes.concept import Concept +from pyst_client.simple.client import APIError, get_read_client, get_write_client +from pyst_client.simple.settings import can_write + +logger = structlog.get_logger("pyst-client") + + +class Relationship(entities.Relationship): + @classmethod + def get_many( + cls, + concept: Concept | str, + source: bool = True, + target: bool = False, + timeout: int = 5, + ) -> list["Concept"]: + return [ + cls.from_json_ld(obj) + for obj in orjson.loads( + get_read_client() + .get( + get_full_api_path("relationship"), + params={ + "iri": concept.id_ if isinstance(concept, Concept) else concept, + "source": source, + "target": target, + }, + timeout=timeout, + ) + .text + ) + ] + + @classmethod + @can_write + def create_many( + cls, + *, + sources: list[Concept | str], + targets: list[Concept | str], + verbs: list[RelationshipVerbs] | RelationshipVerbs, + timeout: int = 120, + skip_duplicates: bool = True, + ) -> list: + if isinstance(verbs, RelationshipVerbs): + verbs = itertools.repeat(verbs) + + relationships = [ + cls( + source=getattr(source, "id_", source), + target=getattr(target, "id_", target), + predicate=verb, + ) + for source, target, verb in zip(sources, targets, verbs) + ] + # Validation + [req.Relationship(**obj.to_json_ld()) for obj in relationships] + + errors = True + while errors: + try: + get_write_client().post( + get_full_api_path("relationship"), + data=orjson.dumps([obj.to_json_ld() for obj in relationships]), + timeout=timeout, + ) + errors = False + except APIError as err: + if err.status_code != 409 or not skip_duplicates: + raise err + relationships = remove_duplicates(data=relationships, message=str(err)) + return relationships + + @classmethod + @can_write + def delete_many( + cls, + *, + sources: list[Concept | str], + targets: list[Concept | str], + verbs: list[RelationshipVerbs] | RelationshipVerbs, + ) -> list: + if isinstance(verbs, RelationshipVerbs): + verbs = itertools.repeat(verbs) + + relationships = [ + cls( + source=getattr(source, "id_", source), + target=getattr(target, "id_", target), + predicate=verb, + ) + for source, target, verb in zip(sources, targets, verbs) + ] + # Validation + [req.Relationship(**obj.to_json_ld()) for obj in relationships] + get_write_client().client().request( + url=get_full_api_path("relationship"), + method="delete", + content=orjson.dumps([obj.to_json_ld() for obj in relationships]), + ) + return relationships + + +def remove_duplicates(data: list[Relationship], message: str) -> list[Relationship]: + first = "Relationship between source `" + second = "` and target `" + third = "` already exists" + + source = message[message.index(first) + len(first) : message.index(second)] + target = message[message.index(second) + len(second) : message.index(third)] + + logger.info("Skipping existing relationship between `%s` and `%s`", source, target) + return [obj for obj in data if obj.source != source and obj.target != target] diff --git a/pyst_client/simple/client.py b/pyst_client/simple/client.py new file mode 100644 index 0000000..144efbb --- /dev/null +++ b/pyst_client/simple/client.py @@ -0,0 +1,126 @@ +import functools +from urllib.parse import urljoin + +import httpx +import structlog +from py_semantic_taxonomy.domain.url_utils import get_full_api_path + +from pyst_client.simple.settings import settings + +logger = structlog.get_logger("pyst-client") + + +class APIError(Exception): + def __init__(self, message, status_code): + super().__init__(message) + self.status_code = status_code + + +class ReadClient: + def __init__(self): + settings.check_read_settings() + if settings.server_url: + resp = httpx.get( + urljoin(settings.server_url, get_full_api_path("status")), timeout=2 + ) + if resp.status_code != 200: + logger.warning( + "Server URL `%s` is reachable but not healthy. Response content: %s", + settings.server_url, + resp.text, + ) + else: + logger.info( + "Server URL `%s` is healthy and reachable", settings.server_url + ) + + def server_healthy(self) -> bool: + resp = httpx.get( + urljoin(self.settings.server_url, get_full_api_path("status")), timeout=2 + ) + return resp.status_code == 200 + + def has_server_config(func): + @functools.wraps(func) + def wrap(self, *args, **kwargs): + if not settings.server_url: + raise ValueError( + "Server URL is missing but required for this function. Set with `client.set_server_url()`." + ) + return func(self, *args, **kwargs) + + return wrap + + def get(self, *args, **kwargs) -> httpx.Response: + response = self.client().get(*args, **kwargs) + self.check_error(response) + return response + + def client(self) -> httpx.Client: + return httpx.Client( + base_url=settings.server_url, + headers={ + "Content-Type": "application/json", + "x-pyst-auth-token": settings.api_key, + }, + ) + + def check_error(self, response: httpx.Response) -> None: + if response.status_code not in (200, 204): + try: + error = response.json() + except: + error = response.text + raise APIError(str(error), status_code=response.status_code) + + +class WriteClient(ReadClient): + def __init__(self): + super().__init__() + settings.check_write_settings() + + def has_server_config(func): + @functools.wraps(func) + def wrap(self, *args, **kwargs): + if not self.settings.server_url: + raise ValueError( + "Server URL is missing but required for this function." + ) + if not self.settings.creator: + raise ValueError( + "Creator setting is missing but required for this function." + ) + if not self.settings.api_key: + raise ValueError("API key is missing but required for this function.") + if not self.settings.creation_base_url: + raise ValueError( + "Creation base URL is missing but required for this function." + ) + return func(self, *args, **kwargs) + + return wrap + + def post(self, *args, **kwargs) -> httpx.Response: + response = self.client().post(*args, **kwargs) + self.check_error(response) + return response + + def put(self, *args, **kwargs) -> httpx.Response: + response = self.client().put(*args, **kwargs) + self.check_error(response) + return response + + def delete(self, *args, **kwargs) -> httpx.Response: + response = self.client().delete(*args, **kwargs) + self.check_error(response) + return response + + +@functools.lru_cache(maxsize=1) +def get_read_client(): + return ReadClient() + + +@functools.lru_cache(maxsize=1) +def get_write_client(): + return WriteClient() diff --git a/pyst_client/simple/settings.py b/pyst_client/simple/settings.py new file mode 100644 index 0000000..814e71e --- /dev/null +++ b/pyst_client/simple/settings.py @@ -0,0 +1,105 @@ +import functools +from pathlib import Path + +import platformdirs +import rfc3987 +import structlog +from py_semantic_taxonomy.adapters.routers.validation import validate_iri +from pydantic_settings import BaseSettings, SettingsConfigDict + +logger = structlog.get_logger("pyst-client") +base_dir = Path(platformdirs.user_data_dir(appname="PyST-Client", appauthor="cauldron")) +secrets_dir = base_dir / "secrets" +secrets_dir.mkdir(exist_ok=True, parents=True) + + +class Settings(BaseSettings): + model_config = SettingsConfigDict( + secrets_dir=secrets_dir, + env_prefix="PyST_client_", + ) + + api_key: str | None = None + server_url: str | None = None + creator: str | None = None + default_language: str | None = None + creation_base_url: str | None = None + + def _set(self, key: str, value: str) -> None: + with open(secrets_dir / f"PyST_client_{key}", "w", encoding="utf-8") as f: + f.write(value) + setattr(self, key, value) + + def set_api_key(self, value: str) -> None: + self._set("api_key", value) + + def set_language(self, value: str) -> None: + self._set("default_language", value) + + def set_creation_base_url(self, value: str) -> None: + validate_iri(value) + self._set("creation_base_url", value) + + def set_server_url(self, value: str) -> None: + rfc3987.parse(value, rule="absolute_URI") + self._set("server_url", value) + + def set_creator(self, value: str) -> None: + validate_iri(value) + self._set("creator", value) + + def check_read_settings(self) -> None: + if url := self.server_url: + logger.info("Server URL %s successfully loaded from secrets directory", url) + else: + logger.warning( + "Server URL not found, no API operations possible. Set with `settings.set_server_url()`." + ) + + if lang := self.default_language: + logger.info( + "Default language `%s` successfully loaded from secrets directory", lang + ) + else: + self.set_language("en") + logger.info( + "Setting default language to 'en'. Change with `settings.set_language()`" + ) + + def check_write_settings(self) -> None: + if self.api_key: + logger.info("API key successfully loaded from secrets directory") + else: + logger.warning( + "API key not found, write operations impossible. Set with `settings.set_api_key()`." + ) + + if self.creation_base_url: + logger.info("Creation base URL successfully loaded from secrets directory") + else: + logger.warning( + "Creation base URL not found, write operations impossible. Set with `settings.set_creation_base_url()`." + ) + + +settings = Settings() + + +def can_write(func): + @functools.wraps(func) + def wrap(self, *args, **kwargs): + if not settings.server_url: + raise ValueError("Server URL is missing but required for this function.") + if not settings.creator: + raise ValueError( + "Creator setting is missing but required for this function." + ) + if not settings.api_key: + raise ValueError("API key is missing but required for this function.") + if not settings.creation_base_url: + raise ValueError( + "Creation base URL is missing but required for this function." + ) + return func(self, *args, **kwargs) + + return wrap