SPARQL Extension of the Shapes Constraint Language (SHACL-SPARQL)

This document defines SHACL-SPARQL, the SPARQL extension of the SHACL Shapes Constraint Language. SHACL-SPARQL provides mechanisms to extend the SHACL vocabulary and represent SPARQL-based constraints and user-defined constraint components.

Introduction

This specification describes conformance criteria for SHACL-SPARQL processors.

TODO: more?

TODO: link to test cases.

Document Conventions

Throughout this document, IRIs are written in Turtle syntax, using the following mapping of prefixes to namespaces:

Prefix	Namespace
`rdf:`	`http://www.w3.org/1999/02/22-rdf-syntax-ns#`
`rdfs:`	`http://www.w3.org/2000/01/rdf-schema#`
`sh:`	`http://www.w3.org/ns/shacl#`
`shs:`	`http://www.w3.org/ns/shacl-sparql#`
`xsd:`	`http://www.w3.org/2001/XMLSchema#`
`ex:`	`http://example.com/ns#`

The remainder of this section is informative.

The Turtle serialization of the SHACL vocabulary will be published at the web URL of the graph that it represents.

Throughout the document, color-coded boxes containing RDF graphs in Turtle will appear. These fragments of Turtle documents use the prefix bindings given above.

	# This box represents an input shapes graph

	# Triples that can be omitted are marked as grey e.g.
	<s> <p> <o> .

	# This box represents an input data graph.
	# When highlighting is used in the examples:

	# Elements highlighted in blue are focus nodes
	ex:Bob a ex:Person .

	# Elements highlighted in red are focus nodes that fail validation
	ex:Alice a ex:Person .

	# This box represents an output results graph

SHACL or SPARQL Definitions appear in blue boxes:

DEFINITIONS

	# This box contains textual definitions.

Basic Terminology

Terminology that is linked to portions of RDF 1.1 Concepts and Abstract Syntax is used in SHACL as defined there. Terminology that is linked to portions of SPARQL 1.1 Query Language is used in SHACL as defined there. Terminology that is linked to portions of the SHACL Shapes Constraint Language is used in SHACL-SPARQL as defined there. TODO: perform the linking. A single linkage is sufficient to provide a definition for all occurrences of a particular term in this document.

Definitions are complete within this document, i.e., if there is no rule in this document that makes some situation true, then the situation is false.

Relationship to SPARQL

This specification uses parts of SPARQL 1.1 in the normative definition of the semantics of SHACL-SPARQL.

SPARQL variables using the $ marker represent external bindings that are pre-bound or, in the case of $PATH, substituted in the SPARQL query before execution.

In some places, the specification assumes that the provided SPARQL engines preserve the identity of blank nodes, so that repeated invocations of queries consistently identify and communicate the same blank nodes.

Access to the shapes graph is not a requirement for supporting the SHACL Core language. The variable shapesGraph can also be used in SPARQL-based constraints and SPARQL-based constraint components. However, such constraints may not be interoperable across different SHACL-SPARQL processors and may not be applicable to remote RDF datasets.

Relation to SHACL

SHACL-SPARQL extends SHACL and supports all SHACL constructs. SHACL-SPARQL definitions are placed in the shapes graph to define constraints. sh:SparqlSelectConstraintComponent is a constraint component similar to the SHACL Core components: it executes a user-provided query for the value nodes. SPARQL-based constraint components provide a high-level vocabulary for defining custom constraint components: they declare the parameters of a constraint component and define how these parameters are pre-bound in SPARQL queries that are executed on the value nodes.

shs:SPARQLConstraintComponent

shs:SPARQLConstraintComponent is a special constraint component with mandatory parameter shs:sparql.

A shape in a shapes graph that has a value for shs:sparql that is an ill-formed SPARQL constraint specification in that shapes graph is an ill-formed shape.

A node in a shapes graph is an ill-formed SPARQL constraint specification if

it does not have exactly one value in the shapes graph for shs:select, or
any value for shs:select is not a string that is a syntactically valid SPARQL SELECT query after prefix handling

If there is a value for shs:select, it is turned into SPARQL algebra as usual after prefix handling except that the top-level pattern is treated as if there is an initial solution set with the following solutions: TODO: double check when Pre-Binding resolved

this bound to the focus node
value bound to a value node, for each value node, if any, otherwise there is a single binding set with value unbound
shapesGraph bound to the shapes graph
currentShape bound to the current shape

Let x be the solutions from evaluating the SELECT query on the data graph. The top-level validation results are formed from the solutions in x, as defined in the section on the mapping of result variables to validation results.
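As a non-normative illustration, the construction of the initial solution set described above could be sketched in Python as follows. The sketch assumes that RDF terms are represented as plain strings and that solutions are dictionaries; the function name `initial_solutions` is hypothetical and not part of any SHACL API.

```python
def initial_solutions(focus_node, value_nodes, shapes_graph, current_shape):
    """Build the initial solution set for a SPARQL constraint (illustrative only)."""
    base = {
        "this": focus_node,
        "shapesGraph": shapes_graph,
        "currentShape": current_shape,
    }
    if not value_nodes:
        # No value nodes: a single solution in which `value` stays unbound.
        return [dict(base)]
    # One solution per value node, each binding `value` to that node.
    return [{**base, "value": v} for v in value_nodes]
```

A processor that pre-binds through its SPARQL engine's own API would not need such an explicit structure; the dictionaries merely make the four bindings visible.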

The remainder of this section is informative.

The following example illustrates the syntax of a SPARQL-based constraint.

ex:ValidCountry a ex:Country ;
	ex:germanLabel "Spanien"@de .

ex:InvalidCountry a ex:Country ;
	ex:germanLabel "Spain"@en .

ex:LanguageExampleShape
	a sh:Shape ;
	sh:targetClass ex:Country ;
	shs:sparql [
		a shs:SPARQLConstraint ;   # This triple is optional
		shs:message "Values are literals with German language tag." ;
		shs:prefixes ex: ;
		shs:select """
			SELECT $this (ex:germanLabel AS ?path) ?value
			WHERE {
				$this ex:germanLabel ?value .
				FILTER (!isLiteral(?value) || !langMatches(lang(?value), "de"))
			}
			""" ;
	] .

The target of the shape above includes all SHACL instances of ex:Country. For those nodes (represented by the variable this), the SPARQL query walks through the values of ex:germanLabel and verifies that they are literals with a German language code. The validation report for the aforementioned data graph is shown below:

[] a sh:ValidationReport ;
	sh:conforms "false"^^xsd:boolean ;
	sh:result [
		a sh:ValidationResult ;
		sh:resultSeverity sh:Violation ;
		sh:focusNode ex:InvalidCountry ;
		sh:resultPath ex:germanLabel ;
		sh:value "Spain"@en ;
		sh:sourceConstraintComponent shs:SPARQLConstraintComponent ;
		sh:sourceShape ex:LanguageExampleShape ;
		# ...
	] .

Pre-bound Variables in SPARQL Constraints ($this, $shapesGraph, $currentShape)

TODO: align / merge with above

The following table enumerates the variables that have special meaning in SPARQL constraints. When SPARQL constraints are processed, the SHACL-SPARQL processor pre-binds values for these variables.

Variable	Interpretation
`$this`	The focus node.
`$shapesGraph`	Can be used to query the shapes graph as in `GRAPH $shapesGraph { ... }`. If the shapes graph is a named graph in the same dataset as the data graph then it is the IRI of the shapes graph in the dataset. Not all SHACL-SPARQL processors need to support this variable. Processors that do not support `$shapesGraph` MUST report a failure if they encounter a query that references this variable. Use of `GRAPH $shapesGraph { ... }` should be handled with extreme caution. It may result in constraints that are not interoperable across different SHACL-SPARQL processors and that may not run on remote RDF datasets.
`$currentShape`	The current shape. Typically used in conjunction with `$shapesGraph`. The same support policies as for `$shapesGraph` apply for this variable.

Mapping of Result Variables to Validation Results

If one of the solutions of the result set produced by a SELECT query during a validation process contains the binding true for the variable failure, then the SHACL-SPARQL processor MUST signal a failure.

Otherwise, each row of the result set produced by a SELECT query MUST be converted into one validation result node. The property values of those nodes are derived by the following rules, through a combination of result solutions and the values of the constraint itself. The rules are meant to be executed from top to bottom, so that the first bound value will be used.

Property	Production Rules
`sh:focusNode`	(1) The binding for the variable `focusNode`. (2) The binding for the variable `this`.
`sh:resultPath`	The binding for the variable `path`, if that is an IRI
`sh:value`	The binding for the variable `value`, if any
`sh:resultMessage`	(1) The binding for the variable `message`. (2) The values of `shs:message` of the subject of the `shs:select` triple. These string literals may include the names of any SELECT result variables via `{?varName}` or `{$varName}`. If the constraint is based on a SPARQL-based constraint component, then the component's parameter names can also be used. These `{?varName}` and `{$varName}` blocks SHOULD be substituted with suitable string representations of the values of said variables. TODO: align with Shape validation messages definition
`sh:sourceConstraint`	The SPARQL-based constraint, i.e. the value of `shs:sparql`
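
The production rules in the table above could be sketched in Python roughly as follows. This is an illustration only, under several assumptions that are not part of this specification: solutions are dictionaries from variable names to values, IRIs are strings wrapped in angle brackets, a `failure` binding of true is the Python value `True`, and variable names in message templates are restricted to ASCII. The names `ValidationFailure`, `substitute` and `to_validation_result` are hypothetical.

```python
import re


class ValidationFailure(Exception):
    """Raised when a solution binds the variable `failure` to true."""


def substitute(template, bindings):
    """Replace {?var} and {$var} blocks with the bound values, if any."""
    def repl(match):
        name = match.group(1)
        # Leave the block untouched when the variable has no binding.
        return str(bindings[name]) if name in bindings else match.group(0)
    return re.sub(r"\{[?$]([A-Za-z_][A-Za-z0-9_]*)\}", repl, template)


def to_validation_result(solution, constraint, messages=()):
    """Map one SELECT solution (a dict) to a validation result (a dict)."""
    if solution.get("failure") is True:
        raise ValidationFailure("query reported a failure")
    result = {"sh:sourceConstraint": constraint}
    # sh:focusNode: the first bound variable of `focusNode`, then `this`.
    for var in ("focusNode", "this"):
        if var in solution:
            result["sh:focusNode"] = solution[var]
            break
    # sh:resultPath: only if `path` is bound to an IRI.
    path = solution.get("path")
    if isinstance(path, str) and path.startswith("<") and path.endswith(">"):
        result["sh:resultPath"] = path
    if "value" in solution:
        result["sh:value"] = solution["value"]
    # sh:resultMessage: the `message` binding first, else the shs:message values.
    if "message" in solution:
        result["sh:resultMessage"] = [solution["message"]]
    elif messages:
        result["sh:resultMessage"] = [substitute(m, solution) for m in messages]
    return result
```

Blocks that name an unbound variable are left as they are in this sketch; the text above does not prescribe what a processor should do in that case.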

Injecting Annotation Properties into Validation Results

TODO: align / merge with above

It is possible to inject additional annotation properties into the validation result nodes created for each solution of the SELECT result sets. Any such property needs to be declared via a value of shs:resultAnnotation at the subject of the shs:select triple. The values of shs:resultAnnotation are either IRIs or blank nodes with the following properties. In this table, the Value type column states the required SHACL class or datatype of the property values, and the Count column indicates the minimum and maximum number of values that the properties may have. If these value types and counts are violated then the shapes graph is invalid.

Property	Value type	Count	Description
`shs:annotationProperty`	`rdf:Property`	`1 (mandatory)`	The annotation property that shall be set
`shs:annotationVarName`	`xsd:string`	`0..1`	The name of the SPARQL variable to take the values from
`shs:annotationValue`		`0..unlimited`	Constant RDF terms that shall be used as default values

For each solution of a SELECT result set, a SHACL-SPARQL processor MUST walk through the declared result annotations. The mapping from result annotations to SPARQL variables uses the following rules:

If a shs:resultAnnotation has a value for the property shs:annotationVarName then the SHACL-SPARQL processor MUST look for the variable with the same name as the value of shs:annotationVarName
Otherwise, the SHACL-SPARQL processor MUST use the local name of the value of shs:annotationProperty as the variable name

If a variable name could be determined, then the SHACL-SPARQL processor MUST copy the binding for the given variable as a value for the property specified using shs:annotationProperty into the validation result that is being produced for the current solution. If the variable has no binding in the result set solution, then the value of shs:annotationValue MUST be used, if present.
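
The annotation rules above might be implemented along the following lines. The sketch is non-normative and assumes that each result annotation has already been read into a dictionary with the hypothetical keys `property`, `varName` and `values`; the local name is approximated by the text after the last `#` or `/` of the property IRI.

```python
import re


def annotate(result, solution, annotations):
    """Copy annotation values into `result` for one SELECT solution.

    Each annotation is a dict with the key 'property' (an IRI string),
    optionally 'varName', and optionally 'values' (constant defaults).
    """
    for ann in annotations:
        prop = ann["property"]
        # Use shs:annotationVarName if given, else the property's local name.
        var = ann.get("varName") or re.split(r"[#/]", prop)[-1]
        if var and var in solution:
            result[prop] = [solution[var]]
        elif ann.get("values"):
            # Variable unbound: fall back to shs:annotationValue, if present.
            result[prop] = list(ann["values"])
    return result
```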

The remainder of this section is informative.

The following, slightly more complex example illustrates the use of result annotations.

ex:ShapeWithPathViolationExample
	a sh:Shape ;
	sh:targetNode ex:ExampleRootResource ;
	shs:sparql [
		shs:resultAnnotation [
			shs:annotationProperty ex:time ;
			shs:annotationVarName "time"
		] ;
		shs:select """
			SELECT $this (ex:property1 AS ?path) (?first AS ?value) ?message ?time
			WHERE {
				$this ex:property1 ?first .
				?subject ex:property2 ?first .
				FILTER isIRI(?first) .
				BIND (CONCAT("The ", "message.") AS ?message) .
				BIND (NOW() AS ?time) .
			}
			""" ;
	] .

ex:ExampleRootResource
	ex:property1 ex:ExampleIntermediateResource .

ex:ExampleValueResource
	ex:property2 ex:ExampleIntermediateResource .

Validation produces the following validation result nodes:

[] a sh:ValidationReport ;
	sh:conforms "false"^^xsd:boolean ;
	sh:result [
		a sh:ValidationResult ;
		sh:resultSeverity sh:Violation ;
		sh:focusNode ex:ExampleRootResource ;
		sh:resultPath ex:property1 ;
		sh:value ex:ExampleIntermediateResource ;
		sh:resultMessage "The message." ;
		sh:sourceConstraintComponent shs:SPARQLConstraintComponent ;
		sh:sourceShape ex:ShapeWithPathViolationExample ;
		ex:time "2015-03-27T10:58:00"^^xsd:dateTime ;  # Example
	] .

SPARQL-based Constraint Components

Each SHACL instance of shs:ConstraintComponent is a SPARQL-based constraint component.

If any node in a graph that is a SHACL instance of sh:ConstraintComponent is an ill-formed SPARQL constraint in the graph then the graph is an ill-formed shapes graph.

Each SPARQL-based constraint component specifies:

one or more parameters (e.g. sh:class)
at least one validator

An Example Constraint Component

The following example demonstrates how SPARQL can be used to specify new constraint components using the SHACL-SPARQL language. The example implements sh:pattern and sh:flags using a SPARQL ASK query to validate that each value node matches a given regular expression. Note that this is only an example implementation and should not be considered normative.

sh:PatternConstraintComponent
	a shs:ConstraintComponent ;
	shs:parameter [
		shs:predicate sh:pattern ;
	] ;
	shs:parameter [
		shs:predicate sh:flags ;
		shs:optional true ;
	] ;
	shs:validator shimpl:hasPattern .

shimpl:hasPattern
	a shs:SPARQLAskValidator ;
	shs:message "Value does not match pattern {$pattern}" ;
	shs:ask "ASK { FILTER (!isBlank($value) && IF(bound($flags), regex(str($value), $pattern, $flags), regex(str($value), $pattern))) }" .

The following sections introduce the properties that constraint components may have. Some of these properties are independent of SPARQL-based execution and apply to constraint components based on other potential extension languages such as JavaScript too.

Parameter Declarations (shs:parameter)

The parameters of a constraint component are declared via the property shs:parameter. The objects of triples with shs:parameter as predicate have shs:Parameter as expected type.

Each parameter has exactly one value p for the property shs:predicate, and that value is an IRI; otherwise, the constraint component that has this parameter is an ill-formed constraint component. The local name of an IRI is defined as the longest NCNAME at the end of the IRI, not immediately preceded by the first colon in the IRI. The parameter name is defined as the local name of the value of shs:predicate. To ensure that a correct mapping from parameters into SPARQL variables is possible, every parameter name

is a valid SPARQL VARNAME
is not equal to this, shapesGraph, currentShape, focusNode, predicate, path or value
is not equal to another parameter name in the same constraint component
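
The derivation and checking of parameter names can be sketched as follows. This is a minimal illustration, not a conforming implementation: NCNAME and VARNAME are approximated by ASCII-only regular expressions (the real productions also admit many non-ASCII characters), and the function names are hypothetical.

```python
import re

# ASCII-only approximations of the XML NCName and SPARQL VARNAME productions.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")
VARNAME = re.compile(r"[A-Za-z0-9_]+")

RESERVED = {"this", "shapesGraph", "currentShape", "focusNode",
            "predicate", "path", "value"}


def local_name(iri):
    """Longest NCName suffix of `iri` not directly after the first colon."""
    first_colon = iri.find(":")
    for start in range(len(iri)):
        if first_colon >= 0 and start - 1 == first_colon:
            continue
        # Scanning left to right, the first suffix that matches is the longest.
        if NCNAME.fullmatch(iri, start):
            return iri[start:]
    return None


def parameter_names(predicates):
    """Return the parameter names for the shs:predicate IRIs, or raise ValueError."""
    names = []
    for iri in predicates:
        name = local_name(iri)
        if name is None or not VARNAME.fullmatch(name):
            raise ValueError(f"not a valid SPARQL variable name: {iri}")
        if name in RESERVED:
            raise ValueError(f"reserved variable name: {name}")
        if name in names:
            raise ValueError(f"duplicate parameter name: {name}")
        names.append(name)
    return names
```

For the pattern example above, `parameter_names` applied to the IRIs of sh:pattern and sh:flags yields the names `pattern` and `flags`.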

An shs:Parameter may have its property shs:optional set to true to indicate that the parameter is not mandatory. Every constraint component has at least one non-optional parameter.

The class shs:Parameter is defined as a SHACL subclass of shs:PropertyConstraint, and all properties that are applicable to property constraints may also be used for parameters. This includes descriptive properties such as sh:name and sh:description but also constraint parameters such as sh:class. Implementations MAY use these constraint parameters to prevent the execution of constraint components with invalid parameter values.

Parameter pre-binding

Every parameter name is used as the name of a pre-bound variable for the constraint component the parameter belongs to. This variable can be used in the SPARQL queries of the constraint component and a SHACL-SPARQL processor MUST pre-bind it to the parameter value.

Label Templates (shs:labelTemplate)

The property shs:labelTemplate can be used at any constraint component to suggest how constraints based on that component could be rendered to humans. The values of shs:labelTemplate are strings (possibly with a language tag) that can include the names of the declared parameters using the syntax {?varName} or {$varName}, where varName is the name of the SPARQL variable that corresponds to the parameter. At display time, these {?varName} and {$varName} blocks SHOULD be substituted with the actual parameter values. There may be multiple label templates for the same subject, provided that no two of them have the same language tag.

Validators

For every supported context (i.e., property constraints or shapes) the constraint component declares a suitable validator. For a given constraint, a validator is selected from the constraint component using the following rules, in order:

For shapes, use one of the values of shs:shapeSelfValidator, if present.
For property constraints, use one of the values of shs:shapePathValidator, if present.
Otherwise, use one of the values of shs:validator.

If no suitable validator can be found, a SHACL-SPARQL processor ignores the constraint. The SHACL WG is seeking practical feedback on what the default behavior should be, and whether we should report violations in those cases.
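
The selection rules above amount to a short fallback chain, sketched here in Python. The sketch assumes that a constraint component is available as a dictionary from property names to lists of validators and that the context is given as the string `"shape"` or `"property"`; both representations and the function name are hypothetical.

```python
def select_validator(component, context):
    """Pick a validator for `context` ('shape' or 'property'), or None."""
    if context == "shape":
        specific = component.get("shs:shapeSelfValidator", [])
    elif context == "property":
        specific = component.get("shs:shapePathValidator", [])
    else:
        raise ValueError(f"unknown context: {context}")
    if specific:
        # Any one of the values may be used; this sketch takes the first.
        return specific[0]
    generic = component.get("shs:validator", [])
    # None means that no suitable validator exists and the constraint is ignored.
    return generic[0] if generic else None
```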

SHACL-SPARQL includes two types of validators, based on SPARQL SELECT (for shs:shapeSelfValidator and shs:shapePathValidator) or SPARQL ASK queries (for shs:validator).

Validators based on SPARQL SELECT Queries

TODO: rephrase the context based on shapes (not property constraints anymore)

Validators that are SHACL instances of shs:SPARQLSelectValidator have exactly one string representation of a SPARQL SELECT query via the property shs:select. The value of shs:select is a valid SPARQL query using the aforementioned prefix handling rules. The query returns the result variable this in its SELECT clause. This type of validator can be used as values of shs:shapeSelfValidator or shs:shapePathValidator.

The following example illustrates the declaration of a constraint component based on a SPARQL SELECT query. It is a generalized variation of the SPARQL-based example constraint from the section on SPARQL-based constraints. That SPARQL query included two constants: the specific property ex:germanLabel and the language tag de. Constraint components make it possible to generalize such scenarios, so that constants get pre-bound with parameters. This allows the query logic to be reused in multiple places, without having to write any new SPARQL.

ex:LanguageConstraintComponentUsingSELECT
	a shs:ConstraintComponent ;
	rdfs:label "Language constraint component" ;
	shs:parameter [
		shs:predicate ex:lang ;
		sh:datatype xsd:string ;
		sh:minLength 2 ;
		sh:name "language" ;
		sh:description "The language tag, e.g. \"de\"." ;
	] ;
	shs:labelTemplate "Values are literals with language \"{$lang}\"" ;
	shs:shapePathValidator [
		a shs:SPARQLSelectValidator ;
		shs:message "Values are literals with language \"{?lang}\"" ;
		shs:select """
			SELECT DISTINCT $this ?value
			WHERE {
				$this $PATH ?value .
				FILTER (!isLiteral(?value) || !langMatches(lang(?value), $lang))
			}
			"""
	] .

Once a constraint component has been declared (in a shapes graph), its parameters can be used as illustrated in the following example.

ex:LanguageExampleShape
	a sh:Shape ;
	sh:targetClass ex:Country ;
	sh:shape [
		sh:path ex:germanLabel ;
		ex:lang "de" ;
	] ;
	sh:shape [
		sh:path ex:englishLabel ;
		ex:lang "en" ;
	] .

The example shape above specifies the condition that all values of ex:germanLabel carry the language tag de while all values of ex:englishLabel have en as their language. These details are specified via two property constraints that have values for the ex:lang parameter required by the constraint component.

SELECT queries used in the context of property constraints use a special variable named PATH as a placeholder for the predicate or path used by the constraint. The only legal use of this variable is in the predicate position of a triple pattern. A query that uses the variable PATH in any other position is invalid.

A SHACL-SPARQL processor executes the provided SPARQL query on the data graph to produce validation results. In the context of property constraints, the SHACL-SPARQL processor will first substitute all occurrences of the variable PATH with the provided property path derived from the value of sh:path in the constraint. The resulting SPARQL query is then evaluated with the same pre-bound variables as outlined in the section for SPARQL-based Constraints ($this etc.). Additionally, the value of each declared parameter of the constraint component needs to be pre-bound for the variable derived from the local name of the parameter's shs:predicate. For example, if a non-optional parameter declares shs:predicate ex:lang then the variable lang needs to be pre-bound. The result set of the SELECT query is turned into validation results using the same rules as outlined in the section for SPARQL-based Constraints. In addition to the result properties listed in that section, the property sh:sourceConstraintComponent MUST point at the IRI of the constraint component that has been evaluated. Furthermore, a SHACL-SPARQL processor MUST use any additional annotation properties that are associated with a SPARQL SELECT validator via shs:resultAnnotation.
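
The textual substitution of PATH could look roughly like the following Python sketch. It assumes that the path has already been rendered as a SPARQL property path string, and it does not check that PATH only occurs in the predicate position; the function name `substitute_path` is hypothetical.

```python
import re


def substitute_path(query, path):
    """Replace every `$PATH` token in `query` with the SPARQL path string."""
    # The word boundary keeps longer variable names such as $PATHS intact.
    return re.sub(r"\$PATH\b", lambda _match: path, query)
```

For example, applying the function to the query of the language constraint component with the path `ex:germanLabel` turns the triple pattern `$this $PATH ?value` into `$this ex:germanLabel ?value`.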

Validators based on SPARQL ASK Queries

Many constraint components are of the form in which all value nodes are tested individually against some boolean condition. Writing SELECT queries for these becomes burdensome, especially if a constraint component can be used for both property constraints and shapes. SHACL-SPARQL provides an alternative, more compact syntax for validators based on ASK queries. This type of validator can be used as a value of the property shs:validator.

Validators that are SHACL instances of shs:SPARQLAskValidator point at exactly one string representation of a SPARQL ASK query via the property shs:ask. The value of shs:ask is a valid SPARQL query using the aforementioned prefix handling rules. The ASK queries return true if and only if a given value node (represented by the pre-bound variable value) conforms to the constraint.

Prior to evaluation, a SHACL-SPARQL processor transforms the provided ASK query into a SELECT query using the following templates. The resulting SELECT query can then be evaluated using the same algorithm as for SELECT-based validators. The processor drops the ASK keyword, any top-level dataset clauses and solution modifiers, leaving only the GroupGraphPattern including the outermost {...} pair. This block then substitutes ... in the template.

Template for sh:Shape context:

	SELECT $this ?value
	WHERE {
		BIND ($this AS ?value) .
		FILTER NOT EXISTS ...
	}

Template for sh:PropertyConstraint context:

	SELECT DISTINCT $this ?value
	WHERE {
		$this $PATH ?value .
		FILTER NOT EXISTS ...
	}

The WG is awaiting input from the SPARQL Maintenance (EXISTS) Community Group on potential changes to the semantics of EXISTS.

Note that the template above includes a DISTINCT keyword because a SPARQL path expression may return the same ?value multiple times, yet each value node is only validated once.

Once the corresponding template has been applied, the resulting SELECT query will be evaluated using the same approach as outlined above. Actual SHACL implementations may of course use a different approach internally, as long as the results are equivalent to the described approach.
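
One possible, simplified rendering of the ASK-to-SELECT transformation is sketched below in Python. It assumes that the first `{` and the last `}` in the query string delimit the outermost GroupGraphPattern, which holds for the examples in this document but not for queries with braces in comments or in text preceding the pattern; a real implementation would work on the parsed query instead. The names `TEMPLATES` and `ask_to_select` are hypothetical.

```python
# The two templates from above, with "..." as the placeholder.
TEMPLATES = {
    "shape": (
        "SELECT $this ?value\n"
        "WHERE {\n"
        "\tBIND ($this AS ?value) .\n"
        "\tFILTER NOT EXISTS ...\n"
        "}"
    ),
    "property": (
        "SELECT DISTINCT $this ?value\n"
        "WHERE {\n"
        "\t$this $PATH ?value .\n"
        "\tFILTER NOT EXISTS ...\n"
        "}"
    ),
}


def ask_to_select(ask_query, context):
    """Wrap the group graph pattern of `ask_query` in the SELECT template."""
    start = ask_query.find("{")
    end = ask_query.rfind("}")
    if start < 0 or end < start:
        raise ValueError("no group graph pattern found")
    # Keep the outermost {...} pair; drop the ASK keyword and anything after "}".
    pattern = ask_query[start:end + 1]
    return TEMPLATES[context].replace("...", pattern, 1)
```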

The following example declares a constraint component using an ASK query.

ex:LanguageConstraintComponentUsingASK
	a shs:ConstraintComponent ;
	rdfs:label "Language constraint component" ;
	shs:parameter [
		shs:predicate ex:lang ;
		sh:datatype xsd:string ;
		sh:minLength 2 ;
		sh:name "language" ;
		sh:description "The language tag, e.g. \"de\"." ;
	] ;
	shs:labelTemplate "Values are literals with language \"{$lang}\"" ;
	shs:validator ex:hasLang .

ex:hasLang
	a shs:SPARQLAskValidator ;
	shs:message "Values are literals with language \"{$lang}\"" ;
	shs:ask """
		ASK {
			FILTER (isLiteral($value) && langMatches(lang($value), $lang))
		}
		""" .

Note that the validation condition implemented by an ASK query is "in the inverse direction" from its SELECT counterpart: ASK queries return true for value nodes that conform to the constraint, while SELECT queries return those value nodes that do not conform.

Prefix Declarations for SPARQL Queries

A shapes graph may include declarations of namespace prefixes so that these prefixes can be used to abbreviate the SPARQL queries derived from the same shapes graph. The syntax of such prefix declarations is illustrated by the following example.

ex:
	a owl:Ontology ;
	owl:imports sh: ;
	shs:declare [
		shs:prefix "ex" ;
		shs:namespace "http://example.com/ns#"^^xsd:anyURI ;
	] ;
	shs:declare [
		shs:prefix "schema" ;
		shs:namespace "http://schema.org/"^^xsd:anyURI ;
	] .

The property shs:declare is used to make prefix declarations. The SHACL vocabulary includes the class shs:PrefixDeclaration as type for the values of shs:declare although no rdf:type triple is required for them. The values of shs:declare have exactly one value for the property shs:prefix (literals of datatype xsd:string) and exactly one value for the property shs:namespace (literals of datatype xsd:anyURI). Such a pair of values specifies a single mapping of a prefix to a namespace.

The recommended subject for values of shs:declare is the IRI of the graph containing the shapes that use the prefixes. These IRIs are often declared as an instance of owl:Ontology, but this is not required.

Prefix declarations can be used by SPARQL-based constraints and similar SPARQL-based features such as the validators of constraint components, derived values constraints, target types and functions. These nodes can use the property shs:prefixes to specify a set of prefix mappings. (An example use of the shs:prefixes property can be found in the example above.) The values of shs:prefixes are either IRIs or blank nodes. A SHACL processor collects a set of prefix mappings as the union of all individual prefix mappings that can be reached by the property path shs:prefixes/owl:imports*/shs:declare starting at the SPARQL-based constraint. If such a collection of prefix declarations contains multiple namespaces for the same value of shs:prefix, then the shapes graph is invalid. (Note that SHACL processors MAY ignore prefix declarations that are never reached). A SHACL processor transforms the values of shs:select (and similar properties such as shs:ask) into SPARQL by prepending PREFIX declarations for all prefix mappings. Each value of shs:prefix is turned into the PNAME_NS, while each value of shs:namespace is turned into the IRIREF in the PREFIX declaration. For the example shapes graph above, a SHACL-SPARQL processor would produce lines such as PREFIX ex: <http://example.com/ns#>. The SHACL-SPARQL processor MUST produce a failure if the resulting SPARQL query string cannot be parsed into a valid SPARQL 1.1 query. In the rest of this document, the shs:prefixes statements may have been omitted for brevity.
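
The final step, prepending PREFIX declarations to a query string, can be sketched in Python as shown below. The sketch assumes that the prefix mappings have already been collected along the property path described above and are passed in as (prefix, namespace) pairs; it neither traverses the graph nor parses the resulting query. The function name `prepend_prefixes` is hypothetical.

```python
def prepend_prefixes(query, declarations):
    """Prepend PREFIX lines built from (prefix, namespace) pairs."""
    mapping = {}
    for prefix, namespace in declarations:
        # Multiple namespaces for the same prefix make the shapes graph invalid.
        if mapping.setdefault(prefix, namespace) != namespace:
            raise ValueError(f"conflicting namespaces for prefix {prefix!r}")
    lines = [f"PREFIX {p}: <{ns}>" for p, ns in mapping.items()]
    return "\n".join(lines + [query])
```

Repeated declarations of the same mapping are harmless in this sketch; only a prefix that is mapped to two different namespaces is rejected.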
