This document defines SHACL-SPARQL, the SPARQL extension of the SHACL Shapes Constraint Language. SHACL-SPARQL provides mechanisms to extend the SHACL vocabulary and represent SPARQL-based constraints and user-defined constraint components.
This document covers SHACL-SPARQL, the SPARQL extension of the SHACL Shapes Constraint Language, which can be used to define SPARQL-based constraints and user-defined constraint components.
The examples in this document use Turtle [[!turtle]]. The reader should be familiar with SHACL [[!shacl]] and with SPARQL [[!sparql11-overview]].
This specification describes conformance criteria for SHACL-SPARQL processors.
TODO: more?
TODO: link to test cases.
Throughout this document, IRIs are written in Turtle syntax, using the following mapping of prefixes to namespaces:
Prefix | Namespace |
---|---|
rdf: | http://www.w3.org/1999/02/22-rdf-syntax-ns# |
rdfs: | http://www.w3.org/2000/01/rdf-schema# |
sh: | http://www.w3.org/ns/shacl# |
shs: | http://www.w3.org/ns/shacl-sparql# |
xsd: | http://www.w3.org/2001/XMLSchema# |
ex: | http://example.com/ns# |
The remainder of this section is informative.
The Turtle serialization of the SHACL vocabulary will be uploaded to the web URL of the graph that it represents.
Throughout the document, color-coded boxes containing RDF graphs in Turtle will appear. These fragments of Turtle documents use the prefix bindings given above.
# This box represents an input shapes graph
# Triples that can be omitted are marked as grey e.g.
<s> <p> <o> .
# This box represents an input data graph.
# When highlighting is used in the examples:
# Elements highlighted in blue are focus nodes
ex:Bob a ex:Person .
# Elements highlighted in red are focus nodes that fail validation
ex:Alice a ex:Person .
# This box represents an output results graph
SHACL or SPARQL Definitions appear in blue boxes:
# This box contains textual definitions.
Terminology that is linked to portions of RDF 1.1 Concepts and Abstract Syntax is used in SHACL as defined there. Terminology that is linked to portions of SPARQL 1.1 Query Language is used in SHACL as defined there. Terminology that is linked to portions of SHACL Shapes Constraint Language is used in SHACL-SPARQL as defined there. TODO: perform the linking. A single linkage is sufficient to provide a definition for all occurrences of a particular term in this document.
Definitions are complete within this document, i.e., if there is no rule to make some situation true in this document then the situation is false.
This specification uses parts of SPARQL 1.1 in the normative definition of the semantics of SHACL-SPARQL.
SPARQL variables using the $ marker represent external bindings that are pre-bound or, in the case of $PATH, substituted in the SPARQL query before execution.
In some places, the specification assumes that the provided SPARQL engines preserve the identity of blank nodes, so that repeated invocations of queries consistently identify and communicate the same blank nodes.
Access to the shapes graph is not a requirement for supporting the SHACL Core language.
The variable shapesGraph can also be used in SPARQL-based constraints and SPARQL-based constraint components.
However, such constraints may not be interoperable across different SHACL-SPARQL processors or may not be applicable to remote RDF datasets.
Say it extends SHACL. SHACL-SPARQL supports all SHACL constructs. SHACL-SPARQL definitions go into the shapes graph to define constraints. sh:SparqlSelectConstraintComponent is a constraint component similar to the SHACL Core components: it executes a user query for value nodes. SPARQL-based constraint components provide a high-level vocabulary to define custom constraint components: they define the parameters of constraint components and how these parameters are pre-bound into SPARQL queries and executed on value nodes.
shs:SPARQLConstraintComponent is a special constraint component with mandatory parameter shs:sparql.
A shape in an RDF graph G with a value for shs:sparql in G that is an ill-formed SPARQL constraint specification in the shapes graph is an ill-formed shape.
A node in a shapes graph is an ill-formed SPARQL constraint specification if
If there is a value for shs:select, it is turned into SPARQL algebra as usual after prefix handling except that the top-level pattern is treated as if there is an initial solution set with the following solutions (TODO: double check when Pre-Binding resolved):
- this bound to the focus node
- value bound to a value node, for each value node, if any, otherwise there is a single binding set with value unbound
- shapesGraph bound to the shapes graph
- currentShape bound to the current shape
Let x be the solutions from evaluating the SELECT query on the data graph. The top-level validation results are formed from solutions in x, as defined in the section for mapping of result variables to validation results.
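The initial solution set described above can be pictured with a small Python sketch. It is illustrative only: RDF terms are modelled as plain strings, `None` stands for an unbound variable, and the function name `initial_solutions` is hypothetical rather than part of any SHACL API.

```python
from typing import Dict, List, Optional


def initial_solutions(focus_node: str,
                      value_nodes: List[str],
                      shapes_graph: str,
                      current_shape: str) -> List[Dict[str, Optional[str]]]:
    """Build the initial solutions for one focus node (terms are plain strings)."""
    base = {
        "this": focus_node,
        "shapesGraph": shapes_graph,
        "currentShape": current_shape,
    }
    if not value_nodes:
        # No value nodes: a single solution in which `value` stays unbound.
        return [dict(base, value=None)]
    # Otherwise one solution per value node.
    return [dict(base, value=node) for node in value_nodes]
```

For a focus node with two value nodes this yields two solutions that differ only in the binding of `value`.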
The remainder of this section is informative.
The following example illustrates the syntax of a SPARQL-based constraint.
ex:ValidCountry a ex:Country ;
    ex:germanLabel "Spanien"@de .
ex:InvalidCountry a ex:Country ;
    ex:germanLabel "Spain"@en .

ex:LanguageExampleShape
    a sh:Shape ;
    sh:targetClass ex:Country ;
    shs:sparql [
        a shs:SPARQLConstraint ;   # This triple is optional
        shs:message "Values are literals with German language tag." ;
        shs:prefixes ex: ;
        shs:select """
            SELECT $this (ex:germanLabel AS ?path) ?value
            WHERE {
                $this ex:germanLabel ?value .
                FILTER (!isLiteral(?value) || !langMatches(lang(?value), "de"))
            }
            """ ;
    ] .
The target of the shape above includes all SHACL instances of ex:Country.
For those nodes (represented by the variable this), the SPARQL query walks through the values of ex:germanLabel and verifies that they are literals with a German language code.
The validation results for the aforementioned data graph are shown below:
[] a sh:ValidationReport ;
    sh:conforms "false"^^xsd:boolean ;
    sh:result [
        a sh:ValidationResult ;
        sh:resultSeverity sh:Violation ;
        sh:focusNode ex:InvalidCountry ;
        sh:resultPath ex:germanLabel ;
        sh:value "Spain"@en ;
        sh:sourceConstraintComponent shs:SPARQLConstraintComponent ;
        sh:sourceShape ex:LanguageExampleShape ;
        # ...
    ] .
The following table enumerates the variables that have special meaning in SPARQL constraints. When SPARQL constraints are processed, the SHACL-SPARQL processor pre-binds values for these variables.
Variable | Interpretation |
---|---|
$this | The focus node. |
$shapesGraph | Can be used to query the shapes graph as in GRAPH $shapesGraph { ... }. If the shapes graph is a named graph in the same dataset as the data graph then it is the IRI of the shapes graph in the dataset. Not all SHACL-SPARQL processors need to support this variable. Processors that do not support $shapesGraph MUST report a failure if they encounter a query that references this variable. Use of GRAPH $shapesGraph { ... } should be handled with extreme caution. It may result in constraints that are not interoperable across different SHACL-SPARQL processors and that may not run on remote RDF datasets. |
$currentShape | The current shape. Typically used in conjunction with $shapesGraph. The same support policies as for $shapesGraph apply for this variable. |
If one of the solutions of the result set produced by a SELECT query during a validation process contains the binding true for the variable failure, then the SHACL-SPARQL processor MUST signal a failure.
Otherwise, each row of the result set produced by a SELECT query MUST be converted into one validation result node. The property values of those nodes are derived by the following rules, through a combination of result solutions and the values of the constraint itself. The rules are meant to be executed from top to bottom, so that the first bound value will be used.
Property | Production Rules |
---|---|
sh:focusNode | |
sh:resultPath | |
sh:value | |
sh:resultMessage | |
sh:sourceConstraint | |
It is possible to inject additional annotation properties into the validation result nodes created for each solution of the SELECT result sets.
Any such property needs to be declared via a value of shs:resultAnnotation at the subject of the shs:select triple.
The values of shs:resultAnnotation are either IRIs or blank nodes with the following properties.
In this table, the Value type column states the required SHACL class or datatype of the property values, and the Count column indicates the minimum and maximum number of values that the properties may have.
If these value types and counts are violated then the shapes graph is invalid.
Property | Value type | Count | Description |
---|---|---|---|
shs:annotationProperty | rdf:Property | 1 (mandatory) | The annotation property that shall be set |
shs:annotationVarName | xsd:string | 0..1 | The name of the SPARQL variable to take the values from |
shs:annotationValue | | 0..unlimited | Constant RDF terms that shall be used as default values |
For each solution of a SELECT result set, a SHACL-SPARQL processor MUST walk through the declared result annotations. The mapping from result annotations to SPARQL variables uses the following rules:
- If the value of shs:resultAnnotation has a value for the property shs:annotationVarName then the SHACL-SPARQL processor MUST look for the variable with the same name as the value of shs:annotationVarName.
- Otherwise, the SHACL-SPARQL processor uses the local name of the value of shs:annotationProperty as the variable name.
If a variable name could be determined, then the SHACL-SPARQL processor MUST copy the binding for the given variable as a value for the property specified using shs:annotationProperty into the validation result that is being produced for the current solution.
If the variable has no binding in the result set solution, then the value of shs:annotationValue MUST be used, if present.
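As an informal illustration, the annotation mapping could be sketched in Python as follows. The dictionary keys (`property`, `var_name`, `default_values`) and both function names are hypothetical stand-ins for the RDF structures. The fallback to the local name of the annotation property, and the crude way that local name is computed here, are assumptions of this sketch.

```python
from typing import Dict, List, Optional


def fallback_var_name(property_iri: str) -> str:
    """Crude stand-in for the local name: the text after the last '#' or '/'."""
    return property_iri.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


def apply_annotations(solution: Dict[str, Optional[str]],
                      annotations: List[dict]) -> Dict[str, List[str]]:
    """Return the extra property values for one validation result."""
    extra: Dict[str, List[str]] = {}
    for annotation in annotations:
        prop = annotation["property"]
        # Prefer the explicit variable name, else derive one from the property.
        var = annotation.get("var_name") or fallback_var_name(prop)
        if solution.get(var) is not None:
            # The variable is bound in this solution: copy the binding.
            extra[prop] = [solution[var]]
        elif annotation.get("default_values"):
            # Unbound variable: fall back to the constant default values.
            extra[prop] = list(annotation["default_values"])
    return extra
```

An annotation whose variable is unbound and that declares no default values contributes nothing to the result.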
The remainder of this section is informative.
Here is a slightly complex example, illustrating the use of result annotations.
ex:ShapeWithPathViolationExample
    a sh:Shape ;
    sh:targetNode ex:ExampleRootResource ;
    shs:sparql [
        shs:resultAnnotation [
            shs:annotationProperty ex:time ;
            shs:annotationVarName "time"
        ] ;
        shs:select """
            SELECT $this (ex:property1 AS ?path) (?first AS ?value) ?message ?time
            WHERE {
                $this ex:property1 ?first .
                ?subject ex:property2 ?first .
                FILTER isIRI(?first) .
                BIND (CONCAT("The ", "message.") AS ?message) .
                BIND (NOW() AS ?time) .
            }
            """ ;
    ] .

ex:ExampleRootResource ex:property1 ex:ExampleIntermediateResource .
ex:ExampleValueResource ex:property2 ex:ExampleIntermediateResource .
Validation produces the following validation result nodes:
[] a sh:ValidationReport ;
    sh:conforms "false"^^xsd:boolean ;
    sh:result [
        a sh:ValidationResult ;
        sh:resultSeverity sh:Violation ;
        sh:focusNode ex:ExampleRootResource ;
        sh:resultPath ex:property1 ;
        sh:value ex:ExampleIntermediateResource ;
        sh:resultMessage "The message." ;
        sh:sourceConstraintComponent shs:SPARQLConstraintComponent ;
        sh:sourceShape ex:ShapeWithPathViolationExample ;
        ex:time "2015-03-27T10:58:00"^^xsd:dateTime ;   # Example
    ] .
Each SHACL instance of shs:ConstraintComponent is a SPARQL-based constraint component.
If any node in a graph that is a SHACL instance of sh:ConstraintComponent is an ill-formed SPARQL constraint in the graph then the graph is an ill-formed shapes graph.
Each SPARQL-Based constraint component specifies:
sh:class
)
The following example demonstrates how SPARQL can be used to specify new constraint components using the SHACL-SPARQL language.
The example implements sh:pattern and sh:flags using a SPARQL ASK query to validate that each value node matches a given regular expression.
Note that this is only an example implementation and should not be considered normative.
sh:PatternConstraintComponent
    a shs:ConstraintComponent ;
    shs:parameter [
        shs:predicate sh:pattern ;
    ] ;
    shs:parameter [
        shs:predicate sh:flags ;
        shs:optional true ;
    ] ;
    shs:validator shimpl:hasPattern .

shimpl:hasPattern
    a shs:SPARQLAskValidator ;
    shs:message "Value does not match pattern {$pattern}" ;
    shs:ask "ASK { FILTER (!isBlank($value) && IF(bound($flags), regex(str($value), $pattern, $flags), regex(str($value), $pattern))) }" .
The following sections introduce the properties that constraint components may have. Some of these properties are independent of SPARQL-based execution and apply to constraint components based on other potential extension languages such as JavaScript too.
The parameters of a constraint component are declared via the property shs:parameter.
The objects of triples with shs:parameter as predicate have shs:Parameter as expected type.
Each parameter has exactly one value p for the property shs:predicate and the value is an IRI, otherwise, the constraint component that has this parameter is an ill-formed constraint component.
The local name of an IRI is defined as the longest NCNAME at the end of the IRI, not immediately preceded by the first colon in the IRI.
The parameter name is defined as the local name of the value of shs:predicate.
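A possible reading of the local name definition is sketched below in Python. The regular expression is an ASCII-only approximation of the XML NCName production (the real production admits many more Unicode characters), and the function name `local_name` is hypothetical.

```python
import re

# ASCII-only approximation of NCName: a letter or underscore followed by
# letters, digits, underscores, dots or hyphens.
_NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")


def local_name(iri: str) -> str:
    """Longest NCName-like suffix that does not start right after the first colon."""
    first_colon = iri.find(":")
    # Scanning start positions from left to right yields the longest suffix first.
    for start in range(len(iri)):
        if first_colon != -1 and start == first_colon + 1:
            continue  # immediately preceded by the first colon: not allowed
        if _NCNAME.fullmatch(iri, start):
            return iri[start:]
    return ""
```

Under this reading, the parameter name for `shs:predicate ex:lang` with `ex:` bound to `http://example.com/ns#` is `lang`.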
To ensure that a correct mapping from parameters into SPARQL variables is possible, every parameter name must not be one of the following: this, shapesGraph, currentShape, focusNode, predicate, path or value.
An shs:Parameter may have its property shs:optional set to true to indicate that the parameter is not mandatory.
Every constraint component has at least one non-optional parameter.
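The two syntax rules on parameters could be checked along the following lines. This is a sketch only: parameters are modelled as dictionaries with a `name` and an optional `optional` flag, the function name `check_parameters` is hypothetical, and treating the listed names as forbidden parameter names is the reading assumed here.

```python
RESERVED_NAMES = {
    "this", "shapesGraph", "currentShape",
    "focusNode", "predicate", "path", "value",
}


def check_parameters(parameters: list) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for parameter in parameters:
        if parameter["name"] in RESERVED_NAMES:
            problems.append(
                f"parameter name '{parameter['name']}' clashes with a reserved variable"
            )
    # At least one parameter has to be mandatory.
    if not any(not p.get("optional", False) for p in parameters):
        problems.append("constraint component has no non-optional parameter")
    return problems
```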
The class shs:Parameter is defined as a SHACL subclass of shs:PropertyConstraint, and all properties that are applicable to property constraints may also be used for parameters.
This includes descriptive properties such as sh:name and sh:description but also constraint parameters such as sh:class.
Some implementations MAY use these constraint parameters to prevent the execution of constraint components with invalid parameter values.
Every parameter name is used as the name of a pre-bound variable for the constraint component the parameter belongs to. This variable can be used in the SPARQL queries of the constraint component and a SHACL-SPARQL processor MUST pre-bind it to the parameter value.
The property shs:labelTemplate can be used at any constraint component to suggest how constraints of that component could be rendered to humans.
The values of shs:labelTemplate are strings (possibly with language tag) that can include the names of the declared parameters using the syntax {?varName} or {$varName}, where varName is the name of the SPARQL variable that corresponds to the parameter.
At display time, these {?varName} and {$varName} blocks SHOULD be substituted with the actual parameter values.
There may be multiple label templates for the same subject, assuming they do not have the same language tags.
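The substitution of `{?varName}` and `{$varName}` blocks can be illustrated with a few lines of Python. The function name `render_label` is hypothetical, and leaving unknown placeholders untouched is a choice made for this sketch rather than something the text prescribes.

```python
import re

_PLACEHOLDER = re.compile(r"\{[?$]([A-Za-z_][A-Za-z0-9_]*)\}")


def render_label(template: str, parameters: dict) -> str:
    """Replace {?name} and {$name} blocks with the given parameter values."""
    def substitute(match):
        name = match.group(1)
        # Unknown names are left as they are instead of guessing a value.
        return str(parameters.get(name, match.group(0)))
    return _PLACEHOLDER.sub(substitute, template)
```

For instance, rendering `Values are literals with language "{$lang}"` with `lang` set to `de` gives `Values are literals with language "de"`.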
For every supported context (i.e., property constraints or shapes) the constraint component declares a suitable validator. For a given constraint, a validator is selected from the constraint component using the following rules, in order:
- shs:shapeSelfValidator, if present.
- shs:shapePathValidator, if present.
- shs:validator.
If no suitable validator can be found, a SHACL-SPARQL processor ignores the constraint. The SHACL WG is seeking practical feedback on what the default behavior should be, and whether we should report violations in those cases.
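The selection order might be sketched as follows. The sketch assumes that shs:shapeSelfValidator applies in the shape context and shs:shapePathValidator in the property constraint context, which the property names suggest but the text does not spell out. The context labels `"shape"` and `"property"`, the dictionary model of a component, and the function name `select_validator` are hypothetical.

```python
from typing import Optional

# Assumed pairing of validation context and context-specific validator property.
_SPECIFIC = {
    "shape": "shs:shapeSelfValidator",
    "property": "shs:shapePathValidator",
}


def select_validator(component: dict, context: str) -> Optional[str]:
    """Pick the context-specific validator if present, else the generic one."""
    specific = component.get(_SPECIFIC[context])
    if specific:
        return specific
    # Returns None when there is no validator, in which case the
    # constraint is ignored.
    return component.get("shs:validator")
```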
SHACL-SPARQL includes two types of validators, based on SPARQL SELECT (for shs:shapeSelfValidator and shs:shapePathValidator) or SPARQL ASK queries (for shs:validator).
Validators that are SHACL instances of shs:SPARQLSelectValidator have exactly one string representation of a SPARQL SELECT query via the property shs:select.
The value of shs:select is a valid SPARQL query using the aforementioned prefix handling rules.
The query returns the result variable this in its SELECT clause.
This type of validator can be used as values of shs:shapeSelfValidator or shs:shapePathValidator.
The following example illustrates the declaration of a constraint component based on a SPARQL SELECT query.
It is a generalized variation of the SPARQL-based example constraint from the section on SPARQL-based constraints.
That SPARQL query included two constants: the specific property ex:germanLabel and the language tag de.
Constraint components make it possible to generalize such scenarios, so that constants get pre-bound with parameters.
This allows the query logic to be reused in multiple places, without having to write any new SPARQL.
ex:LanguageConstraintComponentUsingSELECT
    a shs:ConstraintComponent ;
    rdfs:label "Language constraint component" ;
    shs:parameter [
        shs:predicate ex:lang ;
        sh:datatype xsd:string ;
        sh:minLength 2 ;
        sh:name "language" ;
        sh:description "The language tag, e.g. \"de\"." ;
    ] ;
    shs:labelTemplate "Values are literals with language \"{$lang}\"" ;
    shs:shapePathValidator [
        a shs:SPARQLSelectValidator ;
        shs:message "Values are literals with language \"{?lang}\"" ;
        shs:select """
            SELECT DISTINCT $this ?value
            WHERE {
                $this $PATH ?value .
                FILTER (!isLiteral(?value) || !langMatches(lang(?value), $lang))
            }
            """
    ] .
Once a constraint component has been declared (in a shapes graph), its parameters can be used as illustrated in the following example.
ex:LanguageExampleShape
    a sh:Shape ;
    sh:targetClass ex:Country ;
    sh:shape [
        sh:path ex:germanLabel ;
        ex:lang "de" ;
    ] ;
    sh:shape [
        sh:path ex:englishLabel ;
        ex:lang "en" ;
    ] .
The example shape above specifies the condition that all values of ex:germanLabel carry the language tag de while all values of ex:englishLabel have en as their language.
These details are specified via two property constraints that have values for the ex:lang parameter required by the constraint component.
SELECT queries used in the context of property constraints use a special variable named PATH as a placeholder for the predicate or path used by the constraint.
The only legal use of this variable is in the predicate position of a triple pattern.
A query that uses the variable PATH in any other position is invalid.
A SHACL-SPARQL processor executes the provided SPARQL query on the data graph to produce validation results.
In the context of property constraints, the SHACL-SPARQL processor will first substitute all occurrences of the variable PATH with the provided property path derived from the value of sh:path in the constraint.
The resulting SPARQL query is then evaluated with the same pre-bound variables as outlined in the section for SPARQL-based Constraints ($this etc).
Additionally, the value of each declared parameter of the constraint component needs to be pre-bound for the variable derived by the local name of the parameter's shs:predicate.
For example, if a non-optional parameter declares shs:predicate ex:lang then the variable lang needs to be pre-bound.
The result set of the SELECT query is turned into validation results using the same rules as outlined in the section for SPARQL-based Constraints.
In addition to the result properties listed in that section, the property sh:sourceConstraintComponent MUST point at the IRI of the constraint component that has been evaluated.
Furthermore, a SHACL-SPARQL processor MUST use any additional annotation properties that are associated with a SPARQL select validator via shs:resultAnnotation.
Many constraint components are of the form in which all value nodes are tested individually against some boolean condition.
Writing SELECT queries for these becomes burdensome, especially if a constraint component can be used for both property constraints and shapes.
SHACL-SPARQL provides an alternative, more compact syntax for validators based on ASK queries.
This type of validator can be used as values of the property shs:validator.
Validators that are SHACL instances of shs:SPARQLAskValidator point at exactly one string representation of a SPARQL ASK query via the property shs:ask.
The value of shs:ask is a valid SPARQL query using the aforementioned prefix handling rules.
The ASK queries return true if and only if a given value node (represented by the pre-bound variable value) conforms to the constraint.
Prior to evaluation, a SHACL-SPARQL processor transforms the provided ASK query into a SELECT query using the following templates.
The resulting SELECT query can then be evaluated using the same algorithm as for SELECT-based validators.
The processor drops the ASK keyword, any top-level dataset clauses and solution modifiers, leaving only the GroupGraphPattern including the outermost {...} pair.
This block then substitutes ... in the template.
Template for sh:Shape context:
SELECT $this ?value
WHERE {
    BIND ($this AS ?value) .
    FILTER NOT EXISTS ...
}
Template for sh:PropertyConstraint context:
SELECT DISTINCT $this ?value
WHERE {
    $this $PATH ?value .
    FILTER NOT EXISTS ...
}
Note that the template above includes a DISTINCT keyword because a SPARQL path expression may return the same ?value multiple times, yet each value node is only validated once.
Once the corresponding template has been applied, the resulting SELECT query will be evaluated using the same approach as outlined above. Actual SHACL implementations may of course use a different approach internally, as long as the results are equivalent to the described approach.
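The ASK-to-SELECT rewriting can be approximated with plain string handling, as in the following Python sketch. It is deliberately naive: it takes everything from the first `{` to the last `}` as the GroupGraphPattern, so it assumes that no brace occurs in a dataset clause, string literal or comment outside that pattern. A real processor would work on the parsed query instead. The function name `ask_to_select` is hypothetical.

```python
SHAPE_TEMPLATE = (
    "SELECT $this ?value WHERE { BIND ($this AS ?value) . FILTER NOT EXISTS ... }"
)
PROPERTY_TEMPLATE = (
    "SELECT DISTINCT $this ?value WHERE { $this $PATH ?value . FILTER NOT EXISTS ... }"
)


def ask_to_select(ask_query: str, property_context: bool) -> str:
    """Wrap the group graph pattern of an ASK query into the matching template."""
    start = ask_query.find("{")
    end = ask_query.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no group graph pattern found")
    # Keep the outermost { ... } pair, drop everything around it.
    pattern = ask_query[start:end + 1]
    template = PROPERTY_TEMPLATE if property_context else SHAPE_TEMPLATE
    # The literal "..." in the template marks the insertion point.
    return template.replace("...", pattern, 1)
```

The `$PATH` placeholder in the property template is left in place, since it is substituted in a later step.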
The following example declares a constraint component using an ASK query.
ex:LanguageConstraintComponentUsingASK
    a shs:ConstraintComponent ;
    rdfs:label "Language constraint component" ;
    shs:parameter [
        shs:predicate ex:lang ;
        sh:datatype xsd:string ;
        sh:minLength 2 ;
        sh:name "language" ;
        sh:description "The language tag, e.g. \"de\"." ;
    ] ;
    shs:labelTemplate "Values are literals with language \"{$lang}\"" ;
    shs:validator ex:hasLang .

ex:hasLang
    a shs:SPARQLAskValidator ;
    shs:message "Values are literals with language \"{$lang}\"" ;
    shs:ask """
        ASK {
            FILTER (isLiteral($value) && langMatches(lang($value), $lang))
        }
        """ .
Note that the validation condition implemented by an ASK query is "in the inverse direction" from its SELECT counterpart:
ASK queries return true for value nodes that conform to the constraint, while SELECT queries return those value nodes that do not conform.
A shapes graph may include declarations of namespace prefixes so that these prefixes can be used to abbreviate the SPARQL queries derived from the same shapes graph. The syntax of such prefix declarations is illustrated by the following example.
ex:
    a owl:Ontology ;
    owl:imports sh: ;
    shs:declare [
        shs:prefix "ex" ;
        shs:namespace "http://example.com/ns#"^^xsd:anyURI ;
    ] ;
    shs:declare [
        shs:prefix "schema" ;
        shs:namespace "http://schema.org/"^^xsd:anyURI ;
    ] .
The property shs:declare is used to make prefix declarations.
The SHACL vocabulary includes the class shs:PrefixDeclaration as type for the values of shs:declare although no rdf:type triple is required for them.
The values of shs:declare have exactly one value for the property shs:prefix (literals of datatype xsd:string) and exactly one value for the property shs:namespace (literals of datatype xsd:anyURI).
Such a pair of values specifies a single mapping of a prefix to a namespace.
The recommended subject for values of shs:declare is the IRI of the graph containing the shapes that use the prefixes.
These IRIs are often declared as an instance of owl:Ontology, but this is not required.
Prefix declarations can be used by SPARQL-based constraints and similar SPARQL-based features such as the validators of constraint components, derived values constraints, target types and functions.
These nodes can use the property shs:prefixes to specify a set of prefix mappings.
(An example use of the shs:prefixes property can be found in the example above.)
The values of shs:prefixes are either IRIs or blank nodes.
A SHACL processor collects a set of prefix mappings as the union of all individual prefix mappings that can be reached by the property path shs:prefixes/owl:imports*/shs:declare starting at the SPARQL-based constraint.
If such a collection of prefix declarations contains multiple namespaces for the same value of shs:prefix, then the shapes graph is invalid.
(Note that SHACL processors MAY ignore prefix declarations that are never reached.)
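The collection of prefix mappings along shs:prefixes/owl:imports*/shs:declare can be sketched without an RDF library by treating the shapes graph as a list of (subject, predicate, object) string triples. This representation, the helper `objects` and the function name `collect_prefixes` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def objects(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """All objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def collect_prefixes(graph: Iterable[Triple], constraint: str) -> Dict[str, str]:
    """Follow shs:prefixes/owl:imports*/shs:declare and merge the mappings."""
    triples = list(graph)
    # Step 1: shs:prefixes
    frontier = objects(triples, constraint, "shs:prefixes")
    # Step 2: owl:imports* (zero or more hops, guarded against cycles)
    reached: Set[str] = set()
    while frontier:
        node = frontier.pop()
        if node in reached:
            continue
        reached.add(node)
        frontier.extend(objects(triples, node, "owl:imports"))
    # Step 3: shs:declare, then read each prefix/namespace pair
    mappings: Dict[str, str] = {}
    for node in sorted(reached):
        for declaration in objects(triples, node, "shs:declare"):
            for prefix in objects(triples, declaration, "shs:prefix"):
                for namespace in objects(triples, declaration, "shs:namespace"):
                    if mappings.setdefault(prefix, namespace) != namespace:
                        raise ValueError(
                            f"conflicting namespaces for prefix '{prefix}'"
                        )
    return mappings
```

Declarations that are not reachable from the constraint are never visited, which matches the note that unreached prefix declarations may be ignored.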
A SHACL processor transforms the values of shs:select (and similar properties such as shs:ask) into SPARQL by prepending PREFIX declarations for all prefix mappings.
Each value of shs:prefix is turned into the PNAME_NS, while each value of shs:namespace is turned into the IRIREF in the PREFIX declaration.
For the example shapes graph above, a SHACL-SPARQL processor would produce lines such as PREFIX ex: <http://example.com/ns#>.
The SHACL-SPARQL processor MUST produce a failure if the resulting SPARQL query string cannot be parsed into a valid SPARQL 1.1 query.
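Prepending the PREFIX declarations amounts to simple string assembly, as the following Python sketch suggests. The function name `with_prefixes` is hypothetical, the mappings are passed as a plain dictionary from prefix to namespace, and the sketch does not attempt the parse check that the text requires afterwards.

```python
def with_prefixes(query: str, mappings: dict) -> str:
    """Prepend one PREFIX line per mapping (sorted for a stable output)."""
    lines = [
        f"PREFIX {prefix}: <{namespace}>"
        for prefix, namespace in sorted(mappings.items())
    ]
    return "\n".join(lines + [query])
```

Sorting is only there to make the output deterministic; the order of PREFIX declarations has no effect on the meaning of the query.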
In the rest of this document, the shs:prefixes statements may have been omitted for brevity.
The following definition of what pre-binding means has not been approved by the WG yet, and is work in progress. The WG is also awaiting input from the SPARQL Maintenance (EXISTS) Community Group.
Some features of the SPARQL-based extension mechanism of SHACL-SPARQL rely on the concept of pre-binding of variables. Although variations of this concept are supported by several existing SPARQL implementations, there is no formal definition of pre-binding in the SPARQL 1.1 specifications. The goal of this section is to illustrate the effect of pre-binding to users and implementers. Note however that the following definition is not meant to serve as recommendation for an actual implementation strategy.
Pre-binding a variable with a value means that the SPARQL processor needs to evaluate all occurrences of variables with that same name (including occurrences in inner targets and nested SELECT queries) so that they have the provided value. In other words, whenever a SPARQL processor evaluates a pre-bound variable, it must use the given value.
SHACL-SPARQL defines two forms of variable pre-binding:
The variable PATH has a special treatment in SHACL property constraint components and MUST be processed before any other pre-bound variable.
SHACL-SPARQL processors MUST perform string substitution of every occurrence of the variable PATH to the generated SPARQL property path before performing any pre-binding.
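The string substitution of PATH can be illustrated as follows. The sketch assumes the property path has already been serialized into SPARQL syntax (for example `ex:germanLabel`), and it is purely textual, so it would also rewrite an occurrence inside a string literal or comment. The function name `substitute_path` is hypothetical.

```python
import re

# $PATH and ?PATH denote the same SPARQL variable; \b keeps longer
# variable names such as $PATHS untouched.
_PATH_VARIABLE = re.compile(r"[$?]PATH\b")


def substitute_path(query: str, sparql_path: str) -> str:
    """Replace every occurrence of the PATH variable with the given path."""
    # A function is used as replacement so that backslashes in the path
    # are not interpreted as regex escapes.
    return _PATH_VARIABLE.sub(lambda _match: sparql_path, query)
```

Only after this step would the remaining variables such as `$this` be pre-bound.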
A SHACL-SPARQL processor MUST be a conformant SHACL Processor. Say we do not need to support all SHACL Core constraint components except sh:shape?
Many people contributed to this specification, including members of the RDF Data Shapes Working Group. We especially thank the following:
Arnaud Le Hors (chair), Jim Amsden, Iovka Boneva, Karen Coyle, Richard Cyganiak, Michel Dumontier, Holger Knublauch, Dimitris Kontokostas, Jose Labra, Peter Patel-Schneider, Eric Prud'hommeaux, Arthur Ryman (who also served as a co-editor until Feb 2016), Harold Solbrig, Simon Steyskal, Ted Thibodeau