ShExStatements: Documentation

ShExStatements allows users to generate shape expressions from simple CSV statements and files. shexstatements can also be used from the command line.

Objectives

  • Easily generate shape expressions (ShEx) from CSV files
  • Simple syntax, with 5 columns:
      • Node name
      • Property
      • Allowed values
      • Cardinality (optional)
      • Comments (optional)

Quick start

Clone the ShExStatements repository.

$ git clone https://github.com/johnsamuelwrites/ShExStatements.git

Go to the ShExStatements directory.

$ cd ShExStatements

Install modules required by ShExStatements (here: installing into a virtual environment).

$ python3 -m venv .venv
$ source ./.venv/bin/activate
$ pip3 install .

Run the following command with an example CSV file. The file contains an example description of a language on Wikidata and uses a comma as the delimiter to separate the values.

$ ./shexstatements.sh examples/language.csv

There are five columns in the CSV file. Column 1 specifies the node name, column 2 the property, column 3 the allowed values, column 4 the cardinality (for example + or *), and column 5 the comments. Comments start with #. Columns 1, 2 and 3 are mandatory for statements. Column 3 can be a special value such as . (a period, meaning ‘any’ value). For prefix declarations, columns 3, 4 and 5 are left empty.
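The column layout described above can be illustrated with a short Python sketch that uses only the standard library. It is not the ShExStatements parser: the `Statement` type, the `parse_rows` helper and the sample rows are assumptions made for illustration, and the rows are not copied from examples/language.csv.

```python
import csv
import io
from typing import NamedTuple


class Statement(NamedTuple):
    """One CSV line, following the five-column layout (hypothetical type)."""
    node: str
    prop: str
    value: str
    cardinality: str
    comment: str


# Illustrative rows only: one prefix declaration and two statements.
SAMPLE = """\
wdt,<http://www.wikidata.org/prop/direct/>
language,wdt:P1705,LITERAL,,# native name
language,wdt:P17,.,+,# spoken in country
"""


def parse_rows(text, delimiter=","):
    """Split each non-empty line into five columns, padding missing ones."""
    statements = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue  # skip blank lines
        cells = [cell.strip() for cell in row]
        cells += [""] * (5 - len(cells))  # prefixes leave columns 3-5 empty
        statements.append(Statement(*cells[:5]))
    return statements


rows = parse_rows(SAMPLE)
for statement in rows:
    print(statement)
```

Padding short rows to five cells keeps prefix declarations and full statements in the same structure, so later steps can test for an empty `value` instead of checking row lengths.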

Cardinality can be any one of the following values:
  • * : zero or more values
  • + : one or more values
  • m : exactly m values
  • m,n : any number of values between m and n (including m and n).
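As a rough illustration of these cardinality values, the sketch below maps each form to a (min, max) pair, using -1 for "unbounded" as in the ShExJ output. The function name `parse_cardinality` is hypothetical and is not part of ShExStatements. Treating an empty cardinality as exactly one value is an assumption based on the ShEx default.

```python
def parse_cardinality(text):
    """Return (min, max) for a cardinality cell; max == -1 means unbounded."""
    text = text.strip()
    if text == "":
        return (1, 1)  # assumption: ShEx default of exactly one value
    if text == "*":
        return (0, -1)  # zero or more
    if text == "+":
        return (1, -1)  # one or more
    if "," in text:
        low, high = (int(part) for part in text.split(",", 1))
        if low > high:
            raise ValueError(f"invalid range: {text!r}")
        return (low, high)  # between m and n, inclusive
    count = int(text)
    return (count, count)  # exactly m values


for cell in ["*", "+", "3", "2,5", ""]:
    print(repr(cell), parse_cardinality(cell))
```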

A CSV file can use other delimiters, such as ;. For example, the following command works with a file that uses a semicolon as the delimiter.

$ ./shexstatements.sh examples/languagedelimsemicolon.csv --delim ";"

Sometimes users may want to include a header row in the CSV file. In that case, they can use -s or --skipheader to tell the generator to skip the header (the first line of the CSV file).

$ ./shexstatements.sh --skipheader examples/header/languageheader.csv
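The effect of the delimiter and header options can be approximated with Python's standard csv module. This is only a sketch of the idea, not the tool's implementation: `read_statements`, its `delim` and `skip_header` parameters, and the sample text (including its header names) are assumptions chosen to mirror the command-line flags.

```python
import csv
import io

# Illustrative semicolon-delimited input with a header row.
SAMPLE = """\
node;property;value;cardinality;comment
language;wdt:P17;.;+;# spoken in country
language;wdt:P1999;.;*;# UNESCO language status
"""


def read_statements(text, delim=",", skip_header=False):
    """Read non-empty rows, optionally dropping the first (header) line."""
    reader = csv.reader(io.StringIO(text), delimiter=delim)
    if skip_header:
        next(reader, None)  # discard the header if there is one
    return [row for row in reader if row]


rows = read_statements(SAMPLE, delim=";", skip_header=True)
for row in rows:
    print(row)
```

Without `skip_header=True`, the header line would be returned as if it were a statement, which is the situation the --skipheader flag is meant to avoid.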

In all the above cases, the shape expression generated by ShExStatements will look like:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
start = @<language>
<language> {
  wdt:P31 [ wd:Q34770  ] ;# instance of a language
  wdt:P1705 LITERAL ;# native name
  wdt:P17 .+ ;# spoken in country
  wdt:P2989 .+ ;# grammatical cases
  wdt:P282 .+ ;# writing system
  wdt:P1098 .+ ;# speakers
  wdt:P1999 .* ;# UNESCO language status
  wdt:P2341 .+ ;# indigenous to
}

Use -j or --shexj to generate ShEx JSON syntax (ShExJ) instead of the default ShEx compact syntax (ShExC).

$ ./shexstatements.sh --shexj examples/language.csv

The output will be similar to:

{
  "type": "Schema",
  "start": "language",
  "shapes": [
    {
      "type": "Shape",
      "id": "language",
      "expression": {
        "type": "EachOf",
        "expressions": [
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P31",
            "valueExpr": {
              "type": "NodeConstraint",
              "values": [
                "http://www.wikidata.org/entity/Q34770"
              ]
            }
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P1705",
            "valueExpr": {
              "type": "NodeConstraint",
              "nodeKind": "literal"
            }
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P17",
            "min": 1,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P2989",
            "min": 1,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P282",
            "min": 1,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P1098",
            "min": 1,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P1999",
            "min": 0,
            "max": -1
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P2341",
            "min": 1,
            "max": -1
          }
        ]
      }
    }
  ]
}
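Because ShExJ is plain JSON, the generated output can be inspected with any JSON library. The following sketch lists each predicate with its minimum and maximum cardinality. The helper name `summarize_constraints` is hypothetical, the sample is an abbreviated version of the output above, and a missing "min" or "max" is assumed to mean 1, following the ShEx default.

```python
import json

# Abbreviated version of the ShExJ output shown above.
SAMPLE_SHEXJ = """
{
  "type": "Schema",
  "start": "language",
  "shapes": [
    {
      "type": "Shape",
      "id": "language",
      "expression": {
        "type": "EachOf",
        "expressions": [
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P1705",
            "valueExpr": {"type": "NodeConstraint", "nodeKind": "literal"}
          },
          {
            "type": "TripleConstraint",
            "predicate": "http://www.wikidata.org/prop/direct/P1999",
            "min": 0,
            "max": -1
          }
        ]
      }
    }
  ]
}
"""


def summarize_constraints(shexj_text):
    """Return (predicate, min, max) for every triple constraint in a schema."""
    schema = json.loads(shexj_text)
    summary = []
    for shape in schema.get("shapes", []):
        expression = shape.get("expression", {})
        # An EachOf groups several constraints; otherwise treat the
        # expression itself as the only candidate.
        candidates = expression.get("expressions", [expression])
        for constraint in candidates:
            if constraint.get("type") != "TripleConstraint":
                continue
            summary.append((
                constraint["predicate"],
                constraint.get("min", 1),  # assumed default: exactly one
                constraint.get("max", 1),
            ))
    return summary


summary = summarize_constraints(SAMPLE_SHEXJ)
for predicate, minimum, maximum in summary:
    print(predicate, minimum, maximum)
```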

It’s also possible to use application profiles of the following form:

Entity_name,Property,Property_label,Mand,Repeat,Value,Value_type,Annotation

Shape expressions can then be generated using the following command:

$ ./shexstatements.sh -ap --skipheader examples/languageap.csv
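Before passing a file to the application-profile mode, it may help to confirm that its first line matches the expected header. The check below is a sketch under that assumption: `has_profile_header` and `PROFILE_COLUMNS` are hypothetical names, and ShExStatements itself may accept or validate headers differently.

```python
import csv
import io

# Column names taken from the application-profile header shown above.
PROFILE_COLUMNS = [
    "Entity_name", "Property", "Property_label", "Mand",
    "Repeat", "Value", "Value_type", "Annotation",
]


def has_profile_header(text, delimiter=","):
    """Return True if the first CSV line lists the profile columns in order."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, [])  # empty input yields an empty header
    return [cell.strip() for cell in header] == PROFILE_COLUMNS


header_line = ",".join(PROFILE_COLUMNS) + "\n"
print(has_profile_header(header_line))
print(has_profile_header("language,wdt:P17,.,+,# spoken in country\n"))
```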

There are example CSV files in the examples folder.