Skip to content

Import Directives


Import directives are used to import external data resources as predicates.

Syntax

The general syntax of an import statement is as follows:

@import <predicate> :- <format>{<parameter>=<value>, ...} .

In the example below table denotes the predicate that data is imported to, csv is the data format (comma-separated values), and path/to/file.csv is the relative path to the file. Instead of a local path, it is also possible to define a URL from which data should be downloaded, e.g., resource = "https://example.org/file.csv".

@import table :- csv { resource="path/to/file.csv", format=(int,string, any)}.

Formats

Currently the following formats are supported:

Format Parameters
csv Comma-separated values.
dsv Delimited separated values. (Like csv, but allows specifying different delimiters e.g. delimiter=";")
tsv Tab-separated values.
rdf Generic RDF format. The actual format will be guessed using the file name.
nquads RDF NQuads format.
trig RDF TriG format.
ntriples RDF Ntriples format
rdfxml RDF/XML format.
turtle RDF Turtle format.
sparql SPARQL query format.
json JSON triples.

Format specific notes

  • nquads, trig: data is always imported in order “graphname, subject, predicate, object”
  • ntriples, rdfxml, turtle: data is always imported in order “subject, predicate, object”
  • sparql: can be used to make a query following the SPARQL syntax to a specific endpoint. For example, the import statement below will retrieve the number of humans that are known to Wikidata:
    @import nr_humans :- sparql{endpoint = <https://query.wikidata.org/sparql>, query="""
    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    SELECT (COUNT(*) AS ?count)
    WHERE {
    ?item wdt:P31 wd:Q5 .
    }"""
    } .
  • json: data is deserialized into a table representation consisting of triples of the form “object-id, key, object-id” (see table below, how each json type is represented).
    The root object of the document always has object-id 0. Furthermore, for each object-id a triple of the form “object-id, <type>, type-id” is stored, where type-id is one of "object", "array", "string", "number", "bool" or "null". Literal values will be assigned an object-id as well, with a triple of the form “object-id, <value>, literal-value”
JSON | nemo
----------------------------------------------------------------------------------------------
{"key1": value1, "key2": value2} | (0, type, "object"),
| (0, "key1", 1),
| (0, "key2", 2),
| (1, type, ...),
| (2, type, ...)
|
[value1, value2] | (0, type, "array"),
| (0, 0, 1),
| (0, 1, 2),
| (1, type, ...),
| (2, type, ...)
|
"foobar" | (0, type, "string"),
| (0, value, "foobar")
|
42.1337 | (0, type, "number"),
| (0, value, 42.1337)
|
true | (0, type, "bool"),
| (0, value, "true"^^<http://www.w3.org/2001/XMLSchema#boolean>)
|
null | (0, type, "null")

Available parameters

resource
The file name to write to. If it contains an extension, this is used to automatically set the compression parameter. If set to the empty string "", the tuples are read from stdin. This is restricted to one stdin resource per program. If omitted, this is set based on the predicate name, file format and compression type <predicate-name>.<format>.<compression>.
Accepted type(s): IRI, String
format
The input-format of the imported data. Might be int, double, string, rdf or skip.
Accepted type(s): Tuple of Nemo type names, Nemo type name
compression
The compression to use. Currently only gzip or none is supported. ' This will normally be guessed correctly from the file extension, but can be useful for non-standard file names or URLs.
Accepted type(s): String
limit
The maximum number of tuples to import. (great for testing when working with large files)
Accepted type(s): Unsigned integer
ignore_headers
if true, the first record (containing the column headers) is ignored
Accepted type(s): Boolean
http_headers
Each pair is added as HTTP headers when making an HTTP request
Accepted type(s): Map with key-value pairs that can be of type: String, Constant or Number
Example: http_headers=("Accept-Language"="en-US","Accept-Charset="utf-8")
http_get_parameters
The map will be flattened into pairs that are appended to the IRI before making an HTTP request
Accepted type(s): Map where each key is of type String, Constant or Number and each value is a (possibly unary) tuple containing Strings, Numbers and Constants.
Example: http_get_parameters={name="John Doe", age=42, parent=("Johanna Doe", "Josh Doe")}
http_post_parameters
The map will be flattened into pairs that are sent as the body of an HTTP POST request
Accepted type(s): Map where each key is of type String, Constant or Number and each value is a (possibly unary) tuple containing Strings, Numbers and Constants.
Example: http_post_parameters={name="John Doe", age=42, parent=("Johanna Doe", "Josh Doe")}
iri_fragment
A fragment that is appended to a resource or endpoint IRI
Accepted type(s): String

The format parameter

Some import formats, such as csv, do not specify how values in the file should be interpreted. By default, Nemo will heuristically interpret data. For example, in CSV files, the string 123 will be interpreted as an integer, but the string abc as a constant name (relative IRI). To control this in more detail, the parameter format can be used. It takes a list of format strings, like in the following example:

@import data :- csv{resource="file.csv", format=(string,string,int,any)} .

This example imports from a CSV file that should have four columns: the first two will be read as literal strings, the third as an integer, and the final as “anything” that seems right (the default format, also good for IRIs). Lines that cannot be interpreted in this format will be skipped silently, e.g., if a line has something in the third field that does not look like an integer. Therefore, all imported data is guaranteed to have the chosen format. Currently, Nemo supports the following formats:

  • any: the default best-effort parsing
  • string: try to interpret all values as plain strings, fastest import option for CSV
  • int: interpret values as integers, if possible
  • double: interpret values as double precision floats (f64), if possible
  • skip: special format to ignore a column entirely (if used, the imported data has smaller arity than the table in the file)

The same formats can be used in RDF imports as well. RDF has a more precise type system, where every value has an unambiguous type. The formats then mainly act as filters that will remove triples that do not fit (e.g., to load only triples with integer values). For converting values to other datatypes, e.g., from integers to strings, rules with built-in functions (in this case STR) can be used after import.

Fstrings

Fstrings are supported in the import statement as can be seen in the example below:

@import table :- csv{resource = f"file-{?x}-{?y}.csv"}, ?x = "name", ?y = 42 .