KoralQuery

, IDS, Mannheim

, IDS, Mannheim

Abstract

KoralQuery is a general corpus query protocol (i.e. it tries to be independent of research tasks and corpus formats), serialized in JSON-LD [1]. KoralQuery focuses on simplicity of implementation rather than human readability and writability. Support for a growing number of query languages is provided by the Koral serialization processor.

{
  "@context" : "http://korap.ids-mannheim.de/ns/koral/0.5/context.jsonld",
  "corpus" : { ... },
  "query" : { ... },
  "meta" : { ... }
}

Introduction

JSON-LD is a JSON-based serialization format for Linked Data. Nested objects in JSON-LD can be categorized by providing unique identifiers based on URIs. These identifiers can be defined in context files and associated with JSON objects by defining a @context attribute.

In KoralQuery, corpus search is divided into multiple separate protocol concepts, which are categorized into the following types:

Collection Type Objects
Define a document collection by certain constraints. The expected result of a collection type object is the subset of documents in the corpus that meet the conditions. The empty set is valid.
Span Type Objects
Define an occurrence collection by certain conditions. The expected result of a span type object is a collection of substrings, in documents of the document collection, that meet the conditions. The empty set is valid.
Parametric Type Objects
Specify further constraints for the embedding collection type objects or span type objects as parameters. The expected result of a parametric type object is a refinement of the parent collection type object or span type object.
Report Type Objects
Report modifications of a query object (rewrites) as part of the query; or report errors, warnings and further messages regarding the processing of the query. Report types do not alter the expected result of a query.
Response Type Objects
Define the result of the processing of the query.

Further objects are allowed, but their meaning is unspecified. The definition of meta objects, which define further adjustments to the query execution or the processing of search results, depends on the implementation of KoralQuery and is not part of this specification. For common meta information with recommended definitions, refer to the appendix.

Status of this draft

KoralQuery is not meant to be complete, but to be extensible and forward compatible. Extensibility is provided by support for embedded @context objects. Forward compatibility is supported by implementation advice that describes fallbacks for incompatibilities.

Implementations do not need to implement all described features to be called KoralQuery compliant, but implementations should fail in a predictable manner by using the described report methods.

Definitions

Table Description

In this document, the attributes of KoralQuery objects are represented as tables, listing all defined attributes as key-value pairs. Optional keys have a trailing question mark; mandatory keys are unmarked.

The type documenting tables have four columns:

Key
The key string in the JSON object for this attribute. In running text these keys are marked as keys.
Type
The type of values allowed for this attribute. Defined types include:
xsd:string
Arbitrary character sequence, represented as a JSON string.
xsd:boolean
Either true or false.
xsd:integer
A signed integer, represented as a JSON number. The valid range of the integer is described in the Values column.
@id
An identifier that represents a valid JSON-LD type, represented as a JSON string. The supported types are listed in the Values column.
In addition to these types, KoralQuery type objects may be listed. These may be represented by their identifier (e.g. koral:termGroup) or by their category (e.g. "span type"). Multiple supported types are listed comma separated. If the value is expected to be a list of values, the valid types are enclosed in brackets (e.g. "[@id]"). If the list is expected to have a certain number of members, this is described in the Values column.
Default
The default value of this attribute in case it is not given.
Values
Gives a list of valid values, constraints on this attribute, consequences for other attributes, and examples with descriptions.

Implementation Guide

If an object contains key attributes other than defined, they are ignored.

If an object contains types other than defined for a certain key, the query has to be rejected and an error has to be raised.

If an object has values other than defined for a certain key, the behaviour is defined in the implementation guide of the object type. If no special behaviour is defined, the query has to be rejected and an error has to be raised. The term defined in implementation guides may cover further definitions that are not part of this specification. The term undefined in implementation guides is not restricted to definitions of this specification.

Empty objects in lists have to be ignored. Exceptions are lists of type "[xsd:integer]" and "[xsd:string]".

Error, Warning, and Message objects

KoralQuery is meant to be future-proof by being upwards compatible. That means new features, whether officially introduced or supported by third-party software (using external context files), should either be treated as intended, be intentionally ignored, or be rejected.

Incompatibilities with query objects and collection objects should be treated as documented in the implementation guide section of each object.

To inform the user on certain incompatibilities, KoralQuery has three different mechanisms for raising awareness. These mechanisms may also be used by query rewrite processors, to inject errors, warnings, and messages.

Errors
Errors will inform the user of a reason a query was rejected. This may originate from the KoralQuery processing, but may also be injected for other reasons, like access restrictions.
Warnings
Warnings will inform the user of possibly unexpected behaviour of the KoralQuery process. This may originate from the KoralQuery processing, but may also be injected for other reasons, like limitations of the query result set by a timeout.
Messages
Messages will inform the user of useful information that does not affect the results of the query. This may originate from the KoralQuery processing, for example to inform about future incompatibilities, but may also be injected for other reasons, like deprecation of certain endpoints in the query service.

Implementation Guide

KoralQuery processors will always pass on errors, warnings, and messages injected by prior processing systems. A final processing filter may decide which errors, warnings, and messages are of interest to the user and which are only of interest for intermediate processing.

Collection Type Objects

A KoralQuery can be limited to a subset of the documents of a corpus. The collection is defined by criteria that documents have to satisfy. It has to be defined on the top-level object with the attribute corpus or collection. A single criterion is defined by a koral:doc object. Multiple criteria can be combined and further constrained using koral:docGroup objects.

The result of a collection type object is a collection of documents that meet the conditions formulated by the collection criteria and group constraints.

collection will be renamed to corpus in future versions of this specification. Implementations should support both attributes, with corpus being the preferred variant.
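
A consumer that accepts both attribute names might normalise them before any further processing. The following is a minimal sketch in Python; the function name get_collection is hypothetical, and preferring corpus when both attributes are present is an assumption derived from the stated preference, not a rule of this specification.

```python
def get_collection(koral_query):
    """Return the collection type object of a KoralQuery dict, or None.

    Assumption: if both attributes are present, "corpus" wins because it
    is the preferred variant.
    """
    if "corpus" in koral_query:
        return koral_query["corpus"]
    # Fall back to the older attribute name.
    return koral_query.get("collection")
```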

{
  "@context" : "http://korap.ids-mannheim.de/ns/koral/0.5/context.jsonld",
  "corpus" : {
    "@type":"koral:docGroup",
    "operation":"operation:and",
    "operands":[{
      "@type":"koral:doc",
      "key":"title",
      "match":"match:eq",
      "value":"Der Birnbaum",
      "type":"type:string"
    },{
      "@type":"koral:doc",
      "key":"pubPlace",
      "match":"match:eq",
      "value":"Mannheim",
      "type":"type:string"
    },{
      "@type":"koral:docGroup",
      "operation":"operation:or",
      "operands":[{
        "@type":"koral:doc",
        "key":"pubDate",
        "match":"match:geq",
        "value":"2015-03-03",
        "type":"type:date"
      },{
        "@type":"koral:doc",
        "key":"lastModified",
        "match":"match:geq",
        "value":"2015-04-04",
        "type":"type:date"
      }]
    }]
  },
  "query" : { ... },
  "meta" : { ... }
}

Basic collection types

A document in KoralQuery is represented by primary data, annotation data, and metadata. The different data fields are identified by field names, for example author for the metadata field holding the author, or pubDate for the metadata field holding the publication date. The names of the fields are not part of the specification. Basic collection types also do not distinguish between fields for metadata, primary data, and annotation data, although the field type may be constrained by the nature of the field.

koral:doc

{
  "@type" : "koral:doc",
  "key" : "textClass",
  "value" : "novel",
  "match" : "match:eq"
}
Key Type Default Values
@type @id koral:doc
key xsd:string The field name.
value xsd:string, [xsd:string] The field value.
type @id type:string
type:string
The value is treated as a character sequence.
type:regex
The value is treated as a regular expression.
type:date
The value is treated as a date format.
match @id match:eq Specifies agreement between key attribute and annotation.
match:eq
The key attribute has to match the value attribute exactly.
match:ne
The key attribute has to match anything but the value.
match:geq
The key attribute has to match anything that is equal to or greater than the value attribute.
match:leq
The key attribute has to match anything that is equal to or less than the value attribute.
match:contains
The key attribute has to match the value attribute as a substring.
match:excludes
The key attribute has to match anything but field values containing the value attribute as a substring.

A koral:doc object defines one criterion a document in the collection has to satisfy. If a document satisfies the criterion, it is part of the collection.

The key attribute represents a field name, for example author for a metadata field containing the name of the document's author, text for a field containing the primary data, or numberOfTokens for a field containing the number of tokens annotated in the document.

The value attribute represents the value a document is expected to match according to the match attribute in the field defined by the key attribute. This may, for example, be the name of the author Theodor Fontane, in case of a key field for author. If the value attribute is defined as an array, the document is expected to match for any of the given values.

The type attribute defines the nature of the value attribute. This may represent a string, a date or a regular expression. Dates have to follow the W3C Date and Time Formats.

The match attribute defines the kind of agreement required between the value given in the value attribute and the value stored in the document's field named by key. This behaviour is further constrained by the type attribute. match:eq expects an exact agreement for type:string, a full match for type:regex with implicit anchors ^ and $, and for type:date a date value in the range of the given date (the range is based on the granularity, so a 2015-04 date matches 2015-04-01 as well as 2015-04-12). match:ne expects an exact disagreement for type:string, a mismatch for type:regex with implicit boundary anchors, and a date outside the defined range for type:date. match:geq and match:leq are currently only defined for type:date, expecting a date in the exact range or later for match:geq, or a date in the exact range or earlier for match:leq. match:contains expects a field value in which value is a substring for type:string, or in which value matches for type:regex. match:excludes expects a field value in which value is not a substring for type:string, or in which value does not match for type:regex. match:contains and match:excludes are undefined for type:date.
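
The interplay of type and match described above can be sketched in Python as follows. This is an illustration under assumptions, not a reference implementation: the function names are hypothetical, Python's re module stands in for the unspecified regular expression dialect, and the date check assumes that document dates are stored as full YYYY-MM-DD strings, so that truncating them to the granularity of the queried value yields the range comparison.

```python
import re


def matches_text(field_value, value, type_="type:string", match="match:eq"):
    """Check a string field value against a type:string or type:regex value."""
    if type_ == "type:regex":
        full = re.fullmatch(value, field_value) is not None  # implicit ^ and $
        part = re.search(value, field_value) is not None
    elif type_ == "type:string":
        full = field_value == value
        part = value in field_value
    else:
        raise ValueError("undefined type: " + type_)
    results = {
        "match:eq": full,
        "match:ne": not full,
        "match:contains": part,
        "match:excludes": not part,
    }
    if match not in results:
        raise ValueError("match undefined for this type: " + match)
    return results[match]


def matches_date(field_date, value, match="match:eq"):
    """Check a full YYYY-MM-DD field date against a possibly coarser date.

    Assumption: cutting the field date down to the length of the queried
    value (YYYY, YYYY-MM or YYYY-MM-DD) implements the range check.
    """
    prefix = field_date[:len(value)]
    if match == "match:eq":
        return prefix == value
    if match == "match:ne":
        return prefix != value
    if match == "match:geq":
        return prefix >= value
    if match == "match:leq":
        return prefix <= value
    raise ValueError("match undefined for type:date: " + match)
```

The string comparison in matches_date relies on zero-padded ISO dates sorting chronologically. Array values, which match if any of the given values matches, are not covered by the sketch.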

koral:doc may be renamed to koral:field in future versions of this specification.

Implementation Guide

As collection type objects are used for the restriction of access to certain parts of the corpus, the implementation needs to be strict to prevent violation of access control mechanisms.

The key attribute in koral:doc is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised.

If the key field of the criterion is not part of the document and the match attribute is match:ne or match:excludes, the criterion is satisfied. In case of any other match values, the criterion is not satisfied.

The value attribute in koral:doc is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised. However, this rule is experimental and may change in future versions of this specification.

If the type attribute contains an undefined identifier, the query has to be rejected and an error has to be raised.

If the value attribute is not valid with respect to the given type, e.g. an invalid date string or a regular expression with unbalanced parentheses, the query has to be rejected and an error has to be raised.

If the match attribute contains an undefined identifier, or an identifier that is undefined to the given type, the query has to be rejected and an error has to be raised.

match:contains and match:excludes are only defined for fields supporting full-text search. If a field without full-text search capabilities is requested with match:contains, the meaning is identical to match:eq. If a field without full-text search capabilities is requested with match:excludes, either the query is rejected with an error or an empty collection is returned.

Field names are not specified by KoralQuery and their string representation is not constrained.

Complex collection types

koral:docGroup

{
  "@type" : "koral:docGroup",
  "operation" : "operation:and",
  "operands" : [ ... ]
}
Key Type Default Values
@type @id koral:docGroup
operation @id
operation:and
operation:or
operands [collection type] Arguments of the operation. Number depends on the respective operation

A koral:docGroup object defines boolean operations on criteria a document in the collection has to satisfy.

The operands list represents collection type objects the group refers to.

The operation attribute represents the kind of boolean operation between the operands. The operation:or operation acts like a union of all collections defined in the operands list, meaning a document is part of the collection if it is part of at least one collection in the operands list. The operation:and operation acts like an intersection of all collections defined in the operands list, meaning a document is part of the collection if it is part of all collections in the operands list.

koral:docGroup may be renamed to koral:fieldGroup in future versions of this specification.

Implementation Guide

The operation attribute is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised. If the operation attribute contains an undefined identifier, the query has to be rejected and an error has to be raised.

If the operands list is empty, the resulting collection is empty. If the operands list has only one entry, the resulting collection is identical to the resulting collection of the only entry, independent of the operation.
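
The evaluation of nested koral:doc and koral:docGroup objects, including the rules for missing fields and empty operand lists, can be sketched in Python as follows. The function select and the representation of documents as a mapping from document id to a dict of fields are hypothetical. Only match:eq and match:ne on string values are covered, and a missing key or value raises a KeyError that stands in for a rejected query.

```python
def select(collection, documents):
    """Return the set of ids of documents that satisfy a collection object.

    documents: hypothetical mapping of document id to a dict of fields.
    """
    kind = collection.get("@type")
    if kind == "koral:doc":
        key, value = collection["key"], collection["value"]
        match = collection.get("match", "match:eq")
        if match not in ("match:eq", "match:ne"):
            raise ValueError("match not covered by this sketch: " + match)
        selected = set()
        for doc_id, fields in documents.items():
            if key not in fields:
                # Missing field: satisfied only for the negating match.
                satisfied = match == "match:ne"
            else:
                satisfied = (fields[key] == value) == (match == "match:eq")
            if satisfied:
                selected.add(doc_id)
        return selected
    if kind == "koral:docGroup":
        operation = collection.get("operation")
        if operation not in ("operation:and", "operation:or"):
            raise ValueError("missing or undefined operation")
        parts = [select(op, documents) for op in collection.get("operands", [])]
        if not parts:
            return set()  # empty operands list: empty collection
        if operation == "operation:and":
            return set.intersection(*parts)
        return set.union(*parts)
    raise ValueError("unsupported collection type: " + str(kind))
```

With a single operand, both set operations return that operand's collection unchanged, which mirrors the rule that the operation is irrelevant in this case.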

koral:docGroupRef

{
  "@type" : "koral:docGroupRef",
  "ref" : "https://korap.ids-mannheim.de/@ndiewald/MyCorpus"
}
Key Type Default Values
@type @id koral:docGroupRef
ref xsd:string A unique reference to a virtual corpus.

A koral:docGroupRef references a collection type object with criteria that a document in the collection has to satisfy.

The ref attribute is a unique identifier by which a KoralQuery consumer references a collection type object to be embedded in place of the koral:docGroupRef object, e.g. stored as a JSON-LD file.
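
One way a consumer might embed referenced collections is sketched below in Python. The function resolve_refs and the store argument (a mapping from ref strings to stored collection type objects) are hypothetical; how references are actually looked up is left to the implementation. Cyclic references are not detected by this sketch.

```python
def resolve_refs(collection, store):
    """Replace every koral:docGroupRef by the collection object it names.

    store: hypothetical mapping from ref strings to collection type objects.
    """
    kind = collection.get("@type")
    if kind == "koral:docGroupRef":
        ref = collection["ref"]
        if ref not in store:
            raise LookupError("unknown virtual corpus: " + ref)
        # Stored objects may themselves contain references.
        return resolve_refs(store[ref], store)
    if kind == "koral:docGroup":
        resolved = dict(collection)  # shallow copy, the input stays untouched
        resolved["operands"] = [
            resolve_refs(op, store) for op in collection.get("operands", [])
        ]
        return resolved
    return collection
```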

Span Type Objects

For KoralQuery the primary data of a document is represented as a series of tokens. A series of tokens is called a substring of the document.

Query objects define conditions regarding the constellation of tokens and token-bound features that have to be in place to make a substring a valid occurrence of the query object in a document. These conditions may have syntagmatic or paradigmatic character.

In addition to the substring, results of a query object may contain so-called classes as markers for substrings of the result substring.

The result of a span type object, i.e. its span and its classes, may be an operand of another object, that may filter, enrich, combine or alter the results of nested objects.

Basic span types

koral:token

{
  "@type" : "koral:token",
  "wrap" : {
    "@type" : "koral:term",
    "foundry" : "tt",
    "layer" : "pos",
    "key" : "ADJD"
  }
}
Key Type Default Values
@type @id koral:token
wrap koral:term, koral:termGroup Holds information on search key, foundry, layer, value

A koral:token object defines the occurrence of one token that a match has to satisfy.

Implementation Guide

The wrap attribute is optional. In case no wrap attribute is defined, the object matches any token of the text. If the processor does not support any tokens, the query has to be rejected and an error has to be raised.

koral:span

{
  "@type" : "koral:span",
  "wrap" : {
    "@type" : "koral:term",
    "foundry" : "cnx",
    "layer" : "c",
    "key" : "np"
  }
}
Key Type Default Values
@type @id koral:span
wrap koral:term, koral:termGroup Holds information on search key, foundry, layer, value
attr koral:term, koral:termGroup Span attributes.

Complex span types

koral:group

{
  "@type" : "koral:group",
  "operation" : "operation:sequence",
  "operands" : []
}
Key Type Default Values
@type @id koral:group
operands [span type] Arguments of the operation. Number depends on respective operation.
operation @id
operation:sequence
Operands take part in a sequence.
Returns the joined span of all operands.
Expects ≥ 2 operands.
Parameters: inOrder, distances
operation:position
The operands take part in a positional relation defined by frames.
Returns the joined span of both operands.
Expects 2 operands.
Parameters: frames
operation:exclusion
There is no second operand in a positional relation defined by frames to a first operand.
Returns the span of the first operand.
Expects 2 operands.
Parameters: frames
operation:relation
The operands take part in an arbitrary relation.
Returns the joined span of both operands.
Expects 2 operands.
Parameters: relType
operation:disjunction
The operands are treated as alternatives.
Returns the span of a single operand.
Expects ≥ 2 operands.
Parameters: none
operation:repetition
The operand is sequentially repeated a defined number of times.
Returns the joined span of the repetition.
Expects 1 operand.
Parameters: boundary
operation:length
Define the minimum and maximum length of a span in tokens.
Returns the span of the operand.
Expects 1 operand.
Parameters: boundary
operation:class
Define a class span based on the operand.
Returns the span of the operand.
Expects 1 operand.
If no parameter is defined, classOut: 1 is assumed.
Parameters: classOut, classIn, classRefCheck, classRefOp
operation:merge
Condense the result set.
Returns the joined span of the merged spans.
Expects ≥ 1 operand.
Parameters: none
boundary koral:boundary Specifies the minimum and maximum values of the operation.
classIn [xsd:integer] The numeric identifiers of classes on which classRefCheck or classRefOp operate.
classOut xsd:integer The numeric identifier of the defined class.
classRefCheck [@id] Set-theoretic condition on input classes. Results that do not fulfil this condition are excluded from the result set.
classRefCheck:disjoints
The intersection between the classes in classIn is empty.
classRefCheck:intersects
The intersection between the classes in classIn is not empty.
classRefCheck:includes
The intersection between the first class and the second class in classIn equals the second class.
classRefCheck:equals
The intersection between the classes in classIn equals their union.
classRefCheck:differs
The intersection between the classes in classIn does not equal their union.
classRefOp @id Set-theoretic operation on input classes. Creates new output class in classOut.
classRefOp:union
The class contains all spans defined in at least one classIn class.
classRefOp:intersection
The class contains all spans defined in all of the classIn classes.
classRefOp:inversion
Defines the class over all spans that are not part of the classes listed in classIn.
classRefOp:deletion
Deletes previously defined classes inside the operands (as indicated by classIn).
distances [koral:distance] [] Distance constraints between operands (pertaining to different keys).
frames [@id] [frames:isAround, frames:endsWith, frames:startsWith, frames:matches] The allowed positional relations between operands A and B.
frames:succeeds
Matches [B..B]..[A..A]
frames:succeedsDirectly
Matches [B..B][A..A]
frames:alignsRight
Matches [B..[A..A]B]
frames:isWithin
Matches [B..[A..A]..B]
frames:overlapsRight
Matches [B..[A..B]..A]
frames:preceedsDirectly
Matches [A..A][B..B]
frames:preceeds
Matches [A..A]..[B..B]
frames:endsWith
Matches [A..[B..B]A]
frames:isAround
Matches [A..[B..B]..A]
frames:overlapsLeft
Matches [A..[B..A]..B]
frames:startsWith
Matches [A[B..B]..A]
frames:alignsLeft
Matches [B[A..A]..B]
frames:matches
Matches [A[B..B]A]
inOrder xsd:boolean true If true, the order is relevant.
relType koral:relation Specifies the relation between operands.
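
The bracket notation used for the frames values can be read as the thirteen possible relations between two intervals. The following Python sketch names the relation between two spans A and B. It is an interpretation under assumptions: spans are modelled as (start, end) token offsets with an exclusive end, both spans are non-empty, and the function names frame_of and satisfies are hypothetical. The identifiers are spelled as in the table above.

```python
DEFAULT_FRAMES = [
    "frames:isAround", "frames:endsWith", "frames:startsWith", "frames:matches",
]


def frame_of(a, b):
    """Name the positional relation between two non-empty spans A and B."""
    (a_start, a_end), (b_start, b_end) = a, b
    if a_end < b_start:
        return "frames:preceeds"
    if a_end == b_start:
        return "frames:preceedsDirectly"
    if b_end < a_start:
        return "frames:succeeds"
    if b_end == a_start:
        return "frames:succeedsDirectly"
    # From here on the spans share at least one token.
    if a_start == b_start:
        if a_end == b_end:
            return "frames:matches"
        return "frames:startsWith" if b_end < a_end else "frames:alignsLeft"
    if a_end == b_end:
        return "frames:endsWith" if a_start < b_start else "frames:alignsRight"
    if a_start < b_start:
        return "frames:isAround" if b_end < a_end else "frames:overlapsLeft"
    return "frames:isWithin" if a_end < b_end else "frames:overlapsRight"


def satisfies(a, b, frames=DEFAULT_FRAMES):
    """True if the relation between A and B is one of the allowed frames."""
    return frame_of(a, b) in frames
```

Because the thirteen relations are mutually exclusive, operation:position would keep a pair of operands when satisfies returns True, and operation:exclusion would keep a first operand for which no second operand satisfies the frames.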

koral:group may be renamed to koral:spanGroup in future versions of this specification.

operation:disjunction may be deprecated in favor of operation:or in future versions of this specification.
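
The classRefCheck conditions listed in the koral:group table can be read as plain set comparisons. The following Python sketch illustrates them on sets of token positions; representing a class as a set of positions and the function name class_ref_check are assumptions made for illustration. At least one class is required, and classRefCheck:includes needs two.

```python
def class_ref_check(check, classes):
    """Evaluate one classRefCheck condition on the classes named in classIn.

    classes: list of sets of token positions, in classIn order (a
    hypothetical representation of class spans).
    """
    intersection = set.intersection(*classes)
    union = set.union(*classes)
    if check == "classRefCheck:disjoints":
        return not intersection
    if check == "classRefCheck:intersects":
        return bool(intersection)
    if check == "classRefCheck:includes":
        first, second = classes[0], classes[1]
        return first & second == second
    if check == "classRefCheck:equals":
        return intersection == union
    if check == "classRefCheck:differs":
        return intersection != union
    raise ValueError("undefined classRefCheck: " + check)
```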

koral:reference

{
  "@type" : "koral:reference",
  "classRef" : [1],
  "operation" : "operation:focus",
  "operands" : [ ... ]
}
Key Type Default Values
@type @id koral:reference
operation @id operation:focus Defines the operation performed based on the references.
operation:focus
Reduce the match to the given classes.
Expects 0 or 1 operands.
classRef [xsd:integer] [0] Defined classes to refer to.
The class 0 refers to the operand's span.
spanRef [xsd:integer] Defined subspans to refer to.
Expects one or two integers. The first integer defines the start index of the subspan, the second defines the length of the subspan.
operands [span type] Arguments of the operation. Number depends on respective operation.

A koral:reference object defines a span by referring to another span type.

The operation attribute defines the kind of result expected from referencing another span. Currently the only value defined is operation:focus, which reduces the matching span to the start and end positions defined by the reference.

spanRef refers to subspans (i.e. tokens) of the operand. It accepts a list of numerical parameters. The first parameter defines the start index (starting at position 0). The second parameter defines the length of the match counting from the start index to the right.

The reference (either a span or a class) has to be part of the operands. If no operand is given, but a classRef is defined, the class refers to classes defined at any point in the query tree.

In case multiple classes are defined in classRef for an operation:focus, the focus starts with the first classed token in sequential order and ends with the final classed token in sequential order.

Implementation Guide

If the operation attribute contains an undefined identifier, a warning has to be raised and the default operation has to be assumed.

If the classRef list refers to a class not defined, the class is ignored.

If the classRef is an empty list (for example "[]" or because of ignored classes), and the operation is operation:focus, the resulting span is empty and matches nowhere.
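
The handling of classRef for operation:focus, including ignored and empty class lists, can be sketched in Python as follows. The function focus and its class_spans argument (a mapping from class number to the token offsets the class marks within one match, with an exclusive end) are hypothetical. A caller would register the operand's own span under class 0.

```python
def focus(class_spans, class_ref):
    """Return the (start, end) span of operation:focus, or None if empty.

    class_spans: hypothetical mapping of class number to (start, end)
    token offsets with an exclusive end.
    """
    # Classes that are not defined are ignored.
    spans = [class_spans[c] for c in class_ref if c in class_spans]
    if not spans:
        return None  # nothing left to focus on: matches nowhere
    start = min(s for s, _ in spans)  # first classed token
    end = max(e for _, e in spans)    # end of the final classed token
    return (start, end)
```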

If the operands list is empty and no classRef is defined, the resulting span is empty and matches nowhere.

If both spanRef and classRef are defined, a warning has to be raised and classRef has to be assumed.

A negative start index for spanRef counts from the end of the operand's span. If a positive start index starts beyond the end of the operand's span, the result of the operation is empty and matches nowhere. If a negative start index starts beyond the beginning of the operand's span, the start index will be treated as being 0. If the length is omitted or exceeds the length of the operand's span, the rest of the operand's span is part of the match.
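
The spanRef rules can be sketched in Python as follows. The function resolve_span_ref is hypothetical, and the sketch makes three assumptions that the text leaves open: offsets count tokens with an exclusive end, a start index equal to the operand's length already lies beyond its end, and the length is counted from the start index after it has been clamped to 0.

```python
def resolve_span_ref(operand_length, span_ref):
    """Return (start, end) offsets inside the operand's span, or None.

    operand_length: number of tokens in the operand's span.
    span_ref: list of one or two integers, [start] or [start, length].
    """
    start = span_ref[0]
    length = span_ref[1] if len(span_ref) > 1 else None
    if start < 0:
        # Count from the end; clamp to the beginning of the span.
        start = max(operand_length + start, 0)
    elif start >= operand_length:
        return None  # starts beyond the end: matches nowhere
    if length is None:
        return (start, operand_length)
    return (start, min(start + length, operand_length))
```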

Parametric Type Objects

Basic parametric types

koral:term

{
  "@type" : "koral:term",
  "foundry" : "tt",
  "layer" : "pos",
  "key" : "ADJD"
}
Key Type Default Values
@type @id koral:term
key [xsd:string] The term key
value [xsd:string] The term value
foundry xsd:string The annotation foundry
layer xsd:string surface layer The annotation layer
type @id type:string
type:string
type:regex
type:punct
match @id match:eq
match:eq
match:ne
flags [@id]
flags:caseInsensitive
flags:diacriticInsensitive

To specify a term, KoralQuery provides four attributes: foundry, layer, key, and value. The concrete definition of these attributes relies on the annotation model of the corpus and the implementation of the search system. As an abstract definition, the attributes have a hierarchical structure for annotations, meaning a foundry may bundle multiple layers, a layer may bundle multiple keys, and a key may bundle multiple values. An annotation or a system may not need all of these attributes to define a term; only the key attribute is mandatory.

The key attribute represents an annotation like the part-of-speech tag noun or verb, or the surface token Tree. It can be given as a single string or as an array of alternative strings. Sometimes annotations have to be represented as key and value pairs, for example in morphological annotations a key of the term may be number and the value of the key may be plural. In that case, the key attribute will hold the term number and the value attribute will hold the value plural. Values can be given as single strings or as an array of alternative strings.

The layer attribute may define the annotation level of the term, for example tokenization, part-of-speech or lemma. In case the layer information is omitted, the layer defaults to the tokenization layer, irrespective of the implementation-specific word for that layer.

The foundry attribute may define the origin of the annotation, for example the name of the human annotator or the automated tool. Or it may serve as an umbrella for layers with common characteristics (for example bundling several models for named entities). [2]

In most implementations the foundry term may not be relevant, but it is important to deal with conflicting annotations, for example, in case the corpus provides multiple part-of-speech annotations.

The attribute type defines the treatment of key and value.

The currently supported types are string, regex, and punct. The type string indicates that key and value should be treated as sequences of characters. The type regex indicates that key and value should be treated as regular expressions. The type punct defines that the key attribute will be treated as a character class of punctuation symbols; in that case, the treatment of the value attribute is undefined. The default value for the type attribute is string. Types other than string are not yet supported for foundry and layer.

The term defined by foundry, layer, key and value represents the condition of the term object. The match attribute can be used to invert the condition, so that a substring of a text satisfies the term object exactly when the condition fails. The match attribute can therefore hold the value eq, meaning the term has to match exactly as defined, or the value ne, meaning the term must not be equal to the defined condition. The default value for match is eq.

In the current version of KoralQuery the match attribute of terms is limited to the same functionality as exclude in operations. As match may support further operators, it is used in favor of exclude in this context.

The matching may further be modified by certain flags, using the flags attribute. Multiple flags are supported. In case order is relevant, the flag operations are processed from left to right. Currently two flags are supported by KoralQuery: caseInsensitive means that the matching will ignore differences between lowercase and capital letters in the key and value attributes, as well as in the term index; diacriticInsensitive means that the matching will ignore diacritic symbols in the key and value attributes, as well as in the term index.

{
  "@type" : "koral:term",
  "key" : "Octopus",
  "flags" : ["flags:caseInsensitive"]
}
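
How the two flags might be applied to a term of type:string is sketched below in Python. The functions fold and key_matches are hypothetical. Using str.casefold for case insensitivity and Unicode NFD decomposition with removal of combining marks for diacritic insensitivity are assumptions about one reasonable realisation; the specification does not prescribe a folding algorithm.

```python
import unicodedata


def fold(text, flags):
    """Apply the given flags to a string, from left to right."""
    for flag in flags:
        if flag == "flags:caseInsensitive":
            text = text.casefold()
        elif flag == "flags:diacriticInsensitive":
            decomposed = unicodedata.normalize("NFD", text)
            text = "".join(
                ch for ch in decomposed if not unicodedata.combining(ch)
            )
        # Undefined flags are skipped here; a processor would raise a warning.
    return text


def key_matches(indexed_key, term):
    """Compare an indexed key with the key of a koral:term of type:string."""
    flags = term.get("flags", [])
    equal = fold(indexed_key, flags) == fold(term["key"], flags)
    if term.get("match", "match:eq") == "match:ne":
        return not equal
    return equal  # match:eq is the default
```

Both the indexed key and the queried key are folded in the same way, which corresponds to ignoring the difference in the term attributes as well as in the term index.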

Implementation Guide

The key attribute in terms is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised.

If the type attribute contains an undefined identifier, a warning has to be raised and the default type has to be assumed.

If the match attribute contains an undefined identifier, a warning has to be raised and the default match has to be assumed.

If the flags attribute contains an undefined identifier, a warning has to be raised. The flag will be ignored.

All other attributes may silently be ignored.
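
The fallback rules for koral:term can be sketched in Python as follows. The function normalize_term is hypothetical, and collecting warnings as plain strings is an assumption made for illustration; the structure of warning objects is not defined in this section.

```python
TYPES = {"type:string", "type:regex", "type:punct"}
MATCHES = {"match:eq", "match:ne"}
FLAGS = {"flags:caseInsensitive", "flags:diacriticInsensitive"}


def normalize_term(term):
    """Return (normalized term, warnings); raise ValueError to reject."""
    if "key" not in term:
        raise ValueError("koral:term requires a key")
    warnings = []
    result = dict(term)  # leave the input untouched
    if result.get("type", "type:string") not in TYPES:
        warnings.append("undefined type, assuming type:string")
        result["type"] = "type:string"
    if result.get("match", "match:eq") not in MATCHES:
        warnings.append("undefined match, assuming match:eq")
        result["match"] = "match:eq"
    kept = []
    for flag in result.get("flags", []):
        if flag in FLAGS:
            kept.append(flag)
        else:
            warnings.append("undefined flag ignored: " + flag)
    if "flags" in result:
        result["flags"] = kept
    return result, warnings
```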

koral:distance

{
  "@type" : "koral:distance",
  "key" : "w",
  "boundary" : {...}
}
Key Type Default Values
@type @id koral:distance
key xsd:string w Measure of distance
foundry xsd:string Foundry in which distance measure (key) is annotated
layer xsd:string Layer in which distance measure (key) is annotated
boundary koral:boundary Specified degree of distance

koral:boundary

{
  "@type" : "koral:boundary",
  "min" : 0,
  "max" : 3
}
Key Type Default Values
@type @id koral:boundary
min xsd:integer Minimal value.
max xsd:integer Maximal value.
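
A check of a count (for example a number of repetitions or a distance) against a koral:boundary object might look like the following Python sketch. The function within_boundary is hypothetical. Treating both bounds as inclusive, a missing min as 0, and a missing max as unbounded are assumptions; the table above does not state defaults.

```python
def within_boundary(count, boundary):
    """Check a count against a koral:boundary object.

    Assumptions: bounds are inclusive, a missing min means 0 and a
    missing max means unbounded.
    """
    minimum = boundary.get("min", 0)
    maximum = boundary.get("max")
    return count >= minimum and (maximum is None or count <= maximum)
```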

koral:relation

{
  "@type" : "koral:relation",
  "wrap" : {...}
}
Key Type Default Values
@type @id koral:relation
wrap koral:term, koral:termGroup Holds information on key, foundry, layer, value

Complex parametric types

koral:termGroup

{
  "@type" : "koral:termGroup",
  "operation" : "operation:and",
  "operands" : [...]
}
Key Type Default Values
@type @id koral:termGroup
operation @id operation:and
operation:or
operands [koral:term, koral:termGroup] Arguments of the paradigmatic relation.

A koral:termGroup object defines paradigmatic relations between koral:term objects to describe that term annotations may or may not occur at the same position (e.g. a word is annotated as a specific lemma with a specific part-of-speech tag).

A koral:termGroup object may specify an arbitrary number of operands that refer to the same defined operation. To specify a different operation in the same koral:token position, it is possible to nest a koral:termGroup.
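
The recursive evaluation of nested koral:termGroup objects at one token position can be sketched in Python as follows. The function token_satisfies and the representation of a token's annotations as a set of (foundry, layer, key) tuples are hypothetical. The sketch covers only type:string terms without value or flags, compares the tuple exactly, and does not handle empty operand lists.

```python
def token_satisfies(annotations, wrap):
    """Check one token's annotations against a koral:term or koral:termGroup.

    annotations: hypothetical set of (foundry, layer, key) tuples.
    """
    kind = wrap.get("@type")
    if kind == "koral:term":
        needle = (wrap.get("foundry"), wrap.get("layer"), wrap["key"])
        present = needle in annotations
        if wrap.get("match", "match:eq") == "match:ne":
            return not present
        return present
    if kind == "koral:termGroup":
        operands = wrap["operands"]
        if not operands:
            raise ValueError("empty operands are not covered by this sketch")
        results = [token_satisfies(annotations, op) for op in operands]
        if wrap["operation"] == "operation:and":
            return all(results)
        if wrap["operation"] == "operation:or":
            return any(results)
        raise ValueError("undefined operation: " + str(wrap["operation"]))
    raise ValueError("unsupported type: " + str(kind))
```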

Implementation Guide

The operation attribute is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised. If the operation attribute contains an undefined identifier, the query has to be rejected and an error has to be raised.

If the operands list is empty, the resulting span is undefined, therefore the wrapping object is empty.

operation was previously named relation. For improved compatibility, a KoralQuery consumption service may accept both variants and a KoralQuery generation service may generate both variants.

Report Type Objects

koral:rewrite

{
  "@type" : "koral:rewrite",
  "operation" : "operation:injection",
  "origin" : "Kustvakt"
}
Key Type Default Values
@type @id koral:rewrite
operation @id Specifies the performed rewrite action.
origin xsd:string Specifies the component responsible for the rewrite
scope xsd:string The current object Specifies which object/attribute has been rewritten

origin was previously named src. For improved compatibility, a KoralQuery consumption service may accept both variants and a KoralQuery generation service may generate both variants.

Response Type Objects

The response format is still in preparation.

The response to a KoralQuery match request (as opposed to, for example, a request for statistical information, which is currently out of scope of this document) is a collection of documents that satisfy the document query defined in corpus or collection, the span query defined in query, and all supported result-modifying constraints in meta. If no query is defined, each document of the collection is represented by the requested metadata.

collection will be renamed to corpus in future versions of this specification. Implementations should support both attributes, with corpus being the preferred variant.

{
  "@context" : "http://korap.ids-mannheim.de/ns/koral/0.5/context.jsonld",
  "corpus" : { ... },
  "query" : { ... },
  "result" : {
    "@type" : "koral:result",
    "results" : [
      {
        "@type" : "koral:match",
        "annotation" : ["xip","xip/p", "cnx", "cnx/c"],
        "annotationType" : ["xip/p=token", "cnx/c=spans"],
        "fields" : [{
          "@type" : "koral:doc",
          "key" : "docID",
          "value" : "doc-3",
          "type" : "type:string"
        }],
        "snippet" : "..."
      },
      {
        "@type" : "koral:match",
        ...
      }
    ]
  }
}
          

koral:result

{
  "@type" : "koral:result",
  "results" : [ ... ],
  "totalResults" : 4
}
Key Type Default Values
@type @id koral:result
results [response type] Contains a list of results.
totalResults xsd:integer 0 The number of total results in the result set.

koral:match

The appendix of this specification defines a format for representing matches as HTML snippets with "keywords in context", which may be used in the response format.

{
  "@type" : "koral:match",
  "fields" : [{
    "@type" : "koral:doc",
    "key" : "docID",
    "value" : "doc-3",
    "type" : "type:string"
  }],
  "snippet" : "..."
}
Key Type Default Values
@type @id koral:match
fields [koral:doc] Contains a set of koral:doc objects defining the metadata fields of the document the match occurs in.
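A client could read the metadata of a match and the total number of results as sketched below. The helper names match_metadata and total_results are hypothetical; since the response format is still in preparation, this is an illustration rather than a stable interface.

```python
def match_metadata(match):
    """Map the koral:doc fields of a koral:match to a key/value dictionary."""
    return {
        field["key"]: field["value"]
        for field in match.get("fields", [])
        if field.get("@type") == "koral:doc"
    }


def total_results(result):
    """Read totalResults from a koral:result, falling back to the default 0."""
    return result.get("totalResults", 0)


match = {
    "@type": "koral:match",
    "fields": [
        {"@type": "koral:doc", "key": "docID", "value": "doc-3", "type": "type:string"}
    ],
    "snippet": "...",
}

print(match_metadata(match))
```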

Import Type Objects

The import format is still in preparation and currently not supported by the reference implementation Krill.

{
  "@context" : "http://korap.ids-mannheim.de/ns/koral/0.5/context.jsonld",
  "record" : {
    "@type" : "koral:record",
    "primaryData" : "Der Bau-Leiter trug einen lustigen Bau-Helm.",
    "id" : 3,
    "fields" : [
      {
        "@type" : "koral:doc",
        "key" : "docID",
        "value" : "doc-3",
        "type" : "type:string"
      },
      {
        "@type":"koral:doc",
        "key":"license",
        "value":"closed",
        "type":"type:string"
      }
    ],
    "subtokens" : [
      {
        "@type" : "koral:subtoken",
        "offsets" : [0,3]
      },
      {
        "@type" : "koral:subtoken",
        "offsets" : [4,7]
      },
      {
        "@type" : "koral:subtoken",
        "offsets" : [8,14]
      },
      {
        "@type" : "koral:subtoken",
        "offsets" : [15,19]
      },
      {
        "@type" : "koral:subtoken",
        "offsets" : [20,25]
      },
      {
        "@type" : "koral:subtoken",
        "offsets" : [26,34]
      },
      {
        "@type" : "koral:subtoken",
        "offsets" : [35,38]
      },
      {
        "@type" : "koral:subtoken",
        "offsets" : [39,43]
      }
    ],
    "annotations" : [
      {
        "@type": "koral:token",
        "subtokens" : [0],
        "wrap" : {
          "@type" : "koral:term",
          "foundry" : "akron",
          "key" : "Der"
        }
      },
      {
        "@type" : "koral:span",
        "subtokens" : [0,2],
        "wrap" : {
          "@type" : "koral:term",
          "foundry" : "akron",
          "layer" : "c",
          "key" : "NP"
        }
      },
      {
        "@type": "koral:token",
        "subtokens" : [1,2],
        "wrap" : {
          "@type" : "koral:term",
          "foundry" : "akron",
          "key" : "Bau-Leiter"
        }
      },
      {
        "@type": "koral:token",
        "subtokens" : [3],
        "wrap" : {
          "@type" : "koral:termGroup",
          "operands" : [
            {
              "@type" : "koral:term",
              "foundry" : "akron",
              "key" : "trug"
            },
            {
              "@type" : "koral:term",
              "foundry" : "opennlp",
              "layer" : "p",
              "key" : "V"
            }
          ]
        }
      },
      {
        "@type": "koral:token",
        "subtokens" : [4],
        "wrap" : {
          "@type" : "koral:term",
          "foundry" : "akron",
          "key" : "einen"
        }
      },
      {
        "@type" : "koral:span",
        "subtokens" : [4,7],
        "wrap" : {
          "@type" : "koral:term",
          "foundry" : "akron",
          "layer" : "c",
          "key" : "NP"
        }
      },
      {
        "@type": "koral:token",
        "subtokens" : [5],
        "wrap" : {
          "@type" : "koral:term",
          "foundry" : "akron",
          "key" : "lustigen"
        }
      },
      {
        "@type": "koral:token",
        "subtokens" : [6,7],
        "wrap" : {
          "@type" : "koral:term",
          "foundry" : "akron",
          "key" : "Bau-Helm"
        }
      }
    ]
  }
}

koral:record

{
  "@type" : "koral:record",
  "fields" : [],
  "subtokens" : [],
  "annotations" : []
}
          
Key Type Default Values
@type @id koral:record
fields [koral:doc] Contains a set of koral:doc objects defining the metadata fields of the imported record.
primaryData xsd:string The primary data of the record. Currently only supports text.
subtokens [koral:subtoken] The list of subtoken offsets defined on the primaryData.
annotations [koral:token,koral:span,koral:relation] The list of annotations referring to the primary data.
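The relation between primaryData, subtokens and annotations can be sketched as follows. The sketch rests on assumptions inferred from the example record above: offsets are character offsets with an exclusive end, and the subtokens list of an annotation names the first and last subtoken it covers (or a single subtoken). The helper name annotation_text is hypothetical.

```python
OFFSETS = [[0, 3], [4, 7], [8, 14], [15, 19], [20, 25], [26, 34], [35, 38], [39, 43]]

record = {
    "@type": "koral:record",
    "primaryData": "Der Bau-Leiter trug einen lustigen Bau-Helm.",
    "subtokens": [{"@type": "koral:subtoken", "offsets": o} for o in OFFSETS],
}


def annotation_text(record, annotation):
    """Return the stretch of primary data covered by an annotation."""
    indexes = annotation["subtokens"]
    first, last = indexes[0], indexes[-1]
    start = record["subtokens"][first]["offsets"][0]
    end = record["subtokens"][last]["offsets"][1]  # assumption: exclusive end
    return record["primaryData"][start:end]


print(annotation_text(record, {"@type": "koral:token", "subtokens": [1, 2]}))
print(annotation_text(record, {"@type": "koral:span", "subtokens": [4, 7]}))
```

Under these assumptions, the token on subtokens [1,2] covers "Bau-Leiter", which agrees with the key given for that token in the example record.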

Appendix

Recommended Attributes for Meta Objects

In line with the specifications of OpenSearch and PortableContacts, the following attributes are recommended for the meta section.

Key Type Default Values
count xsd:integer The number of results shown per page.
startIndex xsd:integer 0 The offset for paging through result sets.
startPage xsd:integer 1 The page for paging through the result sets. Overwritten by startIndex.
fields [xsd:string] The data fields requested.

count should be used for requests as well as for responses of query processors. Similar implementations distinguish between requests and responses by using a different key, itemsPerPage. We recommend using the rewrite mechanisms of KoralQuery to report differences between request and response counts.

totalResults should reflect occurrences of the query structure in all documents of corpus or collection. This is not necessarily the base for paging, as the base of paging may be documents, corpora etc. instead. The value 0 indicates that there was no match. A negative value may indicate that the total number of results is not known, not reportable etc. Further parameters may alter the interpretation of totalResults, e.g. to state that the value is only approximated or that there are at least this many matches.
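The paging attributes and the conventions for totalResults can be sketched as follows. The sketch assumes that startPage is 1-based and counts pages of count results each, and it uses a hypothetical default_count parameter, because this specification defines no default for count. Both function names are hypothetical.

```python
def start_offset(meta, default_count=25):
    """Offset of the first result; startIndex overrides startPage."""
    if "startIndex" in meta:
        return meta["startIndex"]
    count = meta.get("count", default_count)  # default_count is an assumption
    return (meta.get("startPage", 1) - 1) * count


def describe_total(total_results):
    """Interpret totalResults following the recommendations above."""
    if total_results < 0:
        return "unknown"
    if total_results == 0:
        return "no match"
    return f"{total_results} results"


print(start_offset({"count": 10, "startPage": 3}))
print(describe_total(-1))
```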

collection will be renamed to corpus in future versions of this specification. Implementations should support both attributes, with corpus being the preferred variant.

Regular Expressions

The definition of the supported regular expressions is out of scope of this specification and depends on the implementation.

KWIC representation as HTML snippets

KoralQuery span type objects return a textual span. There are several ways to return this information, with a "KWIC" snippet being the most popular. In a "KWIC" snippet, the primary data of the document is merged with the positional information of the match, with a context to the left and the right of the actual match.

KoralQuery supports the definition of classes using operation:class. Classes may add further positional information to the "KWIC", which may be merged into the primary data. As KoralQuery supports annotations of different types, the "KWIC" may be enriched with further annotations as well.

The snippet may be added to a match as an xsd:string using a snippet attribute.

<span class="context-left"></span>
<mark>
  <span title="corenlp/c:CS">
    <span title="corenlp/c:ROOT">
      <span title="corenlp/c:S">
        <span title="corenlp/c:NP">die Sonne</span>
        war
        <span title="corenlp/c:CAP">hoch und heiß</span>
      </span>,
      <span title="corenlp/c:S">
        ich mußte
        <span title="corenlp/c:S">
          <span title="corenlp/c:NP">meine Kleidung</span>
          erleichtern,
          <span title="corenlp/c:S">
            die ich
            <span title="corenlp/c:PP">
              bei der veränderlichen Atmosphäre
              <span title="corenlp/c:NP">des Tages</span>
            </span>
            oft wechsele
          </span>
        </span>
      </span>
    </span>
  </span>
</mark>
<span class="context-right"></span>
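A snippet of this shape could be assembled as in the following sketch. It assumes that the left context, the match and the right context are already available as plain strings; the function names kwic_snippet and annotated are hypothetical, and classes defined by operation:class are not covered.

```python
from html import escape


def annotated(text, title):
    """Wrap text in a span carrying an annotation in its title attribute."""
    return f'<span title="{escape(title)}">{escape(text)}</span>'


def kwic_snippet(left, match_html, right):
    """Merge contexts and match into one snippet string.

    match_html is expected to be escaped already, e.g. built with annotated().
    """
    return (
        f'<span class="context-left">{escape(left)}</span>'
        f"<mark>{match_html}</mark>"
        f'<span class="context-right">{escape(right)}</span>'
    )


snippet = kwic_snippet("", annotated("die Sonne", "corenlp/c:NP"), " war hoch und heiß")
print(snippet)
```

The resulting string can be added to a koral:match using the snippet attribute.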

Implementations

KoralQuery is the base communication protocol of KorAP. The Koral query serializer can translate queries formulated in Poliqarp, Cosmas-II, Annis, and CQL to KoralQuery. Kustvakt is a policy service that uses Koral to translate queries and to rewrite them based on access restrictions and user settings. Krill is a corpus search service that consumes KoralQuery and creates KoralQuery-compatible responses.

Footnotes

[1] JSON-LD was chosen to be compatible with LAPPS recommendations from ISO TC37 SC4 WG1-EP, suggested by Piotr Bański.

[2] Thanks to Piotr Bański for the definition of foundry and layer.

References

To cite work on KoralQuery, please refer to: Bingel, Joachim and Nils Diewald (2015): KoralQuery - a General Corpus Query Protocol, Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania.

To cite this specification, please refer to: Diewald, Nils and Joachim Bingel (2015-2017): KoralQuery 0.5, Technical report, IDS, Mannheim, Germany. Working draft.

ECMA (2003): The JSON Data Interchange Format, ECMA-404, ECMA Standard.

Sporny, Manu, Dave Longley, Gregg Kellogg, Markus Lanthaler, and Niklas Lindström (2014): JSON-LD 1.0 - A JSON-based Serialization for Linked Data, W3C Recommendation.

Wolf, Misha and Charles Wicksteed (1997): Date and Time Formats, W3C Standard.

Smarr, Joseph (2008): Portable Contacts 1.0 Draft C.

Copyright

Copyright (c) 2015-2022, IDS Mannheim, Germany, and the authors.

The authors want to thank Eliza Margaretha for her help on implementing the reference implementation of KoralQuery, and Piotr Bański, Elena Frick, and Michael Hanl for their valuable input.

KoralQuery is developed as part of the Koral query processing software, that is one component of the KorAP Corpus Analysis Platform at the Institute for German Language (IDS), member of the Leibniz-Gemeinschaft, and supported by the KobRA project, funded by the Federal Ministry of Education and Research (BMBF).

CHANGES:
0.5.8 2024-11-19
- Prefer 'origin' over 'src' in rewrites.

0.5.7 2024-09-27
- Prefer 'corpus' over 'collection'.

0.5.6 2022-02-21
- Introduced print stylesheet

0.5.5 2019-11-27
- Introduced key and value vectors to 'koral:term'

0.5.4 2018-08-13
- Introduced value vectors to 'koral:doc'
- Introduced 'koral:docGroupRef'

0.5.3 2017-12-09
- Renamed 'relation' to 'operation' in 'koral:termGroup'

0.5.2 2017-09-11
- Deprecated 'type:wildcard' in favor of 'type:regex'

0.5.1 2017-07-05
- Introduced 'operation:exclusion' in favour of the 'exclude' attribute
- Introduced values for 'classRefCheck' and 'classRefOp'

0.5.0 2017-04-04
- Introduced import format

0.4.0 2016-10-05
- Introduced response format

0.3.1 2016-06-06
- Spans now wrap terms

0.3.0 2015-03-22
- Initial publication on GitHub
  Versions prior to 0.3 were used internally only