Introduction
JSON-LD is a JSON-based serialization format for Linked Data.
Nested objects in JSON-LD can be categorized by providing unique identifiers based on URIs. These identifiers can be defined in context files and associated with JSON objects by defining a @context attribute.
In KoralQuery, corpus search is divided into several separate protocol concepts, which are categorized into the following types:
- Collection Type Objects
- Define a document collection by certain constraints. The expected result of a collection type object is a subset of the corpus collection that meets the conditions. The empty set is valid.
- Span Type Objects
- Define an occurrence collection by certain conditions. The expected result of a span type object is a collection of substrings in documents of the document collection, that meet the conditions. The empty set is valid.
- Parametric Type Objects
- Specify further constraints for embedding collection type objects or span type objects as parameters. The expected result of a parametric type object is a refinement of the parental collection type object or span type object.
- Report Type Objects
- Report modifications of a query object (rewrites) as part of the query; or report errors, warnings and further messages regarding the processing of the query. Report types do not alter the expected result of a query.
- Response Type Objects
- Define the result of the processing of the query.
Further undefined objects are allowed, but are unspecified. The definition of meta objects, which define further adjustments to the query execution or the processing of search results, depends on the implementation of KoralQuery and is not part of this specification. For common meta information with recommended definitions, refer to the appendix.
Status of this draft
KoralQuery is not meant to be complete, but to be extensible and forward compatible. Extensibility is provided by support for embedded @context objects.
Forward compatibility is pursued by describing implementation advice for incompatibility fallbacks.
Implementations do not need to implement all described features to be called KoralQuery compliant, but implementations should fail in a predictable manner by using the described report methods.
Definitions
Table Description
In this document, attributes of KoralQuery objects are represented as tables, listing all defined attributes as key value pairs. Optional keys have a trailing question mark, mandatory keys are unmarked.
The type documenting tables have four columns:
- Key
- The key string in the JSON object for this attribute. In running text these keys are marked as keys.
- Type
- The type of values allowed for this attribute. Defined types include:
  - xsd:string
  - Arbitrary character sequence, represented as a JSON string.
  - xsd:boolean
  - Either true or false.
  - xsd:integer
  - A signed integer, represented as a JSON number. The valid range of the integer is described in the Values column.
  - @id
  - An identifier that represents a valid JSON-LD type, represented as a JSON string. The supported types are listed under Values.
  Types are given either by their identifier (e.g. koral:termGroup) or by their category (e.g. "span type"). Multiple supported types are listed comma separated. If the value is expected to be a list of values, the valid types are enclosed in brackets (e.g. "[@id]"). If the list is expected to have a certain number of members, this is described in the Values column.
- Default
- The default value of this attribute in case it is not given.
- Values
- Gives a list of valid values, constraints on this attribute, consequences for other attributes, and examples with descriptions.
Implementation Guide
If an object contains keys other than those defined, they are ignored.
If an object contains a value of a type other than those defined for a certain key, the query has to be rejected and an error has to be raised.
If an object has values other than those defined for a certain key, the behaviour is defined in the implementation guide of the object type. If no special behaviour is defined, the query has to be rejected and an error has to be raised. The term defined in implementation guides may cover further definitions not part of this specification. The term undefined in implementation guides is not restricted to definitions of this specification.
Empty objects in lists have to be ignored. Exceptions are "[xsd:integer]" and "[xsd:string]".
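The first two rules can be illustrated with a minimal Python sketch. The schema dictionary, the QueryRejected exception and the function name are assumptions made for this illustration and are not part of KoralQuery; only the two attributes of koral:boundary are used as sample data.

```python
class QueryRejected(Exception):
    """Signals that the query has to be rejected and an error raised."""

# Hypothetical schema for one object type: key -> accepted Python type.
BOUNDARY_SCHEMA = {"@type": str, "min": int, "max": int}

def check_object(obj, schema):
    """Return a copy of obj without undefined keys; reject wrongly typed values."""
    cleaned = {}
    for key, value in obj.items():
        if key not in schema:
            continue  # keys other than those defined are ignored
        expected = schema[key]
        # bool is a subclass of int in Python, so it is excluded explicitly.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            raise QueryRejected(f"value of {key!r} has an undefined type")
        cleaned[key] = value
    return cleaned
```

A real processor would derive such schemas from the attribute tables of each object type instead of hard-coding them.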
Error, Warning, and Message objects
KoralQuery is meant to be future proof by being upwards compatible. That means new features, whether officially introduced or supported by third party software (using external context files), should either be treated as intended, be intentionally ignored, or be rejected.
Incompatibilities with query objects and collection objects should be treated as documented in the implementation guide section of each object.
To inform the user about certain incompatibilities, KoralQuery has three different mechanisms for raising awareness. These mechanisms may also be used by query rewrite processors to inject errors, warnings, and messages.
- Errors
- Errors will inform the user of a reason a query was rejected. This may originate from the KoralQuery processing, but may also be injected for other reasons, like access restrictions.
- Warnings
- Warnings will inform the user of probably unexpected behaviour of the KoralQuery process. This may originate from the KoralQuery processing, but may also be injected for other reasons, like limitations of the query result set by time out.
- Messages
- Messages will inform the user of useful information that does not affect the results of the query. This may originate from the KoralQuery processing, for example to inform about future incompatibilities, but may also be injected for other reasons, like deprecation of certain endpoints in the query service.
Implementation Guide
KoralQuery processors will always pass on errors, warnings, and messages injected by prior processing systems. A final processing filter may decide which errors, warnings and messages are of interest to present to the user and which are only of interest for intermediate processing.
Collection Type Objects
A KoralQuery can be limited to a subset of documents of a corpus.
The collection can be defined by criteria that documents have to satisfy.
The collection has to be defined on the top level object with the attribute corpus or collection.
A single criterion is defined by a koral:doc object. Multiple criteria can be further constrained using koral:docGroup objects.
The result of a collection type object is a collection of documents that meet the conditions formulated by the collection criteria and group constraints.
collection will be renamed to corpus in future versions of this specification. Implementations should support both attributes, with corpus being the preferred variant.
{ "@context" : "http://korap.ids-mannheim.de/ns/koral/0.5/context.jsonld", "corpus" : { "@type":"koral:docGroup", "operation":"operation:and", "operands":[{ "@type":"koral:doc", "key":"title", "match":"match:eq", "value":"Der Birnbaum", "type":"type:string" },{ "@type":"koral:doc", "key":"pubPlace", "match":"match:eq", "value":"Mannheim", "type":"type:string" },{ "@type":"koral:docGroup", "operation":"operation:or", "operands":[{ "@type":"koral:doc", "key":"pubDate", "match":"match:geq", "value":"2015-03-03", "type":"type:date" },{ "@type":"koral:doc", "key":"lastModified", "match":"match:geq", "value":"2015-04-04", "type":"type:date" }] }] }, "query" : { ... }, "meta" : { ... } }
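A consumer that supports both attribute names could read the collection as in the following Python sketch. The function name is hypothetical, and what happens when both attributes are present is not stated in this specification; preferring corpus is an assumption derived from it being the preferred variant.

```python
def collection_of(koral_query):
    """Return the top level collection type object, or None if there is none."""
    if "corpus" in koral_query:
        return koral_query["corpus"]  # preferred attribute name
    return koral_query.get("collection")  # older attribute name
```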
Basic collection types
A document in KoralQuery is represented by the primary data, annotation data and metadata. The different data fields are defined by field names, for example author for the metadata field for the author, or pubDate for the metadata field for the publication date. The names of the fields are not part of the specification. Basic collection types also do not distinguish between fields for metadata, primary data and annotation data, although the field type may be constrained by the nature of the field.
koral:doc
{ "@type" : "koral:doc", "key" : "textClass", "value" : "novel", "match" : "match:eq" }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:doc | |
key | xsd:string | | The field name. |
value | xsd:string, [xsd:string] | | The field value. |
type | @id | type:string | |
match | @id | match:eq | Specifies agreement between key attribute and annotation. |
A koral:doc object defines one criterion a document in the collection has to satisfy. If a document satisfies the criterion, it is part of the collection.
The key attribute represents a field name, like author, for example a metadata field containing the name of the document's author; text, for example a field containing the primary data; or numberOfTokens, a field containing the number of tokens annotated in the document.
The value attribute represents the value a document is expected to match, according to the match attribute, in the field defined by the key attribute. This may, for example, be the name of the author Theodor Fontane in case of a key for the author field. If the value attribute is defined as an array, the document is expected to match any of the given values.
The type attribute defines the nature of the value attribute. This may be a string, a date or a regular expression. Dates have to follow the W3C Date and Time Formats.
The match attribute defines the kind of agreement the value defined by the value attribute has to have with the value specified in the document in the respective key field. This behaviour is further constrained by the type attribute.
match:eq expects an exact agreement for type:string, a full match for type:regex with implicit anchors ^ and $, and for type:date a date value in the range of the given date (the range is based on the granularity, so a 2015-04 date matches 2015-04-01 as well as 2015-04-12).
match:ne expects an exact disagreement for type:string, a mismatch for type:regex with implicit boundary anchors, and a date outside the defined range.
match:geq and match:leq are currently only defined for type:date, expecting a date in the exact range or later for match:geq or a date in the exact range or earlier for match:leq.
match:contains expects a value in which value is a valid substring for type:string or in which value matches for type:regex.
match:excludes expects a value in which value is not a valid substring for type:string or in which value does not match for type:regex.
match:contains and match:excludes are undefined for type:date.
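The interplay of match and type can be sketched in Python as follows. This is an illustration under assumptions, not a reference implementation: the function names are hypothetical, document dates are assumed to be full YYYY-MM-DD strings, day 31 is used as a simple upper bound for every month, and array values are not handled.

```python
import re

def date_range(value):
    """Expand '2015', '2015-04' or '2015-04-12' to inclusive (low, high) day strings."""
    parts = value.split("-")
    low = "-".join(parts + ["01"] * (3 - len(parts)))
    # "31" is a simplification that works for plain string comparison.
    high = "-".join(parts + ["12", "31"][len(parts) - 1:])
    return low, high

def doc_matches(criterion, field_value):
    """Check one koral:doc criterion against the field value found in a document."""
    value = criterion["value"]
    match = criterion.get("match", "match:eq")
    kind = criterion.get("type", "type:string")
    if kind == "type:date":
        low, high = date_range(value)
        outcomes = {
            "match:eq": low <= field_value <= high,
            "match:ne": not (low <= field_value <= high),
            "match:geq": field_value >= low,
            "match:leq": field_value <= high,
        }
    elif kind == "type:regex":
        full = re.fullmatch(value, field_value) is not None  # implicit ^ and $
        part = re.search(value, field_value) is not None
        outcomes = {"match:eq": full, "match:ne": not full,
                    "match:contains": part, "match:excludes": not part}
    else:
        outcomes = {"match:eq": field_value == value,
                    "match:ne": field_value != value,
                    "match:contains": value in field_value,
                    "match:excludes": value not in field_value}
    return outcomes[match]  # KeyError for combinations the text leaves undefined
```

For instance, a criterion with value 2015-04 and type:date is satisfied by a document dated 2015-04-12, because the day lies inside the month given by the criterion.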
koral:doc may be renamed to koral:field in future versions of this specification.
Implementation Guide
As collection type objects are used for the restriction of access to certain parts of the corpus, the implementation needs to be strict to prevent violation of access control mechanisms.
The key attribute in koral:doc is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised.
If the key field of the criterion is not part of the document and the match attribute is match:ne or match:excludes, the criterion is satisfied. For any other match value, the criterion is not satisfied.
The value attribute in koral:doc is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised. However, this rule is experimental and may change in future versions of this specification.
If the type attribute contains an undefined identifier, the query has to be rejected and an error has to be raised.
If the value attribute is not valid with respect to the given type, e.g. an invalid date string or a regular expression with unbalanced parentheses, the query has to be rejected and an error has to be raised.
If the match attribute contains an undefined identifier, or an identifier that is undefined for the given type, the query has to be rejected and an error has to be raised.
match:contains and match:excludes are only defined for fields supporting full text search. If a field with no full text search capabilities is requested with match:contains, the meaning is identical to match:eq. If a field with no full text search capabilities is requested with match:excludes, the query may be rejected with a raised error, or an empty collection is returned.
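A strict validation step along these lines might look like the following Python sketch. The exception class and function names are hypothetical, and the set of known types is limited to the three identifiers used in this section.

```python
class QueryRejected(Exception):
    """Signals that the query has to be rejected and an error raised."""

KNOWN_TYPES = {"type:string", "type:regex", "type:date"}
NEGATIVE_MATCHES = {"match:ne", "match:excludes"}

def validate_doc(criterion):
    """Reject a koral:doc object that lacks mandatory attributes or has an undefined type."""
    for attr in ("key", "value"):
        if attr not in criterion:
            raise QueryRejected(f"koral:doc is missing the mandatory attribute {attr!r}")
    if criterion.get("type", "type:string") not in KNOWN_TYPES:
        raise QueryRejected("koral:doc has an undefined type identifier")

def satisfied_without_field(criterion):
    """Outcome of a criterion when the document does not contain the key field."""
    return criterion.get("match", "match:eq") in NEGATIVE_MATCHES
```

Rejecting instead of guessing keeps a malformed criterion from silently widening the set of accessible documents.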
Field names are not specified by KoralQuery and their string representation is not constrained.
Complex collection types
koral:docGroup
{ "@type" : "koral:docGroup" }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:docGroup | |
operation | @id | | |
operands | [collection type] | | Arguments of the operation. Number depends on the respective operation. |
A koral:docGroup object defines boolean operations on criteria a document in the collection has to satisfy.
The operands list represents the collection type objects the group refers to.
The operation attribute represents the kind of boolean operation between the operands.
The operation:or operation acts like a union of all collections defined in the operands list, meaning a document is part of the collection if it is part of at least one collection in the operands list.
The operation:and operation acts like an intersection of all collections defined in the operands list, meaning a document is part of the collection if it is part of all collections in the operands list.
koral:docGroup may be renamed to koral:fieldGroup in future versions of this specification.
Implementation Guide
The operation attribute is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised.
If the operation attribute contains an undefined identifier, the query has to be rejected and an error has to be raised.
If the operands list is empty, the resulting collection is empty.
If the operands list has only one entry, the resulting collection is identical to the resulting collection of the only entry, independent of the operation.
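The set semantics of koral:docGroup can be sketched recursively in Python. The function signature, the documents mapping and the caller-supplied satisfies callback are assumptions made for this illustration; only the two operations described above are handled.

```python
def evaluate(node, documents, satisfies):
    """Return the set of ids of all documents selected by a collection type object.

    documents maps a document id to its fields; satisfies(criterion, fields)
    decides a single koral:doc criterion and is supplied by the caller.
    """
    if node["@type"] == "koral:doc":
        return {doc_id for doc_id, fields in documents.items() if satisfies(node, fields)}
    operation = node.get("operation")
    if operation not in ("operation:and", "operation:or"):
        raise ValueError("missing or undefined operation: query rejected")
    results = [evaluate(operand, documents, satisfies) for operand in node.get("operands", [])]
    if not results:
        return set()  # an empty operands list yields an empty collection
    if operation == "operation:and":
        return set.intersection(*results)
    return set.union(*results)
```

With a single operand both branches return that operand's result unchanged, which agrees with the last rule of the guide.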
koral:docGroupRef
{ "@type" : "koral:docGroupRef", "ref" : "https://korap.ids-mannheim.de/@ndiewald/MyCorpus" }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:docGroupRef | |
ref | xsd:string | | A unique reference to a virtual corpus. |
A koral:docGroupRef references a collection type object with criteria a document in the collection has to satisfy.
The ref attribute is a unique identifier by which a KoralQuery consumer references a collection type object, e.g. stored as a JSON-LD file, to be embedded in place of the koral:docGroupRef object.
Span Type Objects
For KoralQuery the primary data of a document is represented as a series of tokens. A series of tokens is called a substring of the document.
Query objects define conditions regarding the constellation of tokens and token-bound features that have to be in place to make a substring a valid occurrence of the query object in a document. These conditions may have syntagmatic or paradigmatic character.
In addition to the substring, results of a query object may contain so-called classes as markers for substrings of the result substring.
The result of a span type object, i.e. its span and its classes, may be an operand of another object, that may filter, enrich, combine or alter the results of nested objects.
Basic span types
koral:token
{ "@type" : "koral:token", "wrap" : { "@type" : "koral:term", "foundry" : "tt", "layer" : "pos", "key" : "ADJD" } }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:token | |
wrap | koral:term, koral:termGroup | | Holds information on search key, foundry, layer, value |
A koral:token object defines the occurrence of one token that a match has to satisfy.
Implementation Guide
The wrap attribute is optional. In case no wrap attribute is defined, the object matches any token of the text.
If the processor does not support any tokens, the query has to be rejected and an error has to be raised.
koral:span
{ "@type" : "koral:span", "wrap" : { "@type" : "koral:term", "foundry" : "cnx", "layer" : "c", "key" : "np" } }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:span | |
wrap | koral:term, koral:termGroup | | Holds information on search key, foundry, layer, value |
attr | koral:term, koral:termGroup | | Span attributes. |
Complex span types
koral:group
{ "@type" : "koral:group", "operation" : "operation:sequence", "operands" : [] }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:group | |
operands | [span type] | | Arguments of the operation. Number depends on respective operation. |
operation | @id | | |
boundary | koral:boundary | | Specifies the minimum and maximum values of the operation. |
classIn | [xsd:integer] | | The numeric identifiers of classes on which classRefCheck or classRefOp operate. |
classOut | xsd:integer | | The numeric identifier of the defined class. |
classRefCheck | [@id] | | Set-theoretic condition on input classes. Results that do not fulfil this condition are excluded from the result set. |
classRefOp | @id | | Set-theoretic operation on input classes. Creates new output class in classOut. |
distances | [koral:distance] | [] | Distance constraints between operands (pertaining to different keys). |
frames | [@id] | [frames:isAround, frames:endsWith, frames:startsWith, frames:matches] | The allowed positional relations between operands A and B. |
inOrder | xsd:boolean | true | If true, the order is relevant. |
relType | koral:relation | | Specifies the relation between operands. |
koral:group may be renamed to koral:spanGroup in future versions of this specification.
operation:disjunction may be deprecated in favor of operation:or in future versions of this specification.
koral:reference
{ "@type" : "koral:reference", "classRef" : [1], "operation" : "operation:focus", "operands" : [ ... ] }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:reference | |
operation | @id | operation:focus | Defines the operation performed based on the references. |
classRef | [xsd:integer] | [0] | Defined classes to refer to. The class 0 refers to the operand's span. |
spanRef | [xsd:integer] | | Defined subspans to refer to. Expects one or two integers. The first integer defines the start index of the subspan, the second defines the length of the subspan. |
operands | [span type] | | Arguments of the operation. Number depends on respective operation. |
A koral:reference object defines a span by referring to another span type.
The operation attribute defines the kind of result expected by referencing another span. Currently the only value defined is operation:focus, making the matching span refer to the defined start and end positions of the referring span.
spanRef refers to subspans (i.e. tokens) of the operand. It accepts a list of numerical parameters. The first parameter defines the start index (starting at position 0). The second parameter defines the length of the match, counting from the start index to the right.
The reference (either a span or a class) has to be part of the operands. If no operand is given, but a classRef is defined, the class refers to classes defined at any point in the query tree.
In case multiple classes are defined in classRef for an operation:focus, the focus starts with the first classed token in sequential order and ends with the final classed token in sequential order.
Implementation Guide
If the operation attribute contains an undefined identifier, a warning has to be raised and the default operation has to be assumed.
If the classRef list refers to a class not defined, the class is ignored.
If the classRef is an empty list (for example "[]" or because of ignored classes), and the operation is operation:focus, the resulting span is empty and matches nowhere.
If the operands list is empty and no classRef is defined, the resulting span is empty and matches nowhere.
If both spanRef and classRef are defined, a warning has to be raised and classRef has to be assumed.
A negative start index for spanRef counts from the end of the operand's span.
If the positive start index starts beyond the end of the operand's span, the result of the operation is empty and matches nowhere.
If the negative start index starts beyond the beginning of the operand's span, the start index will be treated as being 0.
If the length is omitted or exceeds the length of the operand's span, the rest of the operand's span is part of the match.
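The spanRef rules can be traced with a small Python sketch. The function name and the (start, end) return convention with an exclusive end are assumptions for illustration, as is treating a start index equal to the span length as lying beyond the end.

```python
def resolve_span_ref(span_length, span_ref):
    """Map spanRef onto token positions (start, end) inside the operand's span.

    end is exclusive. Returns None when the result is empty and matches nowhere.
    """
    start = span_ref[0]
    if start < 0:
        # Negative indices count from the end; clamp to the beginning.
        start = max(span_length + start, 0)
    elif start >= span_length:
        return None  # the start lies beyond the end of the operand's span
    end = span_length  # omitted length: the rest of the span
    if len(span_ref) > 1:
        end = min(start + span_ref[1], span_length)
    return (start, end)
```

For an operand span of five tokens, a spanRef of [1, 2] selects the second and third token, and [-2] selects the last two tokens.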
Parametric Type Objects
Basic parametric types
koral:term
{ "@type" : "koral:term", "foundry" : "tt", "layer" : "pos", "key" : "ADJD" }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:term | |
key | [xsd:string] | | The term key |
value | [xsd:string] | | The term value |
foundry | xsd:string | | The annotation foundry |
layer | xsd:string | surface layer | The annotation layer |
type | @id | type:string | |
match | @id | match:eq | |
flags | [@id] | | |
To specify a term, KoralQuery provides four attributes: foundry, layer, key, and value. The concrete definition of these attributes relies on the annotation model of the corpus and the implementation of the search system. As an abstract definition, the attributes have a hierarchical structure for annotations, meaning a foundry may bundle multiple layers, a layer may bundle multiple keys, and a key may bundle multiple values. An annotation or a system may not need all of these attributes to define a term; only the key attribute is mandatory.
The key attribute represents annotations like the part-of-speech tag noun or verb, or the surface token Tree. It can be given as a single string or as an array of alternative strings.
Sometimes annotations have to be represented as key and value pairs, for example in morphological annotations a key of the term may be number and the value of the key may be plural. In that case, the key attribute will hold the term number and the value attribute will hold the value plural. Values can be given as single strings or as an array of alternative strings.
The layer attribute may define the annotation level of the term, for example tokenization, part-of-speech or lemma. In case the layer information is omitted, the layer defaults to the tokenization layer, irrespective of the implementation specific word for that layer.
The foundry attribute may define the origin of the annotation, for example the name of the human annotator or the automated tool. Or it may serve as an umbrella for layers with common characteristics (for example bundling several models for named entities). [2]
In most implementations the foundry term may not be relevant, but it is important to deal with conflicting annotations, for example, in case the corpus provides multiple part-of-speech annotations.
The attribute type defines the treatment of key and value. Currently supported types are string, indicating that key and value should be treated as a sequence of characters, and regex, indicating that key and value should be treated as regular expressions. The punct type defines that the key attribute will be treated as a character class of punctuation symbols. In case the punct type is defined, the treatment of the value attribute is undefined. The default value for the type attribute is string. Types other than strings are not yet supported for foundry and layer.
The term defined by foundry, layer, key and value represents the condition of the term object. The match attribute can be used to invert the condition, so that a substring of a text satisfies the term exactly when the condition fails. The match attribute can therefore hold the value eq, meaning the term has to match exactly as defined, or the value ne, meaning the term has to be not equal to the defined condition. The default value for match is eq.
The match attribute of terms is limited to the same functionality as exclude in operations. As match may support further operators, it is used in favor of exclude in this context.
The matching may further be modified by certain flags, using the flags attribute. Multiple flags are supported. In case order is of relevance, the flag operations are processed from left to right. Currently there are two flags supported by KoralQuery: caseInsensitive means the matching will ignore a difference between small and capital letters in the key and value attributes, as well as in the term index. diacriticInsensitive means the match will ignore diacritic symbols in the key and value attributes, as well as in the term index.
{ "@type" : "koral:term", "key" : "Octopus", "flags" : ["flags:caseInsensitive"] }
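How the two flags might affect matching is sketched below in Python. The function names are hypothetical, the identifier flags:diacriticInsensitive is assumed by analogy with flags:caseInsensitive from the example, and only a single string key with type string is compared against a surface token.

```python
import unicodedata

def strip_diacritics(text):
    """Remove combining marks after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def apply_flags(text, flags):
    """Apply the flag operations from left to right."""
    for flag in flags:
        if flag == "flags:caseInsensitive":
            text = text.casefold()
        elif flag == "flags:diacriticInsensitive":
            text = strip_diacritics(text)
    return text

def term_matches(term, token):
    """Compare the key of a koral:term with a token, honouring flags and match."""
    flags = term.get("flags", [])
    equal = apply_flags(term["key"], flags) == apply_flags(token, flags)
    return equal if term.get("match", "match:eq") == "match:eq" else not equal
```

The same normalisation is applied to the term and to the indexed token, mirroring the statement that flags affect both the attributes and the term index.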
Implementation Guide
The key attribute in terms is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised.
If the type attribute contains an undefined identifier, a warning has to be raised and the default type has to be assumed.
If the match attribute contains an undefined identifier, a warning has to be raised and the default match has to be assumed.
If the flags attribute contains an undefined identifier, a warning has to be raised. The flag will be ignored.
All other attributes may silently be ignored.
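The lenient fallbacks for terms can be sketched in Python as follows. The function name and the warning texts are hypothetical, and the prefixed identifiers type:regex, type:punct, match:ne and flags:diacriticInsensitive are assumed by analogy with the identifiers shown in the table and examples.

```python
TYPES = {"type:string", "type:regex", "type:punct"}
MATCHES = {"match:eq", "match:ne"}
FLAGS = {"flags:caseInsensitive", "flags:diacriticInsensitive"}

def normalise_term(term):
    """Return (term with defaults applied, list of warning strings)."""
    if "key" not in term:
        raise ValueError("koral:term without key: query rejected")
    result, warnings = dict(term), []
    for attr, known, default in (("type", TYPES, "type:string"),
                                 ("match", MATCHES, "match:eq")):
        given = result.setdefault(attr, default)
        if given not in known:
            warnings.append(f"undefined {attr} {given!r}, assuming {default!r}")
            result[attr] = default
    kept = []
    for flag in result.get("flags", []):
        if flag in FLAGS:
            kept.append(flag)
        else:
            warnings.append(f"undefined flag {flag!r} ignored")
    result["flags"] = kept
    return result, warnings
```

The collected warnings could then be reported through the warning mechanism described earlier in this document.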
koral:distance
{ "@type" : "koral:distance", "key" : "w", "boundary" : {...} }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:distance | |
key | xsd:string | w | Measure of distance |
foundry | xsd:string | | Foundry in which distance measure (key) is annotated |
layer | xsd:string | | Layer in which distance measure (key) is annotated |
boundary | koral:boundary | | Specified degree of distance |
koral:boundary
{ "@type" : "koral:boundary", "min" : 0, "max" : 3 }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:boundary | |
min | xsd:integer | | Minimal value. |
max | xsd:integer | | Maximal value. |
koral:relation
{ "@type" : "koral:relation", "wrap" : {...} }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:relation | |
wrap | koral:term, koral:termGroup | | Holds information on key, foundry, layer, value |
Complex parametric types
koral:termGroup
{ "@type" : "koral:termGroup", "operation" : "operation:and", "operands" : [...] }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:termGroup | |
operation | @id | | operation:and, operation:or |
operands | [koral:term, koral:termGroup] | | Arguments of the paradigmatic relation. |
A koral:termGroup object defines paradigmatic relations between koral:term objects to describe that term annotations may or may not occur at the same position (e.g. a word is annotated as a specific lemma with a specific part-of-speech tag).
A koral:termGroup object may specify an arbitrary number of operands that refer to the same defined operation. To specify a different operation in the same koral:token position, it is possible to nest a koral:termGroup.
Implementation Guide
The operation attribute is mandatory. If the attribute is missing, the query has to be rejected and an error has to be raised.
If the operation attribute contains an undefined identifier, the query has to be rejected and an error has to be raised.
If the operands list is empty, the resulting span is undefined, therefore the wrapping object is empty.
operation was previously named relation. For improved compatibility, a KoralQuery consumption service may accept both variants and a KoralQuery generation service may generate both variants.
Report Type Objects
koral:rewrite
{ "@type" : "koral:rewrite", "operation" : "operation:injection", "origin" : "Kustvakt" }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:rewrite | |
operation | @id | | Specifies the performed rewrite action. |
origin | xsd:string | | Specifies the component responsible for the rewrite |
scope | xsd:string | The current object | Specifies which object/attribute has been rewritten |
origin was previously named src. For improved compatibility, a KoralQuery consumption service may accept both variants and a KoralQuery generation service may generate both variants.
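A consumption service that accepts both attribute names could normalise incoming objects as in this Python sketch. The mapping and function name are hypothetical, and letting the current name win when both are present is an assumption, since the text does not cover that case.

```python
# Hypothetical mapping from previous attribute names to current ones.
LEGACY_KEYS = {"src": "origin"}

def upgrade_keys(obj, legacy=LEGACY_KEYS):
    """Return a copy of obj in which previous attribute names are renamed."""
    result = dict(obj)
    for old, new in legacy.items():
        if old in result:
            value = result.pop(old)
            result.setdefault(new, value)  # the current name wins if both exist
    return result
```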
Response Type Objects
The response format is still in preparation.
The response to a KoralQuery match request (in contrast to, for example, a request for statistical information, currently out of scope of this document) is a collection of documents satisfying the defined document query in corpus or collection, the defined span query in query, and all supported result modifying constraints in meta.
In case no query is defined, each document of the collection is represented by the requested metadata.
collection will be renamed to corpus in future versions of this specification. Implementations should support both attributes, with corpus being the preferred variant.
{ "@context" : "http://korap.ids-mannheim.de/ns/koral/0.5/context.jsonld", "corpus" : { ... }, "query" : { ... }, "result" : { "@type" : "koral:result", "results" : [ { "@type" : "koral:match", "annotation" : ["xip","xip/p", "cnx", "cnx/c"], "annotationType" : ["xip/p=token", "cnx/c=spans"], "fields" : [{ "@type" : "koral:doc", "key" : "docID", "value" : "doc-3", "type" : "type:string" }], "snippet" : "..." }, { "@type" : "koral:match", ... } ] } }
koral:result
{ "@type" : "koral:result", "results" : [ ... ], "totalResults" : 4 }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:result | |
results | [response type] | | Contains a list of results. |
totalResults | xsd:integer | 0 | The number of total results in the result set. |
koral:match
This specification defines a format for representing matches as HTML snippets in the appendix with "keywords in context", that may be used in the response format.
{ "@type" : "koral:match", "fields" : [{ "@type" : "koral:doc", "key" : "docID", "value" : "doc-3", "type" : "type:string" }], "snippet" : "..." }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:match | |
fields | [koral:doc] | | Contains a set of koral:doc objects defining the metadata fields of the document the match occurs in. |
Import Type Objects
The import format is still in preparation and currently not supported by the reference implementation Krill.
{ "@context" : "http://korap.ids-mannheim.de/ns/koral/0.5/context.jsonld", "record" : { "@type" : "koral:record", "primaryData" : "Der Bau-Leiter trug einen lustigen Bau-Helm.", "id" : 3, "fields" : [ { "@type" : "koral:doc", "key" : "docID", "value" : "doc-3", "type" : "type:string" }, { "@type":"koral:doc", "key":"license", "value":"closed", "type":"type:string" } ], "subtokens" : [ { "@type" : "koral:subtoken", "offsets" : [0,3] }, { "@type" : "koral:subtoken", "offsets" : [4,7] }, { "@type" : "koral:subtoken", "offsets" : [8,14] }, { "@type" : "koral:subtoken", "offsets" : [15,19] }, { "@type" : "koral:subtoken", "offsets" : [20,25] }, { "@type" : "koral:subtoken", "offsets" : [26,34] }, { "@type" : "koral:subtoken", "offsets" : [35,38] }, { "@type" : "koral:subtoken", "offsets" : [39,43] } ], "annotations" : [ { "@type": "koral:token", "subtokens" : [0], "wrap" : { "@type" : "koral:term", "foundry" : "akron", "key" : "Der" } }, { "@type" : "koral:span", "subtokens" : [0,2], "wrap" : { "@type" : "koral:term", "foundry" : "akron", "layer" : "c", "key" : "NP" } }, { "@type": "koral:token", "subtokens" : [1,2], "wrap" : { "@type" : "koral:term", "foundry" : "akron", "key" : "Bau-Leiter" } }, { "@type": "koral:token", "subtokens" : [3], "wrap" : { "@type" : "koral:termGroup", "operands" : [ { "@type" : "koral:term", "foundry" : "akron", "key" : "trug" }, { "@type" : "koral:term", "foundry" : "opennlp", "layer" : "p", "key" : "V" } ] } }, { "@type": "koral:token", "subtokens" : [4], "wrap" : { "@type" : "koral:term", "foundry" : "akron", "key" : "einen" } }, { "@type" : "koral:span", "subtokens" : [4,7], "wrap" : { "@type" : "koral:term", "foundry" : "akron", "layer" : "c", "key" : "NP" } }, { "@type": "koral:token", "subtokens" : [5], "wrap" : { "@type" : "koral:term", "foundry" : "akron", "key" : "lustigen" } }, { "@type": "koral:token", "subtokens" : [6,7], "wrap" : { "@type" : "koral:term", "foundry" : "akron", "key" : "Bau-Helm" } } ] } }
koral:record
{ "@type" : "koral:record", "fields" : [], "subtokens" : [], "annotations" : [] }
Key | Type | Default | Values |
---|---|---|---|
@type | @id | koral:record | |
fields | [koral:doc] | | Contains a set of koral:doc objects defining the metadata fields of the imported record. |
primaryData | xsd:string | | The primary data of the record. Currently only supports text. |
subtokens | [koral:subtoken] | | The list of subtoken offsets defined on the primaryData. |
annotations | [koral:token, koral:span, koral:relation] | | The list of annotations referring to the primary data. |
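The relation between primaryData, subtokens, and annotations can be illustrated with a minimal Python sketch. It is not part of the specification or of any reference implementation. It assumes, based only on the example record above, that offsets are half-open character offsets into primaryData and that an annotation's subtokens list names either a single subtoken index or an inclusive first and last index. The helper names subtoken_span and surface_forms are hypothetical.

```python
def subtoken_span(record, subtokens):
    """Return the (start, end) character offsets covered by an annotation.

    Assumption: `subtokens` holds either one subtoken index or an
    inclusive [first, last] pair of subtoken indices.
    """
    first, last = subtokens[0], subtokens[-1]
    start = record["subtokens"][first]["offsets"][0]
    end = record["subtokens"][last]["offsets"][1]
    return start, end


def surface_forms(record):
    """Pair each annotation type with the text it covers in primaryData."""
    text = record["primaryData"]
    forms = []
    for annotation in record["annotations"]:
        start, end = subtoken_span(record, annotation["subtokens"])
        forms.append((annotation["@type"], text[start:end]))
    return forms


# Abbreviated version of the example record (wrap objects omitted).
record = {
    "@type": "koral:record",
    "primaryData": "Der Bau-Leiter trug einen lustigen Bau-Helm.",
    "subtokens": [
        {"@type": "koral:subtoken", "offsets": [0, 3]},
        {"@type": "koral:subtoken", "offsets": [4, 7]},
        {"@type": "koral:subtoken", "offsets": [8, 14]},
        {"@type": "koral:subtoken", "offsets": [15, 19]},
        {"@type": "koral:subtoken", "offsets": [20, 25]},
        {"@type": "koral:subtoken", "offsets": [26, 34]},
        {"@type": "koral:subtoken", "offsets": [35, 38]},
        {"@type": "koral:subtoken", "offsets": [39, 43]},
    ],
    "annotations": [
        {"@type": "koral:token", "subtokens": [0]},
        {"@type": "koral:span", "subtokens": [0, 2]},
        {"@type": "koral:token", "subtokens": [1, 2]},
        {"@type": "koral:span", "subtokens": [4, 7]},
    ],
}

forms = surface_forms(record)
```

Under these assumptions, the token on subtokens 1 and 2 resolves to "Bau-Leiter" and the span on subtokens 4 to 7 resolves to "einen lustigen Bau-Helm".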
Appendix
Recommended Attributes for Meta Objects
Following the specifications of OpenSearch and PortableContacts, the following attributes are recommended for the meta section.
Key | Type | Default | Values |
---|---|---|---|
count | xsd:integer | | The number of results shown per page. |
startIndex | xsd:integer | 0 | The offset for paging through result sets. |
startPage | xsd:integer | 1 | The page for paging through the result sets. Overwritten by startIndex. |
fields | [xsd:string] | | The data fields requested. |
count should be used for requests as well as for responses of query processors. Similar implementations have a different key for requests, using itemsPerPage. We recommend using the rewrite mechanisms of KoralQuery to report on differences between request and response counts.
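How the paging attributes interact can be sketched as follows. This is a minimal Python illustration, not a prescribed algorithm. The specification only states that startIndex overrides startPage. The mapping from a one-based page to a zero-based offset, the function name resolve_paging, and the fallback page size of 25 are assumptions made for this example.

```python
def resolve_paging(meta, default_count=25):
    """Return (offset, count) for a meta object.

    Assumptions: `default_count` is a hypothetical fallback (the
    specification defines no default for count), and page n starts at
    offset (n - 1) * count.
    """
    count = meta.get("count", default_count)
    if "startIndex" in meta:
        # startIndex takes precedence over startPage.
        return meta["startIndex"], count
    start_page = meta.get("startPage", 1)
    return (start_page - 1) * count, count


paged = resolve_paging({"startPage": 3, "count": 10})
overridden = resolve_paging({"startPage": 3, "startIndex": 5, "count": 10})
```

Here paged starts at offset 20, while overridden starts at offset 5 because startIndex wins over startPage.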
totalResults should reflect the number of occurrences of the query structure in all documents of the corpus or collection. This is not necessarily the base for paging, as the base of paging may be documents, corpora etc. instead. The value 0 indicates that there was no match. A negative value may indicate that the total number of results is not known, not reportable etc. Further parameters may alter the interpretation of totalResults, e.g. to state that the value is only an approximation or a lower bound on the number of matches.
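A consumer might interpret totalResults along the following lines. This Python sketch is only one possible reading of the paragraph above. The function name known_total is hypothetical, and treating a missing attribute like a negative value is an assumption.

```python
def known_total(meta):
    """Return the total number of matches, or None if it is not known.

    Assumption: an absent totalResults is treated like a negative value.
    """
    total = meta.get("totalResults", -1)
    if total < 0:
        # Negative values signal an unknown or unreportable total.
        return None
    return total


no_match = known_total({"totalResults": 0})
unknown = known_total({"totalResults": -1})
```

Keeping 0 ("no match") distinct from None ("not known") avoids conflating an empty result with a missing count.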
collection will be renamed to corpus in future versions of this specification. Implementations should support both attributes, with corpus being the preferred variant.
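Supporting both attributes can be as simple as the following Python sketch. The function name corpus_definition and the sample objects are hypothetical. The only behaviour taken from the text above is that corpus is preferred over collection.

```python
def corpus_definition(query):
    """Return the corpus definition of a query object, or None.

    Prefers the `corpus` attribute and falls back to the older
    `collection` attribute.
    """
    if "corpus" in query:
        return query["corpus"]
    return query.get("collection")


old_style = {"collection": {"@type": "koral:doc", "key": "license", "value": "closed"}}
both = {
    "corpus": {"@type": "koral:doc", "key": "docID", "value": "doc-3"},
    "collection": {"@type": "koral:doc", "key": "license", "value": "closed"},
}

from_old = corpus_definition(old_style)
from_both = corpus_definition(both)
```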
Regular Expressions
The definition of the supported regular expressions is out of scope of this specification and depends on the implementation.
KWIC representation as HTML snippets
KoralQuery span type objects return a textual span. There are several ways to return this information, with a "KWIC" snippet being the most popular. In a "KWIC" snippet, the primary data of the document is merged with the positional information of the match, with a context to the left and the right of the actual match.
KoralQuery supports the definition of classes using operation:class, which may add further positional information to the "KWIC" that can be merged into the primary data.
As KoralQuery supports annotations of different types, the "KWIC" may be enriched with further annotations as well.
The snippet may be added to a match as an xsd:string using a snippet attribute.
<span class="context-left"></span> <mark> <span title="corenlp/c:CS"> <span title="corenlp/c:ROOT"> <span title="corenlp/c:S"> <span title="corenlp/c:NP">die Sonne</span> war <span title="corenlp/c:CAP">hoch und heiß</span> </span>, <span title="corenlp/c:S"> ich mußte <span title="corenlp/c:S"> <span title="corenlp/c:NP">meine Kleidung</span> erleichtern, <span title="corenlp/c:S"> die ich <span title="corenlp/c:PP"> bei der veränderlichen Atmosphäre <span title="corenlp/c:NP">des Tages</span> </span> oft wechsele </span> </span> </span> </span> </span> </mark> <span class="context-right"></span>
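A plain "KWIC" snippet without annotation spans could be assembled as in the following Python sketch. It is an illustration under stated assumptions, not the method used by any KorAP component. The function name kwic_snippet is hypothetical, the context is measured in characters (an implementation may well use tokens or sentences), and only the class names context-left and context-right and the mark element are taken from the example above.

```python
from html import escape


def kwic_snippet(primary_data, start, end, context=20):
    """Wrap the match primary_data[start:end] with left and right context.

    Assumptions: half-open character offsets, and a context window of
    `context` characters on each side.
    """
    left = primary_data[max(0, start - context):start]
    match = primary_data[start:end]
    right = primary_data[end:end + context]
    # Escape the text so that characters such as "<" cannot break the markup.
    return (
        f'<span class="context-left">{escape(left)}</span>'
        f"<mark>{escape(match)}</mark>"
        f'<span class="context-right">{escape(right)}</span>'
    )


snippet = kwic_snippet("Der Bau-Leiter trug einen lustigen Bau-Helm.", 15, 19, context=100)
```

The resulting string can be stored in the snippet attribute of a match. Annotation spans such as those with title attributes in the example would have to be merged in separately.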
Implementations
KoralQuery is the base communication protocol of KorAP. The Koral query serializer can translate queries formulated in Poliqarp, Cosmas-II, Annis, and CQL to KoralQuery. Kustvakt is a policy service that uses Koral to translate queries and to rewrite them based on access restrictions and user settings. Krill is a corpus search service that consumes KoralQuery and creates KoralQuery-compatible responses.
Footnotes
[1] JSON-LD was chosen to be compatible with LAPPS recommendations from ISO TC37 SC4 WG1-EP, suggested by Piotr Bański.
[2] Thanks to Piotr Bański for the definition of foundry and layer.
References
To cite work on KoralQuery, please refer to: Bingel, Joachim and Nils Diewald (2015): KoralQuery - a General Corpus Query Protocol, Proceedings of the Workshop on Innovative Corpus Query and Visualization Tools at NODALIDA 2015, May 11-13, 2015, Vilnius, Lithuania.
To cite this specification, please refer to: Diewald, Nils and Joachim Bingel (2015-2017): KoralQuery 0.5, Technical report, IDS, Mannheim, Germany. Working draft.
ECMA (2013): The JSON Data Interchange Format, ECMA-404, ECMA Standard.
Sporny, Manu, Dave Longley, Gregg Kellogg, Markus Lanthaler, and Niklas Lindström (2014): JSON-LD 1.0 - A JSON-based Serialization for Linked Data, W3C Recommendation.
Wolf, Misha and Charles Wicksteed (1997): Date and Time Formats, W3C Standard.
Copyright
Copyright (c) 2015-2022, IDS Mannheim, Germany, and the authors.
The authors want to thank Eliza Margaretha for her help on implementing the reference implementation of KoralQuery, and Piotr Bański, Elena Frick, and Michael Hanl for their valuable input.
KoralQuery is developed as part of the Koral query processing software, which is one component of the KorAP Corpus Analysis Platform at the Institute for German Language (IDS), a member of the Leibniz-Gemeinschaft, and is supported by the KobRA project, funded by the Federal Ministry of Education and Research (BMBF).