#ifndef AWALI_COMMON_DOCSTRING_JSONFORMAT_HH
#define AWALI_COMMON_DOCSTRING_JSONFORMAT_HH

namespace awali {
namespace docstring {

"The json format used in awali for automata and expressions",
"Json format for automata and expressions",

In Awali, automata and rational expressions are described (or linearised)
in json format.

Their syntax is described here by means of a context-free (CF) grammar.
Non-terminals are written between double square brackets, e.g. `[[kind]]`.
Lists of objects in a json *array* are described by the term `list`.
E.g. `[ [[state]] list ]` stands for ` [ ([[state]],)*[[state]] ] | [] `,
that is, a list may be empty.

A json file in Awali is an object with 4 fields, two of which are optional.

`[[Awali json file]] ::= { [[kind]], [[metadata]], [[context]], [[data]] }`

### 1.1 The 4 top-level fields

* `[[kind]]`, *mandatory*, has only two possible values:

`[[kind]] ::= "kind":"Automaton" | "kind":"Rational Expression"`

* `[[metadata]]`, *optional*, is an object with six fields, all *optional*,
described below.

* `[[context]]`, *mandatory*, describes the _type_ of the Awali object,
automaton or expression, with the same syntax in both cases.

* `[[data]]`, *optional*, describes the _content_ of the automaton or
expression. It does not appear for an empty automaton or expression.

`[[data]] ::= [[automaton content]] | [[ratexp content]]`
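
As a rough illustration of the top-level rule, the following C++ sketch checks the set of keys found at the top level of a json file against the grammar above. The function `top_level_keys_ok` is hypothetical and not part of the Awali API. It assumes the json text has already been parsed and follows only the four fields listed here, so an actual reader may accept or require other fields.

```cpp
#include <set>
#include <string>

// Hypothetical check (not an Awali function): the keys of the top-level json
// object must follow { [[kind]], [[metadata]], [[context]], [[data]] },
// where "kind" and "context" are mandatory and the other two are optional.
bool top_level_keys_ok(const std::set<std::string>& keys) {
  static const std::set<std::string> allowed = {"kind", "metadata", "context",
                                                "data"};
  // Reject any key that is not one of the four fields of the grammar.
  for (const std::string& k : keys)
    if (allowed.count(k) == 0) return false;
  // Both mandatory fields must be present.
  return keys.count("kind") == 1 && keys.count("context") == 1;
}
```

For instance, the key set `{"kind", "context"}` is accepted, since an empty automaton or expression has no `data`. The key set `{"kind", "data"}` is rejected because `context` is missing.
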

The metadata fields contain descriptive information on the object, automaton
or expression. All fields are optional.

`[[metadata]] ::= "metadata":{ [[name]], [[caption]], [[description]], `
` [[timestamp]], [[author]], [[history]] }`

* `[[name]] ::= "name":"<string>"`
`<string>` is the name of the automaton or expression. It is recommended that
this name coincide with the name of the json file.

* `[[caption]] ::= "caption":"<string>"`
`<string>` is a one-line description of the automaton (or expression).

* `[[description]] ::= "description":[("<string>",)*"<string>"]`
The value of this field is an *array* of *strings* which together give a long
description of the automaton (or expression).

* `[[timestamp]] ::= "timestamp":{"day":"<string>", "UTCtime":"<string>"}`

* `[[author]] ::= "author":"<string>"`

## 2. Context of an automaton or an expression

`[[context]]` is an *object* that contains two mandatory *fields*:

`[[context]] ::= "context":{ [[labels]], [[weights]] }`

* `[[labels]]` describes the type of the labels of the transitions
of the automaton or of the atoms of the expression.

* `[[weights]]` tells in which semiring the weights of the automaton or
of the expression are taken.

### 2.1 Label set description

`[[labels]]` is an *object* that contains various *fields* whose presence
or values may depend on each other:

`[[labels]] ::= "labels": { [[labelKind]], [[letterType]], [[alphabet]], `
`[[tapeLabels]], [[allowEpsilon]] }`

* The `labelKind` field takes one of 4 possible values:
`[[labelKind]] ::= "labelKind":[[LBK-str]]`
`[[LBK-str]] ::= "Letters" | "Words" | "No Label" | "Multi-tape" `

  * `"Letters"` if the labels are __letters__
  * `"Words"` if the labels are __words__
  * `"No Label"` if every transition is labelled by the __empty word__
    (for automata only: the automaton is then a weighted
  * `"Multi-tape"` for __transducers__

* The `letterType` field exists if the value of `labelKind` is either
`"Letters"` or `"Words"`; the value of this field is either `"Char"` or
`"Integer"`:

`[[letterType]] ::= "letterType": "Char" | "letterType": "Integer" `

* The `alphabet` field exists if the value of `labelKind` is either
`"Letters"` or `"Words"`. Its value is an array of elements which must be
of the type given in `letterType`. Example:

`"labels": {"labelKind":"Letters", "letterType":"Char", `
`"alphabet":["a","b"]}`

* The `tapeLabels` field exists if the value of `labelKind` is `"Multi-tape"`,
that is, if the automaton is a __transducer__. Its value is an array of
`labels`, each one describing the labels on one tape.

`[[tapeLabels]] ::= [ [[labels]] list ]`

__NB1__ As the grammar allows, a transducer may even have 0 tapes or only 1 tape.
A 0-tape transducer is a rather weird object. A 1-tape transducer is
isomorphic to, but of a different (C++) type from, a plain automaton.

__NB2__ With the `cora` interface, only 2-tape transducers are supported.

* The `allowEpsilon` field exists if the value of `labelKind` is either
`"Letters"` or `"Multi-tape"`.

`[[allowEpsilon]] ::= "allowEpsilon":false | "allowEpsilon":true `

The default value is `false`; it can be set to `true` to indicate that the
automaton may have transitions labelled with the empty word.

In the data part of the Awali object, the representation of the label of a
transition depends on the nature of the labels; see Sec. 3.3 below.
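
The dependencies between the fields of `[[labels]]` can be summarised as a small lookup. The C++ sketch below is only an illustration: `LabelFields` and `expected_label_fields` are hypothetical names, not Awali types. The table encodes exactly the rules stated in this section, that is, which fields exist for which `labelKind`.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical summary (not an Awali type) of the optional fields of
// [[labels]] that go with a given "labelKind", as stated in Sec. 2.1.
struct LabelFields {
  bool letterType;    // "letterType" must be present
  bool alphabet;      // "alphabet" must be present
  bool tapeLabels;    // "tapeLabels" must be present
  bool allowEpsilon;  // "allowEpsilon" may be present (default: false)
};

LabelFields expected_label_fields(const std::string& labelKind) {
  if (labelKind == "Letters") return {true, true, false, true};
  if (labelKind == "Words") return {true, true, false, false};
  if (labelKind == "No Label") return {false, false, false, false};
  if (labelKind == "Multi-tape") return {false, false, true, true};
  throw std::invalid_argument("unknown labelKind: " + labelKind);
}
```

For example, `expected_label_fields("Words")` reports that `letterType` and `alphabet` are expected but that `allowEpsilon` is not. Presumably this is because a word label can already be the empty word.
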

### 2.2 Weight set description

The weight set of an automaton or an expression is either a *basic* semiring,
a *parametrised* semiring, a *series* semiring, or a direct product of weight
sets. Hence the production:

`[[weights]] ::= "weights":{ ([[basicSR]]|[[paramSR]]|[[expSR]]|[[tupleSR]]) }`

The four non-terminals expand as follows.

* `[[basicSR]] ::= "semiring": ("B"|"N"|"N-oo"|"Z"|"Q"|"R"|"C"|"F2"|`
`"Z-min-plus"|"Z-max-plus"|"R-max-prod"|"Fuzzy")`

The description of these semirings is to be found in the documentation on

* `[[paramSR]] ::= "semiring": ("Cyclic"|"Bounded"), "characteristic": <int>`

  * A `"weights": { "semiring":"Cyclic", "characteristic":<int> }` refers to
    the semiring `Z/<int>Z`, and `<int>` must be at least 2.

  * A `"weights": { "semiring":"Bounded", "characteristic":<int> }` refers to
    the quotient of `N` by the congruence generated by `<int> = <int>+1`.
    The integer `<int>` must be (strictly) positive.

* `[[expSR]] ::= "expressionSR": { [[context]] }`

Series are represented by *rational expressions* whose __type__ is given
by `[[context]]`. These expressions are considered modulo *natural
identities* (see documentation on rational expressions), and hence

* `[[tupleSR]] ::= { "tupleSR": [ ([[weights]],)* [[weights]] ] }`

In this case, the weight semiring is the direct product of the semirings
given in the list (which must be of length at least 1), that is, the
operations are realised componentwise.

In the data part of the Awali object, the representation of a weight depends on
the nature of the weights; see Sec. 3.4 below.
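
To make the semantics of the parametrised semirings and of `tupleSR` concrete, here is a minimal C++ model of their operations. It is a sketch under stated assumptions, not Awali's implementation. The names `Cyclic`, `Bounded`, `CyclicTimesBounded`, `make_cyclic` and `make_bounded` are hypothetical. Weights are assumed to be given in canonical form (0 to n-1 for `Cyclic`, 0 to n for `Bounded`), and overflow is ignored. The congruence generated by n = n+1 identifies all integers greater than or equal to n, so `Bounded` caps every result at n.

```cpp
#include <cassert>
#include <utility>

// Hypothetical models (not Awali classes) of the parametrised semirings.
// Weights are assumed canonical: 0..n-1 for Cyclic, 0..n for Bounded.

// "Cyclic" with characteristic n: the ring Z/nZ.
struct Cyclic {
  long n;
  long add(long a, long b) const { return (a + b) % n; }
  long mul(long a, long b) const { return (a * b) % n; }
};

// "Bounded" with characteristic n: N quotiented by the congruence generated
// by n = n+1, which identifies all integers >= n; results are capped at n.
struct Bounded {
  long n;
  long cap(long x) const { return x < n ? x : n; }
  long add(long a, long b) const { return cap(a + b); }
  long mul(long a, long b) const { return cap(a * b); }
};

// The characteristics must satisfy n >= 2 (Cyclic) and n >= 1 (Bounded).
Cyclic make_cyclic(long n) { assert(n >= 2); return Cyclic{n}; }
Bounded make_bounded(long n) { assert(n >= 1); return Bounded{n}; }

// A "tupleSR" made of a Cyclic and a Bounded semiring: the operations are
// realised componentwise.
struct CyclicTimesBounded {
  Cyclic c;
  Bounded b;
  using value = std::pair<long, long>;
  value add(value x, value y) const {
    return {c.add(x.first, y.first), b.add(x.second, y.second)};
  }
  value mul(value x, value y) const {
    return {c.mul(x.first, y.first), b.mul(x.second, y.second)};
  }
};
```

For instance, with characteristic 3, 2+2 gives 1 in `Cyclic` and 3 in `Bounded`. So the pair (2,2)+(2,2) gives (1,3) in their product.
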

## 3. Content of an automaton

An `automaton content` is an *object* that contains two mandatory *fields*:

`[[automaton content]] ::= "data" : { [[states]], [[transitions]]}`

`[[states]]` is a field whose value is the __list__ of states of the automaton.
`[[state]]` gives information on every state.

`[[states]] ::= "states": [ [[state]] list ]`

`[[state]] ::= { "id":<int>, "name":<string> (opt.), "history":<string> (opt.),`
`"initial":<weight> (opt.), "final":<weight> (opt.) }`

* `"id":<int>` : `<int>` is the __identifier__ of the state.
It must be unique w.r.t. the other states. *(mandatory field)*

* `"name":<string>` : `<string>` labels the state.

* `"history":<string>` : `<string>` describes how the state
was built (by product, determinization, ...).

* `"initial":<weight>` : `<weight>` is the *initial weight* of the state.
The syntax for writing `<weight>` depends on the
weightset described in the `context`. See Sec. 3.4.

* `"final":<weight>` : `<weight>` is the *final weight* of the state.
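
Here is a sketch of how a single `[[state]]` might be written for an automaton whose weightset is `Z`, where integers may be written directly (see Sec. 3.4). The `State` struct and both functions are hypothetical. Names are assumed not to need json escaping, and the optional `history` field is left out for brevity. The second function checks the uniqueness constraint on identifiers.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory state of a Z-weighted automaton (not an Awali type).
struct State {
  int id;
  std::optional<std::string> name;
  std::optional<long> initial_weight;  // absent: the state is not initial
  std::optional<long> final_weight;    // absent: the state is not final
};

// Writes one [[state]]; optional fields appear only when present.
// Assumption: the name contains no character that needs json escaping.
std::string state_json(const State& s) {
  std::string out = "{\"id\":" + std::to_string(s.id);
  if (s.name) out += ",\"name\":\"" + *s.name + "\"";
  if (s.initial_weight)
    out += ",\"initial\":" + std::to_string(*s.initial_weight);
  if (s.final_weight)
    out += ",\"final\":" + std::to_string(*s.final_weight);
  return out + "}";
}

// The "id" of every state must be unique w.r.t. the other states.
bool ids_unique(const std::vector<State>& states) {
  std::set<int> seen;
  for (const State& s : states)
    if (!seen.insert(s.id).second) return false;
  return true;
}
```

With these definitions, a state with identifier 0, name `p` and initial weight 1 is written `{"id":0,"name":"p","initial":1}`.
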

`[[transitions]]` is a field whose value is the __list__ of transitions of the
automaton. `[[transition]]` gives information on every transition.

`[[transitions]] ::= "transitions": [ [[transition]] list ]`

`[[transition]] ::= { "id":<int> (opt.), "source":<int>, "destination":<int>,`
`"label":<label>, "weight":<weight> (opt.) }`

* `"id":<int>` : `<int>` is the __identifier__ of the transition.
It must be unique w.r.t. the other transitions. *(optional field)*

* `"source":<int>` : `<int>` is the __identifier__ of the source state of
the transition.

* `"destination":<int>` : `<int>` is the __identifier__ of the destination state
of the transition.

* `"label":<label>` : `<label>` is the *label* of the transition.
The syntax for writing `<label>` depends on the
*labelset* described in the `context`. See Sec. 3.3.

This field is __mandatory__ unless `labelKind` is `"No Label"`.
For *epsilon transitions*, `<label>` is `null`.

* `"weight":<weight>` : `<weight>` is the *weight* of the transition.
The syntax for writing `<weight>` depends on the
weightset described in the `context`. See Sec. 3.4.

This field is __optional__ unless `labelKind` is `"No Label"`, in
which case it is forbidden; if it is not present, `<weight>` is taken to be
the identity (the *one*) of the weightset.
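
The rules above on labels and weights, together with the note on zero weights that follows, can be sketched in C++ as below. This is a hypothetical helper, not Awali code. It assumes an automaton with `labelKind` `"Letters"`, `letterType` `"Char"`, `allowEpsilon` set to `true` and weightset `Z`. An empty `std::optional` label stands for an epsilon transition. A weight equal to 1 (the identity of `Z`) is omitted, and a transition with weight 0 is not written at all: the function returns an empty string.

```cpp
#include <optional>
#include <string>

// Hypothetical helper (not Awali code) for a Z-weighted automaton with
// labelKind "Letters", letterType "Char" and allowEpsilon true.
// An empty optional label stands for an epsilon transition.
std::string transition_json(int source, int destination,
                            const std::optional<char>& label, long weight) {
  // A zero weight means "no transition": nothing is written.
  if (weight == 0) return "";
  std::string out = "{\"source\":" + std::to_string(source) +
                    ",\"destination\":" + std::to_string(destination) +
                    ",\"label\":";
  // Epsilon transitions carry the label null.
  out += label ? "\"" + std::string(1, *label) + "\"" : std::string("null");
  // The weight may be omitted when it equals the identity 1 of Z.
  if (weight != 1) out += ",\"weight\":" + std::to_string(weight);
  return out + "}";
}
```

Under these assumptions, a transition from state 1 to state 2 labelled by the empty word with weight 3 is written `{"source":1,"destination":2,"label":null,"weight":3}`.
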

__NB__ The value *zero* of the weightset is forbidden for `<weight>`: it is
equivalent to *no transition*, and such a transition should thus not appear in
the list. Accordingly, writing the weights in the 2-element
semirings B or F2 = Z/2Z is unnecessary since the default value is the

### 3.3 Writing of labels

The writing of `<label>` depends on `labelKind` and `letterType`.
If `labelKind` is `Letters` or `Words`, the representation is given by the
following table:

| labelKind \ letterType | Char   | Integer           |
|------------------------|--------|-------------------|
| Letters                | string | string or integer |
| Words                  | string | array of letters  |

If `labelKind` is `Multi-tape`, the representation is an array of labels,
one per tape, each written according to the corresponding entry of
`tapeLabels`.

In any case, `<label>` must be written with the *letters* defined in `alphabet`.
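
The table can be read as the following C++ sketch for `labelKind` `"Words"`. The helper names are hypothetical and not Awali's API, and letters are assumed not to need json escaping. The sketch also shows the array used for `Multi-tape` labels, given labels already written for each tape, and a check that a `Char` word only uses letters of `alphabet`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// letterType "Char": a word is written as a single json string.
std::string word_label_char(const std::string& word) {
  return "\"" + word + "\"";
}

// Joins already-written json values into a json array.
std::string json_array(const std::vector<std::string>& items) {
  std::string out = "[";
  for (std::size_t i = 0; i < items.size(); ++i) {
    if (i > 0) out += ",";
    out += items[i];
  }
  return out + "]";
}

// letterType "Integer": a word is written as an array of integer letters.
std::string word_label_int(const std::vector<int>& word) {
  std::vector<std::string> letters;
  for (int a : word) letters.push_back(std::to_string(a));
  return json_array(letters);
}

// labelKind "Multi-tape": one already-written label per tape, in tape order.
std::string multitape_label(const std::vector<std::string>& tape_labels) {
  return json_array(tape_labels);
}

// Every letter of a "Char" word must belong to the declared alphabet.
bool uses_alphabet(const std::string& word, const std::vector<char>& alphabet) {
  return std::all_of(word.begin(), word.end(), [&](char c) {
    return std::find(alphabet.begin(), alphabet.end(), c) != alphabet.end();
  });
}
```

For example, the word 1 2 3 over an `Integer` alphabet is written `[1,2,3]`, and the empty word is written `[]`.
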

### 3.4 Writing of weights

A value in json may be a __string__, a __Boolean__, an __integer__,
a __float__, or an array of these. The following table gives, for each basic
or parametrised semiring, the accepted representations of a weight.

| semiring   | string            | integer | float | bool | array of              |
|------------|-------------------|---------|-------|------|-----------------------|
| B          | "0" or "1"        | 0 or 1  |       | Yes  |                       |
| N          | integer rep.      | Yes     |       |      |                       |
| N-oo       | integer rep.      | Yes     |       |      |                       |
| Z          | integer rep.      | Yes     |       |      |                       |
| R          | int or float rep. | Yes     | Yes   |      |                       |
| Q          | integer rep.      | Yes     |       |      | 2 integers or strings |
| C          | int or float rep. | Yes     | Yes   |      | 2 floats or strings   |
| F2         | "0" or "1"        | 0 or 1  |       | Yes  |                       |
| Z-min-plus | int rep. or "oo"  | Yes     |       |      |                       |
| Z-max-plus | int rep. or "-oo" | Yes     |       |      |                       |
| Fuzzy      | int rep.          | Yes     |       |      |                       |
| R-max-prod | int or float rep. | Yes     | Yes   |      |                       |
| Cyclic     | integer rep.      | Yes     |       |      |                       |
| Bounded    | integer rep.      | Yes     |       |      |                       |

If the *weightset* is a set of series or expressions, `<weight>` is written as
an expression; see Sec. 4 below. If the *weightset* is a tuple of weightsets,
`<weight>` is an *array* whose elements must match, in order, the weightsets
listed in the `context`.
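
As an illustration of the table, here is how a reader might accept the several representations of a weight of `B` and of `Z-min-plus`. This is only a sketch. It assumes the json text has already been parsed into a `JsonScalar`, a hypothetical `std::variant` that is not an Awali type. It also represents the `"oo"` of `Z-min-plus`, the zero of that semiring, by the largest `long long`.

```cpp
#include <limits>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical parsed json scalar (not an Awali type): the reader is assumed
// to have already turned the json text into one of these alternatives.
using JsonScalar = std::variant<std::string, long long, bool>;

// B accepts "0" or "1", the integers 0 or 1, and the json Booleans.
bool read_B(const JsonScalar& v) {
  if (const bool* b = std::get_if<bool>(&v)) return *b;
  if (const long long* i = std::get_if<long long>(&v)) {
    if (*i == 0 || *i == 1) return *i == 1;
  } else {
    const std::string& s = std::get<std::string>(v);
    if (s == "0" || s == "1") return s == "1";
  }
  throw std::invalid_argument("not a weight of B");
}

// Z-min-plus accepts an integer or its string representation, and "oo"
// for +infinity, the zero of the semiring (here the largest long long).
// std::stoll throws std::invalid_argument if the string is not a number.
long long read_Z_min_plus(const JsonScalar& v) {
  if (const long long* i = std::get_if<long long>(&v)) return *i;
  if (const std::string* s = std::get_if<std::string>(&v)) {
    if (*s == "oo") return std::numeric_limits<long long>::max();
    return std::stoll(*s);
  }
  throw std::invalid_argument("not a weight of Z-min-plus");
}
```

When building a `JsonScalar` by hand, pass an explicit `std::string`, e.g. `JsonScalar(std::string("oo"))`. Depending on the standard library, a plain string literal may otherwise select the `bool` alternative; this behaviour changed in C++20.
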

## 4. Content of a rational expression

A `ratexp content` is a *field* whose value is a rational expression.
The grammar is almost self-explanatory.

`[[ratexp content]] ::= "data":{ [[ratexp]] }`

`[[ratexp]] ::= { [[ratexpNode]], [[lweight]] (opt.), [[rweight]] (opt.) }`

`[[ratexpNode]] ::= [[zero]]|[[one]]|[[label]]|[[sum]]|[[prod]]|[[star]] `

`[[lweight]] ::= "lweight":<weight>`

`[[rweight]] ::= "rweight":<weight>`

The value `<weight>` of the field `lweight` (resp. `rweight`) is the
coefficient by which the expression represented by `[[ratexpNode]]` is
multiplied on the left (resp. on the right); it makes it possible to build
*weighted expressions*.
The rest of the grammar describes classical rational (regular) expressions.

`[[zero]] ::= "zero":null`

`[[one]] ::= "one":null`

`[[label]] ::= "label":<label>`

`[[sum]] ::= "sum":[ [[ratexp]] list ]` __list__ of length at least 2

`[[prod]] ::= "prod":[ [[ratexp]] list ]` __list__ of length at least 2

`[[star]] ::= "star":{ [[ratexp]] }`
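
The grammar above maps naturally onto a recursive serializer. The C++ sketch below is hypothetical: the `Ratexp` struct is not one of Awali's expression classes. It assumes letters of type `Char` and integer weights. It also assumes that the value of `"star"` and each element of `"sum"` and `"prod"` is written directly as a `[[ratexp]]` object. It rejects `sum` and `prod` nodes with fewer than 2 operands, as the grammar requires.

```cpp
#include <cstddef>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical expression tree (not Awali's ratexp classes).
struct Ratexp {
  enum Kind { Zero, One, Label, Sum, Prod, Star } kind;
  std::string label;                         // used when kind == Label
  std::vector<std::shared_ptr<Ratexp>> sub;  // operands of Sum, Prod, Star
  std::optional<long> lweight, rweight;      // optional coefficients
};

// Writes a [[ratexp]] object following the grammar above.
std::string to_json(const Ratexp& e) {
  std::string out = "{";
  switch (e.kind) {
    case Ratexp::Zero:  out += "\"zero\":null"; break;
    case Ratexp::One:   out += "\"one\":null"; break;
    case Ratexp::Label: out += "\"label\":\"" + e.label + "\""; break;
    case Ratexp::Star:  out += "\"star\":" + to_json(*e.sub.at(0)); break;
    case Ratexp::Sum:
    case Ratexp::Prod:
      // Both n-ary operators need a list of length at least 2.
      if (e.sub.size() < 2)
        throw std::invalid_argument("sum and prod need at least 2 operands");
      out += e.kind == Ratexp::Sum ? "\"sum\":[" : "\"prod\":[";
      for (std::size_t i = 0; i < e.sub.size(); ++i) {
        if (i > 0) out += ",";
        out += to_json(*e.sub[i]);
      }
      out += "]";
      break;
  }
  if (e.lweight) out += ",\"lweight\":" + std::to_string(*e.lweight);
  if (e.rweight) out += ",\"rweight\":" + std::to_string(*e.rweight);
  return out + "}";
}
```

Under these assumptions, the expression 2((a+b)*a), i.e. the product of (a+b)* and a with left weight 2, is written `{"prod":[{"star":{"sum":[{"label":"a"},{"label":"b"}]}},{"label":"a"}],"lweight":2}`.
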

* `"zero":null` is the representation of the expression 0.
It occurs only as the whole expression 0 (see doc on rational

* `"one":null` is the representation of the expression 1 (hence denoting

* `"label":<label>` : `<label>` denotes an atom of the expression. Its writing
must follow the syntax described in the `context` (see Sec. 3.3).

* `[[sum]]` : the definition in the grammar makes `sum` an *n-ary* operator,

* `[[prod]]` : the definition in the grammar makes `prod` an *n-ary* operator,