Awali
Another Weighted Automata library
json_format.hh
// This file is part of Awali.
// Copyright 2016-2022 Sylvain Lombardy, Victor Marsault, Jacques Sakarovitch
//
// Awali is a free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// This program is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with this program. If not, see <http://www.gnu.org/licenses/>.


/*****************************************************************************
 See template-docfile.hh for documentation and how characters are preprocessed
******************************************************************************/

#ifndef AWALI_COMMON_DOCSTRING_JSONFORMAT_HH
#define AWALI_COMMON_DOCSTRING_JSONFORMAT_HH


namespace awali { namespace docstring {

static entry_t json_format = {
/* Name: */ "json-format",
/* Description: */ "The json format used in awali for automata and expressions",
/* Title: */ "Json format for automata and expressions",
/* Content: */
R"---doxytoken---(
In Awali, automata and rational expressions are described (or linearised)
with __json files__.

Their syntax is described here by means of a context-free (CF) grammar.
Non-terminals are written between double square brackets, e.g. `[[kind]]`.
Lists of objects in a json *array* are described by the term `list`.
E.g. `[ [[state]] list ]` stands for ` [ ([[state]],)*[[state]] ] | [] `,
that is, a list may be empty.


## 1. Awali objects

A json file in Awali is an object with 4 fields, two of which are optional.

`[[Awali json file]] ::= { [[kind]], [[metadata]], [[context]], [[data]] }`

### 1.1 The 4 first-level fields

* `[[kind]]`, *mandatory*, is an object with two possible values only:

`[[kind]] ::= {"kind":"Automaton"} | {"kind":"Rational Expression"}`

* `[[metadata]]`, *optional*, is an object with six fields, all *optional*
 (see Sec. 1.2).

* `[[context]]`, *mandatory*, describes the _type_ of the awali object,
 automaton or expression, with the same syntax in both cases (see Sec. 2).

* `[[data]]`, *optional*, describes the _content_ of the automaton or
 expression. It does not appear for an empty automaton or expression.

`[[data]] ::= [[automaton content]] | [[ratexp content]]`

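As an illustration of the four first-level fields, here is a minimal Python
sketch. It is not part of Awali: `check_top_level` is a hypothetical helper,
and it assumes that the four fields are the keys of a single json object.
Real files may carry further fields that this page does not describe.

```python
import json

# Hypothetical helper (not part of Awali). Assumption: "kind", "metadata",
# "context" and "data" are keys of one json object.
MANDATORY = ("kind", "context")
KINDS = ("Automaton", "Rational Expression")

def check_top_level(doc):
    """Return a list of problems found in the first-level fields of `doc`."""
    problems = []
    for key in MANDATORY:
        if key not in doc:
            problems.append("missing mandatory field: " + key)
    if "kind" in doc and doc["kind"] not in KINDS:
        problems.append("unknown kind: " + str(doc["kind"]))
    return problems

# An empty automaton over {a, b} with Boolean weights: no "data" field.
text = """
{ "kind": "Automaton",
  "metadata": { "name": "empty", "caption": "An automaton with no state" },
  "context": {
    "labels": { "labelKind": "Letters", "letterType": "Char",
                "alphabet": ["a", "b"] },
    "weights": { "semiring": "B" } } }
"""
doc = json.loads(text)
print(check_top_level(doc))
```

The optional fields `metadata` and `data` are deliberately not checked here.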

### 1.2 Metadata

The metadata field contains descriptive information on the object, automaton
or expression. All fields are optional.

`[[metadata]] ::= { [[name]], [[caption]], [[description]], `
 ` [[timestamp]], [[author]], [[history]] }`


* `[[name]] ::= "name":"<string>"`
`<string>` is the name of the automaton or expression. It is recommended that
this name coincides with the name of the json file.

* `[[caption]] ::= "caption":"<string>"`
`<string>` is a one-line description of the automaton (or expression).

* `[[description]] ::= "description":[("<string>",)*"<string>"]`
The value of this field is an *array* of *strings* which give a long
description of the automaton (or expression).

* `[[timestamp]] ::= "timestamp":{"day":"<string>", "UTCtime":"<string>"}`

* `[[author]] ::= "author":"<string>"`


## 2. Context of an automaton or an expression

`[[context]]` is an *object* that contains two mandatory *fields*:

`[[context]] ::= "context":{ [[labels]], [[weights]] }`

 * `[[labels]]` describes the type of the labels of the transitions
 of the automaton or of the atoms of the expression.

 * `[[weights]]` tells in which semiring the weights of the automaton or
 of the expression are taken.

### 2.1 Label set description

`[[labels]]` is an *object* that contains various *fields* whose appearance
or values may depend on each other:

`[[labels]] ::= "labels": { [[labelKind]], [[letterType]], [[alphabet]], `
 `[[tapeLabels]], [[allowEpsilon]] }`

 * The `labelKind` field takes one of 4 possible values:
 `[[labelKind]] ::= "labelKind":[[LBK-str]]`
 `[[LBK-str]] ::= "Letters" | "Words" | "No Label" | "Multi-tape" `

 * `"Letters" ` if the labels are __letters__
 * `"Words" ` if the labels are __words__
 * `"No Label"` if every transition is labelled by the __empty word__
 (for automata only: the automaton is then a weighted
 directed graph)
 * `"Multi-tape"` for __transducers__

* The `letterType` field exists if the value of `labelKind` is either
 `"Letters"` or `"Words"`; the value of this field is either `"Char"` or
 `"Integer"`.

 `[[letterType]] ::= "letterType": "Char" | "letterType": "Integer" `

* The `alphabet` field exists if the value of `labelKind` is either
 `"Letters"` or `"Words"`. Its value is an array of elements which must be
 of the type given in `letterType`. Example:

 `"labels": {"labelKind":"Letters", "letterType":"Char", `
 `"alphabet":["a","b"]}`

* The `tapeLabels` field exists if the value of `labelKind` is `"Multi-tape"`,
 that is, if the automaton is a __transducer__. Its value is an array of
 `labels`, each one describing the labels on one tape.

 `[[tapeLabels]] ::= [ [[labels]] list ]`

 __NB1__ As indicated by the grammar, a transducer may have 0 or 1 tape only.
 A 0-tape transducer is a rather weird object. A 1-tape transducer is
 isomorphic to, but of a different (C++) type than, a plain automaton.

 __NB2__ With the `cora` interface, only 2-tape transducers can be dealt with.

* The `allowEpsilon` field exists if the value of `labelKind` is either
 `"Letters"` or `"Multi-tape"`.

 `[[allowEpsilon]] ::= "allowEpsilon":false | "allowEpsilon":true `

 The default value is `false`; it can be set to `true` to indicate that the
 automaton may have transitions labelled with the empty word.

In the data part of the Awali object, the representation of the label of a
transition depends on the nature of the labels. See Sec. 3.3 below.

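The dependencies between the fields of a `labels` object can be sketched as
a small Python check. `check_labels` is a hypothetical helper, not part of
Awali. It assumes that "exists if" reads as "exists if and only if" for
`letterType`, `alphabet` and `tapeLabels`, and that `allowEpsilon` may always
be omitted since it has a default value.

```python
# Hypothetical checker (not part of Awali) for the field dependencies of a
# `labels` object, read from the rules of Sec. 2.1.
LABEL_KINDS = ("Letters", "Words", "No Label", "Multi-tape")

def check_labels(labels):
    """Return a list of problems found in the `labels` object."""
    problems = []
    kind = labels.get("labelKind")
    if kind not in LABEL_KINDS:
        return ["unknown labelKind: " + repr(kind)]
    lettered = kind in ("Letters", "Words")
    # letterType and alphabet go together with "Letters" and "Words".
    for field in ("letterType", "alphabet"):
        if lettered and field not in labels:
            problems.append("missing field: " + field)
        if not lettered and field in labels:
            problems.append("unexpected field: " + field)
    if lettered and labels.get("letterType") not in ("Char", "Integer", None):
        problems.append("unknown letterType")
    # tapeLabels is specific to transducers; each tape is itself a `labels`.
    if kind == "Multi-tape":
        if "tapeLabels" not in labels:
            problems.append("missing field: tapeLabels")
        for tape in labels.get("tapeLabels", []):
            problems.extend(check_labels(tape))
    elif "tapeLabels" in labels:
        problems.append("unexpected field: tapeLabels")
    # allowEpsilon is only described for "Letters" and "Multi-tape".
    if "allowEpsilon" in labels and kind not in ("Letters", "Multi-tape"):
        problems.append("unexpected field: allowEpsilon")
    return problems
```

Applied to the example above (`"labelKind":"Letters"`, `"letterType":"Char"`,
`"alphabet":["a","b"]`), the check reports no problem.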

### 2.2 Weight set description

The weight set of an automaton or an expression is either a *basic* semiring,
a *parametrised* semiring, a *series* semiring, or a direct product of weight
sets. Hence the production:

`[[weights]] ::= "weights":{ ([[basicSR]]|[[paramSR]]|[[expSR]]|[[tupleSR]]) }`

The four non-terminals develop in the following way.

* `[[basicSR]] ::= "semiring": ("B"|"N"|"N-oo"|"Z"|"Q"|"R"|"C"|"F2"|`
 `"Z-min-plus"|"Z-max-plus"|"R-max-prod"|"Fuzzy")`

 The description of these semirings is to be found in the documentation on
 weight sets.

* `[[paramSR]] ::= "semiring": ("Cyclic"|"Bounded"), "characteristic": <int>`

 * A `"weights": { "semiring":"Cyclic", "characteristic":<int> }` refers to
 the semiring `Z/<int>Z` and `<int>` must be equal to, or greater than, 2.

 * A `"weights": { "semiring":"Bounded", "characteristic":<int> }` refers to
 the quotient of `N` by the congruence generated by `<int> = <int>+1`.
 The integer `<int>` must be (strictly) positive.

* `[[expSR]] ::= "expressionSR": { [[context]] }`

 Series are represented by *rational expressions* whose __type__ is given
 by `[[context]]`. These expressions are considered modulo *natural
 identities* (see documentation on rational expressions), and hence
 form a semiring.

* `[[tupleSR]] ::= { "tupleSR": [ ([[weights]],)* [[weights]] ] }`

 In this case, the weight semiring is the direct product of the semirings
 given in the list (which must be of length at least 1), that is, the
 operations are performed componentwise.

In the data part of the Awali object, the representation of a weight depends on
the nature of the weights. See Sec. 3.4 below.

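To make the parametrised semirings and the direct product concrete, here is
a short Python sketch of their arithmetic. The function names `cyclic`,
`bounded` and `product` are hypothetical and only illustrate the definitions
above; they say nothing about how Awali implements these semirings. Each
function returns a pair `(add, mul)`.

```python
def cyclic(n):
    """Operations of Z/nZ, for n >= 2."""
    if n < 2:
        raise ValueError("characteristic must be at least 2")
    return (lambda a, b: (a + b) % n), (lambda a, b: (a * b) % n)

def bounded(n):
    """Operations of N modulo `n = n+1`: every value above n collapses to n."""
    if n < 1:
        raise ValueError("characteristic must be strictly positive")
    return (lambda a, b: min(a + b, n)), (lambda a, b: min(a * b, n))

def product(*semirings):
    """Direct product: both operations act componentwise on tuples."""
    if not semirings:
        raise ValueError("at least one component is required")
    def lift(index):
        # index 0 selects the additions, index 1 the multiplications.
        return lambda u, v: tuple(
            sr[index](a, b) for sr, a, b in zip(semirings, u, v))
    return lift(0), lift(1)

add, mul = product(cyclic(3), bounded(3))
print(add((2, 2), (2, 2)), mul((2, 2), (2, 2)))
```

In the `Bounded` case with characteristic 3, the elements are 0, 1, 2 and 3,
and any sum or product that would exceed 3 is identified with 3.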

## 3. Content of an automaton

An `automaton content` is an *object* that contains two mandatory *fields*:

`[[automaton content]] ::= "data" : { [[states]], [[transitions]]}`

### 3.1 States

`[[states]]` is an object whose value is the __list__ of states of the automaton.
`[[state]]` gives the information on one state.

`[[states]] ::= {"states": [ [[state]] list ] }`

`[[state]] ::= { "id":<int>, "name":<string> (opt.), "history":<string> (opt.),`
 `"initial":<weight> (opt.), "final":<weight> (opt.) }`


* `"id":<int>` : `<int>` is the __identifier__ of the state.
 It must be unique w.r.t. the other states. *(mandatory field)*

* `"name":<string>` : `<string>` labels the state.

* `"history":<string>` : `<string>` gives a description of the way the state
 has been built (from product, determinization, ...).

* `"initial":<weight>` : `<weight>` is the *initial weight* of the state.
 The syntax for writing `<weight>` depends on the
 weightset described in the `context`. See Sec. 3.4.

* `"final":<weight>` : `<weight>` is the *final weight* of the state.
 Same syntax as for `initial`.


### 3.2 Transitions

`[[transitions]]` is an object whose value is the __list__ of transitions of the
automaton. `[[transition]]` gives the information on one transition.

`[[transitions]] ::= {"transitions": [ [[transition]] list ] }`

`[[transition]] ::= { "id":<int> (opt.), "source":<int>, "destination":<int>,`
 `"label":<label>, "weight":<weight> (opt.) }`

* `"id":<int>` : `<int>` is the __identifier__ of the transition.
 It must be unique w.r.t. the other transitions. *(optional field)*

* `"source":<int>` : `<int>` is the __identifier__ of the source state of
 the transition.

* `"destination":<int>` : `<int>` is the __identifier__ of the destination state
 of the transition.

* `"label":<label>` : `<label>` is the *label* of the transition.
 The syntax for writing `<label>` depends on the
 *labelset* described in the `context`. See Sec. 3.3.

 This field is __mandatory__ except if `labelKind` is set to `"No Label"`.
 For *epsilon transitions*, `<label>` is set to `null`.


* `"weight":<weight>` : `<weight>` is the *weight* of the transition.
 The syntax for writing `<weight>` depends on the
 weightset described in the `context`. See Sec. 3.4.

 This field is __optional__ except if `labelKind` is set to `"No Label"`, in
 which case it is forbidden; if it is not present, `<weight>` is set to
 the (multiplicative) identity of the weightset.

 __NB__ The value *zero* of the weightset is forbidden for `<weight>`: it is
 equivalent to *no transition* and the transition should thus not appear in
 the list. Accordingly, the writing of the weights in the 2-element
 semirings B or F2 = Z/2Z is not relevant since the default value is the
 only possible one.

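The constraints on states and transitions can be illustrated by the
following Python sketch. `check_automaton_data` is a hypothetical helper,
not an Awali function; it assumes that `states` and `transitions` are keys
of the `data` object and that the weightset is `Z`, whose identity is `1`.

```python
def check_automaton_data(data, one=1):
    """Check ids and fill in default weights; return (problems, transitions)."""
    problems = []
    ids = [s["id"] for s in data.get("states", [])]
    if len(ids) != len(set(ids)):
        problems.append("state ids are not unique")
    known = set(ids)
    completed = []
    for t in data.get("transitions", []):
        for end in ("source", "destination"):
            if t[end] not in known:
                problems.append("unknown %s state: %r" % (end, t[end]))
        # A missing weight stands for the identity of the weightset.
        completed.append(dict(t, weight=t.get("weight", one)))
    return problems, completed

# A two-state automaton with weights in Z (illustrative data only).
data = {
    "states": [
        {"id": 0, "name": "p", "initial": 1},
        {"id": 1, "name": "q", "final": 1},
    ],
    "transitions": [
        {"source": 0, "destination": 0, "label": "a"},
        {"source": 0, "destination": 1, "label": "b", "weight": 2},
        {"source": 1, "destination": 1, "label": "a"},
    ],
}
problems, completed = check_automaton_data(data)
print(problems, [t["weight"] for t in completed])
```

The transitions written without a `weight` field are completed with the
weight `1`, while the explicit weight `2` is kept.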

### 3.3 Writing of labels

The writing of `<label>` depends on `labelKind` and `letterType`.
If `labelKind` is `Letters` or `Words`, the representation is given by the
following table:

| labelKind / letterType | Char   | Integer           |
|------------------------|--------|-------------------|
| Letters                | string | string or integer |
| Words                  | string | array of Letters  |


If `labelKind` is `Multi-tape`, the representation is an array of labels.

In any case, `<label>` must be written with the *letters* defined in `alphabet`.

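The table can be read as the following Python sketch, which tells whether a
`<label>` fits a given `labels` object. `check_label` is a hypothetical
helper. It assumes that a `Char` letter is a 1-character string, that an
integer letter written as a string is in decimal notation, and that a
multi-tape label has one component per tape; `allowEpsilon` is not checked.

```python
def check_label(label, labels):
    """Tell whether `label` is written with the letters of `labels`."""
    kind = labels["labelKind"]
    if label is None:          # epsilon; `allowEpsilon` is not checked here
        return True
    if kind == "No Label":
        return False
    if kind == "Multi-tape":   # an array with one label per tape
        tapes = labels["tapeLabels"]
        return (isinstance(label, list) and len(label) == len(tapes)
                and all(check_label(l, t) for l, t in zip(label, tapes)))
    alphabet = labels["alphabet"]
    if labels["letterType"] == "Char":
        if not isinstance(label, str):
            return False
        if kind == "Letters" and len(label) != 1:
            return False
        return all(c in alphabet for c in label)
    # Integer letters: a single letter, or an array of letters for words.
    letters = label if kind == "Words" else [label]
    return isinstance(letters, list) and all(
        isinstance(x, (int, str)) and str(x).lstrip("-").isdigit()
        and int(x) in alphabet for x in letters)
```

With the alphabet `["a","b"]`, the label `"a"` is accepted for `Letters`,
whereas `"ab"` is accepted for `Words` only.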

### 3.4 Writing of weights

A value in json may be a __string__, a __Boolean__, an __integer__,
a __float__, or an array of the former. The following table gives, for each
weightset, the kinds of json values that may be used to write a weight.

| Weightset  | string            | integer | float | bool | array of              |
|------------|-------------------|---------|-------|------|-----------------------|
| B          | "0" or "1"        | 0 or 1  |       | Yes  |                       |
| N          | integer rep.      | Yes     |       |      |                       |
| N-oo       | integer rep.      | Yes     |       |      |                       |
| Z          | integer rep.      | Yes     |       |      |                       |
| R          | int or float rep. | Yes     | Yes   |      |                       |
| Q          | integer rep.      | Yes     |       |      | 2 integers or strings |
| C          | int or float rep. | Yes     | Yes   |      | 2 floats or strings   |
| F2         | "0" or "1"        | 0 or 1  |       | Yes  |                       |
| Z-min-plus | int rep. or "oo"  | Yes     |       |      |                       |
| Z-max-plus | int rep. or "-oo" | Yes     |       |      |                       |
| Fuzzy      | int rep.          | Yes     |       |      |                       |
| R-max-prod | int or float rep. | Yes     | Yes   |      |                       |
| Cyclic     | integer rep.      | Yes     |       |      |                       |
| Bounded    | integer rep.      | Yes     |       |      |                       |

If the *weightset* is a set of series or expressions, `<weight>` is written as
an expression, see Sec. 4 below. If the *weightset* is a tuple of weightsets,
`<weight>` is an *array* whose elements must be consistent with the description
given in the `context`.

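A few rows of the table can be sketched in Python as follows. `parse_weight`
is a hypothetical helper covering only `B`, `F2`, `Z`, `Q` and `Z-min-plus`.
It assumes that "integer rep." means a decimal string, that the array for
`Q` is `[numerator, denominator]`, and that `"oo"` stands for the infinite
element of `Z-min-plus`.

```python
from fractions import Fraction
import math

def parse_weight(value, semiring):
    """Turn a json value into a Python number for a few basic semirings."""
    plain = isinstance(value, (int, str)) and not isinstance(value, bool)
    if semiring in ("B", "F2"):
        if value in (True, 1, "1"):    # also matches the json Boolean `true`
            return 1
        if value in (False, 0, "0"):
            return 0
    elif semiring == "Z" and plain:
        return int(value)
    elif semiring == "Q":
        # Assumption: the array holds numerator and denominator, in that order.
        if isinstance(value, list) and len(value) == 2:
            return Fraction(int(value[0]), int(value[1]))
        if plain:
            return Fraction(int(value))
    elif semiring == "Z-min-plus":
        if value == "oo":
            return math.inf
        if plain:
            return int(value)
    raise ValueError("cannot read %r as a weight of %s" % (value, semiring))

print(parse_weight([1, "2"], "Q"))
```

A value that fits none of the accepted writings raises a `ValueError`.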

## 4. Content of a rational expression

A `ratexp content` is an *object* whose value is a rational expression.
The grammar is almost self-explanatory.

`[[ratexp content]] ::= "data":{ [[ratexp]] }`

`[[ratexp]] ::= { [[ratexpNode]], [[lweight]] (opt.), [[rweight]] (opt.) }`

`[[ratexpNode]] ::= [[zero]]|[[one]]|[[label]]|[[sum]]|[[prod]]|[[star]] `

`[[lweight]] ::= "lweight":<weight>`

`[[rweight]] ::= "rweight":<weight>`

The value `<weight>` of the field `lweight` (resp. `rweight`) is the
coefficient that multiplies on the left (resp. on the right) the expression
represented by `[[ratexpNode]]`; it makes it possible to build *weighted
expressions*. The rest of the grammar is that of classical rational (regular)
expressions.

`[[zero]] ::= "zero":null`

`[[one]] ::= "one":null`

`[[label]] ::= "label":<label>`

`[[sum]] ::= "sum":[ [[ratexp]] list ]` __list__ of length at least 2

`[[prod]] ::= "prod":[ [[ratexp]] list ]` __list__ of length at least 2

`[[star]] ::= "star":{ [[ratexp]] }`


* `"zero":null` is the representation of the expression 0.
 It occurs only when the whole expression is 0 (see doc on rational
 expressions).

* `"one":null` is the representation of the expression 1 (hence denoting
 the empty word).

* `"label":<label>` : `<label>` denotes an atom of the expression. Its writing
 must follow the syntax described in the `context`.

* `[[sum]]` : the definition in the grammar makes `sum` an *n-ary* operator,
 with `n>=2`.

* `[[prod]]` : the definition in the grammar makes `prod` an *n-ary* operator,
 with `n>=2`.

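The recursive structure of `[[ratexp]]` can be illustrated by a small Python
sketch that renders an expression as a string. `render` is a hypothetical
helper, not part of Awali. It assumes that the value of `star` is a single
`ratexp` object, and the `<k>` notation used for coefficients is only a
convention of this sketch.

```python
def render(exp):
    """Render a `ratexp` object (a Python dict) as a plain string."""
    if "zero" in exp:
        body = "0"
    elif "one" in exp:
        body = "1"
    elif "label" in exp:
        body = str(exp["label"])
    elif "sum" in exp:
        body = "(" + "+".join(render(e) for e in exp["sum"]) + ")"
    elif "prod" in exp:
        body = "(" + ".".join(render(e) for e in exp["prod"]) + ")"
    elif "star" in exp:
        inner = render(exp["star"])
        # Parenthesise the operand unless it is one letter or already grouped.
        grouped = inner.startswith("(") and inner.endswith(")")
        body = (inner if grouped or len(inner) == 1 else "(" + inner + ")") + "*"
    else:
        raise ValueError("not a ratexp node: %r" % (exp,))
    # Optional coefficients multiply the node on the left and on the right.
    if "lweight" in exp:
        body = "<%s>%s" % (exp["lweight"], body)
    if "rweight" in exp:
        body = "%s<%s>" % (body, exp["rweight"])
    return body

e = {"prod": [{"star": {"sum": [{"label": "a"}, {"label": "b"}]}},
              {"label": "b", "lweight": 2}]}
print(render(e))
```

Here `e` describes the product of the star of `a+b` with the atom `b`
weighted by 2 on the left.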
)---doxytoken---"
};

}} //End of namespaces awali::docstring and awali


#endif
