Some tests have been made one a computer using a single CPU@2600MHz without hyperthreading. Test have been made for a state machine composed of 20.002 states split in two stacks of 10.001 nested states. The two deepest states of the state machine were interlinked with 65.533 different transitions (allocation of 2 * 65.533 transitions).
As 65.533 transitions are added to a single state, keep in mind that for each insertion, the system test if another transition exists. Without any optimisation, the complexity of the algorithm is O(n) with n the number of existing transition for a state. This will surely never happen in a real state machine but this is the worst case so it is used here for tests :) .
The provided information only applies with the following source code and the library built with gcc using -O2 optimisation.
The system reaction time is highly dependent of the structure of the state machine you are using. Depending on this structure it might be or not be a good idea to activate some options.
StateMachine* createDeepStateMachine(unsigned int verboseFlag) { int i; short int cmpt = -32767; StateMachine* stateMachine; State* tempCreate; State* tempSave; State* save; stateMachine = allocStateMachine(NULL, NULL); setDbgParam(stateMachine, verboseFlag); allocateSpaceForStates(stateMachine, 2); tempSave = allocateState(stateMachine, NULL, NULL); for(i = 0; i < 10000; i++) { allocateSpaceForStates(tempSave, 1); tempCreate = allocateState(tempSave, NULL, NULL); tempSave = tempCreate; } allocateSpaceForStates(tempSave, 1); save = allocateState(tempSave, NULL, NULL); tempSave = allocateState(stateMachine, NULL, NULL); for(i = 0; i < 10000; i++) { allocateSpaceForStates(tempSave, 1); tempCreate = allocateState(tempSave, NULL, NULL); tempSave = tempCreate; } allocateSpaceForStates(tempSave, 1); tempCreate = allocateState(tempSave, NULL, NULL); allocateSpaceForTransitions(save, 65533); allocateSpaceForTransitions(tempCreate, 65533); for(i = 0; i < 65533; i++) { addTransition(save, tempCreate, cmpt, 0); addTransition(tempCreate, save, cmpt, 0); cmpt++; } return stateMachine; }
Calculation are made using double float precision.
Calculation of the memory usage is made using the top command and the following fields:
Event throw (32766) | Event throw (-32767) | Event throw (32765) | Allocation time | |
---|---|---|---|---|
Mean | 291.046000 µs | 201.146000 µs | 524.774000 µs | 10 s 689 ms 268.660000 µs |
Variance | 5.0262040 | 25.132684 | 33.616924 | 508938325.3444 |
Standard deviation | 2.2 µs | 5.0 µs | 5.8 µs | 22 ms 559.6 µs |
VIRT | SHR | CODE | DATA |
---|---|---|---|
14 112 kB | 416 kB | 16 kB | 8472 kB |
We can notice here that without any optimisation, the initialisation time may appear a bit scary. In reality the only operation that is time-consuming in the previous code is the creation of 100.000 transitions on a single state without optimisation. The complexity of the transition insertion algorithm without optimisation is O(n) with n the number of already existing transition associated to the 'from' state. So the loop at line 52 is O(n²) with n = 65533 that is huge.
On the other hand we can notice that the system find and execute the transition with event 1 in about 200 µs. The transition implies 20 002 states, 10 001 are exited and 10 001 are entered. (You can use the debug system to check). So the determination of the path plus the execution of the 20 002 transition is made in 200 µs. In this case, as -32767 is the first transition of the list, the system need only some µs to find the transition so 200 can be considered as the transition execution time only. This is the strength of the library.
As usually states have very few transitions, even without any optimisation, the system can react with an event throw in a very short time. This time cannot be compressed as the algorithm actually only depends on the real number of states involved in a transition. It is a O(n) complex algorithm with n the exact number of states that are involved in the transition (entered and/or exited). O(n) seems to be the best complexity as we have to test (and execute) any entry/exit function of the involved states. So i don't think that a O(sqrt(n)) or any O(ln(n)) algorithm is possible for this function. Even the path determination (which states have to be exited or entered) is included in the one shot loop.
A few words on the memory occupation of the system, yes it is a lot :) but keep in mind that we are on a 64 bits system and that you have 130k transitions and 20k states ...
Event throw (32766) | Event throw (-32767) | Event throw (32765) | Allocation time | |
---|---|---|---|---|
Mean | 272.054000 µs | 199.844000 µs | 484.197000 µs | 10 s 689 ms 268.660000 µs |
Variance | 10.403084 | 16.595664 | 27.688191 | 508938325.3444 |
Standard deviation | 3.2 µs | 4.1 µs | 5.3 µs | 22 ms 559.6 µs |
VIRT | SHR | CODE | DATA |
---|---|---|---|
14 280 kB | 416 kB | 16 kB | 8640 kB |
So this optimisation even by avoiding memory reallocation during runtime don't make the library be faster. On the other hand avoiding dynamic allocation during runtime might be a good idea on some systems (embedded systems).
Event throw (32766) | Event throw (-32767) | Event throw (32765) | Allocation time | |
---|---|---|---|---|
Mean | 275.823000 µs | 159.550000 µs | 400.029000 µs | 10 s 719 ms 046.100000 µs |
Variance | 3.8916710 | 11.069500 | 19.914159 | 2415964816.689999 |
Standard deviation | 2.0 µs | 3.3 µs | 4.4 µs | 49 ms 152.5 µs |
VIRT | SHR | CODE | DATA |
---|---|---|---|
13 336 kB | 416 kB | 16 kB | 7704 kB |
Event throw (32766) | Event throw (-32767) | Event throw (32765) | Allocation time | |
---|---|---|---|---|
Mean | 83.445000 µs | 159.878000 µs | 159.756000 µs | 16 ms 209.160000 µs |
Variance | 0.744975 | 6.641116 | 6.6314464 | 77535.094400 |
Standard deviation | 0.9 µs | 2.6 µs | 2.6 µs | 278.5 µs |
VIRT | SHR | CODE | DATA |
---|---|---|---|
13 336 kB | 416 kB | 16 kB | 7704 kB |
With this level of optimisation, you can notice the great effect it has on allocation time. However, be careful, if you don't follow the documentation advises on how to allocate the state machine, IT WILL BE WORSE. This optimisation implies sorted lists in the state machine, so if you give unsorted information to the state machine during allocation, it will have to sort the data itself.
The transition researches have been as well optimised by this option. We can notice that now, the reaction time of the system with event -32767 and event 32765 is now the same.
This optimisation is only needed if you have many transitions associated to a single state.
Event throw (32766) | Event throw (-32767) | Event throw (32765) | Allocation time | |
---|---|---|---|---|
Mean | 80.085000 µs | 159.142000 µs | 159.428000 µs | 11 ms 387.024400 µs |
Variance | 0.741775 | 7.671836 | 8.174816 | 64682.0244000 |
Standard deviation | 0.9 µs | 2.8 µs | 2.9 µs | 254.3 µs |
VIRT | SHR | CODE | DATA |
---|---|---|---|
13 336 kB | 416 kB | 16 kB | 7704 kB |
This optimisation reduces the time consumption of the system but as said in the documentation, working with this require good knowledge of the library as it can otherwise lead you to a segfault ...
Event throw (32766) | Event throw (-32767) | Event throw (32765) | Allocation time | |
---|---|---|---|---|
Mean | 0.000000 µs | 171.975000 µs | 171.985000 µs | 16 ms 485.690000 µs |
Variance | 0.000000 | 24.756375 | 20.612775 | 75664.293900 |
Standard deviation | 0.0 µs | 5.0 µs | 4.5 µs | 275.1 µs |
VIRT | SHR | CODE | DATA |
---|---|---|---|
14 672 kB | 416 kB | 16 kB | 9040 kB |
This optimisation target to suppress any dependence of a state depth when looking for transitions. It so allow to reduce in this case the time used to find a transition or to find the absence of a transition. The timer accuracy is not good enough to measure the time. But, it implies a consequent extra memory use, and (something I cannot explain) increase the time consumption of a transition execution. The effect on the transition research is great, but is visible here because the state machine is very deep. In common state machine, the effect will surely not be as obvious.
Think twice before using this optimisation.