Numerical information

The library has been benchmarked with the following combinations of compile-time options:
Without any optimisation.
With USE_PRE_ALLOCATED_BUF.
With USE_PRE_ALLOCATED_BUF and OPT_SPACE_AND_SPEED.
With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED and OPTIMIZE_RUN.
With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED, OPTIMIZE_RUN and DEACTIVATE_CHECKING.
With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED, OPTIMIZE_RUN, DEACTIVATE_CHECKING and MAX_RUNTIME_SPEED.

Tests have been made on a computer with a single CPU @ 2600 MHz and without hyperthreading. They were made for a state machine composed of 20,002 states split into two chains of 10,001 nested states. The two deepest states of the state machine were interlinked with 65,533 different transitions in each direction (an allocation of 2 * 65,533 transitions).

Warning:
Tests have been made on a 64-bit computer: sizeof(int) = 4, sizeof(void*) = 8.
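
A quick way to check the same sizes on your own machine (a minimal standalone program, not part of the library):

#include <stdio.h>

int main(void)
{
    printf("sizeof(int)   = %zu\n", sizeof(int));    /* 4 on the test machine */
    printf("sizeof(void*) = %zu\n", sizeof(void*));  /* 8 on the test machine */
    return 0;
}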

As 65,533 transitions are added to a single state, keep in mind that for each insertion the system checks whether the transition already exists. Without any optimisation, the complexity of this check is O(n), with n the number of transitions already attached to the state. Such a configuration will surely never happen in a real state machine, but it is the worst case, so it is used here for the tests :) .
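
To picture the cost, here is an illustrative sketch of such a linear duplicate check. It is not the library's actual code; the TransitionNode type and the function are invented for the example:

/* Naive duplicate check: scans every transition already attached to the
 * state, so each insertion costs O(n).  Sketch only, not the library code. */
typedef struct TransitionNode {
    int event;                        /* event value that triggers the transition */
    struct TransitionNode* next;      /* next transition of the same state        */
} TransitionNode;

int transitionExists(const TransitionNode* list, int event)
{
    const TransitionNode* t;
    for (t = list; t != NULL; t = t->next)   /* O(n) scan */
        if (t->event == event)
            return 1;
    return 0;
}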

The provided information only applies to the following source code, with the library built with gcc using -O2 optimisation.

The reaction time of the system is highly dependent on the structure of the state machine you are using. Depending on this structure, it may or may not be a good idea to activate some options.

See the following code:
StateMachine* createDeepStateMachine(unsigned int verboseFlag)
{
    int i;
    short int cmpt = -32767;    /* first of the 65,533 consecutive event values */
    StateMachine* stateMachine;
    State* tempCreate;
    State* tempSave;
    State* save;

    stateMachine = allocStateMachine(NULL, NULL);
    setDbgParam(stateMachine, verboseFlag);
    allocateSpaceForStates(stateMachine, 2);
    tempSave = allocateState(stateMachine, NULL, NULL);

    /* First chain of deeply nested states. */
    for(i = 0; i < 10000; i++)
    {
        allocateSpaceForStates(tempSave, 1);
        tempCreate = allocateState(tempSave, NULL, NULL);
        tempSave = tempCreate;
    }
    allocateSpaceForStates(tempSave, 1);
    save = allocateState(tempSave, NULL, NULL);        /* deepest state of the first chain */

    /* Second chain of deeply nested states. */
    tempSave = allocateState(stateMachine, NULL, NULL);
    for(i = 0; i < 10000; i++)
    {
        allocateSpaceForStates(tempSave, 1);
        tempCreate = allocateState(tempSave, NULL, NULL);
        tempSave = tempCreate;
    }
    allocateSpaceForStates(tempSave, 1);
    tempCreate = allocateState(tempSave, NULL, NULL);  /* deepest state of the second chain */

    allocateSpaceForTransitions(save, 65533);
    allocateSpaceForTransitions(tempCreate, 65533);

    /* Interlink the two deepest states with 65,533 transitions in each direction. */
    for(i = 0; i < 65533; i++)
    {
        addTransition(save, tempCreate, cmpt, 0);
        addTransition(tempCreate, save, cmpt, 0);
        cmpt++;
    }
    return stateMachine;
}
We have tested the library using different optimisation options. The test consisted of a series of events sent to the state machine. We have measured the reaction time both for existing transitions and for a non-existing transition: event -32767 triggers the first transition in the list, event 32765 the last one, and event 32766 has no associated transition.
The timer accuracy was 1 µs, using the gettimeofday() system call under a Gentoo Linux operating system without any X server running.
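
As a rough illustration, one measurement could look like the sketch below. It assumes the library header providing the StateMachine type is included, and throwEvent() is only a placeholder name for the library call that actually sends an event (check the API for the real name); the state machine comes from createDeepStateMachine() above:

#include <stdio.h>
#include <sys/time.h>

/* Placeholder timing harness: measures one event dispatch with a 1 µs
 * resolution timer, as in the tests.  throwEvent() is an assumed name. */
void timeOneEvent(StateMachine* stateMachine, short int event)
{
    struct timeval before, after;
    long elapsedUs;

    gettimeofday(&before, NULL);
    throwEvent(stateMachine, event);                 /* hypothetical dispatch call */
    gettimeofday(&after, NULL);

    elapsedUs = (after.tv_sec  - before.tv_sec) * 1000000L
              + (after.tv_usec - before.tv_usec);
    printf("reaction time for event %d: %ld µs\n", (int)event, elapsedUs);
}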

Calculations are made using double-precision floating point.
Memory usage is measured with the top command, using the VIRT, SHR, CODE and DATA fields reported in the tables below.

Keep in mind that the tests are made on a 64-bit system (see the top of this page for more information). For each configuration, five different things have been measured.
Note:
Event reaction statistics have been made with 1000 samples and allocation time statistics have been made with 100 samples.
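
For reference, the mean, variance and standard deviation in the tables below can be computed over such samples as in this sketch (double precision; population variance is an assumption, this is not the exact test harness):

#include <math.h>

void computeStats(const double* samplesUs, int count,
                  double* mean, double* variance, double* stdDev)
{
    int i;
    double sum = 0.0, sumSq = 0.0;

    for (i = 0; i < count; i++)
        sum += samplesUs[i];
    *mean = sum / count;

    for (i = 0; i < count; i++)
        sumSq += (samplesUs[i] - *mean) * (samplesUs[i] - *mean);
    *variance = sumSq / count;          /* population variance assumed */
    *stdDev = sqrt(*variance);
}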

Without any optimisation.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                291.046000 µs         201.146000 µs          524.774000 µs         10 s 689 ms 268.660000 µs
Variance            5.0262040             25.132684              33.616924             508938325.3444
Standard deviation  2.2 µs                5.0 µs                 5.8 µs                22 ms 559.6 µs

Memory occupation
VIRT       SHR     CODE   DATA
14 112 kB  416 kB  16 kB  8472 kB

We can notice here that without any optimisation the initialisation time may appear a bit scary. In reality, the only time-consuming operation in the previous code is the creation of 2 * 65,533 transitions (65,533 on each of the two deepest states) without any optimisation. The complexity of the transition insertion algorithm without optimisation is O(n), with n the number of transitions already associated with the 'from' state. So the transition-insertion loop at the end of createDeepStateMachine() is O(n²) with n = 65,533, which amounts to roughly 65,533² / 2 ≈ 2.1 billion comparisons per state: that is huge.

On the other hand, we can notice that the system finds and executes the transition triggered by event -32767 in about 200 µs. The transition involves 20,002 states: 10,001 are exited and 10,001 are entered (you can use the debug system to check). So the determination of the path plus the execution of the transition across the 20,002 states is done in 200 µs. In this case, as -32767 is the first transition in the list, the system needs only a few µs to find the transition, so the 200 µs can be considered as the transition execution time alone. This is the strength of the library.

As states usually have very few transitions, even without any optimisation the system can react to an event in a very short time. This time cannot be compressed further, as the algorithm only depends on the actual number of states involved in a transition. It is an O(n) algorithm, with n the exact number of states involved in the transition (entered and/or exited). O(n) seems to be the best possible complexity, since we have to test (and execute) the entry/exit function of every involved state, so I don't think an O(sqrt(n)) or O(ln(n)) algorithm is possible for this function. Even the path determination (which states have to be exited or entered) is included in the same single pass.
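
To make the O(n) argument concrete, here is a minimal sketch of a transition executed in one pass over the involved states. It only illustrates the principle, it is not the library's internal algorithm; the HState type and its callbacks are invented for the example:

typedef struct HState {
    struct HState* parent;               /* enclosing state, NULL at the top */
    int depth;                           /* 0 at the top, grows with nesting */
    void (*onExit)(struct HState*);
    void (*onEntry)(struct HState*);
} HState;

/* Enter every state below 'ancestor' on the path down to 's', outermost first. */
static void enterDownTo(HState* ancestor, HState* s)
{
    if (s == ancestor)
        return;
    enterDownTo(ancestor, s->parent);
    if (s->onEntry) s->onEntry(s);
}

/* Exit states from 'from' up to the common ancestor, then enter down to 'to'.
 * The work done is proportional to the number of states exited or entered. */
void executeTransition(HState* from, HState* to)
{
    HState* a = from;
    HState* b = to;

    while (a->depth > b->depth) { if (a->onExit) a->onExit(a); a = a->parent; }
    while (b->depth > a->depth) { b = b->parent; }
    while (a != b) {
        if (a->onExit) a->onExit(a);
        a = a->parent;
        b = b->parent;
    }
    enterDownTo(a, to);
}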

A few words on the memory occupation of the system: yes, it is a lot :) but keep in mind that we are on a 64-bit system and that there are about 130k transitions and 20k states...

With USE_PRE_ALLOCATED_BUF.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                272.054000 µs         199.844000 µs          484.197000 µs         10 s 689 ms 268.660000 µs
Variance            10.403084             16.595664              27.688191             508938325.3444
Standard deviation  3.2 µs                4.1 µs                 5.3 µs                22 ms 559.6 µs

Memory occupation
VIRT       SHR     CODE   DATA
14 280 kB  416 kB  16 kB  8640 kB

So this optimisation, even though it avoids memory reallocation during runtime, does not make the library any faster. On the other hand, avoiding dynamic allocation at runtime might be a good idea on some systems (embedded systems, for instance).
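
The general idea behind a pre-allocated buffer is sketched below: objects are carved out of a fixed array instead of being malloc()'d at runtime. The pool size and the Slot type are invented for the example; this is not the library's internal layout:

#include <stddef.h>

typedef struct { int event; void* target; } Slot;   /* stand-in object type */

#define POOL_SIZE 131066             /* 2 * 65,533 transitions in this test */

static Slot pool[POOL_SIZE];
static size_t poolUsed = 0;

/* Hand out the next free slot; no malloc() happens at runtime. */
Slot* poolAlloc(void)
{
    if (poolUsed >= POOL_SIZE)
        return NULL;                 /* pool exhausted */
    return &pool[poolUsed++];
}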

With USE_PRE_ALLOCATED_BUF and OPT_SPACE_AND_SPEED.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                275.823000 µs         159.550000 µs          400.029000 µs         10 s 719 ms 046.100000 µs
Variance            3.8916710             11.069500              19.914159             2415964816.689999
Standard deviation  2.0 µs                3.3 µs                 4.4 µs                49 ms 152.5 µs

Memory occupation
VIRT       SHR     CODE   DATA
13 336 kB  416 kB  16 kB  7704 kB

This optimisation deactivates all debug capabilities and so saves some space in the code, but also in every state (states no longer have a name with this optimisation). We can notice that it also lets the system save some execution time: it no longer performs tests to determine whether it should print debug output or not. It is a good idea to use this optimisation once your system has been tested, as it implies no code modification. You can still give names to the states while allocating them; they will simply be ignored.
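
The kind of runtime test that gets removed can be pictured as below. The macro, the DbgCtx type and the field name are invented for this illustration; only the principle (a branch compiled away when the option is set) reflects the description above:

#include <stdio.h>

typedef struct { unsigned int verboseFlag; } DbgCtx;   /* stand-in structure */

#ifndef OPT_SPACE_AND_SPEED
  /* Debug build: a branch is evaluated on every call. */
  #define DEBUG_PRINT(ctx, msg) \
      do { if ((ctx)->verboseFlag) printf("%s\n", (msg)); } while (0)
#else
  /* Optimised build: the whole statement is compiled away. */
  #define DEBUG_PRINT(ctx, msg) ((void)0)
#endif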

With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED and OPTIMIZE_RUN.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                83.445000 µs          159.878000 µs          159.756000 µs         16 ms 209.160000 µs
Variance            0.744975              6.641116               6.6314464             77535.094400
Standard deviation  0.9 µs                2.6 µs                 2.6 µs                278.5 µs

Memory occupation
VIRT       SHR     CODE   DATA
13 336 kB  416 kB  16 kB  7704 kB

With this level of optimisation, you can notice its great effect on allocation time. However, be careful: if you don't follow the documentation's advice on how to allocate the state machine, IT WILL BE WORSE. This optimisation implies sorted lists in the state machine, so if you give unsorted information to the state machine during allocation, it will have to sort the data itself.

Transition lookups have also been optimised by this option. We can notice that the reaction time of the system for event -32767 and for event 32765 is now the same.
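
One plausible reason is that a lookup over a sorted transition list no longer depends on the position of the event in the list, for instance with a binary search as sketched below. This is only an illustration of the idea; the array layout is invented and is not the library's internal representation:

typedef struct { short int event; int targetIndex; } SortedTransition;

/* Binary search over transitions sorted by event: O(log n) instead of the
 * O(n) scan shown earlier.  Returns the index, or -1 if no transition exists. */
int findTransition(const SortedTransition* list, int count, short int event)
{
    int lo = 0, hi = count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (list[mid].event == event) return mid;
        if (list[mid].event < event)  lo = mid + 1;
        else                          hi = mid - 1;
    }
    return -1;
}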

This optimisation is only needed if you have many transitions associated with a single state.

With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED, OPTIMIZE_RUN and DEACTIVATE_CHECKING.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                80.085000 µs          159.142000 µs          159.428000 µs         11 ms 387.024400 µs
Variance            0.741775              7.671836               8.174816              64682.0244000
Standard deviation  0.9 µs                2.8 µs                 2.9 µs                254.3 µs

Memory occupation
VIRT       SHR     CODE   DATA
13 336 kB  416 kB  16 kB  7704 kB

This optimisation reduces the time consumption of the system, but as said in the documentation, working with it requires a good knowledge of the library, as it can otherwise lead to a segfault...
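
The principle is the usual trade-off of removing sanity checks, as in the sketch below. The function and its parameter are invented; only the idea of a check compiled out by DEACTIVATE_CHECKING is taken from the documentation:

#include <stddef.h>

int doSomethingWithState(void* state)
{
#ifndef DEACTIVATE_CHECKING
    if (state == NULL)
        return -1;          /* caught here: an error code instead of a crash */
#endif
    /* ... 'state' is used here: with checking disabled, a NULL argument
     *     goes straight to a dereference and a segfault. */
    return 0;
}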

With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED, OPTIMIZE_RUN, DEACTIVATE_CHECKING and MAX_RUNTIME_SPEED.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                0.000000 µs           171.975000 µs          171.985000 µs         16 ms 485.690000 µs
Variance            0.000000              24.756375              20.612775             75664.293900
Standard deviation  0.0 µs                5.0 µs                 4.5 µs                275.1 µs

Memory occupation
VIRT       SHR     CODE   DATA
14 672 kB  416 kB  16 kB  9040 kB

This optimisation aims to remove any dependence on the state depth when looking for transitions. It therefore reduces, in this case, the time used to find a transition or to detect the absence of one: the timer accuracy is not even good enough to measure it (hence the 0 µs for event 32766). But it implies a significant extra memory use and (something I cannot explain) it increases the time consumption of a transition execution. The effect on the transition lookup is great, and it is visible here because the state machine is very deep. In a common state machine, the effect will surely not be as obvious.
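
As a guess at the kind of time/memory trade-off involved, a lookup table indexed directly by the event value gives a constant-time search at the cost of a large table, as sketched below. The structure and names are invented; this is not the library's actual layout:

#define EVENT_RANGE 65536                   /* every possible short int event */
#define EVENT_INDEX(ev) ((int)(ev) + 32768)

typedef struct {
    void* target[EVENT_RANGE];              /* NULL where no transition exists */
} EventTable;

/* Constant-time lookup: the cost no longer depends on how many transitions
 * exist or on where the event sits in a list. */
void* findTarget(const EventTable* table, short int event)
{
    return table->target[EVENT_INDEX(event)];
}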

Think twice before using this optimisation.

