Numerical information

The library has been benchmarked with the following combinations of compile-time options:
Without any optimisation.
With USE_PRE_ALLOCATED_BUF.
With USE_PRE_ALLOCATED_BUF and OPT_SPACE_AND_SPEED.
With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED and OPTIMIZE_RUN.
With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED, OPTIMIZE_RUN and DEACTIVATE_CHECKING.
With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED, OPTIMIZE_RUN, DEACTIVATE_CHECKING and MAX_RUNTIME_SPEED.

Tests have been made on a computer with a single CPU @ 2600 MHz and without hyperthreading. They were made for a state machine composed of 20,002 states split into two chains of 10,001 nested states. The two deepest states of the state machine were interlinked with 65,533 different transitions in each direction (an allocation of 2 * 65,533 transitions).

Warning:
Tests have been made on a 64-bit computer: sizeof(int) = 4, sizeof(void*) = 8.
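
A quick way to check the same sizes on your own machine (a minimal standalone program, not part of the library):

#include <stdio.h>

int main(void)
{
    printf("sizeof(int)   = %zu\n", sizeof(int));    /* 4 on the test machine */
    printf("sizeof(void*) = %zu\n", sizeof(void*));  /* 8 on the test machine */
    return 0;
}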

As 65,533 transitions are added to a single state, keep in mind that for each insertion the system checks whether the transition already exists. Without any optimisation, the complexity of this check is O(n), with n the number of transitions already attached to the state. Such a configuration will surely never happen in a real state machine, but it is the worst case, so it is used here for the tests :) .
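
To picture the cost, here is an illustrative sketch of such a linear duplicate check. It is not the library's actual code; the TransitionNode type and the function are invented for the example:

/* Naive duplicate check: scans every transition already attached to the
 * state, so each insertion costs O(n).  Sketch only, not the library code. */
typedef struct TransitionNode {
    int event;                        /* event value that triggers the transition */
    struct TransitionNode* next;      /* next transition of the same state        */
} TransitionNode;

int transitionExists(const TransitionNode* list, int event)
{
    const TransitionNode* t;
    for (t = list; t != NULL; t = t->next)   /* O(n) scan */
        if (t->event == event)
            return 1;
    return 0;
}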

The provided information only applies to the following source code, with the library built with gcc using -O2 optimisation.

The reaction time of the system is highly dependent on the structure of the state machine you are using. Depending on this structure, it may or may not be a good idea to activate some options.

See the following code:
StateMachine* createDeepStateMachine(unsigned int verboseFlag)
{
    int i;
    short int cmpt = -32767;    /* first of the 65,533 consecutive event values */
    StateMachine* stateMachine;
    State* tempCreate;
    State* tempSave;
    State* save;

    stateMachine = allocStateMachine(NULL, NULL);
    setDbgParam(stateMachine, verboseFlag);
    allocateSpaceForStates(stateMachine, 2);
    tempSave = allocateState(stateMachine, NULL, NULL);

    /* First chain of deeply nested states. */
    for(i = 0; i < 10000; i++)
    {
        allocateSpaceForStates(tempSave, 1);
        tempCreate = allocateState(tempSave, NULL, NULL);
        tempSave = tempCreate;
    }
    allocateSpaceForStates(tempSave, 1);
    save = allocateState(tempSave, NULL, NULL);        /* deepest state of the first chain */

    /* Second chain of deeply nested states. */
    tempSave = allocateState(stateMachine, NULL, NULL);
    for(i = 0; i < 10000; i++)
    {
        allocateSpaceForStates(tempSave, 1);
        tempCreate = allocateState(tempSave, NULL, NULL);
        tempSave = tempCreate;
    }
    allocateSpaceForStates(tempSave, 1);
    tempCreate = allocateState(tempSave, NULL, NULL);  /* deepest state of the second chain */

    allocateSpaceForTransitions(save, 65533);
    allocateSpaceForTransitions(tempCreate, 65533);

    /* Interlink the two deepest states with 65,533 transitions in each direction. */
    for(i = 0; i < 65533; i++)
    {
        addTransition(save, tempCreate, cmpt, 0);
        addTransition(tempCreate, save, cmpt, 0);
        cmpt++;
    }
    return stateMachine;
}
We have tested the library using different optimisation options. The test consisted of a series of events sent to the state machine. We have measured the reaction time both for existing transitions and for a non-existing transition: event -32767 triggers the first transition in the list, event 32765 the last one, and event 32766 has no associated transition.
The timer accuracy was 1 µs, using the gettimeofday() system call under a Gentoo Linux operating system without any X server running.
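
As a rough illustration, one measurement could look like the sketch below. It assumes the library header providing the StateMachine type is included, and throwEvent() is only a placeholder name for the library call that actually sends an event (check the API for the real name); the state machine comes from createDeepStateMachine() above:

#include <stdio.h>
#include <sys/time.h>

/* Placeholder timing harness: measures one event dispatch with a 1 µs
 * resolution timer, as in the tests.  throwEvent() is an assumed name. */
void timeOneEvent(StateMachine* stateMachine, short int event)
{
    struct timeval before, after;
    long elapsedUs;

    gettimeofday(&before, NULL);
    throwEvent(stateMachine, event);                 /* hypothetical dispatch call */
    gettimeofday(&after, NULL);

    elapsedUs = (after.tv_sec  - before.tv_sec) * 1000000L
              + (after.tv_usec - before.tv_usec);
    printf("reaction time for event %d: %ld µs\n", (int)event, elapsedUs);
}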

Calculations are made using double-precision floating point.
Memory usage is measured with the top command, using the VIRT, SHR, CODE and DATA fields reported in the tables below.

Keep in mind that the tests are made on a 64-bit system (see the top of this page for more information). For each configuration, five different things have been measured.
Note:
Event reaction statistics have been made with 1000 samples and allocation time statistics have been made with 100 samples.
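
For reference, the mean, variance and standard deviation in the tables below can be computed over such samples as in this sketch (double precision; population variance is an assumption, this is not the exact test harness):

#include <math.h>

void computeStats(const double* samplesUs, int count,
                  double* mean, double* variance, double* stdDev)
{
    int i;
    double sum = 0.0, sumSq = 0.0;

    for (i = 0; i < count; i++)
        sum += samplesUs[i];
    *mean = sum / count;

    for (i = 0; i < count; i++)
        sumSq += (samplesUs[i] - *mean) * (samplesUs[i] - *mean);
    *variance = sumSq / count;          /* population variance assumed */
    *stdDev = sqrt(*variance);
}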

Without any optimisation.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                291.046000 µs         201.146000 µs          524.774000 µs         10 s 689 ms 268.660000 µs
Variance            5.0262040             25.132684              33.616924             508938325.3444
Standard deviation  2.2 µs                5.0 µs                 5.8 µs                22 ms 559.6 µs

Memory occupation
VIRT       SHR     CODE   DATA
14 112 kB  416 kB  16 kB  8472 kB

We can notice here that without any optimisation the initialisation time may appear a bit scary. In reality, the only time-consuming operation in the previous code is the creation of 2 * 65,533 transitions (65,533 on each of the two deepest states) without any optimisation. The complexity of the transition insertion algorithm without optimisation is O(n), with n the number of transitions already associated with the 'from' state. So the transition-insertion loop at the end of createDeepStateMachine() is O(n²) with n = 65,533, which amounts to roughly 65,533² / 2 ≈ 2.1 billion comparisons per state: that is huge.

On the other hand, we can notice that the system finds and executes the transition triggered by event -32767 in about 200 µs. The transition involves 20,002 states: 10,001 are exited and 10,001 are entered (you can use the debug system to check). So the determination of the path plus the execution of the transition across the 20,002 states is done in 200 µs. In this case, as -32767 is the first transition in the list, the system needs only a few µs to find the transition, so the 200 µs can be considered as the transition execution time alone. This is the strength of the library.

As states usually have very few transitions, even without any optimisation the system can react to an event in a very short time. This time cannot be compressed further, as the algorithm only depends on the actual number of states involved in a transition. It is an O(n) algorithm, with n the exact number of states involved in the transition (entered and/or exited). O(n) seems to be the best possible complexity, since we have to test (and execute) the entry/exit function of every involved state, so I don't think an O(sqrt(n)) or O(ln(n)) algorithm is possible for this function. Even the path determination (which states have to be exited or entered) is included in the same single pass.
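
To make the O(n) argument concrete, here is a minimal sketch of a transition executed in one pass over the involved states. It only illustrates the principle, it is not the library's internal algorithm; the HState type and its callbacks are invented for the example:

typedef struct HState {
    struct HState* parent;               /* enclosing state, NULL at the top */
    int depth;                           /* 0 at the top, grows with nesting */
    void (*onExit)(struct HState*);
    void (*onEntry)(struct HState*);
} HState;

/* Enter every state below 'ancestor' on the path down to 's', outermost first. */
static void enterDownTo(HState* ancestor, HState* s)
{
    if (s == ancestor)
        return;
    enterDownTo(ancestor, s->parent);
    if (s->onEntry) s->onEntry(s);
}

/* Exit states from 'from' up to the common ancestor, then enter down to 'to'.
 * The work done is proportional to the number of states exited or entered. */
void executeTransition(HState* from, HState* to)
{
    HState* a = from;
    HState* b = to;

    while (a->depth > b->depth) { if (a->onExit) a->onExit(a); a = a->parent; }
    while (b->depth > a->depth) { b = b->parent; }
    while (a != b) {
        if (a->onExit) a->onExit(a);
        a = a->parent;
        b = b->parent;
    }
    enterDownTo(a, to);
}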

A few words on the memory occupation of the system: yes, it is a lot :) but keep in mind that we are on a 64-bit system and that there are about 130k transitions and 20k states...

With USE_PRE_ALLOCATED_BUF.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                272.054000 µs         199.844000 µs          484.197000 µs         10 s 689 ms 268.660000 µs
Variance            10.403084             16.595664              27.688191             508938325.3444
Standard deviation  3.2 µs                4.1 µs                 5.3 µs                22 ms 559.6 µs

Memory occupation
VIRT       SHR     CODE   DATA
14 280 kB  416 kB  16 kB  8640 kB

So this optimisation, even though it avoids memory reallocation during runtime, does not make the library any faster. On the other hand, avoiding dynamic allocation at runtime might be a good idea on some systems (embedded systems, for instance).
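
The general idea behind a pre-allocated buffer is sketched below: objects are carved out of a fixed array instead of being malloc()'d at runtime. The pool size and the Slot type are invented for the example; this is not the library's internal layout:

#include <stddef.h>

typedef struct { int event; void* target; } Slot;   /* stand-in object type */

#define POOL_SIZE 131066             /* 2 * 65,533 transitions in this test */

static Slot pool[POOL_SIZE];
static size_t poolUsed = 0;

/* Hand out the next free slot; no malloc() happens at runtime. */
Slot* poolAlloc(void)
{
    if (poolUsed >= POOL_SIZE)
        return NULL;                 /* pool exhausted */
    return &pool[poolUsed++];
}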

With USE_PRE_ALLOCATED_BUF and OPT_SPACE_AND_SPEED.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                275.823000 µs         159.550000 µs          400.029000 µs         10 s 719 ms 046.100000 µs
Variance            3.8916710             11.069500              19.914159             2415964816.689999
Standard deviation  2.0 µs                3.3 µs                 4.4 µs                49 ms 152.5 µs

Memory occupation
VIRT       SHR     CODE   DATA
13 336 kB  416 kB  16 kB  7704 kB

This optimisation deactivates all debug capabilities and so saves some space in the code, but also in every state (states no longer have a name with this optimisation). We can notice that it also lets the system save some execution time: it no longer performs tests to determine whether it should print debug output or not. It is a good idea to use this optimisation once your system has been tested, as it implies no code modification. You can still give names to the states while allocating them; they will simply be ignored.
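
The kind of runtime test that gets removed can be pictured as below. The macro, the DbgCtx type and the field name are invented for this illustration; only the principle (a branch compiled away when the option is set) reflects the description above:

#include <stdio.h>

typedef struct { unsigned int verboseFlag; } DbgCtx;   /* stand-in structure */

#ifndef OPT_SPACE_AND_SPEED
  /* Debug build: a branch is evaluated on every call. */
  #define DEBUG_PRINT(ctx, msg) \
      do { if ((ctx)->verboseFlag) printf("%s\n", (msg)); } while (0)
#else
  /* Optimised build: the whole statement is compiled away. */
  #define DEBUG_PRINT(ctx, msg) ((void)0)
#endif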

With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED and OPTIMIZE_RUN.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                83.445000 µs          159.878000 µs          159.756000 µs         16 ms 209.160000 µs
Variance            0.744975              6.641116               6.6314464             77535.094400
Standard deviation  0.9 µs                2.6 µs                 2.6 µs                278.5 µs

Memory occupation
VIRT       SHR     CODE   DATA
13 336 kB  416 kB  16 kB  7704 kB

With this level of optimisation, you can notice its great effect on allocation time. However, be careful: if you don't follow the documentation's advice on how to allocate the state machine, IT WILL BE WORSE. This optimisation implies sorted lists in the state machine, so if you give unsorted information to the state machine during allocation, it will have to sort the data itself.

Transition lookups have also been optimised by this option. We can notice that the reaction time of the system for event -32767 and for event 32765 is now the same.
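
One plausible reason is that a lookup over a sorted transition list no longer depends on the position of the event in the list, for instance with a binary search as sketched below. This is only an illustration of the idea; the array layout is invented and is not the library's internal representation:

typedef struct { short int event; int targetIndex; } SortedTransition;

/* Binary search over transitions sorted by event: O(log n) instead of the
 * O(n) scan shown earlier.  Returns the index, or -1 if no transition exists. */
int findTransition(const SortedTransition* list, int count, short int event)
{
    int lo = 0, hi = count - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (list[mid].event == event) return mid;
        if (list[mid].event < event)  lo = mid + 1;
        else                          hi = mid - 1;
    }
    return -1;
}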

This optimisation is only needed if you have many transitions associated with a single state.

With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED, OPTIMIZE_RUN and DEACTIVATE_CHECKING.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                80.085000 µs          159.142000 µs          159.428000 µs         11 ms 387.024400 µs
Variance            0.741775              7.671836               8.174816              64682.0244000
Standard deviation  0.9 µs                2.8 µs                 2.9 µs                254.3 µs

Memory occupation
VIRT       SHR     CODE   DATA
13 336 kB  416 kB  16 kB  7704 kB

This optimisation reduces the time consumption of the system, but as said in the documentation, working with it requires a good knowledge of the library, as it can otherwise lead to a segfault...
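
The principle is the usual trade-off of removing sanity checks, as in the sketch below. The function and its parameter are invented; only the idea of a check compiled out by DEACTIVATE_CHECKING is taken from the documentation:

#include <stddef.h>

int doSomethingWithState(void* state)
{
#ifndef DEACTIVATE_CHECKING
    if (state == NULL)
        return -1;          /* caught here: an error code instead of a crash */
#endif
    /* ... 'state' is used here: with checking disabled, a NULL argument
     *     goes straight to a dereference and a segfault. */
    return 0;
}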

With USE_PRE_ALLOCATED_BUF, OPT_SPACE_AND_SPEED, OPTIMIZE_RUN, DEACTIVATE_CHECKING and MAX_RUNTIME_SPEED.

Test results
                    Event throw (32766)   Event throw (-32767)   Event throw (32765)   Allocation time
Mean                0.000000 µs           171.975000 µs          171.985000 µs         16 ms 485.690000 µs
Variance            0.000000              24.756375              20.612775             75664.293900
Standard deviation  0.0 µs                5.0 µs                 4.5 µs                275.1 µs

Memory occupation
VIRT       SHR     CODE   DATA
14 672 kB  416 kB  16 kB  9040 kB

This optimisation aims to remove any dependence on the state depth when looking for transitions. It therefore reduces, in this case, the time used to find a transition or to detect the absence of one: the timer accuracy is not even good enough to measure it (hence the 0 µs for event 32766). But it implies a significant extra memory use and (something I cannot explain) it increases the time consumption of a transition execution. The effect on the transition lookup is great, and it is visible here because the state machine is very deep. In a common state machine, the effect will surely not be as obvious.
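
As a guess at the kind of time/memory trade-off involved, a lookup table indexed directly by the event value gives a constant-time search at the cost of a large table, as sketched below. The structure and names are invented; this is not the library's actual layout:

#define EVENT_RANGE 65536                   /* every possible short int event */
#define EVENT_INDEX(ev) ((int)(ev) + 32768)

typedef struct {
    void* target[EVENT_RANGE];              /* NULL where no transition exists */
} EventTable;

/* Constant-time lookup: the cost no longer depends on how many transitions
 * exist or on where the event sits in a list. */
void* findTarget(const EventTable* table, short int event)
{
    return table->target[EVENT_INDEX(event)];
}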

Think twice before using this optimisation.

