
## [Eng][help needed] Calculator benchmarks


I recently moved the data about the "Calculator add loop" benchmark to a wiki page, here: http://www.wiki4hp.com/doku.php?id=benchmarks:addloop

Since the original benchmark was not updated after 2011, I searched for new results and found that:
- no one has run it on an Nspire;
- previous results with TI calculators (like the TI-89) are limited, since a for loop was not used. For example, the TI-89 has a score of 9400, while the HP-50g has a score of 31000 (using a for loop).

So, is anyone willing to run this benchmark and report the results?

The format is:
- the calculator used and its firmware/software;
- the count after 60 seconds of execution;
- the program code used.

For further comparisons, there is another (newly designed) benchmark: http://www.wiki4hp.com/doku.php?id=benc ... ddlesquare. Any results for this one will be appreciated as well.

Thanks a lot, and sorry if this is not the "right" section; I don't know this forum, but it appears to be the only "alive" one about TI calculators.

Edit: the community has just asked for a simpler benchmark (the middle-square one seems not so clear). Would you mind also running this one: http://www.wiki4hp.com/doku.php?id=benchmarks:ultranaiveprimes ?

The code is:
```
input: n
--
for k:=3 to n do {
  for j:=2 to k-1 do {
    if ( k mod j == 0 ) then {
      j:= k-1 //so we exit from the inner for
    }
  }
}
```

The result format is: a result is composed of the following list:
- the device used plus the language used, any overclock, any custom firmware and so on;
- the time elapsed for a given n, in seconds (see below);
- the code used.

If the calculator is too slow, or too limited, to compute a given n, then report "for n the computation takes too much time". Conversely, if the calculator is too fast to compute a given n, then report "for n the computation takes too little time, I skipped it".

The options are:
- n := 100
- n := 1000
- for very fast implementations: n := 10000 and n := 100000
Last edited by pier4r on 12 Sep 2013, 15:33, edited 1 time in total.

pier4r

Level 5: MO (Overclocked Member)
Level up: 6.3%

Posts: 15
Joined: 11 Sep 2013, 16:12

### Re: [Eng][help needed] Calculator benchmarks

Hi Pier

We'll see what we can do, on TI-68k/AMS, in BASIC and C with GCC4TI, and on Nspire in BASIC, Lua and Ndless.

Member of the TI-Chess Team.
Co-maintainer of GCC4TI (GCC4TI online documentation), TIEmu and TILP.

Lionel Debroux (Super Modo)

Level 14: CI (Calculator of Infinity)
Level up: 6.5%

Posts: 6488
Joined: 23 Dec 2009, 00:00
Location: France
GitHub: debrouxl

### Re: [Eng][help needed] Calculator benchmarks

Thanks!

Anyway, for further comparisons, all the benchmarks gathered on the wiki are here:
http://www.wiki4hp.com/doku.php?id=benc ... p&do=index

pier4r


### Re: [Eng][help needed] Calculator benchmarks

A little bump: I have added the code for a simpler benchmark (along with the "addloop" one). The Casio users on the Casio forum are on a rampage; the Prizm is really fast.

pier4r


### Re: [Eng][help needed] Calculator benchmarks

For addloop, I think that I'm going to make something along the lines of the HP-50g C benchmark, currently ranked #3 on your page, because that one seems reasonably fair.

I have suggestions for the benchmark as it currently stands. To sum up, it's not your fault, but I find it generally lacking a number of basic rules and information. In more detail, I mean:
1) there's already some information about OS versions, toolchains and clock speeds, but for native code programs, there is no information about the compiler flags, which matter a lot, as you're aware;
2) there is no indication of whether the native code programs are forced to store and re-read the iteration variable from memory (or increment it directly in memory, on processors which can) upon each iteration, instead of keeping it in a register, which will usually make them at least 3x slower. The HP-50g benchmark doesn't store to / re-read from memory (no "volatile" qualifier), but the vast, vast majority of implementations of the benchmark do, because they are written in interpreted languages which store to actual language-level variables. In fact, for native code programs, having both benchmark types would make sense;
3) there is no explicit statement of whether all tricks are allowed to paint one's favorite calculator in a more favorable light. For instance, native code programs could skew results by disabling interrupts, which interpreted programs cannot do. Usage of interrupts, which belongs to this category, is not specified either: for instance, on the TI-68k series, through interrupts, the incrementation loop could do entirely without checking any stop condition, whether a time-based condition (OS software timers, though nobody would do that because it would skew results by more than an order of magnitude, or changing the rate of the programmable timer + using one's own interrupt handler) or pressing ON (which has an interrupt on the TI-68k series), and it would therefore be faster. Here again, for native code programs, it would probably be desirable to have both "standard" versions and "all tricks" versions.

I'm fully aware that centralizing information and discussing on message boards is time-consuming work, and I don't want to sound discouraging, but I felt I'd submit some of my thoughts for improvement.

Also, the fx-9860g benchmark, ranked #1, is suspicious. There's no conceivable reason for the fx-9860g to be faster than the HP-50g. IMO, chances are good that it's made artificially (though probably unintentionally) fast by compiler optimization. Indeed, when optimization is enabled, any well-behaved recent compiler will not only compute the loop which increments the "counter" variable at compile time, but also simply erase it from the generated code, because its result is used nowhere...

Lionel Debroux

### Re: [Eng][help needed] Calculator benchmarks

Lionel Debroux wrote:
> For addloop, I think that I'm going to make something along the lines of the HP-50g C benchmark, currently ranked #3 on your page, because that one seems reasonably fair.

Thanks!

> I have suggestions for the benchmark as it currently stands. To sum up, it's not your fault, but I find it generally lacking a number of basic rules and information.

I agree! Nevertheless, at least for "not so tricky" submissions, they give a general idea of the rough power of a device using a specific language.
For example, take the addloop benchmark and the ultranaiveprimes one. Both are simple, but addloop is extremely simple, while ultranaiveprimes uses some "complex" operations.
Now, the HP-50g with Saturn ASM scores as much as the HP Prime in the addloop test and, surprisingly, they score similarly in the ultranaive one as well (same order of magnitude and so on). So, IMO, a "general" idea can be extracted from these simple tests.

> In more detail, I mean:
> 1) [...] for native code programs, no information about the compiler flags [...];
> 2) [...] whether the native code programs are forced to store and re-read the iteration variable from memory [...] upon each iteration, instead of keeping it in a register [...];
> 3) [...] whether all tricks are allowed [...]. Here again, for native code programs, it would probably be desirable to have both standard versions, and "all tricks" versions.

1. agree
2. agree
3. agree

But users have limited time, so the motto here is "it's better than nothing" (because we assume that, in general, these tests are consistent, as I said above).

> I'm fully aware that centralizing information and discussing on message boards is time-consuming work, and I don't want to sound discouraging, but I felt I'd submit some of my thoughts for improvement.

Don't worry; on the contrary, it is really important to point out this information.

> Also, the fx-9860g benchmark, ranked #1, is suspicious. [...] when optimization is enabled, any well-behaved recent compiler will not only compute at compile time the loop which increments the "counter" variable, but also simply erase it from the generated code because its result is used nowhere...

I know that, but... how on earth would the compiler know the value after 60 seconds? Anyway, yes, it looks suspicious, but there is a simple solution: whatever looks suspicious to the reader isn't counted by the reader.

Now... unleash your Texas Instruments! (I ask this of the whole forum.) I'm still stunned by the performance of the Casio Prizm. It looks so "simple", yet it is a beast with really simple C code (it is way faster than a 600 MHz phone, even though the latter used a scripting language)!

pier4r


### Re: [Eng][help needed] Calculator benchmarks

A small update: one kind user on the Cemetech forum has run the summation test on the TI-89 (TI-BASIC only).

I expected values comparable to the HP-50g with plain UserRPL; instead, it is comparable to the old HP-48GX.

pier4r


### Re: [Eng][help needed] Calculator benchmarks

> I know that, but... how on earth would the compiler know the value after 60 seconds?

It doesn't know the value after 60 seconds, but it knows the value at the end of the loop, which is written in the code. For years, optimizing compilers have been able to recognize a number of loop idioms, especially such simple ones as:
```c
do {
    counter++;
} while (counter < 349700000);
```

Such code is turned into
```c
counter = 349700000;
```

by optimizing compilers; then, Dead Store Elimination will erase this assignment and the counter variable, since it's not used later.
Unless the compiler used for the fx-9860g absolutely stinks, or the benchmark is compiled without optimization, the program should print "end" immediately.

> Anyway, yes, it looks suspicious, but there is a simple solution: whatever looks suspicious to the reader isn't counted by the reader.

If the #1 spot in the benchmark is a fluke (which remains to be confirmed), it would reduce the benchmark's credibility.

> Now... unleash your Texas Instruments!

I wrote I would, so here are a couple of TI-68k/AMS C programs, made yesterday evening and this morning.
NOTE: building them requires GCC4TI; they won't compile with the older, unmaintained and much harder to install TIGCC.

1) File addloop_register_polling.c:
```c
// addloop_register_polling.c: optimize counting to the maximum, through keeping the value in a register
// and writing the main loop in ASM, so as to avoid compiler pessimizations.
#define MIN_AMS 101
#define USE_TI89
#define USE_TI92P
#define USE_V200
#define USE_TI89T
#define NO_CALC_DETECT
#define OPTIMIZE_ROM_CALLS
#define RETURN_VALUE

#include <stdint.h>
#include <system.h>
#include <args.h>
#include <estack.h>
#include <peekpoke.h>
#include <intr.h>

#define TIMER_START_VAL (100000UL)

void _main(void) {
    uint32_t i = 0; // We don't want to
    short orig_rate = PRG_getRate();
    unsigned short orig_start = PRG_getStart();
    unsigned char * ON_key_status = (unsigned char *)0x60001A;
    unsigned long val = 0;

    // Make the system timer an order of magnitude more precise;
    // NOTE: this code assumes a HW2+ TI-68k, i.e. anything since 1999.
    PRG_setRate(1); // Increment counter at a rate of 2^19/2^9 Hz
    PRG_setStart(0xCE); // Trigger the interrupt every 257 - 0xCE = 51 increments ~ 20.07 Hz.
    // The PRG_getStart() above effectively waited for the interrupt to trigger, so we don't need another wait.
    /*OSRegisterTimer(USER_TIMER, 1);
    while (!OSTimerExpired(USER_TIMER));
    OSFreeTimer(USER_TIMER);*/
    OSRegisterTimer(USER_TIMER, TIMER_START_VAL);

    // Main loop :)
    // The assembly snippet is the equivalent of
    /*
    do {
        i++;
    } while (*(volatile unsigned char *)ON_key_status & 2);
    */
    // but it lets no compiler pessimization, such as constant-propagating the ON_key_status variable away (sigh), occur.
    asm volatile("lloop:\n"
                 "    addq.l #1, %0\n"
                 "    btst.b #1, (%1)\n"
                 "    bne.s lloop\n"
                 : "+d"(i) // "+": the asm reads i (starting from 0) as well as writing it.
                 : "a"(ON_key_status));

    // Retrieve timer value.
    val = TIMER_START_VAL - OSTimerCurVal(USER_TIMER);
    OSFreeTimer(USER_TIMER);

    // Give some time for the ON key to come back up.
    OSRegisterTimer(USER_TIMER, 4);
    while (!OSTimerExpired(USER_TIMER));
    OSFreeTimer(USER_TIMER);
    OSClearBreak();

    // Push arguments onto the RPN stack: clean arguments up, then create a list.
    while (GetArgType (top_estack) != END_TAG) {
        top_estack = next_expression_index (top_estack);
    }
    top_estack--;
    push_END_TAG();
    push_longint(i);
    push_longint(val);
    push_LIST_TAG();

    // Restore old system state.
    PRG_setRate(orig_rate);
    PRG_setStart(orig_start);
}
```

2) File addloop_memory_polling.c:
```c
// addloop_memory_polling.c: don't optimize counting that much, through "volatile" which triggers three instructions
// instead of just one for dealing with memory and an address which gets constant-propagated instead of being kept in a register.
#define MIN_AMS 101
#define USE_TI89
#define USE_TI92P
#define USE_V200
#define USE_TI89T
#define NO_CALC_DETECT
#define OPTIMIZE_ROM_CALLS
#define RETURN_VALUE

#include <stdint.h>
#include <system.h>
#include <args.h>
#include <estack.h>
#include <peekpoke.h>
#include <intr.h>

#define TIMER_START_VAL (100000UL)

void _main(void) {
    volatile uint32_t i = 0;
    short orig_rate = PRG_getRate();
    unsigned short orig_start = PRG_getStart();
    volatile unsigned char * ON_key_status = (volatile unsigned char *)0x60001A;
    unsigned long val = 0;

    // Make the system timer an order of magnitude more precise;
    // NOTE: this code assumes a HW2+ TI-68k, i.e. anything since 1999.
    PRG_setRate(1); // Increment counter at a rate of 2^19/2^9 Hz
    PRG_setStart(0xCE); // Trigger the interrupt every 257 - 0xCE = 51 increments ~ 20.07 Hz.
    // The PRG_getStart() above effectively waited for the interrupt to trigger, so we don't need another wait.
    /*OSRegisterTimer(USER_TIMER, 1);
    while (!OSTimerExpired(USER_TIMER));
    OSFreeTimer(USER_TIMER);*/
    OSRegisterTimer(USER_TIMER, TIMER_START_VAL);

    // Main loop :)
    // Let compiler pessimizations inherent to "volatile", such as:
    // * reading and writing i in memory instead of incrementing it directly;
    // * constant-propagating the ON_key_status variable away.
    // occur.
    do {
        i++;
    } while (*ON_key_status & 2);

    // Retrieve timer value.
    val = TIMER_START_VAL - OSTimerCurVal(USER_TIMER);
    OSFreeTimer(USER_TIMER);

    // Give some time for the ON key to come back up.
    OSRegisterTimer(USER_TIMER, 4);
    while (!OSTimerExpired(USER_TIMER));
    OSFreeTimer(USER_TIMER);
    OSClearBreak();

    // Push arguments onto the RPN stack: clean arguments up, then create a list.
    while (GetArgType (top_estack) != END_TAG) {
        top_estack = next_expression_index (top_estack);
    }
    top_estack--;
    push_END_TAG();
    push_longint(i);
    push_longint(val);
    push_LIST_TAG();

    // Restore old system state.
    PRG_setRate(orig_rate);
    PRG_setStart(orig_start);
}
```

3) Build script
- all flags except -O3 reduce size but have no effect on code generation for the main loop:
```
tigcc -v -O3 -Wall -W -mpcrel --optimize-code --cut-ranges --reorder-sections --remove-unused --merge-constants -fmerge-all-constants -Wa,--all-relocs -Wa,-l -fverbose-asm -save-temps -o addloop1 addloop_register_polling.c
tigcc -v -O3 -Wall -W -mpcrel --optimize-code --cut-ranges --reorder-sections --remove-unused --merge-constants -fmerge-all-constants -Wa,--all-relocs -Wa,-l -fverbose-asm -save-temps -o addloop2 addloop_memory_polling.c
```

4) Results
On an 89T HW4 running AMS 3.10 patched with my tiosmod+amspatch. The first element of each list is the number of timer ticks at (2^19/2^9)/51 ~ 20.07 Hz, and the second element is the value of the counter when ON is pressed:
* addloop1 (addloop_register_polling): {1203, 24700949} {1237, 25423732} {1211, 24846885} (very coherent with each other)
* addloop2 (addloop_memory_polling): {1206, 9769092} {1214, 9827570} (again, coherent with each other)

* the main loop is a tiny code snippet buried in the rest of the code, which consists of accuracy-increasing measures and of dealing with the consequences of pressing the ON key;
* the main loop in addloop1 is a 1:1 copy of that of the HP-50g benchmark, and shows that the 89T is between 6x and 7x slower than the HP-50g, which is easily explained, as I posted on Cemetech;
* the main loop in addloop2 is closer to interpreted languages, since at least the variable is read from and written to memory, and it shows a ~2.5x slowdown compared to addloop1.

Lionel Debroux

### Re: [Eng][help needed] Calculator benchmarks

Lionel Debroux wrote:
> > I know that, but... how on earth would the compiler know the value after 60 seconds?
>
> It doesn't know the value after 60 seconds, but it knows the value at the end of the loop, which is written in the code. [...] such simple ones as `do { counter++; } while (counter < 349700000);` [...] Unless the compiler used for the fx-9860g absolutely stinks, or the benchmark is compiled without optimization, the program should print "end" immediately.

That's right! I didn't notice the `while (counter < 349700000)`! I simply skipped it, thinking it was a "while until getkey" kind of loop.
Now I'll report your observations as well.

pier4r


### Re: [Eng][help needed] Calculator benchmarks

Thanks.

Another odd benchmark result is "4. Casio fx-CG 10 PRIZM, OS version 01.04.3200, C PrizmSDK". The speed of the Prizm C benchmark should be close enough to the speed of the HP-50g C benchmark, and significantly faster than the TI-68k C benchmarks. Looking at the code, the slowdown is due to the keyboard checking code. Declaring keyupdate() and keydownlast() (keydownhold() is unused) "static inline" should provide a performance boost.

EDIT: Savage benchmark, for TI-68k/AMS/GCC4TI.

1) File savage.c

```c
// savage.c: Savage benchmark
#define MIN_AMS 101
#define USE_TI89
#define USE_TI92P
#define USE_V200
#define USE_TI89T
#define NO_CALC_DETECT
#define OPTIMIZE_ROM_CALLS
#define RETURN_VALUE

#include <stdint.h>
#include <system.h>
#include <args.h>
#include <estack.h>
#include <intr.h>
#include <timath.h>

#define TIMER_START_VAL (100000UL)

/*
5 RADIANS
10 A=1
20 FOR I=1 TO 2499
30 A=TAN(ATN(EXP(LOG(SQR(A*A)))))+1
40 NEXT I
50 PRINT A
*/

void _main(void) {
    uint16_t i;
    short orig_rate = PRG_getRate();
    unsigned short orig_start = PRG_getStart();
    unsigned long val = 0;
    double a = 1;

    // Make the system timer an order of magnitude more precise;
    // NOTE: this code assumes a HW2+ TI-68k, i.e. anything since 1999.
    PRG_setRate(1); // Increment counter at a rate of 2^19/2^9 Hz
    PRG_setStart(0xCE); // Trigger the interrupt every 257 - 0xCE = 51 increments ~ 20.07 Hz.
    // The PRG_getStart() above effectively waited for the interrupt to trigger, so we don't need another wait.
    /*OSRegisterTimer(USER_TIMER, 1);
    while (!OSTimerExpired(USER_TIMER));
    OSFreeTimer(USER_TIMER);*/
    OSRegisterTimer(USER_TIMER, TIMER_START_VAL);

    // Main loop :)
    for (i = 1; i < 2500; i++) {
        a = tan(atan(exp(log(sqrt(a * a))))) + 1;
    }

    // Retrieve timer value.
    val = TIMER_START_VAL - OSTimerCurVal(USER_TIMER);
    OSFreeTimer(USER_TIMER);

    // Push arguments onto the RPN stack: clean arguments up, then create a list.
    while (GetArgType (top_estack) != END_TAG) {
        top_estack = next_expression_index (top_estack);
    }
    top_estack--;
    push_END_TAG();
    push_Float(a); // Note: rounds to 14 digits.
    push_longint(val);
    push_LIST_TAG();

    // Restore old system state.
    PRG_setRate(orig_rate);
    PRG_setStart(orig_start);
}
```

2) Compiler invocation

```
tigcc -v -O3 -Wall -W -mpcrel --optimize-code --cut-ranges --reorder-sections --remove-unused --merge-constants -fmerge-all-constants -Wa,--all-relocs -Wa,-l -fverbose-asm -save-temps -o savage savage.c
```

3) Results
On an 89T HW4 running AMS 3.10 patched with tiosmod+amspatch: {1952, 2500.0000025271}, {1951, 2500.0000025271}, i.e. ~1'37".
Examining the full 16-digit precision of the BCD floats in the debugger shows 2500.000002527092.

Lionel Debroux
