Discussion:
Speeding up code - am I missing something?
(too old to reply)
Robert Prins
2018-08-15 21:37:41 UTC
Permalink
Hi all,

Not really specific x86, Virtual Pascal, or PL/I, but as I've got the same code
written in all three languages, I'll give it a try here:

I have an array (actually three, but that's not very relevant for the issue) of
pointers to items in a linked list. The array is sorted into a specific order,
and after the sort, I use it to print Top-N tables. For the main Top-50 table (x
3) that's easy enough, the first 50 items are selected.

However, at this time the linked list also contains 5 sub-lists, each containing
additional sub-lists for a total of 390 (yes, three-hunderd-and-ninety)
sub-lists for which I want to print Top-10 tables (or if a sub-list contains
just 4 items, a Top-4 table). Because I don't really want to sort the main array
six (x 3) times, I just scan the array from the top, and pick out the items for
each sub-list, until I have 10 (or the 4 mentioned before) items. To speed up
the process a little, I make an initial pass over the array to find the first
item for each sub-list, but that doesn't really improve things a great deal.

So given that "this" is the list, sorted on its main key, with the second and
third columns identifying additional sub-lists,

100, 'a', '1'
099, 'b', '9'
098, 'c', '1'
097, 'b', '8'
096, 'd', '3'
095, 'a', '2'
094, 'c', '8'
093, 'c', '1'
092, 'a', '3'
091, 'f', '4'
090, 'a', '2'

the "Top-50" would contain the above.

For sub-list 'a', I would scan the array, and the resulting "Top-10" would be

100, 'a', '1'
095, 'a', '2'
092, 'a', '3'
090, 'a', '2'

Likewise, for sub-list 'b', I would scan the array, and the resulting "Top-10"
would be, and the initial fill-the-cache scan would have set the starting index
for the 'b' sub-list to the cell containing 099, saving me one superfluous compare.

099, 'b', '9'
097, 'b', '8'

etc, and the same would happen for the sub-lists defined by column 3, the first
being

100, 'a', '1'
098, 'c', '1'
093, 'c', '1'

As I wrote, I make an initial pass over the data to cache the first item for
each sub-list, some of them consist of a single item and are nearly at the end
of the sorted full array, and this saves some time, the count of executed
statements is reduced by around 10%, but if anyone can think of a way to speed
up things a bit more, without having to sort the array(s) five more times (in an
additional 10,000,000 executed statements, see below), please share it with me.

Some numbers:

- the caching code executes around 120,000 PL/I statements
- sorting the three arrays executes around in 2,000,000 PL/I statements (using a
Heapsort)
- printing the 390 top-n tables executes around 7,600,000 PL/I statements, and
the "IF" statement that selects the data to be printed has a hit-percentage of
just 0.5%:

IF STY_TCN = 'Tr' & STY = LIST.TR | /* 208 type 1 */ 1,841,754 executed IF's
STY_TCN = 'Na' & TCN = LIST.NA | /* 93 type 2 */
STY_TCN = 'Yr' & STY = LIST.YR | /* 37 type 3 */
STY_TCN = 'Co' & TCN = LIST.CO | /* 33 type 4 */
STY_TCN = 'Ty' & TCN = LIST.TY | /* 19 type 5 */
STY_TCN = '**' THEN /* 1 main */
DO;
#T = #T + 1; 9,594 - the actual
number of Top-N lines printed

Robert
--
Robert AH Prins
robert(a)prino(d)org
Peter Flass
2018-08-15 21:52:36 UTC
Permalink
Post by Robert Prins
Hi all,
Not really specific x86, Virtual Pascal, or PL/I, but as I've got the same code
I have an array (actually three, but that's not very relevant for the issue) of
pointers to items in a linked list. The array is sorted into a specific order,
and after the sort, I use it to print Top-N tables. For the main Top-50 table (x
3) that's easy enough, the first 50 items are selected.
However, at this time the linked list also contains 5 sub-lists, each containing
additional sub-lists for a total of 390 (yes, three-hunderd-and-ninety)
sub-lists for which I want to print Top-10 tables (or if a sub-list contains
just 4 items, a Top-4 table). Because I don't really want to sort the main array
six (x 3) times, I just scan the array from the top, and pick out the items for
each sub-list, until I have 10 (or the 4 mentioned before) items. To speed up
the process a little, I make an initial pass over the array to find the first
item for each sub-list, but that doesn't really improve things a great deal.
So given that "this" is the list, sorted on its main key, with the second and
third columns identifying additional sub-lists,
100, 'a', '1'
099, 'b', '9'
098, 'c', '1'
097, 'b', '8'
096, 'd', '3'
095, 'a', '2'
094, 'c', '8'
093, 'c', '1'
092, 'a', '3'
091, 'f', '4'
090, 'a', '2'
the "Top-50" would contain the above.
For sub-list 'a', I would scan the array, and the resulting "Top-10" would be
100, 'a', '1'
095, 'a', '2'
092, 'a', '3'
090, 'a', '2'
Likewise, for sub-list 'b', I would scan the array, and the resulting "Top-10"
would be, and the initial fill-the-cache scan would have set the starting index
for the 'b' sub-list to the cell containing 099, saving me one superfluous compare.
099, 'b', '9'
097, 'b', '8'
etc, and the same would happen for the sub-lists defined by column 3, the first
being
100, 'a', '1'
098, 'c', '1'
093, 'c', '1'
As I wrote, I make an initial pass over the data to cache the first item for
each sub-list, some of them consist of a single item and are nearly at the end
of the sorted full array, and this saves some time, the count of executed
statements is reduced by around 10%, but if anyone can think of a way to speed
up things a bit more, without having to sort the array(s) five more times (in an
additional 10,000,000 executed statements, see below), please share it with me.
- the caching code executes around 120,000 PL/I statements
- sorting the three arrays executes around in 2,000,000 PL/I statements (using a
Heapsort)
- printing the 390 top-n tables executes around 7,600,000 PL/I statements, and
the "IF" statement that selects the data to be printed has a hit-percentage of
IF STY_TCN = 'Tr' & STY = LIST.TR | /* 208 type 1 */ 1,841,754 executed IF's
STY_TCN = 'Na' & TCN = LIST.NA | /* 93 type 2 */
STY_TCN = 'Yr' & STY = LIST.YR | /* 37 type 3 */
STY_TCN = 'Co' & TCN = LIST.CO | /* 33 type 4 */
STY_TCN = 'Ty' & TCN = LIST.TY | /* 19 type 5 */
STY_TCN = '**' THEN /* 1 main */
DO;
#T = #T + 1; 9,594 - the actual
number of Top-N lines printed
Robert
Assuming mainframe, the rule-of-thumb I learned was "let the sort do the
work" since DFSORT is always more efficient than any user code. I can't
visualize what you're doing from your description, but if possible I'd make
one pass of the data and build records that can be sorted into the desired
sequence by DFSORT. Then read the sorted records and print the listings. I
would use PLISRTD to build and retrieve the records, but some would
probably prefer two separate programs with the sort in between invoked by
JCL, or build the records and ATTACH the sort, or...
--
Pete
Robert Prins
2018-08-16 19:51:00 UTC
Permalink
Post by Peter Flass
Post by Robert Prins
Hi all,
Not really specific x86, Virtual Pascal, or PL/I, but as I've got the same code
I have an array (actually three, but that's not very relevant for the issue) of
pointers to items in a linked list. The array is sorted into a specific order,
and after the sort, I use it to print Top-N tables. For the main Top-50 table (x
3) that's easy enough, the first 50 items are selected.
However, at this time the linked list also contains 5 sub-lists, each containing
additional sub-lists for a total of 390 (yes, three-hunderd-and-ninety)
sub-lists for which I want to print Top-10 tables (or if a sub-list contains
just 4 items, a Top-4 table). Because I don't really want to sort the main array
six (x 3) times, I just scan the array from the top, and pick out the items for
each sub-list, until I have 10 (or the 4 mentioned before) items. To speed up
the process a little, I make an initial pass over the array to find the first
item for each sub-list, but that doesn't really improve things a great deal.
So given that "this" is the list, sorted on its main key, with the second and
third columns identifying additional sub-lists,
100, 'a', '1'
099, 'b', '9'
098, 'c', '1'
097, 'b', '8'
096, 'd', '3'
095, 'a', '2'
094, 'c', '8'
093, 'c', '1'
092, 'a', '3'
091, 'f', '4'
090, 'a', '2'
the "Top-50" would contain the above.
For sub-list 'a', I would scan the array, and the resulting "Top-10" would be
100, 'a', '1'
095, 'a', '2'
092, 'a', '3'
090, 'a', '2'
Likewise, for sub-list 'b', I would scan the array, and the resulting "Top-10"
would be, and the initial fill-the-cache scan would have set the starting index
for the 'b' sub-list to the cell containing 099, saving me one superfluous compare.
099, 'b', '9'
097, 'b', '8'
etc, and the same would happen for the sub-lists defined by column 3, the first
being
100, 'a', '1'
098, 'c', '1'
093, 'c', '1'
As I wrote, I make an initial pass over the data to cache the first item for
each sub-list, some of them consist of a single item and are nearly at the end
of the sorted full array, and this saves some time, the count of executed
statements is reduced by around 10%, but if anyone can think of a way to speed
up things a bit more, without having to sort the array(s) five more times (in an
additional 10,000,000 executed statements, see below), please share it with me.
- the caching code executes around 120,000 PL/I statements
- sorting the three arrays executes around in 2,000,000 PL/I statements (using a
Heapsort)
- printing the 390 top-n tables executes around 7,600,000 PL/I statements, and
the "IF" statement that selects the data to be printed has a hit-percentage of
IF STY_TCN = 'Tr' & STY = LIST.TR | /* 208 type 1 */ 1,841,754 executed IF's
STY_TCN = 'Na' & TCN = LIST.NA | /* 93 type 2 */
STY_TCN = 'Yr' & STY = LIST.YR | /* 37 type 3 */
STY_TCN = 'Co' & TCN = LIST.CO | /* 33 type 4 */
STY_TCN = 'Ty' & TCN = LIST.TY | /* 19 type 5 */
STY_TCN = '**' THEN /* 1 main */
DO;
#T = #T + 1; 9,594 - the actual
number of Top-N lines printed
Robert
Assuming mainframe, the rule-of-thumb I learned was "let the sort do the
work" since DFSORT is always more efficient than any user code. I can't
visualize what you're doing from your description, but if possible I'd make
one pass of the data and build records that can be sorted into the desired
sequence by DFSORT. Then read the sorted records and print the listings. I
would use PLISRTD to build and retrieve the records, but some would
probably prefer two separate programs with the sort in between invoked by
JCL, or build the records and ATTACH the sort, or...
The program processes my hitchhike data (yes, really), and I have three
versions, one written in PL/I, which runs on z/OS, one written in Virtual
Pascal, running on Windows, and one where most of the Pascal has been converted
to in-line assembler. All three should produce the same output, and when there
are differences (and there were some recently), I know that I'm doing something
wrong.

The full set of hitchhike data is kept in a single linked list, but every item
in the list contains a load of extra pointers to form additional lists,

1) per trip,
2) per type of driver,
3) per country,
4) per nationality of driver, and
5) per year.

The very first (Turbo) Pascal (V3.01a) version of the program produced just a
few tables (see
<https://prino.neocities.org/miscellaneous/keeping-statistics.html> for a longer
description), but over the years, most as a result of in-ride conversations with
my drivers "Did you ever consider determining ...?", I've added a "few" more tables.

This post refers to a relatively new set of tables, the Top-n ones. The program
produces a Top-50 table of the longest rides (actually three Top-50 tables, for
longest rides (km), longest rides (time), and fastest rides (km/h)) for the
whole set, and those three tables, but then limited to Top-10 (or Top-<10) are
then repeated for every trip (208 at the moment), type of driver (19), country
(1+33), nationality of the driver (93) and year (37) for a total of 390 tables.

Sorting the array of pointers to the items of the list in distance order gives
me all data in that order, and to obtain the top-10 fort a specific sub-list, I
just skip the unwanted data, and which saves me, for example for case 1) above,
three additional sorts (distance/time/velocity) where the trip-number is the
primary key (and ditto for all other sub-lists), at the, as is obvious from the
data presented, expense of rather a hell of a lot of extra compares.

And yes, on z/OS I could use the PLISRTx interface to DFSORT, and
using Virtual Pascal I could, with rather a lot more hassle, spawn M$' sort.exe,
but that would mean writing out the set of data five times in pure text format,
as SORT.EXE still cannot sort on anything other than a single starting column.
Hell, even using PLISRTD requires me to write out the data over and over again...

So, unless someone comes up with some kind of brilliant suggestion (like Paul
Green way back in 1998, when I asked how to speed-up another procedure(*)), I'm
probably stuck with what I have now, whether I like it or not. (I don't, but
c'est la vie...)

And lest you say, "Why the flipping 'ell are we discussing a program that deals
with data relating to hitchhiking?"

My answer to that question?

The very first time (October 2006) I compiled an older it with Enterprise PL/I
(V3.5.0) the compiler abended, not being able to handle the INIT of a based
structure with a power ("INIT ((dont-remember-what**2))"), and it's directly
responsible for

- https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=31632
- https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=84512
- https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=100073
- https://www.ibm.com/developerworks/rfe/execute?use_case=viewRfe&CR_ID=115540

'nuff said?

Robert
--
PS:

(*) Executed PL/I statement counts:

- pre Paul Green: NAT_SCAN 5,407,617,546
- post Paul Green: NAT_SCAN 1,237,239

Old 2.3.0 compiler actually requires compilation with (NOFOFL): prefix to avoid
a decimal overflow exception abend in the PL/I library code.
--
Robert AH Prins
robert(a)prino(d)org
Robert Prins
2018-08-16 20:38:20 UTC
Permalink
(newsgroups limited to just clax86, my server does not allow indiscriminate
cross-posting.)
Robert, you do realize that anything you are going to print will take billions
of cycles for every second the printer needs to actually print the pages?
Really? ROFL...

Of course I do realise that once I/O is added, all bets are off. But as I've
mentioned in another reply you may already have seen, sometimes others approach
a problem from a completely different angle, with stunning results.

Somewhere in the back of my head I think I might be able to build the top-10 or
top<10 lists in my caching code, but that would mean adding another 5 pointers
to each item, and more arrays to keep track of start- and end-of-list addresses.
It may be worth a try, but I remember that I once changed the PL/I version to
use arrays of pointers and indices in the actual allocated items, and code
generated with, I think Enterprise PL/I V3.9, was pretty AD 1985 Borland TP
3.01a-ish, even at the highest level of optimisation. Virtual Pascal would not
be any better, even two consecutive

a:= array[i].a;
b:= array[i].b

statements result in the calculation of the address of the variable in the array
twice.

Robert
--
Robert AH Prins
robert(a)prino(d)org
Terje
Post by Robert Prins
Hi all,
Not really specific x86, Virtual Pascal, or PL/I, but as I've got the
I have an array (actually three, but that's not very relevant for the
issue) of pointers to items in a linked list. The array is sorted into a
specific order, and after the sort, I use it to print Top-N tables. For
the main Top-50 table (x 3) that's easy enough, the first 50 items are
selected.
However, at this time the linked list also contains 5 sub-lists, each
containing additional sub-lists for a total of 390 (yes,
three-hunderd-and-ninety) sub-lists for which I want to print Top-10
tables (or if a sub-list contains just 4 items, a Top-4 table). Because
I don't really want to sort the main array six (x 3) times, I just scan
the array from the top, and pick out the items for each sub-list, until
I have 10 (or the 4 mentioned before) items. To speed up the process a
little, I make an initial pass over the array to find the first item for
each sub-list, but that doesn't really improve things a great deal.
So given that "this" is the list, sorted on its main key, with the
second and third columns identifying additional sub-lists,
100, 'a', '1'
099, 'b', '9'
098, 'c', '1'
097, 'b', '8'
096, 'd', '3'
095, 'a', '2'
094, 'c', '8'
093, 'c', '1'
092, 'a', '3'
091, 'f', '4'
090, 'a', '2'
the "Top-50" would contain the above.
For sub-list 'a', I would scan the array, and the resulting "Top-10" would be
100, 'a', '1'
095, 'a', '2'
092, 'a', '3'
090, 'a', '2'
Likewise, for sub-list 'b', I would scan the array, and the resulting
"Top-10" would be, and the initial fill-the-cache scan would have set
the starting index for the 'b' sub-list to the cell containing 099,
saving me one superfluous compare.
099, 'b', '9'
097, 'b', '8'
etc, and the same would happen for the sub-lists defined by column 3,
the first being
100, 'a', '1'
098, 'c', '1'
093, 'c', '1'
As I wrote, I make an initial pass over the data to cache the first item
for each sub-list, some of them consist of a single item and are nearly
at the end of the sorted full array, and this saves some time, the count
of executed statements is reduced by around 10%, but if anyone can think
of a way to speed up things a bit more, without having to sort the
array(s) five more times (in an additional 10,000,000 executed
statements, see below), please share it with me.
- the caching code executes around 120,000 PL/I statements
- sorting the three arrays executes around in 2,000,000 PL/I statements
(using a Heapsort)
- printing the 390 top-n tables executes around 7,600,000 PL/I
statements, and the "IF" statement that selects the data to be printed
IF STY_TCN = 'Tr' & STY = LIST.TR | /* 208 type 1 */ 1,841,754 executed IF's
   STY_TCN = 'Na' & TCN = LIST.NA | /*  93 type 2 */
   STY_TCN = 'Yr' & STY = LIST.YR | /*  37 type 3 */
   STY_TCN = 'Co' & TCN = LIST.CO | /*  33 type 4 */
   STY_TCN = 'Ty' & TCN = LIST.TY | /*  19 type 5 */
   STY_TCN = '**' THEN              /*   1 main   */
  DO;
    #T = #T + 1;                                         9,594 - the
actual number of Top-N lines printed
Robert Prins
2018-08-17 19:20:54 UTC
Permalink
Post by Robert Prins
(newsgroups limited to just clax86, my server does not allow
indiscriminate cross-posting.)
Robert, you do realize that anything you are going to print will take
billions of cycles for every second the printer needs to actually print
the pages?
Really? ROFL...
Of course I do realise that once I/O is added, all bets are off. But as
I've mentioned in another reply you may already have seen, sometimes
others approach a problem from a completely different angle, with stunning
results.
Create a function for each of these lists, put them all on an array and call
Each time a list has been completed, the corresponding function pointer
removes itself from the array of functions to be called.
Great minds (OK, one very great mind and my rather smaller one) seem to think
the same, I'm actually using a similar approach in the follow-up program that
converts the output of "LIFT" to RTF format:

@02:
mov eax, _s
lea eax, [eax * 4 + offset @03]
jmp dword ptr [eax]

align 4

@03:
dd offset @04 // 0 - Trip -> 0 s -> 1
dd offset @05 // 1 - Trip fall through -> 2
dd offset @06 // 2 - Type -> 2 s -> 3
dd offset @07 // 3 - Type fall through -> 4
dd offset @08 // 4 - Cnty -> 4 s -> 5
dd offset @09 // 5 - Cnty fall through -> 6
dd offset @10 // 6 - Nat -> 6 s -> 7
dd offset @11 // 7 - Nat fall through -> 8
dd offset @14 // 8 - Naco / Cnty #I/#E -> 8 s -> 9
dd offset @18 // 9 - Cnty #I/#E fall through -> 10
dd offset @20 // 10 - Year -> 10 s -> 11
dd offset @21 // 11 - Year fall through -> 12
dd offset @22 // 12 - V! tables -> 12 s -> 13
nop
nop
nop
nop

@04:

Sadly, this approach is not (entirely) possible here, because the Country top-10
tables are "incomplete", I've hitched through Bulgaria, Slovakia, and Andorra,
but I've never hitched in those countries.

Obviously I could cater for this, but while we've got this discussion going,
I've decided to try filling the entire top-N in the "caching" code, where I end
up going to nearly the bottom, entry 4,300 of (currently) 4,310 rides, anyway. A
quick test using the old PL/I version, where I can use a compiler option that
tells me how many times each statement is executed, tells me that the executed
statement count is reduced by more than 90% when I use this approach for the
Top-10s of trips. ;)

Only unpleasant side issue is the fact that I'm using 3-D arrays,

dcl 1 t10_trip(trip_total.trip) ctl,
2 tr_ix(3) fixed bin (31) init ((3)0),
2 tr_ktv(3, 10) ptr;

with

#j = t10_trip(lift_list.trip).tr_ix(1) + 1;
if #j <= hbound(tr_ktv, 3) then
t10_trip(lift_list.trip).tr_ktv(1, #j) = lift_ptr;

t10_trip(lift_list.trip).tr_ix(1) = #j;

to fill the top-10 slots, and worse for the Type/Country/Nat tables where I have
to do a lookup of the index. (Which leads to the question, is there a way using
all the new whiter than white x86 instructions to fast match a 4-byte value in
an array of up to (potentially) 234 values? I'm now using the "vpcmpistri" (see
another tread) to find CR's in the buffer (and don't understand why I require a
VPXOR or set them to 0x00), and similar code would probably faster than a
discretely coded version of CMPSD...)

which won't result in pretty code using Virtual Pascal. Need to check what
Enterprise PL/I generates, but even here, based on past experience, I'm somewhat
pessimistic.

More later,

Robert
--
Robert AH Prins
robert(a)prino(d)org
Robert Prins
2018-08-23 17:21:01 UTC
Permalink
Post by Robert Prins
(newsgroups limited to just clax86, my server does not allow indiscriminate
cross-posting.)
<snip>
Post by Robert Prins
Obviously I could cater for this, but while we've got this discussion going,
I've decided to try filling the entire top-N in the "caching" code, where I end
up going to nearly the bottom, entry 4,300 of (currently) 4,310 rides, anyway. A
quick test using the old PL/I version, where I can use a compiler option that
tells me how many times each statement is executed, tells me that the executed
statement count is reduced by more than 90% when I use this approach for the
Top-10s of trips. ;)
Only unpleasant side issue is the fact that I'm using 3-D arrays,
dcl 1 t10_trip(trip_total.trip) ctl,
      2 tr_ix(3)        fixed bin (31) init ((3)0),
      2 tr_ktv(3, 10)   ptr;
with
#j = t10_trip(lift_list.trip).tr_ix(1) + 1;
if #j <= hbound(tr_ktv, 3) then
  t10_trip(lift_list.trip).tr_ktv(1, #j) = lift_ptr;
t10_trip(lift_list.trip).tr_ix(1) = #j;
to fill the top-10 slots, and worse for the Type/Country/Nat tables where I
have to do a lookup of the index. (Which leads to the question, is there a
way using all the new whiter than white x86 instructions to fast match a
4-byte value in an array of up to (potentially) 234 values? I'm now using the
"vpcmpistri" (see another tread) to find CR's in the buffer (and don't
understand why I require a VPXOR or set them to 0x00), and similar code would
probably faster than a discretely coded version of CMPSD...)
which won't result in pretty code using Virtual Pascal.
Needlessly to say, the VP generated code is ugly as (insert your favourite
expletive)!
Post by Robert Prins
Need to check what Enterprise PL/I generates, but even here, based on past
experience, I'm somewhat pessimistic.
And this code seems to be pretty good using Enterprise PLI V4.5.0. Would need
use another system to see what Enterprise PL/I V5.2.2 generates.

Running time with VP generated code is around 5% less than old (find only
top-of-sublist) code. ;) Now just have to hack it into something more palatable.

One feature sadly missing from Pascal is the fact that PL/I allows you to use
'*' as array extents in a called proc, and the (hidden) descriptors that are
also passed to it allow you to call the same proc with

dcl 1 t10_summ(1) ctl,
2 su_ix(3) fixed bin (31),
2 su_ktv(3,50) ptr;

dcl 1 t10_trip(trip_total.trip) ctl,
2 tr_ix(3) fixed bin (31),
2 tr_ktv(3,10) ptr;

dcl 1 t10_cnty(0:lift_work.#cnty) ctl,
2 co_ix(3) fixed bin (31),
2 co_ktv(3,10) ptr;

dcl 1 t10_year(year_top -> year_list.year:
year_end -> year_list.year) ctl,
2 yr_ix(3) fixed bin (31) init ((3)0),
2 yr_ktv(3,10) ptr;

and a set of builtin functions (lbound and hbound) can be used to retrieve the
low and high bounds of the arrays. ;) It's a bit more convoluted to do this with
Pascal...

Robert
--
Robert AH Prins
robert(a)prino(d)org
Peter Flass
2018-08-23 15:47:32 UTC
Permalink
Post by Robert Prins
Post by Robert Prins
(newsgroups limited to just clax86, my server does not allow indiscriminate
cross-posting.)
<snip>
Post by Robert Prins
Obviously I could cater for this, but while we've got this discussion going,
I've decided to try filling the entire top-N in the "caching" code, where I end
up going to nearly the bottom, entry 4,300 of (currently) 4,310 rides, anyway. A
quick test using the old PL/I version, where I can use a compiler option that
tells me how many times each statement is executed, tells me that the executed
statement count is reduced by more than 90% when I use this approach for the
Top-10s of trips. ;)
Only unpleasant side issue is the fact that I'm using 3-D arrays,
dcl 1 t10_trip(trip_total.trip) ctl,
      2 tr_ix(3)        fixed bin (31) init ((3)0),
      2 tr_ktv(3, 10)   ptr;
with
#j = t10_trip(lift_list.trip).tr_ix(1) + 1;
if #j <= hbound(tr_ktv, 3) then
  t10_trip(lift_list.trip).tr_ktv(1, #j) = lift_ptr;
t10_trip(lift_list.trip).tr_ix(1) = #j;
to fill the top-10 slots, and worse for the Type/Country/Nat tables where I
have to do a lookup of the index. (Which leads to the question, is there a
way using all the new whiter than white x86 instructions to fast match a
4-byte value in an array of up to (potentially) 234 values? I'm now using the
"vpcmpistri" (see another tread) to find CR's in the buffer (and don't
understand why I require a VPXOR or set them to 0x00), and similar code would
probably faster than a discretely coded version of CMPSD...)
which won't result in pretty code using Virtual Pascal.
Needlessly to say, the VP generated code is ugly as (insert your favourite
expletive)!
Post by Robert Prins
Need to check what Enterprise PL/I generates, but even here, based on past
experience, I'm somewhat pessimistic.
And this code seems to be pretty good using Enterprise PLI V4.5.0. Would need
use another system to see what Enterprise PL/I V5.2.2 generates.
Running time with VP generated code is around 5% less than old (find only
top-of-sublist) code. ;) Now just have to hack it into something more palatable.
One feature sadly missing from Pascal is the fact that PL/I allows you to use
'*' as array extents in a called proc, and the (hidden) descriptors that are
also passed to it allow you to call the same proc with
dcl 1 t10_summ(1) ctl,
2 su_ix(3) fixed bin (31),
2 su_ktv(3,50) ptr;
dcl 1 t10_trip(trip_total.trip) ctl,
2 tr_ix(3) fixed bin (31),
2 tr_ktv(3,10) ptr;
dcl 1 t10_cnty(0:lift_work.#cnty) ctl,
2 co_ix(3) fixed bin (31),
2 co_ktv(3,10) ptr;
year_end -> year_list.year) ctl,
2 yr_ix(3) fixed bin (31) init ((3)0),
2 yr_ktv(3,10) ptr;
and a set of builtin functions (lbound and hbound) can be used to retrieve the
low and high bounds of the arrays. ;) It's a bit more convoluted to do this with
Pascal...
Robert
It's like the old FORTRAN or C, where you have to pass the upper bound as
an argument (lower bound always 1). PL/I and (I think Ada) don't require
this, din't know about ALGOL.
--
Pete
r***@gmail.com
2018-08-31 16:47:17 UTC
Permalink
Hi,
Post by Robert Prins
One feature sadly missing from Pascal is the fact that PL/I allows you to use
'*' as array extents in a called proc, and the (hidden) descriptors that are
also passed to it allow you to call the same proc with
...
I don't quite understand what you mean here.
Post by Robert Prins
and a set of builtin functions (lbound and hbound) can be used to retrieve
the low and high bounds of the arrays. ;) It's a bit more convoluted to do
this with Pascal...
Are you talking about conformant arrays? Schemata? Open arrays?
Even Turbo Pascal (only later versions?) had LOW() and HIGH() built-ins.
Is that what you meant?

* https://www.freepascal.org/docs-html/ref/refsu14.html
Robert Prins
2018-09-01 10:08:35 UTC
Permalink
Post by r***@gmail.com
Hi,
Post by Robert Prins
One feature sadly missing from Pascal is the fact that PL/I allows you to use
'*' as array extents in a called proc, and the (hidden) descriptors that are
also passed to it allow you to call the same proc with
...
I don't quite understand what you mean here.
Post by Robert Prins
and a set of builtin functions (lbound and hbound) can be used to retrieve
the low and high bounds of the arrays. ;) It's a bit more convoluted to do
this with Pascal...
Are you talking about conformant arrays? Schemata? Open arrays?
Even Turbo Pascal (only later versions?) had LOW() and HIGH() built-ins.
Is that what you meant?
Not really. or maybe.

Can Pascal (choose your flavour) handle passing these two different arrays of
structures to a single procedure:

dcl 1 s1,
2 s2(2,3),
3 v1 char (12),
3 v2 char (15),
2 s3(4,5),
3 v3 char (19),
3 v4 char (1);

dcl 1 t1,
2 t2(7,8),
3 w1 char (14),
3 w2 char (77),
2 t3(3,3),
3 w3 char (12),
3 w4 char (1);

Well, PL/I can:

myproc: procedure(parm);
dcl 1 parm,
2 p2(*,*),
3 w1 char (*),
3 w2 char (*),
2 s3(*,*),
3 w3 char (*),
3 w4 char (*);

and trying to assign a char(14) to w(1,1) will happily cut off the last two
characters when the procedure is passed s1. And when the SUBSCRIPTRANGE is
enabled, you'll get an error when trying to access w3(4,5) when t1 is passed to
the procedure.

I don't think there's an easy way to do the same with Pascal, but I'm ready to
be corrected.

Robert
--
Robert AH Prins
robert(a)prino(d)org
Marco van de Voort
2018-09-01 11:20:48 UTC
Permalink
Post by Robert Prins
I don't think there's an easy way to do the same with Pascal, but I'm
ready to be corrected.
Not if you force static allocation. Since Delphi2 (1?) strings are mostly
dynamically allocated, and the static allocation stuff didn't develop
further.
r***@gmail.com
2018-09-04 00:32:33 UTC
Permalink
Hi,
Post by Robert Prins
Post by r***@gmail.com
Are you talking about conformant arrays? Schemata? Open arrays?
Even Turbo Pascal (only later versions?) had LOW() and HIGH() built-ins.
Is that what you meant?
Not really. or maybe.
Can Pascal (choose your flavour) handle passing these two different arrays of
But I don't grok PL/I, so you'll have to show me in a more Pascal-friendly pseudo-code.
Post by Robert Prins
I don't think there's an easy way to do the same with Pascal, but I'm ready to
be corrected.
"Pascal" can mean any number of dialects and offshoots. Yes, there are
several (potential) ways to solve the problem.

Loading...