Hi Louis,
most people will assume that
A←A,1 2 3
simply appends 1 2 3 to A. But don't
bet your life on that:
      )CLEAR
CLEAR WS
      ∇Z←A B
[1]   ⊣⎕EX 'A' ◊ Z←42,B
[2]   ∇
      A←A,1 2 3
      A
42 1 2 3
The optimizations that Elias and I tried earlier worked for the most part,
but failed on pathological cases like the one above. As the example shows,
such cases cannot be detected statically (say, at function definition time);
the detection has to happen at runtime. The question then is: will the
effort spent on the detection be amortized by the copies saved? My gut
feeling is that it will not, but one can always construct a benchmark in
which it is.

Our conclusion at the time was that the correctness of the result
is more important than speed.
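As an aside, the runtime detection in question can be sketched in Python,
using CPython reference counts as a stand-in for whatever ownership data an
APL interpreter would keep. This is a hypothetical illustration, not GNU APL
code; append_maybe_in_place and the calibration trick are invented here:

```python
import sys

def _calibrate(value):
    # How many references does a list with exactly one outside name show
    # when passed straight into a function?  The exact number differs
    # between CPython versions, so we measure it instead of hard-coding it.
    return sys.getrefcount(value)

_probe = []
SOLE_OWNER = _calibrate(_probe)
del _probe

def append_maybe_in_place(value, items):
    # Hypothetical runtime detection: extend in place only when the
    # caller's name is the sole owner of 'value'; otherwise copy.
    if sys.getrefcount(value) <= SOLE_OWNER:
        value.extend(items)            # safe: nobody else can see it
        return value
    return value + list(items)         # shared: correctness first

a = [1, 2, 3]
a = append_maybe_in_place(a, [4, 5])   # sole owner: extended in place

b = [1, 2, 3]
c = b                                  # the value is now shared
b = append_maybe_in_place(b, [4, 5])   # copied, so c keeps 1 2 3
```

The check itself is cheap, but as noted above it runs on every append,
which is exactly the amortization question.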
Best Regards,
Jürgen Sauermann
On 07/05/2017 02:20 AM, Louis de Forcrand wrote:
On the subject of copying arrays, I was wondering where, in GNU
APL, A is copied in code such as
A ← A,1 2 3
B ← A,4 5 6
A ← A,7 8 9
(in that order). It has to be copied in the third line (unless
some weird bookkeeping is done), since B referred to A's previous
value, but it can be modified in place in the first two lines.
Also, consider
1 2 3 + 3 4 5
where the result of + could overwrite one of its two arguments
(the same goes for any scalar function).
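This argument-recycling idea can also be sketched in Python, again with
CPython reference counts standing in for real ownership tracking; the name
scalar_plus and the whole mechanism are hypothetical, not GNU APL's:

```python
import sys

def _calibrate(a):
    return sys.getrefcount(a)

_probe = []
NAMED = _calibrate(_probe)   # count shown by an argument that also has
del _probe                   # exactly one name at the call site

def scalar_plus(a, b):
    # Hypothetical sketch: write the sum over 'a' only when 'a' is a
    # nameless temporary (it shows fewer references than a named one).
    if sys.getrefcount(a) < NAMED:
        for i, y in enumerate(b):
            a[i] += y                       # recycle the dead temporary
        return a
    return [x + y for x, y in zip(a, b)]    # 'a' is still visible: copy

v = [1, 2, 3]
scalar_plus(v, [3, 4, 5])            # v is named, so v itself stays intact
scalar_plus([1, 2, 3], [3, 4, 5])    # literal: its storage can be reused
```

For what it's worth, NumPy reportedly performs a similar elision of large
temporaries in expressions like a+b+c.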
What is the in-place situation of GNU APL?
Thanks,
Louis
Hi Blake,
The state of GNU APL is, I believe, this:
I am not aware of any unnecessary copying of arrays in GNU APL. It was
suspected earlier that some copying could be avoided, but it then turned
out that removing those copies would corrupt other values under certain
circumstances (because different functions would modify the same value).
We therefore had to revert the "unnecessary copies" to a state that now
seems safe.
Regarding parallel processing, the situation seems to be that a multi-core
CPU cannot be significantly faster than a single-core CPU with the same
memory (interface). A significant speedup requires that the bandwidth
between the cores and the memory grow with the number of cores. I built a
machine like that with some students back in 1990, but current PCs with
multi-core CPUs simply do not provide that.
If you make parallel1 and then run ScalarBenchmark.apl
(which comes with GNU APL)
then you get, for example:
-------------- Mix_IRC + Mix1_IRC --------------
average sequential startup cost: 530 cycles
average parallel startup cost: 1300 cycles
per item cost sequential: 122 cycles
per item cost parallel: 195 cycles
parallel break-even length: not reached
This means that the additional start-up cost for a parallel computation
is 1300-530 = 770 cycles, or about 240 nanoseconds on a 3.2 GHz machine.
This is actually a pretty good value. Before writing my own
core-synchronization functions I used a standard parallelization library
that took about 20,000 cycles. I believe it was libmpi, but that was many
years ago, so I don't quite remember. This startup cost also includes
what Elias refers to as bookkeeping.
What remains (and is the show-stopper) is the per-item cost: a single
core needs 122 cycles for adding two numbers, while 4 cores need
195 cycles per core = 780 cycles in total.
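The arithmetic behind these figures is easy to check (all numbers are the
ones quoted above, assuming the 3.2 GHz clock mentioned earlier):

```python
# Sanity-check of the benchmark arithmetic quoted above (3.2 GHz clock).
seq_startup, par_startup = 530, 1300          # cycles
extra_cycles = par_startup - seq_startup      # additional parallel startup
extra_ns = extra_cycles / 3.2                 # cycles / GHz gives ns

per_item_seq, per_item_par, cores = 122, 195, 4
total_parallel = per_item_par * cores         # cycles spent across all cores
seq_item_ns = per_item_seq / 3.2              # the single-core per-item time

print(extra_cycles, int(extra_ns), total_parallel, int(seq_item_ns))
# 770 240 780 38
```

The last figure, about 38 ns per item on one core, is the "~40 ns"
referred to further down.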
The code in both cases is exactly the same. Putting it differently: if I
run alone, some function takes 122 cycles; when my colleagues work in
parallel on something that has nothing to do with my work, then I need
780 cycles.
Once the parallel startup has finished, the cores work independently and
without any locks or the like between them. This cannot be explained at
the software level; rather, it suggests that some common resource
(memory?!) slows each core down while other cores are working at the same
time. The 122 cycles (~40 ns) in the single-core case is roughly the time
for one DRAM access in page mode. In other words, the main memory
bandwidth (at least of my machine) is just enough to feed one core, but
far too low for 4 cores.
GNU APL can do little to fix this bottleneck. I believe I have done
everything that can be done in software (although new ideas are welcome),
but if we hit hardware bottlenecks, then that's it. I believe GNU APL
would run perfectly on a 4-core CPU with a 4-port main memory, but that
will probably remain a dream in my lifetime.
Best Regards,
Jürgen Sauermann
On 07/04/2017 06:15 PM, Blake McBride wrote:
Hi Elias,
I remember your earlier work and comments. In fact,
that is the source of my knowledge of this.
Putting more features on a base with known issues is
problematic for the following reasons:
1. The time and effort it would take to secure the
base is being spent elsewhere.
2. Adding new features is, in effect, ignoring the
problem and moving on.
3. APL is good and useful as it is. If it's not being used as-is,
then there is some other issue (like efficiency). I remember another
post about unnecessary bloat caused by the unnecessary copying, which
was making the poster's machine page unnecessarily. The unnecessary
copying is major in terms of performance and memory efficiency. This
is big.
4. APL was developed with the promise of parallel
processing. This is something that wasn't practical
back then, but is now. I am under the impression that
this was one of the big reasons Juergen created GNU APL
- to take advantage of parallel processing.
Just one opinion.
Blake