On Sun, 17 Mar 2024 at 04:37, Alice Osako wrote:
Well, ISO is as much a legacy standard as PIM.
Is there a more recent promulgated standard, then?
There is Wirth's successor language to Modula-2, called Oberon, and
there is a revised Modula-2 by Rick Sutcliffe and myself.
Oberon is very much a stripped down Modula-2 in which extensible
records replace variant records and support both the static and
dynamic dispatch paradigms of OOP. Unfortunately, it is overly simplistic,
which led to balkanisation with a large number of dialects. But there
are quite a few compilers available supporting one or another of these
dialects. However, to my knowledge there is no GCC support.
Our Modula-2 revision is a modern language derived from PIM4 with
Oberon-style extensible records replacing variant records, unnecessary
and outdated features removed and modern features added to increase
expressive power without increasing the footprint of the core language.
To avoid balkanisation, it provides the means to bind library functions
to built-in syntax and thereby elevate user defined types to first
class status. It also defines interfacing to C, the JVM and the CLR.
Unfortunately, our compiler is still a work in progress as this is an
unfunded pet project on which we can only work sporadically. Gaius has
pledged support in GM2, which is likely to appear in a "one feature at a
time" manner.
Perhaps it is noteworthy that we have an agreement with Springer for a
fifth edition of Wirth's "Programming in Modula-2" based on our core
language definition and thus this is poised to become PIM5. Springer
wanted to keep the same structure as in the previous editions with two
parts, (1) a tutorial part and (2) a language report part, where Rick
was to write/edit the former, and I the latter. The language report is
finished, but the tutorial part is not, as Rick had a tragedy in his family that
required his full attention and I didn't feel like bothering him about
it since, after all, I haven't finished the compiler yet either. But
we'll get there eventually.
I'd like to point out that it is possible to write portable code in
Modula-2 in the manner I advocated on this list before and again
mentioned in my comments on your recent posts, and in doing so the code
will be fairly straightforward to migrate to M2R10/PIM5.
One supported by the GCC implementation?
As mentioned, Oberon isn't and M2R10/PIM5 isn't yet.
Pretty much all other modifications to the core language made
things worse.
I am not well versed enough in the language at this time to judge
how the ISO changes made things worse. Can you give me some examples
of this?
Having spent an entire decade revising the language, I have looked at
this in depth and could write an entire book on what is
wrong with ISO M2. Rick was on the ISO standards committee from its
early days until the very end and he was the designer and editor of the
ISO I/O library and the generics extension. I had briefly participated
in the committee myself. We both share the view that ISO M2 turned out
to be that very same thing all the participants had hoped to avoid,
pretty much repeating the folly of the Algol committee that led to
Algol-68. In its defense, ISO M2 is not quite as bad as Algol-68, yet it
had the same effect: it pretty much killed the language.
I had already mentioned the FOR loop semantics, which is a less evil
example of how ISO went wrong. The intentions were good in that the
working group didn't want to leave the semantics undefined. But the
way in which this was then done is not much better than leaving it
undefined in the first place. The loop variable is still syntactically
accessible outside the loop body even though its value is semantically
undefined. In our revision we solved this by making the loop body the
scope of the loop variable: it does not exist outside the loop body.
To this end, the loop variable is not declared in a VAR section but
right inside the loop header.
(* foo is not defined yet *)
FOR foo IN bar DO (* <= foo is defined in the loop header *)
(* foo is in scope here *)
END; (* FOR *)
(* foo is no longer in scope here *)
Far more evil are the BITSET type and bitwise operations.
Suppose you need to implement a hash function that operates on strings.
For this you iterate over all characters in the string and apply various
operations where the operands are each character and the eventual result
is accumulated in a temporary value. The operations used for hashing are
typically addition and subtraction ignoring over- or underflow, shifts,
rotations, logical NOT, AND and OR. None of these operations are
permitted on type CHAR, and addition and subtraction aren't permitted on
BITSET either. So we need to import from SYSTEM to use CAST and this
will clutter our hash function as we have to cast forwards and backwards
between CARDINAL, BITSET and CHAR, depending on what operation is used
in the composite expression that calculates the cumulative hash value.
The code quickly becomes unreadable and difficult to maintain,
significantly increasing the opportunity for error.
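To illustrate, here is a minimal sketch of such a hash function in
ISO M2. The mixing formula is made up for illustration only, and it
assumes that BITSET and CARDINAL have the same width on the target:

FROM SYSTEM IMPORT CAST, ROTATE;

PROCEDURE Hash ( s : ARRAY OF CHAR ) : CARDINAL;
VAR
  index : CARDINAL;
  hash : BITSET;
BEGIN
  hash := BITSET{}; (* empty set, all bits zero *)
  index := 0;
  WHILE (index <= HIGH(s)) AND (s[index] # CHR(0)) DO
    (* rotation is only permitted on BITSET ... *)
    hash := ROTATE(hash, 5);
    (* ... but addition is not, so we must cast to CARDINAL and
       back, with ORD() on top to obtain a CARDINAL from the CHAR;
       possible overflow is ignored here for brevity *)
    hash := CAST(BITSET, CAST(CARDINAL, hash) + ORD(s[index]));
    INC(index)
  END; (* WHILE *)
  RETURN CAST(CARDINAL, hash)
END Hash;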
Why are shifts and rotations only permitted on BITSET when they are in
module SYSTEM anyway? We have already crossed the line into potentially
unsafe territory. Nothing is gained by restricting these operations to
BITSET. It does not add any safety. We already left safety behind. So
why not permit shifts and rotations and other bit manipulations at least
on machine types LOC and WORD? At least we could cut down on the number
of cast operations then. But if these operations are imported from
SYSTEM anyway, they might as well be permitted on any type. There is no
safety to be gained by restricting them to any particular type. The only
outcome is an increase in the number of cast operations and thus clutter.
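For contrast, if these operations were simply permitted on CARDINAL
(hypothetical, valid in neither PIM nor ISO), the entire mixing step
in the sketch above would shrink to a single line without any casts:

hash := ROTATE(hash, 5) + ORD(s[index]);

Not one bit of safety would be lost compared to the ISO version.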
Then there are a number of issues with features that are of academic
interest only and totally useless in practice.
For example, in PIM, which was designed in 1978, there are lexical
synonyms ~ for NOT and & for AND. This was already inconsistent:
there is no such synonym for OR because | was already in use as a
separator in case label lists. The use of synonyms is bad design to
begin with. Everything should have only a single syntax form. If the
designer believes it is of such importance to save the programmer two
extra keystrokes, then there should be single character symbols only,
applied consistently to all logical operations, for example ~ for
NOT, & for AND and | for OR, in which case the designer should look
for an alternative case label separator, such as a double semicolon
;; or whatever else. If the issue isn't considered important enough
to change the case label separator, then the single character
synonyms should be dropped entirely. It should be either all single
character symbols and nothing but single character symbols, or all
reserved word symbols and nothing but reserved word symbols.
Consistency is far more important than saving some lazy arse
programmer one or two keystrokes.
And yet, as late as 1988, the ISO working group felt it necessary to
inflate this practice by introducing ! as a synonym for | and @ as a
synonym for ^, as if, more than 20 years after the introduction of
the ASCII character set, there was a need to accommodate dinosaur
hardware with 5 or 6 bit character sets that might not include | and
^. Entirely academic. Totally useless in practice. Not only is it
useless, but it also makes it far more difficult to later assign
these symbols to other, far more practical uses. For example,
Modula-2, like most Pascal family languages, does not have
single-line comments. Modern Fortran uses ! as a single-line comment
prefix, which is a very good choice as it allows the insertion of
function header specifications that stand out as documentation blocks
when they all start with ! at the very left. This would have been a
much better use for the ! character, but ISO reserves it for 1950s
hardware with 5 or 6 bit character sets. This might have been
understandable if ISO M2 had been defined in the early 1960s, but
certainly not in 1988.
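To illustrate, a function header documented with such hypothetical
single-line comments (valid in neither PIM nor ISO M2) might have
looked like this:

! Hash
! computes a hash value over the characters of string s
! the result is independent of compiler and target
PROCEDURE Hash ( s : ARRAY OF CHAR ) : CARDINAL;

Such blocks stand out immediately when scanning the source.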
Similarly, ISO leaves the bitwidth of the smallest addressable unit
type LOC implementation defined. This too would have been
understandable in the 1960s, but not in 1988. By 1988 it was 100%
foreseeable that all silicon would forever be based on units whose
size is a multiple of eight. Yet again, a ridiculous decision due to
being stuck in a 1950s/1960s mindset. And if that mindset had not
existed, if it had been accepted that the future belongs to multiples
of eight, then the name of the smallest addressable unit type would
have been OCTET, not LOC, thus making it self-explanatory and leading
to better readability of the code, to say nothing of the
implementation and portability issues that come with an
implementation defined unit.
Plenty of chances were missed to remove outdated features and add
features for more modern requirements in their place. In Oberon,
Wirth followed the approach "How can I reduce the feature set to the
absolute minimum that I can get away with?" In our revision, we
followed the approach "How can we keep the size of the language about
the same but increase its expressive power and utility to the
absolute maximum doable with that given footprint?" ISO M2 followed
the opposite approach, allowing feature creep.
Built-in types COMPLEX and BCD were correctly rejected early on by the
working group. As a mathematician, Rick had advocated COMPLEX, while p1
Modula-2 implementor and maintainer Albert Wiedemann and I had tabled a
proposal for BCD. The working group explained to us that if we got BCD,
then Rick would have to get COMPLEX and eventually somebody else would
want even more built-in types, and that a line had to be drawn
somewhere. For the sake of keeping the language lean, we then withdrew
our proposal much to Rick's disappointment. However, Rick stuck around
for long enough to sneak COMPLEX back in later when most members had
lost interest and resistance had faded. Albert stuck around to the end
as well, but I didn't, so I don't know why he didn't push for the
inclusion of BCD at that point. It is part of his ISO M2 compiler as an
extension though.
In hindsight, Rick and I realised that this was a bad thing. In our
revision we provide a feature called syntax binding which allows user
defined types to be used like built-in types except for the need to
import the library that implements them. With this general feature it is
possible to keep the language lean but have library defined COMPLEX and
BCD types that look just as if they were built-in.
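As a rough illustration, a library defined BCD type could then be
declared along these lines; this is only a sketch, not the exact
syntax of our revision:

DEFINITION MODULE BCD;

TYPE BCD; (* opaque *)

(* bind library procedures to built-in operator syntax *)
PROCEDURE [+] add ( a, b : BCD ) : BCD;
PROCEDURE [-] sub ( a, b : BCD ) : BCD;
PROCEDURE [*] mul ( a, b : BCD ) : BCD;

END BCD.

Client code imports BCD and then simply writes c := a + b; on BCD
operands, just as if BCD was a built-in type.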
Then there is the ISO way of doing COROUTINEs. It is incompatible with
PIM, but without any real gain. Neither is it more user friendly, nor is
it more powerful. The way COROUTINEs are done in both PIM and ISO is
crude, almost assembler like. The Lua language is an example of how
to do COROUTINEs in a user friendly and powerful way. It is the
subject of a seminal paper by Roberto Ierusalimschy (the principal
designer, implementor and maintainer of Lua) and his co-author whose
name escapes me right now. Again, ISO missed a chance to improve
things and instead made them worse. Thanks to Roberto we didn't have
to come up with an entirely new approach for coroutines in our
revision; we adapted his approach to Modula-2.
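For comparison, this is the crude PIM way, sketched from memory
following PIM2, where PROCESS and SIZE are exported by SYSTEM (later
editions use ADDRESS instead of PROCESS). The caller allocates a raw
workspace of guessed size and control is transferred explicitly in
both directions:

FROM SYSTEM IMPORT PROCESS, NEWPROCESS, TRANSFER, ADR, SIZE, WORD;

VAR
  main, worker : PROCESS;
  wsp : ARRAY [0 .. 1023] OF WORD; (* workspace size is guesswork *)

PROCEDURE WorkerBody;
BEGIN
  (* ... first slice of work ... *)
  TRANSFER(worker, main); (* hand control back, assembler style *)
  (* ... more work, more transfers ... *)
END WorkerBody;

(* in the body of the enclosing program module *)
BEGIN
  NEWPROCESS(WorkerBody, ADR(wsp), SIZE(wsp), worker);
  TRANSFER(main, worker)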
There are many other issues with ISO M2 which are only apparent when
getting into fair detail and I will therefore refrain from any
discussion of those here.
Again, I am not familiar enough with the library to judge.
The by far biggest problem, however, is the ISO library, and in
particular the I/O library. There, the committee dynamics show even
more: everybody had some pet issue with the proposed library APIs,
and those were then tweaked until everybody would agree. Not to
improve the API or its functionality, but simply to get approval.
The worst that committee design has to offer.
Rick was the designer and editor of the ISO I/O library and he has
taught ISO M2 at his university for 20+ years. He says the ISO I/O
library is unteachable to undergraduate students. It is overly complex
and has circular dependencies. You cannot introduce a basic concept and
then build upon it. Students need to understand the whole thing before
you can teach any of its components. And not surprisingly, this makes it
cumbersome to use.
Why shouldn't I/O be as simple as this:
IMPORT BCD;
IMPORT PervasiveIO;
VAR a, b, c : BCD;
READ "Enter a: ", a, "\nEnter b: ", b;
c := a + b;
WRITE "\nSum: ", #("5;2", c), "\n";
or with specified input and output streams
READ @infile: "Enter a: ", a, "\nEnter b: ", b;
c := a + b;
WRITE @outfile: #("5;2", a), " + ", #("5;2", b), " = ", #("5;2", c), "\n";
instead of importing from different layers of the API, wondering
which layer to use, having to write tons of boilerplate code, then
having to call several WriteThis(), WriteThat(), WriteSomethingElse()
functions, with each type requiring memorisation of yet another set
of I/O functions. What a holy mess.
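For comparison, the same dialogue written against the ISO library
looks roughly like this, from memory and using CARDINAL since there
is no BCD; I may well have misremembered some procedure names, which
rather proves the point about memorisation:

FROM STextIO IMPORT WriteString, WriteLn, SkipLine;
FROM SWholeIO IMPORT ReadCard, WriteCard;

VAR a, b, c : CARDINAL;

(* module body *)
BEGIN
  WriteString("Enter a: "); ReadCard(a); SkipLine;
  WriteString("Enter b: "); ReadCard(b); SkipLine;
  c := a + b;
  WriteString("Sum: "); WriteCard(c, 0); WriteLn

And this is only the simple, default-channel layer; as soon as files
and channels enter the picture, the boilerplate multiplies.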
Even so, it doesn't cover much of the functionality needed in our
day and age.
That is inevitable in a language which hasn't been updated in 30
years, to be sure.
But that is only an explanation, not a justification.
In any event, writing portably usually leads to cleaner code and
fewer bugs, regardless of the language used. This is so because
writing portable code typically requires the use of abstraction
layers and designing the API from a functional point of view.
I agree, but only to a point. While it is certainly true that a
better designed API results in better client code, writing a library
for portability generally leads to significantly less clear internal
code, as it invariably means the use of special cases, and often
means applying conditional compilation and/or separate versions of
the library for separate circumstances.
You are thinking of a C style, macro based approach to portable
coding where all the different scenarios are bundled into a single
implementation.
That is diametrically opposite to the philosophy of a modular language
like Modula-2.
To write portable code in a modular way, you first design a platform
agnostic API. Where it is possible to implement that API without using
dialect or implementation or target specific features and syntax, you do
that. And where this is not possible, you write separate platform
specific implementations. Each of these will be lean and clean, readable
and maintainable. Plus, if you need to migrate the library to another
dialect, compiler or target, it is possible to do that with minimal
effort since all the platform agnostic code remains in place, and you
only need to write those platform specific implementations which will
seamlessly fit into the whole architecture since they conform to the
same API.
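As a minimal sketch of the pattern, with hypothetical names: one
platform agnostic definition module, and one implementation module
per target, selected at build time:

DEFINITION MODULE FileInfo; (* platform agnostic API *)

PROCEDURE exists ( path : ARRAY OF CHAR ) : BOOLEAN;
(* Returns TRUE if a file exists at path, FALSE otherwise *)

END FileInfo.

One IMPLEMENTATION MODULE FileInfo is then written per target, say
one on top of the POSIX interface and one on top of Win32, each lean
and clean. Only one of them is compiled and linked for a given
target, and client code never sees the difference.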
While better use of abstraction layers internally can mitigate this,
it doesn't avoid it - at best it can isolate the non-portable
sections more carefully. While this is worthwhile ...
More importantly, you can only write a program to be cleanly
portable to a system/compiler/dialect you know exists.
Not necessarily.
Between PIM and ISO, it is mostly possible to write dialect agnostic
code for most if not all use cases except for casting. This does
require a collection of libraries to replace certain built-in
functions, like DIV and MOD when used with signed types, and
low-level functions for bit manipulation that are entirely based on
basic math instead of using bit manipulation facilities provided by
a dialect or compiler extension. However, once you have these
libraries in place, it is all smooth sailing from there. You can
also reuse the code in the GitHub repo I posted links to before.
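As a taste of what such a library function looks like, here is a
sketch of a dialect agnostic shift-left using only plain arithmetic,
assuming a PIM3 or later dialect where MAX(CARDINAL) is available;
the name and signature are mine, not those of the repo above:

PROCEDURE shl ( n, bits : CARDINAL ) : CARDINAL;
(* shifts n left by bits, discarding bits shifted out *)
VAR
  index : CARDINAL;
BEGIN
  FOR index := 1 TO bits DO
    (* discard the most significant bit before doubling,
       emulating the wrap-around of a hardware shift and
       avoiding overflow of the multiplication *)
    IF n > MAX(CARDINAL) DIV 2 THEN
      n := n - (MAX(CARDINAL) DIV 2 + 1)
    END; (* IF *)
    n := n * 2
  END; (* FOR *)
  RETURN n
END shl;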
The alternatives are clear - either support one specific
environment, and make it clear that you are doing so, or else
support as many as you can, again making it clear that it might not
be portable to non-supported environments. The latter is certainly
preferable, but is it really practical?
Ideally, you write one set of platform agnostic implementations which
may sacrifice efficiency for portability, and only write platform
specific implementations for cases where either platform agnosticity is
not possible, or where the efficiency gain of a platform specific
implementation justifies the extra effort.
There is also the aspect of reputation. You put your code into a
public repository with some kind of open source license and
eventually somebody may use it with a different compiler and/or
different dialect.
True, but as I said earlier, it is impossible to predict all of the
possible environments someone might try to use a program in. One can
try to support as many environments as possible, but not everything.
If you design for portability in the manner I have outlined, people who
will use your code and port it to a different platform (be it a
different dialect, compiler or target) will take advantage of this
design. They will follow your platform agnostic API. That way, they will
complete your library by adding the missing platform specific
implementations that you had neither the time nor the motivation to
provide. And
you'll get the kudos for having designed the API in a forward looking way.
However, given that this was intended as a support library for
a particular use case of my own, it leads me to ask whether it
should be public at all, or if it would be better to make the repo
private. I don't want to do that if my code could be useful to
others, but at the same time I have no expectation of maintaining
this indefinitely.
If you write the code in the manner I have outlined, it will be much
easier for somebody else to take over from you when the time comes and
continue to maintain it. The more specific it is, and the less
intuitive, the lower the chances of that.
I have no particular commitment to Modula-2 as a language - I am
primarily a Lisper, and was undertaking this mainly as an
interesting stretch of my skills in a language I had been curious
about since the 1980s - and don't see this as something which very
many people even within the Modula-2 community would have much
interest in. Am I wrong about that?
I think a Unicode library would find great interest. A JSON parser
should also be of interest.
Given the lack of UNICODE support in the language itself (especially
the lack of string literal support), a UNICODE library is of only
limited use. I am writing this for a specific purpose, and I am not
certain if it would be of general applicability.
It wouldn't be of general applicability for using Unicode within
PIM/ISO Modula-2 source text itself; for that, the dialects would
need to support Unicode string literals. But it would certainly be
of general
applicability in using the library for developing Unicode supporting
applications in PIM/ISO Modula-2.
I was posting on this mailing list mainly to get support in using
gm2, not for general language support, though as you know I
certainly have needed that as well. I wasn't even aware of the
discrepancies between the different standards when I first posted
here. My knowledge of Modula-2 is still fairly limited; the only
reason I made the repos public was because that is the default on
GitHub, and I kept them public to facilitate getting outside help.
I didn't mean to discourage you. I meant to encourage you ;-)
As far as I was aware when I started this project, gm2 was the only
Modula-2 compiler in active support, and I was frankly surprised
that even that was the case when I heard about it - I hadn't done
anything with the language before in part because every other
Modula-2 compiler I'd ever heard of was an expensive commercial
product that hadn't been updated since the 1990s. I now know that
this isn't the case, but I hadn't known that when I began this trek.
However, far from simplifying matters, this just makes it more
complicated.
p1 Modula-2 (supports ISO) is also in active development and support.
And then there is the ACK Modula-2 compiler (supports PIM), which is
part of a compiler kit (like GCC) originally from the Vrije
Universiteit Amsterdam. It was abandoned for a number of years but
has found new maintainers and is now actively maintained and
supported again.
The sum of all this is leading me to question whether to proceed at all.
I would encourage you to proceed, but also consider my advice on
portability.
As far as the Unicode library is concerned, I am quite happy to help
out a bit, although I do not have a lot of time, so this might be
more in the form of API review and advice, the odd contribution of
library code that I have already written and have lying around
elsewhere, etc.
regards
benjamin