On Sun, 17 Mar 2024 at 04:37, Alice Osako wrote:
Well, ISO is as much a legacy standard as PIM.
Is there a more recent promulgated standard, then?
There is Wirth's successor language to Modula-2, called Oberon, and
there is a revised Modula-2 by Rick Sutcliffe and myself.
Oberon is very much a stripped down Modula-2 in which extensible
records replace variant records and support both the static and
dynamic dispatch paradigms of OOP. Unfortunately, it is overly simplistic,
which led to balkanisation with a large number of dialects. But there
are quite a few compilers available supporting one or another of these
dialects. However, to my knowledge there is no GCC support.
Our Modula-2 revision is a modern language derived from PIM4 with
Oberon-style extensible records replacing variant records, unnecessary
and outdated features removed and modern features added to increase
expressive power without increasing the footprint of the core language.
To avoid balkanisation, it provides the means to bind library functions
to built-in syntax and thereby elevate user defined types to first
class status. It also defines interfacing to C, the JVM and the CLR.
Unfortunately, our compiler is still a work in progress as this is an
unfunded pet project on which we can only work sporadically. Gaius has
pledged support in GM2, which is likely to appear in a "one feature at a
time" manner.
Perhaps it is noteworthy that we have an agreement with Springer for a
fifth edition of Wirth's "Programming in Modula-2" based on our core
language definition and thus this is poised to become PIM5. Springer
wanted to keep the same structure as in the previous editions with two
parts, (1) a tutorial part and (2) a language report part, where Rick
was to write/edit the former, and I the latter. The language report is
finished, but the tutorial part is not, as Rick had a tragedy in his family that
required his full attention and I didn't feel like bothering him about
it since, after all, I haven't finished the compiler yet either. But
we'll get there eventually.
I'd like to point out that it is possible to write portable code in
Modula-2 in the manner I advocated on this list before and again
mentioned in my comments on your recent posts, and in doing so the code
will be fairly straightforward to migrate to M2R10/PIM5.
One supported by the GCC implementation?
As mentioned, Oberon isn't and M2R10/PIM5 isn't yet.
Pretty much all other modifications to the core language made
things worse.
I am not well versed enough in the language at this time to judge
how the ISO changes made things worse. Can you give me some examples
of this?
Having spent an entire decade revising the language, I have looked at
this in depth and could write an entire book on what is
wrong with ISO M2. Rick was on the ISO standards committee from its
early days until the very end and he was the designer and editor of the
ISO I/O library and the generics extension. I had briefly participated
in the committee myself. We both share the view that ISO M2 turned out
to be that very same thing all the participants had hoped to avoid,
pretty much repeating the folly of the Algol committee that led to
Algol-68. In its defense, ISO M2 is not quite as bad as Algol-68, yet it
had the same effect: it pretty much killed the language.
I had already mentioned the FOR loop semantics, which is a less evil
example of how ISO went wrong. The intentions were good in that the
working group didn't want to leave the semantics undefined. But the
way in which this was then done is not much better than leaving it
undefined in the first place. The loop variable is still syntactically
accessible outside the loop body even though its value is semantically
undefined. In our revision we solved this by making the loop body the
scope of the loop variable: it does not exist outside the loop body.
To this end, the loop variable is not declared in a VAR section but
right inside the loop header.
(* foo is not defined yet *)
FOR foo IN bar DO (* <= foo is defined in the loop header *)
(* foo is in scope here *)
END; (* FOR *)
(* foo is no longer in scope here *)
Far more evil are the BITSET type and bitwise operations.
Suppose you need to implement a hash function that operates on strings.
For this you iterate over all characters in the string and apply various
operations where the operands are each character and the eventual result
is accumulated in a temporary value. The operations used for hashing are
typically addition and subtraction ignoring over- or underflow, shifts,
rotations, logical NOT, AND and OR. None of these operations are
permitted on type CHAR, and addition and subtraction aren't permitted on
BITSET either. So we need to import from SYSTEM to use CAST and this
will clutter our hash function as we have to cast forwards and backwards
between CARDINAL, BITSET and CHAR, depending on what operation is used
in the composite expression that calculates the cumulative hash value.
The code quickly becomes unreadable and difficult to maintain,
significantly increasing the opportunity for error.
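To illustrate, here is a minimal sketch of such a hash function in
ISO M2. The mixing formula is made up for illustration only, and it
assumes that BITSET and CARDINAL have the same width on the target:

FROM SYSTEM IMPORT CAST, ROTATE;

PROCEDURE Hash ( s : ARRAY OF CHAR ) : CARDINAL;
VAR
  index : CARDINAL;
  hash : BITSET;
BEGIN
  hash := BITSET{}; (* empty set, all bits zero *)
  index := 0;
  WHILE (index <= HIGH(s)) AND (s[index] # CHR(0)) DO
    (* rotation is only permitted on BITSET ... *)
    hash := ROTATE(hash, 5);
    (* ... but addition is not, so we must cast to CARDINAL and
       back, with ORD() on top to obtain a CARDINAL from the CHAR;
       possible overflow is ignored here for brevity *)
    hash := CAST(BITSET, CAST(CARDINAL, hash) + ORD(s[index]));
    INC(index)
  END; (* WHILE *)
  RETURN CAST(CARDINAL, hash)
END Hash;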
Why are shifts and rotations only permitted on BITSET when they are in
module SYSTEM anyway? We have already crossed the line into potentially
unsafe territory. Nothing is gained by restricting these operations to
BITSET. It does not add any safety. We already left safety behind. So
why not permit shifts and rotations and other bit manipulations at least
on machine types LOC and WORD? At least we could cut down on the number
of cast operations then. But if these operations are imported from
SYSTEM anyway, they might as well be permitted on any type. There is no
safety to be gained by restricting them to any particular type. The only
outcome is an increase in the number of cast operations and thus clutter.
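For contrast, if these operations were simply permitted on CARDINAL
(hypothetical, valid in neither PIM nor ISO), the entire mixing step
in the sketch above would shrink to a single line without any casts:

hash := ROTATE(hash, 5) + ORD(s[index]);

Not one bit of safety would be lost compared to the ISO version.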
Then there are a number of issues with features that are of academic
interest only and totally useless in practice.
For example, in PIM, which was designed in 1978, there are lexical
synonyms ~ for NOT and & for AND. This was already inconsistent:
there is no such synonym for OR because | was already in use as a
separator in case label lists. The use of synonyms is bad design to
begin with. Everything should have only a single syntax form. If the
designer believes it is of such importance to save the programmer two
extra keystrokes, then there should be single character symbols only,
applied consistently to all logical operations, for example ~ for
NOT, & for AND and | for OR, in which case the designer should look
for an alternative case label separator, such as a double semicolon
;; or whatever else. If the issue isn't considered important enough
to change the case label separator, then the single character
synonyms should be dropped entirely. It should be either all single
character symbols and nothing but single character symbols, or all
reserved word symbols and nothing but reserved word symbols.
Consistency is far more important than saving some lazy arse
programmer one or two keystrokes.
And yet, as late as 1988, the ISO working group felt it necessary to
inflate this practice by introducing ! as a synonym for | and @ as a
synonym for ^, as if, more than 20 years after the introduction of
the ASCII character set, there was a need to accommodate dinosaur
hardware with 5 or 6 bit character sets that might not include | and
^. Entirely academic. Totally useless in practice. Not only is it
useless, but it also makes it far more difficult to later assign
these symbols to other, far more practical uses. For example,
Modula-2, like most Pascal family languages, does not have
single-line comments. Modern Fortran uses ! as a single-line comment
prefix, which is a very good choice as it allows the insertion of
function header specifications that stand out as documentation blocks
when they all start with ! at the very left. This would have been a
much better use for the ! character, but ISO reserves it for 1950s
hardware with 5 or 6 bit character sets. This might have been
understandable if ISO M2 had been defined in the early 1960s, but
certainly not in 1988.
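To illustrate, a function header documented with such hypothetical
single-line comments (valid in neither PIM nor ISO M2) might have
looked like this:

! Hash
! computes a hash value over the characters of string s
! the result is independent of compiler and target
PROCEDURE Hash ( s : ARRAY OF CHAR ) : CARDINAL;

Such blocks stand out immediately when scanning the source.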
Similarly, ISO leaves the bitwidth of the smallest addressable unit
type LOC implementation defined. This too would have been
understandable in the 1960s, but not in 1988. By 1988 it was 100%
foreseeable that all silicon would forever be based on units whose
size is a multiple of eight. Yet again, a ridiculous decision due to
being stuck in a 1950s/1960s mindset. And if that mindset had not
existed, if it had been accepted that the future belongs to multiples
of eight, then the name of the smallest addressable unit type would
have been OCTET, not LOC, thus making it self-explanatory and leading
to better readability of the code, to say nothing of the
implementation and portability issues that come with an
implementation defined unit.
Plenty of chances were missed to remove outdated features and add
features for more modern requirements in their place. In Oberon,
Wirth followed the approach "How can I reduce the feature set to the
absolute minimum that I can get away with?" In our revision, we
followed the approach "How can we keep the size of the language about
the same but increase its expressive power and utility to the
absolute maximum doable with that given footprint?" ISO M2 followed
the opposite approach, allowing feature creep.
Built-in types COMPLEX and BCD were correctly rejected early on by the
working group. As a mathematician, Rick had advocated COMPLEX, while p1
Modula-2 implementor and maintainer Albert Wiedemann and I had tabled a
proposal for BCD. The working group explained to us that if we got BCD,
then Rick would have to get COMPLEX and eventually somebody else would
want even more built-in types, and that a line had to be drawn
somewhere. For the sake of keeping the language lean, we then withdrew
our proposal much to Rick's disappointment. However, Rick stuck around
for long enough to sneak COMPLEX back in later when most members had
lost interest and resistance had faded. Albert stuck around to the end
as well, but I didn't, so I don't know why he didn't push for the
inclusion of BCD at that point. It is part of his ISO M2 compiler as an
extension though.
In hindsight, Rick and I realised that this was a bad thing. In our
revision we provide a feature called syntax binding which allows user
defined types to be used like built-in types except for the need to
import the library that implements them. With this general feature it is
possible to keep the language lean but have library defined COMPLEX and
BCD types that look just as if they were built-in.
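As a rough illustration, a library defined BCD type could then be
declared along these lines; this is only a sketch, not the exact
syntax of our revision:

DEFINITION MODULE BCD;

TYPE BCD; (* opaque *)

(* bind library procedures to built-in operator syntax *)
PROCEDURE [+] add ( a, b : BCD ) : BCD;
PROCEDURE [-] sub ( a, b : BCD ) : BCD;
PROCEDURE [*] mul ( a, b : BCD ) : BCD;

END BCD.

Client code imports BCD and then simply writes c := a + b; on BCD
operands, just as if BCD was a built-in type.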
Then there is the ISO way of doing COROUTINEs. It is incompatible with
PIM, but without any real gain. Neither is it more user friendly, nor is
it more powerful. The way COROUTINEs are done in both PIM and ISO is
crude, almost assembler like. The Lua language is an example of how
to do COROUTINEs in a user friendly and powerful way. It is the
subject of a seminal paper by Roberto Ierusalimschy (the principal
designer, implementor and maintainer of Lua) and his co-author whose
name escapes me right now. Again, ISO missed a chance to improve
things and instead made them worse. Thanks to Roberto we didn't have
to come up with an entirely new approach for coroutines in our
revision; we adapted his approach to Modula-2.
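For comparison, this is the crude PIM way, sketched from memory
following PIM2, where PROCESS and SIZE are exported by SYSTEM (later
editions use ADDRESS instead of PROCESS). The caller allocates a raw
workspace of guessed size and control is transferred explicitly in
both directions:

FROM SYSTEM IMPORT PROCESS, NEWPROCESS, TRANSFER, ADR, SIZE, WORD;

VAR
  main, worker : PROCESS;
  wsp : ARRAY [0 .. 1023] OF WORD; (* workspace size is guesswork *)

PROCEDURE WorkerBody;
BEGIN
  (* ... first slice of work ... *)
  TRANSFER(worker, main); (* hand control back, assembler style *)
  (* ... more work, more transfers ... *)
END WorkerBody;

(* in the body of the enclosing program module *)
BEGIN
  NEWPROCESS(WorkerBody, ADR(wsp), SIZE(wsp), worker);
  TRANSFER(main, worker)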
There are many other issues with ISO M2 which are only apparent when
getting into fair detail and I will therefore refrain from any
discussion of those here.
Again, I am not familiar enough with the library to judge.
The by far biggest problem, however, is the ISO library, and in
particular the I/O library. There, the committee dynamics show even
more: everybody had some pet issue with the proposed library APIs,
and those were then tweaked until everybody would agree. Not to
improve the API or its functionality, but simply to get approval.
The worst that committee design has to offer.
Rick was the designer and editor of the ISO I/O library and he has
taught ISO M2 at his university for 20+ years. He says the ISO I/O
library is unteachable to undergraduate students. It is overly complex
and has circular dependencies. You cannot introduce a basic concept and
then build upon it. Students need to understand the whole thing before
you can teach any of its components. And not surprisingly, this makes it
cumbersome to use.
Why shouldn't I/O be as simple as this:
IMPORT BCD;
IMPORT PervasiveIO;
VAR a, b, c : BCD;
READ "Enter a: ", a, "\nEnter b: ", b;
c := a + b;
WRITE "\nSum: ", #("5;2", c), "\n";
or with specified input and output streams
READ @infile: "Enter a: ", a, "\nEnter b: ", b;
c := a + b;
WRITE @outfile: #("5;2", a), " + ", #("5;2", b), " = ", #("5;2", c), "\n";
instead of importing from different layers of the API, wondering
which layer to use, having to write tons of boilerplate code, then
having to call several WriteThis(), WriteThat(), WriteSomethingElse()
functions, with each type requiring memorisation of yet another set
of I/O functions. What a holy mess.
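For comparison, the same dialogue written against the ISO library
looks roughly like this, from memory and using CARDINAL since there
is no BCD; I may well have misremembered some procedure names, which
rather proves the point about memorisation:

FROM STextIO IMPORT WriteString, WriteLn, SkipLine;
FROM SWholeIO IMPORT ReadCard, WriteCard;

VAR a, b, c : CARDINAL;

(* module body *)
BEGIN
  WriteString("Enter a: "); ReadCard(a); SkipLine;
  WriteString("Enter b: "); ReadCard(b); SkipLine;
  c := a + b;
  WriteString("Sum: "); WriteCard(c, 0); WriteLn

And this is only the simple, default-channel layer; as soon as files
and channels enter the picture, the boilerplate multiplies.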
Even so, it doesn't cover much of the functionality needed in our
day and age.
That is inevitable in a language which hasn't been updated in 30
years, to be sure.
But that is only an explanation, not a justification.
In any event, writing portably usually leads to cleaner code and
fewer bugs, regardless of the language used. This is so because
writing portable code typically requires the use of abstraction
layers and designing the API from a functional point of view.
I agree, but only to a point. While it is certainly true that a
better designed API results in better client code, writing a library
for portability generally leads to significantly less clear internal
code, as it invariably means the use of special cases, and often
means applying conditional compilation and/or separate versions of
the library for separate circumstances.
You are thinking of a C style, macro based approach to portable
coding where all the different scenarios are bundled into a single
implementation.
That is diametrically opposite to the philosophy of a modular language
like Modula-2.
To write portable code in a modular way, you first design a platform
agnostic API. Where it is possible to implement that API without using
dialect or implementation or target specific features and syntax, you do
that. And where this is not possible, you write separate platform
specific implementations. Each of these will be lean and clean, readable
and maintainable. Plus, if you need to migrate the library to another
dialect, compiler or target, it is possible to do that with minimal
effort since all the platform agnostic code remains in place, and you
only need to write those platform specific implementations which will
seamlessly fit into the whole architecture since they conform to the
same API.
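As a minimal sketch of the pattern, with hypothetical names: one
platform agnostic definition module, and one implementation module
per target, selected at build time:

DEFINITION MODULE FileInfo; (* platform agnostic API *)

PROCEDURE exists ( path : ARRAY OF CHAR ) : BOOLEAN;
(* Returns TRUE if a file exists at path, FALSE otherwise *)

END FileInfo.

One IMPLEMENTATION MODULE FileInfo is then written per target, say
one on top of the POSIX interface and one on top of Win32, each lean
and clean. Only one of them is compiled and linked for a given
target, and client code never sees the difference.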
While better use of abstraction layers internally can mitigate this,
it doesn't avoid it - at best it can isolate the non-portable
sections more carefully. While this is worthwhile ...
More importantly, you can only write a program to be cleanly
portable to a system/compiler/dialect you know exists.
Not necessarily.
Between PIM and ISO, it is mostly possible to write dialect agnostic
code for most if not all use cases except for casting. This does
require a collection of libraries to replace certain built-in
functions, like DIV and MOD when used with signed types, and
low-level functions for bit manipulation that are entirely based on
basic math instead of using bit manipulation facilities provided by
a dialect or compiler extension. However, once you have these
libraries in place, it is all smooth sailing from there. You can
also reuse the code in the GitHub repo I posted links to before.
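As a taste of what such a library function looks like, here is a
sketch of a dialect agnostic shift-left using only plain arithmetic,
assuming a PIM3 or later dialect where MAX(CARDINAL) is available;
the name and signature are mine, not those of the repo above:

PROCEDURE shl ( n, bits : CARDINAL ) : CARDINAL;
(* shifts n left by bits, discarding bits shifted out *)
VAR
  index : CARDINAL;
BEGIN
  FOR index := 1 TO bits DO
    (* discard the most significant bit before doubling,
       emulating the wrap-around of a hardware shift and
       avoiding overflow of the multiplication *)
    IF n > MAX(CARDINAL) DIV 2 THEN
      n := n - (MAX(CARDINAL) DIV 2 + 1)
    END; (* IF *)
    n := n * 2
  END; (* FOR *)
  RETURN n
END shl;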
The alternatives are clear - either support one specific
environment, and make it clear that you are doing so, or else
support as many as you can, again making it clear that it might not
be portable to non-supported environments. The latter is certainly
preferable, but is it really practical?
Ideally, you write one set of platform agnostic implementations which
may sacrifice efficiency for portability, and only write platform
specific implementations for cases where either platform agnosticity is
not possible, or where the efficiency gain of a platform specific
implementation justifies the extra effort.
There is also the aspect of reputation. You put your code into a
public repository with some kind of open source license and
eventually somebody may use it with a different compiler and/or
different dialect.
True, but as I said earlier, it is impossible to predict all of the
possible environments someone might try to use a program in. One can
try to support as many environments as possible, but not everything.
If you design for portability in the manner I have outlined, people who
will use your code and port it to a different platform (be it a
different dialect, compiler or target) will take advantage of this
design. They will follow your platform agnostic API. That way, they will
complete your library by adding the missing platform specific
implementations that you had neither the time nor the motivation to
provide. And
you'll get the kudos for having designed the API in a forward looking way.
However, given that this was intended as a support library for
a particular use case of my own, it leads me to ask whether it
should be public at all, or if it would be better to make the repo
private. I don't want to do that if my code could be useful to
others, but at the same time I have no expectation of maintaining
this indefinitely.
If you write the code in the manner I have outlined, it will be much
easier for somebody else to take over from you when the time comes and
continue to maintain it. The more specific it is, and the less
intuitive, the lower the chances of that.
I have no particular commitment to Modula-2 as a language - I am
primarily a Lisper, and was undertaking this mainly as an
interesting stretch of my skills in a language I had been curious
about since the 1980s - and don't see this as something which very
many people even within the Modula-2 community would have much
interest in. Am I wrong about that?
I think a Unicode library would find great interest. A JSON parser
should also be of interest.
Given the lack of UNICODE support in the language itself (especially
the lack of string literal support), a UNICODE library is of only
limited use. I am writing this for a specific purpose, and I am not
certain if it would be of general applicability.
It wouldn't be of general applicability for using Unicode within
PIM/ISO Modula-2 source text itself; for that, the dialects would
need to support Unicode string literals. But it would certainly be
of general
applicability in using the library for developing Unicode supporting
applications in PIM/ISO Modula-2.
I was posting on this mailing list mainly to get support in using
gm2, not for general language support, though as you know I
certainly have needed that as well. I wasn't even aware of the
discrepancies between the different standards when I first posted
here. My knowledge of Modula-2 is still fairly limited; the only
reason I made the repos public was because that is the default on
GitHub, and I kept them public to facilitate getting outside help.
I didn't mean to discourage you. I meant to encourage you ;-)
As far as I was aware when I started this project, gm2 was the only
Modula-2 compiler in active support, and I was frankly surprised
that even that was the case when I heard about it - I hadn't done
anything with the language before in part because every other
Modula-2 compiler I'd ever heard of was an expensive commercial
product that hadn't been updated since the 1990s. I now know that
this isn't the case, but I hadn't known that when I began this trek.
However, far from simplifying matters, this just makes it more
complicated.
p1 Modula-2 (supports ISO) is also in active development and support.
And then there is the ACK Modula-2 compiler (supports PIM), which is
part of a compiler kit (like GCC) originally from the Vrije
Universiteit Amsterdam. It was abandoned for a number of years but
has found new maintainers and is now actively maintained and
supported again.
The sum of all this is leading me to question whether to proceed at all.
I would encourage you to proceed, but also consider my advice on
portability.
As far as the Unicode library is concerned, I am quite happy to help
out a bit, although I do not have a lot of time, so this might be
more in the form of API review and advice, the odd contribution of
library code that I have already written and have lying around
elsewhere, etc.
regards
benjamin