Re: porting efforts

bug-mes

From:	Jan Nieuwenhuizen
Subject:	Re: porting efforts
Date:	Thu, 09 Dec 2021 13:57:08 +0100
User-agent:	Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)

Gabriel Wicki writes:

Hello Gabriel!

> I did not actually intend to write such a huge email, but here we are.
> Not all questions raised here have to be answered, i'm sure i can find
> my way with the right pointers.  But i seem to lack a good bunch of
> experience with x86 assembly and need some insight into MEScc design
> decisions.

Okay, no problem.  I am cc'ing Ekaitz, who has some RISC-V experience
and is planning to work on the RISC-V port (of Mes/Guix) too.

> As you may or may not know i'm in the process of porting MEScc for
> RISC-V as part of my bachelor thesis (yay).

I have seen you active on IRC, but haven't had much time for that
lately.  Sorry to have missed this, that's really great to hear!

I hope that you have seen the "wip-m2" branch that will be rebased and
should be released some time soon as Mes v0.24.  This will probably be
mostly orthogonal to what you are doing, but there may be some changes
that you would need to be aware of: it ports Mes to M2-Planet.

Also, have you seen wlaan's work on Linux system calls in wip-riscv?
That still needs some work before we can merge it: We want it to be
generalized for all architectures (x86, x86_64, arm), i.e., use the
newer system calls everywhere.

> My point of reference is module/mescc/i386/as.scm

Ok.

> Where/how exactly is the mapping between NYACC's output and the
> procedures in as.scm happening?  I'm unable to find references to
> xor-zf et al in the codebase (except in arch specific as.scm).

This happens in module/mescc/compile.scm.  "xor-zf" is used here:

        ((ne ,a ,b) (let* ((info ((binop->r info) a b 'r0-r1))
                           (info (append-text info (wrap-as (as info 'test-r))))
                           (info (append-text info (wrap-as (as info 'xor-zf))))
                           (info (append-text info (wrap-as (as info 'zf->r)))))
                      info))

Here we do an equality test, flip (xor) the zero flag, then move the
zero flag into a register.
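As a rough illustration of that sequence, here is a minimal C model of the
machine state.  It is a sketch, not MesCC code: the `struct cpu` type and the
function names are hypothetical, and it assumes that 'r0-r1 subtracts r1 from
r0 and that 'test-r sets the zero flag when the register is zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the machine state; not a MesCC data structure. */
struct cpu
{
  uint32_t r0;
  uint32_t r1;
  bool zf;                      /* zero flag */
};

/* r0-r1 (assumed): subtract r1 from r0, leaving the result in r0. */
static void r0_minus_r1 (struct cpu *c) { c->r0 -= c->r1; }

/* test-r (assumed): set the zero flag when r0 is zero. */
static void test_r (struct cpu *c) { c->zf = (c->r0 == 0); }

/* xor-zf: flip the zero flag. */
static void xor_zf (struct cpu *c) { c->zf = !c->zf; }

/* zf->r: move the zero flag into the register as 0 or 1. */
static void zf_to_r (struct cpu *c) { c->r0 = c->zf ? 1 : 0; }

/* The four steps of the `ne' case, applied in order. */
uint32_t
model_ne (uint32_t a, uint32_t b)
{
  struct cpu c = { a, b, false };
  r0_minus_r1 (&c);
  test_r (&c);
  xor_zf (&c);
  zf_to_r (&c);
  return c.r0;
}
```

Under these assumptions, model_ne (12, 25) yields 1 and model_ne (7, 7)
yields 0, which is the value C expects from `!=`.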

As an easier example, expr-register has:

      (pmatch o
        ...
        ((bitwise-xor ,a ,b) ((binop->r info) a b 'r0-xor-r1))
        ...)

When preprocessing this file

--8<---------------cut here---------------start------------->8---
int
main ()
{
  return 12 ^ 25; // exits with 21
}
--8<---------------cut here---------------end--------------->8---

with MesCC:

    MES=guile ./pre-inst-env mescc -E  -o/dev/stdout xor.c

(using Guile is faster, gives better errors and gives us pretty printing),
MesCC writes xor.E:

--8<---------------cut here---------------start------------->8---
(trans-unit
  (fctn-defn
    (decl-spec-list (type-spec (fixed-type "int")))
    (ftn-declr (ident "main") (param-list))
    (compd-stmt
      (block-item-list
        (return
          (bitwise-xor
            (p-expr (fixed "12"))
            (p-expr (fixed "25"))))))))
--8<---------------cut here---------------end--------------->8---

> The `info' argument (to almost every procedure) is a piece of output from
> NYACC -- right?

No, but close.  INFO is the accumulated result and holds architecture
flags.  The NYACC input is mostly called "O" (you can see that as
"this").

The toplevel function for the compiler itself is ast->info:

(define (ast->info o info)
  (let ((functions (.functions info))
        (globals (.globals info))
        ...
         )
    (pmatch o
      (((trans-unit . _) . _) (ast-list->info o info))
      ((trans-unit . ,_) (ast-list->info _ info))
      ((fctn-defn . ,_) (fctn-defn->info _ info))
      ...)))

and you can see how this matches any toplevel NYACC "trans-unit"
in O from the xor.E example above and produces (enriches) INFO with
it.

> What are (e->x) and (e->l) used for (i do understand /what/ they do)?

Ah yes, a comment here would really help.  IIRC, these have to do with
details of instruction/register naming on x86.

   %rax - 64bit accumulator (x86_64)
   %eax - 32bit accu (x86, and on x86_64: lower 32 bits)
   %ax -  16bit accu
   %ah -  upper 8 bits
   %al -  lower 8 bits

some instructions working on byte (8bits) or words (16bits) must not use
%eax but %ax, or even %al.  e->x means (go from Eax to ax):

    (string-drop "eax" 1)
    $1 = "ax"
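The same renaming can be sketched in C.  This is only an illustration with
hypothetical function names; the e->x part follows the string-drop shown
above, while the e->l part (Eax to aL: drop the first and last character,
then append "l") is an assumption made by analogy, so check it against
as.scm.

```c
#include <stddef.h>
#include <string.h>

/* e->x: "eax" -> "ax".  Drops the leading character.
   Returns 0 on success, -1 if REG is too short or OUT is too small. */
int
e_to_x (const char *reg, char *out, size_t n)
{
  size_t len = strlen (reg);
  if (len < 2 || len > n)
    return -1;
  memcpy (out, reg + 1, len);   /* len - 1 characters plus the NUL */
  return 0;
}

/* e->l (assumed): "eax" -> "al".  Drops the first and last character,
   then appends 'l'.  Same return convention as e_to_x. */
int
e_to_l (const char *reg, char *out, size_t n)
{
  size_t len = strlen (reg);
  if (len < 3 || len > n)
    return -1;
  memcpy (out, reg + 1, len - 2);
  out[len - 2] = 'l';
  out[len - 1] = '\0';
  return 0;
}
```

With an 8-byte buffer, e_to_x ("eax", buf, sizeof buf) leaves "ax" in buf and
e_to_l ("eax", buf, sizeof buf) leaves "al".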

> Is (* 4 n) - which appears a bunch of times in as.scm  - used for word-
> alignment?

In general, I don't think so.  Because in x86, registers are 4 bytes
(32bits), the byte count needed to store the locals is four times the
number of locals.  So if you want to get local number N, you must use an
N * 4 byte offset:

    (define (i386:local->r info n)
      (let ((r (get-r info))
            (n (- 0 (* 4 n))))
            ...
            `(,(string-append "mov____0x8(%ebp),%" r) (#:immediate1 ,n))
            ...)
the 0x8 (and 0x32) take offset in bytes.

> Why is this very same s-expr negated (- 0 (* 4 n)) i.e. in local-add,
> local->r, et al.?

That's for negative offsets, which are used for local variables.
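Put together, the offset arithmetic from (- 0 (* 4 n)) looks like this in C.
A minimal sketch: WORD_SIZE and local_offset are hypothetical names, and the
4-byte word is the x86 (32-bit) case described above.

```c
#include <stdint.h>

/* Size of one local slot in bytes on 32-bit x86. */
enum { WORD_SIZE = 4 };

/* Byte offset of local number N relative to the frame pointer.
   Locals use negative offsets, mirroring (- 0 (* 4 n)). */
int32_t
local_offset (int32_t n)
{
  return 0 - (WORD_SIZE * n);
}
```

So local 1 sits at offset -4 and local 3 at offset -12.  A port to a 64-bit
target would presumably use a larger slot size, but that is a guess and not
something this thread states.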

> Are i386:r-negate and i386:zf->r identical?  Is one of them obsolete?

Hmm, indeed, they are.  They are currently both still used.  I guess
that's because I think about them being different operations.  Not sure
what to do here.  I guess it's OK to keep them for now...

> What's the comparison with the magic number #x80 used for?  Is this some
> kind of type-check?

For a two's-complement byte value, 0x00-0x7f are positive and 0xff-0x80
are negative.  So, when an offset is "near" (less than 0x80 bytes away), a
byte-offset instruction can be used.  If the offset is 0x81 bytes away,
the instruction needs to use a 4-byte offset, as 0x81 in a single byte
would mean a negative offset of -127 instead.
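The byte range can be checked with a few lines of C.  This is a sketch with
hypothetical helper names, not the test that as.scm performs; it only shows
how a single byte is read as a two's-complement value and which offsets fit.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when OFFSET can be encoded in one signed byte. */
bool
fits_in_signed_byte (int32_t offset)
{
  return offset >= -128 && offset <= 127;
}

/* Read one raw byte as a two's-complement value:
   0x00..0x7f stay as they are, 0x80..0xff map to -128..-1. */
int32_t
as_signed_byte (uint8_t raw)
{
  return raw < 0x80 ? (int32_t) raw : (int32_t) raw - 0x100;
}
```

Here as_signed_byte (0x7f) is 127, as_signed_byte (0x80) is -128 and
as_signed_byte (0x81) is -127, so fits_in_signed_byte (0x81) is false and the
longer encoding is needed.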

> What is 0x32 (in local-add and others)? Is it offset decimal 50?

No, it is part of the Intel instruction.  It says to use 32-bit offsets.

> What do the following mean/do (i wrote my guesses so you may just reply
> with yes if they are correct):
>
>  - r-mem-add, r-byte-mem-add, r-word-mem-add :: directly add to a value
>    in memory?

yes

>  - local-ptr->r :: load pointer (from stack) to register?

yes, get a pointer to a local variable (which lives in memory on the
stack), so that we can use it, e.g., to assign to that local.

>  - label-r :: load label-address (from stack) to register?

yes, but not from stack.  the LABEL is a name that will later be
resolved.

>  - movzbl :: zero-extend a byte to long?

yes

>  - byte-mem->r :: load a byte from memory to a register

yes

>  - byte-r, byte-signed-r, word-r, word-signed-r :: load byte/word into
>    register?

yes

>  - byte-r0->r1-mem :: save the content in r0 to the address in r1?

yes, but only the lower byte.  Do not overwrite any other bits in
memory.

>  - local-add :: add immediate directly in memory? what are n and v?

yes.  N is the number of the local, V is the immediate value.

>  - local-mem-add :: add immediate to memory location at label?

Yes.

>  - swap-r0-r1 :: swap the contents of two registers? what is this used
>    for?

Yes.  On x86, a lot of instructions can only be done on the accumulator
(R0).  Imagine a calculation is done using R1 and R0 and the result is
in R0, but we need to do something with R1 to calculate the end result.

>  - r0*r1 :: simple multiplication?
>  - r0/r1 :: simple division?

Yes.
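For the byte-oriented entries in the list above, the intended effect on
values can be modelled in C.  This is a sketch with hypothetical names that
models only the data, not the instructions MesCC emits; reading byte-signed-r
as a sign-extending load is an assumption based on its name.

```c
#include <stdint.h>

/* movzbl / byte-r style: keep the low byte, clear the upper 24 bits. */
uint32_t
zero_extend_byte (uint32_t r)
{
  return r & 0xff;
}

/* byte-signed-r style (assumed): keep the low byte and copy its sign
   bit into the upper bits. */
int32_t
sign_extend_byte (uint32_t r)
{
  uint32_t b = r & 0xff;
  return b < 0x80 ? (int32_t) b : (int32_t) b - 0x100;
}

/* byte-r0->r1-mem style: store only the low byte of R0 at MEM and
   leave the neighbouring bytes untouched. */
void
store_low_byte (uint8_t *mem, uint32_t r0)
{
  mem[0] = (uint8_t) (r0 & 0xff);
}
```

For instance, storing 0xdeadbeef through store_low_byte changes only the
first byte of the destination to 0xef.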

> The RISC-V port might need to go a bit more in-depth than previous ones,
> mostly due to the non-"standard" architecture (more registers, no
> flags).

Ouch, no flags; that could become interesting.  Have you looked at how
Stage 0 (M2-Planet) does this?

> The question remains if i should just dedicate a register or
> two for zero- and carry-flags or if the operations should just work
> entirely differently -- instead of setting a zero flag somewhere in code
> and then eventually do a jump-z the RISC-V way is to simply BEQZ (branch
> if equal to zero).

That sounds like a feasible plan, but I do not know anything about
RISC-V yet.  Ekaitz?
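For reference, one flag-free way to get the result of a comparison into a
register is the RISC-V idiom `xor` followed by `snez` or `seqz`; these are
standard assembler pseudo-instructions for `sltu rd, zero, rs` and
`sltiu rd, rs, 1`.  The C below only models that arithmetic.  The function
names are hypothetical, and this is one possible approach rather than a
decision about how the MesCC port should work.

```c
#include <stdint.h>

/* Model of RISC-V `sltu rd, rs1, rs2': rd = 1 if rs1 < rs2 (unsigned). */
static uint32_t
sltu (uint32_t rs1, uint32_t rs2)
{
  return rs1 < rs2 ? 1u : 0u;
}

/* a != b:  xor t0, a, b ;  snez t0, t0   (sltu t0, zero, t0) */
uint32_t
riscv_ne (uint32_t a, uint32_t b)
{
  return sltu (0, a ^ b);
}

/* a == b:  xor t0, a, b ;  seqz t0, t0   (sltiu t0, t0, 1) */
uint32_t
riscv_eq (uint32_t a, uint32_t b)
{
  return sltu (a ^ b, 1);
}
```

riscv_ne (12, 25) gives 1 and riscv_eq (7, 7) gives 1, with no flag register
involved; a following conditional jump could then branch on that register
being zero.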

> Quarters, Eurocents or really tiny fractions of digital currencies for
> your thoughts!

Phew, kudos for even formulating these questions.  It would be nice to
add some documentation/comments to make the code more clear.  Also, you
could have a look at arm.scm, it already has some comments.

I have used some resources on the internet a lot for looking up x86
instructions.  Also, I have played a lot with gcc and gdb.  Compile a
trivial program, look at gcc's assembly, use gdb to step through
instruction by instruction and look at the registers.

Greetings,
Janneke

-- 
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.com
