poke-devel

[POKOLOGY] Endianness in Poke - And a little nice hack


From: Jose E. Marchesi
Subject: [POKOLOGY] Endianness in Poke - And a little nice hack
Date: Thu, 24 Oct 2019 22:53:47 +0200

[This is also published as an article in the Applied Pokology blog
 http://www.jemarch.net/pokology]

Byte endianness is an important aspect of encoding data.  As a good
binary editor, poke supports both little and big endian, and
will soon acquire the ability to encode exotic endianness like PDP
endian.  Endianness control is integrated in the Poke language, and is
designed to be used in type descriptions.  Let's see how.

GNU poke maintains a global variable that holds the current endianness.
This is the endianness that will be used when mapping integers whose
types do not specify an explicit endianness.

Like other poke global state, this global variable can be
modified using the .set dot-command:

  .set endian little
  .set endian big
  .set endian host

We can easily see how changing the current endianness indeed impacts the
way integers are mapped:

  (poke) dump :from 0#B :size 4#B :ruler 0 :ascii 0
  00000000: 8845 4c46
  (poke) .set endian little
  (poke) int @ 0#B
  0x464c4588
  (poke) .set endian big
  (poke) int @ 0#B
  0x88454c46
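
The same two readings can be reproduced outside poke.  The following
is a minimal Python sketch, used here only for illustration and not part
of poke itself.  It decodes the four bytes from the dump above in both
byte orders with the standard int.from_bytes; the variable names are
made up for this example.

```python
# The four bytes shown by the dump above, in file order.
data = bytes([0x88, 0x45, 0x4c, 0x46])

# Little endian: the first byte in the file is the least significant one.
as_little = int.from_bytes(data, byteorder="little")

# Big endian: the first byte in the file is the most significant one.
as_big = int.from_bytes(data, byteorder="big")

print(hex(as_little))  # 0x464c4588
print(hex(as_big))     # 0x88454c46
```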

However, as handy as this dot-command may be, it is also important to be
able to change the current endianness programmatically from a Poke
program.  For that purpose, the PKL compiler provides a couple of
built-in functions: get_endian and set_endian.

Their definitions, along with the specific supported values, look like:

  defvar ENDIAN_LITTLE = 0;
  defvar ENDIAN_BIG = 1;
      
  defun get_endian = int: { ... }
  defun set_endian = (int endian) int: { ... }      


Accessing the current endianness programmatically is especially useful
in situations where the data being poked features a different structure,
depending on the endianness.

A good (or bad) example of this is the way registers are encoded in eBPF
instructions.  eBPF is the in-kernel virtual machine of Linux, and
features an ISA with ten general-purpose registers.  eBPF instructions
generally use two registers, namely the source register and the
destination register.  Each register is encoded using 4 bits, and the
fields encoding registers are consecutive in the instructions.

Typical.  However, for reasons I won't be discussing right now (because
I'm having a nice night and don't want to ruin it) the order of the
source and destination register fields is switched depending on the
endianness.

In big-endian systems the order is:

  dst:4 src:4

whereas in little-endian systems the order is:

  src:4 dst:4

In Poke, the obvious way of representing data whose structure depends on
some condition is using a union.  In this case, it could read like
this:

  deftype BPF_Insn_Regs =
    union
    {
      struct
      {
        BPF_Reg src;
        BPF_Reg dst;
      } le : get_endian == ENDIAN_LITTLE;
  
      struct
      {
        BPF_Reg dst;
        BPF_Reg src;
      } be;
    };

Note the call to the get_endian function (which takes no arguments and
thus can be called Algol68-style, without specifying an empty argument
list) in the constraint of the union alternative.  This way, the
register fields will have the right order corresponding to the current
endianness.
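
To make the two layouts concrete, here is a small Python sketch (again
outside poke, for illustration only) that extracts both registers from
the byte holding them.  The function name decode_regs is hypothetical,
and the sketch assumes that the first field listed in each layout
occupies the most significant four bits of the byte.

```python
def decode_regs(byte, little):
    """Return (src, dst) from the byte holding both 4-bit register fields.

    Assumption: the field listed first in a layout occupies the most
    significant four bits of the byte.
    """
    first = (byte >> 4) & 0xF   # field at bit offset 0
    second = byte & 0xF         # field at bit offset 4
    if little:
        return first, second    # layout is src:4 dst:4
    return second, first        # layout is dst:4 src:4


print(decode_regs(0x12, little=True))   # (1, 2)
print(decode_regs(0x12, little=False))  # (2, 1)
```

The same byte thus yields swapped register numbers depending on the
endianness, which is exactly the situation the union has to handle.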

Nifty.  However, there is an even better way to denote the
structure of these fields.  This is it:

  deftype BPF_Insn_Regs =
    struct
    {
      defvar little_p = (get_endian == ENDIAN_LITTLE);
      
      BPF_Reg src @ !little_p * 4#b;
      BPF_Reg dst @ little_p * 4#b;
    };

This version, where the ordering of the fields is implemented using
field labels, is not only more compact, but also has the virtue of not
requiring additional "intermediate" fields like `le' and `be' above.  It
also shows how convenient it is to be able to define variables inside
structs.

Changing the current endianness in constraint expressions is useful when
dealing with binary formats that specify the endianness of the data that
follows using some sort of tag.  This is the case of ELF, for example.

The first few bytes in an ELF header make up what is known as the
e_ident.  One of these bytes is called ei_data and its value specifies
the endianness of the data stored in the ELF file.

This is how we handle this in Poke:

  defun elf_endian = (byte ei_data) int:
   {
     /* Map the ELF ei_data tag to the matching poke endianness.  */
     if (ei_data == ELFDATA2LSB)
       return ENDIAN_LITTLE;
     else
       return ENDIAN_BIG;
   }
  
  [...]
  
  struct
  {
    byte[4] ei_mag : ei_mag[0] == 0x7fUB
                     && ei_mag[1] == 'E'
                     && ei_mag[2] == 'L'
                     && ei_mag[3] == 'F';
    byte ei_class;
    byte ei_data : (ei_data != ELFDATANONE
                    && set_endian (elf_endian (ei_data)));
    byte ei_version;
    byte ei_osabi;
    byte ei_abiversion;
    byte[6] ei_pad;
    offset<byte,B> ei_nident;
  } e_ident;

Note that set_endian returns an integer value, which is always 1.  This
makes it easy to use in field constraint expressions: the call never
makes the constraint fail, and its side effect changes the current
endianness for the data that follows.
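
The role of that return value can be sketched in Python (not Poke; an
illustration under stated assumptions).  The module-level variable
_current_endian and the function ei_data_constraint are hypothetical
stand-ins for poke's global state and for the ei_data constraint; the
initial value chosen for the variable is arbitrary.  The ELFDATA*
values are the ones defined by the ELF specification.

```python
ENDIAN_LITTLE = 0
ENDIAN_BIG = 1

# ei_data values, as defined by the ELF specification.
ELFDATANONE = 0
ELFDATA2LSB = 1
ELFDATA2MSB = 2

# Hypothetical stand-in for poke's global endianness state.
_current_endian = ENDIAN_BIG


def get_endian():
    return _current_endian


def set_endian(endian):
    """Change the current endianness and return 1, so that the call can
    be chained with `and` in a constraint without making it fail."""
    global _current_endian
    _current_endian = endian
    return 1


def ei_data_constraint(ei_data):
    """Mimic the ei_data constraint: reject ELFDATANONE, otherwise
    switch the endianness as a side effect."""
    wanted = ENDIAN_LITTLE if ei_data == ELFDATA2LSB else ENDIAN_BIG
    return bool(ei_data != ELFDATANONE and set_endian(wanted))
```

Because `and` short-circuits, set_endian is not called at all when
ei_data is ELFDATANONE: the constraint fails and the current endianness
is left untouched.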

Happy poking! :)


