[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Libjit] Register allocation optimizations.
From: |
Aleksey Demakov |
Subject: |
Re: [Libjit] Register allocation optimizations. |
Date: |
Thu, 9 Mar 2017 20:08:34 +0700 |
Hi Rajan,
Libjit uses very simple register allocation algorithm that spills
registers before any branch. Implementing a better algorithm is
certainly possible.
Regards,
Aleksey
On Thu, Mar 9, 2017 at 4:24 AM, Rajan Walia <address@hidden> wrote:
> Hi,
> I am using libjit to compile code at runtime and I wanted to know if
> there is a way to get better register allocation. I was looking at the
> assembly output of a normal dotproduct function which takes pointer to
> two array and size, loops through the two array at the same time and
> sums the product at each position.
> ----------- similar c code-------
> uint dotproduct (uint* arr1, uint* arr2, uint size) {
> uint sum = 0;
> uint i = 0;
> while (i<size) {
> sum = sum + arr1 [i] + arr2 [i];
> i = i+1;
> }
> return sum;
> }
> ------------------------------------------------
> The three address code
> function dot-product(l1 : ptr, l2 : ptr, i3 : uint) : uint
> incoming_reg(l1, rdi)
> incoming_reg(l2, rsi)
> incoming_reg(i3, rdx)
> .L:
> i8 = 0
> i7 = 0
> .L0:
> i11 = ilt_un(i8, i3)
> if ige_un(i8, i3) then goto .L1
> .L:
> i14 = i8 * 4
> i15 = trunc_int(i14)
> i12 = i15
> l16 = expand_int(i12)
> l17 = l1 + l16
> i19 = load_relative_int(l17, 0)
> l20 = expand_int(i12)
> l21 = l2 + l20
> i22 = load_relative_int(l21, 0)
> i23 = i19 * i22
> i24 = i7 + i23
> i7 = i24
> i26 = i8 + 1
> i8 = i26
> goto .L0
> ends_in_dead
> .L:
> .L1:
> return_int(i7)
> ends_in_dead
> .L:
> .L:
> end
> -------------------------------------------
> and compiled assembly code:
> ptrstruct<16>ptrfunction dot-product(ptr, ptr, uint) : uint
>
> /tmp/libjit-dump.o: file format elf64-x86-64
>
>
> Disassembly of section .text:
>
> 00007f6e1b9acf6e <.text>:
> 7f6e1b9acf6e: 55 push %rbp
> 7f6e1b9acf6f: 48 8b ec mov %rsp,%rbp
> 7f6e1b9acf72: 48 83 ec 30 sub $0x30,%rsp
> 7f6e1b9acf76: 4c 89 2c 24 mov %r13,(%rsp)
> 7f6e1b9acf7a: 4c 89 74 24 08 mov %r14,0x8(%rsp)
> 7f6e1b9acf7f: 4c 89 7c 24 10 mov %r15,0x10(%rsp)
> 7f6e1b9acf84: 48 89 7d f8 mov %rdi,-0x8(%rbp)
> 7f6e1b9acf88: 48 89 75 f0 mov %rsi,-0x10(%rbp)
> 7f6e1b9acf8c: 4c 8b ea mov %rdx,%r13
> 7f6e1b9acf8f: 45 33 ff xor %r15d,%r15d
> 7f6e1b9acf92: 45 33 f6 xor %r14d,%r14d
> 7f6e1b9acf95: 45 3b fd cmp %r13d,%r15d
> 7f6e1b9acf98: 0f 83 29 00 00 00 jae 0x7f6e1b9acfc7
> 7f6e1b9acf9e: 41 8b c7 mov %r15d,%eax
> 7f6e1b9acfa1: c1 e0 02 shl $0x2,%eax
> 7f6e1b9acfa4: 48 63 c8 movslq %eax,%rcx
> 7f6e1b9acfa7: 48 8b 55 f8 mov -0x8(%rbp),%rdx
> 7f6e1b9acfab: 48 03 ca add %rdx,%rcx
> 7f6e1b9acfae: 8b 09 mov (%rcx),%ecx
> 7f6e1b9acfb0: 48 63 c0 movslq %eax,%rax
> 7f6e1b9acfb3: 48 8b 55 f0 mov -0x10(%rbp),%rdx
> 7f6e1b9acfb7: 48 03 c2 add %rdx,%rax
> 7f6e1b9acfba: 8b 00 mov (%rax),%eax
> 7f6e1b9acfbc: 0f af c8 imul %eax,%ecx
> 7f6e1b9acfbf: 44 03 f1 add %ecx,%r14d
> 7f6e1b9acfc2: 41 ff c7 inc %r15d
> 7f6e1b9acfc5: eb ce jmp 0x7f6e1b9acf95
> 7f6e1b9acfc7: 41 8b c6 mov %r14d,%eax
> 7f6e1b9acfca: 4c 8b 2c 24 mov (%rsp),%r13
> 7f6e1b9acfce: 4c 8b 74 24 08 mov 0x8(%rsp),%r14
> 7f6e1b9acfd3: 4c 8b 7c 24 10 mov 0x10(%rsp),%r15
> 7f6e1b9acfd8: 48 8b e5 mov %rbp,%rsp
> 7f6e1b9acfdb: 5d pop %rbp
> 7f6e1b9acfdc: c3 retq
>
> end
> ----------------------------------------
> Here the incoming pointer are stored on stack and in every loop
> iteration first we get the pointer value from stack and then we add
> index.
> I was wondering if there is a way to optimize this and keep the array
> pointers in registers.
> I am using libjit from the repo directly with commit:
> http://git.savannah.gnu.org/cgit/libjit.git/commit/?id=0f176f54f5f2d0e15e9d87940e5c4f51e6b5e850
> Any help will be appreciated.
> Thanks,
> Rajan
>