
Re: [Tinycc-devel] Implementing gcc intrinsics


From: David Mertens
Subject: Re: [Tinycc-devel] Implementing gcc intrinsics
Date: Sun, 24 Apr 2016 13:58:59 -0400

Hello Vladimir,

I figured I would add a few points of clarification:

On Sat, Apr 16, 2016 at 12:00 PM, Vladimir Vissoultchev <address@hidden> wrote:
> PS: please help to fix the current version of exsymtab. There are some
> changes in current tcc which break Symtab streaming (asm_label).

> Well, this looks like a ginormous patch (close to 7k lines). I had browsed the exsymtab fork on GitHub before (had plans to hoist its test suite that never took off), and frankly either I don't get the point of it or the implementation seems over-complicated.

The utility of exsymtab becomes clearer when you realize what I'm really using it for. I want to be able to compile lots of different blocks of code and share symbols between them without having to write header files.
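
To make "share symbols without header files" concrete, here is a minimal sketch of the manual workaround using only the stock libtcc API (tcc_get_symbol / tcc_add_symbol), not exsymtab's API; the names block_a, block_b, add and use_add are made up for illustration, and error checks are omitted. Only an address can be handed across this way -- the prototype (and any struct layout) still has to be re-declared by hand in the second block, which is exactly the per-block header boilerplate exsymtab is meant to eliminate.

    /* Minimal sketch, stock libtcc API only (no exsymtab). */
    #include <stdio.h>
    #include <libtcc.h>

    const char *block_a =
        "int add(int x, int y) { return x + y; }";

    const char *block_b =
        "int add(int x, int y);  /* must re-declare by hand */\n"
        "int use_add(void) { return add(20, 22); }";

    int main(void)
    {
        TCCState *a = tcc_new();
        tcc_set_output_type(a, TCC_OUTPUT_MEMORY);
        tcc_compile_string(a, block_a);
        tcc_relocate(a, TCC_RELOCATE_AUTO);
        void *add_addr = tcc_get_symbol(a, "add");

        TCCState *b = tcc_new();
        tcc_set_output_type(b, TCC_OUTPUT_MEMORY);
        tcc_add_symbol(b, "add", add_addr);   /* only the address crosses over */
        tcc_compile_string(b, block_b);
        tcc_relocate(b, TCC_RELOCATE_AUTO);

        int (*use_add)(void) = (int (*)(void))tcc_get_symbol(b, "use_add");
        printf("%d\n", use_add());            /* prints 42 */

        tcc_delete(b);
        tcc_delete(a);
        return 0;
    }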
 
> Tries for sym lookup are IMO a dead end. I already implemented one on top of ident_table to replace TOK_HASH_FUNC, and the results are negative (at least a 20% performance hit). It's not that I didn't try hard to fine-tune the implementation -- including direct 256-element arrays for children and even balanced AVL trees, in desperation.

> The problem with tries (and trees in general) is that they consume extra memory (tens of MB), and that costs you in cache misses. Trees are average on most operations (including insert/update/delete), but sym lookup needs the fastest possible retrieval; initial inserts are amortized by lookups in the long run.

Heh, the real problem with tries is that they are a huge pain to debug. I switched to hash tables for exactly that reason. I was pleasantly surprised to discover that performance improved substantially after the switch.
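
To put rough numbers behind the memory point: a 256-way trie node carries 256 child pointers, i.e. about 2 KB per node on a 64-bit build before any payload, and lookup chases one (often cold) pointer per character; a flat hash table hashes once and probes one bucket. A minimal sketch of both follows -- illustrative only, neither struct is tcc's actual code, and the multiplier 263 is not TOK_HASH_FUNC's real constant.

    #include <string.h>

    /* 256-way trie node: 256 * 8 bytes = 2 KB of child pointers per node
     * on 64-bit, so a few tens of thousands of nodes is already tens of MB. */
    struct trie_node {
        struct trie_node *child[256];   /* one slot per possible next byte */
        void *sym;                      /* symbol ending at this prefix, if any */
    };

    /* Flat open-hashing table: one hash, one bucket probe, short chain. */
    #define TABLE_SIZE 8192             /* power of two -> mask, no modulo */

    struct hash_entry {
        struct hash_entry *next;        /* collision chain */
        void *sym;
        char name[];                    /* C99 flexible array member */
    };

    static struct hash_entry *table[TABLE_SIZE];

    static void *hash_lookup(const char *name)
    {
        unsigned h = 1;
        for (const char *p = name; *p; p++)
            h = h * 263 + (unsigned char)*p;      /* illustrative hash only */
        for (struct hash_entry *e = table[h & (TABLE_SIZE - 1)]; e; e = e->next)
            if (strcmp(e->name, name) == 0)
                return e->sym;
        return NULL;
    }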
 
> IMO, the way to go w/ performance improvement is to vectorize the current hash implementation. Instead of computing the hash one character at a time, take the next 4 characters, process them in parallel, and compute 4 hashes. Then combine those hashes (w/ XOR or anything) and proceed with collision handling. Even collisions are not a perf problem at the moment, because increasing the hash table to 64k slots (practically w/o collisions) gives no noticeable performance improvement. On the contrary -- w/ large hash tables, cache misses on array lookups start to kick in.
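
For concreteness, here is one way the "next 4 characters in parallel, then XOR" idea could be sketched. This is illustrative only -- it is not tcc's TOK_HASH_FUNC, the seeds and multiplier are arbitrary, and whether it actually wins would have to be measured.

    #include <stddef.h>

    /* Four independent lane hashes remove the loop-carried dependency
     * between consecutive bytes; the lanes are folded with XOR at the end. */
    static unsigned hash4(const char *s, size_t len)
    {
        unsigned lane[4] = { 1, 2, 3, 4 };        /* arbitrary nonzero seeds */
        size_t i = 0;

        for (; i + 4 <= len; i += 4) {            /* 4 updates per iteration */
            lane[0] = lane[0] * 263 + (unsigned char)s[i + 0];
            lane[1] = lane[1] * 263 + (unsigned char)s[i + 1];
            lane[2] = lane[2] * 263 + (unsigned char)s[i + 2];
            lane[3] = lane[3] * 263 + (unsigned char)s[i + 3];
        }
        for (; i < len; i++)                      /* leftover tail bytes */
            lane[0] = lane[0] * 263 + (unsigned char)s[i];

        return lane[0] ^ lane[1] ^ lane[2] ^ lane[3];
    }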
 
And this might be the most important clarification: exsymtab is not a performance optimization. It adds a new feature so that I can implement C::Blocks.

For example, here (https://github.com/run4flat/C-Blocks/blob/master/examples/point-example.pl) is a simple Perl script with a block of C code (demarcated by "clex") that defines a struct and some functions for operating on it. The struct and functions are later used by C code that is woven directly into the Perl code (demarcated by "cblock").

A more important example is a Perl module that implements a basic point class (https://github.com/run4flat/C-Blocks/blob/master/examples/mgpoint.pm). It matters because the struct and functions are later used in a script (https://github.com/run4flat/C-Blocks/blob/master/examples/libobjmg.pl) that "use"es that module, again weaving C code directly into the Perl code. The exsymtab work makes it possible to coordinate the symbol tables across all of these disparate blocks of C code.
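
In rough outline, the division of labor looks like the sketch below (illustrative C only, with hypothetical names -- not the code from the linked examples). The point is that the cblock side compiles as if the clex side's declarations were already in scope, because the extended symbol table from the first compile is handed to the later ones.

    /* What a "clex" block defines (hypothetical names): */
    typedef struct point { double x, y; } point;

    double point_dist(point *a, point *b);   /* defined next to the struct */

    /* What a later "cblock" -- possibly in a script that "use"es the
     * module -- can then write, with no header and no re-declaration:
     * the struct layout and prototype arrive via the shared symtab. */
    double demo(point *p, point *q)
    {
        return point_dist(p, q);
    }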

I hope that clarifies its purpose. The exsymtab project is not meant to produce performance optimizations for tcc itself, except possibly for precompiled headers. It's meant as a new way to simplify using tcc to JIT code.

David

--
 "Debugging is twice as hard as writing the code in the first place.
  Therefore, if you write the code as cleverly as possible, you are,
  by definition, not smart enough to debug it." -- Brian Kernighan
