[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
octave and LLVM google summer of code project
From: |
Duncan Sands |
Subject: |
octave and LLVM google summer of code project |
Date: |
Tue, 8 Apr 2008 10:09:12 +0200 |
Hi, over at the LLVM project (http://llvm.org) we've just received
a google summer of code project proposal from Max Rietmann who wants
to use LLVM's just-in-time optimizers and code generation to speed
up octave's interpreter. I've pasted his application at the end of
this email. While some of us have used octave, none of the LLVM
people know anything about its internals so it is hard for us to tell
if this project makes sense or not. Is this something octave would
be interested in? Is it do-able in 10 weeks? Finally, if the project
looks good it would be helpful to have an octave expert as a mentor
as well as someone from LLVM - any suggestions?
Best wishes,
Duncan.
PS: The project proposal:
I will write an Octave to LLVM bytecode compiler. I plan to write it in Python
or C++, both of which I know well from previous personal projects. Octave is an
open-source numerical programming toolkit that implements similar functionality
to that of Mathworks Matlab. Despite recent improvements, in many cases Octave
runs much slower than Matlab.
Octave developers have discussed how to remedy this and I propose to use LLVM
as the interpreter and use it to compile down to native code when desired.
Using LLVMs optimizations, Octave could become a great numerical method
library, speeding up development for those in the computational sciences, who
often write their final code in C or Fortran. Octave's Vector operations make
it easy to code numerical methods and by using LLVM to optimize these, it will
be much more competitive than it currently is - Perhaps it could even surpass
Matlab in speed.
Detailed Description
GSoC 2008 Application
Mentoring Organization: LLVM
Max Rietmann
Background: I studied electrical and computer engineering at Cornell
university as an undergraduate, but upon graduating I realized that my talents
and interests lie in a more mathematical realm. I am currently one semester
into a masters in applied mathematics at San Diego State University (SDSU). My
specific program has a concentration on (nonlinear) Dynamical Systems, which
can range from modeling cellular activity to finding chaotic orbits around
galaxies. I'm currently interested in chaotic behavior and where it occurs in a
phase space.
Much of my work is done with the help some sort of mathematical toolkit like
Mathematica or Matlab. For much of my numerical calculations I use Matlab, but
only out of impatience. Before I was able to obtain a license to Matlab, I used
Octave, which is an open source implementation of most of Matlab's
functionality. Even after getting a Matlab license, I mostly used Octave
because It loads into the command line faster than Matlab can load its gui.
Unfortunately I have found Octave to be noticeably slower than Matlab (up to an
order of magnitude), which has caused me to use Matlab exclusively.
In reading the Octave mailing list they spoke of trying to re-implement the
Octave runtime on something like LLVM or the JVM so as to take advantage of
better optimizations. For example, when Matlab encounters loops, it attempts to
use JIT techniques to vectorize the loop speeding up the code tremendously. The
Octave developers would love to be able to do this, but they are unsure of how
to progress forward. They are not a mentoring organization this year, but LLVM
is. I have read through the tutorial on making a LLVM compiler for a simple
language called Kaleidoscope that compiles down to LLVM bytecode. After reading
the tutorial, I feel that I would be capable of implementing most of the Octave
functionality in LLVM.
Project Description and Outline: I will write an Octave to LLVM bytecode
compiler. I plan to write it in Python or C++, both of which I know well from
previous personal projects. My preference language preference is Python, but
since one of the tutorials is using C++ and since Octave is written using
mostly C++, I might use that to be consistent.
My Knowledge of the Octave backend is such: It parses the language and
generates its own internal representation. The interpreter then runs through
the representation and makes calls to various numerical libraries when
appropriate. For example, it uses libraries lapack and fftpack for matrix and
fft operations, which are both written in fortran. From discussions on the
mailing list, its clear that Octave's speed problems do not lie in its lack of
ability in numerical operations, but instead are due to slow interpretation of
the base language. I specifically notice the biggest speed issues when looping.
In some cases, JIT compilation could be used to vectorize the looping code. But
this still does not account for its relatively slow looping over
non-vectorizeable code.
Because it uses a large number of libraries to execute its numerical
subroutines, I think that LLVM makes a wonderful choice as a bytecode solution
and runtime. From my readings, calling to external libraries from LLVM is
relatively straightforward and most importantly fast.
Despite my excitement for this project, I am less knowledgeable about
compiler-design as I want to be. There are a lot of details I do not yet
understand or even know about, but I will try to outline my plan in as much
detail as I can.
The first big step will be to write a parser for the Octave code. From my own
source-code digging, it seems that Octave uses a parser-generator to parse its
code and generate an op-code style language for the interpreter. It has a large
library of op-codes (66 in the OPERATORS folder). Many of the op-codes are
matrix based and converting matrix operations to LLVM bytecode will be a
significant challenge. Because Octave has its own op-code language I imagine it
will be difficult to use much existing code to make a compiler for LLVM. My
parser will have to interpret the Matrix operations from Octave and
appropriately break them apart to either call the appropriate numerics toolbox
or generate the functional LLVM code.
So instead of overwhelming myself talking about Matrix operations I will
describe the easy part. First, I will implement the basic functionality of the
language leaving out vectors and matrices to learn more about LLVM and writing
a parser. I've written a parser tree in Objective-C for a math program I
started, so I already have some ideas about how that will work. I was planning
on writing my own parser, but I might instead use a parser generator to
generate the data-object-tree (I think they call it an AST). From there, I will
need to build a tree to LLVM bytecode generator. For the basic operations
without vectors and matrices this should be straightforward. Once this is all
implemented and working I will begin with adding vectors. Mathematically,
vectors are just a type of Matrix, but I am unsure if generalizing this much
will make the fastest code. I imagine that I will handle vectors separately and
optimize them by hand. Next will be matrix operations. Since octave uses a
matrix library to "outsource" its matrix operations, I would also try to do
something similar. I will try to use the same libraries Octave uses, but I do
not know much about calling libraries from LLVM, that will take research and
advice from the LLVM devs.
I think the biggest thing for me to keep in mind is looking at the project
incrementally. First start simple with the parser and basic code like:
* simple assignment: x=2;
* all the various arithmetic ops: +,-,*,/
* control structures (if,then,else)
* looping (for,while)
For loops use vectors, so I would have to implement a basic vector to do this.
* Function definition and calling
With these built, one would be able to build most programs sans vector/matrix
operations.
Now I add:
*vectors as a variable type and all that comes with that:
* Vector addition/multiplication x=v1+v2, x=v1*v2 and x=v1.*v2
(the .* multiplication means to multiply corresponding entries)
The final step will be the Matrix operations, which are an entirely new bag of
tricks.
Overall Goals: My main goal for the end of the summer is to have the Octave
language running on LLVM. Improving Octave's speed is the motivation for the
project, but that can always come later. I hope to lay the ground work for the
Octave team to pick up my LLVM compiler, develop it further and migrate the
current runtime from its C++ roots to LLVM. I see Octave's speed as a major
drawback and helping to fix this would not only help me in my academic
pursuits, it might make Octave more mainstream, which will save buying (or
pirating in many cases) Matlab licenses by those poor starving engineering
students of the world.
Can I do this?: I have quite a lot of experience coding and I've worked in
assembly before. I also know the octave (and matlab) code very well because of
my coursework. I also have the summer free and would use the stipend from
Google as my summer job. I would spend the summer coding with breaks to go
surfing and climbing. I don't have a specific mentor picked out, but I imagine
that the existing mentors from LLVM would be extremely helpful and that the
Octave developers would be extremely excited to have someone like me lay the
groundwork for restructuring the Octave runtime. As I said, they've already
mentioned this on the Octave mailing list, so I'm sure there are Octave
developers who would love to help when I need advice and to help test. The
tutorials on the LLVM website should get me up and running quickly and
introduce me to the more difficult concepts that I am still learning about.
One positive part of this project I see is that its relatively clear way
forward. The Octave language is well documented and my work at the beginning
will just be to implement each part of the language in LLVM. I start first with
arithmetic operators (+,-,*,/) and move on to variables, functions, loops, and
then finally to vectors and matrices. Because the LLVM language makes writing a
compiler so much easier, I really feel like I can at least get the basics
working relatively quickly.
- octave and LLVM google summer of code project,
Duncan Sands <=
- Re: octave and LLVM google summer of code project, David Bateman, 2008/04/08
- Re: octave and LLVM google summer of code project, John W. Eaton, 2008/04/08
- Re: octave and LLVM google summer of code project, Duncan Sands, 2008/04/09
- Re: octave and LLVM google summer of code project, David Bateman, 2008/04/09
- Re: octave and LLVM google summer of code project, Eugene I, 2008/04/19
- Re: octave and LLVM google summer of code project, dbateman, 2008/04/19
- Re: octave and LLVM google summer of code project, Eugene I, 2008/04/19
- Re: octave and LLVM google summer of code project, dbateman, 2008/04/20
- Re: octave and LLVM google summer of code project, John W. Eaton, 2008/04/20
- Re: octave and LLVM google summer of code project, John W. Eaton, 2008/04/20