A couple of weeks ago, I introduced you to one of my current pet projects, CLplusplus, which is mostly aimed at making my next job easier but could also be of interest to some of you. Since then, that project has grown quite a bit, and I think it is now safe to say that it is about 50% there. Not in the usual sense of “50% of the code is written, untested and undocumented”, but in actual whole-project effort terms.
At the time when I wrote my last post on the subject, the full platform layer was ready. This meant that one could easily query platforms and devices, select those that met one’s needs, and create a context from them. But beyond that point, raw OpenCL API calls were still needed.
Since then, I’ve been busy with the runtime layer, implementing such niceties as command queues, events, buffers, program objects, and kernels. A small image implementation has also been in there since yesterday, though for now it doesn’t do much. With all this infrastructure in place, one may now write fairly full-featured OpenCL programs, only hitting the wall of missing library functionality when reaching for the more advanced features discussed below.
In parallel, I have added plenty of small example applications to the root of the git repository, which serve a dual purpose: to document the expected usage patterns of the library, and to check that the code works in real hardware interaction scenarios (something that hardware-agnostic unit tests cannot do).
In the runtime layer, there are essentially three core features missing:
- Images + samplers, for optimized access to neighbouring data.
- Separate program object compilation, to ease the pain of compiling large projects with plenty of OpenCL C files.
- Native kernels, to help users leverage those implementations which have them.
Once the runtime layer is complete, the next step for me will be to produce high-quality documentation for the library, including a reference listing of all public functions, and a higher-level introduction to the API focusing on design goals and general principles. The latter should also feature pointers to the examples, offering some kind of basic self-learning program.
Longer-term, on the “might do, or might not” front, I believe interesting tracks to explore would be advertising the library to potential users, and implementing a bigger program which uses it as a final test. I have in a drawer a physical simulation which is easily parallelizable (and, in fact, already runs in parallel on CPUs using OpenMP) and is starved for computational power due to its O(N²) complexity as a function of integrand resolution. I’m curious about how well such a simulation would scale to a GPU architecture.