I pushed our r6xx/r7xx bringup tool r600_demo very late yesterday evening, while Alex pushed the corresponding DRM support. Of course, Phoronix posted a lengthy article about it directly after the push happened. Originally, I had planned to blog about it after dinner, but I fell asleep after this exhausting day.
Now this is only a developer's tool, but if you are interested in experimenting with your newly acquired freedom, the README describes what you have to do. I'll explain some parts of the inner workings here - beware that this gets quite technical:
The register names and values have mostly been autogenerated from the to-be-released documentation; the result is the rather lengthy (188 KB) r600_reg_auto_r6xx.h. Some additional registers had to be added by hand to r600_reg_r6xx.h. The hope is that we can eventually improve the documentation and the parsing process, so that less data has to be added manually. r600_reg_r7xx.h adds registers that are r7xx only, or that changed from r6xx to r7xx. It's only 8 KB, which shows how close r6xx and r7xx are architecturally.
r600_demo sends commands directly to the DRM by the same means that are actually reserved for the Xserver, so don't expect it to behave nicely if the Xserver has to draw anything while r600_demo is running. It is best not to start any X11 client on the screen that r600_demo runs on.
The rendering commands themselves are put together in a local buffer that is later submitted to the DRM. For convenience there is a set of macros and functions in r600_emit.h that use the interface defined in r600_hwapi.h: E32(), EFLOAT(), PACK0() and PACK3(), which emit 32-bit integers, floats, Packet0 headers and Packet3 headers, respectively (see the r5xx docs, page 24ff, for what the PacketX commands actually mean - this part is largely the same on r6xx, only some Packet3 command numbers changed). EREG() and EREGFLOAT() emit packets that set exactly one register, and are the most frequently used macros.
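To make the packet idea a bit more concrete, here is a minimal sketch of such an emit layer. It is not the actual r600_emit.h code: struct cmd_buf, emit32(), emit_float(), packet0_header(), packet3_header() and emit_reg() are hypothetical names, and the header layout (packet type in bits 31:30, dword count minus one in bits 29:16, register dword index or Packet3 opcode in the low bits) is assumed from the general PM4 packet format. Whether the real EREG() always uses a Packet0 is also an assumption here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local command buffer; not the real r600_demo data structure. */
#define CMD_BUF_DWORDS 256

struct cmd_buf {
    uint32_t data[CMD_BUF_DWORDS];
    size_t   used;                  /* number of dwords written so far */
};

/* Append one raw 32-bit word (the role E32() plays in the text). */
static void emit32(struct cmd_buf *cb, uint32_t v)
{
    assert(cb->used < CMD_BUF_DWORDS);
    cb->data[cb->used++] = v;
}

/* Append a float by copying its bit pattern (the role of EFLOAT()). */
static void emit_float(struct cmd_buf *cb, float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits); /* well-defined, unlike a pointer cast */
    emit32(cb, bits);
}

/* Assumed Packet0 header: write `count` consecutive registers,
 * starting at the register with byte offset `reg`. */
static uint32_t packet0_header(uint32_t reg, uint32_t count)
{
    return (0u << 30) | (((count - 1) & 0x3fffu) << 16) | ((reg >> 2) & 0xffffu);
}

/* Assumed Packet3 header: command `opcode` followed by `count` data dwords. */
static uint32_t packet3_header(uint32_t opcode, uint32_t count)
{
    return (3u << 30) | (((count - 1) & 0x3fffu) << 16) | ((opcode & 0xffu) << 8);
}

/* Set exactly one register: a Packet0 header plus one value
 * (the role EREG() plays in the text, under the assumptions above). */
static void emit_reg(struct cmd_buf *cb, uint32_t reg, uint32_t value)
{
    emit32(cb, packet0_header(reg, 1));
    emit32(cb, value);
}
```

With these helpers, setting a single register costs two dwords in the buffer: one header and one value. A float variant along the lines of EREGFLOAT() would emit the same header followed by emit_float().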
r600_init.c is probably the most interesting file - it contains the code for initializing and setting up the most important subsystems, so the file name is probably a misnomer.
flush_cache() isn't really used ATM, because it turns out that Intel CPUs' memory writeback ordering is sufficient for the caches to be flushed before the CP's registers are written - at least if there is a sequence point (e.g. a function call) in between, otherwise the compiler might reorder the writes. As the CP registers are written in the kernel only, this is trivially true.
What turned out to be really necessary is flushing the GPU's caches before reusing them, especially the input caches (shader, vertices, attributes, textures); the ring buffer is the exception, as it apparently isn't cached. So far we only have an all-or-nothing solution, called cp_set_surface_sync() in r600_lib.c. That has to become finer-grained. flush_gpu_input_cache() is supposed to do something similar, but it is untested and most probably wrong in its current form. So far we don't do the flushing in any tests but the EXA ones, so you typically have to reset the engine before each test (see the README).
r600_lib.c contains helper routines, most of which you shouldn't notice at all - except for the GPU state output you get at the end of each r600_demo run. Please note that the GPU state flags aren't understood very well at this time; we only know that if a lot of the state flags light up, something has gone haywire. And there are tons of state flags...
Now if you want to experiment with new rendering approaches, it is best to read r600_triangles.c to see how to render triangles and how to push coordinates with different resolutions (be aware that only a few combinations work; more about that in a later blog post, but floats always work). In r600_texture.c you can learn how to set up and use a texture. And if you want to experiment with blending, you have to read r600_exa.c - but be aware that this is the freshest code of all.