Upgrading my Chumby 8 kernel part 5: graphics


At this point in my Chumby kernel upgrade project (parts 1, 2, 3, and 4 here), I had made a ton of progress, but there wasn’t much to show for it because the LCD still wasn’t working. Despite all that work, the display was still black. I knew it was time to get it working.

I started out with U-Boot. Here is a very basic overview of the PXA168’s LCD controller: you set aside some of your RAM for a framebuffer, copy image data into it, tell the controller the format and address of the framebuffer, set up the clocking and timing, and turn it on. From then on, it handles everything in the background for you.

The steps above are oversimplified; there is more going on in the PXA168’s display controller. But it’s enough to get a splash screen working in U-Boot. To find out what the register values should be, I booted into the old kernel and dumped the LCD registers using devmem. Here’s an example of this process. The LCD_SPU_DMA_CTRL0 register contains a bunch of format configuration bits for the framebuffer, such as which bits are red/green/blue. It’s at offset 0x190 in the LCD controller, and the LCD controller’s base address is 0xD420B000, so I could dump the 32-bit register value with this command:

devmem 0xD420B190 32

This resulted in a printout of the value of the register:


The bits in this value, if you look in the PXA168 software manual, indicate that graphics are enabled, red and blue are swapped, and the graphics format is “RGBA888” (which is unclear to me; maybe they mean RGBA8888?). Don’t worry, I won’t bore you with the details of every single bit of every register. I went through the registers and figured out what the values needed to be. This process was very tedious, but it gave me a good idea of the steps needed to configure the LCD. I could write similar values to the registers in U-Boot using the mw.l command to get the display running. I also looked at the original U-Boot, which had some assembly code for setting up the LCD (board/pxa/aspenite/bbu_LCD.S). Combined with the register values obtained inside Linux, this told me pretty much everything I needed to know.
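For anyone who hasn’t used devmem: reading a register this way boils down to mapping the page of /dev/mem that contains the physical address and doing a 32-bit load from it. Here’s a minimal C sketch of that idea, assuming a Linux system where you have root access to /dev/mem. The helper names are hypothetical and this is not busybox’s actual devmem code; the only PXA168-specific facts are the 0xD420B000 base address and the 0x190 offset from the text above.

```c
#define _FILE_OFFSET_BITS 64 /* 64-bit off_t so addresses above 2 GiB work on 32-bit ARM */
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define LCD_BASE          0xD420B000u /* PXA168 LCD controller base address */
#define LCD_SPU_DMA_CTRL0 0x190u      /* framebuffer format/config register */

/* Physical address of an LCD controller register: base + offset. */
uint32_t lcd_reg_phys(uint32_t offset)
{
	return LCD_BASE + offset;
}

/* Read one 32-bit register through /dev/mem (needs root).
 * Returns 0 on success and -1 on failure. Hypothetical helper. */
int read_reg32(uint32_t phys, uint32_t *out)
{
	long page = sysconf(_SC_PAGESIZE);
	/* mmap() offsets must be page-aligned, so map the containing page. */
	uint32_t page_base = phys & ~((uint32_t)page - 1);
	int fd = open("/dev/mem", O_RDONLY | O_SYNC);
	if (fd < 0)
		return -1;
	void *map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, (off_t)page_base);
	close(fd); /* the mapping remains valid after the descriptor is closed */
	if (map == MAP_FAILED)
		return -1;
	*out = *(volatile uint32_t *)((uint8_t *)map + (phys - page_base));
	munmap(map, (size_t)page);
	return 0;
}
```

On the Chumby, `read_reg32(lcd_reg_phys(LCD_SPU_DMA_CTRL0), &value)` would correspond to the `devmem 0xD420B190 32` command.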

I was able to write register values manually to enable the display controller and its clock. It didn’t work right away, but I eventually realized I had to turn on the backlight: by shining a flashlight onto the screen, I could see that the display was showing something, but the backlight was off. The backlight is wired to a pin that can be configured for PWM to allow finer control of the intensity. I will talk about PWM in a future post. I didn’t have PWM working yet, so I just configured it as a GPIO pin and wrote a 1 to it. This turned the backlight on at 100% and revealed the garbage I could previously only see with a flashlight. I was able to change the garbage by writing different framebuffer addresses to the LCD_CFG_GRA_START_ADDR0 register. Here’s the kind of stuff I was seeing (note that this is a recreation using my Infocast 8 instead of the Chumby 8 I was originally doing the development on):

It was around this point that I started thinking about how I could hook this up as a proper U-Boot driver rather than register hacks. After all, U-Boot has a subsystem for display devices that lets you use commands like bmp to display an image on the screen. I started looking through my modern U-Boot and happened to notice the existing mvebu_lcd driver. It is intended for a different Marvell chip, but its register defines were very similar to the PXA168’s. The bit definitions are ever so slightly different, and some things like the clock source are hardcoded and needed to change, but it gave me a great framework to follow. All it really does is write the bare minimum of registers necessary to get the LCD up and running. I copied it and changed the initialization to match what I needed. The commit containing my PXA168 driver for U-Boot is here. If you want to see how I added it to the device tree and turned on the backlight through the device tree, check out the other commits in my chumby8 branch. Later on I tweaked the display timings to get a refresh rate of exactly 60 Hz. With everything in place, I could finally tell it to display a Chumby splash screen:

ext4load mmc 0:2 0x1000000 /boot/logo_silvermoon_chumby_normal.bmp
bmp display 0x1000000
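About hitting exactly 60 Hz: the refresh rate is the pixel clock divided by the total number of pixel clocks per frame (active plus front porch, sync, and back porch, horizontally and vertically). Because the pixel clock typically comes from a source clock divided by an integer, “exactly” means choosing totals whose product divides the pixel clock evenly. This sketch uses entirely hypothetical numbers (a 39 MHz pixel clock, e.g. 624 MHz divided by 16, and made-up porch values); they are not the AUO A080SN01 timings or the values I actually used.

```c
#include <stdbool.h>
#include <stdint.h>

/* Panel timing in pixels (horizontal) and lines (vertical).
 * The example values used with this are hypothetical. */
struct panel_timing {
	uint32_t hactive, hfront, hsync, hback;
	uint32_t vactive, vfront, vsync, vback;
};

uint32_t htotal(const struct panel_timing *t)
{
	return t->hactive + t->hfront + t->hsync + t->hback;
}

uint32_t vtotal(const struct panel_timing *t)
{
	return t->vactive + t->vfront + t->vsync + t->vback;
}

/* Refresh rate in millihertz: pixel clock / (htotal * vtotal), integer math. */
uint64_t refresh_mhz(uint64_t pixclk_hz, const struct panel_timing *t)
{
	uint64_t frame = (uint64_t)htotal(t) * vtotal(t);
	return pixclk_hz * 1000 / frame;
}

/* True when the refresh rate is exactly target_hz, with no remainder. */
bool refresh_is_exact(uint64_t pixclk_hz, const struct panel_timing *t, uint32_t target_hz)
{
	return pixclk_hz == (uint64_t)target_hz * htotal(t) * vtotal(t);
}
```

With the hypothetical timing {800, 80, 40, 80, 600, 20, 10, 20}, the totals are 1000 × 650 = 650,000 clocks per frame, and 39,000,000 / 650,000 is exactly 60.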

Now that I had U-Boot setting up a basic framebuffer, I knew I could use the simpledrm driver in the kernel. It’s the Direct Rendering Manager (DRM) equivalent of the old simplefb driver. I wanted to avoid the fbdev subsystem if I could, because it’s deprecated. In fact, I noticed that there was a pxa168fb driver in the mainline kernel, but it was a legacy fbdev driver, so I didn’t want to use it.

I figured out where in RAM U-Boot was consistently allocating the framebuffer by dumping the LCD_CFG_GRA_START_ADDR0 register (it turned out to be 0x7E00000), and reserved it in the device tree so that Linux wouldn’t try to use it:

reserved-memory {
	#address-cells = <1>;
	#size-cells = <1>;
	ranges;

	display_reserved: framebuffer@7e00000 {
		reg = <0x07e00000 (800 * 600 * 4)>;
	};
};
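The size expression in the reg property is just width × height × bytes per pixel. A quick C sketch of the same arithmetic (the macro and function names are mine, for illustration) confirms the stride, the size, and where the reserved region ends:

```c
#include <stdint.h>

#define FB_BASE   0x07E00000u /* where U-Boot placed the framebuffer */
#define FB_WIDTH  800u
#define FB_HEIGHT 600u
#define FB_BPP    4u          /* bytes per pixel for a 32-bit format */

/* Bytes per scanline (the "stride" of the framebuffer). */
uint32_t fb_stride(void)
{
	return FB_WIDTH * FB_BPP;
}

/* Total bytes reserved: stride * height = 800 * 600 * 4. */
uint32_t fb_size(void)
{
	return fb_stride() * FB_HEIGHT;
}

/* First address past the end of the reserved region. */
uint32_t fb_end(void)
{
	return FB_BASE + fb_size();
}
```

That works out to a 3200-byte stride and 1,920,000 bytes (0x1D4C00, about 1.83 MiB), so the reservation spans 0x07E00000 up to 0x07FD4C00.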

Then I enabled the simpledrm driver in the kernel config and added a simple-framebuffer node to the device tree:

chosen {
	framebuffer@7e00000 {
		compatible = "simple-framebuffer";
		reg = <0x07e00000 (800 * 600 * 4)>;
		width = <800>;
		height = <600>;
		stride = <(800 * 4)>;
		format = "a8r8g8b8";
		clocks = <&soc_clocks PXA168_CLK_DISP0>;
		resets = <&soc_clocks PXA168_CLK_DISP0>;
	};
};
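The format = "a8r8g8b8" line tells simpledrm how each 32-bit pixel is laid out: alpha in bits 31..24, then red, green, and blue, as a little-endian 32-bit word (the PXA168 runs little-endian). Combined with the stride, that’s all you need to address a pixel. A minimal sketch, with hypothetical helper names:

```c
#include <stddef.h>
#include <stdint.h>

/* Pack one a8r8g8b8 pixel: A in bits 31..24, R 23..16, G 15..8, B 7..0. */
uint32_t pack_argb8888(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
	return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/* Store a pixel in a linear framebuffer. stride is in bytes per row and is
 * assumed to be a multiple of 4, with fb 4-byte aligned. */
void put_pixel(uint8_t *fb, uint32_t stride, uint32_t x, uint32_t y, uint32_t argb)
{
	uint32_t *row = (uint32_t *)(fb + (size_t)y * stride);
	row[x] = argb;
}
```

For the 800×600 panel, the stride would be 800 × 4 = 3200 bytes, matching the stride property in the node.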

This actually worked great (almost on the first try), and I was able to run simple apps. Here’s how I set up a Qt app to use the linuxfb platform with DRM “dumb buffer” support, using a couple of environment variables:

export QT_QPA_PLATFORM=linuxfb
export QT_QPA_FB_DRM=1

The only challenge I ran into was that I was initially using an older 5.15 kernel (with my various fixes applied), and the display didn’t work on it. After I upgraded to the newest kernel version, it magically started working. I didn’t research which newer commit fixed it.

This was a great first step, but I wanted to make better use of the available resources on the PXA168. The documentation indicated that the PXA168 has a graphics accelerator. The various datasheets and manuals didn’t say much about it, but it is a 2D graphics accelerator known as the Vivante GC300. The PXA168 software manual just said to use it through the APIs in the Marvell board support package, and it definitely didn’t provide any register information. I was already familiar with Vivante GPUs from other projects and knew about the etnaviv project, which includes a mainlined open-source driver for Linux. I decided to jump into getting the etnaviv driver working, even though I had no idea how to use it as a 2D accelerator.

I started out by playing around with enabling the clocks. The old kernel used a special sequence of register writes and readbacks to enable the GPU clock:

static void gc300_clk_enable(struct clk *clk)
{
	u32 tmp = __raw_readl(clk->clk_rst), flag;
	/* reset gc clock */
	__raw_writel(tmp & ~0x07, clk->clk_rst);
	tmp = __raw_readl(clk->clk_rst);
	/* select GC clock source */
	gc_lookaround_rate(clk->rate, &flag);
	tmp &= ~0xc0;
	tmp |= flag;
	__raw_writel(tmp, clk->clk_rst);
	/* enable GC CLK EN */
	__raw_writel(tmp | 0x10, clk->clk_rst);
	tmp = __raw_readl(clk->clk_rst);
	/* enable GC HCLK EN */
	__raw_writel(tmp | 0x08, clk->clk_rst);
	tmp = __raw_readl(clk->clk_rst);
	/* enable GC ACLK EN */
	__raw_writel(tmp | 0x20, clk->clk_rst);
	tmp = __raw_readl(clk->clk_rst);
	/* reset GC */
	__raw_writel(tmp & ~0x07, clk->clk_rst);
	tmp = __raw_readl(clk->clk_rst);
	/* pull GC out of reset */
	__raw_writel(tmp | 0x2, clk->clk_rst);
	tmp = __raw_readl(clk->clk_rst);
	/* delay 48 cycles */
	/* pull GC AXI/AHB out of reset */
	__raw_writel(tmp | 0x5, clk->clk_rst);
	tmp = __raw_readl(clk->clk_rst);
}

I didn’t see anything resembling this power sequence in the mainline kernel. I cleaned it up a bit (the readbacks seemed unnecessary) and put the power-up sequence into U-Boot instead for simplicity, also running the clock at its maximum setting of 624 MHz for the best performance. I suppose the correct way to do this would be to implement a custom PXA168-specific clock in the kernel using the common clock framework, but here’s the simple U-Boot code I ended up with instead:

	/* Disable GC clocks and hold GC in reset */
	writel(0x00, &apmuclkres->gccrc);
	/* Select GC clock source (624 MHz) */
	writel(0xC0, &apmuclkres->gccrc);
	/* Enable GC CLK EN */
	writel(0xD0, &apmuclkres->gccrc);
	/* Enable GC HCLK EN */
	writel(0xD8, &apmuclkres->gccrc);
	/* Enable GC ACLK EN */
	writel(0xF8, &apmuclkres->gccrc);
	/* Reset GC */
	writel(0xF8, &apmuclkres->gccrc);
	/* Pull GC out of reset */
	writel(0xFA, &apmuclkres->gccrc);
	/* Delay 48 cycles */
	/* Pull GC AXI/AHB out of reset */
	writel(0xFF, &apmuclkres->gccrc);
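The magic numbers above are easier to follow as cumulative bit operations on one register. This sketch rebuilds the eight values from the masks in the old kernel code (0x07 reset bits, 0x10 CLK EN, 0x08 HCLK EN, 0x20 ACLK EN, 0xC0 source select for 624 MHz). The macro names are made up, the “cleared means held in reset” reading is inferred from the old kernel’s comments, and the function only computes values; it never touches hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for bits described in the old kernel's comments. */
#define GC_RST_MASK   0x07u /* reset bits: cleared = held in reset (inferred) */
#define GC_RST_CORE   0x02u /* "pull GC out of reset" */
#define GC_RST_BUS    0x05u /* "pull GC AXI/AHB out of reset" */
#define GC_HCLK_EN    0x08u
#define GC_CLK_EN     0x10u
#define GC_ACLK_EN    0x20u
#define GC_SRC_624MHZ 0xC0u /* clock source select, 624 MHz */

/* Compute the successive values written to the GC clock/reset register.
 * Returns the number of writes stored in seq. */
size_t gc_powerup_sequence(uint32_t seq[8])
{
	uint32_t v = 0;
	size_t n = 0;

	seq[n++] = v;          /* clocks off, everything in reset */
	v |= GC_SRC_624MHZ;
	seq[n++] = v;          /* select 624 MHz source */
	v |= GC_CLK_EN;
	seq[n++] = v;
	v |= GC_HCLK_EN;
	seq[n++] = v;
	v |= GC_ACLK_EN;
	seq[n++] = v;
	v &= ~GC_RST_MASK;
	seq[n++] = v;          /* "Reset GC": unchanged, reset bits already clear */
	v |= GC_RST_CORE;
	seq[n++] = v;          /* release the GC core from reset */
	v |= GC_RST_BUS;
	seq[n++] = v;          /* release AXI/AHB from reset */
	return n;
}
```

This reproduces 0x00, 0xC0, 0xD0, 0xD8, 0xF8, 0xF8, 0xFA, 0xFF. It also shows that the “Reset GC” write repeats 0xF8, because the reset bits were never set at that point, and that neither version actually waits for the “delay 48 cycles” comment.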

With the clock enabled, I moved on to enabling the etnaviv driver in my device tree. I had to make up a dummy clock to use in the device tree because the clock was already enabled by U-Boot.

gpu: gpu@c0400000 {
	compatible = "vivante,gc";
	reg = <0xc0400000 0x4000>;
	clocks = <&soc_clocks PXA168_CLK_GC_CORE>;
	clock-names = "core";
	interrupts = <8>;
	status = "okay";
};

This kind of worked. The etnaviv driver detected it as a GC300 and then crapped out:

[   16.866069] etnaviv etnaviv: bound c0400000.gpu (ops gpu_ops [etnaviv])
[   16.954299] etnaviv-gpu c0400000.gpu: model: GC300, revision: 1051
[   17.057319] 8<--- cut here ---
[   17.059387] Unhandled fault: imprecise external abort (0x406) at 0xc88b0100
[   17.062475] [c88b0100] *pgd=01436811, *pte=c0400653, *ppte=c0400552
[   17.069558] Internal error: : 406 [#1] PREEMPT ARM
[   17.075866] Modules linked in: soundcore ssp etnaviv(+) gpu_sched
[   17.080745] CPU: 0 PID: 117 Comm: udevd Not tainted 5.18.9 #4
[   17.086849] Hardware name: Marvell PXA168 (Device Tree Support)
[   17.092611] PC is at etnaviv_gpu_hw_init+0x74/0x2ac [etnaviv]

I tracked this down to being caused by a call to etnaviv_gpu_enable_mlcg(), so I commented it out and got a little further with a similar error:

[ 17.293178] Unhandled fault: imprecise external abort (0xc06) at 0xc88b010c

This was solvable by commenting out a call to etnaviv_gpu_setup_pulse_eater(). With both of those calls commented out, I got a semi-successful driver load:

[ 16.749199] etnaviv etnaviv: bound c0400000.gpu (ops gpu_ops [etnaviv])
[ 16.806309] etnaviv-gpu c0400000.gpu: model: GC300, revision: 1051
[ 17.116585] [drm] Initialized etnaviv 1.3.0 20151214 for etnaviv on minor 1
[ 17.243810] etnaviv-gpu c0400000.gpu: GPU not yet idle, mask: 0x000000fe

I felt a little in over my head here. I wasn’t sure if the “not yet idle” was caused by the things I commented out — and why should I have to comment them out anyway? So I asked for some help in #etnaviv on OFTC. austriancoder gave me some great advice — look at one of the original Vivante drivers and see if there’s some kind of difference in how it initializes things compared to the etnaviv module.

When I got a chance to dive in deeper, I discovered that there were several quirks for the GC300 that etnaviv was missing. The big one is that the power registers, normally mapped from 0x100 to 0x110, are actually mapped from 0x200 to 0x210 on old revisions of the GC300. Notice above that the two external abort addresses were 0xc88b0100 and 0xc88b010c. Although the GPU is at physical address 0xc0400000, the virtual address it is mapped to is likely 0xc88b0000, and both of these accesses fall in that power register range. Aha!
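Conceptually, the fix is to translate power register offsets before accessing them on affected chips. Here’s a small sketch of that translation. The struct and function names are hypothetical, not etnaviv’s; the 0x2000 revision cutoff is my assumption about where “old” ends; and I’m assuming the “revision: 1051” in the log is printed in hex (0x1051).

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative identity fields; not the etnaviv structure. */
struct gpu_ident {
	uint32_t model;    /* 0x300 for a GC300 */
	uint32_t revision; /* e.g. 0x1051 */
};

#define PWR_FIRST        0x100u  /* usual power register window */
#define PWR_LAST         0x110u
#define OLD_GC300_CUTOFF 0x2000u /* assumption: older revisions use 0x200 */

bool is_power_reg(uint32_t reg)
{
	return reg >= PWR_FIRST && reg <= PWR_LAST;
}

/* Return the register offset that should actually be accessed. */
uint32_t fix_power_reg(const struct gpu_ident *id, uint32_t reg)
{
	if (id->model == 0x300 && id->revision < OLD_GC300_CUTOFF && is_power_reg(reg))
		return reg + 0x100; /* 0x100..0x110 -> 0x200..0x210 */
	return reg;
}
```

Without the translation, offsets 0x100 and 0x10C added to the (likely) 0xc88b0000 mapping give exactly the two faulting addresses in the logs.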

I also found two other small quirks: the GC300 doesn’t provide a bit to indicate it has a 2D pipe (newer Vivante GPUs do), and its idle register doesn’t report several unpopulated bits as being idle even though the driver looks at them to decide whether the GPU is idle. That likely explains the final “GPU not yet idle” error I was seeing.
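Both of these quirks come down to the driver trusting bits the hardware doesn’t populate. The sketch below shows the general shape of a fix: force on the missing “has 2D pipe” feature and only check idle bits for units that exist. The bit positions, the populated mask, and all names are hypothetical and chosen for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define IDLE_BITS_ALL   0xFFFFFFFFu
#define FEATURE_PIPE_2D (1u << 9)

struct gpu_caps {
	uint32_t features;  /* as read from the feature register */
	uint32_t idle_mask; /* idle bits the driver should check */
};

/* GC300-style quirks: advertise the 2D pipe the hardware doesn't report,
 * and restrict idle checks to units that are actually present. */
void apply_gc300_quirks(struct gpu_caps *caps, uint32_t populated_idle_bits)
{
	caps->features |= FEATURE_PIPE_2D;
	caps->idle_mask = populated_idle_bits;
}

/* The GPU counts as idle when every bit in the mask reads back as 1. */
bool gpu_is_idle(const struct gpu_caps *caps, uint32_t idle_reg)
{
	return (idle_reg & caps->idle_mask) == caps->idle_mask;
}
```

With an all-ones mask, unpopulated bits that read as 0 make the GPU look permanently busy; after the quirk, the same register value reads as idle.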

With all of those quirks taken care of, I was finally able to observe a perfect driver load with no errors:

[   16.873744] etnaviv etnaviv: bound c0400000.gpu (ops gpu_ops [etnaviv])
[   16.901762] etnaviv-gpu c0400000.gpu: model: GC300, revision: 1051
[   16.926025] [drm] Initialized etnaviv 1.3.0 20151214 for etnaviv on minor 1

I submitted these patches upstream (it took a few tries to get everything right), and they were first included in Linux 6.2. Working with the etnaviv developers was a very positive experience — they were patient with me and helped me get things sorted out. Thank you Christian and Lucas!

Anyway, back to getting the GPU working! I suddenly realized that I had no freaking idea how to even use the 2D accelerator. I got the kernel module to load, but it didn’t do anything on its own. How would a Qt app know how to send 2D commands to the GPU? And how would the GPU interact with the display controller? This is usually the point where I do a bunch of searching to find other people who have set it up, or tutorials written for other vendors’ chips like TI/Freescale/ST. I really struggled to find anything related to etnaviv for 2D. Most uses of it that I could find involved Vivante 3D GPUs, which the PXA168 doesn’t have, so anything related to Mesa or OpenGL ES was irrelevant to what I was trying to accomplish.

After a long struggle, I did start to find a few things. It seemed as though I needed to use X11 to accelerate 2D operations, unless I wanted to write a bunch of custom code to use the GPU directly. That sounded like too much work, so I went with X11. I found an Xorg driver called xf86-video-armada that was capable of using either the Vivante or the etnaviv kernel driver. I confirmed on IRC (thanks cphealy!) that it was the only known way to use the 2D acceleration on these GPUs.

This was kind of a bummer because I didn’t want to use an X server, but I went forward with it anyway. The other thing that concerned me was that xf86-video-armada claimed to only be compatible with two kernel DRM KMS drivers: Freescale i.MX and Marvell Armada 510 (Dove). I was using the very basic simpledrm driver which definitely wouldn’t work with it. Since the PXA168 is a Marvell chip, I started looking into the Armada 510’s driver to see how compatible it could possibly be with the PXA168’s LCD controller.

I was pleasantly surprised to discover that the Armada driver in the mainline kernel seemed to support an LCD controller that was very, very similar to the one in the PXA168. It was even designed to support different variants (with the Armada 510 being the only variant so far), so I only had to add a tiny bit of setup code for the clocking of the PXA168 variant.

The other thing I noticed was that it didn’t seem fully set up for use with the device tree, likely because it’s a pretty old driver. The maintainer of the Armada DRM driver, Russell King, has a few patches for adding DT support. These extra commits were super helpful for me. By the way, Russell is the one who originally ported ARM to Linux many moons ago. He’s still very active in the community to this day and maintains a lot of ARM-related stuff.

As I kept going, I found a few other little things I needed to do. The output format and pin selection were hardcoded for one particular configuration that didn’t match what I needed, so I added device tree support for customizing them. I also needed to add a panel to the panel-simple driver for the AUO A080SN01 display in the Chumby. At first I was confused because I thought I would be adding the timings for the panel in the device tree, but it turns out that’s not how you’re supposed to do it. Finally, I struggled with getting the panel working. It took me a while to figure out that the Armada driver wasn’t setting up the infrastructure necessary for hooking up to a panel, so I added a bit of boilerplate code (still no idea how correct it is) to set up an encoder and bridge so that everything was hooked up properly. This took me forever to figure out and I had to do a deep dive into the Linux DRM system which I still feel super intimidated by. But it was extremely satisfying to see it all come together!

I also had to figure out how to set up Xorg with the xf86-video-armada driver, and then how to hook it up to use the 2D GPU with etnaviv. Getting the driver working was a bit of a challenge at first. I discovered I needed to add -Wl,-z,lazy to the LDFLAGS because of interdependencies between the Xorg modules. Without lazy binding, Xorg refused to load the relevant modules. Interestingly, ld’s man page says that lazy binding is the default, so I guess something about my build environment changes the default. To be honest, I still don’t fully understand how everything connects together, especially as I’m writing this many months later. But essentially, my xorg.conf just has to contain something like this:

Section "Device"
	Identifier "Driver 0"
	Screen 0
	Driver "armada"
	Option "UseGPU" "TRUE"
	Option "XvAccel" "TRUE"
	Option "AccelModule" "etnadrm_gpu"
	Option "DRI" "TRUE"
	Option "UseKMSBo" "FALSE"
EndSection

The XvAccel option allows me to accelerate video using the XVideo extension, making use of the LCD controller’s separate overlay window that can be displayed above the main graphics. I was able to play back a small video file using mpv:

mpv --fs /sample_640x360.avi

It works well! The video plays smoothly. Note that the video is only 640×360; it uses hardware scaling to display full-screen. If I try to play something larger, like 720p video, it gets choppy and drops frames. Some of the original marketing info for the PXA168 did mention HD video support, but when I dug in deeper, I discovered that video decoding was handled through special proprietary Marvell libraries. I think these libraries use the PXA168’s WMMX2 SIMD coprocessor to accelerate video decoding. Maybe someone could make a custom GStreamer plugin that uses them. It looks like that’s what Chumby had: a special GStreamer plugin that used IPP. I’m guessing those plugins don’t work with modern GStreamer versions anymore, but I don’t know for sure. If you’re interested, check out gst_pxa168-1.0.tgz on the Chumby source code page. I don’t have any current plans to look further into that.
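With --fs, mpv keeps the video’s aspect ratio by default, so the overlay is scaled to the largest centered rectangle with the same shape that fits the 800×600 screen. Here’s a generic letterbox calculation as a sketch; it is not mpv’s or the Xv driver’s actual code, and the names are hypothetical.

```c
#include <stdint.h>

struct rect {
	uint32_t x, y, w, h;
};

/* Largest rectangle with the source aspect ratio that fits dst, centered.
 * Uses cross-multiplication to compare aspect ratios without floats. */
struct rect fit_keep_aspect(uint32_t src_w, uint32_t src_h, uint32_t dst_w, uint32_t dst_h)
{
	struct rect r;

	if ((uint64_t)src_w * dst_h >= (uint64_t)dst_w * src_h) {
		r.w = dst_w; /* source is relatively wider: width-limited */
		r.h = (uint32_t)((uint64_t)src_h * dst_w / src_w);
	} else {
		r.h = dst_h; /* source is relatively taller: height-limited */
		r.w = (uint32_t)((uint64_t)src_w * dst_h / src_h);
	}
	r.x = (dst_w - r.w) / 2;
	r.y = (dst_h - r.h) / 2;
	return r;
}
```

For a 640×360 video on the 800×600 panel, this gives an 800×450 rectangle at y = 75, i.e. a 1.25× upscale that the overlay hardware performs instead of the CPU.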

I spent more time than I’d care to admit tracking down an issue where the first time I played a video with mpv, it would display a blue screen instead of the actual video. Every subsequent time, it worked fine. It turns out this was a problem in the xf86-video-armada driver with a pretty simple fix involving the Xv color key. The idea behind the color key is that you draw your entire normal framebuffer contents in the graphics layer, except that you draw blue (or whatever the color key is) everywhere you want the video to appear. The hardware is then configured to show the video anywhere the graphics layer is blue. This mechanism allows X11 to draw something else overlapping a portion of the displayed video without putting any strain on the CPU; the display controller hardware handles it for free.

The problem was that mpv was turning off automatic painting of the color key so it could paint the color key on its own. This isn’t a problem by itself, but xf86-video-armada was coded to change the color key to black when automatic painting was disabled. mpv had already determined that the color key was blue, so it was drawing a blue rectangle even though the hardware was now configured to look for black instead of blue. Let’s just say I had a lot of “fun” tracing my way through register values, the kernel driver, X11, and mpv figuring out the root problem. xscope ended up being a very valuable tool.
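A per-pixel model makes both the mechanism and the bug easy to see. This is a deliberately simplified sketch: colors are plain 0xRRGGBB values and the names are hypothetical. The real display controller’s comparison logic is more involved; this only models the idea of “show video wherever graphics equals the key”.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified color-key overlay: wherever the graphics layer holds the color
 * the hardware is programmed to look for, the video layer is shown instead. */
void composite_colorkey(const uint32_t *gfx, const uint32_t *video, uint32_t *out,
                        size_t n, uint32_t hw_key)
{
	for (size_t i = 0; i < n; i++)
		out[i] = (gfx[i] == hw_key) ? video[i] : gfx[i];
}
```

If the application paints blue (0x0000FF) but the hardware key has been switched to black (0x000000), no pixel matches, so the painted blue rectangle is what ends up on screen: the blue screen described above.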

That wasn’t the only blue screen problem I had to fix. On a more recent kernel, I noticed that video playback stopped working in the Armada driver. It would just display a blue screen every time with a big long error trace. The blue screen was probably showing up because the color key was being drawn but the video layer wasn’t working. The relevant part of the error trace is:

WARNING: CPU: 0 PID: 170 at drivers/gpu/drm/drm_gem_framebuffer_helper.c:60 drm_gem_fb_get_obj+0xf8/0x110
armada-drm armada-drm: drm_WARN_ON_ONCE(!fb->obj[plane])

This also ended up being easy to solve. My solution feels hacky, but it definitely fixes the problem. The gist of the problem is that all of the color planes (e.g. Y, U, and V when displaying YUV video) are put into one single GEM object by the Armada driver, but other code in the DRM subsystem expects that each plane will be in a separate GEM object. I’ve tried submitting this patch twice and received no response either time. I am kind of worried that my patch for this problem might be approaching it the wrong way, but it would be nice to at least be told that…
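To illustrate the mismatch, here’s a sketch using simplified stand-in structs (not the kernel’s drm_framebuffer or drm_gem_object). It describes a three-plane YUV 4:2:0 image stored in one buffer object, with every plane slot pointing at that same object and the planes told apart only by offset. Leaving obj[1] and obj[2] NULL is the condition the WARN above complains about. This is one hypothetical way to satisfy that expectation, not necessarily what my patch does; a real kernel change would presumably also need to take a reference on the object for each plane it’s stored in.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; not the kernel structures. */
struct gem_obj {
	size_t size;
};

struct fb_planes {
	struct gem_obj *obj[4]; /* one entry per plane */
	uint32_t offsets[4];
	uint32_t pitches[4];
};

/* YUV 4:2:0 in ONE object: Y (w x h), then U and V (w/2 x h/2 each). */
void describe_yuv420_single_obj(struct fb_planes *fb, struct gem_obj *bo,
                                uint32_t w, uint32_t h)
{
	uint32_t y_size = w * h;
	uint32_t c_size = (w / 2) * (h / 2);

	fb->offsets[0] = 0;
	fb->offsets[1] = y_size;
	fb->offsets[2] = y_size + c_size;
	fb->offsets[3] = 0;
	fb->pitches[0] = w;
	fb->pitches[1] = w / 2;
	fb->pitches[2] = w / 2;
	fb->pitches[3] = 0;
	for (int i = 0; i < 3; i++)
		fb->obj[i] = bo; /* same object for every plane */
	fb->obj[3] = NULL;       /* unused fourth plane */
}
```

For a 640×360 frame, the U plane starts at byte 230,400 and the V plane at 288,000 within the single object.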

In general, I don’t think I’m going to be able to submit my graphics changes upstream. My changes are dependent on Russell King’s patches to the Armada DRM driver (circa 2018) that aren’t in the mainline kernel, and I can’t even get a response on my small bug fix patch for the GEM object issue described above. This doesn’t give me much hope that I’ll be able to submit any changes to the Armada DRM driver upstream. That’s okay though. I’ll focus on other subsystems instead. There will likely be some ugly graphics-related patches to this driver that I would need to keep out of the mainline tree anyway, such as the hack I did to keep the early U-Boot splash screen up while the kernel loads, so I’m not really upset about it.

After working hard to get all of this working, I couldn’t help but wonder just how much the Vivante 2D GPU was helping. I measured X11 drawing performance with and without 2D acceleration using x11perf. The copywinwin500 test showed that the etnaviv driver gave me the best performance. It also showed that the armada driver by itself was slower than simpledrm:

x11perf -repeat 2 -reps 500 -copywinwin500
  • simpledrm:
    • 4000 trep @ 6.8489 msec ( 146.0/sec): Copy 500×500 from window to window
  • armada without etnaviv:
    • 4000 trep @ 13.1455 msec ( 76.1/sec): Copy 500×500 from window to window
  • armada with etnaviv:
    • 4000 trep @ 3.1871 msec ( 314.0/sec): Copy 500×500 from window to window
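To put the copywinwin500 numbers in perspective, the ratio of the per-repetition times gives the relative speed. A trivial helper, using only the times reported above:

```c
/* Relative speed of configuration B versus A, from x11perf's per-repetition
 * times in milliseconds. Values above 1.0 mean B is faster. */
double speedup(double msec_a, double msec_b)
{
	return msec_a / msec_b;
}

/* Operations per second from a per-repetition time in milliseconds. */
double ops_per_sec(double msec)
{
	return 1000.0 / msec;
}
```

speedup(6.8489, 3.1871) is about 2.15 (armada with etnaviv versus simpledrm), speedup(13.1455, 3.1871) is about 4.12 (etnaviv versus armada alone), and speedup(6.8489, 13.1455) is about 0.52, so armada without etnaviv ran at roughly half of simpledrm’s speed in this test.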

I actually found that etnaviv is slower or about the same in a lot of other tests. For example, the “circulate” test is actually slower with etnaviv than without. So I’m not sure exactly how much value I’m getting out of the 2D accelerator. It appears that Chumby had some code that asked the 2D accelerator to do things directly through the proprietary Vivante library instead; see the same gst_pxa168-1.0.tgz file I mentioned earlier, which also contains the Vivante driver. I’m probably not going to go any further with this. As long as I can run a Qt app and display something, I’ll be happy, even if performance doesn’t end up being great. I consider being able to play a 640×360 video smoothly a bonus win already.

I’m still very glad I went through this entire process either way! I learned a ton about the DRM subsystem and was able to discover that an appropriate driver mostly already existed for the PXA168’s display controller. I still need to decide whether I want to continue using etnaviv for 2D acceleration. For now I’m keeping it enabled, but it might be possible to bail on using Xorg if I don’t need it. On the other hand, the XVideo stuff works really well with mpv, despite mpv warning me that it’s a legacy video output method with bad performance. By the way, don’t search Google for XVideo. You won’t find what you’re looking for. You’ve been warned!

That’s a wrap for getting the display working. At some point in the future I’m going to discuss what it took to get PWM control of the backlight working for brightness control, but this post is getting too long to fit it. I’ll probably talk about that in the next one.
