uni.horse / type-safe bitfield registers in C++

I recently had a need to generate a bunch of code for interacting with memory-mapped registers on a microcontroller. These are magic global variables that look like they're in RAM, but instead have their bits wired directly to various bits of hardware outside the CPU core.

For example, you might have some physical pins on the chip that can be configured as input, output, or connected to some other internal peripheral that controls them ("alternate function"). In a datasheet, those are given to you in a really big list of tables that look like this:

GPIO port mode register (GPIOx_MODER) (x = A, B, C, D, F)

Address offset: 0x00

Reset value: 0xEBFF FFFF (port A)

Reset value: 0xFFFF FFFF (ports other than A)

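Bits   | 31:30       | 29:28       | 27:26       | 25:24       | 23:22       | 21:20       | 19:18      | 17:16
Field  | MODE15[1:0] | MODE14[1:0] | MODE13[1:0] | MODE12[1:0] | MODE11[1:0] | MODE10[1:0] | MODE9[1:0] | MODE8[1:0]
Access | rw rw       | rw rw       | rw rw       | rw rw       | rw rw       | rw rw       | rw rw      | rw rw

Bits   | 15:14       | 13:12       | 11:10       | 9:8         | 7:6         | 5:4         | 3:2        | 1:0
Field  | MODE7[1:0]  | MODE6[1:0]  | MODE5[1:0]  | MODE4[1:0]  | MODE3[1:0]  | MODE2[1:0]  | MODE1[1:0] | MODE0[1:0]
Access | rw rw       | rw rw       | rw rw       | rw rw       | rw rw       | rw rw       | rw rw      | rw rw
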
Bits 31:0: MODEy[1:0]: Port x configuration for I/O y (y = 15 to 0)
These bits are written by software to set the I/O to one of four operating modes.
00: Input
01: Output
10: Alternate function
11: Analog

(Reproduced from ST RM0490, section 8.4.1.)

There are a *lot* of these. They each have their own flags with their own meanings. It's very common to want to flip a few bits in one of these without changing the rest of the bits. This is what bitwise operators are for:

GPIOA_MODER = (GPIOA_MODER & ~MODE4_MASK) | (0b01 << MODE4_SHIFT);

Importantly, it's also common to want to change several bits in a single write, without changing the others and without passing through an intermediate value. This might, for instance, be the difference between turning something off and on again and changing its mode while it's running. That is, these are different operations which may have different effects:

SOME_REGISTER &= ~MODE_MASK; // mode is now 0
SOME_REGISTER |= new_mode << MODE_SHIFT; // mode is now new_mode
 
SOME_REGISTER = (SOME_REGISTER & ~MODE_MASK) | (new_mode << MODE_SHIFT); // mode is now new_mode, without ever having been set to 0
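
To make the single-write version concrete, here is a minimal sketch of it as a function. The names write_field and write_field_check are made up for this illustration, and an ordinary variable stands in for the hardware register so it can run on a desktop.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of any vendor header): replace one field of
// `reg` using exactly one read and one write, so the hardware never sees the
// field sitting at 0 in between.
inline void write_field(volatile std::uint32_t& reg, std::uint32_t mask,
                        unsigned shift, std::uint32_t new_value) {
	std::uint32_t v = reg;                            // the only read
	v = (v & ~mask) | ((new_value << shift) & mask);  // edit a local copy
	reg = v;                                          // the only write
}

// Host-side check: a plain variable plays the part of GPIOA_MODER.
inline void write_field_check() {
	volatile std::uint32_t fake_register = 0xFFFFFFFFu;
	write_field(fake_register, 0x3u << 8, 8, 0x1u);   // MODE4 := 01 (output)
	assert(fake_register == 0xFFFFFDFFu);             // only bits 9:8 changed
}
```

All the editing happens on a local copy, which is the same idea the rest of this post builds on.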

(Register and field names are traditionally named in all caps, ideally with incomprehensible acronyms like UIFREMA or RXFNEIE.)

Sometimes, for very important registers that will break everything if accidentally poked the wrong way, there is a "key" field that must contain a certain magic value on every write, or else the write is ignored. The Cortex-M AIRCR register (Application Interrupt and Reset Control Register; it does not, sadly, control the air around the device) contains flags for things like "immediately reset the whole system" and "idk what this does but if you write a 1 to it you get Unpredictable Behavior", and has a field called VECTKEY that must be set to 0x05FA every time you write to it.
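
As a rough sketch of what a keyed write involves: VECTKEY sits in bits 31:16 of AIRCR and SYSRESETREQ is bit 2. The helper name with_vectkey is invented for this example, and it only computes the value to write; it doesn't touch any hardware.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t VECTKEY_SHIFT = 16;
constexpr std::uint32_t VECTKEY_MASK  = 0xFFFFu << VECTKEY_SHIFT;  // bits 31:16
constexpr std::uint32_t VECTKEY_VALUE = 0x05FAu;
constexpr std::uint32_t SYSRESETREQ   = 1u << 2;

// Hypothetical helper: discard whatever key bits were read back and stamp in
// the key that the hardware requires on writes.
constexpr std::uint32_t with_vectkey(std::uint32_t value) {
	return (value & ~VECTKEY_MASK) | (VECTKEY_VALUE << VECTKEY_SHIFT);
}

// Host-side check of the arithmetic.
inline void with_vectkey_check() {
	assert(with_vectkey(SYSRESETREQ) == 0x05FA0004u);
	// Stale upper bits from a previous read must not leak into the write.
	assert(with_vectkey(0xFFFF0000u | SYSRESETREQ) == 0x05FA0004u);
}
```

Forgetting the key is an easy mistake with plain bitwise operators, because the write is silently ignored rather than failing loudly.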

So that's the use case. Bitwise operators are the usual solution to this, but they have some problems:


So I spent some time thinking about how to do it better. Here's what I came up with.

A set of fields is just an integer value in memory somewhere. This is templated on the integer type, which is always an unsigned 32-bit integer for the microcontrollers I work with, but it was easy to make it more general so why not.

template <typename T>
struct Fields {
	typedef T TInt;
	TInt value;
};

A register is also just an integer somewhere. It's the same size of integer, but it's a different sort of thing. It's declared volatile so the compiler doesn't try to do any optimizations on reads/writes. If we write a value and then write a different value immediately after, or write and then immediately read, or read twice without writing in between, we really do mean "do this apparently weird and pointless thing because it will have effects outside what the compiler can see".

This is templated on TFields, which is a subtype of Fields that defines the bit fields that exist in this register.

template <typename TFields>
struct Register {
	typedef typename TFields::TInt TInt;
	volatile TInt value;

You can set a register to a value, either a raw integer value or a set of fields.

	TFields operator =(TFields f) { value = f.value; return f; }
	TInt operator =(TInt i) { value = i; return i; }

You can also read from a register. You might be doing this because you just need to read some fields from it (get()), or because you want to change some of those fields (but_with()). from_empty() is a convenience function for when you want to set the whole register but specify fields individually. The functions that exist to build a value for writing back are tagged [[nodiscard]], so if you write R.but_with().MODE(1); as its own statement, the compiler warns you that it doesn't do anything.

	TFields const get() { return {value}; }
	[[nodiscard]] TFields but_with() { return {value}; }
	[[nodiscard]] TFields from_empty() { return {0}; }
};

Last, we define some convenience macros for defining field sets. Each bitfield gets an accessor function name() and a setter function name(value). The setter function, importantly, clears the old value when writing a new value, because not doing that is approximately never what you want.

#define MASK(nbits) (decltype(value)(-1) >> (sizeof(decltype(value)) * 8 - (nbits)))
#define SHIFTED_MASK(nbits, shift) (MASK(nbits) << (shift))
#define DEFINE_FIELD(name, nbits, shift) \
	decltype(value) name() const { \
		return (value & SHIFTED_MASK((nbits), (shift))) >> (shift); \
	} \
	[[nodiscard]] auto name(decltype(value) v) { \
		value = (value & ~SHIFTED_MASK((nbits), (shift))) | ((v << (shift)) & SHIFTED_MASK((nbits), (shift))); \
		return *this; \
	}
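
The mask arithmetic is easier to check when written as functions. This is a sketch with hypothetical names (mask, shifted_mask) that mirrors what MASK and SHIFTED_MASK compute; it isn't a replacement for the macros.

```cpp
#include <cassert>
#include <cstdint>

// All-ones value shifted right until only `nbits` low bits remain.
template <typename T>
constexpr T mask(unsigned nbits) {
	return static_cast<T>(static_cast<T>(-1) >> (sizeof(T) * 8 - nbits));
}

// The same mask moved up to the field's position.
template <typename T>
constexpr T shifted_mask(unsigned nbits, unsigned shift) {
	return static_cast<T>(mask<T>(nbits) << shift);
}

static_assert(mask<std::uint32_t>(2) == 0x3u, "two low bits");
static_assert(shifted_mask<std::uint32_t>(2, 8) == 0x300u, "MODE4 is bits 9:8");
static_assert(mask<std::uint32_t>(32) == 0xFFFFFFFFu, "full-width field");

// Runtime check for a narrower integer type.
inline void mask_check() {
	assert(mask<std::uint8_t>(3) == 0x07);
	assert(shifted_mask<std::uint8_t>(3, 4) == 0x70);
}
```

A field width of 0 would shift by the full width of the type, which is undefined behavior, so both this sketch and the macros assume nbits is at least 1.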

We can now use this to define a register. In practice I'm mostly generating these from SVD files, but I have a few handwritten ones for registers that are missing from the SVD I'm using. I'm using namespaces to group registers into peripherals instead of the more common struct GPIO { u32 MODER; } because it's easier to set each register's address directly than to manually add padding to a struct all over the place.

namespace gpioa {
	static u32 const BASE = 0x50000000;
	struct MODER_fields : Fields<u32> {
		DEFINE_FIELD(MODE0, 2, 0)
		DEFINE_FIELD(MODE1, 2, 2)
		DEFINE_FIELD(MODE2, 2, 4)
		DEFINE_FIELD(MODE3, 2, 6)
		DEFINE_FIELD(MODE4, 2, 8)
		DEFINE_FIELD(MODE5, 2, 10)
		DEFINE_FIELD(MODE6, 2, 12)
		DEFINE_FIELD(MODE7, 2, 14)
		DEFINE_FIELD(MODE8, 2, 16)
		DEFINE_FIELD(MODE9, 2, 18)
		DEFINE_FIELD(MODE10, 2, 20)
		DEFINE_FIELD(MODE11, 2, 22)
		DEFINE_FIELD(MODE12, 2, 24)
		DEFINE_FIELD(MODE13, 2, 26)
		DEFINE_FIELD(MODE14, 2, 28)
		DEFINE_FIELD(MODE15, 2, 30)
	};
	static Register<MODER_fields>& MODER = *reinterpret_cast<Register<MODER_fields>*>(BASE + 0);
}

And finally, we can use this to flip some bits:

using namespace gpioa;
u32 const gpio_input = 0b00, gpio_output = 0b01, gpio_altfunc = 0b10, gpio_analog = 0b11;
MODER = MODER.but_with().MODE3(gpio_altfunc).MODE10(gpio_altfunc).MODE9(gpio_altfunc).MODE5(gpio_altfunc);
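
One way to try this without hardware is to point the same types at an ordinary variable. The sketch below repeats the definitions from this post so it stands alone, and assumes C++17, that u32 is std::uint32_t, and that a plain Register object in RAM is an acceptable stand-in; moder_demo is a name made up for the example.

```cpp
#include <cassert>
#include <cstdint>

using u32 = std::uint32_t;  // assumed definition of u32

template <typename T>
struct Fields {
	typedef T TInt;
	TInt value;
};

template <typename TFields>
struct Register {
	typedef typename TFields::TInt TInt;
	volatile TInt value;
	TFields operator =(TFields f) { value = f.value; return f; }
	TInt operator =(TInt i) { value = i; return i; }
	TFields const get() { return {value}; }
	[[nodiscard]] TFields but_with() { return {value}; }
	[[nodiscard]] TFields from_empty() { return {0}; }
};

#define MASK(nbits) (decltype(value)(-1) >> (sizeof(decltype(value)) * 8 - (nbits)))
#define SHIFTED_MASK(nbits, shift) (MASK(nbits) << (shift))
#define DEFINE_FIELD(name, nbits, shift) \
	decltype(value) name() const { \
		return (value & SHIFTED_MASK((nbits), (shift))) >> (shift); \
	} \
	[[nodiscard]] auto name(decltype(value) v) { \
		value = (value & ~SHIFTED_MASK((nbits), (shift))) | ((v << (shift)) & SHIFTED_MASK((nbits), (shift))); \
		return *this; \
	}

// Only the fields this demo touches or reads.
struct MODER_fields : Fields<u32> {
	DEFINE_FIELD(MODE3, 2, 6)
	DEFINE_FIELD(MODE4, 2, 8)
	DEFINE_FIELD(MODE5, 2, 10)
	DEFINE_FIELD(MODE9, 2, 18)
	DEFINE_FIELD(MODE10, 2, 20)
};

// Runs the update against a fake register preloaded with port A's reset value.
inline u32 moder_demo() {
	u32 const gpio_altfunc = 0b10;
	Register<MODER_fields> fake_moder{};
	fake_moder = 0xEBFFFFFFu;
	fake_moder = fake_moder.but_with().MODE3(gpio_altfunc).MODE10(gpio_altfunc).MODE9(gpio_altfunc).MODE5(gpio_altfunc);
	assert(fake_moder.get().MODE3() == gpio_altfunc);  // changed
	assert(fake_moder.get().MODE4() == 0b11);          // left alone
	return fake_moder.get().value;
}
```

Starting from 0xEBFFFFFF, the four 2-bit fields go from 11 to 10 and everything else is preserved, giving 0xEBEBFBBF.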

One might wonder whether this is less efficient than direct bit-twiddling. After all, things that run on microcontrollers are often timing-sensitive. Without optimization (gcc -O0) it is indeed horrifying and uses a bunch of function calls to change a few bits. At -O1 or higher, they compile exactly the same:

extern "C" void direct_example() {
	*(u32*)(0x50000000) = (*(u32*)(0x50000000)
		& ~((0b11 << 3*2) | (0b11 << 10*2) | (0b11 << 9*2) | (0b11 << 5*2)))
		| ((0b10 << 3*2) | (0b10 << 10*2) | (0b10 << 9*2) | (0b10 << 5*2));
}
 
extern "C" void fancy_example() {
	using namespace gpioa;
	u32 const gpio_input = 0b00, gpio_output = 0b01, gpio_altfunc = 0b10, gpio_analog = 0b11;
	MODER = MODER.but_with().MODE3(gpio_altfunc).MODE10(gpio_altfunc).MODE9(gpio_altfunc).MODE5(gpio_altfunc);
}

08000240 <direct_example>:
 8000240:	21a0      	movs	r1, #160	@ 0xa0
 8000242:	05c9      	lsls	r1, r1, #23
 8000244:	680a      	ldr	r2, [r1, #0]
 8000246:	4b03      	ldr	r3, [pc, #12]	@ (8000254 <direct_example+0x14>)
 8000248:	401a      	ands	r2, r3
 800024a:	4b03      	ldr	r3, [pc, #12]	@ (8000258 <direct_example+0x18>)
 800024c:	4313      	orrs	r3, r2
 800024e:	600b      	str	r3, [r1, #0]
 8000250:	4770      	bx	lr
 8000252:	46c0      	nop			@ (mov r8, r8)
 8000254:	ffc3f33f 	.word	0xffc3f33f
 8000258:	00280880 	.word	0x00280880
 
0800025c <fancy_example>:
 800025c:	21a0      	movs	r1, #160	@ 0xa0
 800025e:	05c9      	lsls	r1, r1, #23
 8000260:	680a      	ldr	r2, [r1, #0]
 8000262:	4b03      	ldr	r3, [pc, #12]	@ (8000270 <fancy_example+0x14>)
 8000264:	401a      	ands	r2, r3
 8000266:	4b03      	ldr	r3, [pc, #12]	@ (8000274 <fancy_example+0x18>)
 8000268:	4313      	orrs	r3, r2
 800026a:	600b      	str	r3, [r1, #0]
 800026c:	4770      	bx	lr
 800026e:	46c0      	nop			@ (mov r8, r8)
 8000270:	ffc3f33f 	.word	0xffc3f33f
 8000274:	00280880 	.word	0x00280880
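
The two literal-pool words and the base address in the listing can be rederived by hand. A small sketch, using hypothetical names (mode_bits, clear_mask, set_bits, apply_update) that appear nowhere in the firmware:

```cpp
#include <cassert>
#include <cstdint>

// A 2-bit mode value moved to the position of pin `pin`.
constexpr std::uint32_t mode_bits(unsigned pin, std::uint32_t mode) {
	return mode << (pin * 2);
}

constexpr std::uint32_t clear_mask =
	~(mode_bits(3, 0b11) | mode_bits(10, 0b11) | mode_bits(9, 0b11) | mode_bits(5, 0b11));
constexpr std::uint32_t set_bits =
	mode_bits(3, 0b10) | mode_bits(10, 0b10) | mode_bits(9, 0b10) | mode_bits(5, 0b10);

static_assert(clear_mask == 0xFFC3F33Fu, "first .word: the AND mask");
static_assert(set_bits == 0x00280880u, "second .word: the OR value");
static_assert((160u << 23) == 0x50000000u, "movs r1, #160 then lsls r1, r1, #23");

// The arithmetic between the ldr and the str: one AND, one OR.
constexpr std::uint32_t apply_update(std::uint32_t old_value) {
	return (old_value & clear_mask) | set_bits;
}

// Host-side check using port A's reset value as the starting point.
inline void apply_update_check() {
	assert(apply_update(0xEBFFFFFFu) == 0xEBEBFBBFu);
}
```

So the whole chain of setters has been folded into two constants at compile time, with a single load and a single store around them.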

Modern compilers are cool.