A Very Simple Database and some other Rust thoughts

I just finished (for the most part - some return types still need a bit of tuning to cover failure scenarios) a small learning project for Rust, aptly named 'Very Simple Database'. It's a very simple in-memory database with optional single-file flushing to store data between sessions, available on GitHub. Beyond being just a simple database, it was, in various aspects, purposefully tailored as a learning project about certain kinds of design patterns. More on that in a second!

In other news, we continue to run Rust in production where I work, with this audio-analyzer project. I've also been doing one larger, more generic hobby project; namely, an SCTP-over-UDP implementation for Rust. SCTP is a networking protocol kind of in-between TCP and UDP. Like UDP, it's message based; like TCP, it's reliable, though its reliability level is configurable (even fully optional), so it could theoretically be used in the same kind of strict realtime environments as UDP. I did realize, though, that there's been a lot of duplicate work with another implementation. I suppose the efforts should be combined at some point, but I still wanted to do the initial architecture from scratch as yet another learning experience about working with Rust. There are some differences between the two implementations. In Ralith's implementation, macros are used to generate separate structs for chunk types (a chunk being a holder for data in the SCTP protocol), while I use struct variants and struct wrappers. It's very interesting to see and compare all these different approaches to the same problem. A good learning experience!

Oh. We'll also have some Rust run at the FIFA World Cup! Pretty cool, huh?

Later on I'll mention a few features I'm eagerly waiting for, but now, back to VSD.

The many steps..

So starting with VSD, I wanted to kind of experiment with well-structured, completely idiomatic, threaded Rust. The initial database, without threading, was pretty straightforward. In that first iteration, the impl block for the VSD struct was no more than 50 lines, including comments and whitespace. Pretty minimal for reading and writing arbitrary data to a hashmap and writing and reading the hashmap to/from disk! Serde was, of course, pivotal in minimizing the work I needed to do. The more I use Serde, the more I come to appreciate it as not only a good Rust library, but as a really damn great serialization framework regardless of the language behind it.

In the next step, it was time to bring threading in. The reason (perhaps a made-up reason, but bear with me - this was not so much for production as it was for practice) for threading was to let the database file writes happen periodically in the background without stopping the whole process in the case of very large databases. For an added challenge, I wanted to make the threading completely optional while still avoiding wrapping everything inside an Arc<Mutex<..>>. In reality, one should just do that; locks aren't that much of a performance hit, especially not when working single-threaded.

Okay, so, first I tried to implement fearless concurrency! by wrapping the whole VSD struct inside another struct that holds a handle to the actual VSD struct via an Arc<Mutex<..>>. It looked like so:

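use std::sync::{Arc, Mutex};

// Outer handle that owns the shared, lockable inner database.
pub struct VSD<T: serde::Serialize + serde::de::DeserializeOwned> {
  vsd_inner: Arc<Mutex<VSDInner<T>>>,
}

impl<T: serde::Serialize + serde::de::DeserializeOwned> VSD<T> {
  fn some_threaded_function(&self) {
    // Lock first, then call through to the inner struct.
    self.vsd_inner.lock().unwrap().some_function(..);
  }
}

pub struct VSDInner<T: serde::Serialize + serde::de::DeserializeOwned> { .. }

impl<T: serde::Serialize + serde::de::DeserializeOwned> VSDInner<T> {
  fn some_function(..) { .. }
}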

But this turned out to be a bit more verbose and confusing than I wanted. For this to work, you need several ::new()s, passing stuff around becomes a bit problematic and needs a lot of workarounds, ownership becomes a bit weirder, etc.
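To make the wrapper idea concrete, here is a minimal, runnable sketch of a handle struct around an Arc<Mutex<..>>. It is not the VSD code: the names Inner, Handle, flush_in_background and wrapper_demo are made up for illustration, the stored type is fixed to String instead of being generic over Serde types, and the "flush" only counts entries instead of writing a file.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical inner store; the real VSDInner is generic over Serde types.
struct Inner {
    data: HashMap<String, String>,
}

// Outer handle: cloning it only clones the Arc, not the data.
#[derive(Clone)]
struct Handle {
    inner: Arc<Mutex<Inner>>,
}

impl Handle {
    fn new() -> Self {
        Handle {
            inner: Arc::new(Mutex::new(Inner { data: HashMap::new() })),
        }
    }

    fn write(&self, key: &str, value: &str) {
        // The guard is a temporary here and is released at the end of the statement.
        self.inner
            .lock()
            .unwrap()
            .data
            .insert(key.to_string(), value.to_string());
    }

    fn read(&self, key: &str) -> Option<String> {
        self.inner.lock().unwrap().data.get(key).cloned()
    }

    // Stand-in for a background flush: reports the entry count from another thread.
    fn flush_in_background(&self) -> thread::JoinHandle<usize> {
        let handle = self.clone();
        thread::spawn(move || {
            let guard = handle.inner.lock().unwrap();
            guard.data.len()
        })
    }
}

// Writes one value, "flushes" on another thread, then reads the value back.
fn wrapper_demo() -> (usize, Option<String>) {
    let db = Handle::new();
    db.write("write_test", "hello world!");
    let flushed = db.flush_in_background().join().unwrap();
    (flushed, db.read("write_test"))
}
```

Even in this toy version, every method has to go through lock().unwrap(), and the handle needs its own constructor on top of the inner one, which is roughly the verbosity described above.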

I was also initially a little bit afraid that I would have to write a separate implementation for every type the database can hold, like impl VSD<u8>, impl VSD<String>, but it turns out I could just do impl<T: serde::Serialize> to accept only types that are serializable by Serde. That was cool!

In the next attempt, I decided to add a trait for selecting whether we want the threaded or the non-threaded version of the library. This was the first attempt at that. Now we would have separate implementations for VSD depending on what its exact type is, like so:

// Threaded
impl <T: serde::Serialize + serde::de::DeserializeOwned + Send + Sync, LOCK: VSDLocked + Send> VSD<T, LOCK> { }
// Non-threaded
impl <T: serde::Serialize + serde::de::DeserializeOwned + Send + Sync, LOCK: VSDUnlocked> VSD<T, LOCK> { }

For this to work, I need the two traits, VSDUnlocked and VSDLocked. But using "empty" traits for this doesn't exactly work as I thought. You need the PhantomData thing at line 47, because the struct never actually stores a value of the LOCK type, and you need the generic to be bound by ?Sized, because a bare trait used as a type is unsized. At this stage, I had a lot of problems understanding how borrowing and ownership worked in this context and ran into behavior that was unintuitive to me. There I needed to first bind file_clone.lock().unwrap() to a name before matching, since a lock guard created directly in the match expression is a temporary that lives until the end of the whole match, which wasn't the lifetime I needed. And trait specialization is still an open issue in parts.
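A minimal sketch of the PhantomData and ?Sized combination, under the assumption that the marker traits are used as trait-object types. This is a variation rather than the VSD code: Store, Locked, Unlocked, mode and marker_demo are hypothetical names, and the two impl blocks target the concrete types dyn Unlocked and dyn Locked so that the compiler can tell them apart without specialization.

```rust
use std::collections::HashMap;
use std::marker::PhantomData;

// Empty marker traits; they are only ever used as type-level tags.
trait Locked {}
trait Unlocked {}

// LOCK is never stored, so PhantomData is needed to "use" the parameter.
// `?Sized` allows an unsized type such as `dyn Unlocked` to be the tag.
struct Store<T, LOCK: ?Sized> {
    data: HashMap<String, T>,
    _lock: PhantomData<LOCK>,
}

// Shared by both flavours.
impl<T, LOCK: ?Sized> Store<T, LOCK> {
    fn new() -> Self {
        Store { data: HashMap::new(), _lock: PhantomData }
    }

    fn len(&self) -> usize {
        self.data.len()
    }
}

// Non-threaded flavour.
impl<T> Store<T, dyn Unlocked> {
    fn mode(&self) -> &'static str {
        "unlocked"
    }
}

// Threaded flavour; a real version would take a lock in its methods.
impl<T> Store<T, dyn Locked> {
    fn mode(&self) -> &'static str {
        "locked"
    }
}

// The tag in the type annotation selects which `mode` gets called.
fn marker_demo() -> (&'static str, &'static str, usize) {
    let a: Store<String, dyn Unlocked> = Store::new();
    let b: Store<String, dyn Locked> = Store::new();
    (a.mode(), b.mode(), a.len())
}
```

Without the PhantomData field the struct would be rejected for having an unused type parameter, and without ?Sized the dyn tags would not be accepted as type arguments.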

Iterating on this concept, I eventually ended up using two full traits, VSDUnlocked and VSDLocked, which implement different versions of similarly named functions. So instead of adding separate implementations of VSD depending on whether its type is VSD<VSDUnlocked> or VSD<VSDLocked>, I add separate implementations via the traits, i.e. impl VSDUnlocked for VSD and impl VSDLocked for VSD.
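The two-trait idea can be sketched as follows. This is a simplified stand-in and not the actual VSD traits: Store, the module names, the calls field and scoping_demo are hypothetical, and the "locked" version only records that it ran instead of really taking a lock. The point is that both traits define a write with the same signature, and whichever trait is imported decides which one a plain method call resolves to.

```rust
use std::collections::HashMap;

pub struct Store {
    data: HashMap<String, String>,
    calls: Vec<&'static str>, // records which implementation ran
}

impl Store {
    pub fn new() -> Self {
        Store { data: HashMap::new(), calls: Vec::new() }
    }
}

mod unlocked {
    pub trait Unlocked {
        fn write(&mut self, key: &str, value: &str);
    }

    impl Unlocked for super::Store {
        fn write(&mut self, key: &str, value: &str) {
            self.calls.push("unlocked");
            self.data.insert(key.to_string(), value.to_string());
        }
    }
}

mod locked {
    pub trait Locked {
        fn write(&mut self, key: &str, value: &str);
    }

    impl Locked for super::Store {
        fn write(&mut self, key: &str, value: &str) {
            // A real threaded version would take a lock here.
            self.calls.push("locked");
            self.data.insert(key.to_string(), value.to_string());
        }
    }
}

fn scoping_demo() -> Vec<&'static str> {
    let mut store = Store::new();
    {
        use self::unlocked::Unlocked; // only this trait's `write` is in scope
        store.write("a", "1");
    }
    {
        use self::locked::Locked; // same call, different implementation
        store.write("b", "2");
    }
    store.calls
}
```

If both traits were imported into the same scope, the call store.write(..) would be ambiguous and would need the fully qualified form, e.g. Locked::write(&mut store, ..).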

Initially, I had also handled the database type on struct level, so that using the struct looked like this:

use vsd::{VSD, VSDUnlocked};

let mut vsd = VSD::<String, VSDUnlocked>::new();
vsd.write("write_test", "hello world!".to_string());
println!("{:?}", vsd.read("write_test").unwrap());

After changing to trait implementation and changing generics to function level at read and write, I ended up with the final design, where using the database now looks like this:

use vsd::{VSD, VSDUnlocked};

let mut vsd = VSD::new();
vsd.write::<String>("write_test", "hello world!".to_string());
println!("{:?}", vsd.read::<String>("write_test").unwrap());

Looks better, no?

In the end, the implementation is no more than ~300 or so lines, give or take, including comments, formatting and module imports. In those 300 lines, you get a simple hashmap-based database that works over all primitive types, which can be saved on and loaded from the file system, with safe(ish) concurrency for caching multiple writes before flushing them to disk. I think that's pretty good given all the safety and performance guarantees that Rust provides.

So that's the short of this few-day project and its phases of design. In the end, the project was principally very easy, but what made it interesting was going through these design phases and, on the side, learning more about Rust. Learning a language and adopting its idiomatic style is not a quick process, at least not for me, and it's important to keep codin' it.

Oh, one final note: I spent way too long figuring out why my file writes seemed to append despite me opening the file with append(false). Turns out that after reading the whole file, its internal cursor is left at the end. This also matters for file.set_len(): it changes the length of the file but leaves the cursor where it is, so the next write still lands at the old position unless you seek back to the start first. That wasn't obvious to me from the documentation.
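A small sketch of the read-then-overwrite sequence using only the standard library. The function names and the temporary file name are made up for this example; the relevant part is the order of set_len and seek after the read.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::path::Path;

// Reads the whole file, then replaces its contents with `new_contents`.
fn read_then_overwrite(path: &Path, new_contents: &str) -> io::Result<String> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;

    let mut old = String::new();
    file.read_to_string(&mut old)?; // the cursor now sits at the end of the file

    file.set_len(0)?; // truncates the file, but does not move the cursor
    file.seek(SeekFrom::Start(0))?; // so rewind before writing
    file.write_all(new_contents.as_bytes())?;
    Ok(old)
}

// Creates a scratch file, overwrites it, and returns (old, new) contents.
fn cursor_demo() -> io::Result<(String, String)> {
    let name = format!("vsd_cursor_demo_{}.txt", std::process::id());
    let path = std::env::temp_dir().join(name);
    fs::write(&path, "old, longer contents")?;

    let old = read_then_overwrite(&path, "new")?;
    let new = fs::read_to_string(&path)?;

    fs::remove_file(&path)?;
    Ok((old, new))
}
```

Leaving out the seek would make the write start at the old end-of-file offset, which is what looks like appending.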

The other Rust thoughts!

No one said that this was going to be a short write-up. Still got some to say!

Types for enum variants
In one other project - the SCTP-over-UDP one - I've taken to a design approach where enum struct variants are used to compose types of SCTP messages. The chunks that make up an SCTP message are quite straightforward; they have certain common values and then they have unique values as appropriate for the chunk type. In practice, defining the chunks looks like this. Unfortunately I cannot implement functions per variant (or have default values for the fields), so I have to use separate creator functions implemented in the Message struct starting from line 25. There was an RFC for making enum variants first-class types - RFC #1450 - Types for enum variants - but it's since been closed as pending a more concrete proposal. The code would probably look cleaner if I used trait objects instead of enum variants for message chunks, but there shouldn't really be a need to, since we don't need dynamic dispatch. The chunk types are known beforehand in every situation, as the SCTP protocol is also static and doesn't need to be expanded on.
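As a rough illustration of the pattern (not the project's actual definitions), here is a heavily simplified chunk enum with creator functions on a Message struct. The field selection is an assumption made for brevity; only the numeric identifiers for DATA (0) and INIT (1) come from the SCTP specification.

```rust
// Simplified chunk model; the field names are illustrative only.
#[derive(Debug, PartialEq)]
enum Chunk {
    Data { flags: u8, tsn: u32, stream_id: u16, payload: Vec<u8> },
    Init { flags: u8, initiate_tag: u32, initial_tsn: u32 },
}

impl Chunk {
    // Chunk type identifiers as assigned by the SCTP specification.
    fn type_id(&self) -> u8 {
        match self {
            Chunk::Data { .. } => 0,
            Chunk::Init { .. } => 1,
        }
    }
}

struct Message {
    chunks: Vec<Chunk>,
}

impl Message {
    // Creator functions stand in for per-variant constructors and
    // default field values, which enum variants cannot have.
    fn data(tsn: u32, stream_id: u16, payload: &[u8]) -> Chunk {
        Chunk::Data { flags: 0, tsn, stream_id, payload: payload.to_vec() }
    }

    fn init(initiate_tag: u32, initial_tsn: u32) -> Chunk {
        Chunk::Init { flags: 0, initiate_tag, initial_tsn }
    }
}

// Builds a two-chunk message and lists the chunk type ids in order.
fn chunk_demo() -> Vec<u8> {
    let message = Message {
        chunks: vec![Message::init(1, 100), Message::data(100, 0, b"hi")],
    };
    message.chunks.iter().map(Chunk::type_id).collect()
}
```

Matching on the enum is statically dispatched, which fits the point above that dynamic dispatch isn't needed when the set of chunk types is fixed.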

Optional and keyword arguments for functions
Another thing that I hope eventually gets implemented is optional arguments for functions. In an early version of VSD, I had two different calls for opening a database: vsd::open(..) and vsd::open_with_caching(..). Although I didn't end up using that approach (and I think the trait approach is actually better and means that I don't need if clauses inside the vsd implementation to check for configuration options), I think it would've been cleaner if I could have just done vsd::open(.., with_caching: bool = false). I guess it's subjective, but if your parameters rely on a dynamic configuration, you have to do stuff like this: if my_conf { open_with_conf() } else { open_without_conf() }, or use the builder pattern, open.with_conf(my_conf), or pass an object holding the configuration, let config = Config; config.my_conf = my_conf; open(config). None of those are, in my humble opinion, cleaner than simply being able to do either open() or open(my_conf = my_conf).
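For comparison, the closest thing available today is probably the configuration-object variant combined with Default and struct update syntax. The sketch below uses hypothetical names (Config, flush_interval_secs, open, config_demo), and open only returns a description string instead of a real database handle.

```rust
// Hypothetical configuration object with defaults for every option.
#[derive(Debug, Clone, PartialEq)]
struct Config {
    with_caching: bool,
    flush_interval_secs: u64,
}

impl Default for Config {
    fn default() -> Self {
        Config { with_caching: false, flush_interval_secs: 30 }
    }
}

// Stand-in for an open function; describes what it would have opened.
fn open(path: &str, config: Config) -> String {
    format!(
        "{} (caching: {}, flush every {}s)",
        path, config.with_caching, config.flush_interval_secs
    )
}

fn config_demo() -> (String, String) {
    // All defaults, similar in spirit to a bare open().
    let plain = open("db.vsd", Config::default());
    // Override one field and take the rest from the defaults.
    let cached = open(
        "db.vsd",
        Config { with_caching: true, ..Config::default() },
    );
    (plain, cached)
}
```

It works, but the call site is still noisier than open(with_caching = true) would be, which is the complaint above.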

More expressive trait bounds
Lastly, when I was musing over the design options with VSD, I tried some things with traits that apparently weren't quite yet supported, namely negative bounds and partial specialization of implementations. RFC #1053 - More flexible coherence rules that permit overlap covers the issue. In one implementation I tried, I wanted to be able to say that "this is the correct implementation if T is VSDLocked but not VSDUnlocked" to help the compiler understand which implementation I wanted. I know that these are exclusive traits and only one can be logically implemented per type, but how do I tell this to the compiler? It turned out that this wasn't actually required, as I ended up using trait scoping for choosing which trait's implementation I wanted, but the idea could be more generally applied. If we could exclude traits from being implemented by a type, we'd have more flexible ways to describe which implementation should be used by which types. One other aspect is being able to define broad implementations over a broad set of types, like over all integers or all number types. I'm not sure if there's a way to already do this, but it'd be pretty sweet to be able to cover u8, u16, u32, et cetera in one sweep instead of implementing separately for all three, but still have a separate implementation for, say, f64 and f32. For partial specialization, I think this is the complete and accepted implementation proposal, but I'm not sure as to the status of it.
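One workaround that exists today for the "all integers in one sweep" case is a declarative macro that stamps out the same impl for a list of types. It is not a real bound over number types, just code generation, and the trait and macro names below are made up for this sketch.

```rust
// Hypothetical trait used only to show the pattern.
trait Describe {
    fn describe(&self) -> String;
}

// Generates one identical impl per listed type.
macro_rules! impl_describe_for_ints {
    ($($t:ty),*) => {
        $(
            impl Describe for $t {
                fn describe(&self) -> String {
                    format!("integer {}", self)
                }
            }
        )*
    };
}

impl_describe_for_ints!(u8, u16, u32);

// Floats still get their own, separately written implementations.
impl Describe for f32 {
    fn describe(&self) -> String {
        format!("float {:.1}", self)
    }
}

impl Describe for f64 {
    fn describe(&self) -> String {
        format!("float {:.1}", self)
    }
}
```

With this in place, 7u8.describe() and 1.5f64.describe() pick the integer and float versions respectively, though adding a new integer type still means editing the macro invocation.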

..And that's all for now. Happy codin'!

Jalmari Ikävalko
