Generating Sudoku Boards pt. 3: Rust for WebAssembly

by Ross Harrison

Nov 25, 2018

Part my Sudoku Board Generation Series:

As mentioned in the first post of this series, the initial purpose of creating a sudoku board generator was to look at the performance characteristics Web Assembly (WASM). This post covers how the WASM build was created for the Rust version of the generator.

At the time of the initial post, I had a working WASM build that could be run in a browser. However, this build was a simple translation of the binary. There was no API made available in the JS layer. This isn’t particularly interesting. This time around I created a library that exposes an API that can be used by a JavaScript runtime.

Additionally, the build tools and runtime environments have changed a bit. Node now supports WASM with no additional configuration. And the Rust and WebAssembly project has blossomed. This project is dedicated to making Rust and WASM “the best choice for fast code on the web”. Part of that project is creating tools for making the process as easy as possible. I’ll be using their wasm-pack tool in order to build a module that can be used in node.

I had to work around a lot of issues and hit a few dead-ends due to my lack of knowledge from both Rust and WASM. What is presented below is a cleaned up version of the steps I went through.

You can see the source code associated with this post here.

Installing Tools

These instructions are modified from the “Rust Wasm book” and “_wasm-bindgen_ basic usage”, in order to use the _rustup_ toolchain manager and the _node_ runtime.

Node

Installing node is pretty trivial. You can get the latest binaries on their downloads page, use your package manager, or use NVM. Just use the latest stable version and you should be fine.

Rust

The first item to install is rustup. This application installs and manages the tools needed to build rust for different targets. Below shows how to install rustup and the wasm buildchain to the nightly channel.

Note: The nightly channel was used because I wasn’t able to build the WASM dependencies on stable.

# Install rustup and add binaries to current
shell $PATHcurl https://sh.rustup.rs -sSf | shsource $HOME/.cargo/env
# Update rustup and add wasm target to the nightly buildchain
rustup update --allrustup target add wasm32-unknown-unknown --toolchain nightly

I then added the below snippet to my ~/.profile file so the tools will be configured when I restart my computer.

export PATH="$HOME/.cargo/bin:$PATH"

Finally, I used cargo to install wasm-pack. This added the wasm-pack executable to my PATH.

cargo install wasm-pack

Note: I was unable to install wasm-pack on OSX. There was an issue including the headers for libz. I had the headers and binaries installed via brew, as well as through xcode, but cargo wasn’t able to find them. Rather than debug the issue, I just switch to my Linux machine, the install worked immediately after installing the dependencies.

Update: I was able to install wasm-pack on OSX after removing the brew versions of llvm and clang . Now make clean run-js runs as expected.

Defining the Library Interface

As mentioned before I wanted to created a module that could be executed from JavaScript, rather than simply including and running. The interface I settled on is as follows.

pub fn generate_and_fill_board(board_size: usize, all_neighbors: bool) -> SudokuBoard;
pub fn generate_and_fill_boards(board_count: usize, board_size: usize, all_neighbors: bool);
pub fn serializeBoard(board: &SudokuBoard) -> String;

generate_and_fill_board creates a new board, fills it with a random valid sudoku solution, and returns the struct. generate_and_fill_boards loops the previous functions, creating and filling board_count number of boards. This function was implemented in order to test the impact of communication between WASM and JavaScript multiple times vs. a single long-running job.

Note: Ideally I would compare returning single SudokuBoard structs with a Vec<SudokuBoard>. However, this is not currently supported by wasm_bindgen .

Finally, serializeBoard was included so I could actually seem some results.

Adding Multiple Cargo Builds

Cargo allows for multiple builds. This is done by adding separate entries that target different source files. I added the following entries to the bottom of my Cargo.toml file.

[lib]
crate-type = ["cdylib", "rlib"]
name = "sudoku_generator"
path = "src/sudoku_generator.rs"

[[bin]]
name = "runner"
path = "src/bin.rs"

The lib entry defines the library configuration. This will be used by wasm-pack. The crate-type property defines how this module will be linked. The cdylib value means that a dynamically linked library will be produced. The rslib value means that it can be included statically by the bin build.

By comparison, bin is simple, only defining the name and path.

Exposing functions with `wasm_bindgen`

These instructions are based on _wasm_bindgen_ basic usage instructions. I also looked at the project generated in the hello world example for reference

Initially, I added all of the new dependencies to both builds.

[dependencies]
cfg-if="0.1"
time = "0.1"
rand = { version = "0.6", features = ["wasm-bindgen"] }
wasm-bindgen = "0.2"
# `wee_alloc` is a tiny allocator for wasm that is only ~1K in code size
# compared to the default allocator's ~10K. It is slower than the default
# allocator, however.
#
# Unfortunately, `wee_alloc` requires nightly Rust when targeting wasm for now.
wee_alloc = { version = "0.4.2", optional = true }
# The `console_error_panic_hook` crate provides better debugging of panics by
# logging them with `console.error`. This is great for development, but requires
# all the `std::fmt` and `std::panicking` infrastructure, so isn't great for
# code size when deploying.
console_error_panic_hook = { version = "0.1.1", optional = true }

The cfg-if is used by the Hello World boilerplate to conditionally use a different allocator. The time crate was used before. I had to update the rand package as they recently added wasm-bindgen support. The attributes we’ll be using later are included in the wasm-bindgen package.

In order to support error logging and a different allocator, the console_error_panic_hook and wee_alloc packages are included by the Hello World example linked above.

To expose the library API to JavaScript the #[wasm_bindgen] attribute was added to the following functions and structs.

#[wasm_bindgen]
pub fn generate_and_fill_boards(board_count: usize, board_size: usize, all_neighbors: bool) { /**/ }

#[wasm_bindgen]
pub fn generate_and_fill_board(board_size: usize, all_neighbors: bool) -> SudokuBoard { /**/ }

#[wasm_bindgen]
pub fn serializeBoard(board: &SudokuBoard) -> String { /**/ }

#[wasm_bindgen]
pub struct SudokuBoard { /**/ }

The wasm_bindgen attribute generates the code necessary to expose those functions and structs to JavaScript. All values exposed to JavaScript must be annotated in this way. This includes the types returned by the exposed functions. That is why the SudokuBoard struct was annotated as well.

Although wasm_bindgen supports exporting instance methods, I found it easier to compile when the method SudokuBoard::serialize was refactored to the function serializeBoard. If I wanted to support an interactive game with this library, adjusting the compilation rules would be the least of my concerns.

Forced to use (Much Slower?) OsRng

To shuffle the valid options for each cell (and generate random boards). I was the rand::thread_rng. Unfortunately, this is not supported by wasm_bindgen. Only OsRng is available as of rand version 0.6. To see what performance impact this had. I added a feature to my library that will allow me to test this out. The feature was implemented as follows.

impl SudokuBoard {
    fn do_fill(&mut self, index: usize) -> bool {
        let ref cell_pos = index_to_coord(index, &self.board_config);
        let neighbor_values = self.neighbor_values(cell_pos);

        // get remaining values and shuffle them
        let mut remaining_values: Vec<u64> = self.valid_values
            .difference(&neighbor_values).cloned().collect();

        if cfg!(feature = "thread_rng") {
            rand::thread_rng().shuffle(&mut remaining_values);
        }

        if cfg!(not(feature = "thread_rng")) {
            OsRng::new().unwrap().shuffle(&mut remaining_values);
        }

        // try the remaining values
        for v in remaining_values {
            self.mark_cell(cell_pos, v);

            if index == self.board_config.size_quad - 1 || self.do_fill(index + 1) {
                return true;
            }
        }

        self.mark_cell(cell_pos, 0);
        return false;
    }
}

You can see on lines 10 through 16 the difference ranges being used. The cfg! macro is used in combination with feature detection, creating two compile time branches. A quick comparison of the two methods can be seen below.

# Os Rng
> make run-native
cargo +nightly build --release
    Finished release [optimized] target(s) in 0.03s
./target/release/runner 100
time: 0.04272449 s
board count 100
boards per second 2340.5779682800194

# Thread Rng
make run-native-thread-rng
cargo +nightly build --features="thread_rng" --release
    Finished release [optimized] target(s) in 0.03s
./target/release/runner 100
time: 0.024462508 s
board count 100
boards per second 4087.8882901131806

Quite the difference! Almost twice the running time for 100 boards. I initially thought this might be due to allocating a new OsRng for each cell. However, using a shared OsRng instance didn’t seem to have any effect. I’ll be comparing these more thoroughly in a later post. For now, it seems like a bummer to have to use a slower library.

Optimizing Build for size

A concern of WASM builds is the size of the generated binary. This is also true for embedded use-cases and is supported by Cargo. There are two options for optimizing for size opt-level= "s" and opt-level = "z". The z option is more aggressive but takes longer to run.

The options had some strange results with my project. I built the library with z, s, and with optimization for speed (the O3 option in gcc). And got the following results for the module size in bytes:

Oz - 1502285Os - 1485024O3 - 1468290

These versions are all very large for use on the web (~1.5 MB). But strangely, the size reduction was best when optimizing for speed, and worst when using aggressive size optimization. This is the opposite of what I expected. This bothers me, but an answer will have to wait for another day.

Actually Running the Code

So far it’s been a lot of work with no payoff. Let’s take a look at how I got the WASM module to actually build and run. I structured my builds around a small Makefile. The entry for the WASM module looks as follows.

pkg: src/sudoku_generator.rs Cargo.toml
	mkdir -p pkg
wasm-pack build --target nodejs

Then I load the generated module with a node script in the root of the project.

const sg = require('./pkg/sudoku_generator');
const board_count = Number(process.argv[2]);

if (isNaN(board_count)) {
  console.error("\nUsage: node ./hello.js [BOARD_COUNT]\n");
  process.exit(1);
}

for(let i = 0; i < board_count; i++) {
  console.log(
    sg.serializeBoard(sg.generate_and_fill_board(3, false))
  );
}

You can see that the generate_and_fill_board function is called, and a serialized representation of the board is logged to the console. But does it work?

We can run the script and see the output as follows.

make pkg && node hello.js 5

And the result?

1|2|4|5|8|9|3|7|6|8|6|9|7|2|3|1|4|5|5|3|7|6|1|4|8|2|9|9|7|5|4|6|1|2|3|8|3|1|2|8|7|5|9|6|4|4|8|6|9|3|2|7|5|1|7|5|8|2|9|6|4|1|3|6|9|3|1|4|7|5|8|2|2|4|1|3|5|8|6|9|74|9|7|1|3|6|5|8|2|3|2|5|7|4|8|1|9|6|6|1|8|5|9|2|3|4|7|8|3|4|6|7|5|2|1|9|7|6|9|4|2|1|8|5|3|1|5|2|9|8|3|7|6|4|2|7|6|8|5|9|4|3|1|5|4|1|3|6|7|9|2|8|9|8|3|2|1|4|6|7|52|1|6|8|9|7|3|4|5|3|5|9|1|2|4|8|6|7|8|4|7|6|3|5|1|2|9|5|6|8|3|1|9|2|7|4|9|3|2|7|4|6|5|8|1|4|7|1|5|8|2|6|9|3|1|2|3|4|7|8|9|5|6|7|8|5|9|6|3|4|1|2|6|9|4|2|5|1|7|3|85|2|8|3|6|9|4|1|7|7|9|6|1|2|4|5|8|3|3|1|4|5|8|7|2|6|9|2|3|9|8|5|6|1|7|4|6|5|1|7|4|3|9|2|8|8|4|7|9|1|2|6|3|5|9|7|5|2|3|1|8|4|6|1|6|3|4|9|8|7|5|2|4|8|2|6|7|5|3|9|11|8|7|9|2|6|5|3|4|4|2|5|8|3|7|1|6|9|6|3|9|4|1|5|8|2|7|8|7|4|6|5|2|9|1|3|2|5|1|7|9|3|6|4|8|9|6|3|1|4|8|7|5|2|5|9|6|2|7|4|3|8|1|7|4|8|3|6|1|2|9|5|3|1|2|5|8|9|4|7|63|8|1|9|2|5|7|4|6|6|5|2|3|4|7|8|1|9|9|4|7|1|6|8|3|5|2|8|2|5|6|3|1|4|9|7|4|1|3|8|7|9|2|6|5|7|6|9|2|5|4|1|3|8|1|3|6|7|9|2|5|8|4|2|9|4|5|8|3|6|7|1|5|7|8|4|1|6|9|2|37|3|1|9|2|6|5|8|4|9|2|4|5|3|8|7|1|6|5|6|8|7|4|1|2|3|9|1|4|9|8|6|2|3|5|7|2|5|6|1|7|3|9|4|8|3|8|7|4|5|9|6|2|1|4|1|3|2|9|7|8|6|5|8|7|2|6|1|5|4|9|3|6|9|5|3|8|4|1|7|21|3|9|4|7|6|8|2|5|7|6|5|3|8|2|4|9|1|4|8|2|9|1|5|6|3|7|3|1|8|7|9|4|2|5|6|6|2|4|1|5|8|3|7|9|9|5|7|6|2|3|1|8|4|5|7|6|2|3|1|9|4|8|2|9|1|8|4|7|5|6|3|8|4|3|5|6|9|7|1|26|2|5|7|3|9|4|8|1|9|7|1|6|4|8|3|2|5|3|4|8|1|2|5|7|9|6|8|3|7|5|6|4|2|1|9|4|1|2|9|8|3|6|5|7|5|6|9|2|7|1|8|4|3|1|8|4|3|9|6|5|7|2|7|9|6|8|5|2|1|3|4|2|5|3|4|1|7|9|6|86|8|1|2|9|3|5|7|4|4|2|9|1|5|7|6|3|8|7|5|3|8|4|6|2|1|9|1|4|8|7|6|2|9|5|3|9|3|7|4|1|5|8|2|6|5|6|2|3|8|9|7|4|1|2|1|5|9|3|8|4|6|7|8|7|4|6|2|1|3|9|5|3|9|6|5|7|4|1|8|2

Great! We’ve got the serialized output of 10 probably unique boards.

Making it work natively … again

Getting the library to work with WASM was pretty cool, but I inadvertently broke my native (Linux) builds. I received build errors like the following.

error[E0432]: unresolved import `wasm_bindgen`
   --> src/sudoku_generator.rs:151:1
    |
151 | #[wasm_bindgen]
    | ^^^^^^^^^^^^^^^ did you mean `self::wasm_bindgen`?

error: cannot determine resolution for the macro `__wbindgen_if_not_std`
   --> src/sudoku_generator.rs:151:1
    |
151 | #[wasm_bindgen]
    | ^^^^^^^^^^^^^^^
    |
    = note: import resolution is stuck, try simplifying macro imports

error[E0433]: failed to resolve: use of undeclared type or module `WasmRefCell`
   --> src/sudoku_generator.rs:151:1
    |
151 | #[wasm_bindgen]
    | ^^^^^^^^^^^^^^^ use of undeclared type or module `WasmRefCell`

After trying a few workarounds, I settled on excluding wasm-bindgen from the native build. Firstly I excluded the WASM specific packages from the Cargo.toml native builds.

[dependencies]
cfg-if="0.1"
time = "0.1"
rand = { version = "0.6", features = ["wasm-bindgen"] }

[target.'cfg(target_arch = "wasm32")'.dependencies]
wasm-bindgen = "0.2"
# `wee_alloc` is a tiny allocator for wasm that is only ~1K in code size
# compared to the default allocator's ~10K. It is slower than the default
# allocator, however.
#
# Unfortunately, `wee_alloc` requires nightly Rust when targeting wasm for now.
wee_alloc = { version = "0.4.2", optional = true }
# The `console_error_panic_hook` crate provides better debugging of panic    s by
# logging them with `console.error`. This is great for development, but r    equires
# all the `std::fmt` and `std::panicking` infrastructure, so isn't great     for
# code size when deploying.
console_error_panic_hook = { version = "0.1.1", optional = true }

The [target.'cfg(target_arch = "wasm32")'.dependencies] section excludes the below packages from all build targets except for wasm32. They won’t be included in any native builds.

Next, I had to exclude wasm_bindgen from the library source code. This included both the imports and the attributes. The imports can be handled in the same way that the Hello World example handled the allocator. Rust provides a mechanism excluding the attributes via the cfg_attr attribute. This attribute allows for compile-time rules to adjust what attributes are added (or not). An outline of the changes can be seen below.

cfg_if! {
    // exclude wasm bindings unless we need them
    if #[cfg(target_arch = "wasm32")] {
        extern crate wasm_bindgen;
        use self::wasm_bindgen::prelude::*;
    }
}

#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub struct SudokuBoard { /* */ }

#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub fn generate_and_fill_boards(board_count: usize, board_size: usize, all_neighbors: bool) { /* */ }

#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub fn generate_and_fill_board(board_size: usize, all_neighbors: bool) -> SudokuBoard { /* */ }

#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub fn serializeBoard(board: &SudokuBoard) -> String { /* */ }

Essentially all #[wasm_bindgen] entries were replaced with the more verbose #[cfg_attr(target_arch = "wasm32", wasm_bindgen)]. A little ugly, but it gets the job done.

Now when I run the native binary I get the expected results.

> make run-native

# compilation output

./target/release/runner 100
time: 0.025662673 s
board count 100
boards per second 3896.7102140918837

Summary

In this post, I went over how I modified my sudoku board generator to build a library for use as a web assembly module, and within a native Linux build. I ran into and worked around some compatibility issues. And came across some strange optimization behavior.

Next, I’ll outline a C++ implementation of a WASM library. After that, it’s time to benchmark!