Part my Sudoku Board Generation Series:
- Part 1: Structure & Algorithm
- Part 2: Implementation Comparison
- Part 3: Rust for WebAssembly
- Part 4: C++ for WebAssembly
As mentioned in the first post of this series, the initial purpose of creating a sudoku board generator was to look at the performance characteristics Web Assembly (WASM). This post covers how the WASM build was created for the Rust version of the generator.
At the time of the initial post, I had a working WASM build that could be run in a browser. However, this build was a simple translation of the binary. There was no API made available in the JS layer. This isn’t particularly interesting. This time around I created a library that exposes an API that can be used by a JavaScript runtime.
Additionally, the build tools and runtime environments have changed a bit. Node now supports WASM with no additional configuration. And the Rust and WebAssembly project has blossomed. This project is dedicated to making Rust and WASM “the best choice for fast code on the web”. Part of that project is creating tools for making the process as easy as possible. I’ll be using their wasm-pack
tool in order to build a module that can be used in node.
I had to work around a lot of issues and hit a few dead-ends due to my lack of knowledge from both Rust and WASM. What is presented below is a cleaned up version of the steps I went through.
You can see the source code associated with this post here.
Installing Tools
These instructions are modified from the “Rust Wasm book” and “_wasm-bindgen_
basic usage”, in order to use the _rustup_
toolchain manager and the _node_
runtime.
Node
Installing node is pretty trivial. You can get the latest binaries on their downloads page, use your package manager, or use NVM. Just use the latest stable version and you should be fine.
Rust
The first item to install is rustup
. This application installs and manages the tools needed to build rust for different targets. Below shows how to install rustup
and the wasm
buildchain to the nightly
channel.
Note: The nightly
channel was used because I wasn’t able to build the WASM dependencies on stable
.
# Install rustup and add binaries to current
shell $PATHcurl https://sh.rustup.rs -sSf | shsource $HOME/.cargo/env
# Update rustup and add wasm target to the nightly buildchain
rustup update --allrustup target add wasm32-unknown-unknown --toolchain nightly
I then added the below snippet to my ~/.profile
file so the tools will be configured when I restart my computer.
export PATH="$HOME/.cargo/bin:$PATH"
Finally, I used cargo
to install wasm-pack
. This added the wasm-pack
executable to my PATH
.
cargo install wasm-pack
Note: I was unable to install
wasm-pack
on OSX. There was an issue including the headers forlibz
. I had the headers and binaries installed via brew, as well as through xcode, butcargo
wasn’t able to find them. Rather than debug the issue, I just switch to my Linux machine, the install worked immediately after installing the dependencies.
Update: I was able to install wasm-pack
on OSX after removing the brew
versions of llvm
and clang
. Now make clean run-js
runs as expected.
Defining the Library Interface
As mentioned before I wanted to created a module that could be executed from JavaScript, rather than simply including and running. The interface I settled on is as follows.
pub fn generate_and_fill_board(board_size: usize, all_neighbors: bool) -> SudokuBoard;
pub fn generate_and_fill_boards(board_count: usize, board_size: usize, all_neighbors: bool);
pub fn serializeBoard(board: &SudokuBoard) -> String;
generate_and_fill_board
creates a new board, fills it with a random valid sudoku solution, and returns the struct. generate_and_fill_boards
loops the previous functions, creating and filling board_count
number of boards. This function was implemented in order to test the impact of communication between WASM and JavaScript multiple times vs. a single long-running job.
Note: Ideally I would compare returning single SudokuBoard
structs with a Vec<SudokuBoard>
. However, this is not currently supported by wasm_bindgen
.
Finally, serializeBoard
was included so I could actually seem some results.
Adding Multiple Cargo Builds
Cargo allows for multiple builds. This is done by adding separate entries that target different source files. I added the following entries to the bottom of my Cargo.toml
file.
[lib]
crate-type = ["cdylib", "rlib"]
name = "sudoku_generator"
path = "src/sudoku_generator.rs"
[[bin]]
name = "runner"
path = "src/bin.rs"
The lib
entry defines the library configuration. This will be used by wasm-pack
. The crate-type
property defines how this module will be linked. The cdylib
value means that a dynamically linked library will be produced. The rslib
value means that it can be included statically by the bin
build.
By comparison, bin
is simple, only defining the name
and path
.
Exposing functions with wasm_bindgen
These instructions are based on _wasm_bindgen_
basic usage instructions. I also looked at the project generated in the hello world example for reference
Initially, I added all of the new dependencies to both builds.
[dependencies]
cfg-if="0.1"
time = "0.1"
rand = { version = "0.6", features = ["wasm-bindgen"] }
wasm-bindgen = "0.2"
# `wee_alloc` is a tiny allocator for wasm that is only ~1K in code size
# compared to the default allocator's ~10K. It is slower than the default
# allocator, however.
#
# Unfortunately, `wee_alloc` requires nightly Rust when targeting wasm for now.
wee_alloc = { version = "0.4.2", optional = true }
# The `console_error_panic_hook` crate provides better debugging of panics by
# logging them with `console.error`. This is great for development, but requires
# all the `std::fmt` and `std::panicking` infrastructure, so isn't great for
# code size when deploying.
console_error_panic_hook = { version = "0.1.1", optional = true }
The cfg-if
is used by the Hello World boilerplate to conditionally use a different allocator. The time
crate was used before. I had to update the rand
package as they recently added wasm-bindgen
support. The attributes we’ll be using later are included in the wasm-bindgen
package.
In order to support error logging and a different allocator, the console_error_panic_hook
and wee_alloc
packages are included by the Hello World example linked above.
To expose the library API to JavaScript the #[wasm_bindgen]
attribute was added to the following functions and structs.
#[wasm_bindgen]
pub fn generate_and_fill_boards(board_count: usize, board_size: usize, all_neighbors: bool) { /**/ }
#[wasm_bindgen]
pub fn generate_and_fill_board(board_size: usize, all_neighbors: bool) -> SudokuBoard { /**/ }
#[wasm_bindgen]
pub fn serializeBoard(board: &SudokuBoard) -> String { /**/ }
#[wasm_bindgen]
pub struct SudokuBoard { /**/ }
The wasm_bindgen
attribute generates the code necessary to expose those functions and structs to JavaScript. All values exposed to JavaScript must be annotated in this way. This includes the types returned by the exposed functions. That is why the SudokuBoard
struct was annotated as well.
Although wasm_bindgen
supports exporting instance methods, I found it easier to compile when the method SudokuBoard::serialize
was refactored to the function serializeBoard
. If I wanted to support an interactive game with this library, adjusting the compilation rules would be the least of my concerns.
Forced to use (Much Slower?) OsRng
To shuffle the valid options for each cell (and generate random boards). I was the rand::thread_rng
. Unfortunately, this is not supported by wasm_bindgen
. Only OsRng
is available as of rand
version 0.6
. To see what performance impact this had. I added a feature to my library that will allow me to test this out. The feature was implemented as follows.
impl SudokuBoard {
fn do_fill(&mut self, index: usize) -> bool {
let ref cell_pos = index_to_coord(index, &self.board_config);
let neighbor_values = self.neighbor_values(cell_pos);
// get remaining values and shuffle them
let mut remaining_values: Vec<u64> = self.valid_values
.difference(&neighbor_values).cloned().collect();
if cfg!(feature = "thread_rng") {
rand::thread_rng().shuffle(&mut remaining_values);
}
if cfg!(not(feature = "thread_rng")) {
OsRng::new().unwrap().shuffle(&mut remaining_values);
}
// try the remaining values
for v in remaining_values {
self.mark_cell(cell_pos, v);
if index == self.board_config.size_quad - 1 || self.do_fill(index + 1) {
return true;
}
}
self.mark_cell(cell_pos, 0);
return false;
}
}
You can see on lines 10 through 16 the difference ranges being used. The cfg!
macro is used in combination with feature detection, creating two compile time branches. A quick comparison of the two methods can be seen below.
# Os Rng
> make run-native
cargo +nightly build --release
Finished release [optimized] target(s) in 0.03s
./target/release/runner 100
time: 0.04272449 s
board count 100
boards per second 2340.5779682800194
# Thread Rng
make run-native-thread-rng
cargo +nightly build --features="thread_rng" --release
Finished release [optimized] target(s) in 0.03s
./target/release/runner 100
time: 0.024462508 s
board count 100
boards per second 4087.8882901131806
Quite the difference! Almost twice the running time for 100 boards. I initially thought this might be due to allocating a new OsRng
for each cell. However, using a shared OsRng
instance didn’t seem to have any effect. I’ll be comparing these more thoroughly in a later post. For now, it seems like a bummer to have to use a slower library.
Optimizing Build for size
A concern of WASM builds is the size of the generated binary. This is also true for embedded use-cases and is supported by Cargo. There are two options for optimizing for size opt-level= "s"
and opt-level = "z"
. The z
option is more aggressive but takes longer to run.
The options had some strange results with my project. I built the library with z
, s
, and with optimization for speed (the O3
option in gcc
). And got the following results for the module size in bytes:
Oz - 1502285Os - 1485024O3 - 1468290
These versions are all very large for use on the web (~1.5 MB). But strangely, the size reduction was best when optimizing for speed, and worst when using aggressive size optimization. This is the opposite of what I expected. This bothers me, but an answer will have to wait for another day.
Actually Running the Code
So far it’s been a lot of work with no payoff. Let’s take a look at how I got the WASM module to actually build and run. I structured my builds around a small Makefile.
The entry for the WASM module looks as follows.
pkg: src/sudoku_generator.rs Cargo.toml
mkdir -p pkg
wasm-pack build --target nodejs
Then I load the generated module with a node script in the root of the project.
const sg = require('./pkg/sudoku_generator');
const board_count = Number(process.argv[2]);
if (isNaN(board_count)) {
console.error("\nUsage: node ./hello.js [BOARD_COUNT]\n");
process.exit(1);
}
for(let i = 0; i < board_count; i++) {
console.log(
sg.serializeBoard(sg.generate_and_fill_board(3, false))
);
}
You can see that the generate_and_fill_board
function is called, and a serialized representation of the board is logged to the console. But does it work?
We can run the script and see the output as follows.
make pkg && node hello.js 5
And the result?
1|2|4|5|8|9|3|7|6|8|6|9|7|2|3|1|4|5|5|3|7|6|1|4|8|2|9|9|7|5|4|6|1|2|3|8|3|1|2|8|7|5|9|6|4|4|8|6|9|3|2|7|5|1|7|5|8|2|9|6|4|1|3|6|9|3|1|4|7|5|8|2|2|4|1|3|5|8|6|9|74|9|7|1|3|6|5|8|2|3|2|5|7|4|8|1|9|6|6|1|8|5|9|2|3|4|7|8|3|4|6|7|5|2|1|9|7|6|9|4|2|1|8|5|3|1|5|2|9|8|3|7|6|4|2|7|6|8|5|9|4|3|1|5|4|1|3|6|7|9|2|8|9|8|3|2|1|4|6|7|52|1|6|8|9|7|3|4|5|3|5|9|1|2|4|8|6|7|8|4|7|6|3|5|1|2|9|5|6|8|3|1|9|2|7|4|9|3|2|7|4|6|5|8|1|4|7|1|5|8|2|6|9|3|1|2|3|4|7|8|9|5|6|7|8|5|9|6|3|4|1|2|6|9|4|2|5|1|7|3|85|2|8|3|6|9|4|1|7|7|9|6|1|2|4|5|8|3|3|1|4|5|8|7|2|6|9|2|3|9|8|5|6|1|7|4|6|5|1|7|4|3|9|2|8|8|4|7|9|1|2|6|3|5|9|7|5|2|3|1|8|4|6|1|6|3|4|9|8|7|5|2|4|8|2|6|7|5|3|9|11|8|7|9|2|6|5|3|4|4|2|5|8|3|7|1|6|9|6|3|9|4|1|5|8|2|7|8|7|4|6|5|2|9|1|3|2|5|1|7|9|3|6|4|8|9|6|3|1|4|8|7|5|2|5|9|6|2|7|4|3|8|1|7|4|8|3|6|1|2|9|5|3|1|2|5|8|9|4|7|63|8|1|9|2|5|7|4|6|6|5|2|3|4|7|8|1|9|9|4|7|1|6|8|3|5|2|8|2|5|6|3|1|4|9|7|4|1|3|8|7|9|2|6|5|7|6|9|2|5|4|1|3|8|1|3|6|7|9|2|5|8|4|2|9|4|5|8|3|6|7|1|5|7|8|4|1|6|9|2|37|3|1|9|2|6|5|8|4|9|2|4|5|3|8|7|1|6|5|6|8|7|4|1|2|3|9|1|4|9|8|6|2|3|5|7|2|5|6|1|7|3|9|4|8|3|8|7|4|5|9|6|2|1|4|1|3|2|9|7|8|6|5|8|7|2|6|1|5|4|9|3|6|9|5|3|8|4|1|7|21|3|9|4|7|6|8|2|5|7|6|5|3|8|2|4|9|1|4|8|2|9|1|5|6|3|7|3|1|8|7|9|4|2|5|6|6|2|4|1|5|8|3|7|9|9|5|7|6|2|3|1|8|4|5|7|6|2|3|1|9|4|8|2|9|1|8|4|7|5|6|3|8|4|3|5|6|9|7|1|26|2|5|7|3|9|4|8|1|9|7|1|6|4|8|3|2|5|3|4|8|1|2|5|7|9|6|8|3|7|5|6|4|2|1|9|4|1|2|9|8|3|6|5|7|5|6|9|2|7|1|8|4|3|1|8|4|3|9|6|5|7|2|7|9|6|8|5|2|1|3|4|2|5|3|4|1|7|9|6|86|8|1|2|9|3|5|7|4|4|2|9|1|5|7|6|3|8|7|5|3|8|4|6|2|1|9|1|4|8|7|6|2|9|5|3|9|3|7|4|1|5|8|2|6|5|6|2|3|8|9|7|4|1|2|1|5|9|3|8|4|6|7|8|7|4|6|2|1|3|9|5|3|9|6|5|7|4|1|8|2
Great! We’ve got the serialized output of 10 probably unique boards.
Making it work natively … again
Getting the library to work with WASM was pretty cool, but I inadvertently broke my native (Linux) builds. I received build errors like the following.
error[E0432]: unresolved import `wasm_bindgen`
--> src/sudoku_generator.rs:151:1
|
151 | #[wasm_bindgen]
| ^^^^^^^^^^^^^^^ did you mean `self::wasm_bindgen`?
error: cannot determine resolution for the macro `__wbindgen_if_not_std`
--> src/sudoku_generator.rs:151:1
|
151 | #[wasm_bindgen]
| ^^^^^^^^^^^^^^^
|
= note: import resolution is stuck, try simplifying macro imports
error[E0433]: failed to resolve: use of undeclared type or module `WasmRefCell`
--> src/sudoku_generator.rs:151:1
|
151 | #[wasm_bindgen]
| ^^^^^^^^^^^^^^^ use of undeclared type or module `WasmRefCell`
After trying a few workarounds, I settled on excluding wasm-bindgen
from the native build. Firstly I excluded the WASM specific packages from the Cargo.toml
native builds.
[dependencies]
cfg-if="0.1"
time = "0.1"
rand = { version = "0.6", features = ["wasm-bindgen"] }
[target.'cfg(target_arch = "wasm32")'.dependencies]
wasm-bindgen = "0.2"
# `wee_alloc` is a tiny allocator for wasm that is only ~1K in code size
# compared to the default allocator's ~10K. It is slower than the default
# allocator, however.
#
# Unfortunately, `wee_alloc` requires nightly Rust when targeting wasm for now.
wee_alloc = { version = "0.4.2", optional = true }
# The `console_error_panic_hook` crate provides better debugging of panic s by
# logging them with `console.error`. This is great for development, but r equires
# all the `std::fmt` and `std::panicking` infrastructure, so isn't great for
# code size when deploying.
console_error_panic_hook = { version = "0.1.1", optional = true }
The [target.'cfg(target_arch = "wasm32")'.dependencies]
section excludes the below packages from all build targets except for wasm32
. They won’t be included in any native builds.
Next, I had to exclude wasm_bindgen
from the library source code. This included both the imports and the attributes. The imports can be handled in the same way that the Hello World example handled the allocator. Rust provides a mechanism excluding the attributes via the cfg_attr
attribute. This attribute allows for compile-time rules to adjust what attributes are added (or not). An outline of the changes can be seen below.
cfg_if! {
// exclude wasm bindings unless we need them
if #[cfg(target_arch = "wasm32")] {
extern crate wasm_bindgen;
use self::wasm_bindgen::prelude::*;
}
}
#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub struct SudokuBoard { /* */ }
#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub fn generate_and_fill_boards(board_count: usize, board_size: usize, all_neighbors: bool) { /* */ }
#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub fn generate_and_fill_board(board_size: usize, all_neighbors: bool) -> SudokuBoard { /* */ }
#[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
pub fn serializeBoard(board: &SudokuBoard) -> String { /* */ }
Essentially all #[wasm_bindgen]
entries were replaced with the more verbose #[cfg_attr(target_arch = "wasm32", wasm_bindgen)]
. A little ugly, but it gets the job done.
Now when I run the native binary I get the expected results.
> make run-native
# compilation output
./target/release/runner 100
time: 0.025662673 s
board count 100
boards per second 3896.7102140918837
Summary
In this post, I went over how I modified my sudoku board generator to build a library for use as a web assembly module, and within a native Linux build. I ran into and worked around some compatibility issues. And came across some strange optimization behavior.
Next, I’ll outline a C++ implementation of a WASM library. After that, it’s time to benchmark!