The solver was written in C++ and compiled to Webassembly using Emscripten. The GUI was written in Javascript. Both the original C++ source code and javascript can be found on my Github.
Sudoku is a logic puzzle game played on a 9x9 grid of cells. The objective is to fill each row, each column, and each 3x3 sub-grid (box) with the numbers 1-9. The only rule is that no number can appear more than once in any row, column, or box. Although you can arrive at the solution through guessing, the purpose of the game is to find cell solutions through logical deductions.
Entering data - Solutions can be entered or removed using a keyboard or the Digit input mode next to the board. In the process of solving, it's often helpful to write possible solutions into a cell before you know the true solution. Also known as pencil marks, these mini-numbers can be entered while Candidate mode is selected.
Custom puzzles - To enter a custom puzzle, first select Custom from the puzzle dropdown. Then, any number of given cell solutions (or clues) and candidates can be entered manually as described above. Alternatively, puzzle clues can be entered as an 81 character string in the Custom Board Editor. For unsolved cells in this string, any character other than 1-9 may be used (common characters are . or 0).
Solving - The solver works by applying a predetermined order of strategies (described below) until some change is made to the board, displaying the results, and then starting the strategy list over from the beginning. The Step button will show the result of one successful strategy at a time while the Solve button will find and display all the steps needed to solve the puzzle. Any change made can also be undone with the Undo button. For the most difficult puzzles, the solver may run out of strategies in which case the solved puzzle (if it exists) will be found by a fast brute force search.
First, a brief explanation of the terminology used in the following strategies:
naked single, hidden single
This simple strategy comes in two types. A naked single appears when a candidate is the only possible number left in a cell. Conversely, a hidden single is when there is only one possible cell left for a candidate. Or said another way, if candidate k can only be in one cell within a row, column, or box, k is a hidden single.
Singles are the bread and butter of sudoku solving strategies and are always the first thing you should look for. These two strategies alone are enough to solve nearly all "easy" and many "intermediate" level puzzles.
naked pair/triplet/quad, hidden pair/triplet/quad, pointing pair/triplet, box/line reduction, x-wing, swordfish, jellyfish
Also known as locked sets, the remainder of the simple strategies can all be described by a common line of reasoning:
If there is a set of n X's and n Y's in Z, any other X's in those n Y's can be removed
where X, Y, and Z are some combination of row/column/box/cell/candidate and n is the size of the subset
These subset strategies can be broken into four groups:
Naked subset - If there are n cells containing only n candidates in a unit, then those n candidates can be removed from other cells in that unit.
Naked subsets can come as pairs, triplets, or quads, corresponding to n = 2, 3, 4. For triplets or quads, it is not necessary for every cell to contain every candidate; only that the total number of cells and candidates equals 3 or 4 (as shown in the naked triplet example).
Hidden subset - If there are n candidates within only n cells in a unit, then any other candidates in those n cells can be removed.
The same naming conventions listed above also apply to hidden subsets.
Fish subset
(Type 1) If there are n rows where n columns contain a candidate, that candidate can be removed from any other rows in those n columns.
(Type 2) If there are n columns where n rows contain a candidate, that candidate can be removed from any other columns in those n rows.
These formations have different names depending on their size n:
Intersection subset
(Pointing) If a candidate appears in only n cells in a box and those cells are all in the same row/column, that candidate can be removed from the other cells in that row/column.
(Box/line) If a candidate appears in only n cells in a row/column and those cells are all in the same box, that candidate can be removed from the other cells in that box.
Intersection subsets can only have a size of n = 2 or n = 3 because a box can only have 3 cells at most aligned on the same row or column.
y-wing, xyz-wing, wxyz-wing
Bent sets are essentially naked subsets "bent" onto two different units. They consist of n cells containing only n candidates where the cells belong to 2 different units. The 2 units must be different types and at least 1 cell must be present in both units. The cell(s) that are part of both units are the hinge cells.
Under certain conditions a bent set can be treated like a naked set where the n candidates must all appear in some combination in the n cells of the set:
(Type 1) If exactly 1 of the n candidates in the bent set is present in 2 separate units, that candidate can be removed from any cells outside the bent set that can "see" all instances of that candidate in the bent set.
(Type 2) If none of the n candidates in the bent set are present in 2 separate units, all of the n candidates in the bent set can be removed from any cells outside the bent set that "see" all instances of each candidate in the bent set.
Type 1 bent sets appear far more often. Both types of bent sets can be referred to by their size similar to naked/hidden sets (bent triplet, bent quad, bent quint). They have also been given the following names:
simple coloring, 3D-medusa coloring
The coloring strategy introduces the more general chaining strategies. Imagine that a row has only 2 possible cells for candidate k. If the first cell IS k in the board's solution, then the second cell CANNOT be k. It is also equally valid to say that if the first cell is NOT k then the second cell MUST be (since there must be a k in this row). These two candidate possibilities are fundamentally linked.
By finding and connecting all of these strong links between candidates, we can create a connected graph of vertices. Coloring these vertices alternating between two colors will then represent the two possible outcomes the graph could resolve to in the board's solution.
There are 2 types of these graphs. Simple coloring describes a graph that is built on a single candidate k. 3D-Medusa refers to a graph that contains any of the candidates from 1-9. 3D-Medusa coloring requires another type of strong link: when a single cell contains only 2 candidates, those 2 candidates can be connected with a strong link using the same reasoning given above.
With either type of graph, there are two ways to find eliminations:
(Rule 1) If any uncolored candidate on the board can "see" both colors of the graph, that candidate isn't possible in either outcome and can be removed.
(Rule 2) If all candidates in a cell can "see" color A, OR if all occurrences of candidate k in a unit can "see" color A, all candidates colored A can be removed and all candidates colored B are the solutions to their cell.
alternating inference chain, x-cycle
As previously described, a strong link between candidates has only 2 possible outcomes in the board's solution. When multiple candidates are connected with strong links, that group of candidates forms a strongly-linked chain. The chain as a whole also has only two possible outcomes. Coloring strategies only consider one strongly-linked chain at a time when searching for eliminations. We can take chaining strategies further by introducing weak links to our graph.
Any two candidates that "see" each other and don't form a strong link instead form a weak link. Like strong links, if one end of the weak link IS the cell solution, the other CANNOT be. However, the inverse is not true: if one end is NOT the cell solution, we can't make any inferences about the other end.
With weak links we can connect several strongly-linked chains together and make more inferences. The same rules in Coloring above also apply to alternating inference chains:
(Rule 1) If both possible outcomes of a chain lead to a candidate being "turned off", that candidate can be eliminated.
This contradiction produces a pattern consisting of a cycle of alternating strong/weak links that contains two weak links adjacent to the eliminated candidate. Logically, you can interpret the cycle as saying "if candidate k is the solution to its cell, the links of the cycle imply it must not be the solution". This works because "turning on" k results in the other end of the adjacent weak link being "turned off." That "off" candidate then implies the candidate at the other end of the next (strong) link must be "turned on". These inferences alternate around the cycle back to k and work in either direction. The resulting contradiction allows us to remove k.
When a cycle with these properties consists of only one candidate, it is called an X-Cycle.
(Rule 2) If setting a strongly-linked chain to one of its two possible outcomes leads to a conflict elsewhere on the board, that entire chain must be set to its other possible outcome.
Here, a conflict means any impossible scenario on the board. For example, setting a chain to one of its possibilities could imply (1) a candidate appearing twice in a unit, (2) a candidate not appearing in a unit, (3) two different candidates both set in the same cell, or (4) a cell with no candidates in it. Since every strongly-linked chain only has two possibilities, a conflict arising with one possibility means that the chain must be set to its other possibility.