Problem 89 - Roman numerals

For a number written in Roman numerals to be considered valid there are basic rules which must be followed. Even though the rules allow some numbers to be expressed in more than one way there is always a "best" way of writing a particular number.

For example, it would appear that there are at least six ways of writing the number sixteen:

IIIIIIIIIIIIIIII
VIIIIIIIIIII
VVIIIIII
XIIIIII
VVVI
XVI

However, according to the rules only XIIIIII and XVI are valid, and the last example is considered to be the most efficient, as it uses the least number of numerals.

The 11K text file, roman.txt (right click and 'Save Link/Target As...'), contains one thousand numbers written in valid, but not necessarily minimal, Roman numerals; see About... Roman Numerals for the definitive rules for this problem.

Find the number of characters saved by writing each of these in their minimal form.

Note: You can assume that all the Roman numerals in the file contain no more than four consecutive identical units.


For reference, standard Roman numerals represent the following values:

I = 1
V = 5
X = 10
L = 50
C = 100
D = 500
M = 1000

Implementation

My idea for solving this problem is to write two functions, one to convert a valid Roman numeral to its decimal equivalent and another to convert a decimal number to its minimal Roman numeral representation. Then when the file of Roman numerals is read, each numeral will be converted to decimal and back to minimal Roman numeral form to compare the amount of characters saved. The decimal → Roman numeral function is simpler to write, so we'll start with that.

One important consideration is the range of numbers that can be represented as Roman numerals. According to the provided instructions, the largest representable number in minimal form would be 3,999 (MMMCMXCIX) since there is no standard symbol for 5,000. For this problem, knowing that we will have to read non-standard Roman numerals, I arbitrarily chose 100,000 as the upper limit. Numbers over 3,999 will simply have additional M's prefixed at the front (i.e. 4000 = MMMM, 11593 = MMMMMMMMMMMDXCIII).

Decimal → Roman Function

An array of possible Roman numeral combinations will simplify writing the two functions:

#include <string>

// 0, 100, 200, 300, ... 0, 10, 20, 30, ... 0, 1, 2, 3, ...
const std::string romanNumerals[][10] = {
	{"", "C", "CC", "CCC", "CD", "D", "DC", "DCC", "DCCC", "CM"}, 
	{"", "X", "XX", "XXX", "XL", "L", "LX", "LXX", "LXXX", "XC"},
	{"", "I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX"}
};

This array represents every possible hundred's digit, ten's digit, and one's digit in order. The first index chooses the decimal place and the second index is the value of the digit. To get the Roman numeral for each digit of "679," index [0][6] gives "DC", index [1][7] gives "LXX", and index [2][9] gives "IX". Together, DCLXXIX = 679.

With this array, the decimal → Roman function is simple:

std::string decimalToRoman(int decimal) {
	std::string roman = "";
	if (decimal <= 0 || decimal > 100'000) return roman;

	// thousands
	int count = decimal / 1'000;
	decimal -= 1'000*count;
	roman.append(count, 'M');

	// hundreds, tens, ones
	for (int i = 0, e = 100; i < 3; i++, e /= 10) {	
		int digit = (decimal / e) % 10;
		roman += romanNumerals[i][digit];
		decimal -= e*digit;
	}
	return roman;
}

This function constructs a string digit-by-digit to represent the Roman numeral. If the input is invalid an empty string is returned. The thousand's digit simply corresponds to a leading number of M's. The hundred's, ten's, and one's digits use the array above.

Roman → Decimal Function

The second function, to convert a Roman numeral to a decimal is a little trickier. According to the problem's extra details, a valid Roman numeral must follow these rules:

(1) Numerals must be arranged in descending order of size
(2) M, C, and X cannot be equaled or exceeded by smaller denominations
(3) D, L, and V can each only appear once

In addition to these rules, the use of subtractive combinations (e.g. 9 = IX rather than VIIII) introduces these additional rules:

(4) Only one I, X, and C can be used as the leading numeral in part of a subtractive pair
(5) I can only be placed before V and X
(6) X can only be placed before L and C
(7) C can only be placed before D and M

Our function needs to be able to accept an arbitrary string and correctly parse its decimal value while also adhering to these rules. I'll use the following variables in this function:

int decimalVal = 0;
int denomVal = 0;
int prevDenomVal = 1'000;
int fives[3]{};

decimalVal will store the resulting value of the Roman numeral input. denomVal will store the decimal value of the current character being processed and denomVal and prevDenomVal together will be used to check for subtractive combinations and that rule (1) is followed. The fives array will be used to check for rule (3) violations (D, L, and V are 5, 50, and 500 respectively).

With these values, the input string, given as numeral, will be checked one character at a time from left to right:

for (int i = 0; i < numeral.size(); i++) {
	switch (std::toupper(numeral[i])) {
		case 'I': denomVal = 1; break;
		case 'V': denomVal = 5; fives[0]++; break;
		case 'X': denomVal = 10; break;
		case 'L': denomVal = 50; fives[1]++; break;
		case 'C': denomVal = 100; break;
		case 'D': denomVal = 500; fives[2]++; break;
		case 'M': denomVal = 1000; break;
		default: return -1;
	}
	if (denomVal > prevDenomVal) {	// check for subtractive combo
		// subtractive combo can only have 1 leading numeral
		if ((decimalVal/prevDenomVal)%10 != 1) return -1;
		int r = denomVal / prevDenomVal;
		if (r == 5) denomVal = (3*denomVal / 5);		// IV, XL, CD
		else if (r == 10) denomVal = (8*denomVal / 10);	// IX, XC, CM
		else return -1;		// numerals not in descending order
	} else if (prevDenomVal/denomVal == 3 || prevDenomVal/denomVal == 8) {
		// subtractive combo cannot be followed by the same
		// leading numeral (ex: XLX)
		return -1;
	}
	decimalVal += denomVal;
	prevDenomVal = denomVal;
}

If the input is not a valid Roman numeral, the function will return -1.

Because we are reading the numeral from left to right, rule (1) implies that the only valid case where the current denomination value will be larger than the previous is when there is a subtractive combination. The two if statements check for valid subtractive combinations and enforce rules (1), (4), (5), (6), and (7). I use the fact that a valid subtractive combination is one of two types:

  • One unit less than 5, 50, or 500 (IV, XL, CD)
  • One unit less than 10, 100, or 1000 (IX, XC, CM)

The lines

int r = denomVal / prevDenomVal;
if (r == 5) denomVal = (3*denomVal / 5);		// IV, XL, CD
else if (r == 10) denomVal = (8*denomVal / 10);	// IX, XC, CM

check which type the subtractive pair is. Once a pair is detected, the value of the first character will have already been added to the total in the previous loop iteration. Therefore a modified value should be added in the current iteration to account for the combination. For example, when processing the "V" in an IV combination, the combined value is 4 but a 1 has already been added from the previous iteration. So only 3 should be added this iteration rather than 5 a "V" would normally imply.

This check:

// subtractive combo can only have 1 leading numeral
if ((decimalVal/prevDenomVal)%10 != 1) return -1;

enforces rule (4). It makes sure the last non-zero digit of the running total is 1. For example, if CXL is entered, once "L" is encountered, the current value will be 110 (100 + 10). If instead the invalid input CXXL is entered, the value will be 120 (100 + 10 + 10).

Finally, the fives array is checked for rule (3) violations:

for (auto count : fives) if (count > 1) return -1;

The full function as well as the driver function are given below.

#include <cctype>		// std::toupper
#include <fstream>
#include <iostream>		// std::cerr

int romanToDecimal(std::string numeral) {
	if (numeral.empty()) return 0;
	int decimalVal = 0;
	int denomVal = 0;
	int prevDenomVal = 1'000;
	int fives[3]{};	// 'V', 'L', 'D' can only appear once each
	for (int i = 0; i < numeral.size(); i++) {
		switch (std::toupper(numeral[i])) {
			case 'I': denomVal = 1; break;
			case 'V': denomVal = 5; fives[0]++; break;
			case 'X': denomVal = 10; break;
			case 'L': denomVal = 50; fives[1]++; break;
			case 'C': denomVal = 100; break;
			case 'D': denomVal = 500; fives[2]++; break;
			case 'M': denomVal = 1000; break;
			default: return -1;
		}
		if (denomVal > prevDenomVal) {	// check for subtractive combo
			// subtractive combo can only have 1 leading numeral
			if ((decimalVal/prevDenomVal)%10 != 1) return -1;
			int r = denomVal / prevDenomVal;
			if (r == 5) denomVal = (3*denomVal / 5);		// IV, XL, CD
			else if (r == 10) denomVal = (8*denomVal / 10);	// IX, XC, CM
			else return -1;		// numerals not in descending order
		} else if (prevDenomVal/denomVal == 3 || prevDenomVal/denomVal == 8) {
			// subtractive combo cannot be followed by the same
			// leading numeral (ex: XLX)
			return -1;
	}
		decimalVal += denomVal;
		prevDenomVal = denomVal;
	}
	for (auto count : fives) if (count > 1) return -1;
	return decimalVal;
}

int numExtraChars(std::string file) {
	std::ifstream f(file);
	if (!f) {
		std::cerr << "Error opening file";
		return -1;
	}
	int numReducedChars = 0;
	std::string numeral;
	while (f >> numeral) {
		std::string reduced = decimalToRoman(romanToDecimal(numeral));
		if (reduced.empty()) {
			std::cerr << "Invalid input";
			return -1;
		}
		numReducedChars += (numeral.size() - reduced.size());
	}
	return numReducedChars;
}