Categories
Coding

Bash Parameter Substitution

There is one neat trick that the Bash shell offers that I occasionally find very useful. However, usually when I go to use it, it’s been just long enough for me to forget how to do it. So here it is, so I know how to find it.

It’s called parameter substitution, and it works like this (paraphrased from the Advanced Bash Scripting Guide):

${var#Pattern}, ${var##Pattern} – Removes from $var the shortest/longest part of $Pattern that matches the front end of $var.

${var%Pattern}, ${var%%Pattern} – Removes from $var the shortest/longest part of $Pattern that matches the back end of $var.

I normally use it when I want to strip the file extension from a set of files and replace it with another, as part of an overall process like so:

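for file in *.DAT; do mv -- "$file" "${file%.DAT}.ABC"; done

This renames every .DAT file in the current directory to use a .ABC extension instead. Globbing directly and quoting the variables, rather than parsing the output of ls, keeps the loop safe for file names that contain spaces.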

Perl Regular Expressions

I don’t get to use Perl very often these days, but when I do, it’s usually because I want to do some powerful data manipulation in a hurry. In the last couple of weeks, I’ve used it to extract and format a large amount of financial data for reporting purposes (it is the Practical Extraction and Report Language, after all), and currently I am using it to extract data from a series of text files containing technical trivia. The text files are in question-and-answer format, and the regexes I have defined to extract the data and write it to an XML-based format look like this (snipped):

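# Pass 1: each question runs from "Q $XXX) " up to the next line starting with "A".
while ($trivia =~ m/^Q\s\$([A-Z0-9]{3})\)\s(.*?)(?=^A)/smg ) {
    my $question = "<question number=\"$1\">$2</question>\n";
    push (@questions, $question);
}

# Pass 2: each answer runs up to the next line starting with "Q", or the end of the input.
# ($answer rather than $a, since $a is reserved for use by sort.)
while ($trivia =~ m/^A\s\$([A-Z0-9]{3})\)\s(.*?)(?=^Q|\Z)/smg ) {
    my $answer = "<answer number=\"$1\">$2</answer>\n";
    push (@answers, $answer);
}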

It’s a two-pass approach using two separate patterns. The two regular expressions are very similar; the main difference is the lookahead at the end, which acts as an anchor that tells the engine where to stop. The question pattern stops at (?=^A), the start of the next answer, while the answer pattern stops at (?=^Q|\Z), the start of the next question or the end of the input.

None of this is very hard in Java either; the patterns can be ported straight over:

Pattern patt = Pattern.compile(
"^Q\\s\\$([A-Z0-9]{3})\\)\\s(.*?)(?=^A|\\Z)",
Pattern.DOTALL | Pattern.MULTILINE);
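
To round out the Java port, here is a minimal sketch of how the compiled pattern might be used to collect the questions. The class name TriviaQuestions, the extract() method and the sample input are my own assumptions for illustration; only the pattern itself comes from the snippet above. Note that the captured text includes the trailing newline before the next answer, so the sketch trims it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TriviaQuestions {
    // Same pattern as above: "Q $XXX) text", lazily matched up to the next
    // line that starts with "A", or the end of the input.
    private static final Pattern QUESTION = Pattern.compile(
            "^Q\\s\\$([A-Z0-9]{3})\\)\\s(.*?)(?=^A|\\Z)",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Hypothetical helper: returns one <question> element per match.
    public static List<String> extract(String trivia) {
        List<String> questions = new ArrayList<>();
        Matcher m = QUESTION.matcher(trivia);
        while (m.find()) {
            // group(1) is the three-character id, group(2) the question text.
            questions.add("<question number=\"" + m.group(1) + "\">"
                    + m.group(2).trim() + "</question>");
        }
        return questions;
    }

    public static void main(String[] args) {
        // Made-up sample data in the assumed "Q $id) ... / A $id) ..." layout.
        String sample = "Q $A01) What is Perl?\nA $A01) A language.\n"
                      + "Q $A02) What is Bash?\nA $A02) A shell.\n";
        extract(sample).forEach(System.out::println);
    }
}
```

The while (m.find()) loop plays the same role as the /g modifier in the Perl while loops: each call resumes scanning where the previous match ended.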

However, what really sets the Perl approach apart is the flexibility of its regex engine. For instance, it would be nice to be able to dynamically switch between the anchoring conditions, depending on what we had just picked up as the value of $1. So, for instance, if $1 was “Foo”, we could change the anchoring condition to be “Bar”, giving us a dynamic regex that seems to understand the semantics of the data it is processing, as well as just the syntax.

In Perl, this is possible using the (?{}) and (??{}) constructs. (?{}) executes a block of Perl code in the body of the regular expression; (??{}) goes a step further and uses the result of that code as a dynamic pattern to match at that point. You can even “feed” the code with backreferences from the current pattern.

To illustrate, check out the following examples in Perl.

First, our test string that we will search upon:

my $string = "Hello World";

Let’s also create a couple of lookup tables:

my %sym = (
'Hello' => 'World',
'World' => 'Hello'
);

my %sym2 = (
'Hello' => '\w{5}'
);

And now let’s use them.

my $var = "Hello";

First, a simple lookup using a variable:

if ($string =~ m/$var/) {
print "Matched variable\n";
}

Now, let’s interpolate a hash lookup into the regex (this is still plain variable interpolation, resolved before matching starts):

if ($string =~ m/$sym{'Hello'}/) {
print "Matched hash lookup\n";
}

Next, let’s pass a backreference into the dynamic code:

if ($string =~ m/(Hello) (??{$sym{$1}})/) {
print "Matched dynamic hash lookup using backreference\n";
}

And finally (the best bit) – using a backreference, the hash lookup resolves to a string of regex metacharacters (\w{5} in this case)! This is exactly what I need to make the lookahead dynamic.

if ($string =~ m/(Hello) (??{$sym2{$1}})/) {
print "Matched dynamic hash lookup, resolving to regex metacharacters, using backreference\n";
}

It’s things like this that make me reach for Perl time and time again.

Overflowing BigDecimals

If you do a lot of numerical calculations in Java where control over precision and rounding is important (especially currency-related calculations), the BigDecimal class should be your first port of call. A BigDecimal encapsulates an arbitrary-precision integer unscaled value and a 32-bit integer scale, which together provide a very large range of values.
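
As a quick illustration of that representation, the sketch below pulls a BigDecimal apart into its unscaled value and scale and puts it back together. The class name BigDecimalParts and the describe() helper are hypothetical names of mine; unscaledValue(), scale() and the BigDecimal(BigInteger, int) constructor are standard java.math API.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class BigDecimalParts {
    // Hypothetical helper: shows value = unscaledValue x 10^(-scale).
    public static String describe(BigDecimal value) {
        BigInteger unscaled = value.unscaledValue();
        int scale = value.scale();
        // Rebuild the number from its two parts to show nothing is lost.
        BigDecimal rebuilt = new BigDecimal(unscaled, scale);
        return unscaled + " x 10^-" + scale + " = " + rebuilt;
    }

    public static void main(String[] args) {
        // Prints: 12345 x 10^-2 = 123.45
        System.out.println(describe(new BigDecimal("123.45")));
    }
}
```

Constructing from a String rather than a double keeps the scale at exactly the number of digits written after the decimal point.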

If we define a variable called dividendTax as a BigDecimal like so:

private BigDecimal dividendTax;

with a scaling factor of 2 (for currency) and a suitable rounding mode:

public static final int SCALE = 2;
public static final RoundingMode ROUND_MODE = RoundingMode.HALF_UP;

We can round and scale an intermediate BigDecimal-based calculation to a Double value like so:


public Double getDividendTax() {
return dividendTax.setScale(Constants.SCALE, Constants.ROUND_MODE).doubleValue();
}

Operations like divide() are methods on the BigDecimal instances themselves:

public Double getMonthlyTaxableAmount() {
return annualDividendTax.divide(new BigDecimal(12.0)).setScale(Constants.SCALE, Constants.ROUND_MODE).doubleValue();
}

The above code snippet is dangerous: divide() throws an ArithmeticException if the exact quotient has a non-terminating decimal expansion (e.g. 1 divided by 3). This happened to me a couple of times until I figured out what was missing.

The key is to pass a MathContext instance which specifies the precision to use. By default, divide() tries to return the exact quotient at unlimited precision, which cannot be represented if the decimal expansion never terminates. So you need to round the result to a finite number of digits using one of the predefined MathContext constants, such as DECIMAL32 (7 significant digits), DECIMAL64 (16) or DECIMAL128 (34):

public Double getMonthlyTaxableAmount() {
return annualDividendTax.divide(new BigDecimal(12.0), MathContext.DECIMAL32).setScale(Constants.SCALE, Constants.ROUND_MODE).doubleValue();
}

Keep this in mind whenever you use division operations that may produce non-terminating results.
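
To see both behaviours side by side, here is a small self-contained sketch. The class name DivideDemo, its method names and the sample amounts are assumptions for illustration, not part of the code above. The last method uses the divide(divisor, scale, roundingMode) overload, which is an alternative I am suggesting rather than the approach described in this post; it rounds straight to the target scale without going through a MathContext.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class DivideDemo {
    static final int SCALE = 2;
    static final RoundingMode ROUND_MODE = RoundingMode.HALF_UP;

    // Exact division: reports whether divide() throws because the
    // quotient has a non-terminating decimal expansion.
    static boolean exactDivideFails(BigDecimal dividend, BigDecimal divisor) {
        try {
            dividend.divide(divisor);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // The approach from the post: limit precision, then set the currency scale.
    static BigDecimal divideWithContext(BigDecimal dividend, BigDecimal divisor) {
        return dividend.divide(divisor, MathContext.DECIMAL32)
                       .setScale(SCALE, ROUND_MODE);
    }

    // Alternative (my suggestion): round directly to the target scale.
    static BigDecimal divideToScale(BigDecimal dividend, BigDecimal divisor) {
        return dividend.divide(divisor, SCALE, ROUND_MODE);
    }

    public static void main(String[] args) {
        BigDecimal annual = new BigDecimal("1000.00"); // made-up amount
        BigDecimal months = BigDecimal.valueOf(12);

        System.out.println(exactDivideFails(annual, months));  // true
        System.out.println(divideWithContext(annual, months)); // 83.33
        System.out.println(divideToScale(annual, months));     // 83.33
    }
}
```

Because DECIMAL32 keeps only 7 significant digits, it may be too coarse for large amounts; DECIMAL64, DECIMAL128 or the scale-based overload are likely safer choices there.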