Categories
Coding

Web Retrieval With Ruby

I recently needed to automatically retrieve and parse a table of BT fixed-line call tariff data. Normally I would use Perl for this sort of thing. On this occasion, however, I decided it would be a good opportunity to learn a bit of Ruby.


require 'net/http'
require 'html/tree'
require 'html/xmltree'
require 'http-access2'

client=HTTPAccess2::Client.new()
url = 'http://www.bt.com/...' # long URI omitted

parser = HTMLTree::Parser.new(false,false)
parser.feed(client.get_content(url))

tariffs = Array.new()

# Iterate through each <tr>
rows = parser.html.select { |ea| ea.tag == 'tr' }

# Extract and normalize the content
rows.each { |row|
  texts = row.select { |item| item.data? }.   # just look at cdata
              collect { |data| data.strip }.  # strip it
              select { |data| data.size > 0 } # and keep the non-blank fields
  texts = texts.join('|')

  # Only store the contents that contain actual call tariff data
  tariffs.push(texts) if (/^[^|]+\|((\d)+\.(\d)+\|){2}(\d)+\.(\d)+$/ =~ texts)
}

# Send to stdout so we can run $ ./client.rb > tariffs.dat
puts tariffs

This produces pipe-delimited output of call tariffs by country: each line that survives the regex filter is a name followed by three decimal figures. My initial impressions of Ruby (I’m way behind the curve here) are:

  • It’s very “Perl-like” in some ways – you can see a definite Perl influence in the language.
  • I love the iterator and closure syntax: collect(), map(), etc. It’s very clean and intuitive.
  • The idea of code blocks as first-class objects seems to be integral to Ruby: in the code above, the output of a select {} block is passed to a collect {} block, which is passed in turn to another select {} block (all done within an each {} block). Very reminiscent of the simple building block approach of Unix shell commands.
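
To make that building-block point concrete, here is a minimal sketch of the same chaining on plain arrays, with no HTML parsing involved. The sample rows are invented for illustration, and the names ROWS, TARIFF_LINE and extract_tariffs are my own placeholders rather than anything from the script above; only the regex is copied from it.

```ruby
# Hypothetical sample rows standing in for the text cells of each <tr>.
# The values are invented purely for illustration.
ROWS = [
  ['Country', 'Daytime', 'Evening', 'Weekend'],   # header row, no figures
  [' Examplia ', '', '1.50', '1.25', ' 1.00 '],   # padded and blank cells
  ['Footnote text only'],                         # no tariff figures at all
  ['Testland', '10.00', '8.50', '7.25']
]

# Same pattern as in the script: a name, then three decimal figures.
TARIFF_LINE = /^[^|]+\|((\d)+\.(\d)+\|){2}(\d)+\.(\d)+$/

def extract_tariffs(rows)
  lines = rows.collect do |row|
    # strip each cell, drop the blank ones, then join what is left
    row.collect { |cell| cell.strip }.select { |cell| cell.size > 0 }.join('|')
  end
  # keep only the lines that look like tariff data
  lines.select { |line| TARIFF_LINE =~ line }
end

puts extract_tariffs(ROWS)
```

Only the second and fourth rows survive the filter, so this prints Examplia|1.50|1.25|1.00 and Testland|10.00|8.50|7.25, one per line. Each step takes a collection and hands back a new one, which is what lets the calls be strung together like commands in a shell pipeline.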

There seems to be a lot of hype around Ruby at the moment, mainly driven by Rails. However, the basic language itself is quite exciting in that it seems to be as useful and concise as Perl, whilst having some syntactic advantages that make it more readable and maintainable.


Bash One-Liner To Generate MD5s for Artifacts

I’m putting this here so I don’t forget it…a quick one-liner to generate hashes for all artifacts created by a Maven build:


$ for distfile in *.{jar,zip,gz,bz2}; do [ -f "$distfile" ] && md5sum "$distfile" > "$distfile.md5"; done


Executing Ant tasks from Maven 2

One of the great things about Maven is that it does so much for you. Unfortunately, this also means that it must make a lot of assumptions (and impose a few restrictions) in order to do this well. One example is creating redistributables. Commons::Net is used mainly for FTP (probably 90% or more of its users use only this functionality), so I wanted to create a smaller FTP-client-only jar alongside the main package. Unfortunately, I couldn’t find a way to get Maven to do this. I guess I could have written a Mojo-based plugin that would allow me to specify and generate separate redistributables, but this seemed like a lot of work just for this task. The basic limitation comes from the module-centric way that Maven looks at projects and redistributable packages, and in this case I wasn’t satisfied that I could split the project up in a way that would suit both my requirements and Maven’s. Thankfully, there is an easier way: just use the Maven AntRun plugin:


<plugin>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <configuration>
        <tasks>
          <jar destfile="target/commons-net-ftp-${version}.jar">
            <fileset dir="target/classes"
                     includes="org/apache/commons/net/ftp/**,
                               org/apache/commons/net/*,
                               org/apache/commons/net/io/*,org/apache/commons/net/util/*"/>
            <fileset dir="${basedir}" includes="LICENSE.txt"/>
            <manifest>
              <attribute name="Implementation-Vendor" value="Apache Software Foundation"/>
            </manifest>
          </jar>
        </tasks>
      </configuration>
      <goals>
        <goal>run</goal>
      </goals>
    </execution>
  </executions>
</plugin>

As you can see from property references such as ${version} in the jar’s destfile, you have access to the properties exposed by the POM. You just need to hook the task into a specific phase in the lifecycle (I have hooked it into the “package” phase). Of course, if your task is not going to be trivial, you can always encapsulate it in a separate build file and invoke it using Ant’s <ant> task.