Category: Coding

Topics related to software development


Web Retrieval With Ruby

I recently needed to automatically retrieve and parse a table of BT fixed-line call tariff data. Normally I would use Perl for this sort of thing. However, on this occasion I decided it might be a good opportunity to learn a bit of Ruby.

#!/usr/bin/env ruby
require 'net/http'
require 'html/tree'
require 'html/xmltree'
require 'http-access2'

client = HTTPAccess2::Client.new

url = 'http://www.bt.com/...'           # long URI omitted
parser = HTMLTree::Parser.new(false, false)
parser.feed(client.get_content(url))

tariffs = []

# Iterate through each <tr>
rows = parser.html.select { |ea| ea.tag == 'tr' }

# Extract and normalize the content
rows.each { |row|
  texts = row.select { |item| item.data? }.  # just look at cdata
    collect { |data| data.strip }.            # strip it
    select { |data| data.size > 0 }           # and keep the non-blank fields
  texts = texts.join('|')

  # Only store the contents that contain actual call tariff data
  tariffs.push(texts) if (/^[^|]+\|((\d)+\.(\d)+\|){2}(\d)+\.(\d)+$/ =~ texts)
}

# Send to stdout so we can run: $ ./client.rb > tariffs.dat
puts tariffs

This produces a pipe-delimited output of call tariffs by country. My initial impressions of Ruby (I’m way behind the curve here) are:

It’s very “Perl-like” in some ways – you can see a definite Perl influence in the language.
I love the iterator and closure syntax: collect(), map(), etc. It’s very clean and intuitive.
The idea of code blocks as first-class objects (a block can be captured as a Proc and passed around) seems to be integral to Ruby: in the code above, the array returned by select {} is passed to collect {}, whose result is in turn filtered by a second select {} (all done within an each {} block). It's very reminiscent of the simple building-block approach of Unix shell pipelines.
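To try the select/collect/select chain without fetching the BT page, here is a minimal sketch that runs the same normalize-and-filter steps over plain arrays of cell strings. The rows are made-up values (not real BT tariffs), and `normalize` and `TARIFF_LINE` are hypothetical names; the regex is a simplified form of the one above. It also shows a block held in a variable as a Proc (a lambda) and handed to `map` with `&`.

```ruby
# Hypothetical cell text for three table rows (made-up values, not real BT tariffs)
rows = [
  ["  France ", "\n", "4.50", " 3.00 ", "2.10"],
  ["Country", "Daytime", "Evening", "Weekend"],   # a header row, which should be dropped
  ["", "Germany", "5.25", "3.75", "2.50 "]
]

# Same shape as the original check: a name followed by exactly three decimal fields
TARIFF_LINE = /^[^|]+\|(\d+\.\d+\|){2}\d+\.\d+$/

# A block captured as a first-class Proc (lambda): strip each cell, keep the non-blank ones
normalize = lambda { |cells| cells.collect { |c| c.strip }.select { |c| c.size > 0 } }

# Pass the lambda where a block is expected with &, then chain further blocks
lines = rows.map(&normalize).map { |cells| cells.join('|') }
tariffs = lines.select { |line| TARIFF_LINE =~ line }

puts tariffs
# prints:
# France|4.50|3.00|2.10
# Germany|5.25|3.75|2.50
```

Keeping the normalization in a lambda means the same step can be reused on rows coming from the HTML parser or from a test fixture.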

There seems to be a lot of hype around Ruby at the moment, mainly driven by Rails. However, the basic language itself is quite exciting in that it seems to be as useful and concise as Perl, whilst having some syntactic advantages that make it more readable and maintainable.


Bash One-Liner To Generate MD5s for Artifacts

Post author By Rory Winston
Post date September 8, 2006

I’m putting this here so I don’t forget it: a quick one-liner to generate MD5 hashes for all artifacts created by a Maven build:

$ for distfile in *.{jar,zip,gz,bz2}; do [ -e "$distfile" ] && md5sum "$distfile" > "$distfile.md5"; done
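If a shell isn't to hand, the same job can be sketched in Ruby with the standard `digest/md5` library. This is a rough equivalent rather than a drop-in replacement: `write_md5_files` is a hypothetical helper name, and it assumes md5sum's default text output (hex digest, two spaces, file name).

```ruby
require 'digest/md5'

# Hypothetical helper: for each *.jar, *.zip, *.gz and *.bz2 file in dir, write
# "<hex digest>  <file name>" (md5sum's default text format) to "<file>.md5".
# Returns the paths of the .md5 files it wrote.
def write_md5_files(dir = '.')
  Dir.glob(File.join(dir, '*.{jar,zip,gz,bz2}')).sort.map do |path|
    digest = Digest::MD5.file(path).hexdigest
    md5_path = "#{path}.md5"
    # Record only the base name, as md5sum does when run inside the directory
    File.open(md5_path, 'w') { |f| f.puts "#{digest}  #{File.basename(path)}" }
    md5_path
  end
end
```

For a Maven build, calling `write_md5_files('target')` would cover artifacts in the default output directory, and `md5sum -c` run from that directory should accept the resulting files.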


Executing Ant tasks from Maven 2

Post author By Rory Winston
Post date August 30, 2006

One of the great things about Maven is that it does so much for you. Unfortunately, this also means that it must make a lot of assumptions (and impose a few restrictions) in order to do this well. One example is creating redistributables. Commons::Net is used mainly for FTP (probably 90%+ of its users use only this functionality), so I wanted to create a smaller, FTP-client-only jar alongside the main package. Unfortunately, I couldn’t find a way to get Maven to do this. I guess I could have written a Mojo-based plugin that would let me specify and generate separate redistributables, but that seemed like a lot of work just for this task. The basic limitation comes from the module-centric way in which Maven looks at projects and redistributable packages, and in this case I wasn’t satisfied that I could split the project up in a way that would suit both my requirements and Maven’s. Thankfully, there is an easier way: just use the Maven AntRun plugin:

<plugin>
  <artifactId>maven-antrun-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <configuration>
        <tasks>
          <jar destfile="target/commons-net-ftp-${version}.jar">
            <fileset dir="target/classes"
                     includes="org/apache/commons/net/ftp/**, org/apache/commons/net/*, org/apache/commons/net/io/*,org/apache/commons/net/util/*"/>
            <fileset dir="${basedir}" includes="LICENSE.txt"/>
            <manifest>
              <attribute name="Implementation-Vendor" value="Apache Software Foundation"/>
            </manifest>
          </jar>
        </tasks>
      </configuration>
      <goals>
        <goal>run</goal>
      </goals>
    </execution>
  </executions>
</plugin>

As you can see from the ${version} and ${basedir} references, you have access to the properties exposed by the POM. You just need to hook the task into a specific phase of the lifecycle (I have hooked it into the “package” phase). Of course, if your task is non-trivial, you can always move it into a separate build file and invoke it using Ant’s <ant> task.
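The `includes` attribute in the plugin configuration leans on Ant's pattern rules: `*` matches within a single directory, while `**` spans any number of directories. That is why `org/apache/commons/net/*` picks up only the top-level classes of the package, and `io` and `util` need their own entries. The Ruby sketch below approximates those rules to check which class files the FTP fileset would select. `ant_match?`, `match_segments`, `in_ftp_jar?` and the sample class paths are hypothetical, and the sketch ignores Ant details such as default excludes and case-sensitivity options.

```ruby
# Hypothetical helper approximating Ant's include-pattern rules:
# '*' and '?' match within one path segment; '**' matches zero or more segments.
# This is not Ant's own implementation (no default excludes, case options, etc.).
def ant_match?(pattern, path)
  match_segments(pattern.split('/'), path.split('/'))
end

def match_segments(pats, parts)
  return parts.empty? if pats.empty?
  head, *rest = pats
  if head == '**'
    # '**' may swallow any number of whole directories, including none
    (0..parts.size).any? { |i| match_segments(rest, parts.drop(i)) }
  else
    # Within a single segment, File.fnmatch handles '*' and '?'
    !parts.empty? && File.fnmatch(head, parts.first) &&
      match_segments(rest, parts.drop(1))
  end
end

# The includes attribute from the FTP jar's first <fileset>
INCLUDES = 'org/apache/commons/net/ftp/**, org/apache/commons/net/*, ' \
           'org/apache/commons/net/io/*,org/apache/commons/net/util/*'
# Ant accepts comma- or space-separated patterns
PATTERNS = INCLUDES.split(/[\s,]+/)

def in_ftp_jar?(path)
  PATTERNS.any? { |pat| ant_match?(pat, path) }
end

# Illustrative class paths (hypothetical sample, not a full listing of the jar)
samples = %w[
  org/apache/commons/net/SocketClient.class
  org/apache/commons/net/ftp/FTPClient.class
  org/apache/commons/net/ftp/parser/UnixFTPEntryParser.class
  org/apache/commons/net/smtp/SMTPClient.class
]
samples.each { |p| puts "#{(in_ftp_jar?(p) ? 'include' : 'skip').ljust(8)}#{p}" }
# SocketClient, FTPClient and UnixFTPEntryParser are included; SMTPClient is skipped
```

Matching segment by segment keeps `*` from ever crossing a `/`, which is the property that decides whether subpackages such as `smtp` end up in the smaller jar.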