Categories
Coding

Data Precision, Excel, and Commons::Math

A post on the Jakarta User mailing list piqued my interest this week. The poster had noticed that they were getting significantly different results for some statistical measures from the output of Commons::Math vs. what Excel was producing. This, if true, would be a pretty serious situation. Excel’s calculation engine is proven and very mature (I know one of the guys who works on it, and he’s a genius), so any discrepancy would seem to point to Commons::Math.

Needless to say, the best way to verify this is with a simple “spike”, as the agile guys would say. So I fired up Excel, and using its random number generator, produced 20,000 normally distributed numbers with a standard deviation of 1,000 and a mean of 5,000. I used the Tools > Data Analysis add-in to do this, but you could also use the =NORMINV(rand(),mean,standard_dev) function.

When this was complete, I exported the data to a text file, then read in the values and calculated some simple stats using Commons::Math. Here is the sample program, if you’re interested:


package uk.co.researchkitchen.math;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

import org.apache.commons.math.stat.descriptive.moment.Mean;
import org.apache.commons.math.stat.descriptive.moment.StandardDeviation;
import org.apache.commons.math.stat.descriptive.moment.Variance;
import org.apache.commons.math.stat.descriptive.rank.Median;

public class TestPrecision {
  
  static final int DATA_SIZE = 20000;
  
  double[] data = new double[DATA_SIZE];
  
  public static void main(String[] args) throws IOException {
    TestPrecision testPrecision = new TestPrecision();
    // Book1.txt is an exported Excel spreadsheet containing
    // 20,000 normally distributed numbers with a mean of ~5,000
    // and a stdev (sigma) of ~1,000
    testPrecision.calculateStats("c:\\temp\\Book1.txt");
  }
  
  public void calculateStats(String filename) throws IOException {
    BufferedReader br = new BufferedReader(new FileReader(filename));
    
    int count = 0;
    String line = null;
    while ((line = br.readLine()) != null) {
      double datum = Double.valueOf(line);
      System.out.println(datum);
      data[count++] = datum;
    }
    
    System.out.println("Read " + count + " items of data.");
    
    System.out.println("Standard deviation = " + new StandardDeviation().evaluate(data));
    System.out.println("Median = " + new Median().evaluate(data));
    System.out.println("Mean = " + new Mean().evaluate(data));
    System.out.println("Variance = " + new Variance().evaluate(data));
  }

}

I then went back to Excel and calculated the same measures in there (calculating both the 1/(N-1) sample and 1/N population standard deviation and variance). Here are the tabulated results:

                     Commons Math          Excel
Standard Deviation   1005.8672054459015    1005.8672054332
Median               50011.934275          50011.9342750000
Mean                 50008.74172390576     50008.7417239057
Variance             1011768.8349915474    1011768.8349659700

As suspected, they are almost identical, bar some rounding differences, and at least an order of magnitude closer than the figures in the post. (I told Excel to limit the precision to 10 digits, hence some of its figures show fewer digits.) I don’t know what precision Excel uses internally for these calculations. It may be interesting to write a BigDecimal-based equivalent of the statistics package in Commons::Math.
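As a rough idea of what a BigDecimal-based version might look like, here is a minimal sketch of the mean, variance and standard deviation. It is not part of Commons::Math; the class name BigDecimalStats, the method names and the choice of MathContext.DECIMAL128 (34 significant digits) are all my own assumptions, and BigDecimal.sqrt needs Java 9 or later. The boolean flag switches between the 1/(N-1) sample and 1/N population forms.

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical helper, not part of Commons::Math.
class BigDecimalStats {

  // Assumed working precision: 34 significant digits.
  static final MathContext MC = MathContext.DECIMAL128;

  static BigDecimal mean(double[] data) {
    BigDecimal sum = BigDecimal.ZERO;
    for (double d : data) {
      // new BigDecimal(double) keeps the exact binary value of the double
      sum = sum.add(new BigDecimal(d));
    }
    return sum.divide(BigDecimal.valueOf(data.length), MC);
  }

  // sample = true divides by N-1, sample = false divides by N
  static BigDecimal variance(double[] data, boolean sample) {
    if (sample && data.length < 2) {
      throw new IllegalArgumentException("Sample variance needs at least two values");
    }
    BigDecimal mean = mean(data);
    BigDecimal sumOfSquares = BigDecimal.ZERO;
    for (double d : data) {
      BigDecimal deviation = new BigDecimal(d).subtract(mean, MC);
      sumOfSquares = sumOfSquares.add(deviation.multiply(deviation, MC), MC);
    }
    int divisor = sample ? data.length - 1 : data.length;
    return sumOfSquares.divide(BigDecimal.valueOf(divisor), MC);
  }

  // BigDecimal.sqrt(MathContext) is available from Java 9 onwards
  static BigDecimal standardDeviation(double[] data, boolean sample) {
    return variance(data, sample).sqrt(MC);
  }

  public static void main(String[] args) {
    double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
    System.out.println("Mean = " + mean(data));
    System.out.println("Sample variance = " + variance(data, true));
    System.out.println("Population variance = " + variance(data, false));
    System.out.println("Sample standard deviation = " + standardDeviation(data, true));
  }
}
```

Feeding the same exported values through something like this would show how much of the difference in the trailing digits comes from double rounding inside the calculation, as opposed to the way the inputs were exported.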

There may be other reasons why the numbers given in the example don’t match, but unless I’m missing something obscure, or specific to the use case shown, it looks like the issue is not with the internal implementation of [math] (which incidentally looks like a very very neat little toolkit).

Spring and Hibernate’s getCurrentSession()

If you are using Spring to wrap a Hibernate SessionFactory and you are not using Spring-managed transactions, you may run into an issue. By default, Spring wraps Hibernate’s SessionFactory implementation in a proxy that delegates to its own transaction-aware version. If you are just using the simple ThreadLocal-based session-per-request functionality, then when you call getCurrentSession() you will get an IllegalStateException with the error message "No Hibernate Session bound to thread, and configuration does not allow creation of non-transactional one here". This happens because Spring’s SessionFactoryUtils checks whether the Session is bound to Spring’s transactional support, and by default throws an error if it is not.

The solution to this is to set the property

<property name="exposeTransactionAwareSessionFactory"><value>false</value></property>

in the Spring config. This will return the “raw” SessionFactory instead of the proxied one. A snippet of code from AbstractSessionFactoryBean shows where the check is done:

 
  /**
   * Wrap the given SessionFactory with a transaction-aware proxy, if demanded.
   * @param rawSf the raw SessionFactory as built by <code>buildSessionFactory()</code>
   * @return the SessionFactory reference to expose
   * @see #buildSessionFactory()
   * @see #getTransactionAwareSessionFactoryProxy
   */
  protected SessionFactory wrapSessionFactoryIfNecessary(SessionFactory rawSf) {
    if (isExposeTransactionAwareSessionFactory()) {
      return getTransactionAwareSessionFactoryProxy(rawSf);
    }
    else {
      return rawSf;
    }
  }
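
To make the effect of the flag a bit more concrete, here is a toy model in plain Java. It is not Spring’s or Hibernate’s actual code: ToySessionFactory, RawFactory, TransactionAwareProxy and beginManagedTransaction are hypothetical names invented for illustration, and sessions are just strings. It only mimics the shape of the behaviour described above, where the proxy refuses to hand out a session it did not bind itself, while the raw factory binds one to the thread on demand.

```java
// Toy model only: none of these types exist in Spring or Hibernate.
class ExposeFlagSketch {

  // Hypothetical stand-in for a Hibernate SessionFactory
  interface ToySessionFactory {
    String getCurrentSession();
  }

  // "Raw" factory: binds a session to the calling thread on first use
  static class RawFactory implements ToySessionFactory {
    private final ThreadLocal<String> current = new ThreadLocal<>();

    public String getCurrentSession() {
      if (current.get() == null) {
        current.set("session-for-" + Thread.currentThread().getName());
      }
      return current.get();
    }
  }

  // "Transaction-aware" wrapper: only returns a session it has bound itself
  static class TransactionAwareProxy implements ToySessionFactory {
    private final ToySessionFactory target;
    private final ThreadLocal<String> boundByFramework = new ThreadLocal<>();

    TransactionAwareProxy(ToySessionFactory target) {
      this.target = target;
    }

    // Models the framework starting a managed transaction on this thread
    void beginManagedTransaction() {
      boundByFramework.set(target.getCurrentSession());
    }

    public String getCurrentSession() {
      String session = boundByFramework.get();
      if (session == null) {
        throw new IllegalStateException("No session bound to thread");
      }
      return session;
    }
  }

  // Same shape as the check in the snippet above: proxy or raw, depending on the flag
  static ToySessionFactory wrapIfNecessary(ToySessionFactory raw, boolean expose) {
    return expose ? new TransactionAwareProxy(raw) : raw;
  }

  public static void main(String[] args) {
    ToySessionFactory raw = new RawFactory();

    // Flag set to false: the raw factory is exposed and creates a session on demand
    System.out.println(wrapIfNecessary(raw, false).getCurrentSession());

    // Flag set to true, no managed transaction started: the proxy refuses
    try {
      wrapIfNecessary(raw, true).getCurrentSession();
    } catch (IllegalStateException e) {
      System.out.println("Proxy refused: " + e.getMessage());
    }
  }
}
```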

A sample Spring config is shown below.

<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:aop="http://www.springframework.org/schema/aop"
       xmlns:tx="http://www.springframework.org/schema/tx"
       xsi:schemaLocation="
       http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans-2.0.xsd
       http://www.springframework.org/schema/tx http://www.springframework.org/schema/tx/spring-tx-2.0.xsd
       http://www.springframework.org/schema/aop http://www.springframework.org/schema/aop/spring-aop-2.0.xsd">

  <bean id="dataSource" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">
    <property name="driverClassName" value="org.hsqldb.jdbcDriver"/>
    <property name="url" value="jdbc:hsqldb:hsql://localhost:9001"/>
    <property name="username" value="sa"/>
    <property name="password" value=""/>
  </bean>

  <bean id="sessionFactory" class="org.springframework.orm.hibernate3.annotation.AnnotationSessionFactoryBean">
    <property name="dataSource" ref="dataSource"/>
    <property name="exposeTransactionAwareSessionFactory"><value>false</value></property>
    <property name="annotatedPackages">
      <list>
        <value>uk.co.researchkitchen.hibernate</value>
      </list>
    </property>
    <property name="annotatedClasses">
        <list>
                <value>uk.co.researchkitchen.hibernate.Product</value>
                <value>uk.co.researchkitchen.hibernate.ProductDescription</value>
        </list>
    </property>
    <property name="hibernateProperties">
      <value>
        hibernate.dialect=org.hibernate.dialect.HSQLDialect
        hibernate.show_sql=true
        hibernate.hbm2ddl.auto=create
        hibernate.current_session_context_class=thread
      </value>
    </property>
  </bean>

</beans>

Ignoring Bash Aliases

If you have any aliases defined in your .bashrc or .bash_profile files, occasionally you may run into a situation where you would like to selectively ignore those aliases. In my shell, I have aliased ls like so:

alias ls='ls -lF --color'

However, when running a simple command like the following (which searches for filenames within a set of jar files), this breaks, as the output of ls is in long format. Instead of having to use cut or awk to slice off the parts of the output we need, we can escape the aliased command, effectively telling bash to ignore the alias:

for j in $(\ls *.jar); do jar tvf $j | grep -i "gfxjavactask" ; done

Note the \ in front of the command name. This is an obscure one, I know! 🙂