Author: Rory Winston

High-Frequency Statistical Arbitrage

Post author By Rory Winston
Post date June 6, 2012

Computational statistical arbitrage systems are now de rigeur, especially for high-frequency, liquid markets (such as FX).
Statistical arbitrage can be defined as an extension of riskless arbitrage,
and is quantified more precisely as an attempt to exploit small and consistent regularities in
asset price dynamics through use of a suitable framework for statistical modelling.

Statistical arbitrage has been defined formally (e.g. by Jarrow) as a zero initial cost, self-financing strategy with cumulative discounted value $v(t)$ such that:

$ v(0) = 0 $,
$ \lim_{t\to\infty} E^P[v(t)] > 0 $,
$ \lim_{t\to\infty} P(v(t) < 0) = 0 $,
$ \lim_{t\to\infty} \frac{Var^P[v(t)]}{t}=0 \mbox{ if } P(v(t)<0) > 0 \mbox{ , } \forall{t} < \infty $

These conditions can be described as follows: (1) the position has a zero initial cost (it is a self-financing trading strategy), (2) the expected discounted profit is positive in the limit, (3) the probability of a loss converges to zero, and (4) a time-averaged variance measure converges to zero if the probability of a loss does not become zero in finite time. The fourth condition separates a standard arbitrage from a statistical arbitrage opportunity.

We can represent a statistical arbitrage condition as $$ \left| \phi(X_t – SA(X_t))\right| < \mbox{TransactionCost} $$ Where $\phi()$ is the payoff (profit) function, $X$ is an arbitrary asset (or weighted basket of assets) and $SA(X)$ is a synthetic asset constructed to replicate the payoff of $X$. Some popular statistical arbitrage techniques are described below. Index Arbitrage

Index arbitrage is a strategy undertaken when the traded value of an index (for example, the index futures price) moves sufficiently far away from the weighted components of the index (see Hull for details). For example, for an equity index, the no-arbitrage condition could be expressed as:

\[ \left| F_t – \sum_{i} w_i S_t^i e^{(r-q_i)(T-t)}\right| < \mbox{Cost}\] where $q_i$ is the dividend rate for stock i, and $F_t$ is the index futures price at time t. The deviation between the futures price and the weighted index basket is called the basis. Index arbitrage was one of the earliest applications of program trading. An alternative form of index arbitrage was a system where sufficient deviations in the forecasted variance of the relationship (estimated by regression) between index pairs and the implied volatilities (estimated from index option prices) on the indices were classed as an arbitrage opportunity. There are many variations on this theme in operation based on the VIX market today. Statistical Pairs trading is based on the notion of relative pricing - securities with similar characteristics should be priced roughly equally. Typically, a long-short position in two assets is created such that the portfolio is uncorrelated to market returns (i.e. it has a negligible beta). The basis in this case is the spread between the two assets. Depending on whether the trader expects the spread to contract or expand, the trade action is called shorting the spread or buying the spread. Such trades are also called convergence trades.

A popular and powerful statistical technique used in pairs trading is cointegration, which is the identification of a linear combination of multiple non-stationary data series to form a stationary (and hence predictable) series.

Trading Algorithms

In recent years, computer algorithms have become the decision-making machines behind many trading strategies. The ability to deal with large numbers of inputs, utilise long variable histories, and quickly evaluate quantitative conditions to produce a trading signal, have made algorithmic trading systems the natural evolutionary step in high-frequency financial applications. Originally the main focus of algorithmic trading systems was in neutral impact market strategies (e.g. Volume Weighted Average Price and Time Weighted Average Price trading), however, their scope has widened considerably, and much of the work previously performed by manual systematic traders can now be done by “black box” algorithms.

Trading algorithms are no different from human traders in that they need an unambiguous measure of performance – i.e. risk versus return. The ubiquitous Sharpe ratio ($\frac{\mu_r – \mu_f}{\sigma}$) is a popular measure, although other measures are also used. A measure of trading performance that is commonly used is that of total return, which is defined as

\[ R_T \equiv \sum_{j=1}^{n}r_j \]

over a number of transactions n, and a return per transaction $r_j$. The annualized total return is defined as $R_A = R_T \frac{d_A}{d_T}$, where $d_A$ is the number of trading days in a year, and $d_T$ is the number of days in the trading period specified by $R_T$. The maximum drawdown over a certain time period is defined as $D_T \equiv \max(R_{t_a}-R_{t_b}|t_0 \leq t_a \leq t_b \leq t_E)$, where $T = t_E – t_0$, and $R_{t_a}$ and $R_{t_b}$ are the total returns of the periods from $t_0$ to $t_a$ and $t_b$ respectively. A resulting indicator is the Stirling Ratio, which is defined as

\[ SR = \frac{R_T}{D_T} \]

High-frequency tick data possesses certain characteristics which are not as apparent in aggregated data. Some of these characteristics include:

Non-normal characteristic probability distributions. High-frequency data may have large kurtosis (heavy tails), and be asymmetrically skewed;
Diurnal seasonality – an intraday seasonal pattern influenced by the restrictions on trading times in markets. For instance, trading activity may be busiest at the start and end of the trading day. This may not apply so much to foreign exchange, as the FX market is a decentralized 24-hour operation, however, we may see trend patterns in tick interarrival times around business end-of-day times in particular locations;
Real-time high frequency data may contain errors, missing or duplicated tick values, or other anomalies. Whilst historical data feeds will normally contain corrections to data anomalies, real-time data collection processes must be aware of the fact that adjustments may need to be made to the incoming data feeds.

Coding R

The Euler Method In R

Post author By Rory Winston
Post date April 12, 2012

The Euler Method is a very simple method used for numerical solution of initial-value problems. Although there are much better methods in practise, it is a nice intuitive mechanism.

The objective is to find a solution to the equation

$$ \frac{dy}{dt} = f(t,y) $$

over a grid of points (equally spaced, in our case). Euler’s method uses the relation (basically a truncated form of Taylor’s approximation theorem with the error terms chopped off):

$$ y(t_{i+1}) = y(t_i) + h\frac{df(t_i, y(t_i))}{dt}$$

In R, we can express this iterative solution as:

[source lang=”R”]
euler <- function(dy.dx=function(x,y){}, h=1E-7, y0=1, start=0, end=1) {
nsteps <- (end-start)/h
ys <- numeric(nsteps+1)
ys[1] <- y0
for (i in 1:nsteps) {
x <- start + (i-1)*h
ys[i+1] <- ys[i] + h*dy.dx(x,ys[i])
}
ys
}
[/source]

Note that given the start and end points, and the size of each step, we figure out the number of steps. Inside the loop, we calculate each successive approximation.

An example using the difference equation

$$ \frac{df(x,y)}{dx} = 3x – y + 8$$

is:

[source lang=”R”]
dy.dx <- function(x,y) { 3*x – y + 8 }
euler(dy.dx, start=0, end=0.5, h=0.1, y0=3)
[1] 3.00000 3.50000 3.98000 4.44200 4.88780 5.31902
[/source]

Coding R

Newton’s Method In R

Post author By Rory Winston
Post date April 8, 2012

Here is a toy example of implementing Newton’s method in R. I found some old code that I had written a few years ago when illustrating the difference between convergence properties of various root-finding algorithms, and this example shows a couple of nice features of R.

Newtons method is an iterative root-finding algorithm where we start with an initial guess $x_0$ and each successive guess is refined using the iterative process:

$$ x_{n+1} = x_n – \frac{f(x_n)}{f'(x_n)} $$

Here is a simple implementation of this in R:

[source lang=”R”]
newton <- function(f, tol=1E-12,x0=1,N=20) {
h <- 0.001
i <- 1; x1 <- x0
p <- numeric(N)
while (i<=N) {
df.dx <- (f(x0+h)-f(x0))/h
x1 <- (x0 – (f(x0)/df.dx))
p[i] <- x1
i <- i + 1
if (abs(x1-x0) < tol) break
x0 <- x1
}
return(p[1:(i-1)])
}
[/source]

Note a couple of things:

* The step-size for numerical differentiation is hardcoded to be 0.001. This is arbitrary and should probably be a parameter.
* The algorithm will run until either the number of steps N has been reached, or the error tolerance $\left|x_{n+1}-x_n\right|< \epsilon$, where $\epsilon$ is defined as the tolerance parameter tol. * The function returns a vector of the iterated x positions, which will be <= N. Here is a simple example: for instance, to find the zeros of the function $$ f(x) = x^3 + 4x^2 - 10 $$ which in R we can represent as: [source lang="R"]f <- function(x) { x^3 + 4*x^2 -10 }[/source] Let's plot the function over the range [1,2]:

[source lang=”R”]
plot(x,f(x), type=’l’, lwd=1.5, main=expression(sin(x/2) * cos(x/4)))
abline(h=0)
[/source]

It seems obvious from the plot that the zero of f(x) in the range [1,2] is somewhere between x=1.3 and x=1.4.

This is made even more clear if we tabulate the x,y values over this range:

[source lang=”R”]
> xy <- cbind(x,f(x))
> xy
x
[1,] 1.0 -5.000
[2,] 1.1 -3.829
[3,] 1.2 -2.512
[4,] 1.3 -1.043
[5,] 1.4 0.584
[6,] 1.5 2.375
[7,] 1.6 4.336
[8,] 1.7 6.473
[9,] 1.8 8.792
[10,] 1.9 11.299
[11,] 2.0 14.000
[/source]

Using the function defined earlier, we can run the root-finding algorithm:

[source lang=”R”]
> p <- newton(f, x0=1, N=10)
> p
[1] 1.454256 1.368917 1.365238 1.365230 1.365230 1.365230 1.365230
[/source]

This returns 1.365230 as the root of f(x). Plotting the last value in the iteration:

[source lang=”R”]
> abline(v=p[length(p)])
[/source]

In the iteration loop, we use a schoolbook-style numerical derivative:

$$ \frac{dy}{dx} = \frac{f(x+\epsilon)-f(x)}{\epsilon} $$

Its also worth noting that R has some support for symbolic derivatives, so we could use a symbolic function expression. A simple example of symbolic differentiation in R (note that R works with expression instances when using symbolic differentiation):

[source lang=”R”]
> e <- expression(sin(x/2)*cos(x/4))
> dydx <- D(e, "x")
> dydx
cos(x/2) * (1/2) * cos(x/4) – sin(x/2) * (sin(x/4) * (1/4))
[/source]

In order for this to be useful, obviously we need to be able to evaluate the calculated derivative. We can do this using eval:

[source lang=”R”]
> z <- seq(-1,1,.1)
> eval(dydx, list(x=z))
[1] 0.3954974 0.4146144 0.4320092 0.4475873 0.4612640 0.4729651 0.4826267
[8] 0.4901964 0.4956329 0.4989067 0.5000000 0.4989067 0.4956329 0.4901964
[15] 0.4826267 0.4729651 0.4612640 0.4475873 0.4320092 0.4146144 0.3954974
[/source]

Note that we can bind the x parameter in the expression when calling eval().