Author: Rory Winston

Market Making and The Win/Loss Ratio

Post author By Rory Winston
Post date June 16, 2020

The article https://online.wsj.com/public/resources/documents/VirtuOverview.pdf is a neat little illustration of a simple asymptotic toy distribution given an initial probability of a win or loss per-trade. It is used as an example to illustrate the basic methodology behind the working market-maker business – develop a small edge and scale this up as cheaply as possible to maximise the probability of overall profit.

Take $p=0.51$ as the probability of a win per trade. After $n$ transactions we will have a number of ‘wins’ $k$ that varies from 0 to $n$. We model each trade as an independent 0-1 (Bernoulli) trial, so that $k$ follows a binomial distribution.

In order to come out at breakeven or better, the number of wins k needs to be at least $\frac{n}{2}$. Using the binomial distribution this can be modelled as:

$P\left(k \geq \frac{n}{2}\right) = \sum_{j=\lceil n/2 \rceil}^{n} \frac{n!}{j!(n-j)!}p^j(1-p)^{n-j}$

As the binomial distribution converges to a normal $\mathcal{N}(np, np(1-p))$ as n gets large, we can use the distribution below to model the win/loss probability over n:

$ \int_{\frac{n}{2}}^\infty \mathcal{N}\left(np, np(1-p) \right) dx $

Which is

$ \int_{\frac{n}{2}}^\infty \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx$

Where $\mu=np$ and $\sigma^2=np(1-p)$
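
Standardising makes the dependence on $n$ explicit. As a short worked step, writing $\Phi$ for the standard normal CDF (notation introduced here, not used in the original), the breakeven probability under the normal approximation is:

$ P(\text{breakeven or better}) \approx 1-\Phi\left(\frac{\frac{n}{2}-np}{\sqrt{np(1-p)}}\right) = \Phi\left(\frac{\sqrt{n}\left(p-\frac{1}{2}\right)}{\sqrt{p(1-p)}}\right) $

For $p=0.51$ the argument is approximately $0.02\sqrt{n}$, so the probability rises towards 1 as $n$ grows. For $p=0.5$ it stays at exactly one half, and for $p<0.5$ it falls towards 0.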

This can be modelled in R:

> p <- 0.51
> n <- 100
> 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
[1] 0.5792754
> n <- 1000
> 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
[1] 0.7364967

This shows that with a win probability of 51%, 100 trades give roughly a 58% probability of breakeven or better, and 1000 trades give roughly a 74% chance of breakeven or better.
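
As a sanity check on the normal approximation, the exact binomial tail can be computed directly with `pbinom`. This is a minimal sketch rather than part of the original analysis: the helper names `breakeven_exact` and `breakeven_normal` are made up here for illustration. The exact figure comes out somewhat higher than the approximation for these inputs, since the discrete sum includes all of the probability mass at $k=\frac{n}{2}$ and no continuity correction is applied to the normal version.

```r
# Hypothetical helper: exact P(K >= ceiling(n/2)) for K ~ Binomial(n, p).
# pbinom(q, lower.tail = FALSE) returns P(K > q), hence the "- 1".
breakeven_exact <- function(n, p) {
  pbinom(ceiling(n / 2) - 1, size = n, prob = p, lower.tail = FALSE)
}

# Hypothetical helper: the normal approximation used in the text.
breakeven_normal <- function(n, p) {
  pnorm(n / 2, mean = n * p, sd = sqrt(n * p * (1 - p)), lower.tail = FALSE)
}

p <- 0.51
for (n in c(100, 1000, 10000)) {
  cat(sprintf("n=%d exact=%.4f normal=%.4f\n",
              as.integer(n), breakeven_exact(n, p), breakeven_normal(n, p)))
}
```

The two columns move closer together in relative terms as n grows, which is the convergence the approximation relies on.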

We can plot the probability of breakeven holding p constant and changing n from 1 to 1000:

> n <- seq(1,1000)
> y <- 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
> library(ggplot2)
> library(scales)
> qplot(n,y)+scale_y_continuous(labels=percent)

The resulting graph shows the probability of breakeven or better converging towards 100% as n gets large.
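
The same formula can be inverted to ask how many trades are needed to reach a given probability of breakeven. The sketch below assumes the normal approximation holds and that $p>0.5$. The function name `trades_needed` and the 95% target are illustrative choices, not taken from the original article.

```r
# Hypothetical helper: smallest n for which the normal approximation gives
# at least `target` probability of breakeven or better, for a win rate p > 0.5.
trades_needed <- function(p, target) {
  stopifnot(p > 0.5, p < 1, target > 0.5, target < 1)
  z <- qnorm(target)                       # required standardised distance
  ceiling((z * sqrt(p * (1 - p)) / (p - 0.5))^2)
}

p <- 0.51
n_req <- trades_needed(p, 0.95)
# Verify by plugging the answer back into the formula from the text
check <- 1 - pnorm(q = n_req / 2, mean = n_req * p, sd = sqrt(n_req * p * (1 - p)))
print(c(n = n_req, probability = check))
```

With a 51% win rate the answer is in the thousands of trades, which illustrates why cheap scaling matters as much as the edge itself.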

To make it more interesting, we can generate different paths for n from 1 to 10000 while also varying the win probability from, say, 45% to 55%, and look at the paths as we vary n and p:

n <- seq(1,10000)
p <- 0.5
y <- 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
# Baseline path for p=0.5, drawn in black
plot(n, y, type='l', ylim=c(0,1))

probs <- seq(0.45, 0.55, length.out = 100)
for (p in probs){
 y <- 1-pnorm(q=n/2, mean=n*p,sd=sqrt(n*p*(1-p)))
 # One colour per path: red when p is below 0.5, green when above
 lines(x=n,y=y,col=ifelse(p<0.5,rgb(1,0,0,.5),rgb(0,1,0,.5)))
}

Which shows the probabilities of breakeven or better given a number of different starting win/loss probabilities and a varying number of trades. The path with $p=0.5$ is shown in black.


Approximating e

Post author By Rory Winston
Post date June 13, 2020

I was reading Simon Singh’s The Simpsons And Their Mathematical Secrets today, and he mentioned a simple method for approximating e: given a uniform RNG, e can be approximated by the average number of draws required for the sum of the draws to exceed 1. This is a neat little demonstration and easy to generate in R. We count the number of uniform draws required for the sum to exceed one, average that count over many repetitions, and then repeat with an increasing number of repetitions to illustrate convergence:

library(ggplot2)
# Function to calculate number of draws required
gen <- function() {
  r <- runif(1,0,1)
  n <- 1
  while (r<1) {
    r <- r+runif(1,0,1)
    n <- n+1
  }
  return(n)
}
# Generate a series of sample sizes from 2 .. 2^16 (65536)
N <- 2^seq(1,16)
# Generate simulation: average draw count over x repetitions, for each x in N
y <- sapply(N, function(x) mean(replicate(x,gen())))
# Plot convergence
qplot(1:16, y) + geom_line(linetype=2) + geom_hline(aes(yintercept=exp(1)),color='red')
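
A brief sketch of why the average converges to e, writing $N$ for the number of draws needed and $U_i$ for the individual uniform draws (symbols introduced here for illustration). More than $n$ draws are needed exactly when the first $n$ draws sum to less than 1, and the probability of that is the volume of the standard $n$-simplex:

$P(N > n) = P(U_1 + \dots + U_n < 1) = \frac{1}{n!}$

Summing the tail probabilities gives the expectation:

$E[N] = \sum_{n=0}^{\infty} P(N > n) = \sum_{n=0}^{\infty} \frac{1}{n!} = e$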

The Simpsons and Their Mathematical Secrets by Simon Singh
My rating: 4 of 5 stars


Functional Selects/Updates in kdb+

Post author By Rory Winston
Post date June 9, 2015

Functional selects/updates are a relatively tricky topic in kdb+, mainly because the syntax takes a lot of getting used to. They are normally required when a query has dynamic elements, e.g. in column selection or grouping criteria.

They are pretty well covered in Q For Mortals, but I wanted to add a couple of examples, for instance combining a functional select with a functional update.

To start, load the sample tables in sp.q. We will use the table called ‘p’:

q)\l sp.q
q)p
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 91
p2| bolt  green 17     paris  91
p3| screw blue  17     rome   91
p4| screw red   14     london 91
p5| cam   blue  12     paris  91
p6| cog   red   19     london 91

Functional Selects

A simple select from p with some criteria:


q)select from p where city=`london
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 91
p4| screw red   14     london 91
p6| cog   red   19     london 91

Now let's look at the parse tree for this query:

q)parse "select from p where city=`london"
?
`p
,,(=;`city;,`london)
0b
()

Now in order to convert this to a functional select, we need to turn this parse tree into an executable statement using the ? operator.

The basic form of the ? operator is


?[tablename;(select criteria);(grouping criteria);(columns)]

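The parse tree above gives us each of the four elements in the right order; we just need to convert them to valid functional syntax. In the parse output a leading comma denotes enlist, and the where clause needs one fewer level of enlist in the functional form than the parse output displays, since one level is consumed when the parse tree is evaluated. For the example above this translates to: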


q)?[p;enlist (=;`city;enlist `london);0b;()]
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 91
p4| screw red   14     london 91
p6| cog   red   19     london 91

So if we want to add e.g. column selection:

q)select name,color,weight from p where city=`london
name  color weight
------------------
nut   red   12
screw red   14
cog   red   19

The parse tree looks like:


q)parse"select name,color,weight from p where city=`london"
?
`p
,,(=;`city;,`london)
0b
`name`color`weight!`name`color`weight

In this case we have added a dictionary mapping the output column names to the columns being selected:


q)?[p;enlist (=;`city;enlist `london);0b;(`name`color`weight)!(`name`color`weight)]
name  color weight
------------------
nut   red   12
screw red   14
cog   red   19

Similarly, we can change the select criteria:


q)select name,color,weight from p where city in `london`paris
name  color weight
------------------
nut   red   12
bolt  green 17
screw red   14
cam   blue  12
cog   red   19

Which produces the following parse tree:


q)parse"select name,color,weight from p where city in `london`paris"
?
`p
,,(in;`city;,`london`paris)
0b
`name`color`weight!`name`color`weight

This is only a small modification to the original functional select:


q)?[p;enlist (in;`city;enlist `london`paris);0b;(`name`color`weight)!(`name`color`weight)]
name  color weight
------------------
nut   red   12
bolt  green 17
screw red   14
cam   blue  12
cog   red   19

Functional Updates

Functional updates follow an almost identical form, but use the ! operator, e.g.:


q)update w:sum weight by city from p
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 45
p2| bolt  green 17     paris  29
p3| screw blue  17     rome   17
p4| screw red   14     london 45
p5| cam   blue  12     paris  29
p6| cog   red   19     london 45

q)parse "update w:sum weight by city from p"
!
`p
()
(,`city)!,`city
(,`w)!,(sum;`weight)

This parse tree maps to:


q)![p;();(enlist `city)!enlist `city;(enlist `w)!enlist (sum;`weight)]
p | name  color weight city   w
--| ----------------------------
p1| nut   red   12     london 45
p2| bolt  green 17     paris  29
p3| screw blue  17     rome   17
p4| screw red   14     london 45
p5| cam   blue  12     paris  29
p6| cog   red   19     london 45

Now if we want to update the output from a select, e.g. a simple grouped update:


q)update w:sum weight by color from select name,color,weight from p where city in `london`paris
name  color weight w
---------------------
nut   red   12     45
bolt  green 17     17
screw red   14     45
cam   blue  12     12
cog   red   19     45

The parse tree for this looks like:


q)parse"update w:sum weight by color from select name,color,weight from p where city in `london`paris"
!
(?;`p;,,(in;`city;,`london`paris);0b;`name`color`weight!`name`color`weight)
()
(,`color)!,`color
(,`w)!,(sum;`weight)

This parse tree looks complex, but the main complexity comes from the nested functional select within.

We could explicitly write the entire function (nested select and update):


q)![?[p;enlist (in;`city;enlist `london`paris);0b;`name`color`weight!`name`color`weight];();(enlist `color)!enlist `color;(enlist `w)!enlist (sum;`weight)]
name  color weight w
---------------------
nut   red   12     45
bolt  green 17     17
screw red   14     45
cam   blue  12     12
cog   red   19     45

However it may be easier to read if we store the select in its own variable:


q)sel::?[p;enlist (in;`city;enlist `london`paris);0b;`name`color`weight!`name`color`weight]
q)sel
name  color weight
------------------
nut   red   12
bolt  green 17
screw red   14
cam   blue  12
cog   red   19

And then refer to the select thus:


q)![sel;();(enlist `color)!enlist `color;(enlist `w)!enlist (sum;`weight)]
name  color weight w
---------------------
nut   red   12     45
bolt  green 17     17
screw red   14     45
cam   blue  12     12
cog   red   19     45