[Originally published Sep 2007]
Here’s a problem I spent a while thinking about and solving. It’s quite a core one, and a bit tricky on the edge cases.
Some background: I’m building a system trading backtesting engine. Well actually, I’ve finished building it. Now I’m just making it do fun things.
Anyway, we have a program (a “system”) that decides when to Buy & Sell (i.e., places “orders” or “signals”, usually at a specific price) in the market. We then run that against a bunch of market data, and see how well it does. From this we either tweak the system, or trade it. Rinse, repeat; this is more or less what I do all day. Well, I’m more on the testing side than the development side, but anyway, given the way things are going, that’s largely a moot point.
Here is the basic bar – in our case, daily. These are probably familiar to you, but can be a bit weird to start with. The way you interpret these is as follows. The little tick on the left is the Open Price. The tick on the right is the Close price. The line that goes up stops at the High price for the day, and the line that goes down stops at the Low price for the day. All the action happens between those points. However, this bar by itself doesn’t tell us anything about WHAT happened during that day. That’s where things get tricky.
So, the key question is – if we have multiple orders on that bar, how do we know which are hit, and in which order? This is important if we are to reliably determine how a system actually trades in the market place.
Of course, usually we only have one order per bar, so there is simply a question of “Was the order price within the range of the bar (>= L and <= H)?” If it’s within the range, the order is hit; if not, it isn’t. Very simple [1].
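As a minimal sketch of that single-order check (Python; the function name and the `inclusive` flag are hypothetical, not taken from the actual engine), the flag switches between the >= / <= and > / < variants raised in footnote [1]:

```python
def order_hit(price, low, high, inclusive=True):
    """Return True if an order at `price` lies within the bar's Low..High range.

    `inclusive` is an assumed switch: True uses >= L and <= H,
    False uses > L and < H (no fill at the exact High or Low).
    """
    if inclusive:
        return low <= price <= high
    return low < price < high
```

Which variant is more realistic depends on the market and is left open here.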
However, as we move to faster and faster systems – with greater likelihood of multiple orders (e.g. an entry and multiple exits) within the range of a single bar, things get more complicated. In order to decipher this accurately, the key question becomes:
What order did the Open, High, Low & Close occur in?
We don’t actually have any intra-bar data, only the O, H, L & C prices. Nothing tells us which was hit first (although of course O is first, C last).
Depending on what we decide, we could end up flat when we should be in a position (or vice versa), or hitting a loss before a profit, thereby significantly affecting system profitability. The cumulative effect of these seemingly insignificant differences can easily result in very nasty “real time” surprises when we’re actually trading a system that may otherwise have tested positively.
The higher the frequency of the system (i.e., the more often it trades), the greater the likelihood of multiple signals on an individual bar. Cumulatively, the backtest could end up significantly different from how the system will actually behave when traded.
Obviously the simplest solution is to get finer grained data – if we’re currently using daily, then use half hourly, etc. However:
a) no matter how fine grained, there will always be some signals on the same bar, &
b) what do we do when we don’t have, or choose not to use, finer grained data (e.g. for performance or cost reasons)?
—
My previous testing platform has a very simple policy. The shortest distance gets covered first, regardless of overall market direction. However this simple strategy is flawed.
For example, if the OHLC are O=660.15, H=662.25, L=658.15, C=658.50 (as on the S&P500, 19/Sep/1984), it trades thus:
660.15 (O) -> 658.15 (L) -> 662.25 (H) -> 658.50 (C)
because the distance from O->L is only 2, but the distance from O->H is 2.1.
However, this has a total market distance travelled of abs(658.15 - 660.15) + abs(662.25 - 658.15) + abs(658.50 - 662.25) = 2 + 4.1 + 3.75 = 9.85.
If you look at this bar, though, it’s a downward bar (i.e. C < O). On a bar with an overall downward trend it’s much more likely that the market would have hit the High first, then reversed, come down, hit the Low, then closed slightly higher, but still lower than the Open, i.e.:
660.15 (O) -> 662.25 (H) -> 658.15 (L) -> 658.50 (C).
(i.e., a total market distance travelled of 2.1 + 4.1 + 0.35 = 6.55)
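The two orderings and their distances can be checked with a short sketch (Python; the function names are hypothetical, and breaking a tie in favour of the High is an assumption, since the old platform's tie rule isn't stated):

```python
def shortest_distance_first(o, h, l, c):
    """The old policy: from the Open, visit whichever extreme is nearer first."""
    if (o - l) < (h - o):
        return [o, l, h, c]
    return [o, h, l, c]  # assumption: ties go to the High


def path_distance(path):
    """Total distance travelled along consecutive points of a path."""
    return sum(abs(b - a) for a, b in zip(path, path[1:]))


o, h, l, c = 660.15, 662.25, 658.15, 658.50
old_path = shortest_distance_first(o, h, l, c)  # O -> L -> H -> C
trend_path = [o, h, l, c]                       # O -> H -> L -> C
print(round(path_distance(old_path), 2))    # 9.85
print(round(path_distance(trend_path), 2))  # 6.55
```

The distances are rounded only for display, because binary floating point doesn't represent these prices exactly.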
—
In order to unravel this, we need to figure out, for each possible type of bar, what order the OHLC would have been hit in. We can then “traverse” each leg of the bar in turn – in the above example, we travel from Open up to High, hitting any prices in that range, ranked from lowest to highest. We then travel from High down to Low, hitting any prices in that range, from highest to lowest. Finally we travel from Low up to Close, hitting prices from lowest to highest. Of course, any order hit can also trigger later orders which must also be accounted for and kept track of.
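A rough sketch of that traversal (Python; `traverse` is a hypothetical name, and orders are reduced to bare prices). It deliberately leaves out the last point above: orders triggered by an earlier fill are not modelled here.

```python
def traverse(path, order_prices):
    """Return order prices in the sequence they would be hit along `path`.

    `path` is a list of prices such as [O, H, L, C]. Each order is hit
    at most once, on the first leg whose range contains its price.
    """
    remaining = list(order_prices)
    hits = []
    # A one-point path (a completely flat bar) is treated as a zero-length leg.
    legs = list(zip(path, path[1:])) or [(path[0], path[0])]
    for start, end in legs:
        lo, hi = min(start, end), max(start, end)
        in_leg = [p for p in remaining if lo <= p <= hi]
        # Upward leg: lowest price first. Downward leg: highest price first.
        in_leg.sort(reverse=(end < start))
        for p in in_leg:
            hits.append(p)
            remaining.remove(p)
    return hits


orders = [661.0, 659.0, 662.0, 658.3]
print(traverse([660.15, 662.25, 658.15, 658.50], orders))
# [661.0, 662.0, 659.0, 658.3]
print(traverse([660.15, 658.15, 662.25, 658.50], orders))
# [659.0, 658.3, 661.0, 662.0]
```

The same four orders fill in a completely different sequence depending on which path is assumed, which is why the choice of path matters.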
Left: Straight up, i.e. O=L, C=H. Right: Straight down, i.e. O=H, L=C.
This is the simplest situation. The bar has only one leg, and simply traverses directly from O->C, regardless of overall market direction.
Left: Lower, but upwards, i.e. O<>L, C=H. Right: Lower, but downwards, i.e. O=H, C<>L.
Slightly more complicated, this bar has two legs. The market first traverses from O->L, then from L->C. Whether the Open is the High, or the Close is the High is irrelevant.
Left: Higher, but upwards, i.e. C<>H, L=O. Right: Higher, but downwards, i.e. O<>H, L=C.
As above, the bar has two legs. First from O->H, then from H->C. Overall trend is irrelevant.
Left: Upward only, but no overall movement, i.e. O=L=C, H different. Right: Downward only, but no overall movement, i.e. O=H=C, L different.
These bars are relatively simple. Only two legs. On the left, O->H, H->C. On the right, O->L, L->C.
Left: All different, upwards, i.e. O<>H<>L<>C, C > O. Right: All different, downwards, i.e. O<>H<>L<>C, C < O.
This is the controversial bar. In a long, upward bar (as on the left), it is rather unlikely (although not impossible, of course [2]) that the market would go from O->H, then all the way down to L, then back to C. So, the 3 legs we decide on are O->L, then L->H, then H->C. On a downward bar (as on the right), we reverse the order: O->H, H->L, L->C.
Completely flat. i.e. O=H=L=C
This is a rather silly bar. No legs to speak of, we just have to check if any orders hit the exact price (O=H=L=C). It’s debatable if anything would have traded, since markets this flat typically mean zero liquidity – i.e., no trades at that price.
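The bar types so far collapse into one rule: an upward bar goes O->L->H->C, a downward bar goes O->H->L->C, and any leg of zero length drops out. Here is a sketch of that reading (Python; `bar_path` and `dedupe` are hypothetical names, and this is one compact way to encode the cases, not necessarily how the engine itself does it). It returns None for a bar with C = O that moved both up and down, which the bar's own trend cannot resolve.

```python
def dedupe(points):
    """Drop consecutive repeated prices, i.e. legs of zero length."""
    out = [points[0]]
    for p in points[1:]:
        if p != out[-1]:
            out.append(p)
    return out


def bar_path(o, h, l, c):
    """Estimate the order in which a bar's prices were visited."""
    if c > o:                      # upward bar: Low first, then High
        return dedupe([o, l, h, c])
    if c < o:                      # downward bar: High first, then Low
        return dedupe([o, h, l, c])
    if h > o and l < o:            # C == O but moved both ways: ambiguous
        return None
    return dedupe([o, h, l, c])    # C == O with at most one excursion


print(bar_path(10, 12, 10, 12))   # straight up: [10, 12]
print(bar_path(10, 12, 9, 12))    # lower, but upwards: [10, 9, 12]
print(bar_path(10, 12, 10, 11))   # higher, but upwards: [10, 12, 11]
print(bar_path(10, 10, 10, 10))   # completely flat: [10]
```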
Up & downward, but no overall movement
This is the most complicated case. How do we decide which order the prices were hit in?
Above we’ve been using the overall trend of the bar to decide. However, there is no trend here.
The simplest solution is to look at the previous bar (since we always have that data). If the trend from the previous bar is down, then we are likely at a trend reversal point. I.e., on the previous bar the market was pushing downwards, like this:
The market continued downwards to the Low, then reversed, hit the High, then finally settled back at the Close. In other words, there wasn’t enough selling pressure in the market to push to a new low. Note that this is the exact opposite of our usual trend-based decision. Normally if the trend is down, we estimate that the High was hit first. Here we estimate that the Low was hit first.
Of course, if the previous bar was an upward trending bar (C > O) then we estimate that the High was hit first, then the Low, then Close (i.e., O->H->L->C) – in other words, there wasn’t enough buying pressure to push to a Close above the Open.
Very occasionally you do get situations where you have several of these types of bars in a row – in which case we look back two bars, etc. Of course, accuracy drops, but fortunately this is rare enough not to worry about too much.
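A sketch of that look-back (Python; `resolve_no_trend_bar` is a hypothetical name and bars are assumed to be (O, H, L, C) tuples in date order). Returning None when no earlier bar has a trend is an assumption, since that situation isn't covered above.

```python
def resolve_no_trend_bar(bars, i):
    """Estimate the path for bar `i`, where C == O with both a High and a Low.

    Walks back through earlier bars until one with C != O is found.
    """
    o, h, l, c = bars[i]
    for prev_o, _, _, prev_c in reversed(bars[:i]):
        if prev_c < prev_o:       # previous trend down: Low hit first
            return [o, l, h, c]
        if prev_c > prev_o:       # previous trend up: High hit first
            return [o, h, l, c]
        # previous bar also closed at its Open: keep looking further back
    return None                   # assumption: nothing earlier to go on


bars = [
    (10.0, 11.0, 9.0, 9.5),   # downward bar
    (9.5, 10.0, 9.0, 9.5),    # no overall movement
    (9.5, 10.0, 9.0, 9.5),    # no overall movement again
]
print(resolve_no_trend_bar(bars, 2))  # [9.5, 9.0, 10.0, 9.5]
```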
—
Hopefully, by combining all of these decisions, you end up with a much truer picture of what any system is doing, within any given bar. With all backtesting, it’s always a tradeoff of likely accuracy versus effort. How can you be sure you’re not over-optimizing? How can you be sure your system will trade exactly as it has tested? Is it worth the additional effort? Etc. This kind of analysis is just a small step closer to reality, in terms of simulated versus actual performance.
[ed: It turns out the above methodology is about 90% correct, in terms of OHLC order, vs ~80% for the “shortest distance first” strategy. It’s still not perfect, but it is a significant improvement & worth the extra clock cycles]
Footnotes
[1] Of course, there is also the question of whether our orders would be filled at EXACTLY the High or Low of the bar. i.e., should we use >=L & <=H or >L & <H?
[2] The entire strategy will never be 100% correct, of course, because any market will sometimes behave in unexpected ways. What we should attempt to do is get our estimates as close as reasonably possible to reality. More accurate = better informed decisions = greater likelihood of successfully trading systems.