Race results data
can now be had in virtually unlimited quantities, in pre-digitized format.
Presumably, somebody, somewhere typed the stuff into a computer, but it
ain’t me anymore—thank goodness. The raw data is now available,
for a price, from several sources on the Web, and if you are going to put
it to your own use, the first thing you need to decide is how large your
study population should be.
With horse racing statistics, the
standard take has always been that bigger is better. Everybody knows that
tiny samples give invalid results and small samples can give skewed results,
so big is good and humongous is even better. Right? So, naturally,
horse racing statisticians have tended to shoot for "humongous,"
and at the same time the goal has shifted from identifying
handicapping factors to identifying ROI (return on investment) factors
over, say, 15,000 races.
To get right to the point, this "homogenizes"
the data.
With all due respect to those horseracing
statisticians who have labored over massive data sets of race results—they
are usually singing under the wrong window. This is a game where short-term
variability is crucial—and big samples blur opportunities.
What is often lost in these broad-based
computer studies is the importance of the variability that makes up the
day-to-day reality of the two main sets of information: racing data (times,
etc., which Eric Langjahr termed "The Cold Dope") and tote board data (odds,
etc.), which are part-and-parcel of the "variance" that makes betting "scores"
possible.
In the 1960s and ‘70s, data sets
had to be "punched in"—literally, on keypunch machines, while squinting
at microscopic print in the Form. I did not wear glasses until I
did a lot of this in the late ‘70s. (You also had to stand at the
machine, and there was often only one machine per 20,000 or so students
and faculty, so it was not unusual to get your turn at 3 o’clock in the
morning.)
As a result, early data sets were
small, but hopes were high; after all, this was a computer the size of a
tractor-trailer, so some miracle was bound to happen. The goals were simple:
looking for patterning of handicapping factors in a fairly traditional
sense. There was no miracle, but a lot of the goals were met. We know much
more about handicapping factors today than we did then, thanks to the published
works of William Quirin and those who have followed.
I vividly remember the frustration
of simply getting the data then. The charts were published in the
paper Form, often hit-or-miss. The day you thought the charts for
a certain race day should be published, they weren’t. It was difficult
to even find a Form in my area, and past Forms had
to be ordered at higher-than-face cost, and, if you were lucky, they arrived
in a tattered bundle, maybe six weeks later. I am still waiting for several
bundles I ordered in the early ‘80s.
I also distinctly remember wanting
more!
Bigger data sets! I wanted humongous. I was wrong. Like virtually
everyone else then, I was constrained to looking at smaller data sets and
smaller questions—and that turned out to be a lucky stroke.
There are many questions in horse
racing where it would be nice to have a population of 15,000 races, but
there are many more where smaller, more compact, focused populations
identify patterning that large populations completely obscure.
What researchers have generally looked
for in big-data runs are factors that show a certain percentage profit
or loss over, say, 15,000 instances. An example of one of the simplest types
of factors tested would be the profit percentage of theoretical flat bets
on favorites. A more complex one might be the profit percentage of hypothetical
bets on three-year-olds returning after a layoff of a certain length after
July 31 of the year. It is a certainty that if you run enough of these little
simulations, you will find some that show various profits, always small.
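To make the mechanics concrete, here is a minimal sketch of the simplest test mentioned above: a theoretical $2 flat bet on every favorite over a results file. The file name and column names (favorite_finished_first, favorite_win_payoff) are stand-ins for illustration, not any vendor's actual format.

```python
import csv

def flat_bet_roi(path, bet=2.0):
    """Total profit/loss and ROI for flat bets on every race favorite."""
    wagered = 0.0
    returned = 0.0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            wagered += bet
            if row["favorite_finished_first"] == "1":
                # Win payoffs are quoted per $2 bet, so scale to the bet size.
                returned += float(row["favorite_win_payoff"]) * (bet / 2.0)
    profit = returned - wagered
    return profit, profit / wagered  # ROI as a fraction of money wagered

profit, roi = flat_bet_roi("results.csv")
print(f"Profit: ${profit:+.2f}  ROI: {roi:+.1%}")
```

Run over 15,000 races, a test like this will happily report a small edge; whether you could ever collect on it is the question raised next.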
It’s only a little tongue-in-cheek
to ask: "Okay, now—have you got 15,000 bets—and the several years it would
take to make them when the angle arises?"
In our little world of horse racing,
variance
happens. If you’re going to take any of these statistical angles seriously,
you’d better have those bets and the time, because you might not score
until bet number 14,998, then lose it all on 14,999 and be back to zero
again at 15,000.
Extremely large samples in horse
racing are not totally useless and I’m not suggesting that you don’t invest
in some of the statistics-based studies that are available. It is good
to know that class-droppers tend to win more than their fair share of
races (duh), and to know the many other positive and negative "impact" factors
identified or verified through large-sample studies. These are things
everyone should be able to grab off a synapse at the appropriate moment
during the handicapping thought process.
But 15,000-race samples completely
blur the hour-to-hour, day-to-day, and week-to-week variability, which
creates the opportunities for bettors to score. In the days when handicappers
generally focused on one track, Andy Beyer recommended taking a day before
the season started in a closed room with a year’s supply of last year’s
Forms, and a bottle of Jack Daniels. The purpose was to develop "class-par
times," which I’ve never been too crazy about, but the result—aside
from a hangover, if you followed his instructions literally—was a good
overview of a year’s racing at your home track. You couldn’t help but
pick up on both patterning and quirks in the results charts, which would
help you deal with the beginning of a new season.
If you follow one or two home tracks,
this is still fine advice, although that’s about the only scenario in which
I’d worry much about the pars (or the variants for which they form the
baseline, but that’s another story). However, many of us today do not follow
a single track or even regional circuit, and are more likely to be placing
bets at ten tracks or more across the country, although not necessarily
on the same day. (Granted, there are accounts of system players who go much
further than that.) With almost unlimited availability of tracks for simulcast
betting, most bettors I know have broadened their field of play well beyond
a local circuit, though they still tend to focus primarily on tracks that
they know to some extent or have played before.
Large populations of races for statistical
studies are valuable for large, fundamental questions, but usually for small
profit. The variability that we act on as value bettors is more often
short-term, sometimes instantaneous, and a lot more profitable.
With comma-delimited past performance
and results data available fairly cheaply on the internet, and with spreadsheets
now virtually standard equipment on every computer, you may dream up your
own approaches to identifying short-term patterns at your tracks. My suggestion
is not to worry about humongous samples and fundamental questions of racing
per se, but to think small and think local: local, at least, to the tracks
you play, which may be scattered across three time zones.
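As one example of thinking small and local, the sketch below tracks how often the crowd's favorite has been winning at a single track over a short rolling window, the kind of short-term read a spreadsheet or a few lines of code can give you. The column names (track, date, favorite_finished_first) and the track code are assumptions for illustration.

```python
import csv
from collections import deque

def rolling_favorite_hit_rate(path, track, window=50):
    """Yield (date, favorite win rate over the last `window` races) at one track."""
    recent = deque(maxlen=window)  # rows are assumed to be sorted by date
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["track"] != track:
                continue
            recent.append(int(row["favorite_finished_first"]))
            if len(recent) == window:
                yield row["date"], sum(recent) / window

for date, rate in rolling_favorite_hit_rate("results.csv", "PEN"):
    print(date, f"{rate:.0%}")
```

A window of 50 races is short enough to catch a hot or cold crowd before it regresses, and the same skeleton works for any factor you can reduce to a column.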
If computer analyses are not your
idea of recreation, you can still look for patterning and opportunities
by simply eyeballing past performances and results charts. It is
extremely handy now to use the computer to get to race results charts provided
by a number of Web sites. If for some reason I am going to try working
a track I’m not familiar with, or just haven’t worked for a while, I’ll
usually pull up some recent results charts on the Web to see what’s going
on.
For my style of play, I like to see
some "normal" variability displayed in the odds payoffs. By that I mean
a few races will show $4.20, $3.60, and $2.40, but there will also be a
healthy mix of patterns like $26.80, $5.60, and $4.80. I especially like
to see patterns like $4.80, $11.80, $3.60, because they often indicate
handicappable place overlays and, although none of these patterns predict
future events, they suggest that the field is open and the opposition from
the rest of the crowd in shaping the odds is normal.
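If you want the computer to do some of the eyeballing, one reading of the $4.80, $11.80, $3.60 pattern is a modest win price on the winner next to a bigger place price on the runner-up. Under that reading, which is an assumption about your chart layout rather than a fixed rule, a scan like this flags charts worth a closer look; the figures echo the examples above.

```python
# Each tuple: (winner's $2 win payoff, runner-up's $2 place payoff).
charts = [
    (4.20, 3.60),   # chalky result
    (26.80, 5.60),  # longshot winner
    (4.80, 11.80),  # price horse in second: possible place overlay
]

for race_no, (win, place) in enumerate(charts, 1):
    if place > win:
        print(f"Race {race_no}: runner-up's place (${place:.2f}) beat the "
              f"winner's win (${win:.2f}), worth a closer look")
```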
Once in a while, you’ll find a pattern
where the public is "On." Early in the year, I pulled up results
from (I believe it was) Penn National, where all the races were 1 mi 70
yds and the crowd was nailing every race for a period of at least several
days. One way to find value, which has become more difficult with simulcasting,
is to find a really dumb crowd—that one obviously wasn’t.
For some approaches to betting this
scenario might be a goldmine, but not for mine, so when it happens, I’m
somewhere else.
However, a great opportunity is happening
right now, and it requires no searching through data. It happens every
year: the fall fairs. The fair circuit is on in California and
Maryland, and begins this weekend in New Mexico. Some handicappers specialize
in fairs, and for them the fall season is what Christmas is for
Macy's; it provides the bulk of their annual profit.
Except for a small percentage of
serious handicappers, fair crowds are rank amateurs; they can no more handicap
a horse race than a tractor pull. The horses come in from the surrounding
circuit tracks, which are generally on hiatus before shifting to fall venues.
This is one of the few times when you can worry a lot less about fine-tuning
"value"—good old-fashioned handicapping comes to the forefront.
If you have a way of dealing with complete fields of shippers, this is
the time for good handicappers to reap the fall harvest.