Wednesday, October 31, 2007

Tactics Training as Probabilities

Tonight, I want to discuss a probabilistic perspective on the problem solving session. For this analysis, I'm going to model a tactical session as a binomial sampling phenomenon.

First, what is a binomial sampling? This question is best answered using the real-world example of a series of coin tosses. Were we to toss a coin several times and record the result at each toss, we would have a binomial sampling of heads and tails--it's that simple. Tactical sessions follow the same model because each problem attempted by the tactician has one of two outcomes: pass or fail. Fortunately, the mathematics behind the probabilities of a binomial distribution is well established, so I won't review it here. But I will calculate probabilities using the binomial probabilities calculator available at VassarStats.

The following table shows the probability that a tactician with a given accuracy will solve at least a given number of problems correctly in a given number of tries.
Tactician Accuracy   Tries   Number Correct   Probability
98%                  100     98 (98%)         0.67669
98%                  100     99 (99%)         0.40327
99%                  100     98 (98%)         0.92063
99%                  100     99 (99%)         0.73576
98%                  200     196 (98%)        0.62884
98%                  200     198 (99%)        0.23515
99%                  200     196 (98%)        0.94825
99%                  200     198 (99%)        0.67668
98%                  1000    990 (99%)        0.01023
99%                  1000    990 (99%)        0.58304
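
For anyone who would rather compute these probabilities than look them up, here is a minimal Python sketch of the same "at least k correct out of n" calculation. The function name `prob_at_least` is my own invention for illustration, and the sketch assumes every problem is an independent pass/fail trial with a fixed pass probability, which is exactly the binomial model described above.

```python
from math import comb


def prob_at_least(correct, tries, accuracy):
    """Probability of at least `correct` passes in `tries` independent problems."""
    return sum(
        comb(tries, k) * accuracy**k * (1 - accuracy) ** (tries - k)
        for k in range(correct, tries + 1)
    )


# (accuracy, tries, minimum number correct) for a few rows of the table
rows = [(0.98, 100, 98), (0.98, 100, 99), (0.99, 100, 99), (0.98, 1000, 990)]
for accuracy, tries, correct in rows:
    p = prob_at_least(correct, tries, accuracy)
    print(f"{accuracy:.0%} tactician, {correct}/{tries}: {p:.5f}")
```

Summing the upper tail directly is fine at these sizes; for much larger numbers of problems a statistics library would probably be the safer choice.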

Let's first consider the likelihood for a 98% tactician to "accidentally" get 99 problems out of 100 on some arbitrary set of 100 problems he might encounter in a normal training session: 40% (0.40327). Now, consider the likelihood that this 98% tactician will get 198 right out of 200: 23.5%. So the 98% tactician will look like a 99% tactician on over 40% of his stretches of 100 problems and more than 23% of his stretches of 200 problems!

Let's compare a 198/200 performance to our expectations for a true 99% tactician: 67.7%. So a true 99% tactician will look like a 99% tactician on more than 2/3 of his stretches of 200 problems.

So what do all of these numbers mean? In short, these probabilities say that 200 problems are insufficient to confidently distinguish a 99% tactician from a 98% tactician. Only at very high numbers of problems can these two tacticians be distinguished with good certainty. In the example above, a 98% tactician has only about a 1% chance to get at least 990 of 1000 problems correct while a 99% tactician has significantly better than an even chance (58.3%).

So, what is the bottom line of this analysis when thinking of our own accuracy statistics? Well, one must do a lot of problems of at least a given accuracy before he can be certain that he is actually a tactician of that accuracy!

With this in mind, I present my performance today at the CTS. I wish I had warmed up--three misses before number 30.

4p-1f-12p-1f-8p-1f-73p
97% @ 1395 ± 90 ; 1383 final

Tuesday, October 30, 2007

The CTS Training Parameters

Yesterday, I finished discussing the topic of Accuracy Versus Strength by alluding to training parameters. Today I will begin to identify these parameters and hopefully unambiguously define them. To introduce the importance of this topic, I borrow from waaek's comment to yesterday's post:
...people use CTS in different ways and for their own purposes. I personally use CTS as [a] source of tactical training puzzles to train my analysis ability. My goal is to improve the accuracy, and over time, speed, of my analysis. Accuracy at this point is by far the #1 concern for me.
The notion of "using" CTS covers not only one's goals but also one's approach to training. But how are we to know the efficacy of our training methods? I propose that the first step is to identify quantitatively the components of those training methods and to give them concrete definitions.

Rating
A tactician's rating, or more accurately his Glicko rating, is a bottom-line measurement of his strength as a tactician. Several assumptions are used by the Glicko rating system but the most critical (and perhaps most fallacious) assumption is that a tactician will use rating and rating alone as his sole indicator of progress. As discussed yesterday and as evidenced by comments like waaek's, this assumption is categorically wrong most of the time. The truth of the matter is that different tacticians are comfortable with different training methods and these methods directly impact their rating.

Accuracy Rate
Most CTS tacticians are familiar with their accuracy as an aggregate statistic that summarizes their total passes and total fails for all problems solved at CTS. However, because training habits change over time, using the aggregate accuracy of a tactician to measure his tactical strength can be misleading. Instead, I propose the concept of accuracy rate, which measures a tactician's accuracy over a certain number of problems. For example, during my last 500 problems I have missed ("failed") nine problems and solved correctly ("passed") 491, giving me an accuracy of 98.2% for these 500 problems. So an accuracy rate must combine both the accuracy and the number of problems solved at that accuracy. Here is the formula I propose to calculate accuracy rate (see below for a mathematical explanation):
-N * log(1 - A)
Here, A is the accuracy of a tactician over number of problems N.

Let's see how this definition behaves when comparing some hypothetical problem runs:

Number Solved   Accuracy   Accuracy Rate
500             0.982      2008.69
100             0.98       391.20
200             0.98       782.40
100             0.99       460.52
200             0.99       921.03

So, were one to interpret accuracy rate (rather loosely) as solving power, the accuracy rate suggests that solving 200 problems at 99% accuracy takes much more solving power than solving 100 problems of equivalent difficulty at 98% accuracy.
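
The accuracy rate formula is easy to try out. Here is a small Python sketch; the function name `accuracy_rate` is hypothetical, and the use of the natural logarithm is an assumption inferred from the 500-problem example (491 of 500 gives 2008.69 only with the natural log). Note that the score is undefined at exactly 100% accuracy, since log(0) does not exist, so the sketch rejects that case.

```python
from math import log


def accuracy_rate(problems, accuracy):
    """Accuracy rate -N * log(1 - A), using the natural logarithm (assumed)."""
    if not 0 <= accuracy < 1:
        raise ValueError("accuracy must be in [0, 1); the score is undefined at 100%")
    return -problems * log(1 - accuracy)


# The 500-problem example: 491 passes, 9 fails
print(round(accuracy_rate(500, 491 / 500), 2))  # 2008.69

# 200 problems at 99% scores higher than 100 problems at 98%
print(accuracy_rate(200, 0.99) > accuracy_rate(100, 0.98))  # True
```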

I am open to alternative proposals for a metric to quantify accuracy rate. Please submit comments if you have any suggestions.

I have several more training parameters to define, but it's getting late, so I'll finish with my performance today followed by a mathematical explanation of accuracy rate for the curious.

40p-1f-59p
99% @ 1410 ± 89 ; 1383 final

An Explanation of Accuracy Rate
To combine accuracy and problems solved to create an accuracy rate, I propose borrowing from a common technique used to multiply probabilities, which makes use of this property of logarithms:
log(p1 * p2) = log(p1) + log(p2)
The idea is that the higher a tactician's accuracy, the less likely it is that his answers arise from chance. So the combination I propose is
-N * log(1-A)
where A is the accuracy of a tactician over number of problems N. Subtracting A from 1 comes from the fact that a higher accuracy means a lower likelihood of guessing. Taking the negative of the product is a simple way of making the score a positive number because the log of a fraction is negative. Multiplying log(1-A) by N comes from the following property of logarithms:
log(X^Y) = Y * log(X)

Monday, October 29, 2007

Introduction to Chess Vortex

In the next few days, I'm going to back up and discuss some of the issues that led me to create the Chess Vortex Blog and corresponding Chess Vortex Project. I will then discuss in general terms how the Chess Vortex Project might approach addressing these issues in a scientific way. And finally, I will discuss the motivation behind making Chess Vortex a community project, what it means for Chess Vortex to be a community project, and how I will attempt to act as an agent of the community to create the "products" unique to Chess Vortex.

My hope is that these Chess Vortex "products", as I am calling them, will be three-fold in nature. First, I imagine a new set of representational tools to help evaluate graphically, or via other media, human interaction with tactical problems. Second, I imagine a set of mathematical tools to analyze behavior and the cognitive process of solving tactical problems. And third, I imagine that Chess Vortex will yield--as its most valuable product--new knowledge that leads to a deeper understanding of these cognitive processes.

So first let's begin with the fundamental issues. Today the issue will be...

Accuracy Versus Rating
Probably the most controversial topic on the CTS Message Board is the nature of tactical strength (aka problem-solving strength). This controversy arises from the observation that one's accuracy (number of problems correct per number attempted) can be sacrificed for Glicko rating and vice-versa. As a result, the Glicko rating assigned by the CTS has a duplicitous nature and, like all things duplicitous, can only be trusted if taken in proper context. The context for the Glicko rating, therefore, is not merely one's Rating Deviation (RD), which is explicitly included as a parameter in the system, but also one's accuracy. The inherent difficulty in assessing the accuracy, however, is that it is not formally part of the Glicko rating system and thus its weighting in one's rating is not easily determined. One goal of the Chess Vortex Project is to determine this weighting using robust mathematical analysis. The idea would be to create a scale that incorporates Glicko rating, RD, and accuracy to dependably compare the tactical strength of any two tacticians. Moreover, and decidedly more importantly, the hope is to be able to compare the improvement of tacticians who train at differing accuracy rates (time-averaged accuracy) to determine the optimal training parameters for developing tactical strength.

I am not finished with this issue by any stretch of the imagination, but I promised myself some sleep tonight (for a change), and so now I present my CTS performance today--somewhat for purposes of vanity, but mostly to show off the Chess Vortex community's first (albeit a work in progress) product--the Session Graph:

27p-1f-72p 99% @ 1405 ± 96 ; 1393 final

And now, before I forget, I happily present the

Chess Tactics Server Problem of the Day
p01056

Black to Move

Here's the solution and why I like it (start selecting text following the colon): 1...Nh3+. Now if 2.gxh3, then it's mate in three: 2...Qh4+ 3.Ke2 Rg2+ 4.Rf2 Qxf2++.

Saturday, October 27, 2007

Undulations of Chess Consciousness

I begin with a summary of my efforts today, in order to provide a context for our topic:

7p-1f-25p-1f-27p-1f-12p-1f-25p
96% @ 1353 ± 81; 1403 final

Unfortunately, I dipped from my target of 98%--I'm blaming it on a general shortage of sleep this week and some pretty intense Saturday blitz earlier today. I went 3-3-0 with my latest OTB nemesis (let's call him Chernobyl Grunvasser). Chernobyl is a blitz fanatic, and that's all we play when we get together. I am basically just learning how to play blitz so today was a first indication that I am getting better at it. Usually my sessions with him are skewed significantly in his favor.

Digression on the Performance Graph
Before I get started on the actual topic of the day, I would like to say that Loomis's suggestion to get rid of the annoying outlines around the bars in the Performance Graph makes all of the difference in the world. At this point, I'm thinking of using color (or shading) to represent the rating of the problem relative to the tactician's rating--perhaps using a blue-cyan-yellow, blue-cyan-magenta, or even a cyan-white-orange gradient. I'm not sure which yet, and I'll probably do a little experimentation with it. This will put into use wormwood's excellent suggestion to use shading to increase the information dimensionality. I am probably not going to attempt to capture the cumulative rating change in the representation, as I am not sure of the utility of representing this change for my purposes--which will become clearer in the future. I decided to keep time on the X-axis because it seems very natural. In fact, if one looks at my performance graph tonight, he will likely notice, above all, variations in my time management.

In particular, I solved some clusters of problems much more quickly than the others. The most prominent such cluster preceded one of my four fails for the evening (p50383). Recently, waaek has described bouts of fogginess when working problems or when playing over the board. For high-accuracy players such as waaek, me (lately), or dogwaste (aka dktransform), I think that these mental lapses are reflected in a shorter decision process. In fact, inspecting dktransform's latest actions, I notice a similar pattern. He has 2 fails of 20 problems shown and the first fail came after a run of 4 problems with an average solving time of 5.25 seconds, though the average for all 20 was almost twice that at 10.3 seconds. And I'm guessing that his typical average time is probably significantly greater than 10.3 and that he had reached his mental capacity for this particular session. (I would have made a performance graph of dktransform's latest actions, but I have yet to write a parser for the "Latest Actions" page.)
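
The comparison described above (average solving time just before a fail versus the session average) could be automated along these lines. This is only a sketch: the function name `times_before_fails`, the `(seconds, passed)` tuple layout, and the window of 4 problems are all assumptions, and the sample times below are made up purely to exercise the code. They are not dktransform's actual data.

```python
from statistics import mean


def times_before_fails(attempts, window=4):
    """Mean solve time of the `window` problems preceding each fail.

    `attempts` is a list of (seconds, passed) tuples in session order.
    Fails with fewer than `window` problems before them are skipped.
    """
    results = []
    for i, (_, passed) in enumerate(attempts):
        if not passed and i >= window:
            preceding = [seconds for seconds, _ in attempts[i - window:i]]
            results.append(mean(preceding))
    return results


# Made-up session: a quick cluster of four problems, then a fail
session = [(12, True), (11, True), (5, True), (6, True),
           (4, True), (6, True), (9, False), (14, True)]
print(times_before_fails(session))          # [5.25]
print(mean(seconds for seconds, _ in session))  # overall average for comparison
```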

These observations lead me to conjecture that the key to accurate chess is to habitually rein in the natural mental tendency to short-circuit the analytical process (i.e. guess). Such guessing is likely a useful biological adaptation, but is not best practice in the mathematically precise realm of the chess board. Cultivating a complete analytical habit becomes even more difficult with fatigue, and, barring a dogmatic yet virtuous pursuit of one's personal throughput goals, it is best to simply know when to stop solving for the evening.

And now, I happily present the

Chess Tactics Server Problem of the Day
p19052

White to Move

Here's the solution and why I like it (start selecting text following the colon): 2.Nc6 is forcing because it attacks the black Queen, so it's easy to guess. But its beauty is White's threat of mate in 6 if Black simply recaptures: 2...bxc6 3.Qa6+ Kb8 4.Bxa7+ Ka8 5.Bb6+ Kb8 6.Qa7+ Kc8 7.Qa8++.

Visual Representation II

Part I: The Drawbacks of Run-length Encoding
Wormwood has been adamant about the virtues of run-length encoding to represent a problem session (e.g. 37p-1f-36p-1f-25p, as discussed a couple of nights ago). However, the examples I have shown previously have been drawn from my recent high accuracy efforts and so look good when represented with run-length encoding. My opinion is that the practicality of run-length encoding to represent a session breaks down for sessions of significantly lower accuracy. Here is a real example on the far end of the spectrum (my efforts on 2/13/07):

3p-2f-2p-1f-2p-1f-1p-3f-1p-2f-8p-2f-8p-1f-1p-1f-2p-2f-1p-2f-1p-
1f-1p-1f-1p-2f-4p-2f-4p-1f-4p-1f-4p-3f-1p-2f-1p-1f-2p-2f-1p-1f-2p-1f-1p-
1f-3p-1f-6p-2f-3p-1f-3p-1f-15p-2f-4p-1f-1p-2f-4p-4f-4p-1f-1p-2f-2p-
2f-3p-3f-3p-2f-4p-2f-2p-2f-1p-2f-2p-1f-11p-2f-2p-1f-1p-1f-1p-5f-2p-1f-4p-
1f-1p-2f-4p-5f-4p-1f-2p-3f-5p-3f-5p-1f-7p-2f-1p-1f-1p-1f-1p-1f-2p-
1f-1p-1f-2p-4f-6p-1f-5p-1f-1p-1f-1p-1f-2p-2f-1p-1f-2p-1f-5p

This session was 197/309 (64% @ 1506 ± 90 ; 1476 final). As anyone can plainly observe, run-length encoding is not as advantageous in conveying the gestalt of a session when the accuracy drops. Furthermore, I would argue that a purely graphical (non-textual) representation would be more helpful, as I intend to show below.


Part II: A Second Round of Graphical Representation
After staring for some time at last night's attempt at a representation scheme, I arrived at the same conclusion as Loomis that the time weighting on the X-axis made it appear that more points were being lost by the tactician than really are. Aside from being misleading, I felt that this was intensely unfair to the hard-working tactician. Tonight I attempt to fix that particular shortcoming by spacing the problems by time (actually the natural log of time), but keeping the bars at unit width. First, here is tonight's session (click to see full-size image):


37p-1f-36p-1f-25p
98% @ 1419 ± 92 ; 1399 final

And here is the same type of representation for the 2/13/07 session discussed above (you will definitely want to click for the full-size image):



If you open both full-size images simultaneously, you will be able to make direct comparisons. The differences in success rate, time for the session, time per problem, overall rating change (red versus blue), and change in RD over the session, are easily spotted. Notice how the bars get progressively shorter moving from right to left along the 2/13/07 session. This is the RD shrinking to about 17 from approximately 30. Notice as well that the 2/13/07 session has most of the blue above the line and tonight's session has about a fair mix--the graphical representation reveals the tactician's approach to time and score management that is not apparent in run-length encoding.

I won't be able to address wormwood's other suggestions tonight (yet again) because it is getting late, but they are still bouncing around prominently in my head. I definitely like the shading suggestion, but I want to find the perfect use. Of course, this may entail finding a way to make the bars wider, as the thin bars above are not entirely amenable to shading.

And, to my genuine dismay, no problem tonight...

Friday, October 26, 2007

Visual Representation I

Tonight I have spent a large chunk of my blog time generating a preliminary code base for the Visual Display of a Problem Session. While I agree with wormwood's suggestion to use color to represent the time to solve a problem, I did not have time tonight to implement the coloring. This will be a relatively simple addition but I focused tonight on drawing the rectangles instead, in the hope of getting some feedback on the overall feel of the representation. I have my own opinions, but I will reserve those so that I may get unbiased opinions from fellow tacticians.

Here is my first try, based exclusively on my previous ideas:


10p-1f-89p
99% @ 1431 ± 89 ; 1401 final


From a graphical standpoint, my 99 of 100 Chess Tactics Server effort tonight was pretty boring as it yields only one tiny splotch of red. However, since this is my best effort in months, I am shamelessly using it as the example session for this first attempt at a representation scheme.

Unfortunately, no time for a problem of the day...

Wednesday, October 24, 2007

Visual Display of a Problem Session

Today, I will brainstorm about how to graphically display a session of tactical problem solving.

Last night, I came up with the following format and I unabashedly admit I am quite fond of it:

6p-1f-45p-1f-30p-1f-16p

A beautiful (in my humble opinion) aspect of this representation is that any tactician will immediately understand what it represents: alternating spans of successes and failures. Indeed, a computer could easily be programmed to parse the above information and spew forth a variety of statistics about it. In fact, some would argue that the above actually represents a computer program because of its unambiguous grammar and semantics.
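
To make the "easily parsed" claim concrete, here is a minimal Python sketch of such a parser. The function names `parse_session` and `summarize`, and the particular statistics reported, are illustrative choices rather than any established tool.

```python
import re


def parse_session(encoded):
    """Parse a run-length string like '6p-1f-45p' into (count, outcome) runs."""
    runs = []
    for token in encoded.strip().split("-"):
        match = re.fullmatch(r"(\d+)([pf])", token)
        if match is None:
            raise ValueError(f"bad token: {token!r}")
        runs.append((int(match.group(1)), match.group(2)))
    return runs


def summarize(runs):
    """Compute a few simple statistics from parsed runs."""
    passes = sum(count for count, kind in runs if kind == "p")
    fails = sum(count for count, kind in runs if kind == "f")
    pass_runs = [count for count, kind in runs if kind == "p"]
    return {
        "passes": passes,
        "fails": fails,
        "accuracy": passes / (passes + fails),
        "longest_pass_run": max(pass_runs, default=0),
    }


stats = summarize(parse_session("6p-1f-45p-1f-30p-1f-16p"))
print(stats)
# {'passes': 97, 'fails': 3, 'accuracy': 0.97, 'longest_pass_run': 45}
```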

Incidentally, the above represents my efforts tonight at the CTS (97% @ 1428±92, 1403 final).
I also must dutifully acknowledge that the "p/f" notation for tactical problem solving is borrowed from dktransform's method of representing his own success statistics on the CTS Message Board.

Though I really like this representation for both practical and narcissistic reasons, it omits some critical information. Most importantly, it groups problems, so that each loses its individual contribution to the session. Moreover, for each problem, it fails to represent (1) how long the problem took, (2) what the rating of the problem was, and (3) how the tactician's performance on the problem affected the tactician's rating.

At first glance, these three components of a problem appear too high in dimensionality to be represented on a two dimensional medium like a computer screen or a sheet of paper. But, upon deeper consideration, the information in (1) and (2) is convoluted into the information in (3), so we can conveniently move back to a lower dimensional space without sacrificing too much information to practical considerations.

So here is my proposal:
  • A unit of time is 10 seconds, which is roughly the maximum time to solve a problem on the CTS and still be awarded points.
  • Problems that are fails are bright red bars of unit width (say 3 pixels).
  • Passes are darkish blue bars (we are color-blind friendly here).
  • The width of a bar is proportional (or perhaps logarithmically proportional) to the time the tactician took to solve the problem.
  • The contribution of a problem to a tactician's score is the height of the bar above (in the case of a pass) or below (in the case of a fail) a reference line.
Unfortunately, I have yet to implement this representation as I am still in the brainstorming phase. But, barring some compelling comments by fellow tacticians, I think I have settled and will probably work towards it in the very near future.
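
As a rough illustration of the proposal (not the eventual implementation), the bar geometry could be computed like this in Python. Everything here is an assumption made for the sketch: the `Bar` class, the `layout` function, the `(seconds, passed, rating_change)` input tuples, the color names, and the use of `log1p` as one possible reading of "logarithmically proportional".

```python
import math
from dataclasses import dataclass

UNIT_SECONDS = 10  # one unit of time, per the proposal
UNIT_PIXELS = 3    # assumed on-screen width of one unit


@dataclass
class Bar:
    x: float       # left edge in pixels
    width: float   # pixels, proportional to solving time
    height: float  # positive above the reference line, negative below
    color: str


def layout(problems, log_scale=False):
    """Turn (seconds, passed, rating_change) tuples into drawable bars."""
    bars = []
    x = 0.0
    for seconds, passed, rating_change in problems:
        units = seconds / UNIT_SECONDS
        if log_scale:
            units = math.log1p(units)  # assumed form of the log option
        width = units * UNIT_PIXELS
        # Passes rise above the reference line, fails drop below it
        height = abs(rating_change) if passed else -abs(rating_change)
        bars.append(Bar(x, width, height, "darkblue" if passed else "red"))
        x += width
    return bars


for bar in layout([(10, True, 2.0), (20, False, -3.0)]):
    print(bar)
```

Keeping the geometry separate from the drawing means the same list of bars could be handed to whatever graphics library ends up doing the rendering.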

Now, for the

Chess Tactics Server Problem of the Day

Black to Move

Here's the solution and why I like it (start selecting text following the colon): 1...Rxb5. Now, 2.Qxb5?? results in a famous mate in five: 2...Qd4+ 3.Kh1 (not 3.Kf1 because of Qf2++) 3...Nf2+ 4.Kg1 Nh3+ 5.Kh1 Qg1+! 6.Rxg1 Nf2++. Of course if 2.cxb5? then 2...Bd4+ forks Q and K.

Zone of the Vortex

A current training goal of mine is to do 10,000 problems on the CTS at 98% accuracy, maintaining a 1400 rating while I do it--or at the very least, finishing the 10,000 problems with a 1400 rating. This goal is not as easy as it sounds, and tonight my performance demonstrated why. In a nutshell, my session went like this:

7p-1f-14p-1f-3p-1f-16p-2f-3p-1f-9p-1f-53p-1f-6p-1f-2p-1f

So it's obvious that tonight's session wasn't a good one. Clearly, my work schedule these last couple of weeks is starting to catch up with me. Also, the fact that I spent a significant part of my sleeping hours last night stewing over the squatter in my parking space didn't help either. So I have been due for a disaster like tonight.

The 53 consecutive passes made tonight's session more-or-less bearable. That run of 53 passes, however, seems terribly interesting to me. With a goal of 98% per session, 53 passes should not be unusual. But tonight it clearly was, considering the average length of a successful run was about 13. Was it luck, or was I in the "Zone"? Well, the average difficulty (rating) of my problems tonight was about 1415 ± 92 and the stretch of 53 had an average rating of about 1399 ± 90. So it looks like that 53 represented my being in a Zone, considering the degree to which it is a statistical outlier in terms of length but not in terms of difficulty. Because calculating all of these numbers took some time from my musings, I have compiled a short to-do list for upcoming postings:

  • Figure out how to quantify the likelihood of "Zonedness"
  • If "Zonedness" exists, determine how to quantify its degree, or intensity
  • Figure out how to maximize time spent in the Zone.
In a future blog, I will wax philosophic about this issue.
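
One possible starting point for the first to-do item, offered as a sketch rather than a settled method: treat every problem as an independent pass with a fixed probability and ask how likely it is to see at least one pass run of a given length by chance alone. The function name `prob_run_at_least` is hypothetical, and the independence assumption is exactly what a real "Zone" would violate, so a small probability here is only a hint, not proof.

```python
def prob_run_at_least(tries, p_pass, length):
    """P(at least one run of `length` consecutive passes in `tries` problems).

    Assumes independent problems with constant pass probability `p_pass`.
    state[j] is the probability of having j trailing passes with no
    qualifying run seen so far.
    """
    state = [0.0] * length
    state[0] = 1.0
    hit = 0.0
    for _ in range(tries):
        new_state = [0.0] * length
        for j, q in enumerate(state):
            new_state[0] += q * (1 - p_pass)   # a fail resets the run
            if j + 1 == length:
                hit += q * p_pass              # the run reaches the target
            else:
                new_state[j + 1] += q * p_pass
        state = new_state
    return hit


# Tonight's session: 113 passes in 123 problems, longest pass run 53
print(prob_run_at_least(123, 113 / 123, 53))
```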

Tuesday, October 23, 2007

Path of the Vortex

I am LaskoVortex of the Chess Tactics Server, and this is my chess blog. CTS must be the original tactics server, and today, in an effort to get a quick blog entry, I'm going to rip some text I wrote right from the Message Board (with some minor edits). This was a response to waaek, a fellow Tactician, who wondered what history information the CTS keeps. So, I replied with some prose that goes a little like this:

I think CTS keeps a stat for number right and total problems. And that's all they need to calculate your accuracy. It'd be nice if we got the entire history. This site has served fewer than 20M problems. So, if each problem took up 4 bytes for the time stamp, 3 bytes for how long the problem took with 1/10th second precision (with a max time of about 19.4 days per problem and a special value if it was a fail, like 0), 3 bytes for the problem number (assuming they will never have more than 16.77 million problems to choose from), 8 bytes for the tactician ID (assuming no real limit on the number of tacticians who can sign up), 8 bytes total for both the problem rating and the tactician rating when they did the problem, and another 2 bytes for how their rating changed, then the total bytes per problem would be 4+3+3+8+8+2=28 bytes. So 28*20M = 560M bytes. This means they could have stored every problem every tactician has done on this site on a $20.00 USB memory stick with plenty of space left over.
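
The byte count above can be checked with Python's `struct` module. This is a sketch of one way such a record might be laid out, not a description of how CTS actually stores anything: the field order, the split of the 8 rating bytes into two 4-byte floats, the signed 16-bit rating change, and the `pack_attempt` helper are all assumptions made for illustration.

```python
import struct

# "<" = little-endian with no padding between fields
#  I  = 4-byte time stamp
#  3s = 3-byte solving time in tenths of a second (0 could mean "fail")
#  3s = 3-byte problem number
#  Q  = 8-byte tactician ID
#  ff = problem rating and tactician rating, 4 bytes each (assumed split)
#  h  = 2-byte signed rating change (assumed encoding)
RECORD_FORMAT = "<I3s3sQffh"


def pack_attempt(timestamp, tenths, problem, tactician,
                 problem_rating, tactician_rating, rating_change):
    """Pack one solved problem into a fixed-size binary record."""
    return struct.pack(
        RECORD_FORMAT,
        timestamp,
        tenths.to_bytes(3, "little"),
        problem.to_bytes(3, "little"),
        tactician,
        problem_rating,
        tactician_rating,
        rating_change,
    )


record_size = struct.calcsize(RECORD_FORMAT)
print(record_size)               # 28
print(record_size * 20_000_000)  # 560000000
```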

You might think that I must have already considered the issue of saving every problem for every user of CTS and what the space requirements for that would be--well, you would be correct. More on that in the future--it will become important.

For now, I leave you with the Chess Vortex motto (stolen from Jason D. Enochs and paraphrased a tiny bit):

If you aren't sick of working chess problems, then you haven't been applying yourself.