Tuesday, November 25, 2014

Running by Numbers

I participated in my first half marathon, the Airtel Delhi Half-Marathon, on 23rd November 2014. Over 32,000 people took part: around 12,000 in the half marathon and another 20,000 in the other races.
Well aware of being all of 44 years old, I was thrilled not only to complete the half marathon but to do it in under 2 hours, the best I could have hoped for. This set me thinking about age versus half-marathon performance: was I justified in patting myself on the back?

I leeched all the summary data that I could from the organiser’s website, and this is what I found: 
The visualisation above shows the half-marathon performance of 7 age categories: 18-25, 25-30, 30-35, 35-40, 40-45, 45-50 and 50-60. For each age category, it shows the percentage of people who completed the race in each of four time slots: less than 1.5 hours, between 1.5 and 2 hours, between 2 and 2.5 hours, and between 2.5 and 3 hours. The darker the colour (red), the better the performance.
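The four slot shares can be collapsed into a single figure per age group with a share-weighted average. The post does not say exactly how its weighted average was calculated, so the Python sketch below is only one plausible approach: it assumes every finisher sits at a representative time for their slot (the midpoint, plus an assumed 1.25 hours for the open-ended "less than 1.5 hours" slot). The slot labels, the function name `weighted_average_time` and the example shares are all hypothetical, not the race data.

```python
# Representative finishing time (hours) ASSUMED for each slot.
# The "<1.5h" slot has no lower bound, so 1.25 h is an assumption.
SLOT_TIMES = {
    "<1.5h": 1.25,
    "1.5-2h": 1.75,
    "2-2.5h": 2.25,
    "2.5-3h": 2.75,
}


def weighted_average_time(shares: dict[str, float]) -> float:
    """Return the share-weighted average finishing time in hours.

    `shares` maps each slot to the % of an age group finishing in it.
    The shares are normalised, so they need not sum to exactly 100.
    """
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("shares must contain at least one positive value")
    return sum(SLOT_TIMES[slot] * pct for slot, pct in shares.items()) / total


# Hypothetical shares for one age group (NOT the actual race data).
example_group = {"<1.5h": 10, "1.5-2h": 40, "2-2.5h": 35, "2.5-3h": 15}
print(round(weighted_average_time(example_group), 3))
```

The result depends on the assumed slot times, especially for the open-ended fastest slot, so comparisons between age groups are more meaningful than the absolute values.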
As the visualisation shows, and true to my expectations, the best-performing age group was 18-25. After that, nothing made sense: the data shows that the older you are, the better your half-marathon performance. This is how the weighted average finishing time for the different age groups looks:
Why would this be happening?
Not willing to accept that endurance performance can improve with age, I hypothesized that this is an example of self-selection bias: the half-marathon participants in the older age groups were probably fitter than average, had likely run half marathons before, and were confident of their ability to compete in this event.
To test this hypothesis, the simplest approach I could think of was to compare the percentage representation of each age group in the half marathon with its representation in the general population. I figured that if there was some kind of self-selection, the older age groups would be under-represented in the race.
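A minimal Python sketch of this comparison follows. It assumes counts (or percentages) per age group are available for both the race and the population. Both are normalised over the same age groups, since the race has no under-18 category and the population shares would otherwise not be comparable. The function names and all numbers are hypothetical placeholders, not the race or census figures.

```python
def normalise(counts: dict[str, float]) -> dict[str, float]:
    """Convert counts (or percentages) into shares that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {group: value / total for group, value in counts.items()}


def representation_ratios(
    race_counts: dict[str, float], population_counts: dict[str, float]
) -> dict[str, float]:
    """Share of each age group among participants divided by its share
    of the population, using only the age groups present in the race.
    Below 1 means under-represented; above 1 means over-represented."""
    race = normalise(race_counts)
    population = normalise({g: population_counts[g] for g in race_counts})
    ratios = {}
    for group, race_share in race.items():
        if population[group] == 0:
            raise ValueError(f"no population recorded for {group}")
        ratios[group] = race_share / population[group]
    return ratios


# Hypothetical counts for illustration only (not the real data).
participants = {"18-25": 2000, "25-30": 3000, "30-35": 2500, "35-40": 2500}
residents = {"18-25": 400_000, "25-30": 200_000, "30-35": 200_000, "35-40": 200_000}
for group, ratio in representation_ratios(participants, residents).items():
    print(f"{group}: {ratio:.2f}")
```

A ratio well below 1 for the older groups would have supported the self-selection idea. Deciding whether a given gap is "significant" would need a proper statistical test, which this sketch does not attempt.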
Here is the visual display of this data:
India is a young nation, and the share of the population in each age group only decreases with age. However, the difference between the age profile of half-marathon participants and that of Delhi's population does not appear large enough to explain the improvement in performance with age. Except for the youngest age group (18-25), which is under-represented in the race, the age groups are largely in line with the population at large.
Perhaps the self-selection would be visible in some other attribute, such as income or education.

However, the census data for Delhi did show something interesting:
The saw-tooth distribution of population by age is obviously wrong, unless somebody hypothesizes a link with the Five-Year Plans. A simpler explanation is available: each peak in the graphic above falls on a round number (30, 35, 40, 45 and so on). Perhaps there is a lesson here for census surveyors: don't ask people their age, ask for their year of birth.
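Demographers call this pattern age heaping and often summarise it with Whipple's index. That index is not part of the analysis above. The Python sketch below only illustrates how such peaks could be quantified, assuming single-year-of-age counts are available (the census figures are not reproduced here). The standard definition is the count at ages 25, 30, ..., 60 divided by one fifth of the total count at ages 23-62, times 100.

```python
def whipples_index(pop_by_age: dict[int, int]) -> float:
    """Whipple's index of age heaping on terminal digits 0 and 5.

    100 means no preference for ages ending in 0 or 5; 500 means
    every reported age in the 23-62 range ends in 0 or 5.
    """
    # Total count over the reference range, ages 23 to 62 inclusive.
    total = sum(pop_by_age.get(age, 0) for age in range(23, 63))
    if total == 0:
        raise ValueError("no population recorded at ages 23-62")
    # Count at the "round" ages 25, 30, ..., 60.
    round_ages = sum(pop_by_age.get(age, 0) for age in range(25, 61, 5))
    return 100 * round_ages / (total / 5)


# Hypothetical counts (NOT census data): a flat age profile, and one in
# which ages ending in 0 or 5 are reported twice as often as others.
flat = {age: 1000 for age in range(23, 63)}
heaped = {age: 2000 if age % 5 == 0 else 1000 for age in range(23, 63)}
print(whipples_index(flat), whipples_index(heaped))
```

A perfectly flat profile scores exactly 100, so the further a real census scores above 100, the stronger the pull towards round ages.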

(I checked a few sites on age versus marathon performance. It appears that age has no significant impact on performance until about 50. Here is one interesting article: http://www.runnersworld.com/masters-training/age-matters
However, I doubt whether any amount of trawling the web would turn up a paper supporting the idea that marathon performance improves with age all the way up to 60.)


Sunday, November 2, 2014

Will Analytics cause the Engine of Capitalism to sputter and drive the Black Swan to extinction?

This is the golden age of analytics, when interesting data-driven ideas are being put into practice at a fast clip. The main concern raised so far relates to privacy: Big Data shows signs of enabling an Orwellian Big Brother. But there is one more area of concern to which I want to draw attention: the possibility that Analytics might actually slow down the pace of innovation. If that is true, society will pay a heavy price.
We are the products of evolution, with strange cognitive quirks. Cognitive science is only just beginning to grapple with the fundamental question: what makes us tick? Daniel Kahneman, in his seminal book “Thinking, Fast and Slow”, says that one of our cognitive flaws, Delusional Optimism, might well be the engine of Capitalism.
To quote Kahneman: “Most of us view the world as more benign than it really is, our own attributes as more favorable than they truly are, and the goals we adopt as more achievable than they are likely to be.” He goes on to say: “When action is needed, optimism, even of the mildly delusional variety, may be a good thing. The evidence suggests that an optimistic bias plays a role – sometimes the dominant role – whenever individuals or institutions voluntarily take on significant risks.” And finally: “The optimistic risk taking of entrepreneurs surely contributes to the economic dynamism of a capitalistic society, even if most risk takers end up disappointed.”
Instances of such delusion are well known and documented. Here are a few examples (from the Wikipedia entry on Illusory Superiority):
  • In a survey of faculty at the University of Nebraska, 68% rated themselves in the top 25% for teaching ability.
  • In a similar survey, 87% of MBA students at Stanford University rated their academic performance as above the median.
  • In a survey in which participants rated themselves on eight different dimensions of driving skill, almost 80% rated themselves as above the average driver.
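Part of what makes the Stanford figure striking is simple arithmetic: in any group, no more than half of the members can be strictly above the group's median, so 87% of self-ratings "above the median" cannot all be right. The Python sketch below checks that property on randomly generated, made-up scores. The function name is illustrative and the scores have nothing to do with the surveys cited.

```python
import random
import statistics


def share_strictly_above_median(values: list[float]) -> float:
    """Fraction of values that are strictly greater than the median."""
    median = statistics.median(values)
    return sum(v > median for v in values) / len(values)


# Made-up scores of random size and spread: whatever the distribution,
# the share strictly above the median never exceeds one half.
random.seed(0)
for _ in range(1000):
    scores = [random.gauss(70, 10) for _ in range(random.randint(1, 200))]
    assert share_strictly_above_median(scores) <= 0.5
print("no sample had more than half its members above the median")
```

The same reasoning caps the "top 25%" share in the Nebraska survey at one quarter of the faculty, not 68%.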

Kahneman gives a similar example of entrepreneurial delusion. Though the chances that a small business will survive for five years in the US are about 35%, fully 81% of the entrepreneurs put their personal odds of success at 7 out of 10 or higher, and 33% said their chance of failing was zero.
Anything that reduces risk taking in society will slow the rate of innovation and, consequently, growth. One way this can happen is when enough people stop being delusional and see the world as it really is.
What has Analytics got to do with this? There is a limit to how far we can delude ourselves. Given enough data, most of us would be forced to accept the truth of our limitations and the true odds we face. For instance, if driving ability becomes objectively measured and the data widely available (something already beginning to happen with on-board monitors and insurance premiums linked to driving performance), it will be difficult to overestimate our driving skill. This is what Analytics is likely to do: make more data readily available in every sphere of our lives, along with personalised predictions of our odds of success, making self-delusional optimism increasingly difficult to sustain.
And when that happens, our appetite for risk will go down, along with the rate of innovation and growth; the engine of capitalism may start to sputter.
This problem is compounded when you factor in the key insight of Nassim Taleb's book The Black Swan: “A small number of Black Swans (or extreme rare events) explains almost everything in our world, from the success of ideas and religions, to the dynamics of historical events, to elements of our own personal lives.”

High-impact innovations are Black Swan events: they are not the result of systematic planning but primarily a matter of chance, fuelled by risk taking. With fewer people taking risks and following their deluded dreams, even the Black Swan may go the way of the Dodo.


Image (black swan: http://commons.wikimedia.org/wiki/File:Cygnus_atratus_Running.jpg)