Sunday, December 21, 2014

The importance of Visual Analytics - An Auditor's perspective



A picture is worth a thousand words. We have all heard it so many times that we no longer credit it with much wisdom. A slight modification by John Tukey, the noted mathematician and statistician well known for his work on exploratory data analysis, gets it just right: “A picture may be worth a thousand words, but it may take a hundred words to do it.” This opens the door to visual analytics: visualise the data so that it conveys the equivalent of a thousand words, and take a hundred words to explain the basic idea, leading to comprehension and, hopefully, an “Aha” moment for the reader.
In their book “Designing Data Visualizations”, Noah Iliinsky and Julie Steele give a very simple description of data visualisation: moving information from point A to point B. In exploratory visualisation, the information moves from the dataset (point A) to the designer’s mind (point B); in explanatory visualisation, it moves from the designer’s mind (point A) to the reader’s mind (point B).
No article or book on visual analytics is complete without two classic visualisations, or infographics, which exemplify exploratory and explanatory visualisation respectively.
The first is physician John Snow’s map of deaths during the 1854 cholera epidemic in London, shown below.


The dots represent deaths due to cholera, and the water pumps are marked with an “X”. John Snow observed the clustering of deaths around the water pump on Broad Street and, using it as evidence, convinced the local authorities to disable the pump by removing its handle. As the story goes, the epidemic, which had already taken more than five hundred lives, ended within two weeks. This is an example of exploratory visualisation: the link between cholera deaths and the water pump (a contaminated water source) was discovered by representing the data visually. What makes this even more remarkable is that the deduction predates germ theory; the mechanism by which cholera spread was not known at the time.
The second classic example, this time of explanatory visualisation or storytelling, is Charles Joseph Minard's famous 1869 graphic depicting Napoleon’s invasion of Russia in 1812.



Edward Tufte, the visualisation guru, calls it probably the best statistical graphic ever drawn. The map portrays the losses suffered by Napoleon's army in the Russian campaign of 1812. Beginning at the Polish-Russian border, the thick brown band shows the size of the army at each position. The path of Napoleon's retreat from Moscow in the bitterly cold winter is depicted by the dark lower band, which is tied to temperature and time scales. The numbers on the chart show 422,000 soldiers crossing the Neman with Napoleon and only 10,000 returning from the disastrous campaign.
Both exploratory and explanatory visualisations matter to the auditor, because the auditor’s job does not end with obtaining the correct insight; it ends only when that insight has been effectively communicated to management so that it can take the necessary action, whether for improved governance or for accountability. For the latter, explanatory visualisation is critical. In his seminal book The Grammar of Graphics, Leland Wilkinson stated: “I believe the largest business market for graphics will continue to be analysis and reporting, despite the enthusiastic predictions for data mining, visualisation, animation and virtual reality. The reason, I think, is simple. People in business and science have more trouble communicating than discovering”. This is likely to be just as true for auditors, who can use visual analytics tools to communicate their audit findings better.
The data analytics literature distinguishes between two modes of analysis: exploratory and confirmatory. Exploratory data analysis should be the first stage of data analysis; it is bottom-up and inductive. Visual exploratory techniques can help auditors see patterns, trends and outliers that would otherwise stay hidden, and reveal relationships between variables that could become the foundation for a confirmatory model.
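The box plot, one of Tukey’s own exploratory devices, uses a simple rule for its whiskers: points lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile are drawn individually as potential outliers. A minimal sketch applying that rule directly to a short list of payment amounts follows. The amounts are hypothetical, and different tools compute quartiles slightly differently, so the fences may vary a little from one package to another.

```python
import statistics

def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, k=1.5):
    """Values outside the fences: the points a box plot would draw individually."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]

# Hypothetical vendor payments (Rs); one entry is clearly out of pattern.
payments = [1200, 1350, 1280, 1420, 1310, 1390, 1250, 1330, 9800, 1300, 1275]
print(tukey_fences(payments))   # (1102.5, 1562.5)
print(flag_outliers(payments))  # [9800]
```

A chart makes the same point at a glance; the rule is simply what the eye is doing when it spots the lone point far above the rest.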
In a recent paper on data analytics for internal audit, KPMG observed that internal audit departments have traditionally focused on transaction-based analytics, applying selected rules-based filters to populations in key risk areas such as revenue or procurement in order to identify exceptions. These transactional, rules-based, or “micro-level”, analytics can provide significant value for known conditions, where the frequency and magnitude of those conditions need to be assessed. The computer-assisted audit tools and techniques (CAATs) popular among public auditors, such as IDEA and ACL, are also oriented towards micro-level analytics, and are thus suited mainly to evaluating known conditions. They are poorly designed to meet the needs of exploratory data analysis, or of macro-level analytics, which deliver value by identifying broader patterns and trends of risk.
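The difference between micro-level and macro-level analytics can be sketched in a few lines. Under the assumptions below (a hypothetical procurement ledger and a hypothetical Rs 100,000 approval limit, neither taken from the KPMG paper), the micro-level functions encode two known conditions, duplicate invoice numbers and payments above the limit, while the macro-level function merely aggregates spend by month so that a trend can be charted and inspected.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical procurement ledger: (vendor, invoice number, posting date, amount in Rs)
TRANSACTIONS = [
    ("V01", "INV-101", date(2014, 1, 15), 45_000),
    ("V01", "INV-101", date(2014, 1, 20), 45_000),  # same invoice posted twice
    ("V02", "INV-550", date(2014, 2, 3), 120_000),
    ("V03", "INV-017", date(2014, 2, 28), 30_000),
    ("V02", "INV-561", date(2014, 3, 12), 95_000),
    ("V03", "INV-020", date(2014, 3, 30), 99_500),  # just under the limit
]
APPROVAL_LIMIT = 100_000  # hypothetical approval threshold

def micro_exceptions(txns):
    """Rules-based ("micro-level") tests: each rule checks one known condition."""
    seen = Counter((vendor, invoice) for vendor, invoice, _, _ in txns)
    duplicates = [t for t in txns if seen[(t[0], t[1])] > 1]
    over_limit = [t for t in txns if t[3] > APPROVAL_LIMIT]
    return duplicates, over_limit

def macro_monthly_totals(txns):
    """Macro-level view: total spend per month, the kind of series one would chart."""
    totals = defaultdict(int)
    for _, _, posted, amount in txns:
        totals[(posted.year, posted.month)] += amount
    return dict(sorted(totals.items()))

dups, over = micro_exceptions(TRANSACTIONS)
print("Possible duplicates:", dups)
print("Above approval limit:", over)
for (year, month), total in macro_monthly_totals(TRANSACTIONS).items():
    print(f"{year}-{month:02d}: Rs {total:,}")
```

Note that the Rs 99,500 payment passes every rule above: a rule only tests what it was written to test, which is why a broader view of the data is a useful complement.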
John Tukey once observed that “… the picture-examining eye is the best finder we have of the wholly unanticipated.” Using visual analytics tools such as Tableau and QlikView, the auditor can obtain insights that are not possible with traditional CAATs, leading to better audit quality.
The limitations of the traditional CAATs used by auditors (IDEA, ACL, MS Excel and MS Access) in handling large data volumes have become apparent over the last few years. MS Excel is limited to 1,048,576 rows per worksheet (though PowerPivot in Excel 2010/2013 can circumvent this limitation), an MS Access database cannot exceed 2 GB, and IDEA, though stated to have no hard limit on data volume, can become extremely sluggish with more than 10 million rows of data, with queries taking several hours to execute. In contrast, in-memory analytics tools such as QlikView and Tableau have been seen to return the results of the same analytical tests in seconds, roughly a thousand-fold improvement in performance. Apart from speed, in-memory analytics tools also store data more efficiently; their high compression rates make it possible to analyse raw data several times larger than the physical RAM on the auditor’s laptop or desktop. Most financial audits involve structured databases with datasets in the gigabyte range; a single dataset exceeding 1 terabyte is extremely rare. As a result, even an auditor’s laptop can be used to analyse most of the large datasets obtained in the course of an audit.
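Why can in-memory tools hold more raw data than the RAM would suggest? One common technique in columnar in-memory engines is dictionary encoding: each distinct value is stored once, and every row stores only a small integer code. The sketch below illustrates that general idea on a hypothetical low-cardinality column; it makes no claim about the exact schemes QlikView or Tableau use, and the byte counts cover only the text payload, not Python’s own object overhead.

```python
from array import array

def dictionary_encode(column):
    """Store each distinct value once; store each row as a small integer code."""
    codes_by_value = {}
    codes = array("H")  # unsigned 16-bit codes; OverflowError past 65,535 distinct values
    for value in column:
        # setdefault assigns the next code only when the value is seen for the first time
        codes.append(codes_by_value.setdefault(value, len(codes_by_value)))
    return list(codes_by_value), codes  # dicts keep insertion order, so index == code

def dictionary_decode(values, codes):
    """Rebuild the original column from the dictionary and the codes."""
    return [values[c] for c in codes]

# Hypothetical column: the state recorded against each of a million transactions.
column = ["Maharashtra", "Karnataka", "Delhi", "Tamil Nadu"] * 250_000
values, codes = dictionary_encode(column)

raw_bytes = sum(len(v.encode("utf-8")) for v in column)
encoded_bytes = len(codes) * codes.itemsize + sum(len(v.encode("utf-8")) for v in values)
print(f"{len(column):,} rows, {len(values)} distinct values")
print(f"raw text: {raw_bytes:,} bytes, dictionary-encoded: {encoded_bytes:,} bytes")
```

The fewer distinct values a column has, the better this works, and audit data (vendor codes, cost centres, states, GL accounts) is often exactly that kind of column.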
The true power of the newer generation of analytics tools, which is expected to improve audit quality and effectiveness, lies in their ability both to handle large data volumes and to derive audit insights through visual analytics.
If any more convincing is needed for the use of visual analytics, one need only look at Anscombe’s quartet, constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analysing it and the effect of outliers on statistical properties. The four datasets, shown below, have nearly identical simple statistics: the same mean and variance of x, practically the same mean and variance of y, and nearly identical correlation coefficients and linear regression lines.

However, the datasets are very different, as their graphs below show.
It’s time for auditors to expand their use of data analytics to deliver a higher level of assurance in their audits. The profession needs this desperately, as the article “Accounting scandals: The dozy watchdogs” in the 13 December 2014 print edition of The Economist highlights.

Friday, December 12, 2014

Naive Ubernomics



If we go by the adage that there is no such thing as bad publicity, then Uber got media exposure in India in December 2014 befitting its eye-popping 40 billion dollar valuation.

It began with the negative: the alleged rape of a passenger by an Uber cab driver. There was media outrage and a public outcry over Uber’s lax background checks before taking on a driver-partner, leading to a ban on Uber operations in Delhi and a few other Indian cities.
Now I see a trickle of positive stories about how Uber improves efficiency, lowers costs and so on, which will lead to the inevitable deluge, the reversal of the bans, and the establishment of Uber as the chief transport disruptor in India.
It is when I read about the positive side of Uber, from its benevolence in compensating taxi drivers to the high earnings lost by Uber cab drivers, as in this story in the Economic Times, that I start wondering: how Über-naive are we?
Just as not everyone can be above average, not everyone can get a bargain at the same time. Uber has made rapid inroads into cities across the world partly by skirting the regulations governing the taxi business. It has reduced or eliminated whatever entry barriers existed for a person ferrying people for money within a city. As entry barriers fall, the supply of taxis will increase until a taxi driver’s earnings are no more than the minimum (subsistence) wage that he or she is willing to accept. Uber brings no technological innovation to the mode of transport: the taxi driver still faces the same costs of financing, fit-out, insurance, maintenance and fuel. At most, the driver no longer has to comply with certain local government regulations that either control the supply of taxis or affect service quality, and no longer has to deal with middlemen such as the various taxi stand operators in India.
In his book Zero to One, Peter Thiel, a co-founder of PayPal and an early investor in Facebook, advocates finding and building monopolies as the only source of profit, and relegates competition to the status of a flawed ideology that pervades society and distorts thinking.
In Thiel’s words: "Competition means no profits for anybody, no meaningful differentiation, and a struggle for survival". This is backed by Economics 101: under perfect competition, a firm cannot make any more money than is necessary to cover its economic costs. Profits are therefore transient, and sustained profit calls for some form of monopoly power, as Thiel advocates.
This is what will happen to the Uber taxi driver: fares will fall to the point where driving a taxi can only be a part-time job, not enough to provide a decent quality of life. By connecting the taxi driver to the customer, Uber becomes the only middleman in the game, raking in a cool 20% for providing this “service”.
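The free-entry argument can be made concrete with a toy model. Assume, purely hypothetically, that a city’s riders spend a fixed amount on fares each day, the platform keeps a 20% commission, every driver bears the same daily running cost, and drivers keep joining as long as one more driver would still take home at least the reservation (minimum acceptable) wage. All numbers are invented for illustration; the sketch shows only the direction of the effect, not actual Uber economics.

```python
def free_entry_equilibrium(daily_fares, commission, daily_cost, reservation_wage):
    """Add drivers while a newcomer would still net at least the reservation wage.

    Returns (number of drivers, net daily income per driver, platform's daily take).
    """
    def net_income(n):
        # n drivers split the fare pool left after commission, then pay their own costs
        return daily_fares * (1 - commission) / n - daily_cost

    drivers = 1
    while net_income(drivers + 1) >= reservation_wage:
        drivers += 1
    return drivers, net_income(drivers), daily_fares * commission

# Hypothetical city: Rs 10 lakh of fares a day, 20% commission,
# Rs 800 daily running cost per car, Rs 600 reservation wage.
drivers, income, platform_take = free_entry_equilibrium(1_000_000, 0.20, 800, 600)
print(drivers, round(income, 2), platform_take)
```

With these inputs the model settles at 571 drivers, each netting just over Rs 600 a day. Doubling the fare pool only draws in more drivers until income falls back to about Rs 600, while the platform’s 20% share grows in step with the pool.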
The problem does not stop with a mode of disruption in which all risks are farmed out to the taxi driver and the consumer. There is a bigger concern: the informational advantage that Uber may find difficult to resist exploiting. Even now, while Uber connects the buyers and sellers of taxi services, the prices are still set by Uber. A truly free market would allow a large number of buyers and sellers to interact, leading to price discovery through the invisible hand. In the world of Uber, the invisible hand is Uber’s pricing algorithm, not the buyer or the seller. Under its surge pricing model, Uber raises prices to match its own assessment of supply and demand. But with Uber’s servers and algorithms holding complete information on both the buyers and the sellers of taxi services, it could easily manipulate these prices to its advantage. It is a well-known economic phenomenon that farmers can sometimes earn a higher income in a bad season: when demand is inelastic, as it is for food, a poor harvest pushes prices up by proportionally more than it cuts output, so total revenue rises.
With Uber, be prepared for highly volatile weather, with frequent surges.
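The bad-season paradox comes down to the price elasticity of demand. A minimal sketch, assuming a hypothetical constant-elasticity demand curve Q = scale × P^(−elasticity): when demand is inelastic (elasticity below 1), cutting the quantity supplied raises the market price by proportionally more, so total revenue goes up; when demand is elastic, the same cut lowers revenue. This is a textbook illustration, not a model of how Uber actually sets prices.

```python
def market_price(quantity, scale=1000.0, elasticity=0.5):
    """Price that clears the market on the demand curve Q = scale * P**(-elasticity)."""
    return (scale / quantity) ** (1 / elasticity)

def revenue(quantity, **demand):
    """Total revenue P * Q when `quantity` is supplied."""
    return market_price(quantity, **demand) * quantity

# A 20% fall in supply (100 -> 80 units) under inelastic and elastic demand.
print(revenue(100), revenue(80))                                  # inelastic: revenue rises
print(revenue(100, elasticity=2.0), revenue(80, elasticity=2.0))  # elastic: revenue falls
```

With the inelastic curve, revenue rises from 10,000 to 12,500 when supply falls by a fifth; a seller who can restrict supply, or a price-setter who can see both sides of the market, stands to gain from the same arithmetic.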

Saturday, December 6, 2014

Pack size regulation in India - how can the consumer benefit?

Consumers in India are spoiled for choice now. You can walk down the aisle of a grocery store and devote a good portion of your time to deciding which of the 200+ pack size and brand combinations of toothpaste, toilet soap or shampoo you would like to buy.
My concern with this plethora of choice does not stem from the Paradox of Choice, the thesis of American psychologist Barry Schwartz’s book of the same name, in which he argues that eliminating consumer choices can greatly reduce shoppers’ anxiety. He has a point there, but my concern is with the cognitive load placed on deciding which pack is the best bargain.
Of course, there is the media onslaught, from ad jingles on TV and billboards on the road to personalised, targeted ads on websites, which “help” me make that choice, at least at the brand level. The cognitive load lies in deciding which pack size gives the best value, even after I narrow the choice down to one or two brands. There is one heuristic that we implicitly trust: the larger the pack, the lower the unit cost (cost per litre or per kg). It is when I refuse to go with the heuristic and actually do the computation mentally that my head starts spinning: there are TOO MANY pack sizes.
Take, for instance, the ubiquitous shampoo. Flipkart, one of the e-retailers for all things under the sun, stocks over 50 brands of shampoo in 1,500+ different brand and pack size combinations. A brick-and-mortar retailer has fewer options on display, but still in excess of 50.
How does one work out which is cheaper? You need a calculator, or you need to be a savant.
And after you have spent about 30 seconds trying to juggle those numbers around, you feel rather sorry for yourself and say: “It’s not worth it. Go with the big pack heuristic.”
Is there a case for protecting the consumer’s interest by regulating pack sizes?
In India, we do have such legislation. The Legal Metrology (Packaged Commodities) Rules, 2011, which derive their authority from the Legal Metrology Act, 2009, do impose some restrictions on the pack sizes of pre-packaged commodities sold directly to consumers.
It identifies 19 commodities that may be packed only in specified quantities by weight, measure or number. For example, biscuits may be packed only in the following sizes: 25 g, 50 g, 75 g, 100 g, 150 g, 200 g, 250 g, 300 g, and thereafter in multiples of 100 g up to 1 kg.
This kind of stipulation doesn't much help the computationally challenged, for there are still too many permitted pack sizes to facilitate easy comparison.
For instance, which pack of biscuits is the best buy:
150 g for Rs 10, or
250 g for Rs 15, or
400 g for Rs 25?
Was that easy? I doubt it.

The only perceptible benefit of legislating pack sizes in an economy like India, with inflation of 5% or more, is that manufacturers and retailers cannot shrink pack sizes to offset inflation while keeping the price constant.
However, since Indian law permits even these 19 commodities to be packed in sizes other than those prescribed, simply by declaring on the pack that it is “Not a standard pack size”, not much purpose is served.
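The protection the size rule offers against inflation can be put in numbers. To keep the shelf price unchanged while absorbing inflation i, a maker would have to shrink a pack of size S to S / (1 + i), which raises the price per gram by exactly i. The sketch applies this to a 100 g biscuit pack at 5% inflation and checks the result against the biscuit sizes quoted above; the list is transcribed from this post rather than from the Rules themselves, so treat it as illustrative.

```python
# Biscuit pack sizes (grams) as quoted in this post:
# 25 g to 300 g, then multiples of 100 g up to 1 kg.
BISCUIT_SIZES_G = [25, 50, 75, 100, 150, 200, 250, 300] + list(range(400, 1001, 100))

def shrunk_size(size_g, inflation):
    """Pack size that keeps the shelf price unchanged while absorbing `inflation`."""
    return size_g / (1 + inflation)

def is_standard(size_g):
    """True if the size is one of the listed standard biscuit pack sizes."""
    return size_g in BISCUIT_SIZES_G

size = shrunk_size(100, 0.05)
print(f"{size:.1f} g, standard pack: {is_standard(round(size))}")  # 95.2 g, standard pack: False
```

A 95 g pack is not on the list, so the rule forces the price rise into the open; but, as noted above, a “Not a standard pack size” declaration is enough to sidestep it.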
Is there a simpler way of helping the consumer make informed choice?
It would have been far simpler to make it mandatory for retailers to display the unit price of each product, so that consumers can make an informed choice, and to leave pack sizes unregulated. This is what the European Union did in 1998 through Directive 98/6/EC on consumer protection, which made it compulsory to indicate both the selling price and the unit price of all products offered by traders to consumers, in order to improve consumer information and facilitate price comparisons.
I have observed this personally in retail outlets in Vienna. It makes bargain hunting very easy: every pack size of a product shows both the price of the pack and the equivalent price for a standard unit, e.g. euros per kg or euros per litre.
Here is the data for the three biscuit pack sizes mentioned earlier:
Pack size    Price    Unit price (Rs per kg)
150 g        ₹ 10     ₹ 67
250 g        ₹ 15     ₹ 60
400 g        ₹ 25     ₹ 63
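The unit-price column is simple arithmetic: the pack price divided by the pack size in kilograms. A minimal sketch that reproduces it and picks the best buy; the table rounds to whole rupees, while the sketch shows two decimals (Rs 66.67, Rs 60.00 and Rs 62.50 per kg). The `Pack` class and `best_buy` function are illustrative names, not part of any standard.

```python
from dataclasses import dataclass

@dataclass
class Pack:
    size_g: float    # net quantity in grams
    price_rs: float  # shelf price in rupees

    @property
    def unit_price_per_kg(self) -> float:
        # Rs per kg = price / (grams / 1000)
        return self.price_rs * 1000 / self.size_g

def best_buy(packs):
    """The pack with the lowest unit price gives the most biscuit per rupee."""
    return min(packs, key=lambda p: p.unit_price_per_kg)

packs = [Pack(150, 10), Pack(250, 15), Pack(400, 25)]
for p in packs:
    print(f"{p.size_g:>5.0f} g  Rs {p.price_rs:>3.0f}  Rs {p.unit_price_per_kg:6.2f} per kg")
print(f"Best buy: {best_buy(packs).size_g:.0f} g pack")
```

The middle pack wins, which the big-pack heuristic would have missed.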

As for standardising pack sizes, a 2002 EU working paper examined whether there was an “overriding need” of a public nature for legislating mandatory ranges of sizes Union-wide. It concluded that there was no such need, because the existing unit price labelling requirements (price per kg or litre, under Directive 98/6/EC) already enable consumers to compare prices easily, and that mandatory sizes may actually impede innovation and hamper competitiveness.
Consequently, under Directive 2007/45/EC, since April 2009 mandatory EU sizes apply to wine and spirits, and national mandatory sizes have been abolished for all products.
So, while Europe has moved away from standardising pack sizes since 2009, India has enacted legislation to do just the opposite.
In the US, displaying the unit price is not a uniform requirement: currently, nineteen states and two territories have unit pricing laws or regulations in force.
Can the e-retailers volunteer to fill the informational gap?
While physical retail outlets have to incur some cost to display unit price information against every pack size, e-retailers can do it practically for free.
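To see why the cost for an e-retailer is close to zero: if each catalogue entry already records its net quantity and unit, as pack labels must declare, unit prices for the whole catalogue follow from one small function. A minimal sketch; the catalogue entries and the `CatalogueItem` structure are hypothetical, not any retailer’s actual data model.

```python
from dataclasses import dataclass

# For each declared unit: how many of it make one display unit, and which display unit.
UNITS_PER_BASE = {"g": (1000, "kg"), "kg": (1, "kg"), "ml": (1000, "l"), "l": (1, "l")}

@dataclass
class CatalogueItem:
    name: str
    quantity: float  # net quantity as declared on the pack
    unit: str        # one of the keys of UNITS_PER_BASE
    price: float     # selling price in Rs

def unit_price_label(item):
    """Label such as 'Rs 833.33 per l' to show next to the pack price."""
    per_base, base_unit = UNITS_PER_BASE[item.unit]
    unit_price = item.price * per_base / item.quantity
    return f"Rs {unit_price:.2f} per {base_unit}"

# Hypothetical listings of one shampoo in three pack sizes.
catalogue = [
    CatalogueItem("Shampoo X 180 ml", 180, "ml", 150),
    CatalogueItem("Shampoo X 340 ml", 340, "ml", 265),
    CatalogueItem("Shampoo X 1 l", 1, "l", 699),
]
for item in catalogue:
    print(f"{item.name}: Rs {item.price:.2f} ({unit_price_label(item)})")
```

Run once over the catalogue whenever prices change, this adds a unit price to every listing without any extra shelf labels.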
So why don’t I see unit price information on Indian e-retailers such as Flipkart, Snapdeal or Big Basket? Is it simply a lack of awareness of customer needs, or is there something more to it?
An interesting pointer on this issue is a June 2014 Bloomberg news article about an agreement between six major retailers, including Walmart and Costco, and New York’s attorney general to display unit prices on their web-based shopping platforms. Significantly, Amazon declined to participate in New York’s initiative.

I hope that e-retailers in India will volunteer to display unit price information. If they do not, they stand to lose consumers’ trust.