Sunday, December 21, 2014

The importance of Visual Analytics - An Auditor's perspective



A picture is worth a thousand words. We have all heard this so often that we no longer credit it with much wisdom. A slight modification by John Tukey, the noted mathematician and statistician well known for his work on exploratory data analysis, gets it right: “A picture may be worth a thousand words, but it may take a hundred words to do it.” That observation opens the door to visual analytics: visualise data to the equivalent of a thousand words, then take a hundred words to explain the basic idea, leading to comprehension and, hopefully, the “Aha” experience for the reader.
In their book “Designing Data Visualizations”, Noah Iliinsky and Julie Steele describe data visualisation very simply as moving information from point A to point B. In exploratory visualisation, the information moves from the dataset (point A) to the designer’s mind (point B); in explanatory visualisation, it moves from the designer’s mind (point A) to the reader’s mind (point B).
No article or book on visual analytics is complete without the following two visualisations, which exemplify exploratory and explanatory visualisation respectively.
The first is the map drawn by physician John Snow in 1854 of deaths during a cholera epidemic in London. This is shown below. 


The dots represent deaths due to cholera, and the water pumps are marked with an “X”. John Snow observed the clustering of deaths around the water pump on Broad Street and, using it as evidence, convinced the authorities to disable the pump by removing its handle. As the story goes, within two weeks the epidemic, which had already taken more than five hundred lives, ended. This is an example of exploratory visualisation: the link between cholera deaths and the water pump (a contaminated water source) was discovered by representing the data visually. What makes this even more remarkable is that the deduction predates germ theory; the mechanism by which cholera was transmitted was not known at that time.
The next classic example, this time of explanatory visualisation or storytelling, is Charles Joseph Minard's famous 1869 graph depicting Napoleon’s invasion of Russia in 1812.



Edward Tufte, the visualisation guru, calls it probably the best statistical graphic ever drawn. The map portrays the losses suffered by Napoleon's army in the Russian campaign of 1812. Beginning at the Polish-Russian border, the thick brown band shows the size of the army at each position. The path of Napoleon's retreat from Moscow in the bitterly cold winter is depicted by the dark lower band, which is tied to temperature and time scales. The numbers on the chart indicate that 422,000 soldiers crossed the Neman with Napoleon and only 10,000 returned from the disastrous campaign.
Both exploratory and explanatory visualisations matter to the auditor, because the auditor’s job does not end with obtaining the correct insight. It ends only when that insight has been effectively communicated to management so that the necessary action can be taken, whether for improved governance or for accountability. For the latter, explanatory visualisation becomes critical. In his seminal book, The Grammar of Graphics, Leland Wilkinson stated: “I believe the largest business market for graphics will continue to be analysis and reporting, despite the enthusiastic predictions for data mining, visualisation, animation and virtual reality. The reason, I think, is simple. People in business and science have more trouble communicating than discovering”. This is likely to be just as true for auditors, who can use visual analytics tools to communicate their audit findings better.
The data analytics literature distinguishes between two modes of analysis: exploratory and confirmatory. Exploratory data analysis should be the first stage of data analysis; it is bottom-up and inductive. Visual exploratory techniques can help auditors see patterns, trends, and outliers that are otherwise hidden, and can reveal relationships between variables that could become the foundation for a confirmatory model.
In a recent paper on data analytics for internal audit, KPMG observed that the traditional focus of internal audit departments has been on transaction-based analytics, which identify exceptions in populations by applying selected business-rule filters in key areas of risk such as revenue or procurement. These transactional, rules-based analytics, or "micro-level" analytics, can provide significant value for known conditions where the frequency and magnitude of those conditions need to be assessed. The computer-assisted audit tools and techniques (CAATs) popular with public auditors, such as IDEA or ACL, are also oriented towards micro-level analytics and are thus good only for evaluating known conditions. They are ill-suited to exploratory data analysis, or "macro-level" analytics, which delivers value by identifying broader patterns and trends of risk.
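To make the micro/macro distinction concrete, here is a minimal Python sketch. It is only my own illustration, not KPMG's method or the behaviour of any CAAT product: the invoice records, the approval limit, the 5% band, and the function names are all hypothetical. The first function is a rules-based test for a known condition; the second aggregates by vendor to surface a pattern (invoices clustering just under the limit) that no single-transaction rule would flag.

```python
from collections import Counter

# Hypothetical toy data: (invoice_id, vendor, amount). Not from any real audit.
INVOICES = [
    ("INV-001", "Vendor A", 4200),
    ("INV-002", "Vendor B", 9900),
    ("INV-003", "Vendor B", 9850),
    ("INV-004", "Vendor C", 12500),
    ("INV-005", "Vendor B", 9950),
    ("INV-006", "Vendor A", 3100),
]
APPROVAL_LIMIT = 10000  # assumed approval threshold, for illustration only


def micro_rule_exceptions(invoices, limit):
    """Micro-level test: list invoices whose amount exceeds the limit."""
    return [inv_id for inv_id, _vendor, amount in invoices if amount > limit]


def macro_near_limit_counts(invoices, limit, band=0.05):
    """Macro-level view: count, per vendor, invoices just under the limit."""
    lower = limit * (1 - band)
    return Counter(
        vendor for _inv_id, vendor, amount in invoices if lower <= amount <= limit
    )


EXCEPTIONS = micro_rule_exceptions(INVOICES, APPROVAL_LIMIT)
NEAR_LIMIT = macro_near_limit_counts(INVOICES, APPROVAL_LIMIT)

print("Rule exceptions:", EXCEPTIONS)
print("Invoices just under the limit, by vendor:", dict(NEAR_LIMIT))
```

In this toy data the rule catches only the one invoice above the limit, while the per-vendor count draws attention to a vendor whose invoices repeatedly fall just below it. Plotting such counts is one way a visual tool could bring that kind of pattern to the auditor's eye.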
John Tukey once observed that “… the picture-examining eye is the best finder we have of the wholly unanticipated.” With visual analytics, using tools like Tableau and QlikView, the auditor can obtain insights that improve audit quality and that are not possible with traditional CAATs.
The limitations of the traditional CAATs used by auditors (IDEA, ACL, MS Excel and MS Access) in handling large data volumes have become apparent over the last few years. MS Excel has a limit of roughly one million rows (1,048,576 per worksheet), though PowerPivot in Excel 2010/2013 can circumvent this limitation. MS Access cannot handle more than 2 GB of data. IDEA, though stated to have no hard limit on data volume, can become extremely sluggish when handling over 10 million rows, with queries taking multiple hours to execute. In contrast, the same analytical tests on in-memory analytics tools like QlikView and Tableau are seen to return results in seconds, a thousand-fold improvement in performance. Apart from speed, in-memory analytics tools are also seen to store data more efficiently, with extremely high compression rates that make it possible to analyse raw data many times the size of the physical RAM available on the auditor’s laptop or desktop. Most financial audits involve structured databases whose datasets are in the gigabyte range; a single dataset exceeding 1 terabyte is extremely rare. As a result, even an auditor’s laptop can be used to analyse most large datasets obtained in the course of an audit.
The true power of the newer generation of analytics tools, which is expected to improve audit quality and effectiveness, lies in their ability both to handle large data volumes and to derive audit insights through visual analytics.
If any more convincing is required for the use of visual analytics, one need only refer to Anscombe’s quartet, constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analysing it and the effect of outliers on statistical properties. The four datasets, shown below, have identical statistical properties, with the same mean and variance for x and y in each case, and with near-identical correlation coefficients and linear regression equations.
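As a minimal sketch of that claim, the following Python snippet recomputes the summary statistics for each of the four datasets. The values are typed in from the commonly published version of Anscombe's quartet, and the function and variable names are my own illustrative choices. It uses only the standard library and computes the correlation and least-squares line by hand rather than relying on any particular statistics package.

```python
from statistics import mean, variance

# Commonly published values of Anscombe's quartet (datasets I-III share x).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return mean, sample variance, Pearson r and least-squares line for (x, y)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mx,
        "var_x": variance(x),
        "mean_y": my,
        "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,
        "slope": slope,
        "intercept": my - slope * mx,
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

for name, s in SUMMARIES.items():
    print(
        f"{name:>3}: mean_x={s['mean_x']:.2f} var_x={s['var_x']:.2f} "
        f"mean_y={s['mean_y']:.2f} var_y={s['var_y']:.2f} "
        f"r={s['r']:.3f} fit: y = {s['intercept']:.2f} + {s['slope']:.3f}x"
    )
```

The printed rows come out almost indistinguishable from one another, which is exactly why the numbers alone give no hint of how differently the four datasets behave.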

However, the datasets are very different, as can be seen in the graphical presentation below:
It's time that auditors expanded their use of data analytics to deliver a higher level of assurance in their audits. The profession needs this desperately, as a recent article in the 13th December 2014 print edition of The Economist, titled “Accounting Scandals - The dozy watchdogs”, highlights.

2 comments:

  1. I have read this post very carefully. Thanks, Gopinath, for sharing this post. It is a really informative post about analysis. I would also like to share a resource on Exploratory and Confirmatory Data Analysis for you to check: http://www.statisticaldataanalysis.net/comparative-data-analysis-definition/
