A picture is worth a thousand words. We have all heard this a thousand times, to the point that we no longer credit it with much wisdom. A slight modification by John Tukey, the noted mathematician and statistician well known for his work on exploratory data analysis, gets it just right: “A picture may be worth a thousand words, but it may take a hundred words to do it.” This opens the door to Visual Analytics: visualise the data to the equivalent of a thousand words, then take a hundred words to explain the basic idea, leading to comprehension – hopefully, an “Aha” experience – for the reader.
In their book “Designing Data Visualizations”, the authors Noah Iliinsky and Julie Steele give a very simple description of data visualisation: moving information from point A to point B. In exploratory visualisation, the information moves from the dataset (point A) to the designer’s mind (point B); in explanatory visualisation, it moves from the designer’s mind (point A) to the reader’s mind (point B).
No article or book on visual analytics is complete without the two classic visualisations, or infographics, that exemplify exploratory and explanatory visualisation. The first is the map drawn by the physician John Snow in 1854 of deaths during a cholera epidemic in London, shown below.
The dots represent deaths due to cholera, and the water pumps are marked with an “X”. John Snow observed the clustering of deaths around the water pump on Broad Street and, using this as evidence, convinced the authorities to disable the Broad Street pump by removing its handle. As the story goes, within two weeks the epidemic, which had already taken more than five hundred lives, ended. This is an example of exploratory visualisation, where the discovery of the link between cholera deaths and the water pump (a contaminated water source) was made possible by visually representing the data. What makes this even more remarkable is that the deduction predates germ theory; the mechanism by which cholera was transmitted was not known at the time.
The next classic example, of explanatory visualisation or storytelling, is Charles Joseph Minard’s famous 1869 graph depicting Napoleon’s invasion of Russia in 1812. Edward Tufte, the visualisation guru, calls it probably the best statistical graphic ever drawn. The map portrays the losses suffered by Napoleon’s army in the Russian campaign of 1812. Beginning at the Polish-Russian border, the thick band in brown shows the size of the army at each position. The path of Napoleon’s retreat from Moscow in the bitterly cold winter is depicted by the dark lower band, which is tied to temperature and time scales. The numbers on the chart indicate that 422,000 soldiers crossed the Neman with Napoleon and only 10,000 returned from the disastrous campaign.
Both exploratory and explanatory visualisations are important to the auditor, for the auditor’s job does not end with obtaining the correct insight; it ends only when that insight is effectively communicated to management so that necessary action can be taken, whether for improved governance or for accountability. For the latter, explanatory visualisation becomes critical. In his seminal book “The Grammar of Graphics”, Leland Wilkinson stated: “I believe the largest business market for graphics will continue to be analysis and reporting, despite the enthusiastic predictions for data mining, visualisation, animation and virtual reality. The reason, I think, is simple. People in business and science have more trouble communicating than discovering”. This is likely to be as true for auditors; visual analytics tools can be used to communicate their audit findings better.
The data analytics literature distinguishes between two different modes of analysis: exploratory and confirmatory. Exploratory data analysis should be the first stage of data analysis; it is bottom-up and inductive. The use of visual exploratory techniques can help auditors see patterns, trends and outliers that are otherwise hidden, and reveal relationships between variables that could be the foundation for a confirmatory model.
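As a minimal sketch of what this looks like in practice, the Python snippet below plots a hypothetical set of payment transactions as a simple scatter plot. The data, file and column names (payment_date, amount) are assumptions for illustration only; the point is that unusually large payments, clusters and seasonal patterns tend to become visible at a glance, before any formal test is formulated.

```python
# A minimal exploratory sketch (hypothetical data and column names).
# The aim is simply to "look" at the data before forming any hypothesis.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# Simulated payments file; in practice this would be read from the audited
# system, e.g. payments = pd.read_csv("payments.csv", parse_dates=["payment_date"])
payments = pd.DataFrame({
    "payment_date": pd.date_range("2014-01-01", periods=500, freq="D"),
    "amount": rng.lognormal(mean=8, sigma=0.6, size=500),
})

plt.scatter(payments["payment_date"], payments["amount"], s=10, alpha=0.6)
plt.xlabel("Payment date")
plt.ylabel("Amount")
plt.title("Payments over time: outliers and clusters stand out visually")
plt.show()
```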
In a recent paper on Data Analytics for Internal Audit, KPMG observed that the traditional focus of Internal Audit departments has been on transaction-based analytics to identify exceptions in populations by applying selected business-rules-based filters in key areas of risk such as revenue or procurement. These transactional, rules-based analytics, or “micro-level” analytics, can provide significant value for known conditions, where the frequency and magnitude of the conditions need to be assessed. The popular computer-assisted audit tools and techniques (CAATs) used by public auditors, such as IDEA or ACL, are also oriented towards micro-level analytics and are thus good only for evaluating known conditions. They are ill designed to meet the needs of exploratory data analysis, or macro-level analytics, which delivers value by identifying broader patterns and trends of risk.
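To make the distinction concrete, the sketch below shows what a typical rules-based, “micro-level” exception test might look like in Python (pandas), applied to a hypothetical procurement file. The file name, column names and the 50,000 approval threshold are illustrative assumptions, not something taken from the KPMG paper.

```python
# A hypothetical rules-based ("micro-level") exception test in pandas.
# Column names (vendor_id, invoice_no, invoice_date, amount) and the
# 50,000 approval threshold are illustrative assumptions.
import pandas as pd

invoices = pd.read_csv("procurement_invoices.csv")

# Rule 1: same-day invoices to one vendor that together reach the approval
# threshold (possible splitting of purchases to avoid higher-level approval).
daily_totals = (invoices
                .groupby(["vendor_id", "invoice_date"])["amount"]
                .agg(total="sum", count="size")
                .reset_index())
possible_splitting = daily_totals[(daily_totals["total"] >= 50_000) &
                                  (daily_totals["count"] > 1)]

# Rule 2: exact duplicate invoices (same vendor, invoice number and amount).
duplicates = invoices[invoices.duplicated(
    subset=["vendor_id", "invoice_no", "amount"], keep=False)]

print(possible_splitting.head())
print(duplicates.head())
```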
John Tukey once observed that “… the picture-examining eye is the best finder we have of the wholly unanticipated.” With visual analytics, using tools like Tableau and QlikView, the auditor can obtain insights, not possible through traditional CAATs, that lead to improved audit quality.
The limitations of the traditional CAATs used by auditors – IDEA, ACL, MS Excel and MS Access – in handling large data volumes have become apparent in the last few years. MS Excel has a limit of roughly one million rows of data (though PowerPivot in Excel 2010/2013 can circumvent this limitation), MS Access cannot handle more than 2 GB of data, and IDEA, though stated to have no hard limit on data volume, can become extremely sluggish when handling over 10 million rows, with queries taking multiple hours to execute. In contrast, the same analytical tests on in-memory analytics tools like QlikView and Tableau are seen to return results in seconds – a thousandfold improvement in performance. Apart from speed, the in-memory analytics tools are also more efficient in storing data, with extremely high compression rates making it possible to analyse raw data many times the size of the physical RAM available on the auditor’s laptop or desktop. Most financial audits involve structured databases where datasets are in the gigabyte range; a single dataset exceeding 1 terabyte is extremely rare. As a result, even an auditor’s laptop can be used to analyse most large datasets obtained in the course of an audit.
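As a rough illustration of working past spreadsheet limits on an ordinary laptop, the sketch below streams a CSV file much larger than Excel’s roughly one-million-row limit through pandas in chunks, keeping only an aggregated summary in memory. The file name and column names are assumptions, and this is only one way of doing it; the in-memory tools mentioned above handle such volumes natively.

```python
# Aggregating a file far beyond Excel's ~1 million row limit by streaming it
# in chunks, so only the running summary (not the raw data) stays in memory.
# File name and column names (ledger.csv, account, amount) are illustrative.
import pandas as pd

totals = {}
for chunk in pd.read_csv("ledger.csv", chunksize=1_000_000,
                         usecols=["account", "amount"]):
    for account, amount in chunk.groupby("account")["amount"].sum().items():
        totals[account] = totals.get(account, 0.0) + amount

summary = pd.Series(totals, name="amount").sort_values(ascending=False)
print(summary.head(20))
```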
The true power of the newer generation of analytics tools, which is expected to improve audit quality and effectiveness, lies in their ability both to handle large data volumes and to use visual analytics to derive audit insights.
If any more convincing is required for the use of visual analytics, one need only refer to the example of Anscombe’s quartet, constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data before analysing it and the effect of outliers on statistical properties. The four datasets, shown below, have identical statistical properties, with the same mean and variance for x and y in each case, and with near-identical correlation coefficients and linear regression equations.
However, the
datasets are very different, as can be seen in the graphical presentation
below:
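The quartet is easy to reproduce. The short Python sketch below uses the published values of the four datasets to compute the (near) identical summary statistics and then plots the datasets side by side, which makes their structural differences obvious at a glance.

```python
# Anscombe's quartet: identical summary statistics, very different pictures.
import numpy as np
import matplotlib.pyplot as plt

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

fig, axes = plt.subplots(2, 2, figsize=(8, 6), sharex=True, sharey=True)
for ax, (name, (x, y)) in zip(axes.flat, quartet.items()):
    x, y = np.array(x), np.array(y)
    slope, intercept = np.polyfit(x, y, 1)   # ~y = 0.50x + 3.00 in all four cases
    print(f"{name}: mean_y={y.mean():.2f}, var_y={y.var(ddof=1):.2f}, "
          f"r={np.corrcoef(x, y)[0, 1]:.3f}, fit=y={slope:.2f}x+{intercept:.2f}")
    ax.scatter(x, y)
    xs = np.linspace(3, 20, 2)
    ax.plot(xs, slope * xs + intercept, color="red")
    ax.set_title(f"Dataset {name}")
plt.tight_layout()
plt.show()
```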
It is time that auditors expand their use of data analytics to deliver a higher level of assurance in their audits. The profession needs this desperately, as the article “Accounting scandals: The dozy watchdogs” in the 13th December 2014 print edition of The Economist highlights.