Sunday, 25 August 2013

Complexity Profiling and Causality


A Complexity Profile is probably the most important result of a complexity analysis, and it may help shed some light on the issue of causality. Its interpretation, therefore, is of paramount importance. Before turning to it, it is important to consolidate a few basic concepts. There are two types of variables in a system:
  • Inputs
  • Outputs

These can be further classified into two categories:
  • Controllable
  • Uncontrollable

There are different situations that one can be confronted with:
  • Variables are only inputs (e.g. accelerator pedal angle)
  • Variables are only outputs (e.g. stock values, survey results)
  • Both inputs and outputs are present

But first of all, what is complexity? Complexity is a measure of how much information a system “contains” and how structured that information is. One could simply sum the Shannon entropies of the individual variables and conclude that this is the total amount of information in the system. However, variables can be correlated, and correlations give rise to structure. Structure means the system can “do more” and, potentially, perform new functions. Structure is present everywhere in Nature. More structured information means more correlations within the system. Critical complexity measures how much information a system can contain before it starts to lose this structure (i.e. before the information becomes meaningless). Since information is measured in bits, complexity is also measured in bits.
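The difference between "summing entropies" and "accounting for structure" can be made concrete with a small sketch. It is not the complexity measure discussed in this post; it only estimates per-variable Shannon entropies and the information shared by a pair of variables (mutual information) from a histogram. The function names, the bin count and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy, in bits, of a probability array (zero cells are skipped)."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def pair_information(x, y, bins=8):
    """Return (H(X), H(Y), I(X;Y)) in bits, estimated from a 2-D histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    h_x = entropy_bits(pxy.sum(axis=1))   # marginal of X
    h_y = entropy_bits(pxy.sum(axis=0))   # marginal of Y
    h_xy = entropy_bits(pxy)              # joint entropy
    return h_x, h_y, h_x + h_y - h_xy

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
a = rng.normal(size=5000)
b = a + 0.1 * rng.normal(size=5000)   # strongly tied to a
c = rng.normal(size=5000)             # unrelated to a

h_a, h_b, i_ab = pair_information(a, b)
_, h_c, i_ac = pair_information(a, c)
print(f"H(a)+H(b) = {h_a + h_b:.2f} bits, shared I(a;b) = {i_ab:.2f} bits")
print(f"H(a)+H(c) = {h_a + h_c:.2f} bits, shared I(a;c) = {i_ac:.2f} bits")
```

The two entropy sums come out similar, while the shared information differs sharply between the correlated and the unrelated pair. That shared part is the "structure" a plain sum of entropies cannot see.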

The importance of structure is paramount. An analogy: the mass of an atom’s nucleus is less than the sum of the masses of its components. This is because the energy that goes into the various bindings has an equivalent in terms of mass (m=E/c^2). It is this amount that is “lost” when measuring the mass of the nucleus as a whole. The same holds for complexity. It measures the information within a system based not only on the sum of the Shannon entropies of each variable but also on the “bindings” between the variables. This means that structure also carries information, not just each variable. This structure is reflected in the so-called Complexity Map.
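For reference, the nuclear side of the analogy can be written out as the standard mass-defect relation. The symbols are introduced here and do not appear in the original text: Z and N are the numbers of protons and neutrons, m_p and m_n their masses, and E_b the binding energy.

```latex
\Delta m = \left( Z\, m_p + N\, m_n \right) - m_{\text{nucleus}} = \frac{E_b}{c^2}
```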

Complexity is like energy. The more energy one has, the more can be turned into work in order to accomplish something. More complexity means more information, and more information also means that more can be done.

What does the Complexity Map show? It shows which groups of variables vary together. It does NOT indicate whether A is causing a variation in B or vice-versa; it simply shows how variables are grouped when they change. In other words, “when variable A varies, B also varies”: this is all that can be said, unless one knows specifically that a certain variable is independent and controllable and that its variations are intended.
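A rough way to see why such a map carries no direction is to build a simplified stand-in for it. The sketch below uses plain Pearson correlation with a cut-off, which is an assumption and not the method behind the Complexity Map; the function name, threshold and data are hypothetical.

```python
import numpy as np

def covariation_map(data, names, threshold=0.5):
    """Return (corr, links): the correlation matrix and a dict
    {(name_i, name_j): r} of pairs with |Pearson r| >= threshold."""
    corr = np.corrcoef(data, rowvar=False)   # columns are variables
    links = {}
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                links[(names[i], names[j])] = float(corr[i, j])
    return corr, links

# Synthetic data (assumed): B follows A, C is unrelated.
rng = np.random.default_rng(1)
a = rng.normal(size=2000)
b = 0.8 * a + 0.3 * rng.normal(size=2000)
c = rng.normal(size=2000)
data = np.column_stack([a, b, c])

corr, links = covariation_map(data, ["A", "B", "C"])
print(links)                      # which pairs vary together
print(np.allclose(corr, corr.T))  # the matrix is symmetric
```

Because the matrix is symmetric, the link between A and B looks the same from either side. In this synthetic data B was generated from A, yet nothing in the map reveals that.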

A Complexity Profile (or Complexity Spectrum) shows how much information is “lost” from a system (a multi-dimensional data array) if a particular variable is removed. The measurement is provided in percentage terms. The contributions to a Complexity Profile are ranked in descending order. When a variable is at the top of the CP, it does not necessarily mean that it is the most important one or that it dominates/controls the system in question. This is ONLY true if the variable is an input.
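The leave-one-out idea behind such a profile can be sketched as follows. The measure used here (sum of marginal entropies plus sum of pairwise mutual information) is an ASSUMED stand-in chosen only to make the example runnable; it is not the complexity measure described in this post, and all function names and data are hypothetical.

```python
import numpy as np

def entropy_bits(p):
    """Shannon entropy, in bits, of a probability array."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def marginal_entropy(x, bins=8):
    counts, _ = np.histogram(x, bins=bins)
    return entropy_bits(counts / counts.sum())

def mutual_information(x, y, bins=8):
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    return (entropy_bits(pxy.sum(axis=1)) + entropy_bits(pxy.sum(axis=0))
            - entropy_bits(pxy))

def standin_complexity(data, bins=8):
    """ASSUMED measure: marginal entropies plus pairwise mutual information."""
    n = data.shape[1]
    total = sum(marginal_entropy(data[:, i], bins) for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            total += mutual_information(data[:, i], data[:, j], bins)
    return total

def complexity_profile(data, names, bins=8):
    """Percent of the stand-in measure lost per removed variable, descending."""
    full = standin_complexity(data, bins)
    profile = []
    for k, name in enumerate(names):
        reduced = np.delete(data, k, axis=1)          # drop column k
        loss = full - standin_complexity(reduced, bins)
        profile.append((name, 100.0 * loss / full))
    return sorted(profile, key=lambda item: item[1], reverse=True)

# Synthetic data (assumed): x1 and x2 follow hub, lone is unrelated.
rng = np.random.default_rng(2)
hub = rng.normal(size=5000)
x1 = hub + 0.5 * rng.normal(size=5000)
x2 = hub + 0.5 * rng.normal(size=5000)
lone = rng.normal(size=5000)
data = np.column_stack([hub, x1, x2, lone])
names = ["hub", "x1", "x2", "lone"]

profile = complexity_profile(data, names)
for name, pct in profile:
    print(f"{name:>5}: {pct:.1f}%")
```

In this toy setting the unrelated variable lands at the bottom, since removing it takes away only its own entropy and almost no shared information. The ranking says nothing about which variable is an input.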

When the first variable in a CP is removed, all one can say for sure is that the data set without that variable will experience the largest possible loss of information. The fact that a variable lies at the top of the CP does not automatically mean that it drives the business. Why is that the case? The first important step in a complexity analysis of any system is the synthesis of a meaningful data set. If you put in garbage, the quality of the results will degrade in proportion to the share of garbage in the data. It is up to the user to collect meaningful data that properly covers the problem at hand, rather than collecting data indiscriminately. Therefore, if you are completely sure that your data is correct and meaningful (i.e. is of high quality), then indeed the CP provides a correct ranking of the variables in terms of how much information each variable contributes to the whole picture. But what does that physically mean? It means that the variable in question varies a lot AND it does so in unison (i.e. with structure) with numerous other variables. This means it is important: it is a driver.

The CP, therefore, is an objective way of ranking (weighting) variables, as it ranks them based on how much information they carry. Therefore, if a variable lies in the upper part of the CP and it is a controllable input to your system, then indeed it is an important business driver. And what about outputs? What if you have, say, N stocks, and therefore N observable outputs from a system (the stock exchange)? How is the CP to be interpreted then? The caveat above still holds. But can anything else be said in such a case? Probably yes.

A question people commonly ask (even though we think this is not a good question to ask) is that of causality. If A and B vary together, is it A that causes the variation in B or vice-versa? This question is very difficult to answer (unless one has “insider” information). It is one of those questions that have no answer and that are useless to ask (is pizza better than spaghetti?). However, the Complexity Profile can help.

Let us see an example, the DJIA Index. The Complexity Map is illustrated below.


The corresponding CP is this:




This is a case in which it is impossible, for example, to say whether it is the price of Home Depot stock that drives the price of Citigroup stock or vice-versa. What does “to drive” mean? The relationship in question is shown below:


What really drives both stocks is the market, but that cannot be measured easily. What we can do instead is assume that if two variables co-vary (vary together), the one with the higher CP contribution drives the other. In this case we could say that Citigroup “dominates” Home Depot. It is very difficult to disprove such a statement (unless one has privileged information or the data has been manipulated).
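The working assumption above amounts to a simple comparison rule. The sketch below is a minimal illustration of it; the function name, the variable names and the percentages are all made up and do not come from the DJIA analysis.

```python
def dominant_variable(pair, cp_contribution):
    """Of two co-varying variables, return the one with the larger CP share.

    Returns None on a tie. This encodes the working assumption stated in
    the text; it is not a test of causality.
    """
    first, second = pair
    if cp_contribution[first] == cp_contribution[second]:
        return None
    return max(pair, key=lambda name: cp_contribution[name])

# Made-up CP percentages for two hypothetical co-varying stocks.
cp = {"stock_x": 7.5, "stock_y": 4.0}
result = dominant_variable(("stock_x", "stock_y"), cp)
print(result)  # prints stock_x
```

The rule only applies to pairs already known to co-vary, so in practice it would be fed pairs taken from the map rather than arbitrary ones.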

In the case in question we could say that Citigroup dominates the DJIA Index, even though market capitalization or stock value could hint at something different. In summary, we could conclude that a Complexity Profile may help address the eternal issue of causality (which seems to trouble humanity so much).


www.ontonix.com