
Posts

No, a penguin is not an ashcan; how to evolve from supervised to semi-supervised learning

Recently Yann LeCun complimented the authors of 'A ConvNet for the 2020s' https://mobile.twitter.com/ylecun/status/1481194969830498308?s=20 https://github.com/facebookresearch/ConvNeXt These statements imply that continued improvements in the metrics of success are indicators that learning is improving.  Furthermore, says LeCun, common sense reinforces the idea that 'helpful tricks' succeed in increasing the learning that occurs in these models.  But are these models learning? Are they learning better? Or have they simply succeeded at overfitting and scoring better without learning anything new? We took a look at what the model learned, not just how it scored on its own metric.  To this end we created a graph with links between each image and its top 5 classifications, with the weight of each link proportional to the score of the class.   Here are the data files: https://github.com/DaliaSmirnov/imagenet_research/blob/main/prediction_df_resnet50.p...
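As a rough sketch of how such a prediction graph could be assembled from the pickled DataFrame; the column names (image, label, score) are assumptions, the actual schema of prediction_df_resnet50.p may differ:

```python
import pickle

import networkx as nx
import pandas as pd

# Load the pickled predictions DataFrame (schema assumed, see above:
# one row per (image, class) pair from the top-5 predictions).
with open("prediction_df_resnet50.p", "rb") as f:
    predictions: pd.DataFrame = pickle.load(f)

# Bipartite graph: image nodes on one side, class nodes on the other.
# Each of an image's top-5 classes gets one edge, weighted by the
# score the model assigned to that class.
G = nx.Graph()
for _, row in predictions.iterrows():
    G.add_edge(f"img:{row['image']}", f"cls:{row['label']}",
               weight=row["score"])
```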
Recent posts

Architecture is the Keystone [Data Architecture 1/3]

We went through a very painful process: we lost time, the most expensive resource on the planet.  We also lost some of our people, yet another extremely painful process. But let me back up a bit in the story. When we started iSkoot there were three core tech challenges: creating a virtual audio driver, transporting the media in real time to an IP-PBX, and scaling the solution.  One of these things is not like the other. Scaling the solution demanded that we pay attention to the architecture of the first two core-tech issues.  Architecture was the keystone of startups back then.  And when we said architecture we meant System & Software Architecture. Today there has been a significant shift in the hi-tech world: systems and software have been replaced by data as the core value of a company and the keystone, that internal magic that binds all the elements into a whole. At Blue dot we lived this shift.  Blue dot started in the age of System Architecture....

Too Much Data -- summary post

Here is a set of summary links:
I) We live in the Information Age https://data-information-meaning.blogspot.com/2019/03/we-live-in-information-age.html
II) Too much data https://data-information-meaning.blogspot.com/2019/03/too-much-data.html
III) Metrics https://data-information-meaning.blogspot.com/2019/03/metrics.html
IV) Abstractions and judgements https://data-information-meaning.blogspot.com/2019/03/abstractions-and-judgements.html
V) How do we know we made a reasonable judgement? https://data-information-meaning.blogspot.com/2019/04/how-do-we-know-we-made-reasonable.html
VI) Scale -- hierarchy and distance metrics https://data-information-meaning.blogspot.com/2019/04/scale-hierarchy-and-distance-metrics.html

VI) Scale -- hierarchy and distance metrics

Judgements are phase transitions; they bring us to higher levels of representation. I wrote the following: https://ashlag-cause-and-kook-affect.blogspot.com/2018/10/the-natural-scale-of-thing.html Here is a snippet: There can be no metric to measure the distance between two things across levels. Why not?  Well, simply:
1. A metric to measure the distance between two elements requires that the elements be in the same dimension.
2. By comparing two elements at different levels we reduce the higher-dimensional object to the lower dimension, which collapses the information in the system (see the toy sketch after this snippet).
Here is an example:
1. Assume I want to measure the health benefits of eating a balanced diet.
2. At a high level I can gather data in a three-dimensional space: carbs, protein and fat, the macro nutrients.
3. I can also gather data at a lower level that is in a higher-dimensional space; we can break down protei...
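A minimal toy sketch of point 2 above: dropping a dimension can make two distinct points indistinguishable, so a distance computed in the lower dimension loses information the higher dimension carried. The numbers are illustrative only:

```python
import numpy as np

# Two diets with identical carbs and protein but very different fat.
a = np.array([1.0, 2.0, 0.0])  # (carbs, protein, fat)
b = np.array([1.0, 2.0, 5.0])

# Distance in the full 3-D space vs. after dropping the fat dimension.
print(np.linalg.norm(a - b))          # 5.0 -- the points are distinct
print(np.linalg.norm(a[:2] - b[:2]))  # 0.0 -- the distinction collapses
```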

How does learning work, how much data is needed to learn, and don't cross the streams!

Learning is the process by which a model is constructed; the model describes a set of observations.  The more compact the model, the better the learning process is considered.  This is manifest in the ability of the model to predict and generalize to out-of-sample data.  But let's not confuse learning with classification.  Again, the essence of learning is the model construction and the condensed representation of the observations. So how many observations, data elements, are required to construct a model?  [Nassim Taleb addresses this question here:  https://arxiv.org/pdf/1802.05495.pdf ] The typical answer, for a Gaussian/normal distribution, is 30 observations. Simple: we construct a model of the mean and variance of the data by calculating the average and variance from our 30 sample observations. Clearly this is not true in all cases; we do not always have simple normal distributions, and in more complex cases we would require more obs...
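A quick illustrative sketch of that rule of thumb: estimate a Gaussian's mean and variance from a 30-point sample and compare against the true parameters. The parameters here are made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, true_sigma = 5.0, 2.0

# The 'model' is just two numbers estimated from 30 observations.
sample = rng.normal(true_mu, true_sigma, size=30)

print(sample.mean(), "vs true mean", true_mu)
print(sample.var(ddof=1), "vs true variance", true_sigma**2)
```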

1.65 Phase transitions, a measure of learning

Phase transitions, a measure of learning https://rumble.com/vcj8fk-1.65-phase-transitions-a-measure-of-learning.html Let's compare KMeans to faddc. [figures: KMeans vs. faddc] That was quite dramatic; how did we get there? [figures: KMeans vs. faddc] Neat, right? KMeans spreads its representations equally across the entire dataset, minimising the global loss of information. The thing to notice with faddc is that the representation is very stable up to a point; at a certain point there is a dramatic shift in the representation.   Here I graph the derivative energy, the change, the difference between the distortion given each 'k'....
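A rough sketch of how such a derivative-energy curve can be computed. faddc is not publicly available here, so plain scikit-learn KMeans stands in for both curves, and the dataset is synthetic:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Distortion (inertia) for each k: total squared distance of points
# to their assigned centroid.
ks = range(1, 11)
distortion = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
              for k in ks]

# Discrete derivative: how much distortion drops per extra centroid.
# A sharp change suggests a phase transition in the representation.
print(np.diff(distortion))
```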

1.6 Phase transitions, a measure of learning

Phase transitions, a measure of learning https://rumble.com/vcg8gw-1.6-phase-transitions-a-measure-of-learning.html Phase transitions demonstrate a loose coupling.  A tight coupling, like in KMeans, mirrors the distortion at each level.  A loose coupling enables the higher level to move at a different pace, disjoint from the lower level.  This separation between levels indicates that the levels represent different descriptions of the data; they speak different languages. It is important to differentiate between the model, the hierarchy, the grammar, and the content.  So in the next series I will do that.  The learning is in the model, not the content.  The content can be memorized; it's the relationships between the content that are learnt. Here is a slightly different dataset; there are at least two apparent scales.  Let's see what KMeans does as we increase 'k': Neat, right? KMeans spreads its representations equally across the entire dataset, minimising the global lo...
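For readers who want to reproduce the setup, here is one assumed way to build a dataset with at least two apparent scales: tight sub-clusters nested inside widely separated super-clusters. The parameters are illustrative, not the post's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Coarse scale: a few super-clusters spread far apart.
super_centers = rng.uniform(-100, 100, size=(3, 2))

points = []
for c in super_centers:
    # Fine scale: sub-clusters close to each super-center,
    # each holding a tight blob of points.
    sub_centers = c + rng.normal(0, 5, size=(4, 2))
    for s in sub_centers:
        points.append(s + rng.normal(0, 0.5, size=(50, 2)))

X = np.vstack(points)  # feed this to KMeans for increasing 'k'
```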