
Does data speak for itself?

Saw this:

https://informedinsport.com/new-blog/special-post-the-illogic-of-being-data-driven

The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning

— Nate Silver


So, does data have a way to speak for itself, or is all interpretation and meaning imbued solely by the people analyzing the data?

The problem with this question is that it misses a level of analysis.

True, meaning is conferred only by humans, who stand outside the system and map external systems onto the data.  The semiotic definition of meaning holds.

However, since there are multiple potential interpretations, we need a step that maximizes the information in the data before its message is communicated to people.

True, this requires that people impose a metric, a kind of prior assumption.  Yet it is not an assumption about interpretation; rather, it is an assumption about the field of analysis.  My favorite metric is that coincidence is informational.  That's it.  Hence two observations that co-occur are related to each other.  The basic assumption is that 'time' is a shared metric across all observations.

Then the data does speak for itself: it communicates a highly informational message.  We still need to hear the message and interpret it to imbue it with meaning.  However, since the data has spoken well, with high informational content, the message will be clear.

Should I give an example?
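
One way to make "coincidence is informational" concrete is a small sketch in Python.  It rests on assumptions that are not in the post: observations are taken to be (time step, label) pairs, and the score is pointwise mutual information (PMI), a standard measure chosen here for illustration rather than the metric the post has in mind.  The function name and the toy labels are hypothetical.  The only thing imposed on the data is the shared time index; no interpretation of the labels is supplied.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_pmi(observations):
    """Score label pairs by how informative their coincidence in time is.

    observations: iterable of (time_step, label) pairs (hypothetical format).
    Returns {(a, b): pmi_in_bits} for every pair of labels that shared
    at least one time step, with a < b in sort order.
    """
    # Group labels by time step: 'time' is the only metric we impose.
    by_time = {}
    for t, label in observations:
        by_time.setdefault(t, set()).add(label)

    n = len(by_time)
    single = Counter()  # number of time steps in which each label appears
    pair = Counter()    # number of time steps in which both labels appear
    for labels in by_time.values():
        single.update(labels)
        pair.update(combinations(sorted(labels), 2))

    # PMI compares the observed rate of coincidence with the rate expected
    # if the two labels were unrelated.
    pmi = {}
    for (a, b), joint in pair.items():
        p_ab = joint / n
        p_a = single[a] / n
        p_b = single[b] / n
        pmi[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return pmi


# Toy data, invented purely for illustration.
observations = [
    (0, "rain"), (0, "umbrella"),
    (1, "rain"), (1, "umbrella"),
    (2, "sun"),
    (3, "sun"), (3, "umbrella"),
]
scores = cooccurrence_pmi(observations)
for labels, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(labels, round(score, 3))
```

In this toy data the pair that coincides more often than chance gets a positive score and the pair that coincides less often than chance gets a negative one.  The scores say nothing about what "rain" or "umbrella" mean; that step is still left to whoever hears the message.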
