
1.4 Kohonen feature maps (new embedded space) vs Histograms of activation

Kohonen Feature Maps are an interesting blend of two processes: they simultaneously create a model and impose a new metric space on that model. They work by taking a simple online KMeans learning function and blending it with a graph.

https://rumble.com/vc7ei8-1.4-kohonen-feature-map.html

So here is online KMeans in action:

A centroid is placed in a metric space. When a data element is observed, the system calculates the distance between the data element and the closest centroid (in my example there is only a single centroid), then moves that centroid in the direction of the data element.
When a new data element is observed, the centroid moves towards the new data element.
Fast forward, and let's put two centroids in play:
Now when a new data element is observed, the two centroids compete: the closest centroid is the 'winner', and only the winner is moved in the direction of the new data element. This plays out very much like classic KMeans (provided the data is stationary and stochastic, i.e. it is random in its nature and arrives in a random fashion such that order is not important [batch processing provides that]).
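The competition described above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the function names (`nearest`, `online_kmeans_step`), the learning rate `lr`, the starting positions and the two synthetic clusters are all hypothetical and are not taken from the video.

```python
import random

def nearest(centroids, x):
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(centroids[i], x)),
    )

def online_kmeans_step(centroids, x, lr=0.1):
    """Winner-take-all update: only the closest centroid moves towards x."""
    w = nearest(centroids, x)
    # Move the winner a fraction lr of the way towards the data element.
    centroids[w] = [c + lr * (v - c) for c, v in zip(centroids[w], x)]
    return w

# Hypothetical data: two clusters, presented in shuffled order.
rng = random.Random(0)
data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(200)]
data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(200)]
rng.shuffle(data)

centroids = [[1.0, 1.0], [4.0, 4.0]]  # arbitrary starting positions
for x in data:
    online_kmeans_step(centroids, x)

print(centroids)  # each centroid should have drifted towards one cluster
```

With a fixed `lr` each centroid behaves like a running average of the data elements it wins, so it keeps jittering around its cluster; decaying `lr` over time is a common way to make it settle.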

Now Prof. T. Kohonen had the following idea: why not link sets of centroids (he called them Neurons) to each other, such that when a 'winner' is identified, not only does the 'winner' move closer to the new data element, but so do its neighbours? The neighbours may not move as quickly or as far, but they move towards the same data element.

The result of this algorithm is a spatially consistent feature map: stimuli, i.e. observations, that are similar (close to each other in the observation space) are represented by Neurons that are close to each other in the Neuronal space.
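To make the neighbour update concrete, here is a minimal sketch of a Kohonen-style step on a one-dimensional chain of Neurons with scalar weights. The Gaussian neighbourhood function, the decay schedules for `lr` and `sigma`, and the name `som_step` are my own assumptions for illustration; other neighbourhood shapes and schedules are used in practice.

```python
import math
import random

def som_step(weights, x, lr=0.2, sigma=1.0):
    """One Kohonen update on a 1-D chain of Neurons with scalar weights."""
    # The winner is the Neuron whose weight is closest to the observation.
    winner = min(range(len(weights)), key=lambda i: abs(weights[i] - x))
    for i in range(len(weights)):
        # Neighbourhood strength depends on distance along the chain
        # (index distance), not on distance in the observation space.
        h = math.exp(-((i - winner) ** 2) / (2 * sigma ** 2))
        # The winner gets h == 1; neighbours move a smaller fraction.
        weights[i] += lr * h * (x - weights[i])
    return winner

rng = random.Random(0)
weights = [rng.random() for _ in range(10)]  # random initial Neurons
steps = 5000
for t in range(steps):
    frac = t / steps
    # Assumed schedule: shrink both the step size and the neighbourhood.
    som_step(weights, rng.random(),
             lr=0.5 * (1 - frac) + 0.01,
             sigma=3.0 * (1 - frac) + 0.5)

print([round(w, 2) for w in weights])
```

On uniform data like this, one would typically expect the printed weights to end up roughly sorted along the chain, which is the spatial consistency described above: Neurons that are adjacent in the chain represent nearby observations.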

When I came across this, I really enjoyed the way the output represented a spatial map of the brain.

But more importantly, the model was now populated with Neurons, not centroids. They carry more information than just a representation of a global minimisation: there is a local element that promotes a bottom-up, self-organising model. This creates an order in the otherwise random assignment of centroids. The activations, i.e. the representations of the observations in the new space, echo the observations in their original metric space.

Dr. Maxim Shoshani taught me to look at histograms of activation: simply count the number of data elements that are classified by each centroid. Simple but very powerful. The histogram can then be analysed without a numeric metric between centroids; this is what I leveraged much later for my deep learning without gradient descent.
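As a small illustration of the counting step, the sketch below assigns each data element to its closest centroid and tallies the assignments. The centroid positions, the data and the function names are hypothetical, chosen only so the counts are easy to check by hand.

```python
from collections import Counter

def nearest(centroids, x):
    """Index of the centroid closest to x (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(centroids[i], x)),
    )

def activation_histogram(centroids, data):
    """Count how many data elements are classified by each centroid."""
    counts = Counter(nearest(centroids, x) for x in data)
    return [counts.get(i, 0) for i in range(len(centroids))]

# Hypothetical centroids and observations.
centroids = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
data = [(0.1, -0.2), (0.3, 0.1), (4.8, 5.1),
        (9.7, 0.4), (5.2, 4.9), (0.0, 0.2)]

hist = activation_histogram(centroids, data)
print(hist)  # [3, 2, 1]
```

Note that the histogram only uses the centroid indices as category labels; nothing in `hist` depends on how far apart the centroids are from each other.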

To summarise:

I now think the generic histogram is the better approach: do not try to impose a numeric metric space (do not create an embedding and then place the Neurons/Centroids within the embedded space). Rather, let the judgment create categories that have no inherent distance metric, then utilise the data to determine the metric. That way a realtime metric can be created based on the current data.

This is an important point. Maybe I should expand on it now, but I will get back to it again. Just to quickly mention two aspects of creating the embedded space: 1) It echoes the original data: things that are similar in the original space are similar in the new embedded space. Sounds good, right? Not really. It keeps the lower level connected to the higher level, so there is less learning going on, since the higher levels are not free from the constraints of the lower level. 2) When the data distribution changes, the model is locked into the original metrics (true, Yoshua Bengio tries to avoid this with 'attention', but why create a problem and then try to work around it?).


