
1.4 Kohonen feature maps (new embedded space) vs Histograms of activation

Kohonen Feature Maps are an interesting blend of two processes: simultaneously creating a model and imposing a new metric space on that model. The algorithm works by taking a simple KMeans learning function and blending it with a graph.

https://rumble.com/vc7ei8-1.4-kohonen-feature-map.html

So here is online KMeans in action:

A centroid is placed in a metric space. When a data element is observed, the system calculates the distance between the closest centroid (in my example there is only a single centroid) and the data element, then moves the centroid in the direction of the data element.
When a new data element is observed, the centroid moves towards the new data element.
Fast forward, and let's put two centroids in play:
Now when a new data element is observed, the two centroids compete: the closest centroid is the 'winner', and only the winner is moved in the direction of the new data element.  This plays out very much like classic KMeans, provided the data is stochastic, i.e. it arrives in a random order so that ordering carries no information (batch processing provides that).
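The competitive update described above can be sketched in a few lines of Python. This is only a minimal illustration of winner-take-all online KMeans; the two clusters, the learning rate, and the number of steps are my own arbitrary choices:

```python
import numpy as np

def online_kmeans_step(centroids, x, lr=0.1):
    """Move only the winning (closest) centroid toward the new data element."""
    dists = np.linalg.norm(centroids - x, axis=1)
    winner = np.argmin(dists)
    centroids[winner] += lr * (x - centroids[winner])
    return winner

# Two competing centroids in a 2-D metric space.
rng = np.random.default_rng(0)
centroids = rng.random((2, 2))
for _ in range(500):
    # Data drawn at random from one of two well-separated clusters,
    # so arrival order carries no information.
    c = rng.integers(2)
    x = (c * 5.0) + rng.normal(0.0, 0.3, size=2)
    online_kmeans_step(centroids, x)
```

After enough observations, each centroid has specialised: one sits near each cluster mean, much as batch KMeans would place them.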

Now Prof. T. Kohonen had the following idea: why not link sets of centroids (he called them Neurons) to each other, so that when a 'winner' is identified, not only does the winner move closer to the new data element, but so do its neighbours.  The neighbours may not move as quickly or as far, but they keep the direction.

The result of this algorithm is that it creates spatially consistent feature maps: stimuli (observations) that are similar, i.e. close to each other in the observation space, are represented by Neurons that are close to each other in the Neuronal space.
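A minimal sketch of this neighbour update, using a 1-D chain of Neurons and uniform 1-D observations (the chain topology, learning rates, and step count are my own illustrative assumptions, not Kohonen's original settings):

```python
import numpy as np

def som_step(neurons, x, lr=0.2, neighbour_lr=0.05):
    """The winner moves toward x; its chain neighbours follow the same
    direction, more slowly."""
    winner = np.argmin(np.linalg.norm(neurons - x, axis=1))
    neurons[winner] += lr * (x - neurons[winner])
    for j in (winner - 1, winner + 1):      # neighbours on the 1-D chain
        if 0 <= j < len(neurons):
            neurons[j] += neighbour_lr * (x - neurons[j])

rng = np.random.default_rng(1)
neurons = rng.random((10, 1))               # 10 Neurons, 1-D observations
for _ in range(2000):
    som_step(neurons, rng.random((1,)))
```

After enough observations the Neurons typically arrange themselves in order along the chain, spread across the data range: nearby stimuli end up represented by nearby Neurons, which is exactly the spatial consistency described above.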

When I came across this, I really enjoyed the way the output represented a spatial map of the brain.

But more importantly, the model was now populated with Neurons, not centroids.  They carried more information than just a representation of a global minimisation: there is a local element that promotes a bottom-up, self-organising model.  This creates an order in the otherwise random assignment of centroids.  The activations, the representations of the observations in the new space, echo the observations in their original metric space.

Dr. Maxim Shoshani taught me to look at histograms of activation: simply count the number of data elements that are classified by each centroid.  Simple, but very powerful.  The histogram can then be analysed without a numeric metric between centroids; this is what I leveraged much later for my deep learning without gradient descent.
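Counting activations is straightforward; here is a small sketch (the toy centroids and data are mine, just to show the shape of the computation):

```python
import numpy as np

def activation_histogram(centroids, data):
    """Count how many data elements each centroid (Neuron) wins."""
    counts = np.zeros(len(centroids), dtype=int)
    for x in data:
        winner = np.argmin(np.linalg.norm(centroids - x, axis=1))
        counts[winner] += 1
    return counts

centroids = np.array([[0.0], [1.0]])
data = np.array([[0.1], [0.2], [0.9], [0.05], [1.1]])
activation_histogram(centroids, data)   # → array([3, 2])
```

Note that the histogram itself makes no use of distances between centroids; it only records which centroid each element activated.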

To summarise:

I now think the generic histogram is the better approach: do not try to impose a numeric metric space (do not create an embedding and then place the Neurons/centroids within the embedded space).  Rather, let the judgment create categories that have no inherent distance metric, then utilise the data to determine the metric.  That way a real-time metric can be created from the current data.
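As a toy illustration of letting the data determine the metric (the category labels and the choice of "distance between empirical means" are my own assumptions, one of many possible data-derived metrics, not a claim about the original method): the categories start with no inherent distance at all, and a distance is computed on demand from whatever data is currently assigned to them.

```python
import numpy as np

def data_driven_distance(labels, data, a, b):
    """Distance between categories a and b, derived only from the data
    currently assigned to them (here: distance between empirical means)."""
    mean_a = data[labels == a].mean(axis=0)
    mean_b = data[labels == b].mean(axis=0)
    return float(np.linalg.norm(mean_a - mean_b))

labels = np.array(["red", "red", "blue", "blue"])   # categories, no metric
data = np.array([[0.0], [0.2], [1.0], [1.2]])
data_driven_distance(labels, data, "red", "blue")   # → 1.0
```

If the data distribution drifts, recomputing the distance from the current data gives an updated metric for free; nothing was baked into an embedding.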

This is an important point; maybe I should expand on it now, but I will get back to it again.  Let me just quickly mention two aspects of creating the embedded space.  1) It echoes the original data: things that are similar in the original space are similar in the new embedded space.  Sounds good, right?  Not really: it keeps the lower level connected to the higher level, so there is less learning going on, since the higher levels are not free from the constraints of the lower level.  2) When the data distribution changes, the model is locked into the original metrics (true, Yoshua Bengio tries to avoid this with 'attention', but why create a problem and then work around it).


