
Posts

No, a penguin is not an ashcan; how to evolve from supervised to semi-supervised learning

Recently Yann LeCun complimented the authors of 'A ConvNet for the 2020s' https://mobile.twitter.com/ylecun/status/1481194969830498308?s=20 https://github.com/facebookresearch/ConvNeXt These statements imply that continued improvements in the metrics of success are indicators that learning is improving.  Furthermore, says LeCun, common sense reinforces the idea that 'helpful tricks' succeed in increasing the learning that occurs in these models.  But are these models learning? Are they learning better? Or have they simply succeeded at overfitting and scoring better without learning anything new? We took a look at what the model learned, not just how it scored on its own metric.  To this end we created a graph with links between each image and its top 5 classifications, with the weight of each link proportional to the score of the class.   Here are the data files: https://github.com/DaliaSmirnov/imagenet_research/blob/main/prediction_df_resnet50.p...
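As a rough sketch of how such a prediction graph could be assembled from the pickled DataFrame; the column names (image, label, score) are assumptions, the actual schema of prediction_df_resnet50.p may differ:

```python
import pickle

import networkx as nx
import pandas as pd

# Load the pickled predictions DataFrame (schema assumed, see above:
# one row per (image, class) pair from the top-5 predictions).
with open("prediction_df_resnet50.p", "rb") as f:
    predictions: pd.DataFrame = pickle.load(f)

# Bipartite graph: image nodes on one side, class nodes on the other.
# Each of an image's top-5 classes gets one edge, weighted by the
# score the model assigned to that class.
G = nx.Graph()
for _, row in predictions.iterrows():
    G.add_edge(f"img:{row['image']}", f"cls:{row['label']}",
               weight=row["score"])
```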
Recent posts

Architecture is the Keystone [Data Architecture 1/3]

We went through a very painful process: we lost time, the most expensive resource on the planet.  We also lost some of our people, yet another extremely painful process. But let me back up a bit in the story. When we started iSkoot there were three core tech challenges: creating a virtual audio driver, transporting the media in real time to an IP-PBX, and scaling the solution.  One of these things is not like the other. Scaling the solution demanded that we pay attention to the architecture of the first two core-tech issues.  Architecture was the keystone of startups back then.  And when we said architecture we meant System & Software Architecture. Today there has been a significant shift in the hi-tech world: systems and software have been replaced by data as the core value of a company and the keystone, that internal magic that binds all the elements into a whole. At Blue dot we lived this shift.  Blue dot started in the age of System Architecture....

Too Much Data -- summary post

Here is a set of summary links:
I) We live in the Information Age https://data-information-meaning.blogspot.com/2019/03/we-live-in-information-age.html
II) Too much data https://data-information-meaning.blogspot.com/2019/03/too-much-data.html
III) Metrics https://data-information-meaning.blogspot.com/2019/03/metrics.html
IV) Abstractions and judgements https://data-information-meaning.blogspot.com/2019/03/abstractions-and-judgements.html
V) How do we know we made a reasonable judgement? https://data-information-meaning.blogspot.com/2019/04/how-do-we-know-we-made-reasonable.html
VI) Scale -- hierarchy and distance metrics https://data-information-meaning.blogspot.com/2019/04/scale-hierarchy-and-distance-metrics.html

VI) Scale -- hierarchy and distance metrics

Judgements are phase transitions; they bring us to higher levels of representation. I wrote the following: https://ashlag-cause-and-kook-affect.blogspot.com/2018/10/the-natural-scale-of-thing.html Here is a snippet: There can be no metric to measure the distance between two things across levels. Why not?  Well, simply:
1. A metric to measure the distance between two elements requires that the elements be in the same dimension.
2. By comparing two elements at different levels we reduce the higher-dimensional object to the lower dimension, which collapses the information in the system (see the toy sketch after this snippet).
Here is an example:
1. Assume I want to measure the health benefits of eating a balanced diet.
2. At a high level I can gather data in a three-dimensional space: carbs, protein and fat, the macro nutrients.
3. I can also gather data at a lower level that is in a higher-dimensional space; we can break down protei...
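A minimal toy sketch of point 2 above: dropping a dimension can make two distinct points indistinguishable, so a distance computed in the lower dimension loses information the higher dimension carried. The numbers are illustrative only:

```python
import numpy as np

# Two diets with identical carbs and protein but very different fat.
a = np.array([1.0, 2.0, 0.0])  # (carbs, protein, fat)
b = np.array([1.0, 2.0, 5.0])

# Distance in the full 3-D space vs. after dropping the fat dimension.
print(np.linalg.norm(a - b))          # 5.0 -- the points are distinct
print(np.linalg.norm(a[:2] - b[:2]))  # 0.0 -- the distinction collapses
```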

How does learning work, how much data is needed to learn, and don't cross the streams!

Learning is the process by which a model is constructed; the model describes a set of observations.  The more compact the model, the better the learning process is considered.  This is manifest in the ability of the model to predict and generalize to out-of-sample data.  But let's not confuse learning with classification.  Again, the essence of learning is the model construction and the condensed representation of the observations. So how many observations, data elements, are required to construct a model?  [Nassim Taleb addresses this question here:  https://arxiv.org/pdf/1802.05495.pdf ] The typical answer, for a Gaussian/normal distribution, is 30 observations. Simple: we construct a model of the mean and variance of the data by calculating the average and variance from our 30 sample observations. Clearly this is not true in all cases; we do not always have simple normal distributions, and in more complex cases we would require more obs...
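A quick illustrative sketch of that rule of thumb: estimate a Gaussian's mean and variance from a 30-point sample and compare against the true parameters. The parameters here are made up for the demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)
true_mu, true_sigma = 5.0, 2.0

# The 'model' is just two numbers estimated from 30 observations.
sample = rng.normal(true_mu, true_sigma, size=30)

print(sample.mean(), "vs true mean", true_mu)
print(sample.var(ddof=1), "vs true variance", true_sigma**2)
```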

1.65 Phase transitions, a measure of learning

Phase transitions, a measure of learning https://rumble.com/vcj8fk-1.65-phase-transitions-a-measure-of-learning.html Let's compare KMeans to faddc. [figures: KMeans vs. faddc] That was quite dramatic; how did we get there? [figures: KMeans vs. faddc] Neat, right? KMeans spreads its representations equally across the entire dataset, minimising the global loss of information. The thing to notice with faddc is that the representation is very stable up to a point; at a certain point there is a dramatic shift in the representation.   Here I graph the derivative energy, the change, the difference between the distortion given each 'k'....
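A rough sketch of how such a derivative-energy curve can be computed. faddc is not publicly available here, so plain scikit-learn KMeans stands in for both curves, and the dataset is synthetic:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

# Distortion (inertia) for each k: total squared distance of points
# to their assigned centroid.
ks = range(1, 11)
distortion = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
              for k in ks]

# Discrete derivative: how much distortion drops per extra centroid.
# A sharp change suggests a phase transition in the representation.
print(np.diff(distortion))
```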

1.6 Phase transitions, a measure of learning

Phase transitions, a measure of learning https://rumble.com/vcg8gw-1.6-phase-transitions-a-measure-of-learning.html Phase transitions demonstrate a loose coupling.  A tight coupling, like in KMeans, mirrors the distortion at each level.  A loose coupling enables the higher level to move at a different pace, disjoint from the lower level.  This separation between levels indicates that the levels represent different descriptions of the data; they speak different languages. It is important to differentiate between the model, the hierarchy, the grammar, and the content.  So in the next series I will do that.  The learning is in the model, not the content.  The content can be memorized; it's the relationships between the content that are learnt. Here is a slightly different dataset; there are at least two apparent scales.  Let's see what KMeans does as we increase 'k': Neat, right? KMeans spreads its representations equally across the entire dataset, minimising the global lo...
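For readers who want to reproduce the setup, here is one assumed way to build a dataset with at least two apparent scales: tight sub-clusters nested inside widely separated super-clusters. The parameters are illustrative, not the post's actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Coarse scale: a few super-clusters spread far apart.
super_centers = rng.uniform(-100, 100, size=(3, 2))

points = []
for c in super_centers:
    # Fine scale: sub-clusters close to each super-center,
    # each holding a tight blob of points.
    sub_centers = c + rng.normal(0, 5, size=(4, 2))
    for s in sub_centers:
        points.append(s + rng.normal(0, 0.5, size=(50, 2)))

X = np.vstack(points)  # feed this to KMeans for increasing 'k'
```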