
How does learning work, how much data is needed to learn, and don't cross the streams!

Learning is the process by which a model is constructed; the model describes a set of observations.  The more compact the model, the better the learning is considered to be.  This is manifest in the ability of the model to predict and generalize to out-of-sample data.  But let's not confuse learning with classification.  Again, the essence of learning is the construction of the model and the condensed representation of the observations.

So how many observations (data elements) are required to construct a model?

[Nassim Taleb addresses this question here: https://arxiv.org/pdf/1802.05495.pdf]

The typical answer, for a Gaussian/normal distribution, is 30 observations.  Simple: we construct a model of the mean and variance of the data by calculating the sample average and sample variance from our 30 observations.
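To make that concrete, here is a minimal sketch (Python with NumPy, entirely synthetic numbers): the "model" of the 30 observations is just two estimated parameters.

# A minimal sketch: "learning" a normal distribution from 30 observations
# by estimating its mean and variance.  The data here are made up.
import numpy as np

rng = np.random.default_rng(0)
observations = rng.normal(loc=5.0, scale=2.0, size=30)  # 30 sample observations

model = {
    "mean": observations.mean(),
    "variance": observations.var(ddof=1),  # unbiased sample variance
}
print(model)  # two numbers now stand in for all 30 observations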

Clearly this is not true in all cases; we do not always have simple normal distributions, and in more complex cases we would require more observations.  But let's assume the magic number of 30 holds.

So what happens when you increase the dimensionality of the data?  Well, the 'curse of dimensionality' takes over: the number of observations required grows exponentially with the number of dimensions, roughly 30^d under our rule of thumb.  So now a relatively simple model of two or three dimensions, for example the macronutrients Carbs/Protein/Fat, would require 900 to 27,000 observations.  This says that if we wanted to describe the effect a diet has on a person and we measured only the macronutrients, we would need a sample size in the tens of thousands.

Now what happens when we go to something like micronutrients?  Well, there are nine essential amino acids alone, and that gives us 30^9, a very large number (19,683,000,000,000).  So even in an ideal case where all other confounding variables were isolated, you would still need an enormous test population (actually two or three of them, since you need control/placebo groups as well).
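To see the numbers, here is a quick back-of-the-envelope sketch in Python; the only assumption is the rule of thumb above, roughly 30 observations per dimension, compounding exponentially.

# Rough sketch: if one dimension needs ~30 observations and the requirement
# grows exponentially with dimension, a d-dimensional problem needs ~30**d.
for d, label in [(1, "single variable"),
                 (2, "two macronutrients"),
                 (3, "Carbs/Protein/Fat"),
                 (9, "nine essential amino acids")]:
    print(f"{label} (d={d}): ~{30**d:,} observations")
# prints 30, 900, 27,000 and 19,683,000,000,000 respectively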

This is why Prof. John Ioannidis says: "Risk-conferring nutritional combinations may vary by an individual’s genetic background, metabolic profile, age, or environmental exposures. Disentangling the potential influence on health outcomes of a single dietary component from these other variables is challenging, if not impossible" [my emphasis] (John P. A. Ioannidis, MD, DSc, "The Challenge of Reforming Nutritional Epidemiologic Research").

What does that mean in practice?  It means that all research based on observational data done today vastly underestimates the amount of data it needs.  Yep, nothing published is good science.  Sugar is bad for you? Fat is good? Eggs? Vaccines?

But wait, you say, that can't be; I know some things work.  Gravity seems to be true, and it is based on observational data (at least at first it was).

There are two answers to this.  First, our subjective definition of truth ('gravity is true') is bolstered by a good argument: we perceive a fact as being true if we have confidence in it, and even bad data science provides confidence.

But the better answer is that when we analyze gravity we simplify the problem space: it ends up being a single-dimensional problem, and we don't need that many observations; thirty is enough.

Wait, gravity is complicated; just measuring the cannonball vs. the musket ball confused lots of people, including Galileo.  There are factors such as wind resistance that confound the variables and confuse the measurements.  But we simplified.
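As a toy illustration of that simplification (Python with NumPy, entirely made-up measurements): once falling is reduced to a one-dimensional relationship between drop height and fall time, ignoring air resistance, about thirty noisy observations recover g quite well.

# Hypothetical illustration: fit g from 30 simulated drop measurements.
import numpy as np

rng = np.random.default_rng(1)
g_true = 9.81
heights = rng.uniform(1.0, 20.0, size=30)                        # drop heights in metres
times = np.sqrt(2 * heights / g_true) + rng.normal(0, 0.02, 30)  # measured fall times (noisy)

# Model: h = 0.5 * g * t^2  ->  least-squares estimate of g
g_est = 2 * np.sum(heights * times**2) / np.sum(times**4)
print(f"estimated g ~ {g_est:.2f} m/s^2")  # close to 9.81 with only 30 observations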

The trick to simplification is abstraction.  Going up a level of representation....

 



