Using features for the specialization of algorithms
A widespread sickness of present “narrow AI” approaches is the almost irresistible urge to set up rigid algorithms that find solutions in an as large as possible search space. This always leads to a narrow search space containing very complex solutions. As we have pointed out many times, we need an algorithm that covers a wide range of simple solutions, rather than going deep into complexity.
This approach however, requires to ability to construct new algorithms that are appropriate for the given task on the fly, since general intelligence means to be able to solve problems without knowing them in advance when the system is built. This on the fly construction is called specialization. We have referred to a similar process of data-dependent search space expansion. After all, if the algorithm at hand covers a narrow search space and the solution lies outside of it, the algorithm has to be modified such that its search space expands toward the solution and possibly narrows down in the opposite direction.
We suggest to use features. After all, enumerating an objects features leads to an incremental narrowing of the set of objects that still fall into the description. For example, if we say, the object is yellow, we have narrowed down the set of possible objects significantly, but there are still quite many. If we add that the object is of the size of a cat, the set is reduced to cat-sized yellow objects, and so on. Similarly, the set can be expanded by releasing some feature constraints.
Imagine a high-dimensional search space and the solution is part of a small subspace. For example, in probabilistic generative models, often a constraint is given by the data, such as \( X+Y=10\), and a solution may be \( X=3, Y=7\). To find the solution, it is much more efficient to parametrize the line represented by the constraint and search through the one-dimensional space rather than the two dimensional one.
When the way we travel through the search space is not fixed, it seems like we land in the field of policy learning like in reinforcement learning. Together with feature induction this reminds very much of AIXI.
The setting is the following. Let a generative model in form of a function \( f(X,Y)=X+Y\) be given and some loss function \( L(X,Y)=\left(f(X,Y)-10\right)^2\). We can always sample from \( X\) and \( Y\). Of course, we could now perform gradient descent or some other stupid, wannabe-general optimization algorithm. Instead, we want to system to construct it’s own algorithm that is well suited for this particular function. And it shall construct a different algorithm for a different function, for example \( g(X,Y)=\sqrt(X^2+Y^2)\). The first algorithm would ideally parametrize a straight line, the second a circle. This is what is meant by specialization.
Now, the point is <strong>not</strong> to do function inversion, e.g. the analytic or numeric construction of \( f^{-1}\) or \( g^{-1}\), since first, it is difficult, and second generally intractable. After all, humans seem to be able to move efficiently around search spaces that are not easily representable in an analytic way. Or are we too confident about not using inverse functions? After all, they are the ones that exhibit the highest compression, and it is compression that we want to be guided by. Further, our incremental compression scheme involves the search for inverse functions. Let’s keep that question open.
Here is how features can come into play. Suppose we have found a few samples with zero or close to zero loss. For example, \( X=2,4,9\) and \( Y=8,6,1\), which form a sequence. Then one could search for an inverse function \( 10-X\) to compute the \( Y\) values in the parameters while have \( f\) as the feature function. However, this is a bad example, since the blind search for a feature inverse is just as hard as inverting the whole function, since this problem seems to have a single feature.
This looks much more difficult than it seemed at first glance.