Learning spatial relations

The big demonstrator that we have in mind is the ability to talk about a line-drawing scene after having extracted various objects from it through compression. That is very ambitious indeed. One project that comes to mind is a kind of object learning, as with children. You point to an object, say its name, and the system learns about it. Or it can ask you about objects. Then you can give commands or ask questions: “put the plate on the table”, “is there something in the plate?”, “cut away a leg of the chair”, “divide the line in half”. It would really be an interactive scene comprehension test. Or take geometrical relations: “If a line intersects two parallel lines, then the angles will be equal.” Here the task is to assess the truth value of a statement. Word learning would also be possible.

How about relations? Spatial relations, for example, will be important: ABOVE, LEFT, IN, ON, AT, ATTACHED. The simplest relations in the one-dimensional case are probably LEFT-OF and RIGHT-OF. When we have identified two objects on the tape, one is to the left of the other. However, this relation will probably be difficult to learn, since it is not salient. But if the objects are attached, then it is much more salient. For example, if the tape is all zeros except for one object 111111 and another 22222, then seeing ..00011111122222000… somewhere in the midst of zeros is quite salient. The relation becomes an additional but optional constraint.
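To make the tape example concrete, here is a minimal sketch in Python. It assumes that an "object" is simply a maximal run of one non-zero symbol, which is a simplification chosen for illustration, and the names find_objects, left_of and attached are hypothetical, not part of any existing system.

```python
from itertools import groupby

def find_objects(tape, background=0):
    """Return (symbol, start, end) for each run of a non-background symbol.

    `end` is exclusive. Treating a run as an object is an assumption
    made for this sketch only.
    """
    objects = []
    pos = 0
    for symbol, run in groupby(tape):
        length = len(list(run))
        if symbol != background:
            objects.append((symbol, pos, pos + length))
        pos += length
    return objects

def left_of(a, b):
    """True if object a ends before (or exactly where) object b starts."""
    return a[2] <= b[1]

def attached(a, b):
    """True if the two objects touch with no background cell in between."""
    return a[2] == b[1] or b[2] == a[1]

# The tape from the text: ...000 111111 22222 000...
tape = [0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 0, 0, 0]
ones, twos = find_objects(tape)
print(left_of(ones, twos), attached(ones, twos))
```

Once the first object is known, ATTACHED together with LEFT-OF leaves exactly one possible start position for the second object, whereas LEFT-OF alone leaves every position to the right of the first object open. That is one way to read the claim that attachment is the more salient, more compressing relation.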

If we could only reproduce Jean Mandler’s “concept primitives”, it would be hilarious!

Hence, the AT(TACHED) relation is simple, since it is a compressing constraint. What about LEFT, RIGHT and BETWEEN? LEFT and RIGHT could be relations that are orthogonal to the AT relation, since they determine independent aspects of the spatial relation. Hence, a spatial relation is simply the determination of a spatial position relative to an object. AT is a fairly compressing relation, since it has only a binary value but significantly narrows the set of residual positions. BETWEEN is simply a combination of LEFT and RIGHT in the 1D case. In 2D things are more complicated. The object in between two objects is somewhere on a line between them, or at least not far away from the set of points defining that line. The details will kill us but so be it. The close-to-line scheme is fairly compressing and hence a useful determinant of a spatial relation. The ON relation is simply an AT plus ABOVE.
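The close-to-line idea for BETWEEN in 2D can be sketched as follows. This is only one possible reading: objects are reduced to single points, "close" is a fixed tolerance, and the function names project and between as well as the tolerance value are assumptions made for illustration.

```python
import math

def project(p, a, b):
    """Project point p onto the line through a and b.

    Returns (t, dist): t is the position of the foot of the perpendicular
    along a->b (0 at a, 1 at b), dist is the perpendicular distance.
    Returns None if a and b coincide, since no line is defined then.
    """
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    len_sq = dx * dx + dy * dy
    if len_sq == 0:
        return None
    t = ((px - ax) * dx + (py - ay) * dy) / len_sq
    foot_x, foot_y = ax + t * dx, ay + t * dy
    return t, math.hypot(px - foot_x, py - foot_y)

def between(p, a, b, tolerance=0.5):
    """True if p lies close to the segment a-b and strictly between its ends.

    `tolerance` is a hypothetical threshold; the text leaves open how
    close is close enough.
    """
    result = project(p, a, b)
    if result is None:
        return False
    t, dist = result
    return 0.0 < t < 1.0 and dist <= tolerance

print(between((1, 0.2), (0, 0), (2, 0)))  # near the middle of the segment
print(between((3, 0.0), (0, 0), (2, 0)))  # on the line, but past the end
print(between((1, 2.0), (0, 0), (2, 0)))  # over the middle, but far away
```

The two conditions mirror the 1D case: the bound on t plays the role of LEFT-of-one and RIGHT-of-the-other along the connecting line, and the distance bound is the extra constraint that only appears in 2D.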

Now comes the really hard one: IN and OUT. It seems to require an understanding that an object defines a subset of the whole space. If your position is part of that subspace, then you are inside that object. And it is fairly compressing. Somehow, a closed path must be imagined along the border of the object. What is the inside of a closed path? Being inside is also related to the question of whether there is a path leading to the outside. Those are complex thoughts. What any path can or cannot do is a type of statement whose truth value would be hard to assess. However, those any-path statements have to be cracked somehow anyway. But let’s skip them for now.
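On a discrete grid, at least, the any-path statement "there is a path leading to the outside" can be checked by brute force. The sketch below assumes a small grid where 1 marks the border of an object and 0 is free space, and it treats the edge of the grid as "the outside"; both are assumptions for illustration, as are the function names. It is a flood fill, not a proposal for how the system should learn IN.

```python
from collections import deque

def reachable_from_border(grid):
    """Return all free cells connected to the grid edge by 4-neighbour steps."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    queue = deque()
    # Seed the search with every free cell on the edge of the grid.
    for r in range(rows):
        for c in range(cols):
            on_edge = r in (0, rows - 1) or c in (0, cols - 1)
            if on_edge and grid[r][c] == 0:
                seen.add((r, c))
                queue.append((r, c))
    # Breadth-first search through free cells.
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] == 0 and (nr, nc) not in seen):
                seen.add((nr, nc))
                queue.append((nr, nc))
    return seen

def is_inside(grid, cell):
    """True if cell is free and no path of free cells leads to the grid edge."""
    r, c = cell
    return grid[r][c] == 0 and cell not in reachable_from_border(grid)

closed_box = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
# Same box with one border cell removed, so the path is no longer closed.
open_box = [row[:] for row in closed_box]
open_box[1][2] = 0

print(is_inside(closed_box, (2, 2)), is_inside(open_box, (2, 2)))
```

The check enumerates all paths implicitly, which is exactly the kind of exhaustive work one would hope a learned IN concept could avoid.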

The main lesson from those considerations is the need for features. The extensions of the present function network that are mentioned in the previous post are not enough. About features, see next post…