Wednesday, October 10, 2007

30 Minutes is not the same as half an hour

No, this isn't wordplay, so stop trying to figure it out that way.
As usual, I have been digging into some stuff, this time related to natural language processing. Meanwhile, I had to estimate a task at work. We usually measure tasks in hours, but this task was small enough to take less than an hour, so I wrote 30 minutes.
After I sent the mail, I kept thinking: why didn't I write half an hour? Inherently, "half an hour" is a rougher estimate than 30 minutes, exactly like measuring yourself as one and a half meters versus 150 cm.
It is about the scale. No, it is about the precision. I've always been fascinated by how reading that the government has spent 30 Billion LE on some project seems normal, while reading that it spent 30,000 Million LE seems huge! Reading that it spent 30,000,000,000 LE would seem gigantic, and 3 * 10^10 means almost nothing. Yet they all represent the same quantity.
What led to this idea was thinking about how an NLP program will ever comprehend this style of understanding, which I believe is closely related to how the human mind evaluates things. It may even be more related to psychology than to reasoning; subconsciously: that number takes that many characters, so it must be huge!
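The "same quantity, different impression" point can be checked mechanically. Here is a minimal Python sketch; the dictionary layout and variable names are my own illustration. It confirms that the four spellings denote one number, and it measures how many characters each spelling takes. Character count is only a crude stand-in for the impression a number leaves, so treat the lengths as raw data rather than proof of the subconscious rule above.

```python
# Four spellings of the same amount (in LE), mapped to the value each denotes.
# The layout and names here are illustrative assumptions, not a standard API.
renderings = {
    "30 Billion": 30 * 10**9,
    "30,000 Million": 30_000 * 10**6,
    "30,000,000,000": int("30,000,000,000".replace(",", "")),
    "3 * 10^10": 3 * 10**10,
}

# All spellings collapse to a single numeric value.
same_quantity = len(set(renderings.values())) == 1

# The surface form, however, differs in how many characters it takes.
lengths = {text: len(text) for text in renderings}

if __name__ == "__main__":
    print("same quantity:", same_quantity)
    for text, n in sorted(lengths.items(), key=lambda kv: kv[1]):
        print(f"{n:2d} characters: {text}")
```

A program that only sees the numeric value loses exactly the information a human reader reacts to, which is the surface form.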
A theory I've been thinking about for a while, in NLP and in Natural Language Understanding specifically, is that most babies know nothing about languages when they are born, yet they function correctly. So language is complementary to learning, but not essential to it. Language only aids learning by adding a new means to it: communication.
So I believe that before any serious attempt is made at a full natural language understanding program, knowledge representation and learning methods need to be formalized first. You can't add communication facilities to a program that will not use them. Well, in fact you can, but it is useless.
Babies see a bottle, and most of the time they hear 'bottle', so according to the famous psychological phenomenon of "Conditioning"[1], both events, seeing a bottle and hearing the word 'bottle', stimulate each other. That's simplified; it is actually a kind of statistical learning. Over time, babies learn to be more sensitive to some phonemes than to others, depending on the inherent probabilities in the language most common in their environment. "Koreans notoriously fail to distinguish 'l' and 'r' sounds"[2].
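To make "kind of statistical learning" concrete, here is a small Python sketch that counts how often a heard word accompanies a seen object and turns the counts into a probability. The observations are made up for illustration, and the function names are my own. This is plain co-occurrence counting, a stand-in I chose, not a claim about how conditioning actually works in the brain.

```python
from collections import Counter, defaultdict

# Made-up observations: (object seen, word heard at that moment).
observations = [
    ("bottle", "bottle"),
    ("bottle", "bottle"),
    ("bottle", "milk"),
    ("ball", "ball"),
    ("bottle", "bottle"),
    ("ball", "bottle"),
    ("ball", "ball"),
]


def learn_associations(pairs):
    """Count, for every seen object, how often each word was heard with it."""
    counts = defaultdict(Counter)
    for seen, heard in pairs:
        counts[seen][heard] += 1
    return counts


def word_probability(counts, seen, heard):
    """Estimate P(heard | seen) as a relative frequency; 0.0 if never seen."""
    words = counts.get(seen)
    if not words:
        return 0.0
    return words[heard] / sum(words.values())


def best_word(counts, seen):
    """Return the word most often heard with the object, or None if unknown."""
    words = counts.get(seen)
    if not words:
        return None
    return words.most_common(1)[0][0]


associations = learn_associations(observations)

if __name__ == "__main__":
    print(word_probability(associations, "bottle", "bottle"))
    print(best_word(associations, "ball"))
```

Noise such as hearing 'milk' while looking at the bottle does not break the association; it only lowers the estimated probability.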
I would guess a similar pattern for vision and motor skills. For example, I might guess that babies see the perceived image as-is, but with the aid of three-dimensional perception they notice certain patterns as they move around, which leads to detecting the boundaries of objects. As they do that over time, they learn the patterns and the colors, and learn to use them to distinguish objects and extract them. I might use the existence of optical illusions as evidence: somewhere in your childhood you have mixed up objects like that. Kids usually play with big cubes and learn the dimensions to sharpen the object-detection neural networks in their brains. And as they grow up, they get exposed to more complicated patterns, like text for example.
So if we assume a simplistic model, in which a human is a combination of sensors, a statistical pattern-matching learning machine, and a non-linear, initially empty knowledge/rule base that is stimulated by the sensors and the pattern-matching machine, then we only need to figure out the inter-dependencies between them, and the abstract functionality contract for each of them, to be able to create a real NLU program sometime.
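As a rough illustration of what an "abstract functionality contract" for each part might look like, here is a toy Python sketch. Every class, method, and parameter name in it is hypothetical. The promotion rule (a pattern becomes a rule after it recurs `threshold` times) is an assumption I picked to keep the example small, and the non-linear aspect of the knowledge base is not modeled at all.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Protocol


class Sensor(Protocol):
    """Contract: turn one moment of the outside world into a symbol."""

    def sense(self, moment: dict) -> str: ...


@dataclass
class ChannelSensor:
    """Toy sensor that reads one named channel (e.g. 'sight') of a moment."""

    channel: str

    def sense(self, moment: dict) -> str:
        return moment[self.channel]


@dataclass
class PatternMatcher:
    """Contract: report how often a combination of percepts has occurred."""

    counts: Counter = field(default_factory=Counter)

    def observe(self, percepts: tuple) -> int:
        self.counts[percepts] += 1
        return self.counts[percepts]


@dataclass
class KnowledgeBase:
    """Starts empty; keeps a pattern once it has recurred often enough."""

    threshold: int = 2  # assumed promotion rule, purely illustrative
    rules: set = field(default_factory=set)

    def consider(self, pattern: tuple, seen_count: int) -> None:
        if seen_count >= self.threshold:
            self.rules.add(pattern)


def run(sensors, matcher, knowledge_base, moments):
    """Wire the parts: sensors feed the matcher, which feeds the rule base."""
    for moment in moments:
        percepts = tuple(sensor.sense(moment) for sensor in sensors)
        knowledge_base.consider(percepts, matcher.observe(percepts))
    return knowledge_base.rules


moments = [
    {"sight": "bottle", "sound": "bottle"},
    {"sight": "bottle", "sound": "milk"},
    {"sight": "bottle", "sound": "bottle"},
]
learned = run(
    [ChannelSensor("sight"), ChannelSensor("sound")],
    PatternMatcher(),
    KnowledgeBase(),
    moments,
)
```

The interesting part is not the toy logic but the seams: each component could be swapped for something much smarter as long as it honors the same small contract.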
What is on the stage now are artificial, hard-coded NLU programs. What I am seeking is a self-learning NLU program that can teach itself languages, and can then teach itself everything. Of course, sensors are needed, and the experiments the system should go through (life experiments) should be available to it. But one more thing that was not in the previous assumptions can destroy all that: the assessment problem.
The assessment problem
On what basis should the program accept or reject a new piece of knowledge? Until now the program is only a pure knowledge-collection machine, stimulated by external events. What's the use? The program must ponder the knowledge it has gained and separate the acceptable pieces from the unacceptable ones, especially since there is a lot of contradiction out there, because the knowledge available in the world is highly affected by personal viewpoints and preferences. How will the program learn to compromise between two conflicting facts? If there are two different viewpoints, how can the program choose one? Should it compromise or select one, or worse yet, select only two from a set of contradictions?
Humans have been forged with a good nature, and such contradictions and wrong choices affect that nature afterwards. It's a matter of trust: which source should the program trust? And on what basis should it grant or revoke trust?
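To show how hard these questions are, here is about the most naive bookkeeping one could write in Python. The class, its methods, the starting score of 0.5, and the step of 0.1 are all arbitrary assumptions of mine. It keeps a trust score per source and prefers the statement backed by the most total trust. It is a sketch of the mechanics only, not an answer.

```python
from collections import defaultdict


class TrustLedger:
    """Naive per-source trust scores, clamped to the range 0.0 to 1.0."""

    def __init__(self, initial=0.5, step=0.1):
        self.initial = initial  # assumed trust for a source never seen before
        self.step = step        # assumed size of one grant or revoke
        self.trust = {}

    def score(self, source):
        return self.trust.get(source, self.initial)

    def grant(self, source):
        """Raise trust in a source, never above 1.0."""
        self.trust[source] = min(1.0, self.score(source) + self.step)

    def revoke(self, source):
        """Lower trust in a source, never below 0.0."""
        self.trust[source] = max(0.0, self.score(source) - self.step)

    def choose(self, claims):
        """Pick the statement whose sources carry the most summed trust.

        claims is a list of (source, statement) pairs.
        """
        support = defaultdict(float)
        for source, statement in claims:
            support[statement] += self.score(source)
        return max(support, key=support.get)


ledger = TrustLedger()
ledger.grant("teacher")    # hypothetical source that proved right before
ledger.revoke("tabloid")   # hypothetical source that proved wrong before
chosen = ledger.choose([
    ("tabloid", "the bottle is empty"),
    ("teacher", "the bottle is full"),
])
```

Notice what the sketch quietly assumes: someone already told the ledger when to call `grant` and when to call `revoke`. Where that first signal comes from is exactly the open question.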
This is far beyond the scope of NLU or machine learning. This.. is.. humanity! Yes, this is humankind. No application whatsoever will be able to reach that state consistently and independently. As you might have noticed, this is not first-order knowledge. They might create applications that understand first-order knowledge better than humans do (i.e. some advanced sort of database), but they will never create a program that can reach arbitrary orders of knowledge, at least not one that comes close to the human. Humans will stay on top in that.

This article is related to the meta-ization phenomenon that I have wanted to write about for a long time; I don't recall having written it yet.

References:
[1] http://www.alleydog.com/101notes/conditioning.html
[2] http://www.asian-efl-journal.com/dec_03_sub.K1.php

4 comments:

Hossam Sadik said...

Nice article, Nabil .. although every day you prove that we underestimate how much of a geek you are :D

Mohammad Alaggan said...

Thank God it didn't turn out to be an insult :D Ever since I found out there was a comment from Hossam, I had a feeling it would be an insult, I don't know why :D
Anyway, thanks for your comment, boss :D

Ahmed Essawy said...

That's the same thing you told me about memorizing words in other languages :)

Mohammad Alaggan said...

exactly :)