Problem Statement: Automatic semantic image annotation is becoming increasingly important because naive users communicate with an
image database through high-level semantic visual concepts, not through the low-level visual features used by traditional
content-based image retrieval systems. Unfortunately, automatic semantic image annotation remains very challenging because of the
following problems:

(a) How do human beings interpret the semantic meaning of an image? From the above figure, one can see that human beings may
interpret the semantic meaning of images according to individual objects, object-object relationships, or global information. Some
researchers have argued that human beings recognize and interpret an image based on its objects, while others have claimed that human
beings recognize an image without using its details. How exactly do humans interpret the semantic meaning of an image? How much input
from psychological and psychophysical studies is needed?

(b) How can we simulate the human visual system to interpret the semantic concepts of an image? This is very hard, if not impossible,
because there is a semantic gap between semantic visual concepts and the low-level visual features that a computer can calculate
automatically. What rule should our techniques follow to do this? Learning from examples?

(c) How do naive users specify their query concepts, and how does the database answer their queries? Most naive users may not have
examples with which to specify their query concepts, and some may not know exactly what they want at all. How can a semantic image
annotation system support a more efficient search engine?

"You push the button, we do the rest" ---Slogan for Eastman Kodak---