Text documents in databases are now easy to access. A natural next question is whether we can
access images or videos through their visual content. A simple approach to image or video database access is
to ask people to annotate the images or videos in a database with keywords based on their own
understanding, and then to index the images or videos by these keywords. This transforms the visual
database (image and video database) access problem into the traditional text database access problem. Unfortunately,
different people understand the semantic meaning of the same image or video differently, and even the same person may
perceive the same image or video differently in different situations. It is therefore important to support
image and video access through visual perception. This course focuses on how to support content-based image
or video database access through visual perception rather than through keywords from manual text annotation.
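To make the idea of content-based access concrete, here is a minimal sketch that ranks images by color similarity instead of keywords. It is an illustration only, not the method taught in the course: the coarse RGB histogram, the histogram-intersection score, the function names, and the toy pixel lists standing in for decoded images are all assumptions.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Normalized coarse RGB histogram of a list of (r, g, b) pixels in 0..255."""
    step = 256 // bins_per_channel
    top = bins_per_channel - 1
    counts = Counter(
        (min(r // step, top), min(g // step, top), min(b // step, top))
        for r, g, b in pixels
    )
    total = sum(counts.values())
    return {bin_id: n / total for bin_id, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: 1 for identical histograms, 0 for disjoint ones."""
    return sum(min(v, h2.get(k, 0.0)) for k, v in h1.items())


def rank_by_similarity(query_pixels, database):
    """Return image names sorted from most to least similar to the query."""
    q = color_histogram(query_pixels)
    scores = {
        name: histogram_intersection(q, color_histogram(pixels))
        for name, pixels in database.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy "images": flat lists of pixels, no annotation needed.
database = {
    "sunset": [(250, 120, 30)] * 90 + [(20, 20, 60)] * 10,
    "forest": [(30, 140, 40)] * 80 + [(90, 60, 30)] * 20,
    "ocean": [(20, 60, 200)] * 100,
}
query = [(240, 110, 20)] * 70 + [(30, 30, 50)] * 30
ranking = rank_by_similarity(query, database)
print(ranking[0])
```

The query is matched to the database image whose color distribution overlaps it most, so no annotator has to agree on a keyword. Real systems combine many more features (texture, shape, motion) than this single color cue.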
Course Outline:
Image and video coding standards, such as JPEG, JPEG2000, MPEG-1, MPEG-2, MPEG-4;
Image and video description standards: MPEG-7 and XML;
Image and video analysis techniques;
Image search engines;
Video search engines;
Industry issues in image and video search engines;
Image and video streaming over networks;
Visual database security issues;
Current research issues in image and video search engines;
Open discussion: Visual Databases: Who cares?
Suggested Textbooks (optional):
A. Rosenfeld, D. Doermann, D. DeMenthon, "Video Mining", Kluwer Academic Publishers, 2003.
Yihong Gong, W. Xu, "Machine Learning for Multimedia Content Analysis", Springer, 2007.
Project demonstration (Dec. 1, 2008): (a) shot detection (Li Yu, Debamit Dutta, Zhiyong Guo); (b) semantic image classification (Daniel McIntyre,
Sean Crilley, Yinbo Li, Laura Vandivier, Wenwen Dou).
Course Projects:
If you pick one of the following specific topics for presentation: Informedia at CMU,
Projects at Columbia University, Projects at University of Amsterdam, Concept Ontology for
Text Classification, Concept Ontology for Multimedia Classification, Volume-Based Video
Representation, Video/Image Visualization, you will read all of the given papers,
summarize them, and give a presentation of at least 1.2 hours. Before you do that, you must
discuss your choice with the course instructor first.
If you pick one of the other papers for a half-hour presentation, then you need to do only
one of the projects shown below.
Otherwise, you will do both of the projects shown below.
Project one: Automatic Salient Object Detection and Image Classification (source code
for image segmentation will be provided)
Project two: Automatic Video Shot Detection from MPEG Video Streams (MPEG decoder source
code will be provided)
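As a starting point for thinking about shot detection, the sketch below declares a cut wherever the gray-level histograms of two consecutive frames differ strongly. It is only a simple baseline under stated assumptions, not the required project solution: it works on already-decoded frames (flat lists of gray values 0..255) rather than on the MPEG stream itself, and the function names, the 16-bin histogram, and the threshold of 0.5 are hypothetical choices.

```python
def gray_histogram(frame, bins=16):
    """Normalized gray-level histogram of a frame given as a list of 0..255 values."""
    step = 256 // bins
    hist = [0] * bins
    for value in frame:
        hist[min(value // step, bins - 1)] += 1
    total = len(frame)
    return [count / total for count in hist]


def detect_shot_boundaries(frames, threshold=0.5):
    """Return each index i where a cut is declared between frame i-1 and frame i."""
    boundaries = []
    previous = None
    for i, frame in enumerate(frames):
        current = gray_histogram(frame)
        if previous is not None:
            # L1 distance between normalized histograms lies in [0, 2].
            distance = sum(abs(a - b) for a, b in zip(previous, current))
            if distance > threshold:
                boundaries.append(i)
        previous = current
    return boundaries


# Hypothetical toy sequence: three dark frames, then two bright frames.
dark = [20] * 64
dark_noisy = [20] * 60 + [40] * 4
bright = [220] * 64
frames = [dark, dark_noisy, dark, bright, bright]
cuts = detect_shot_boundaries(frames)
print(cuts)
```

A fixed threshold like this handles abrupt cuts but tends to miss gradual transitions such as fades and dissolves, and it can fire on camera flashes. Those cases are where more careful designs are needed.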