Fall 2008 ITCS6157: Visual Databases

What we will study:

Accessing text documents in databases is easy today. But people never stop dreaming: can we access images or videos through their visual content? A simple way to support access to image or video databases is to ask people to annotate the images or videos with keywords based on their own understanding, and then index the images or videos by those keywords. This turns the visual database (image and video database) access problem into a traditional text database problem. Unfortunately, different people understand the semantic meaning of the same image or video differently, and even the same person may perceive the same image or video differently in different situations. It is therefore very important to support image and video access via visual perception. In this course, we will focus on how to support content-based access to image and video databases via visual perception rather than via manually annotated keywords.

Course Outline:

  • Image and video coding standards, such as JPEG, JPEG2000, MPEG-1, 2, 4;
  • Image and video description standards: MPEG-7 and XML;
  • Image and video analysis techniques;
  • Image search engines;
  • Video search engines;
  • Industry issues in image and video search engines;
  • Image and video streaming over networks;
  • Visual database security issues;
  • Current research issues in image and video search engines;
  • Open discussion: Visual Databases: Who cares?

Suggested Textbooks (optional):

  • A. Rosenfeld, D. Doermann, D. DeMenthon, "Video Mining", Kluwer Academic Publishers, 2003.
  • Yihong Gong, W. Xu, "Machine Learning for Multimedia Content Analysis", Springer, 2007.
  • others
  • Articles and journal papers.

Grading Format:

  • Understanding research papers 10%
  • Project and presentation 25%
  • Midterm and final tests 65%

Classroom: Woodward Hall 130

Class Time: Monday 11AM-2PM

Instructor Office Hours:

  • Monday 3:00PM-6:00PM or by appointment

Mid-Term Test: Oct. 27, 2008

Papers for Student Presentation:

  • Informedia Project at CMU: Paper 1, Paper 2, Paper 3, Paper 4; presented by Protik Maitra, Oct. 20, 2008
  • Project at Columbia University: Paper 1, Paper 2, Paper 3, Paper 4, Paper 5; presented by Swapna Savant, Oct. 20, 2008
  • Project at University of Amsterdam: Paper 1, Paper 2, Paper 3, Paper 4; presented by Siddharth Palaniswami, Nov. 3, 2008
  • Concept Ontology for Text Classification: Paper 1, Paper 2, Paper 3, Paper 4; presented by Bhumika Thakker, Nov. 3, 2008
  • Concept Ontology for Multimedia Classification: Paper 1, Paper 2, Paper 3, Paper 4; presented by Garima Jain, Nov. 10, 2008
  • Volume-Based Video Representation: Paper 1, Paper 2, Paper 3, Paper 4; presented by Anuradha Venkataraman, Nov. 10, 2008
  • Video/Image Visualization: Paper 1, Paper 2, Paper 3, Paper 4; presented by Vishal Sheth, Nov. 17, 2008
  • Project implementation reports: Nov. 16
  • "Sharing visual features for multiclass and multiview object detection", presented by Vikram Kalmegh, Nov. 17, 2008
  • S. Tong, E. Chang, "Support Vector Machine Active Learning for Image Retrieval", ACM Multimedia, presented by Morris S. LeBlanc, Nov. 17, 2008
  • K. Barnard, D. Forsyth, "Learning the Semantics of Words and Pictures", ICCV, presented by Swarupsingh Baran, Nov. 24, 2008
  • A. Mojsilovic, et al., "Matching and Retrieval Based on the Vocabulary and Grammar of Color Patterns", IEEE Trans. on Image Processing,
  • C. Carson, et al., "Blobworld: Image Segmentation using EM and its Application to Image Querying", IEEE Trans. on PAMI,
  • Y. Chen and J. Wang, "Image Categorization by Learning and Reasoning with Regions", J. of Machine Learning Research, 2004; Paper 2, Paper 3; presented by Darshit Parekh, Nov. 24, 2008
  • A. Oliva, et al., "Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope", Int. J. of Computer Vision,
  • W.H. Adams, et al., "Semantic Indexing of Multimedia Content using Visual, Audio, and Text Cues", EURASIP, presented by Sriram Tata, Nov. 24, 2008
  • M. Naphade et al., "A Factor Graph Framework for Semantic Video Indexing", IEEE Trans. on CSVT,
  • Z. Ghahramani et al., "Factorial Hidden Markov Models", Machine Learning.
  • Project demonstration (Dec. 1, 2008): (a) shot detection (Li Yu, Debamit Dutta, Zhiyong Guo); (b) semantic image classification (Daniel McIntyre, Sean Crilley, Yinbo Li, Laura Vandivier, Wenwen Dou).

Course Projects:

  • If you pick one of the following topics for presentation (Informedia at CMU, Projects at Columbia University, Projects at University of Amsterdam, Concept Ontology for Text Classification, Concept Ontology for Multimedia Classification, Volume-Based Video Representation, Video/Image Visualization), you will read all of the given papers, summarize them, and give a presentation of at least 1.2 hours. You must discuss this with the course instructor first.
  • If you pick one of the other papers for a half-hour presentation, you need to do only one of the projects listed below.
  • Otherwise, you will do both projects listed below.
  • Project one: Automatic Salient Object Detection and Image Classification (source code for image segmentation will be provided)
  • Project two: Automatic Video Shot Detection from MPEG Video Streams (MPEG decoding source code will be provided)
  • Project Description

Schedule:

  • Introduction: Topics and issues
  • Introduction
  • JPEG Image Coding Standard
  • Content-Based Image Analysis
  • Semantic Image Classification Techniques
  • Personalized Image Recommendation
  • Relevance Feedback for Image Retrieval
  • CBIR: challenges and chances
  • CBIR: challenges and chances
  • MPEG Video Coding Standard
  • Video Shot Detection From MPEG Bit Stream
  • Video Object Detection for Video Database Indexing
  • Semantic Video Classification Techniques
  • Personalized News Recommendation
  • MPEG-4 Object-Based Video Coding Standard
  • Traditional Techniques for Video Database Indexing
  • Bayesian Learning for Classification
  • Multimedia System Design
  • Network Protocols for Video Transmission
  • QoS Control for Video Transmission
  • Video Transcoding for Adaptive Streaming
  • Database and System Security Issues
  • Security Design for Video Database System
  • Existing Systems Introduction: IBM QBIC
  • Student Presentations (schedule will be given after assignment).
  • Challenges for Visual Database: Open Discussion
  • Final Test!

  • Do what you can, with what you have, where you are! ---Theodore Roosevelt---