Content-based Indexing and Retrieval of Videos

The last decade has witnessed a tremendous increase in digital multimedia data, including images, audio and videos, as well as news, entertainment and sports channels. Such enormous video collections pose new challenges for developing smart content retrieval systems that let users retrieve the desired content efficiently and effectively. Traditional video search engines match the query words against user-assigned tags. Such systems do not take into account the actual content of the video, which can itself serve as a semantic index for robust and efficient retrieval. Content-based search engines, on the other hand, exploit the rich visual, audio and textual content to index and subsequently retrieve videos. Among these, our current product focuses on the textual content (news tickers, scorecards, credits, etc.) and the audio content (spoken words). More specifically, we have developed a comprehensive video retrieval system that allows users to enter query keywords and retrieve all videos in which a keyword appears, either in speech or in the form of caption text. State-of-the-art deep learning-based algorithms are applied for text detection and recognition, while an off-the-shelf API is employed for speech recognition. Taking local needs into account, we have targeted text and audio in two languages, Urdu and English, but the system can easily be extended to other languages as well. Once the videos are indexed, users can provide a query keyword, and all the videos (and frames) in which the keyword appears are returned. The system is developed as a desktop application, and retrieval can also be supported through the web if required.
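
The keyword lookup described above can be illustrated with a small inverted index. This is only a minimal sketch, not the project's actual implementation: it assumes that the text recognizer and the speech recognizer have already produced plain strings, and the names `Occurrence`, `KeywordIndex`, `add` and `search`, as well as the sample file names, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    video: str       # hypothetical video identifier (e.g. a file name)
    position: float  # time in seconds at which the keyword was found
    source: str      # "caption" (on-screen text) or "speech"


class KeywordIndex:
    """Inverted index mapping each keyword to the places it occurs."""

    def __init__(self):
        self._postings = defaultdict(list)

    def add(self, video, position, source, text):
        # Whitespace tokenization works for both Urdu and English;
        # casefold() only changes scripts that have letter case.
        for token in text.split():
            self._postings[token.casefold()].append(
                Occurrence(video, position, source))

    def search(self, keyword):
        # Return a copy so callers cannot modify the index.
        return list(self._postings.get(keyword.casefold(), []))


# Assumed recognizer outputs, used here only as sample data.
index = KeywordIndex()
index.add("news_01.mp4", 12.0, "caption", "Breaking News Islamabad")
index.add("news_01.mp4", 15.5, "speech", "reports from Islamabad today")
index.add("sports_07.mp4", 3.0, "caption", "پاکستان 250/4")

hits = index.search("islamabad")
for hit in hits:
    print(hit.video, hit.position, hit.source)
```

Storing the source alongside each occurrence lets a single query return matches from both caption text and speech, which mirrors the behavior described above. A real system would likely need language-aware tokenization and tolerance for recognition errors, which this sketch omits.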

Funding Body: IGNITE, National Technology Fund.
Partner: Associated Press of Pakistan (APP)

Project Team

  • Dr. Imran Siddiqi – Principal Investigator
  • Dr. Shehzad Khalid – Co-Principal Investigator
  • Dr. Ahmad Salman – Team Lead (Audio)
  • Muhammad Atif
  • Osama Zeeshan
  • Muhammad Numan Khan
  • Mazahir Hussain
  • Syed Ghulam Mustafa
  • Wajahat Nawaz
  • Ghulam Ali Mirza – PhD Scholar
  • Umar Hayat – MS Student
  • Shanza Ejaz – MS Student
  • Hamayoun Ahmad – MS Student
  • Saim Danish – BS Student
  • Manghangir Sham Sunder – BS Student
  • Ashbal Sohail – BS Student
  • Danish Yasin – BS Student

Publications

  1. Mirza, A., Zeshan, O., Atif, M., & Siddiqi, I. (2020). Detection and recognition of cursive text from video frames. EURASIP Journal on Image and Video Processing, 2020(1), 1-19.
  2. Mirza, A., & Siddiqi, I. (2020). Recognition of cursive video text using a deep learning framework. IET Image Processing, 14(14), 3444-3455.
  3. Mirza, A., Siddiqi, I., Hayat, U., Atif, M., & Mustufa, S. G. (2020, October). Recognition of Cursive Caption Text Using Deep Learning-A Comparative Study on Recognition Units. In International Conference on Pattern Recognition and Artificial Intelligence (pp. 156-167). Springer, Cham.
  4. Mirza, A., Siddiqi, I., Mustufa, S. G., & Hussain, M. (2019, July). Impact of pre-processing on recognition of cursive video text. In Iberian Conference on Pattern Recognition and Image Analysis (pp. 565-576). Springer, Cham.
  5. Hayat, U., Aatif, M., Zeeshan, O., & Siddiqi, I. (2018, November). Ligature recognition in Urdu caption text using deep convolutional neural networks. In 2018 14th International Conference on Emerging Technologies (ICET) (pp. 1-6). IEEE. (Best Paper Award)
  6. Mirza, A., Fayyaz, M., Seher, Z., & Siddiqi, I. (2018, March). Urdu caption text detection using textural features. In Proceedings of the 2nd Mediterranean Conference on Pattern Recognition and Artificial Intelligence (pp. 70-75). (Best Paper Award)

Downloads

Ground Truth Labeling Tool

  • The ground truth labeling software, along with its user manual, can be downloaded here.
  • The UTiV dataset can be downloaded here after signing the license agreement.

Project Demo

Output of Text Detector