Indexed Dataset from YouTube for a Content-Based Video Search Engine

Adly, Ahmad Sedky; Hegazy, Islam; Elarif, Taha; Abdelwahab, M. S.

doi:10.21608/ijicis.2021.68816.1072

Document Type : Original Article

Authors

¹ Misr University for Science and Technology

² Faculty of Computer and Information Sciences

³ Computer Science Dep., Faculty of Computer and Information Sciences, Ain Shams University- Egypt

⁴ Computer Science Dept. Faculty of Info. Technology Misr University for Science & Technology

Abstract

Much of the research on content-based video indexing and retrieval, as well as on video search engines, depends on large-scale video datasets. Unfortunately, a decline in the availability of open-source datasets has complicated the exploration of novel approaches. Existing video datasets that index video files hosted on public video streaming services serve other purposes, such as annotation, learning, classification, and other computer vision tasks, and show little interest in indexing public video links for the purpose of search and retrieval. This paper introduces a novel large-scale dataset based on YouTube video links to evaluate the proposed content-based video search engine. The dataset gathers 1088 videos, representing more than 65 hours of video, 11,000 video shots, and 66,000 unmarked and marked keyframes, with 80 different object names used for marking. Moreover, a state-of-the-art feature vector and combination-based matching are introduced, which benefit the accuracy, speed, and precision of the video retrieval process. Each video record in the dataset is represented by three features: a temporal combination vector, an object combination vector with shot annotations, and 6 keyframes, alongside other metadata. Video classification was also applied to the dataset to improve the efficiency of retrieval for video-based queries. A two-phase approach based on object and event classification was used, storing video records in aggregations related to the extracted feature vectors. Object aggregation stores each video record under the object/concept that occurs most often across all of its shots, while event aggregation classifies video records into groups according to the number of shots per video. This study indexed 58 of the 80 object/concept categories, each with 9 shot-number groups.
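
The two-phase aggregation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `VideoRecord` fields, the function names, and in particular the shot-count boundaries (a hypothetical width of 5 shots per group, capped at group 9) are assumptions made only for illustration, since the abstract states that there are 9 shot-number groups but not how they are delimited.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field

NUM_SHOT_GROUPS = 9   # stated in the abstract
GROUP_WIDTH = 5       # ASSUMPTION: hypothetical bucket width, not from the paper


@dataclass
class VideoRecord:
    """Hypothetical, simplified video record (field names are assumptions)."""
    url: str
    # One list of detected object/concept names per shot.
    shot_objects: list = field(default_factory=list)


def dominant_object(record):
    """Object phase: the object/concept with the most occurrences over all shots."""
    counts = Counter(name for shot in record.shot_objects for name in shot)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def shot_group(num_shots):
    """Event phase: map a shot count to one of 9 groups (boundaries assumed)."""
    if num_shots < 1:
        raise ValueError("a video must contain at least one shot")
    return min((num_shots - 1) // GROUP_WIDTH + 1, NUM_SHOT_GROUPS)


def build_index(records):
    """Store each record's URL under the key (dominant object, shot group)."""
    index = defaultdict(list)
    for record in records:
        key = (dominant_object(record), shot_group(len(record.shot_objects)))
        index[key].append(record.url)
    return index


records = [
    VideoRecord("video-a", [["car", "person"], ["car"], ["dog"]]),
    VideoRecord("video-b", [["person"]] * 12),
    VideoRecord("video-c", [["dog"]] * 60),
]
index = build_index(records)
```

Under these assumptions, a video-based query only needs to be compared against the records stored under the same (object, shot group) key, which is one way such aggregations could narrow the search space.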

Keywords