
Object detection and traffic prediction using Deep Learning on compressed road images and videos

Abstract : This PhD thesis is a CIFRE thesis carried out with Actemium Paris Transport, a company that operates in the field of Intelligent Transport Systems (ITS) and, in particular, provides solutions for the surveillance of road tunnels. In the thesis, we address the learning of efficient deep learning models that directly process compressed images/videos, to lower the computational resource requirements and to allow for large-scale deployment of the solutions. More specifically, we target two types of compression, JPEG image compression and MPEG4 part-2 video compression, for two specific applications: object detection and traffic flow rate estimation. The first contribution focuses on object detection in JPEG compressed images. As the JPEG algorithm compresses the images from a spatial representation into a tiled frequency space, the main challenge is to design detection models able to correctly estimate the position of objects based on the frequency representation. Using JPEG compressed images as inputs, we investigate deep learning architectures for object detection and demonstrate a 1.7× speed-up at detection time, while only reducing the detection performance by 5.5%. Moreover, we empirically demonstrate that only part of the compressed information, namely the luminance component, is required to match the accuracy of the full-input methods. Our second contribution addresses the problem of estimating the flow rate (number of vehicles per unit of time) from MPEG4 part-2 compressed video streams from road surveillance cameras. The MPEG4 part-2 compression algorithm uses a coarse representation of the pixel flow across frames to reduce the size of the videos to be encoded. This approximate flow representation therefore appears relevant for estimating the flow rate, while reducing the computation and memory requirements. We propose multiple end-to-end deep learning architectures using this coarse pixel flow representation as input. Using these models, we demonstrate that the flow rate can be predicted directly from MPEG4 part-2 compressed video streams, with improved accuracy compared to a more classical RGB-based model. We also show a 3200× speed-up. Furthermore, as training data may be scarce due to practical constraints, we explore domain adaptation to transfer learned models from one camera to another and provide a thorough analysis of the constraints that may impede such transfer.
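To illustrate the "tiled frequency space" mentioned in the abstract, here is a minimal Python sketch of the 8×8 block DCT that JPEG applies to each channel, shown on a luminance plane only. It is not the thesis's pipeline: the function names `dct_matrix` and `blockwise_dct` are hypothetical, and quantization and entropy coding, which a real JPEG codec also performs, are left out. It assumes an 8-bit luminance array whose height and width are multiples of 8.

```python
import numpy as np

BLOCK = 8  # JPEG tile size


def dct_matrix(n: int = BLOCK) -> np.ndarray:
    """Orthonormal DCT-II basis matrix of size n x n (hypothetical helper)."""
    k = np.arange(n).reshape(-1, 1)  # frequency index, one per row
    x = np.arange(n).reshape(1, -1)  # sample index, one per column
    mat = np.cos(np.pi * (2 * x + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    mat[0, :] = np.sqrt(1.0 / n)     # the DC row has its own normalization
    return mat


def blockwise_dct(luma: np.ndarray) -> np.ndarray:
    """Return DCT coefficients with shape (H/8, W/8, 8, 8), one 8x8 grid per tile."""
    h, w = luma.shape
    if h % BLOCK or w % BLOCK:
        raise ValueError("height and width must be multiples of 8")
    d = dct_matrix()
    # Cut the image into tiles: (H/8, 8, W/8, 8) -> (H/8, W/8, 8, 8).
    tiles = luma.reshape(h // BLOCK, BLOCK, w // BLOCK, BLOCK).swapaxes(1, 2)
    # JPEG level shift: center 8-bit samples around zero before the transform.
    tiles = tiles.astype(np.float64) - 128.0
    # Separable 2-D DCT applied to every tile at once: D @ tile @ D^T.
    return d @ tiles @ d.T


# Usage: a 16x24 luminance plane becomes a 2x3 grid of 8x8 coefficient tiles.
luma = np.full((16, 24), 136, dtype=np.uint8)
coeffs = blockwise_dct(luma)
```

The first two axes of the result still index tile positions in the image, so coarse spatial layout survives at 1/8 resolution, while the last two axes hold frequencies within each tile. This is the mixed spatial/frequency layout from which a detector working on compressed inputs has to recover object positions.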
Contributor : ABES STAR
Submitted on : Monday, March 28, 2022 - 12:38:10 PM
Last modification on : Tuesday, March 29, 2022 - 4:07:28 AM
Long-term archiving on : Wednesday, June 29, 2022 - 6:45:39 PM


Version validated by the jury (STAR)


  • HAL Id : tel-03621557, version 1


Benjamin Deguerre. Object detection and traffic prediction using Deep Learning on compressed road images and videos. Computer Vision and Pattern Recognition [cs.CV]. Normandie Université, 2021. English. ⟨NNT : 2021NORMIR28⟩. ⟨tel-03621557⟩


