
Please use this identifier to cite or link to this item: http://10.10.120.238:8080/xmlui/handle/123456789/685
Full metadata record
DC Field | Value | Language
dc.contributor.author | Panda M.K. | en_US
dc.contributor.author | Subudhi B.N. | en_US
dc.contributor.author | Veerakumar T. | en_US
dc.contributor.author | Jakhetiya V. | en_US
dc.date.accessioned | 2023-11-30T08:45:22Z | -
dc.date.available | 2023-11-30T08:45:22Z | -
dc.date.issued | 2023 | -
dc.identifier.issn | 2691-4581 | -
dc.identifier.other | EID(2-s2.0-85166773512) | -
dc.identifier.uri | https://dx.doi.org/10.1109/TAI.2023.3299903 | -
dc.identifier.uri | http://localhost:8080/xmlui/handle/123456789/685 | -
dc.description.abstract | Background subtraction is an essential step in many computer vision tasks. In this article, we present a unique approach to detecting local changes in challenging video scenes by exploring the capabilities of an encoder-decoder network that employs a modified ResNet-152 architecture with a multi-scale feature extraction framework. The proposed encoder consists of a modified ResNet-152 network in which the initial two blocks are frozen and the weights of the last blocks are learned through a transfer-learning mechanism. This encoder reduces the computational complexity of the proposed model and extracts both fine- and coarse-scale features. We propose a multi-scale feature extraction (MFE) block, a hybrid of a pyramidal pooling architecture (PPA) and various atrous convolutional layers, in which the high-level features from the encoder network are used to extract features at multiple scales. The use of PPA in the MFE block preserves the maximum value in every pooling area, retaining the contextual relationship between pixels in complex video frames so that various challenging scenes can be handled. The proposed decoder consists of stacked transposed convolution layers that learn a mapping from feature space to image space, predicting a score map. A threshold is then applied to the score map to obtain the binary class labels, background and foreground. Shortcut connections followed by global average pooling (GAP) carry the low-level feature coefficients from the encoder to the decoder to enhance the feature representation. The performance of the proposed scheme is validated by testing it against thirty-one state-of-the-art techniques. The results obtained by the proposed method are corroborated both qualitatively and quantitatively. Further, the efficacy of the proposed algorithm is verified on an unseen video setup and is found to provide better performance. IEEE | en_US
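Two of the building blocks the abstract describes can be illustrated compactly: pyramidal pooling that keeps the maximum value in every pooling area at several grid scales, and thresholding a predicted score map into background/foreground labels. The sketch below is a minimal NumPy illustration of those two ideas only; the function names, scales, and threshold value are illustrative assumptions, not the authors' implementation (which uses a modified ResNet-152 encoder-decoder).

```python
import numpy as np

def pyramid_max_pool(feature_map, scales=(1, 2, 4)):
    """Illustrative pyramidal pooling: for each scale s, tile the map
    into an s x s grid of pooling areas and keep the maximum value in
    each area (one pooled grid per scale)."""
    h, w = feature_map.shape
    pooled = []
    for s in scales:
        bh, bw = h // s, w // s
        # Crop so the map divides evenly into an s x s grid of areas.
        cropped = feature_map[: bh * s, : bw * s]
        blocks = cropped.reshape(s, bh, s, bw)
        pooled.append(blocks.max(axis=(1, 3)))  # s x s grid of maxima
    return pooled

def threshold_score_map(score_map, tau=0.5):
    """Binarize a decoder score map into labels:
    1 = foreground (local change), 0 = background."""
    return (score_map >= tau).astype(np.uint8)
```

At scale 1 the pooled grid is the global maximum; larger scales preserve coarser spatial context, which is the contextual-relationship property the abstract attributes to the PPA.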
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US
dc.source | IEEE Transactions on Artificial Intelligence | en_US
dc.subject | Background subtraction | en_US
dc.subject | Computer architecture | en_US
dc.subject | Computer vision | en_US
dc.subject | Convolutional neural networks | en_US
dc.subject | Decoding | en_US
dc.subject | Deep neural network | en_US
dc.subject | Feature extraction | en_US
dc.subject | Multi-scale feature extraction block | en_US
dc.subject | Object detection | en_US
dc.subject | Surveillance | en_US
dc.title | Modified ResNet-152 Network With Hybrid Pyramidal Pooling for Local Change Detection | en_US
dc.type | Journal Article | en_US
Appears in Collections: Journal Article

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.