
Please use this identifier to cite or link to this item: http://10.10.120.238:8080/xmlui/handle/123456789/685
Full metadata record
DC Field | Value | Language
dc.contributor.author | Panda M.K. | en_US
dc.contributor.author | Subudhi B.N. | en_US
dc.contributor.author | Veerakumar T. | en_US
dc.contributor.author | Jakhetiya V. | en_US
dc.date.accessioned | 2023-11-30T08:45:22Z | -
dc.date.available | 2023-11-30T08:45:22Z | -
dc.date.issued | 2023 | -
dc.identifier.issn | 2691-4581 | -
dc.identifier.other | EID(2-s2.0-85166773512) | -
dc.identifier.uri | https://dx.doi.org/10.1109/TAI.2023.3299903 | -
dc.identifier.uri | http://localhost:8080/xmlui/handle/123456789/685 | -
dc.description.abstract | Background subtraction is an essential step in many computer vision tasks. In this article, we present a unique approach to detecting local changes in challenging video scenes by exploring the capabilities of an encoder-decoder network that employs a modified ResNet-152 architecture with a multi-scale feature extraction framework. The proposed encoder consists of a modified ResNet-152 network in which the initial two blocks are frozen and the weights of the last blocks are learned through a transfer-learning mechanism. This encoder reduces the computational complexity of the proposed model and extracts both fine- and coarse-scale features. We propose a multi-scale feature extraction (MFE) block, a hybrid of a pyramidal pooling architecture (PPA) and various atrous convolutional layers, in which the high-level features from the encoder network are used to extract features at multiple scales. The use of PPA in the MFE block preserves the maximum value in every pooling area, retaining the contextual relationship between pixels in complex video frames so that various challenging scenes can be handled. The proposed decoder consists of stacked transposed convolution layers that learn a mapping from feature space to image space, predicting a score map. A threshold is then applied to the score map to obtain the binary class labels, background and foreground. Shortcut connections followed by global average pooling (GAP) carry the low-level feature coefficients from the encoder to the decoder to enhance the feature representation. The performance of the proposed scheme is validated by testing it against thirty-one state-of-the-art techniques. The results obtained by the proposed method are corroborated both qualitatively and quantitatively. Further, the efficacy of the proposed algorithm is verified on an unseen video setup and is found to provide better performance. IEEE | en_US
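Two of the building blocks the abstract describes can be illustrated compactly: pyramidal pooling that keeps the maximum value in every pooling area at several grid scales, and thresholding a predicted score map into background/foreground labels. The sketch below is a minimal NumPy illustration of those two ideas only; the function names, scales, and threshold value are illustrative assumptions, not the authors' implementation (which uses a modified ResNet-152 encoder-decoder).

```python
import numpy as np

def pyramid_max_pool(feature_map, scales=(1, 2, 4)):
    """Illustrative pyramidal pooling: for each scale s, tile the map
    into an s x s grid of pooling areas and keep the maximum value in
    each area (one pooled grid per scale)."""
    h, w = feature_map.shape
    pooled = []
    for s in scales:
        bh, bw = h // s, w // s
        # Crop so the map divides evenly into an s x s grid of areas.
        cropped = feature_map[: bh * s, : bw * s]
        blocks = cropped.reshape(s, bh, s, bw)
        pooled.append(blocks.max(axis=(1, 3)))  # s x s grid of maxima
    return pooled

def threshold_score_map(score_map, tau=0.5):
    """Binarize a decoder score map into labels:
    1 = foreground (local change), 0 = background."""
    return (score_map >= tau).astype(np.uint8)
```

At scale 1 the pooled grid is the global maximum; larger scales preserve coarser spatial context, which is the contextual-relationship property the abstract attributes to the PPA.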
dc.language.iso | en | en_US
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | en_US
dc.source | IEEE Transactions on Artificial Intelligence | en_US
dc.subject | Background subtraction | en_US
dc.subject | Computer architecture | en_US
dc.subject | Computer vision | en_US
dc.subject | Convolutional neural networks | en_US
dc.subject | Decoding | en_US
dc.subject | Deep neural network | en_US
dc.subject | Feature extraction | en_US
dc.subject | Multi-scale feature extraction block | en_US
dc.subject | Object detection | en_US
dc.subject | Surveillance | en_US
dc.title | Modified ResNet-152 Network With Hybrid Pyramidal Pooling for Local Change Detection | en_US
dc.type | Journal Article | en_US
Appears in Collections: Journal Article

Files in This Item:
There are no files associated with this item.


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.