Papers by Rana Kashif Raza
IEEE Access
Design of a vision-based traffic analytic system for urban traffic video scenes has great potential in the context of Intelligent Transportation Systems (ITS). It offers useful traffic-related insights at much lower cost than conventional sensor-based counterparts. However, it remains a challenging problem to this day due to complexity factors such as camera hardware constraints, camera movement, object occlusion, object speed, object resolution, traffic flow density, and lighting conditions. ITS has many applications, including but not limited to queue estimation, speed detection, and detection of various anomalies. All of these applications depend primarily on sensing vehicle presence to form a basis for analysis. Moving cast shadows of vehicles are one of the major problems affecting vehicle detection, as they can cause detection and tracking inaccuracies. It is therefore exceedingly important to distinguish dynamic objects from their moving cast shadows for accurate vehicle detection and recognition. This paper provides an in-depth comparative analysis of conventional and state-of-the-art shadow detection and removal algorithms focused on the traffic paradigm. To date, there has been only one survey highlighting shadow removal methodologies specifically for the traffic paradigm. In this paper, a total of 70 research papers containing results on urban traffic scenes, spanning the last three decades, have been shortlisted to give a comprehensive overview of the work done in this area. The study reveals that the preferable way to make a comparative evaluation is to use the existing Highway I, II, and III datasets, which are frequently used for qualitative and quantitative analysis of shadow detection and removal algorithms. Furthermore, the paper not only provides cues for solving moving cast shadow problems, but also shows that even after the advent of Convolutional Neural Network (CNN)-based vehicle detection methods, the problems caused by moving cast shadows persist. Therefore, this paper proposes a hybrid approach that uses a combination of conventional and state-of-the-art techniques as a pre-processing step for shadow detection and removal before using a CNN for vehicle detection. The results indicate a significant improvement in vehicle detection accuracy with the proposed approach. INDEX TERMS Computer vision, convolutional neural networks, deep learning, generative adversarial networks, intelligent transportation system, moving cast shadow removal, vehicle shadows, vehicle detection.
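The abstract does not specify which conventional technique feeds the pre-processing step, so the following is only a minimal sketch of one widely used conventional approach: HSV-based shadow suppression applied to a foreground mask before CNN detection. The intuition is that a cast shadow darkens the background without changing its chromaticity much, so such pixels can be dropped from the mask before it reaches the detector. The function name and threshold values are illustrative, not from the paper.

```python
import cv2
import numpy as np

def suppress_shadows_hsv(frame_bgr, background_bgr, fg_mask,
                         v_lo=0.4, v_hi=0.9, s_tol=40, h_tol=30):
    """Flag foreground pixels as shadow when they are darker than the
    background but keep similar hue/saturation (a classic HSV test).
    Thresholds are illustrative; OpenCV hue is 0-179 and treated here
    as linear for simplicity."""
    hsv_f = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    hsv_b = cv2.cvtColor(background_bgr, cv2.COLOR_BGR2HSV).astype(np.int32)
    ratio = (hsv_f[..., 2] + 1) / (hsv_b[..., 2] + 1)      # brightness ratio
    shadow = ((ratio > v_lo) & (ratio < v_hi)
              & (np.abs(hsv_f[..., 1] - hsv_b[..., 1]) < s_tol)
              & (np.abs(hsv_f[..., 0] - hsv_b[..., 0]) < h_tol)
              & (fg_mask > 0))
    cleaned = fg_mask.copy()
    cleaned[shadow] = 0          # drop shadow pixels before CNN detection
    return cleaned
```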
Lecture Notes in Electrical Engineering, 2022
Increasingly dense traffic is becoming a challenge in our local settings, urging the need for a better traffic monitoring and management system. Fine-grained vehicle classification is a challenging task compared to coarse vehicle classification, so a robust approach for vehicle detection and classification into fine-grained categories is essential. Existing Vehicle Make and Model Recognition (VMMR) systems have been developed for synchronized and controlled traffic conditions; the need for robust VMMR in complex, urban, heterogeneous, and unsynchronized traffic conditions remains an open research area. In this paper, vehicle detection and fine-grained classification are addressed using deep learning. To perform fine-grained classification with its attendant complexities, a local dataset, THS-10, with high intraclass and low interclass variation was prepared specifically for this work. The dataset consists of 4250 vehicle images of 10 vehicle models, i.e. …
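The abstract names deep learning but not a specific architecture, so the sketch below is only an assumed illustration of fine-grained classification by transfer learning: a pretrained torchvision ResNet-50 with its head replaced for the 10 THS-10 classes. The backbone choice and hyperparameters are assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_MODELS = 10  # THS-10 contains 10 vehicle models

# Start from ImageNet weights and replace the classifier head.
backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, NUM_MODELS)

# High intraclass and low interclass variation usually calls for
# fine-tuning the whole network rather than only the new head.
optimizer = torch.optim.SGD(backbone.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

def train_step(images, labels):
    optimizer.zero_grad()
    loss = criterion(backbone(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```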
We propose a state-of-the-art fusion framework of Radio Frequency Identification (RFID) and Computer Vision (CV) to support object recognition and tracking in three-dimensional space. Fusion can significantly improve performance in applications of autonomous vision, navigation, and site monitoring, especially in outdoor environments. Increasing safety in construction zones and enhancing security in airports are important problems that involve understanding interactions between objects, machines, and material, and can be addressed using sensor fusion and activity analysis. Identifying objects solely via vision is computationally costly, error prone, limited by occlusion, and sometimes impossible in practice. RFID can reliably identify tagged objects and can even localize targets at coarse spatial resolution. Conversely, CV can increase the performance of RFID by refining the location information and providing fuzzy features to guard against cloning or deception. RFID and CV therefore provide both overlapping and unique information for deciding on object ID, location, and motion. We have implemented stereo processing using commodity cameras and used a commercial RFID-based Real Time Location System (RTLS) for our experiments, achieving encouraging results. The performance of both modalities was evaluated separately and in fused mode. In our outdoor stereo experiments we obtained an RMS accuracy within ∼7.6 in (19.3 cm) for objects up to 80 ft (24.4 m) away from the cameras. For real-time trajectories, RTLS provided 2 m to ∼2.6 m location accuracy for dynamic tagged objects in a 40×40 m cell with four readers. We propose a fusion-based tracking algorithm, and our research demonstrates the benefits obtained when most objects are cooperative and tagged. We abstract the information structures in order to support a Site Safety System (S-3) with diverse information sources, constraints, and processes that may not have knowledge of each other. We have used relaxation to control the integration of information from CV, RFID, and naive physics in tracking. The label elimination approach readily represents the ambiguity occurring in real-life applications. The key to reducing the computational requirements is to eliminate many labels at each filtering step while keeping those labels compatible with observation. As a post-processing step to labeling, we have optimized total track smoothness to update computed tracks and increase system tracking reliability. Work site analysis can proceed even when information from one sensor or information source is unavailable at some time instances. We have shown with simulations and real data that fusion can greatly increase tracking performance and can reduce computational cost and the combination search space by up to 99% in some cases. Test cases showed how fusion can solve some difficult outdoor tracking problems. We assessed tracking performance using track error, i.e., the fraction of wrong trajectory point assignments. For some outdoor object trajectories, the fused system reduced the track error from 0.53 to 0.13. The likelihood of producing correct object trajectories in regions partially or fully occluded to CV is also increased. We conclude that significant real-time decision-making should be possible if the S-3 system can integrate information effectively between the sensor level and the activity understanding level.
Engineering faster RFID updates will likely reduce the number of objects that can be sensed; however, this should be a favorable tradeoff on a construction site. Employing knowledge-based constraints and systematically analyzing object track initiation and termination are possible directions for future research.
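As a rough illustration of the label elimination idea described above, the sketch below prunes, at one filtering step, the candidate object IDs for each CV detection using an RFID compatibility gate and a naive-physics speed limit. All names, gate values, and data structures are hypothetical; the actual relaxation scheme in the work is richer.

```python
from math import hypot

MAX_SPEED = 5.0   # metres per time step; assumed naive-physics gate
RFID_GATE = 3.0   # metres; assumed RTLS coarse-accuracy gate

def eliminate_labels(detections, candidates, rfid_reads, prev_pos):
    """One filtering step of label elimination: for every CV detection,
    keep only object-ID labels compatible with the observations.

    detections: {det_id: (x, y)} stereo CV positions
    candidates: {det_id: set of possible object IDs}
    rfid_reads: {obj_id: (x, y)} coarse RTLS fixes this step
    prev_pos:   {obj_id: (x, y)} last confirmed positions
    """
    pruned = {}
    for det_id, (x, y) in detections.items():
        keep = set()
        for obj in candidates[det_id]:
            # Naive physics: the object cannot have outrun MAX_SPEED.
            if obj in prev_pos:
                px, py = prev_pos[obj]
                if hypot(x - px, y - py) > MAX_SPEED:
                    continue
            # RFID compatibility: a tagged object must be near its fix.
            if obj in rfid_reads:
                rx, ry = rfid_reads[obj]
                if hypot(x - rx, y - ry) > RFID_GATE:
                    continue
            keep.add(obj)
        pruned[det_id] = keep
    return pruned
```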
Multimedia Tools and Applications, 2021
Anomalous activity recognition deals with identifying patterns and events that deviate from the normal stream. In a surveillance paradigm, these events range from abuse and fighting to road accidents and snatching. Due to the sparse occurrence of anomalous events, anomalous activity recognition from surveillance videos is a challenging research task. Reported approaches can generally be categorized as handcrafted or deep learning-based. Most reported studies address binary classification, i.e., anomaly detection from surveillance videos, without distinguishing among anomalous event types such as abuse, fighting, road accidents, shooting, stealing, vandalism, and robbery. This paper therefore aims to provide an effective framework for recognizing different real-world anomalies in videos. The study presents a simple yet effective approach for learning spatiotemporal features using deep 3-dimensional convolutional networks (3D ConvNets) trained on the University of Central Florida (UCF) Crime video dataset. First, frame-level labels of the UCF Crime dataset are provided; then, to extract anomalous spatiotemporal features more efficiently, a fine-tuned 3D ConvNet is proposed. The findings are twofold: (1) there exist specific, detectable, and quantifiable features in the UCF Crime video feed that associate with each other; (2) multiclass learning can improve the generalization of 3D ConvNets by effectively learning frame-level information, and can be leveraged for better results by applying spatial augmentation. The proposed study extracts 3D features by providing frame-level information and spatial augmentation to a fine-tuned pretrained model, namely 3D ConvNets. Moreover, the learned features are compact, and the proposed approach significantly outperforms state-of-the-art approaches on anomalous activity recognition, achieving 82% AUC.
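To make the 3D ConvNet idea concrete, here is a toy PyTorch stand-in, not the paper's fine-tuned pretrained network: 3D convolutions slide over time as well as space, so a short clip of frames yields one spatiotemporal feature vector for multiclass classification. The layer sizes and the 14-class label set (13 UCF Crime anomaly classes plus normal) are assumptions.

```python
import torch
import torch.nn as nn

NUM_CLASSES = 14  # assumed: 13 UCF Crime anomaly classes + normal

class Tiny3DConvNet(nn.Module):
    """Toy stand-in for a C3D-style network: 3D convolutions learn
    spatiotemporal features directly from frame clips."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),                     # pool space, keep time
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),                             # pool space and time
            nn.AdaptiveAvgPool3d(1),
        )
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, clips):                 # clips: (N, 3, T, H, W)
        x = self.features(clips).flatten(1)
        return self.classifier(x)

logits = Tiny3DConvNet()(torch.randn(2, 3, 16, 112, 112))  # -> (2, 14)
```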
Resources, Conservation and Recycling, 2022
In this paper, we propose a robust Printed Circuit Board (PCB) classification system based on computer vision and deep learning to assist in sorting e-waste for recycling. We use a public PCB dataset acquired on a conveyor belt, as well as a locally developed PCB dataset that represents challenging practical conditions such as varying lighting, orientation, distance from camera, cast shadows, viewpoints, and different cameras/resolutions. A pre-trained EfficientNet-B3 deep learning model is retrained on our data for the PCB classification context. Deep nets are designed for closed-set recognition, capable of classifying only the classes they have been trained on. We extend this closed-set behavior to our open-set classification task, which requires identifying unknown PCBs in addition to classifying known PCBs. We achieve an open-set average accuracy of 92.4%, which is state of the art given the complexities of the datasets we worked with.
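The abstract does not state how the closed-set EfficientNet-B3 is extended to the open set. One common minimal mechanism, shown here purely as an assumption-laden sketch, is to reject predictions whose softmax confidence falls below a validation-tuned threshold; the threshold value and function names are illustrative.

```python
import torch
import torch.nn.functional as F

UNKNOWN = -1
THRESHOLD = 0.7  # assumed rejection threshold, tuned on validation data

def open_set_predict(model, images, threshold=THRESHOLD):
    """Wrap a closed-set classifier: reject low-confidence inputs as
    'unknown PCB' instead of forcing one of the trained classes."""
    with torch.no_grad():
        probs = F.softmax(model(images), dim=1)
        conf, pred = probs.max(dim=1)
        pred[conf < threshold] = UNKNOWN
    return pred
```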
Processes, 2021
In the Intelligent Transportation System (ITS) realm, queue length estimation is an essential yet challenging task. Queue lengths are important for determining traffic density in traffic lanes so that possible congestion in any lane can be minimized. Smart roadside sensors such as loop detectors, radars, and pneumatic road tubes are promising for such tasks, though they have very high installation and maintenance costs. Large-scale deployment of surveillance cameras has shown great potential for collecting vehicular data in a flexible and cost-effective way. Similarly, vision-based sensors can be used independently or, if required, can augment other roadside sensors to effectively estimate queue length in prescribed traffic lanes. In this research, a CNN-based approach for estimating vehicle queue length in an urban traffic scenario using low-resolution traffic videos is proposed. The queue length is estimated based on count o…
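The abstract is truncated, but it indicates the queue length is derived from a count produced by the CNN. The sketch below is one plausible count-to-length conversion, stated purely as an assumption: the vehicle length and standstill gap are illustrative values, not from the paper.

```python
AVG_VEHICLE_LEN_M = 4.5   # assumed average vehicle length in metres
AVG_GAP_M = 1.0           # assumed average standstill gap in metres

def queue_length_m(vehicle_count: int) -> float:
    """Convert a per-lane stopped-vehicle count (e.g. from a CNN detector
    on low-resolution video) into an approximate queue length in metres."""
    if vehicle_count == 0:
        return 0.0
    return vehicle_count * AVG_VEHICLE_LEN_M + (vehicle_count - 1) * AVG_GAP_M
```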
Applied Sciences, 2020
Active Learning (AL) for Hyperspectral Image Classification (HSIC) has been extensively studied. However, traditional AL methods do not consider randomness among the existing and new samples. Secondly, very limited AL research has been carried out on joint spectral–spatial information. Thirdly, a minor but still noteworthy factor is the stopping criterion. This study therefore addresses all these issues using a spatial-prior fuzziness concept coupled with a Multinomial Logistic Regression via Splitting and Augmented Lagrangian (MLR-LORSAL) classifier with dual stopping criteria. The work further compares several sample selection methods across classifiers of diverse nature, i.e., probabilistic and non-probabilistic. The sample selection methods include Breaking Ties (BT), Mutual Information (MI), and Modified Breaking Ties (MBT). The comparative classifiers include Support Vector Machine (SVM), Extreme Learning Machine (ELM), K-Nearest Neighbour (KNN), and Ensemble Learning…
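Of the listed criteria, Breaking Ties has a particularly compact definition: pick the unlabeled samples whose two largest class posteriors are closest, since those are the most ambiguous. A minimal NumPy sketch (the interface is assumed, not taken from the paper):

```python
import numpy as np

def breaking_ties(probs: np.ndarray, n_select: int) -> np.ndarray:
    """Breaking Ties (BT) selection: rank unlabeled samples by the gap
    between their two largest class probabilities and return the indices
    of the most ambiguous ones (smallest gap).

    probs: (n_samples, n_classes) posteriors from a probabilistic
    classifier such as MLR-LORSAL.
    """
    ordered = np.sort(probs, axis=1)
    gap = ordered[:, -1] - ordered[:, -2]   # best minus second-best
    return np.argsort(gap)[:n_select]       # smallest gaps first
```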
Journal of Visual Communication and Image Representation, 2017
An adaptive algorithm that formulates energy-based stochastic segmentation with a level set methodology is proposed. The hybrid method uses global and local energies, which are efficient in matching, segmenting, and tracing anatomic structures by exploiting constraints computed from the image data. The algorithm performs autonomous stochastic segmentation of tumors in Magnetic Resonance Imaging (MRI) by combining region-based level sets globally with three established energies (uniform, separation, and histogram) in a local framework. The local region is defined by the segmentation boundary, which, in the level set method, consists of global statistics and local energies of every individual point; the local region is then updated by minimizing (or maximizing) the energies. For analysis, the algorithm was tested on low-grade and high-grade MR image datasets. The results show that the proposed methodology achieves similarity between the segmented and ground-truth images of up to 89.5% by the Dice method and a minimum distance of 0.5 mm by the Hausdorff algorithm. This adaptive stochastic segmentation algorithm can also compute segmentations when the binary thresholding level is greater than 0.2.
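The two reported evaluation measures are standard and easy to reproduce. Below is a minimal sketch of Dice similarity and a symmetric Hausdorff distance over binary masks using SciPy; the isotropic voxel spacing and the use of full mask point sets (rather than extracted boundaries) are simplifying assumptions.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dice(seg: np.ndarray, truth: np.ndarray) -> float:
    """Dice similarity between two binary masks: 2|A∩B| / (|A| + |B|)."""
    inter = np.logical_and(seg, truth).sum()
    return 2.0 * inter / (seg.sum() + truth.sum())

def hausdorff_mm(seg: np.ndarray, truth: np.ndarray, spacing_mm=1.0) -> float:
    """Symmetric Hausdorff distance between the two mask point sets,
    in mm, assuming isotropic voxel spacing."""
    a = np.argwhere(seg) * spacing_mm
    b = np.argwhere(truth) * spacing_mm
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])
```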
2016 International Conference on Frontiers of Information Technology (FIT), 2016
This paper presents a novel landmark-based audio fingerprinting algorithm for matching naval vessels' acoustic signatures. The algorithm incorporates a joint time–frequency approach with parameters optimized for acoustic signatures of naval vessels. The technique exploits the relative time difference between neighboring frequency onsets, which is found to remain consistent across samples originating over time from the same vessel. The algorithm has been implemented in MATLAB and trialed with real acoustic signatures of submarines. The training and test samples of submarines were acquired from resources provided by the San Francisco National Park Association [14]. Storage requirements to populate the database with 500 tracks, allowing a maximum of 0.5 million feature hashes per track, remained below 1 GB. On an average PC, the database hash table can be populated with feature hashes of database tracks at 1250 hashes/second, converting 120 seconds of audio data into hashes in less than a second. Under varying conditions such as time skew, noise, and sample length, the results demonstrate the algorithm's robustness in identifying a correct match. Experimental results show a classification rate of 94% using the proposed approach, a considerable improvement over the 88% achieved by [17] employing existing state-of-the-art techniques such as Detection of Envelope Modulation On Noise (DEMON) [15] and Low Frequency Analysis and Recording (LOFAR) [16].
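The landmark idea, pairing nearby spectral peaks so that the (frequency, frequency, time-offset) triple becomes a hash key, can be sketched briefly. The paper's implementation is in MATLAB with parameters tuned for vessel acoustics; the Python sketch below and its parameter values are purely illustrative.

```python
import numpy as np
from scipy.signal import spectrogram

def landmark_hashes(audio, fs, fan_out=5):
    """Landmark-style fingerprinting sketch: take the strongest frequency
    bin in each spectrogram frame, then pair each peak with the peaks of
    the next few frames; each (f1, f2, dt) triple is a hash key whose
    relative timing survives across recordings of the same source."""
    _, _, sxx = spectrogram(audio, fs, nperseg=512, noverlap=256)
    peaks = sxx.argmax(axis=0)                 # strongest bin per frame
    hashes = []
    for t1 in range(len(peaks) - 1):
        for t2 in range(t1 + 1, min(t1 + 1 + fan_out, len(peaks))):
            hashes.append((int(peaks[t1]), int(peaks[t2]), t2 - t1))
    return hashes
```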
2016 Future Technologies Conference (FTC), 2016
Computer vision is widely used in traffic monitoring and surveillance, where detection of vehicles plays a significant role. Attributes such as shape, color, size, pose, illumination, shadows, occlusion, background clutter, camera viewing angle, speed of vehicles, and environmental conditions pose immense and varying challenges in the detection phase. The native urban datasets NIPA and TOLL PLAZA, acquired in complex traffic environments, are used for this analysis; both exhibit the varying attributes highlighted above. The NIPA dataset has a total of 1516 vehicles, whereas the TOLL PLAZA dataset contains 376 vehicles across the entire video sequence. This paper provides comparative analysis of, and insight into, the performance of a cascade of boosted classifiers using Haar features versus statistical analysis using blobs. Haar features are effective at extracting discernible regions of interest in complex traffic scenes and yield a lower false detection rate than blob analysis. The detection results obtained from the trained Haar cascade classifier on the NIPA and TOLL PLAZA datasets show 83.7% and 88.3% accuracy, respectively. In contrast, blob analysis has a detection accuracy of only 43.8% for NIPA and 65.7% for TOLL PLAZA.
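Applying a trained Haar cascade is a standard OpenCV workflow. The sketch below shows only the detection loop; the cascade file and video path are placeholders, since the classifiers trained on NIPA and TOLL PLAZA are not public.

```python
import cv2

# Load a trained cascade (path is illustrative, not from the paper).
cascade = cv2.CascadeClassifier("vehicle_cascade.xml")

video = cv2.VideoCapture("traffic.mp4")
while True:
    ok, frame = video.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Multi-scale sweep: the scale step and the number of neighbouring
    # detections required to accept a window are the usual knobs.
    vehicles = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=4)
    for (x, y, w, h) in vehicles:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
video.release()
```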
2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2008
Most of the brain's cognitive functions rely on the coordinated interactions of neuronal sources that are distributed within and across specialized brain areas. It is important to quantify these temporal interactions directly from neuroimaging data such as the electroencephalogram (EEG). A variety of measures have been proposed to quantify neural interactions, including linear correlation measures and nonlinear information-theoretic measures. An important aspect of neural interactions is the direction of information flow, i.e., the causal interaction between different regions. In this paper, we propose using a directed transinformation measure (T measure) to quantify these causal interactions. This measure is a generalization of Granger causality and quantifies both linear and nonlinear interactions between signals. The proposed measure is applied to both simulated and real EEG signals and is shown to be sensitive to the dependencies between signals.
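The T measure itself is not specified in the abstract, but since it generalizes Granger causality, the linear baseline is easy to state: signal y Granger-causes x if adding y's past reduces the residual variance of predicting x. A self-contained NumPy sketch of that linear baseline follows; the nonlinear generalization is beyond this illustration.

```python
import numpy as np

def granger_causality(x, y, order=5):
    """Linear Granger causality y -> x: compare the residual variance of
    predicting x from its own past against predicting it from the joint
    past of x and y. A positive value suggests y's past helps predict x.
    (The paper's T measure extends this idea to nonlinear dependence.)"""
    n = len(x)
    X_own = np.array([x[t - order:t] for t in range(order, n)])
    X_joint = np.array([np.r_[x[t - order:t], y[t - order:t]]
                        for t in range(order, n)])
    target = x[order:]
    res_own = target - X_own @ np.linalg.lstsq(X_own, target, rcond=None)[0]
    res_joint = target - X_joint @ np.linalg.lstsq(X_joint, target,
                                                   rcond=None)[0]
    return np.log(res_own.var() / res_joint.var())
```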
SPIE Proceedings, 2012
Ensuring security in high-risk areas such as airports is an important but complex problem, and effectively tracking personnel, containers, and machines is a crucial task. Moreover, security and safety require understanding the interaction of persons and objects. Computer vision (CV) has been a classic tool; however, variable lighting, imaging, and random occlusions present difficulties for real-time surveillance, resulting in erroneous object detection and trajectories. Determining object ID via CV at any instant in a crowded area is computationally prohibitive, yet the trajectories of personnel and objects should be known in real time. Radio Frequency Identification (RFID) can be used to reliably identify target objects and can even locate targets at coarse spatial resolution, while CV provides fuzzy features for target ID at finer resolution. Our research demonstrates the benefits obtained when most objects are "cooperative" by being RFID tagged. Fusion provides a method to simplify the correspondence problem in 3D space. A surveillance system can query for unique object ID as well as tag ID information, such as target height, texture, shape, and color, which can greatly enhance scene analysis. We extend geometry-based tracking so that intermittent information on ID and location can be used in determining a set of trajectories of N targets over T time steps. We show that partial target information obtained through RFID can reduce computation time (by 99.9% in some cases) and also increase the likelihood of producing correct trajectories. We conclude that real-time decision-making should be possible if the surveillance system can integrate information effectively between the sensor level and the activity understanding level.
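One way to see how partial RFID information shrinks the correspondence search, sketched here as an assumption rather than the paper's algorithm, is to cast the CV-detection-to-tag matching as a gated assignment problem: distances beyond the RTLS accuracy gate are forbidden, so most of the combinatorial label space never has to be explored.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

BIG = 1e9  # cost that effectively forbids an assignment

def assign_ids(cv_positions, rfid_positions, gate=3.0):
    """Match CV detections to RFID-tagged IDs as a gated assignment:
    pairwise distances beyond the RTLS gate are ruled out up front,
    collapsing the correspondence search to one small optimization.

    cv_positions:   (N, 2) detection positions from stereo CV
    rfid_positions: (M, 2) coarse tag positions from the RTLS
    """
    cost = np.linalg.norm(cv_positions[:, None, :]
                          - rfid_positions[None, :, :], axis=2)
    cost[cost > gate] = BIG
    rows, cols = linear_sum_assignment(cost)
    # Keep only matches inside the gate.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < BIG]
```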
2017 International Conference on Frontiers of Information Technology (FIT), 2017
This paper presents a comparison and performance evaluation of handwritten character recognition for robust and precise classification of different handwritten characters. The system uses advanced multilayer deep neural networks that learn features from raw pixel values; the hidden layers stack deep hierarchies of non-linear features, since learning complex features with conventional neural networks is very challenging. Two state-of-the-art deep learning architectures were used: the Caffe AlexNet [5] and GoogLeNet [6] models in NVIDIA DIGITS [10]. The frameworks were trained and tested on two different datasets to incorporate diversity and complexity. One is the publicly available Chars74K dataset [4], comprising 7705 characters covering uppercase and lowercase English alphabets along with numerical digits. The other, locally created dataset consists of 4320 characters across 62 classes, produced by 40 subjects, and likewise contains uppercase and lowercase English alphabets along with numerical digits. The overall dataset is split 80% for training and 20% for testing. Training takes approximately 90 minutes. For validation, the results obtained were compared with the ground truth. The accuracy achieved with AlexNet was 77.77%, and 88.89% with GoogLeNet. The higher accuracy of GoogLeNet is due to its unique combination of inception modules, each including pooling, convolutions at various scales, and concatenation procedures.
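The inception module mentioned in the last sentence can be illustrated in a few lines of PyTorch: parallel convolutions at several kernel sizes plus a pooling branch, concatenated along the channel axis. This is a simplified sketch; the 1×1 reduction layers of the real GoogLeNet are omitted for brevity.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Minimal GoogLeNet-style inception module: 1x1, 3x3, and 5x5
    convolutions plus a pooling branch run in parallel and are
    concatenated, so the network sees the input at several scales."""
    def __init__(self, c_in, c1, c3, c5, cp):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c1, 1)
        self.b3 = nn.Conv2d(c_in, c3, 3, padding=1)
        self.b5 = nn.Conv2d(c_in, c5, 5, padding=2)
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, cp, 1))

    def forward(self, x):
        return torch.cat([self.b1(x), self.b3(x),
                          self.b5(x), self.bp(x)], dim=1)

out = InceptionBlock(64, 32, 64, 16, 16)(torch.randn(1, 64, 28, 28))
# out.shape == (1, 128, 28, 28): 32 + 64 + 16 + 16 channels
```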