This might be suitable with a 7x7 grid on PASCAL VOC, but what about more cluttered datasets such as SUN 2012? Through the analysis, we arrive at a number of remarks and insightful results. We present YOLO, a unified pipeline for object detection. The upper bound recall given the current grid size on PASCAL VOC is 93%. doi:10.1109/isceic53685.2021.00093, Montavon, G., Samek, W., and Müller, K.-R. (2018). It randomly initializes one center c1. Moreover, as the authors admit in all fairness, it becomes impossible to detect several small nearby objects, as at most one object per grid cell can be detected. The trained U-Net is then used to segment defective insulators from the rest of the image. The performances of different anchor selection methods are shown in Section 4.2.3. We present YOLO, a new approach to object detection. Detection of Power Line Insulator Defects Using Aerial Images Analyzed with Convolutional Neural Networks. In summary, insulator defect detection technology is the key to maintaining the normal operation of the power grid. We apologize for not highlighting these contributions in our draft, which unfortunately led to YOLO being perceived as a re-engineering of [20]. Cascade R-CNN: Delving into High Quality Object Detection, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. It can be seen from Table 6 that the Cluster-NMS method mainly improves the speed of calculation and image processing, with FPS increasing from 100 to 156, while slightly improving the other indexes. Overall this approach is interesting and the paper is presented well. In this way, the FPN structure conveys strong semantic information from the top down, while the PAN feature pyramid conveys strong positioning information from the bottom up.
These encode both the probabilities of that class appearing in the box and how well the predicted box fits the object. line 314: the authors need to expand on [20], which is only stated as the 'most similar design', without any further description/discussion. FIGURE 4. After that, the feature map is passed to the neck. In addition, we propose three new methods and explain how we came up with them. While it can quickly identify objects in images, it struggles to precisely localize some objects, especially small ones. Our unified architecture is extremely fast. Different performances of detection methods. By working in pairs, the deeper information hidden in the feature map is extracted. We propose a more reasonable anchor generation method, namely AFK-MC2. Its streamlined design makes it suitable for various applications and easily adaptable to different hardware platforms, from edge devices to cloud APIs. As shown in Figure 8, the left one is predicted by YOLOv5 with NMS, and the right one is predicted by YOLOv5 with Cluster-NMS. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. If the base Finally, the defective insulators are attached to different backgrounds. In the past few years, defect detection methods have been divided into three categories (Wen, Luo et al., 2021): physical methods, traditional vision-based methods, and deep learning based methods. As shown in Figure 3, after slicing, the 4×4×3 image is sliced into a 2×2×12 feature map. 2) Since YOLOv5 is designed to detect objects in natural real-world scenes, some of its default anchor settings are not suitable for defect detection; this article therefore introduces the Assumption-free K-MC2 (AFK-MC2) algorithm into YOLOv5 to modify the K-means algorithm and improve accuracy and speed. In 2020, YOLOv4 (Bochkovskiy, Wang et al., 2020) was inspired by CSPNet (Cross Stage Partial Network) and formed the CSPDarknet-53 network as its backbone.
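The slicing step mentioned above (a 4×4×3 image becoming a 2×2×12 feature map) can be illustrated with a minimal pure-Python sketch. The function name `focus_slice` and the order in which the four sub-grids are concatenated are assumptions made for illustration; the text only states the input and output shapes.

```python
def focus_slice(image):
    """Slice an H x W x C nested list into an (H/2) x (W/2) x 4C nested list.

    Every 2x2 spatial neighbourhood is moved into the channel dimension,
    so spatial resolution halves while no pixel value is discarded.
    """
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must both be even")
    # (row, col) offsets of the four interleaved sub-grids (assumed order).
    offsets = [(0, 0), (1, 0), (0, 1), (1, 1)]
    out = []
    for r in range(0, height, 2):
        row = []
        for c in range(0, width, 2):
            channels = []
            for dr, dc in offsets:
                channels.extend(image[r + dr][c + dc])
            row.append(channels)
        out.append(row)
    return out


# Toy 4x4x3 image whose values encode (row, col, channel).
image = [[[r * 100 + c * 10 + ch for ch in range(3)] for c in range(4)]
         for r in range(4)]
sliced = focus_slice(image)
print(len(sliced), len(sliced[0]), len(sliced[0][0]))
```

The output has the same number of values as the input (4·4·3 = 2·2·12 = 48), which is why this kind of slicing reduces resolution without losing information.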
To overcome the shortcomings of traditional vision-based methods, Tao, Zhang et al. However, working outdoors for a long time, insulators often develop defects because of various environmental and weather conditions, which affect the normal operation of transmission lines and can even cause huge economic losses. You can use the Image Labeler, Speed is the main advantage since the network processes the image only once and detects the objects. We will rephrase to clarify this, and also include the missing citations (Alexe, Russell, Harzallah). Traditional methods rely on experience to design features, such as the widely used Haar, HOG, and SIFT features, and their advantage is speed. Add the detection subnetworks to any of the layers in the Publishing. Anchor box offsets: refine the anchor box position. Introducing Ultralytics YOLOv8, the latest version of the acclaimed real-time object detection and image segmentation model. Prior work on object detection repurposes classifiers to perform detection. Perhaps the main downside in the line of argument by the authors is that, with the advent of Fast R-CNN, the only significant computational advantage left to gain is removing the region proposal computation. In the prediction stage, this study considers the factors that limit performance and proposes three new methods for loss function, anchor generation, and anchor selection. Although YOLO by itself does not achieve the best performance, fusing it with Fast R-CNN does improve the performance. It first frames object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. [2] Redmon, Joseph, Santosh NIPS audience is interested in fast object detection algorithms. This is a Keras implementation of YOLO and YOLOv2. 3) Neck to process the feature map and dig out the deeper information. EHV Branch Company of State Grid Zhejiang Electric Power Co., Ltd, Hangzhou, China.
In the previous section, we performed several comparative experiments to demonstrate the three new methods when used alone. Deep learning requires a good computer configuration, and the accuracy and speed of the same method can vary greatly across configurations. This is a problem, as upon inspection I found that work to offer the same idea, for the same purpose, but on another kind of dataset. Difference in clustering performance between the K-means algorithm (A) and the K-means++ algorithm (B), where (B) seems more reasonable. In order to achieve reproducibility, our experiment configuration is listed in Table 1. Inspired by GoogleNet which is 22 layers (L135), we added 4 convolutional layers following the work of [21] (L150). 11x11: 97.3% Citation Hailan Yu and Weili Chen 2021 J. If a box A belongs to a cluster, its IoU with the other boxes in this cluster must be greater than the threshold T, and its IoU with boxes in other clusters should be less than the threshold T. As a simple example, as shown in Figure 7, the black, red, blue, and orange boxes constitute one cluster, while the two green boxes constitute another. line 268: the authors say here that the proposed system performs NMS all concurrently inside the network itself. Finally, we give an overview of the development of YOLO. A smaller version of the network, Wang, C. (2022). doi:10.9790/2834-1104013444. Compared to state-of-the-art Similar methods include the HOG (histogram of oriented gradient) + SVM (support vector machine) algorithm (Dadi and Pillutla, 2016), the improved MPEG-7 EHD (edge histogram method) technique (Li, 2010), and the global minimization active contour model (Wu, An et al., 2012). Abstract: We present YOLO, a unified pipeline for object detection. Then calculate the center of each category again: for the new center position, reclassify the data point x.
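The recentering and reclassification step described above (recompute each category's center, then reassign every data point x) can be sketched as follows. This is an illustrative sketch only: it assumes each data point is a (width, height) pair of a labelled box and uses squared Euclidean distance, neither of which is stated in the text. The function names `assign`, `update`, and `kmeans` are hypothetical.

```python
def assign(points, centers):
    """Return, for each point, the index of its nearest center."""
    labels = []
    for w, h in points:
        dists = [(w - cw) ** 2 + (h - ch) ** 2 for cw, ch in centers]
        labels.append(dists.index(min(dists)))
    return labels


def update(points, labels, centers):
    """Recompute each center as the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            # An empty cluster keeps its previous center.
            new_centers.append(old)
            continue
        mean_w = sum(w for w, _ in members) / len(members)
        mean_h = sum(h for _, h in members) / len(members)
        new_centers.append((mean_w, mean_h))
    return new_centers


def kmeans(points, centers, max_iter=100):
    """Alternate assignment and update until the centers stop moving."""
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = update(points, labels, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


# Hypothetical box sizes: two small boxes and two large ones.
boxes_wh = [(10, 12), (12, 10), (100, 90), (90, 100)]
anchors, labels = kmeans(boxes_wh, centers=[(10, 12), (100, 90)])
print(anchors, labels)
```

The resulting centers would serve as anchor sizes; how the initial centers are chosen is a separate question from this iteration.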
However, when one box contains the other, as in the example of Figure 5, GIoU degenerates into IoU and cannot distinguish their relative position. This dataset is divided into two parts: the normal insulators part contains images of normal insulators captured by UAV, 600 images in total; the defective insulators part contains images of insulators with defects, 248 images in total. to artwork on both the Picasso Dataset and the People-Art Dataset. In this case, the loss function cannot be derived. In general, it is necessary to optimize the hyperparameters and select an optimal group of hyperparameters for the learning machine to improve the performance and effect of learning. The width and height are predicted relative to the whole image. However, using v to reflect the difference of aspect ratio in its formula, rather than the real difference of width and height, sometimes prevents the model from being optimized effectively. Improved Face Recognition Rate Using HOG Features and SVM Classifier. In addition, to assess the improvement in different experiments, we added different evaluation indexes. CA helped with writing, review, editing, and dataset. In object detection, precision (P), recall (R), average precision (AP), and mean average precision (mAP) are often used as evaluation indexes. This research work was supported by the Science and Technology Project of State Grid Zhejiang Electric Power Co., LTD. (5211MR20004V).
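The containment case noted above can be checked numerically. The sketch below uses the standard definitions IoU = |A∩B| / |A∪B| and GIoU = IoU − (|C| − |A∪B|) / |C|, where C is the smallest box enclosing both A and B. Boxes are written as (x1, y1, x2, y2); the coordinates are made-up values for illustration.

```python
def area(box):
    """Area of an (x1, y1, x2, y2) box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def iou_and_giou(a, b):
    """Return (IoU, GIoU) for two axis-aligned boxes."""
    inter = area((max(a[0], b[0]), max(a[1], b[1]),
                  min(a[2], b[2]), min(a[3], b[3])))
    union = area(a) + area(b) - inter
    # Smallest axis-aligned box enclosing both a and b.
    hull = area((min(a[0], b[0]), min(a[1], b[1]),
                 max(a[2], b[2]), max(a[3], b[3])))
    iou = inter / union
    giou = iou - (hull - union) / hull
    return iou, giou


outer = (0, 0, 10, 10)
inner_top_left = (1, 1, 3, 3)
inner_bottom_right = (6, 6, 8, 8)

# When one box contains the other, the enclosing box equals the union,
# so the GIoU penalty term vanishes regardless of where the inner box sits.
print(iou_and_giou(outer, inner_top_left))
print(iou_and_giou(outer, inner_bottom_right))
```

Both placements of the inner box give identical IoU and GIoU values, which is the motivation for adding a center-distance term.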
In fact the proposed system directly outputs the class predictions (scores) and bounding box for a cell, and those scores are not based on features inside the box; they are based on anything the CNN wants. Keras implementation of YOLO (You Only Look Once): Unified, Real-Time Object Detection. Existing methods can be divided into two categories. It can be seen from Table 4 that the EIoU method improves on all evaluation indicators compared with the other methods. June 2015. DOI: 10.48550/arXiv.1506.02640. arXiv: arXiv:1506.02640. Bibcode: 2015arXiv150602640R. Keywords: Computer Science - Computer Vision and Pattern Recognition. FIGURE 5. On the downside, compared to its closest competitor (Fast R-CNN), it performs considerably worse (-14% mAP). One-stage methods include YOLO (Redmon et al., 2016) and SSD (Liu, Anguelov et al., 2016). To perform transfer learning, you can use a pretrained deep learning network as the base network for a YOLO v3 deep learning network. When we look at the old .5 IOU mAP detection metric YOLOv3 is quite good. We compare the different performances of the four methods: IoU, GIoU, CIoU, and EIoU (ours), and the results are shown in Table 4. PASCAL VOC has 20 labelled classes, so C = 20. 4) Prediction to detect the insulator and defect. I have 2 main concerns regarding this paper: i) the lack of positioning with respect to the related works; ii) the experimental validation is not fully convincing. SSD: Single Shot MultiBox Detector. It cannot predict full bounding boxes, only adjust them. The YOLO v3 object detection model runs a deep learning convolutional neural network (CNN) We are trying to explore and demonstrate to the vision community that there are alternatives to sliding-window or region proposal methods for object detection. [1] Redmon, Joseph, and Ali Farhadi. YOLOv3: An Incremental Improvement. No explanation nor experimental exploration is offered.
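To make the role of C = 20 concrete, the short sketch below computes the size of a grid-based prediction tensor. It assumes the layout of the original YOLO paper, in which each of the S × S cells predicts B boxes with five values each (x, y, w, h, confidence) plus C class probabilities shared by the cell. The 7×7 grid appears in the text; B = 2 is an assumption taken from that paper, and the function name is hypothetical.

```python
def yolo_output_shape(grid_size, boxes_per_cell, num_classes):
    """Shape of the prediction tensor: S x S x (B * 5 + C)."""
    values_per_box = 5  # x, y, w, h, confidence
    depth = boxes_per_cell * values_per_box + num_classes
    return (grid_size, grid_size, depth)


shape = yolo_output_shape(grid_size=7, boxes_per_cell=2, num_classes=20)
total_values = shape[0] * shape[1] * shape[2]
print(shape, total_values)
```

Because class probabilities are predicted once per cell, a 7×7 grid can label at most 49 objects, which is the limitation the reviewers raise for small nearby objects.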
But certainly, it is not the only reason. We considered the disadvantages of the existing loss function and proposed a new loss function, EIoU, that is easy to train. Since we frame detection as a regression problem we don't need a complex pipeline. Our network uses features from the entire image to predict each bounding box. Systems like deformable parts models (DPM) use a sliding window approach where the classifier is run at evenly spaced locations over the entire image. The main contributions of this article are as follows: 1) This article first introduces a new loss function, EIoU, to solve the problem that the model can be hard to train when the images have high resolution. TABLE 5. arXiv preprint arXiv:2004.10934. The authors of CSPNet believe that the problem of excessive inference calculation is caused by the repetition of gradient information in network optimization. However, IoU calculation and sequential iteration inhibit the computational efficiency of traditional NMS. Therefore, it is necessary to ensure the normal operation of the power grid system. There are many heuristics in training the network, like choosing a particular scheduler for the learning rate, switching between pre-training and fine-tuning, and so on. Copyright 2022 Ding, Cao, Ding and An. These evaluation indexes are defined in terms of TP, FP, TN, and FN, whose definitions are given in Table 3. Is this the best choice?
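As a reminder of how these indexes relate to TP, FP, and FN, here is a minimal sketch using the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and mAP as the mean of the per-class AP values. The counts and AP values in the example are invented for illustration and are not results from any experiment described here.

```python
def precision(tp, fp):
    """Fraction of predicted detections that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of ground-truth objects that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mean_average_precision(ap_per_class):
    """mAP is the arithmetic mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Hypothetical counts: 8 correct detections, 2 false alarms, 8 misses.
print(precision(tp=8, fp=2))
print(recall(tp=8, fn=8))
# Hypothetical per-class AP values for the two target classes.
print(mean_average_precision({"insulator": 0.9, "defect": 0.7}))
```

TN does not appear in these formulas, since in detection the number of true negatives (background regions correctly left undetected) is not well defined.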
It is unclear whether this is sufficient for NIPS. Anchor selection is another important part of object detection. doi:10.1109/cisp.2010.5648283, Liu, S. (2018). YOLO9000: Better, Faster, Stronger, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. The reviewers, however, agreed that this is an interesting research direction, and I do encourage the authors to continue this work. The considerably worse performance on small objects in PASCAL VOC (sec. To design a YOLO v3 object detection network, follow these steps. Then, an affine transform is applied to augment the original images and their masks. Differences in performance between NMS (A) and Cluster-NMS (B), where (B) successfully detects the insulator on the upper left, while (A) misses it because of the shielding. (2018) used a cascading architecture to transform defect detection into a two-level object detection problem, which improves accuracy. To resolve the insufficiency of GIoU, DIoU was proposed (Zheng, Wang et al., 2020), which considers the center distance of the two boxes. In Section 4, a series of comparative experiments is used to verify the validity of the model in this study. Considering the advantages of DIoU over IoU, we added the center point distance. One criticism regards references. It should be cited. On the contrary, one-stage methods complete the selection of the region and classification simultaneously. DEStech Transactions on Computer Science and Engineering icitia. Object detection is done using the You Only Look Once (YOLO) algorithm.
But our model has an overwhelming advantage over theirs in detection speed, which means that it can realize real-time detection while maintaining high accuracy in routine insulator defect detection applications. 2.3 they say NMS is a separate processing stage. In this section, we first review the former work on insulator defect detection. [1506.02640] You Only Look Once: Unified, Real-Time Object Detection. It outperforms all other detection methods, A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. arXiv preprint arXiv:1804.02767. If you have another version of OpenCV 2.4.x (not 2.4.9), then you should change the paths after \darknet.sln is opened. YOLOv5 uses the CSPDarknet-53 network as its backbone, based on YOLOv4 (Bochkovskiy, Wang et al., 2020). Traditional insulator defect identification relies on manual work, which is time-consuming and inefficient. The next steps are the same as in the K-means algorithm. In the work by Feng, Guo et al. Then, we analyze a number of comparative experiments to evaluate whether our methods can realize high accuracy and real-time detection at the same time. First, YOLO is extremely fast. 10:928164. doi: 10.3389/fenrg.2022.928164. This study proposes a high-accuracy real-time insulator defect detection method that meets this need. doi:10.1109/jmw.2020.3034379, Wu, Q., An, J., and Lin, B. Then, either accept this candidate (i.e., xj=yj) with probability. B 29 (3), 433-439. 1x1 (localization): 41.2% More recent approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes.
4) Existing methods often require a trade-off between accuracy and speed, but both factors are important in practical applications. Thanks. This work brings the concept of clusters into the model, proposing the Cluster-NMS method. 5: Marginally below the acceptance threshold. YOLO9000: Better, Faster, Stronger. These methods first generate region proposals and then perform bounding box regression and object detection. It is assumed that Apred represents the area of the predicted bounding box, and Agrou represents the area of the ground truth bounding box. including DPM and R-CNN, by a wide margin when generalizing from natural images For details about YOLO and YOLOv2 please refer to their project page and the paper: YOLO9000: Better, Faster, Stronger by Joseph Redmon and Ali Farhadi. Typically, four to ten anchors are preset at each location in the image. base network for YOLO v3 deep learning network. This unfortunately reduces this paper to a re-engineered version of [20] that does well on PASCAL VOC. Therefore, GIoU is proposed (Rezatofighi, 2019). doi:10.1109/tcyb.2021.3095305. Recognition of Insulator Based on Developed MPEG-7 Texture Feature, in 2010 3rd International Congress on Image and Signal Processing (IEEE). YOLO's parameterization of class probabilities is essential for training to converge and was non-existent in [20]. Motivations, methods, and results all make sense. Liu, W., Anguelov, D., Erhan, D., Szegedy, C., and Reed, S. (2016). An insulator string is a special insulation component which plays an important role in overhead transmission lines. By improving the classical YOLOv5 (you only look once) model, this article proposes a new method to enable high accuracy and real-time detection. In the study by Lu (2021), Faster R-CNN with an improved anchor selection method, Soft-NMS, was applied to detect insulators.
After multiple CBL layers, where CBL is a kind of convolution layer, we obtain a feature map of size 20×20×K, where K is the number of convolution kernels. It is expressed as follows: where v is a parameter measuring the consistency of aspect ratio, which can be defined as follows: where w and h represent the width and height of the box. But visual search has weaknesses. 7x7: 58.8% Amsterdam, NETHERLANDS: 14th European Conference on Computer Vision. Second, YOLO reasons globally about the image when making predictions. First, randomly initialize one center c1. Based on YOLOv4, G. Jocher subsequently released YOLOv5, which now contains five versions: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. The remaining parts of this article are organized as follows: Section 2 discusses related work on insulator detection and the CPLID (Chinese Power Line Insulator Dataset) dataset. Section 3 explains how our model works. The experimental results show that the EIoU loss function improves detection accuracy. *Correspondence: Jian Ding, 17737195@qq.com, Advanced Data-Driven Methods and Applications for Smart Power and Energy Systems. Our base YOLO model processes images in real-time at 45 frames per second. YOLO deals with a totally different domain (object detection vs. grasp localization), which engenders the following contributions: Repeat these steps until K centers are selected. Therefore, it can achieve real-time detection but with slightly lower accuracy than two-stage methods.
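The center selection outlined above (randomly initialize one center c1, then repeat until K centers are selected) can be sketched as follows. The intermediate steps are not given in the text, so this sketch assumes standard K-means++ seeding, in which each further center is drawn with probability proportional to its squared distance from the nearest center chosen so far. It is not an implementation of AFK-MC2, which approximates this sampling with a Markov chain; the function name and data are hypothetical.

```python
import random


def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers from 2-D points using D^2-weighted sampling."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    centers = [rng.choice(points)]  # first center c1 is uniformly random
    while len(centers) < k:
        # Squared distance from every point to its nearest chosen center.
        d2 = [min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
              for px, py in points]
        if sum(d2) == 0:
            # Every point coincides with a center; fall back to uniform choice.
            centers.append(rng.choice(points))
            continue
        # Points far from all current centers are more likely to be chosen.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Hypothetical (width, height) pairs of labelled boxes.
boxes_wh = [(10, 12), (12, 10), (100, 90), (90, 100), (40, 200)]
print(kmeanspp_seed(boxes_wh, k=3))
```

Spreading the initial centers this way tends to avoid the poor clusterings that purely random initialization can produce, at the cost of one pass over the data per selected center.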
Feng, Z., Guo, L., Huang, D., and Li, R. (2021). YOLO9000: Better, Faster, Stronger by Joseph Redmon and Ali Farhadi. Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. These methods usually use shallow learning models for recognition, such as linear classifiers, boosting, and SVM. Then, we come to the CSP layer. Benefiting from their multi-layered structure, deep neural networks use fewer parameters to represent complex functions (Montavon, Samek et al., 2018). Especially with the successful application of convolutional neural networks (CNN) in image recognition (Chauhan, Ghanshala et al., 2018), automatic detection and recognition of targets by deep learning has become a research focus that can meet these needs. Less electricity use leads to less burning of fossil fuels. It pre-sets a threshold T to remove the redundant boxes and retains the top scoring box. doi:10.1016/j.ceramint.2021.09.239. Figure 4 shows the changes in the feature map in the neck. Therefore, by introducing CSPNet, the feature mapping of the base layer is divided into two parts, which are combined through the cross-stage hierarchical structure; this reduces computation while preserving accuracy. Gao, K., Lyu, L., Huang, H., Fu, C., Chen, F., and Jin, L. (2019). The results show that when the NMS method misses the insulator on the upper left because of the shielding, the Cluster-NMS method successfully detects it, which increases the reliability of insulator detection results. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. U-net: Convolutional Networks for Biomedical Image Segmentation, in Medical Image Computing and Computer-Assisted Intervention (MICCAI).
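The thresholding behaviour described above (a preset threshold T removes redundant boxes and the top-scoring box is retained) corresponds to conventional greedy NMS, sketched below. This is a generic illustration, not the Cluster-NMS variant; boxes are assumed to be (x1, y1, x2, y2) tuples and the example values are invented.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, threshold):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # top-scoring remaining box is always kept
        keep.append(best)
        # Drop every remaining box that overlaps the kept box by more than T.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, threshold=0.5))
```

Each pass depends on the result of the previous one, which is the sequential iteration that limits the speed of traditional NMS.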
Using the YOLOv4 algorithm, the speed requirement of real-time detection can be met with a certain precision, and the model mAP value reached 69.48%. In addition to increasing the speed of NMS, we aimed to improve the precision of NMS too. Before analyzing the final results, it is necessary to review our model, which contains four stages: input, backbone, neck, and prediction. Images of a normal insulator (A) and a defective insulator (B), where the defect is marked with a red rectangular box. Personally I find the main idea of the paper to be somewhat unsatisfying: by reducing object detection to predicting bounding boxes from 49 regularly spaced cells, the whole idea of visual search goes out of the window. For example, Viola-Jones CVPR 2001 already had detectors in real-time (14 years ago!!). In object detection, IoU is an essential evaluation index to measure the similarity of the predicted bounding box and the ground truth box, which can improve the accuracy of object location (Yu, 2016). TABLE 7. Also, it uses Euclidean loss for probabilities. The authors might try with a higher resolution grid, but at some point there won't be enough training data to get it to work (as opposed to region + classifier architectures which are translation and scale invariant). (a) novelty is considerably reduced by the very similar architecture of [20], which already presents the same key idea of this paper; (b) lower detection performance than the nearest competitor Fast R-CNN, especially on small objects; (c) imperfect positioning with respect to previous work; (d) personally I find the overall concept of reducing detection to predicting a bounding box for each cell in a 7x7 grid to be somewhat unappealing.
However, in Cluster-NMS, we can simplify iterating over all clusters into just iterating over the cluster with the largest number of boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene. YOLOv3 made great progress by using the Darknet-53 network as its backbone, which influenced the next two versions of YOLO. Repeat formulas 8 and 9 until K centers are selected. In this section, we first introduce how our model works from input to output. Gao, Lyu et al. This indicates that our new methods improve the performance of insulator defect detection. In this section, the experiment details and results are introduced concisely. Standard layers are used: convolution with a 3×3 kernel and max-pooling with a 2×2 kernel. If speed is the objective, it would be necessary to show that methods such as region proposal + R-CNN are not as good and fast as the proposed method, or, more generally, to comment on other fast object detectors. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. Springer. Non-maximum Suppression for Object Detection by Passing Messages between Windows. Because defective insulator images are rare, a data augmentation method is applied, which proceeds as follows. +multiple rates: 55.9% As the most famous one-stage detection method, YOLO does not need to generate region proposals, completing the selection of the region and classification simultaneously. Creative Commons Attribution License (CC BY). As a result, the method is much faster than R-CNN, and also faster than the very recent Fast R-CNN, as the remaining bottleneck of extracting region proposals has been removed.
However, when fused with Fast R-CNN, does the fused model still have the efficiency advantage? Please see Table 2 for the speed comparisons. These results show that the EIoU method can obtain good performance in insulator defect detection. Since there are two kinds of targets to be detected, insulator and defect, we mostly used recall, precision, and mAP to evaluate the effect. The authors present a technique for real-time object class detection with a Convolutional Neural Network (CNN). doi:10.1109/tdei.2019.008260, Li, W. (2010). School of Information Technology and Engineering, Guangzhou College of Commerce, Guangzhou, Guangdong, China.