Automatic detection of stork nests on VHV towersfmaa/papers/ConfTele2013... · 2016-12-16 · line at a time (assuming it as an H1 line) and scores every other line (assuming it as

Automatic detection of stork nests on VHV towers

João Fernandes, Maria Paula Queluz
IT / IST-TU Lisbon
e-mail: [email protected]

Tomás Brandão
IT / ISCTE-IUL, Lisbon
e-mail: [email protected]

Francisco Azevedo, João Gomes Mota
Albatroz Engineering, Lisbon
e-mail: [email protected]

Abstract: This paper proposes a method for the automatic detection of stork nests on top of very-high voltage (VHV) towers, using video data from a regular inspection of aerial power lines. The method starts by detecting the towers and then searches for nests on each detected tower. At first, both detections are applied independently to the frames extracted from the videos. Afterwards, the temporal correlation between frames is exploited in an effort to improve the results. Tower detection relies on the search for straight lines that fit the tower model template. The search for nests is based on their typical color range and shape. Using five videos with a total duration of nearly one hour, the method was able to detect 83% of the existing VHV towers and to correctly identify 79% of the towers in risk (i.e., towers with stork nests).

Index Terms: VHV tower detection; nest detection; edge detection; straight line detection; motion estimation.

I. INTRODUCTION

Electrical power transportation throughout the Portuguese national grid is based on lines that are, in more than 95% of the cases, aerial lines. Those lines are supported by very-high voltage (VHV) towers and are subject to faults, thus requiring regular maintenance, which resorts to video cameras mounted on helicopters. Storks are an important risk factor to the electrical grid, since they may lead to power failures or quality of service loss: the nests or the stork itself may cause short-circuits, damaging the line and injuring the stork [1].

In this paper we propose a method for the automatic detection of nests on top of VHV towers, using video data acquired by cameras fixed on helicopters during a regular inspection of the power lines. The output of the method is a report containing the number of towers and their position in the video, identifying which of them represent a risk of malfunction due to the presence of storks. Such a report is intended to alert the power transportation company operating in the Portuguese territory, Redes Energéticas Nacionais (REN), to towers that present that risk.

The tower model used in the development of the method is shown in figure 1; the search for nests is restricted to the most critical part of the tower, represented by the regions surrounded by the blue, green and red lines.

Five videos from a regular VHV tower inspection were used. Their total duration is nearly one hour and they contain a total of 165 towers, 39 of them with stork nests. All videos have a temporal resolution of 5 Hz and a spatial resolution of 1440 × 1080 pixels. In order to train the method, 20 short-duration sequences and a group of 165 still images (frames) were extracted from those videos. The final assessment of the method (using the full-length videos) was based on the Hit Rate (HR), the percentage of existing towers and of towers in risk (i.e., with stork nests) that were detected, and on the False Positive Rate (FPR), the percentage of detected towers or towers in risk without true correspondence.

Figure 1. Tower model and the area of interest for the nest search, in red (horizontal structure) and in blue/green (vertical structures).

This paper is organized as follows: Sections II and III describe the methods developed to detect towers and nests, respectively. Those methods are first applied by considering each video as a sequence of independent still images; afterwards, the temporal correlation between consecutive frames is exploited. Section IV presents the results obtained by applying the final method to five videos, at their full length. In Section V, the main conclusions are presented.

II. VHV TOWER DETECTION

Object detection in images is typically based on the search for the features that best describe the objects. VHV towers have a characteristic that clearly distinguishes them from the rest of the scene: their well-defined structure, formed by straight lines. Accordingly, the developed tower detection algorithm starts by searching for the most contrasting straight lines in an image, in order to find image regions that probably contain a tower. A second and more detailed search for straight lines is performed on the region that most likely contains a tower, trying to find the set of lines that best describes it. After tower detection on video sequences, analyzed as a group of independent still images, the temporal correlation between frames is exploited, in an attempt to find frames with undetected towers or detections without true correspondence.

A. VHV Tower Detection in Still Images

The method for detecting the towers in still images comprises the following steps:

1) Edge detection.
2) Straight line detection.
3) Detection of likely tower regions.
4) Scoring of likely tower regions.
5) Finding the tower configuration.

Step 1: Image edges are detected by applying the Canny Edge Detector [2] to the luminance component of the image. The two threshold values, T1 and T2 (with T1 < T2), required by this detector for image gradient thresholding (with hysteresis), are set to T2 = T and T1 = 0.4 × T, initially using T = 0.4 × (maximum image gradient intensity). Edge detection is followed by a morphological “closing” operation [3], merging edge pixels separated by 5 pixels or less. Finally, the number of image pixels marked as “edges” is computed and, if it is lower than 20% of the total image pixels, T is reduced by 0.01 and the described process (edge detection → “closing” operation → edge count → T adaptation) is repeated. The purpose of this threshold adaptation procedure is to guarantee the presence of a significant number of edges, regardless of the image contrast and illumination conditions.
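The threshold adaptation loop of Step 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation: `edge_fraction` is a hypothetical callback standing in for the Canny detection, closing and edge counting, T is assumed to be expressed on a normalized gradient scale, and the guard that stops T from reaching zero is an addition that the paper does not mention.

```python
from typing import Callable


def adapt_canny_threshold(
    edge_fraction: Callable[[float], float],
    t_init: float,
    min_fraction: float = 0.20,
    step: float = 0.01,
) -> float:
    """Lower T until enough pixels are marked as edges.

    edge_fraction(T) is a hypothetical callback: it is expected to run the
    edge detector with T2 = T and T1 = 0.4 * T, apply the closing operation,
    and return the fraction of image pixels marked as edges.
    """
    t = t_init
    # Keep lowering T while too few edges are found. The "t - step > 0"
    # guard is an assumption added here to guarantee termination.
    while edge_fraction(t) < min_fraction and t - step > 0:
        t -= step
    return t
```

In Step 4 the same loop would be run with a different target (7% to 10% of the region size), which would require an upper bound in addition to `min_fraction`.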

Step 2: In order to find straight lines on the edge map, the Hough Transform [4] is applied; it gives every possible line a score proportional to the number of edge pixels that the line covers. From this transform, the 20 highest scored lines are selected. Each detected line is further classified as vertical (if it defines an angle $\alpha \in [\pi/4, 3\pi/4]$ with the horizontal direction) or horizontal (otherwise). Figure 2 depicts an example of an edge map and the corresponding set of lines extracted using this procedure.
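The line selection and coarse classification of Step 2 could look like the sketch below. The function names and the `(score, alpha)` tuple layout are illustrative assumptions; the angle is also assumed to be taken modulo π, since a line has no direction.

```python
import math


def top_lines(scored_lines, n=20):
    """Keep the n lines with the highest Hough score.

    scored_lines is a list of (score, alpha) tuples (assumed layout).
    """
    return sorted(scored_lines, key=lambda item: item[0], reverse=True)[:n]


def classify_coarse(alpha: float) -> str:
    """Label a line as 'vertical' or 'horizontal' from its angle in radians."""
    a = alpha % math.pi  # assumption: angles are equivalent modulo pi
    if math.pi / 4 <= a <= 3 * math.pi / 4:
        return "vertical"
    return "horizontal"
```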

Figure 2. (a) Original image; (b) Edge map; (c) Detected lines.

Step 3: Every horizontal line, H, crossed by at least one vertical line is considered a candidate for a horizontal contour of the tower structure. A probable tower region is then selected as a rectangular area with a width of $l + l/5$ pixels (centered on H) and a height of $l/2$ pixels ($l/3$ above and $l/6$ below H), where $l$ is the length, in pixels, of H. The size of this area was established according to the real dimensions of the tower area of interest (figure 1) plus a margin for errors. Figure 3 shows an example of the selected regions in an image.
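As a worked example of the region geometry in Step 3, the sketch below computes the rectangle for a horizontal line. It assumes that H is exactly horizontal (a single row `y`) and that image rows grow downwards, so "above" means smaller `y`; neither detail is stated in the paper.

```python
def tower_region(x_left: float, x_right: float, y: float):
    """Return (x_min, y_min, x_max, y_max) of the likely tower region.

    The region is l + l/5 wide, centered on H, and l/2 high,
    with l/3 above H and l/6 below it, where l is the length of H.
    """
    l = x_right - x_left
    center_x = (x_left + x_right) / 2
    half_width = (l + l / 5) / 2
    # Image rows are assumed to grow downwards, so "above" subtracts.
    return (center_x - half_width, y - l / 3, center_x + half_width, y + l / 6)
```

For a line from x = 100 to x = 220 at row 300 (l = 120), the region spans 144 pixels horizontally and 60 pixels vertically.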

Figure 3. (a) Original image with selected regions; (b) Detected lines (horizontal in red; vertical in blue) and selected regions.

Step 4: To identify which of the regions from the previous step contains a tower, the regions are re-analyzed, searching for more detailed edges and straight lines. The same method presented before is used, but the threshold T (used by the Canny detector) is adapted until the percentage of pixels selected as edges is between 7% and 10% of the region size. Each detected line is classified as horizontal (if $\alpha \in [5\pi/6, 7\pi/6]$), vertical left (if $\alpha \in [\pi/2, 3\pi/4]$) or vertical right (if $\alpha \in [\pi/4, \pi/2]$). Figure 4 shows the lines resulting from this second search, for one of the regions presented in figure 3. It can be seen that more lines from the tower contour were detected.

Figure 4. (a) One of the regions from figure 3; (b) Detected edges and lines.
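The three-way classification used in this second search can be sketched as below. Two points are assumptions made for the illustration: angles are reduced modulo π, so the horizontal interval [5π/6, 7π/6] becomes "within π/6 of 0 or π", and the shared boundary α = π/2 is assigned to the vertical right class. Lines outside the three intervals are returned as `None`.

```python
import math


def classify_fine(alpha: float):
    """Return 'horizontal', 'vertical left', 'vertical right' or None."""
    a = alpha % math.pi  # assumption: angles are equivalent modulo pi
    if a <= math.pi / 6 or a >= 5 * math.pi / 6:
        return "horizontal"
    if math.pi / 4 <= a <= math.pi / 2:
        return "vertical right"  # the boundary pi/2 is placed here by choice
    if math.pi / 2 < a <= 3 * math.pi / 4:
        return "vertical left"
    return None  # the line matches none of the tower template directions
```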

Each selected region is then scored with a value that depends on how well the detected lines match the tower template (right part of figure 1). The scoring algorithm chooses one horizontal line at a time (assuming it to be an H1 line) and scores every other line (assuming it to be H2, L1 or L2, and R1 or R2) according to its slope, size and position (pos) relative to the line H1. Taking one of the lines as reference makes the algorithm insensitive to the image spatial resolution. The value of each of these line features is compared with the corresponding expected value (extracted from the tower template) using the matching function presented in figure 5, whose main points are defined in table I. This results in normalized matching scores for slope, size, and pos, which are used by the region scoring algorithm.

Algorithm 1 shows the pseudo-code of the scoring algorithm where, according to figure 1, $pos(I_{TL}) = \overline{I_{TL}\,H1_L}$, $pos(I_{BL}) = \overline{I_{BL}\,H2_L}$, $pos(I_{TR}) = \overline{I_{TR}\,H1_R}$, $pos(I_{BR}) = \overline{I_{BR}\,H2_R}$, and $pos(H2) = \overline{H1_C\,H2_C}$, with $\overline{AB}$ the distance, in pixels, between points A and B, and $X_C$, $X_R$, $X_L$ being, respectively, the center, rightmost and leftmost points of line X. The size and slope of a line are, respectively, its length in pixels and its angle with the horizontal direction (previously denoted by $\alpha$); $N_H$, $N_{VL}$ and $N_{VR}$ are, respectively, the number of horizontal lines, vertical left lines and vertical right lines.

Algorithm 1. Score a likely tower region.

    % horizontal line i is assumed to be H1
    for i = 1 → N_H do
        % horizontal line j is assumed to be H2
        for j = 1 → N_H do
            if j ≠ i then
                % vertical left line k is assumed to be L1 or L2
                for k = 1 → N_VL do
                    Score(L_k) = [pos(I_TL_k) + pos(I_BL_k)] / 2
                Score(L) = mean[Score(L_k)]
                % vertical right line k is assumed to be R1 or R2
                for k = 1 → N_VR do
                    Score(R_k) = [pos(I_TR_k) + pos(I_BR_k)] / 2
                Score(R) = mean[Score(R_k)]
                Score(H2_j) = [[size(H2_j) + pos(H2_j) + slope(H2_j)] / 3 + Score(L) + Score(R)] / 3
        Score(H1_i) = mean[Score(H2_j)]
    Score(Region) = max[Score(H1_i)]
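A Python transcription of Algorithm 1 is sketched below to make the nesting of the loops explicit. The three callbacks are hypothetical placeholders for the matching scores of the paper: `h2_score(h1, h2)` is expected to return the mean of the size, pos and slope scores of H2, while `left_score` and `right_score` return the mean of the two intersection position scores of one vertical line. Returning 0 when a list of lines is empty is an assumption; the paper does not cover that case.

```python
from statistics import mean


def score_region(horizontals, lefts, rights, h2_score, left_score, right_score):
    """Score a likely tower region following the structure of Algorithm 1."""
    best = 0.0
    for i, h1 in enumerate(horizontals):          # line i is assumed to be H1
        h2_scores = []
        for j, h2 in enumerate(horizontals):      # line j is assumed to be H2
            if j == i:
                continue
            # Mean score over all vertical left / vertical right lines.
            # An empty list scores 0 (assumption, not stated in the paper).
            score_l = mean(left_score(h1, h2, l) for l in lefts) if lefts else 0.0
            score_r = mean(right_score(h1, h2, r) for r in rights) if rights else 0.0
            h2_scores.append((h2_score(h1, h2) + score_l + score_r) / 3)
        if h2_scores:
            best = max(best, mean(h2_scores))     # Score(H1_i), then the max
    return best
```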

Step 5: After scoring each selected region of an image, if the highest score is above 65% (a value suggested by experimental tests), it is assumed that the image contains a tower in the corresponding region. In this case, the tower configuration (and its respective score) is sought, by searching for the set of six lines that best match the principal contours of the tower (lines in red, blue and green on the tower template of figure 1), through a procedure similar to the region scoring method. Further details (namely, matching functions and scoring procedure) are presented in [5]. Figure 6 shows three VHV tower configurations: in figures 6-a) and b), towers were well detected, although without the correct configuration in 6-a); figure 6-c) is an example of a false positive, where a tower was detected without true correspondence in the image.

Figure 5. Matching function.

Table I
VALUES TO BE USED IN THE MATCHING FUNCTION

    Feature      Expected value     ∆1               ∆2
    size(H2)     14/8 × size(H1)    0                6/8 × size(H1)
    pos(H2)      1/8 × size(H1)     0                1/8 × size(H1)
    slope(H2)    slope(H1)          0                1/2 × slope(H1)
    pos(I_TL)    1/8 × size(H1)     2/8 × size(H1)   1/8 × size(H1)
    pos(I_BL)    2/8 × size(H2)     1/8 × size(H2)   1/8 × size(H2)
    pos(I_TR)    1/8 × size(H1)     2/8 × size(H1)   1/8 × size(H1)
    pos(I_BR)    2/8 × size(H2)     1/8 × size(H2)   1/8 × size(H2)
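To illustrate how the values of table I might be used, the sketch below assumes a trapezoidal matching function: the score is 1 while the feature is within ∆1 of its expected value and then falls linearly to 0 over a further ∆2. This shape is only a guess, since figure 5 is available only as an image; the actual function is described in [5].

```python
def matching_score(value: float, expected: float, delta1: float, delta2: float) -> float:
    """Assumed trapezoidal matching function (hypothetical shape).

    Returns 1.0 within delta1 of the expected value, 0.0 beyond
    delta1 + delta2, and a linear ramp in between.
    """
    distance = abs(value - expected)
    if distance <= delta1:
        return 1.0
    if delta2 <= 0 or distance >= delta1 + delta2:
        return 0.0
    return 1.0 - (distance - delta1) / delta2
```

With size(H1) = 80 pixels, the size(H2) row of table I gives an expected value of 140, ∆1 = 0 and ∆2 = 60, so under this assumption a 170-pixel line would score 0.5.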

Figure 6. (a) and (b) Well detected towers; (c) False positive.

B. VHV Tower Detection in Video Sequences

If the temporal correlation present in the video sequences is exploited, results may be improved by increasing the number of detected towers and by reducing the number of false positives. In fact, if a tower is detected in a group of consecutive frames, the probability that a tower also exists in the following frame will be high; on the contrary, if in a group of consecutive frames a tower is detected in just one of the frames, that detection is likely to be a false positive. The method developed for detecting towers in a video sequence involves the following procedures:

• Analyze the whole sequence as a group of still images, using the method presented in Section II-A.
• Confirm a tower detection using neighboring frames.
• Improve the tower configuration.

If a tower was detected on frame n, the corresponding tower region (previously found by applying Algorithm 1) is used on the following frame as the confirmation area, centered on the same image coordinates but 10% wider, in both directions, to cope with the camera motion. When confirming a tower detected in frame n, the following scenarios may occur:

1) frame n + 1 already contains a detected tower, located:
   a) inside the confirmation area;
   b) outside the confirmation area;
2) frame n + 1 does not contain any detected tower. Applying Steps 4 and 5 of the method presented in Section II-A to the confirmation area leads to one of two outcomes:
   a) a new tower is detected;
   b) no tower is detected.

In scenarios 1)-a) and 2)-a) the tower detected on frame n is confirmed as valid. In scenario 1)-b), a new search for a tower inside the confirmation area of frame n + 1 is performed (through steps 4 and 5 of the tower detection in still images); if successful, the tower detected on frame n is also confirmed in this case. The confirmation procedure is applied twice, forward and backward, along the video sequence. The result is a list of groups of consecutive frames containing confirmed towers (each one with an associated configuration score) separated by frames with no detected towers.
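The confirmation logic for one pair of frames can be sketched as follows. The sketch makes several assumptions: "10% wider, in both directions" is read as a 10% increase of both width and height, a detection in frame n + 1 is represented by its center point, `search_in_area` is a hypothetical callback that runs Steps 4 and 5 inside the given area, and a tower is treated as unconfirmed in scenario 2)-b).

```python
def confirmation_area(region, margin=0.10):
    """Enlarge (x_min, y_min, x_max, y_max) by `margin`, keeping its center."""
    x_min, y_min, x_max, y_max = region
    dx = (x_max - x_min) * margin / 2
    dy = (y_max - y_min) * margin / 2
    return (x_min - dx, y_min - dy, x_max + dx, y_max + dy)


def contains(area, point):
    """True if the (x, y) point lies inside the rectangular area."""
    x_min, y_min, x_max, y_max = area
    return x_min <= point[0] <= x_max and y_min <= point[1] <= y_max


def confirm(region_n, detection_next, search_in_area):
    """Decide whether the tower found in frame n is confirmed by frame n + 1.

    detection_next is the center of the tower detected in frame n + 1,
    or None when nothing was detected there.
    """
    area = confirmation_area(region_n)
    if detection_next is not None and contains(area, detection_next):
        return True                  # scenario 1)-a)
    # Scenarios 1)-b), 2)-a) and 2)-b): search again inside the area.
    return bool(search_in_area(area))
```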

The next step is to exploit the temporal correlation between frames of the same group, in order to improve the tower configuration in each frame. For each group of frames:

1) find the frame with the highest tower configuration score, and save it as frame n;
2) compare the score of each element (H1 and H2, V1 and V2, R1 and R2 in figure 1) of the tower configuration in frame n with the corresponding element in frame n + 1;
3) project from frame n to frame n + 1 the configuration elements that have a lower score in n + 1 than in n, using the corresponding motion vector [5], [6];
4) compute the score of the new tower configuration in frame n + 1; replace the old configuration if the new one gets a higher score;
5) repeat from 2), using the next frame, until the last frame of the group is reached;
6) repeat from 2), starting from frame n and going backwards, until the first frame of the group is reached.
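The six steps above can be sketched as a forward and a backward sweep from the best frame. Everything about the data layout is an assumption made for the illustration: each configuration is a dictionary from element name to line, `element_score(frame, name, line)` and `project(line, src, dst)` are hypothetical callbacks (the latter standing in for the motion-vector projection of [5], [6]), and the configuration score is taken to be the mean of the element scores.

```python
from statistics import mean


def improve_group(configs, element_score, project):
    """Propagate well-scored configuration elements along a group of frames."""

    def total(idx, cfg):
        # Assumed configuration score: mean of the element scores.
        return mean(element_score(idx, name, line) for name, line in cfg.items())

    # Step 1: the frame with the highest configuration score is the start.
    n = max(range(len(configs)), key=lambda i: total(i, configs[i]))
    for direction in (1, -1):                    # steps 5 and 6: both sweeps
        src, dst = n, n + direction
        while 0 <= dst < len(configs):
            candidate = dict(configs[dst])
            for name, line in configs[src].items():
                # Steps 2 and 3: project elements that score worse in dst.
                if element_score(dst, name, configs[dst][name]) < element_score(src, name, line):
                    candidate[name] = project(line, src, dst)
            # Step 4: keep the new configuration only if it scores higher.
            if total(dst, candidate) > total(dst, configs[dst]):
                configs[dst] = candidate
            src, dst = dst, dst + direction
    return configs
```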

Figure 7 shows an example of a tower configuration, before (a) and after (b) applying the described method.

Figure 7. Tower configuration (a) before improvement (configuration score = 59%) and (b) after improvement (configuration score = 80%).

III. NESTS DETECTION

Nest detection is based on the search for the most evident features of nests, which in this case are color, shape and position. Nests typically have a brownish color and an oval shape. Moreover, information about the tower is used to reduce the possible nest locations. As before, nests are first searched for on the detected towers treating the video as a sequence of independent frames; afterwards, the temporal correlation between frames is exploited.

In order to learn the color gamut of nests, some image pixels from nests were manually selected and compared to pixels from the background. The corresponding color histograms (in both the RGB and HSV color spaces) were centered at different values, suggesting that some colors tend to appear more often on nests than on the rest of the image. It was also verified [5] that color histograms from the background and from nests can be well described by a Gaussian distribution and by a Rayleigh distribution, respectively.

For each color component, a threshold was selected as the intersection point of the background and nest histograms, resulting in TR = 105, TG = 98, TB = 95, TH = 106, TV = 108 (it was also verified that the S component of the HSV space was not sufficiently discriminative). A classifier is applied to every image pixel, labeling it as a nest pixel if it satisfies the following restrictions: R, G, B and V values below the respective threshold; H value above the respective threshold. Figure 8-b) shows the classifier output for an example image (pixels labeled as nests are in white). After this classification, a morphological closing operator is applied, connecting pixels closer than 5 pixels to each other. The resulting groups of connected pixels are considered nest candidates if the number of pixels in the group is between l and 10l, where l is the length, in pixels, of the tower’s H2 line. Figure 8-c) presents, in white, the resulting nest candidates.
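The per-pixel classifier and the candidate size test can be written down directly from the thresholds above. In this sketch the H and V values are passed in already expressed on the same scale as the thresholds, because the paper does not state the scale used for the HSV components; the function names are illustrative.

```python
# Thresholds reported in the text, one per color component.
THRESHOLDS = {"R": 105, "G": 98, "B": 95, "H": 106, "V": 108}


def is_nest_pixel(r, g, b, h, v, t=THRESHOLDS):
    """Nest pixel: R, G, B and V below their thresholds, H above its threshold.

    h and v must use the same (unstated) scale as the thresholds.
    """
    return r < t["R"] and g < t["G"] and b < t["B"] and v < t["V"] and h > t["H"]


def is_nest_candidate(pixel_count: int, h2_length: float) -> bool:
    """A connected group is a candidate if it has between l and 10l pixels."""
    return h2_length <= pixel_count <= 10 * h2_length
```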

Figure 8. (a) Original image; (b) after thresholding; (c) connected regions and associated ellipses (in red); (d) connected regions filtered by eccentricity; (e) tower area of interest; (f) detected nests.

Nests typically have an oval shape; accordingly, an ellipse (marked in red in figure 8-c)) was associated with each nest of the training video sequences, and it was found that the corresponding eccentricity takes values between 0.5 and 0.9 [5]; candidates with eccentricity outside this interval are discarded (figure 8-d)). Finally, only nest candidates inside the tower area of interest (figure 8-e)) are kept. Figure 8-f) shows the detected nests.
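The eccentricity test can be illustrated with the sketch below. How the ellipse is fitted is not described in this paper (see [5]); the sketch assumes the common choice of an ellipse with the same second central moments as the pixel group, for which the eccentricity is $\sqrt{1 - \lambda_{min}/\lambda_{max}}$, with $\lambda$ the eigenvalues of the covariance matrix of the pixel coordinates.

```python
import math


def eccentricity(pixels):
    """Eccentricity of the ellipse with the same second moments as the pixels.

    pixels is a non-empty list of (x, y) coordinates (assumed layout).
    """
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Eigenvalues of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_max, lam_min = half_trace + root, half_trace - root
    if lam_max == 0:
        return 0.0  # a single point: treated as a circle
    return math.sqrt(max(0.0, 1 - lam_min / lam_max))


def has_nest_shape(pixels, low=0.5, high=0.9):
    """Keep candidates whose eccentricity lies in the interval found in training."""
    return low <= eccentricity(pixels) <= high
```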

In order to further decrease the number of false positives, the temporal correlation between nests detected in consecutive frames is exploited. Due to camera motion, the position of a nest in the image may vary from frame to frame, but its position relative to the tower will be fixed. Nest candidates located at the same position (in the tower reference frame) are associated with the same nest. The final list of detections consists of the nests that were detected in more than one third of the frames containing the same tower. This reduces the likelihood that a tree canopy (or other similar object) that is in line with a tower at some point of the video is considered a nest (false positive).
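The one-third rule can be sketched as a vote over the frames that contain the same tower. The association of candidates "at the same position" is implemented here by quantizing tower-relative coordinates onto a grid, which is an assumption made for the illustration; the cell size and the coordinate normalization are hypothetical.

```python
from collections import Counter


def persistent_nests(detections_per_frame, cell=0.1, min_ratio=1 / 3):
    """Return the grid cells where a nest was seen in enough frames.

    detections_per_frame holds one list per frame containing the tower;
    each list has the (u, v) positions of the nest candidates, expressed
    in tower-relative coordinates (assumed layout).
    """
    counts = Counter()
    for frame in detections_per_frame:
        # A set, so that a cell is counted at most once per frame.
        cells = {(round(u / cell), round(v / cell)) for u, v in frame}
        counts.update(cells)
    n_frames = len(detections_per_frame)
    return {c for c, k in counts.items() if k > n_frames * min_ratio}
```

A fixed grid splits candidates that fall near a cell border, so a distance-based association may be preferable in practice.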

IV. RESULTS

The proposed method was tested using 5 videos, with a total duration of nearly one hour. The objective was to produce a report for each video sequence, with the number of towers and their position in the video, marking a tower as “in risk” when it has at least one nest. The performance of the method is quantified by the hit rate (HR) and the false positive rate (FPR) percentages; these parameters are defined as:

$$\mathrm{HR} = \frac{\#\,\text{well detected events}}{\#\,\text{existing events}}; \qquad \mathrm{FPR} = \frac{\#\,\text{wrongly detected events}}{\#\,\text{detected events}}$$

where an “event” may be a tower or a tower with nests (i.e., a tower in risk). Results are presented in table II.
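The two metrics translate directly into code. The handling of empty denominators is an assumption chosen for the sketch: HR is left undefined (`None`) when there are no existing events, and FPR is reported as 0 when nothing was detected.

```python
def hit_rate(well_detected: int, existing: int):
    """HR = well detected events / existing events (None if no events exist)."""
    if existing == 0:
        return None  # assumption: undefined when there is nothing to detect
    return well_detected / existing


def false_positive_rate(wrongly_detected: int, detected: int) -> float:
    """FPR = wrongly detected events / detected events (0 if none detected)."""
    if detected == 0:
        return 0.0  # assumption: no detections means no false positives
    return wrongly_detected / detected
```

For instance, with made-up counts of 8 correct detections out of 10 existing events and 2 wrong detections among 10 reported ones, HR is 0.8 and FPR is 0.2.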

Most of the towers were well detected, with a low percentage of false positives. The exception occurred in videos no. 3 and 4, in which the towers do not have enough contrast with the background (not even visually). Concerning nest detection, the method shows a good performance on videos no. 1 to 4. Nest detection relies on a correct tower configuration: even if a tower’s position in a frame is correctly detected, an incorrect tower configuration may lead to undetected nests, or to nest detections that do not have true correspondence; this situation occurred mostly in the fifth video.

Table II
DETECTION RESULTS

    Video     # Existing   Tower   Tower   # Towers     Towers “in risk”   Towers “in risk”
    number    towers       HR      FPR     with nests   HR                 FPR
    1         42           88%     14%     21           83%                33%
    2         48           90%     6%      4            100%               18%
    3         6            50%     0%      0            -                  0%
    4         5            40%     33%     0            -                  0%
    5         64           81%     10%     14           67%                68%
    Global    165          83%     14%     39           79%                43%

V. CONCLUSIONS

In this paper we proposed a method for the automatic detection, in video sequences, of VHV towers and of stork nests on them. The method is intended to be used in the maintenance of the electrical power transmission grid, on which people’s daily life depends. Accordingly, a fully automatic approach needs to be extremely reliable before it can replace dedicated operators. The proposed method is not yet accurate enough to completely dispense with human intervention in VHV tower inspection, but it may already be considered a valid auxiliary support for that task.

REFERENCES

[1] Albatroz Engineering, "http://www.albatroz-eng.com".
[2] Canny, J., "A computational approach to edge detection", IEEE Trans. on PAMI, Vol. 8, Nr 6, November 1986, pp. 679-698.
[3] Gonzalez, R. C., Woods, R. E., Eddins, S. L., "Digital Image Processing Using MATLAB", Tata McGraw Hill, 2011.
[4] Duda, R. O., Hart, P. E., "Use of the Hough transformation to detect lines and curves in pictures", Com. of the ACM, Vol. 15, Nr 1, January 1972.
[5] Fernandes, João, "Automatic Detection of Stork Nests on Very-High Voltage Towers", MSc Thesis, IST, October 2012. Available at https://fenix.ist.utl.pt/downloadFile/2589873376364.
[6] Liu, B., Zaccarin, A., "New fast algorithms for the estimation of block motion vectors", IEEE Trans. on CSVT, Vol. 3, April 1993, pp. 148-15.
