Robust individual pig tracking

ABSTRACT


INTRODUCTION
The global pig industry was valued at US$254 billion in 2022 and is estimated to reach US$418 billion by 2028 [1]. In commercial farming, pigs are raised in closed pens, where they are subjected to stress and illness. The locations of pigs over time can reveal their activities and well-being [2]-[8] and enable a farm to detect disease early [9]-[14]. Accurate tracking of individual pigs therefore benefits this billion-dollar industry.
Tracking individual farm pigs is challenging. Pigs of similar size and age are raised in the same pen to manage their growth [10]. However, similar-looking pigs are difficult to differentiate and track individually [3]. In addition, the outline of a pig changes with the pig's activity, and this dynamic shape complicates a vision-based tracker, which identifies an object as a pig based on its appearance. Finally, pig trajectories are unpredictable: two pigs that meet momentarily may depart in random directions and cause the tracker to swap their identities incorrectly. Individual pig tracking aims to detect and track each pig accurately over time. Detection accuracy is commonly measured by the false positive ratio (FPR), false negative ratio (FNR), precision, recall, and F1-score. Tracking accuracy is commonly measured by FPR, FNR, mostly tracked trajectory (MTT), number of identity switches (NIS), and multiple object tracking accuracy (MOTA).

ISSN: 2088-8708  Robust individual pig tracking (Aggaluck Jaoukaew)

− Labeled datasets, publicly available [52], for measuring the accuracy of pig detection and tracking.
− An individual-pig tracking method that is robust to long-term pig-to-pig occlusion.
− A performance evaluation of the proposed method, showing superior performance compared with state-of-the-art methods.

The proposed method can detect and track pigs accurately, even in an environment different from the one used to train the predictive model. These contributions may improve farm practices and pig welfare.
The remainder of this paper is organized as follows. Section 2 describes the problem statement, the dataset for training a pig-recognition model, and the proposed pig-tracking method. Section 3 describes the test datasets, evaluates the detection and tracking performance of the proposed method, discusses the results, and suggests future research directions. Section 4 concludes the paper and summarizes the important findings.

METHOD
The research method entails problem formulation (subsection 2.1), preparation of the training dataset (subsection 2.2), and design of the tracking method (subsections 2.3-2.5). In overview, the proposed method identifies the head and body of each pig in the video frames using a Faster R-CNN model trained specifically for pig recognition. It then matches the bounding boxes over the pigs' bodies across frames and repeats this matching for the bounding boxes over the pigs' heads. Finally, it matches the body and head of each pig within each frame. Each part of the research method is described in detail below.

Problem statement
The system model is illustrated in Figure 1. The pen is closed, has a video camera attached to the ceiling, and contains N pigs of similar size and appearance. The video camera provides K frames of red-green-blue (RGB) images, denoted by F[1..K], for pig tracking. Each frame contains all N pigs, possibly with pig-to-pig occlusions. The multi-object tracking system takes the video frames F[1..K] as input and detects and tracks the individual pigs in them. The output is a unique identifier i of each pig, where 1≤i≤N, together with the bounding box, in pixel coordinates, of the ith identified pig in each frame. Our aim is to design a method that tracks individual pigs accurately.

… assume a rectangular shape parallel to the image sides and cover the major head and body areas. For pigs oriented in roughly the same direction, the boundaries are labeled with approximately the same length and width. This approach helps improve the accuracy of Faster R-CNN boundary detection. In our prepared dataset, the total number of bounding boxes over either the head or the body is approximately 6,500: 10 pigs/image × 696 images, minus the heads or bodies, respectively, that are heavily occluded. Seventy percent of the 696 images were used to train the Faster R-CNN model, and the remaining 30% were used to test it. In the training stage, the learning rate was set to 0.0002 and the batch size to 1. A small learning rate was appropriate because pigs staying in a group look similar. The batch size can be increased to suit the computing power of the training machine. In both the training and testing stages, the loss value obtained after 200,000 iterations was used to assess the accuracy of the resulting Faster R-CNN model. These parameter settings were used to train the model.
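The dataset arithmetic above can be checked with a short Python sketch. How the 696 images were assigned to the 70%/30% subsets is not stated, so the seeded random shuffle and the helper name `split_images` below are assumptions for illustration only.

```python
import random


def split_images(image_ids, train_fraction=0.7, seed=0):
    """Shuffle image IDs and split them into training and testing subsets (hypothetical helper)."""
    ids = list(image_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = int(train_fraction * len(ids))  # 0.7 * 696 = 487.2 -> 487 images
    return ids[:n_train], ids[n_train:]


PIGS_PER_IMAGE = 10
NUM_IMAGES = 696
# Upper bound on head (or body) boxes before heavily occluded parts are left unlabeled.
max_boxes = PIGS_PER_IMAGE * NUM_IMAGES

train_ids, test_ids = split_images(range(NUM_IMAGES))
print(max_boxes, len(train_ids), len(test_ids))  # 6960 487 209
```

The upper bound of 6,960 boxes shrinks to roughly 6,500 once heavily occluded heads or bodies are excluded, as described in the text.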

Tracking algorithm
The tracking algorithm is shown in Algorithm 1. The algorithm takes the video frames F[1..K] and the number N of pigs as inputs, where F[k] denotes the image in the kth video frame for 1≤k≤K. The output is an array D[1..N, kst..K] of structures, where D[n, k] contains information about the bounding boxes over the head and body of pig n in frame k. Line 1 applies the Faster R-CNN model to predict the boundaries  body and  head around the pigs' bodies and heads, respectively. The variables  body [ … Next, the algorithm calls the blob-repair method, which maps the bodies of the same pig in different frames to the same identification number, and likewise maps the heads of the same pig in different frames to the same identification number. The output variables  body [1..N, kst..K] and  head [1..N, kst..K] are arrays of structures containing information about the bodies and heads, respectively, of the N pigs in each frame, from the kstth frame to the last frame. The fields of structures  body [n, k] and  head [n, k] are (xmin, ymin), (xmax, ymax), (cx, cy), and a new field id, which is the unique identification of the pig's body and head, respectively. For flexibility in blob repair, the identification of a pig's body may differ from that of the same pig's head. This discrepancy is resolved in the next step, line 4, which matches the identification number of each pig's body to the identification number of the same pig's head. Finally, line 5 consolidates the body and head of each pig into a single unit and outputs the array D[1..N, kst..K] of structures. The structure D[n, k] contains the body and head positions in frame k of the pig whose unique identification is n. The algorithm then terminates, completing the tracking task.
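A minimal Python sketch of the bounding-box structure and of the start-frame search in line 2 of Algorithm 1 is given below. The names `BBox` and `find_start_frame` are hypothetical, and frames are indexed from 0 here rather than from 1 as in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BBox:
    """Axis-aligned box in pixels: (xmin, ymin) upper-left, (xmax, ymax) lower-right."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    @property
    def cx(self) -> float:
        return (self.xmin + self.xmax) / 2  # centroid x

    @property
    def cy(self) -> float:
        return (self.ymin + self.ymax) / 2  # centroid y


def find_start_frame(bodies: List[List[BBox]], heads: List[List[BBox]], n_pigs: int) -> Optional[int]:
    """Return the first frame index with exactly n_pigs body boxes and n_pigs head boxes."""
    for k, (b, h) in enumerate(zip(bodies, heads)):
        if len(b) == n_pigs and len(h) == n_pigs:
            return k
    return None  # the text assumes the video is long enough for such a frame to exist
```

Frames before kst are skipped because they cannot provide a complete set of N identities to propagate.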

Blob repairing
Body-blob repair aims to match a pig's body in a given frame with the same pig's body in the next frame. Head-blob repair performs the analogous matching for the pigs' heads. See Figure 3 for an illustration. The word "repairing" emphasizes the most important step in body-to-body and head-to-head matching: repairing a lost body blob, a lost head blob, an excessive body blob, or an excessive head blob. The proposed blob-repair method appears in Algorithm 2 and is based on the following ideas. First, the start frame, that is, the kstth frame, has N head blobs and N body blobs by construction and is therefore already repaired. Blob repair progressively matches the pigs' bodies and heads in the current frame k with the bodies and heads in the previous frame k-1, for kst+1≤k≤K. Second, if the number of body blobs in the current frame does not equal N, the pigs in the current frame must be heavily occluded, so that the Faster R-CNN model either misses or over-detects a pig's body. In this case, the body blobs in the current frame are unreliable, and blob repair sets the positions of the pigs' bodies and heads in the current frame equal to those in the previous frame (lines 3 and 4 of Algorithm 2).
Third, if the number of body blobs in the current frame is N, each body blob in the current frame is matched to the nearest body blob in the previous frame. In line 8 of the algorithm, "nearest" is measured by the average displacement between the bounding-box coordinates:

$$f(i, j, k) = \frac{1}{4} \sum_{s} \left| \mathrm{body}[i, k].s - \mathrm{body}[j, k-1].s \right|,$$

where the summation index s covers the fields s ∈ {xmin, ymin, xmax, ymax}. In the algorithm, the set of unmatched indices is stored in variable U, which initially equals the full set {1, 2, 3, ..., N} and shrinks by one element at a time until it is empty (lines 6-12). Fourth, the number of detected pigs' heads in the current frame may differ from that in the previous frame. Blob repair matches the pigs' heads in the previous frame with the nearest heads in the current frame, where "nearest" is measured by the distance between the centroids of the bounding boxes (line 15 of the algorithm):

$$g(i, j, k) = \sqrt{\sum_{t} \left( \mathrm{head}[i, k-1].t - \mathrm{head}[j, k].t \right)^{2}},$$

where the summation index t covers the fields t ∈ {cx, cy}. Using different measures f and g for the body and head blobs is appropriate because a pig's body may be rotated within its blob, making the centroid a poor representation of the body blob's position. In contrast, the pigs' heads are smaller than their bodies and do not rotate much within their blobs, so the centroid captures the head position well. Fifth, if fewer heads are detected in the current frame than in the previous frame, the unmatched head blobs in the previous frame are copied, together with their identities, to the current frame (lines 20-24). Conversely, if more heads are detected in the current frame than in the previous frame, the unmatched head blobs in the current frame are discarded. The outputs of the algorithm are the arrays  body [1..N, kst..K] and  head [1..N, kst..K], where  body [n, k] and  head [n, k] are the structures of the nth body and head blobs, respectively, in the kth frame, for 1≤n≤N and kst≤k≤K.
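The Python sketch below illustrates one blob-repair step from frame k-1 to frame k. It is not the authors' Algorithm 2: the exact forms of f and g, the closest-pair-first greedy order in the hypothetical helper `greedy_nearest`, and all names are assumptions chosen to follow the five ideas described above.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


@dataclass(frozen=True)
class Blob:
    box: Box
    id: int  # identity carried from frame to frame


def f_body(a: Box, b: Box) -> float:
    """Assumed form of f: mean absolute displacement of the four box coordinates."""
    return sum(abs(p - q) for p, q in zip(a, b)) / 4


def g_head(a: Box, b: Box) -> float:
    """Assumed form of g: Euclidean distance between the box centroids."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)


def greedy_nearest(prev: List[Box], cur: List[Box], dist: Callable[[Box, Box], float]) -> Dict[int, int]:
    """Pair boxes closest pair first; returns {current index: previous index}."""
    pairs = sorted((dist(c, p), ci, pi) for ci, c in enumerate(cur) for pi, p in enumerate(prev))
    match: Dict[int, int] = {}
    used_prev = set()
    for _, ci, pi in pairs:
        if ci not in match and pi not in used_prev:
            match[ci] = pi
            used_prev.add(pi)
    return match


def repair_frame(prev_bodies, prev_heads, det_bodies, det_heads, n_pigs):
    """One blob-repair step; returns the repaired (bodies, heads) for frame k."""
    if len(det_bodies) != n_pigs:
        # Unreliable frame: reuse the previous frame's bodies and heads unchanged.
        return list(prev_bodies), list(prev_heads)
    # Every current body inherits the identity of its nearest previous body.
    bm = greedy_nearest([b.box for b in prev_bodies], det_bodies, f_body)
    bodies = [Blob(det_bodies[ci], prev_bodies[bm[ci]].id) for ci in range(n_pigs)]
    # Heads: invert the pairing to {previous index: current index}.
    hm = {pi: ci for ci, pi in greedy_nearest([h.box for h in prev_heads], det_heads, g_head).items()}
    # Unmatched previous heads are copied; unmatched current heads are dropped.
    heads = [Blob(det_heads[hm[pi]], h.id) if pi in hm else h for pi, h in enumerate(prev_heads)]
    return bodies, heads
```

Pairing the globally closest blobs first avoids a failure of naive in-order matching, in which the first blob processed can claim a detection that clearly belongs to another pig.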

Matching of a pig's body to its head
The matching stage aims to match a body blob to a head blob in the same frame so that each pig has its body and head paired together. The key idea of the proposed matching is to examine the intersection over union (IoU) as well as to minimize the distance between the head and body blobs. The three largest IoU values identify the candidate head blobs, among which the one with the closest distance to the body blob is selected. Next, we describe the matching algorithm in detail.
The matching algorithm appears in Algorithm 3 and is illustrated in Figure 4. The inputs are the index of the start frame kst and the arrays  body and  head of structures containing information about the body blobs and head blobs. The output is an array match [ … because the minimum distance to the previous frame is unavailable, as shown in lines 14-16. In line 14, the index of the already-matched head is removed from set U. For a subsequent frame k>kst, the distances between the head blobs in the current and previous frames are considered, as presented in lines 17-19. Figure 4(a) illustrates the pigs in the previous frame, where the bodies have already been matched to the heads; the distance between the head blobs in the current and previous frames is calculated. Figure 4(b) illustrates the pigs in the current frame, where each pig's body is to be matched to a head based on this distance. The distances under consideration take three values, stored in variables dist1, dist2, and dist3, one for each candidate head jℓ with ℓ=1, 2, 3, computed over the coordinates u ∈ {xmin, xmax}. The three values are compared to determine the closest distance according to the three conditions in lines 21-30. In the first condition, if dist1 is smaller than both dist2 and dist3, then head index j1 is matched with the body blob at index i, and the matched index j1 is removed from set U. The second and third conditions handle the cases where dist2 and dist3, respectively, are the minimum of the three distances. Figure 4(c) shows an example of the head that best matches the pig's body under consideration. After the algorithm finds the best body-to-head match, a single ID is assigned to the body and head blobs of the same pig, as shown in Figure 4(d). Overall, the algorithm processes each frame and maps the IDs of the body and head blobs.
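A hedged Python sketch of the body-to-head matching within a single frame follows. The exact formula for dist_ℓ is not recoverable from the text, so summing absolute differences over xmin and xmax between a candidate head and the head paired with the same body in the previous frame is an assumption, as are the function names and the choice to initialize U once per frame.

```python
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Box) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def x_distance(a: Box, b: Box) -> float:
    """Assumed dist over u in {xmin, xmax}: sum of absolute differences."""
    return abs(a[0] - b[0]) + abs(a[2] - b[2])


def match_bodies_to_heads(
    bodies: List[Box],
    heads: List[Box],
    prev_heads: Optional[List[Box]] = None,
) -> Dict[int, int]:
    """Return {body index i: head index j} for one frame.

    prev_heads[i] is the head box paired with body i in the previous frame,
    or None when the frame is the start frame kst.
    """
    unmatched = set(range(len(heads)))  # U: heads not yet assigned
    match: Dict[int, int] = {}
    for i, body in enumerate(bodies):
        # Up to three unmatched heads with the largest IoU: j1, j2, j3.
        cands = sorted(unmatched, key=lambda j: iou(body, heads[j]), reverse=True)[:3]
        if not cands:
            break
        if prev_heads is None:
            best = cands[0]  # start frame: IoU alone decides
        else:
            # Smallest distance wins; ties keep the higher-IoU candidate.
            best = min(cands, key=lambda j: x_distance(heads[j], prev_heads[i]))
        match[i] = best
        unmatched.remove(best)
    return match
```

In the start frame the highest-IoU head wins; afterwards, a head that overlaps the body slightly less but stays close to where that pig's head was in the previous frame can be preferred.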

RESULTS AND DISCUSSION
To evaluate the performance of the proposed tracking method fairly, we first prepared test datasets that were disjoint from the dataset used in section 2.2 to train the pig-detection model. We then evaluated the proposed detection model and the proposed pig-tracking method on these test datasets.

Test datasets
We prepared four datasets (Videos 1-4) to evaluate the pig identification and tracking methods. Each video contains 10 frames, down-sampled from 5,000 consecutive frames, showing the same number of pigs of mixed breeds. Videos 1-3 were captured in the morning, at midday, and in the evening, respectively. The pigs, pen, and camera setup in Videos 1-3 are the same as those in the videos used to train the Faster R-CNN model (section 2.2), whereas Video 4 has a different camera setup, pen, and pigs. Thus, Videos 1-3 come from an environment familiar to the pig detection model, while Video 4 does not.
Table 1 lists the characteristics of the datasets used for pig detection. Different times of day lead to different pig behaviors and hence different occlusion ratios (OR), where the OR measures the degree to which a part of one pig overlaps with the same part of another pig. The head OR equals

$$\mathrm{OR} = \frac{1}{M} \sum_{k=1}^{M} \frac{d_k}{n_k},$$

where M is the number of frames, d_k is the number of pigs' heads that overlap with the head of any other pig in frame k, and n_k=N is the total number of pigs in frame k.
The body OR is defined similarly, but on the pigs' bodies. The higher the OR, the greater the overlap and the more difficult it is to detect a pig. The body OR is largest, at 0.40, in Video 2, because at noon pigs tend to lie down and rest, causing body-to-body occlusion. Videos 1-4 test the abilities of the detection and tracking methods under various conditions. We have made the test datasets available to other researchers [52].
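The occlusion ratio can be computed as in the Python sketch below. Treating two boxes as overlapping when their intersection has positive area is an assumption, since the text does not state the overlap criterion, and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def overlaps(a: Box, b: Box) -> bool:
    """Assumed criterion: the intersection of the two boxes has positive area."""
    return min(a[2], b[2]) > max(a[0], b[0]) and min(a[3], b[3]) > max(a[1], b[1])


def occlusion_ratio(frames: List[List[Box]]) -> float:
    """Average over frames of d_k / n_k, where d_k counts boxes overlapping any other box."""
    ratios = []
    for boxes in frames:
        d_k = sum(
            any(overlaps(a, b) for j, b in enumerate(boxes) if j != i)
            for i, a in enumerate(boxes)
        )
        ratios.append(d_k / len(boxes))  # n_k = len(boxes) = N
    return sum(ratios) / len(ratios)
```

Passing the head boxes of each frame gives the head OR; passing the body boxes gives the body OR.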

Performance metrics
We considered six performance metrics for pig detection: true positive ratio (TPR), FPR, FNR, precision, recall, and F1-score.In addition, we considered six performance metrics for pig tracking: TPR, FPR, FNR, MTT, NIS, and MOTA.The performance metrics for detection and tracking were obtained from the predicted bounding boxes and ground truth, without human intervention, for reproducibility.In this section, we describe the method to obtain these performance metrics.
During detection, in a given video frame f, the ground-truth bounding boxes g1, g2, ..., gN are matched with the predicted bounding boxes p1, p2, ..., pn, where n is the number of predicted bounding boxes in frame f. The matching method is greedy maximum-weight bipartite-graph matching, where the two disjoint vertex sets are {g1, g2, ..., gN} and {p1, p2, ..., pn}. The weight wi,j between vertices gi and pj favors, first, the IoU and, second, the distance between the centroids of the bounding boxes. In particular, wi,j is a tuple (ai,j, bi,j), where ai,j is the IoU between gi and pj if the IoU is ≥0.6, and zero otherwise. The element bi,j is the distance between the centroids of bounding boxes gi and pj. An IoU threshold of 0.6 is suitable because it indicates that the two bounding boxes overlap significantly [36]. Two weights are compared first by their IoU and, in the case of a tie, by their distance. Following the greedy implementation, we repeatedly add edges to the matching, from the maximum-weight edge down to smaller-weight edges, as long as each added edge preserves a bipartite matching. The matching process establishes the pairs (gi, pm(i)) between the ground-truth and predicted bounding boxes, where m(i) is the index of the matched vertex. After the matching, the true positives (TP), false positives (FP), and false negatives (FN) are obtained from the IoU of each bounding-box pair (gi, pm(i)), using the standard definitions of these metrics [53] with an IoU threshold of 0.6. The TPR, FPR, and FNR are the ratios of TP, FP, and FN, respectively, to the number of ground-truth bounding boxes, which is 100 as shown in Table 1. The maximum matching ensures that the TPR is as large as possible and that the FPR and FNR are as small as possible. The TPR, FPR, and FNR of detection can therefore serve as the ultimate limits for the analogous tracking metrics.
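The detection-matching procedure can be sketched in Python as follows. The names are hypothetical, a shorter centroid distance is assumed to win an IoU tie, and the FP/FN convention (an unmatched ground truth is an FN; an unmatched or below-threshold prediction is an FP) is an assumption chosen to agree with the zero detection FNR reported when the model outputs N boxes.

```python
import math
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
IOU_THRESHOLD = 0.6


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def centroid_distance(a: Box, b: Box) -> float:
    return math.hypot((a[0] + a[2] - b[0] - b[2]) / 2, (a[1] + a[3] - b[1] - b[3]) / 2)


def greedy_match(gts: List[Box], preds: List[Box]) -> Dict[int, int]:
    """Greedy maximum-weight bipartite matching; returns {gt index: pred index}."""
    edges = []
    for i, g in enumerate(gts):
        for j, p in enumerate(preds):
            v = iou(g, p)
            a = v if v >= IOU_THRESHOLD else 0.0  # IoU term, zeroed below the threshold
            # Sort key: larger IoU first, then shorter centroid distance (assumed).
            edges.append((-a, centroid_distance(g, p), i, j))
    edges.sort()
    match: Dict[int, int] = {}
    used = set()
    for _, _, i, j in edges:
        if i not in match and j not in used:  # keep the edge set a matching
            match[i] = j
            used.add(j)
    return match


def detection_rates(gts: List[Box], preds: List[Box]) -> Tuple[float, float, float]:
    """TPR, FPR, FNR as ratios to the number of ground-truth boxes."""
    match = greedy_match(gts, preds)
    tp = sum(1 for i, j in match.items() if iou(gts[i], preds[j]) >= IOU_THRESHOLD)
    fp = len(preds) - tp  # predictions unmatched or below the IoU threshold
    fn = len(gts) - len(match)  # ground truths left with no prediction at all
    n = len(gts)
    return tp / n, fp / n, fn / n
```

Because zero-IoU edges are still ranked by centroid distance, a misplaced prediction is paired with the nearest ground truth rather than left unmatched, which is why it counts as an FP instead of producing an FN.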
To evaluate the tracking performance, we renamed the predicted pig IDs to match those in the ground truth and then evaluated the TP, FP, FN, MTT, and NIS. Renaming the predicted IDs ensures fairness because the tracker and the ground truth may consistently label the same pig with two different numbers throughout the video. The matching is performed between the predicted bounding boxes and the ground truth at the earliest ground-truth video frame in which the model detects all N pigs. It uses greedy maximum-weight bipartite-graph matching with the same weight construction as the detection metrics. Because N pigs are predicted, the matching is perfect: every predicted pig ID is matched to a unique ground-truth pig ID. The predicted pig IDs are then renamed to match the ground-truth IDs.
After the perfect matching, the ground-truth bounding box gi and the predicted bounding box pi with the same index i are deemed to identify the same pig. Then, TP, FP, and FN are obtained in each frame using the standard definitions of these metrics [53], with an IoU threshold of 0.6. In computing the MTT, the predicted trajectory of a given pig is considered mostly tracked if 80% of its predicted bounding boxes significantly overlap with the ground truth, where, again, two boxes overlap significantly if their IoU is ≥0.6 [36]. The NIS of a test video is the sum of the NISs of the individual pigs. For example, Table 2 lists the predicted pig IDs, the NIS of each pig, and the NIS of Video 1 for the proposed tracker. To obtain the NIS of a given pig, we match the pig's predicted bounding boxes to the ground-truth bounding boxes in each ground-truth video frame, using greedy maximum-weight bipartite-graph matching, and count the number of ID switches. Finally, MOTA is obtained as 1-x [36], where x is the ratio of the sum of FP, FN, and NIS to the sum of the total number of ground-truth bounding boxes and the maximum possible NIS value. The total number of ground-truth bounding boxes is MN, and the maximum possible NIS value is (M-1)N, where M=10 is the total number of ground-truth frames, as shown in Table 1. The method for computing the detection and tracking metrics is appropriate.
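The tracking metrics can be sketched in Python as below. The helper names are hypothetical; counting every change of the assigned ID between consecutive frames (with 0 for a lost track treated as an ID) and reporting MTT as a fraction of pigs are assumptions. With M=N=10, the MOTA denominator is MN + (M-1)N = 100 + 90 = 190.

```python
from typing import List, Sequence


def nis_for_pig(assigned_ids: Sequence[int]) -> int:
    """Count changes of the predicted ID assigned to one ground-truth pig across consecutive frames.

    An ID of 0 marks a lost track and is treated as just another ID (assumption).
    """
    return sum(1 for prev, cur in zip(assigned_ids, assigned_ids[1:]) if cur != prev)


def mostly_tracked_ratio(ious_per_pig: List[Sequence[float]], iou_thr: float = 0.6, frac: float = 0.8) -> float:
    """Fraction of pigs whose boxes reach IoU >= iou_thr in at least frac of the frames."""
    tracked = sum(
        1 for ious in ious_per_pig
        if sum(v >= iou_thr for v in ious) >= frac * len(ious)
    )
    return tracked / len(ious_per_pig)


def mota(fp: int, fn: int, nis: int, n_pigs: int, n_frames: int) -> float:
    """MOTA = 1 - (FP + FN + NIS) / (M*N + (M-1)*N)."""
    denom = n_frames * n_pigs + (n_frames - 1) * n_pigs
    return 1 - (fp + fn + nis) / denom
```

For instance, 5 FPs, no FNs, and 3 ID switches in one test video would give MOTA = 1 - 8/190 ≈ 0.958. Including the maximum NIS in the denominator keeps MOTA at or above zero even when a tracker switches every identity in every frame and places no box correctly.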

Detection evaluation
Figure 5 shows examples of bounding boxes generated by our model to detect the heads and bodies of pigs. The images in Figures 5(a)-5(d) are taken from Videos 1-4, respectively, and each shows a group of pigs in similar positions. A green bounding box marks a detected pig's body, while a cyan box marks a detected pig's head. The proposed method accurately detects pigs' bodies even when the pigs congregate in a group and experience pig-to-pig occlusion. Detecting a pig's head is more difficult than detecting its body. In Figure 5(d), from Video 4, the head detection contains mistakes: two bounding boxes intended for pigs' heads appear at the buttocks, and one pig's head has no bounding box. Overall, visual inspection showed that the proposed detection model made few mistakes and performed well in these video frames.

Table 2. To obtain the NIS, predicted pig IDs are matched to ground-truth IDs, shown for Video 1 and the proposed tracker. An ID of zero means that the tracker has lost track of a pig.

Table 3 shows the overall detection performance for the pigs' heads and bodies. The ORs reported in Table 1 affect the FPR and FNR of pig detection: a high OR tends to cause errors in boundary detection and, hence, a large FPR. The FNR is zero because the predictive model outputs N bounding boxes for both heads and bodies; some of these boxes are not at the correct positions, which yields a non-zero FPR. The different lighting conditions in Videos 1-3 did not significantly affect the F1-score, showing that the detection model is robust to lighting conditions. The precision, recall, and F1-score for body detection are generally larger than those for head detection. For example, in Video 3, the F1-score is 0.99 for body detection and 0.83 for head detection. Furthermore, for the unfamiliar Video 4, the F1-score for body detection reached the best value of 1.00, whereas that for head detection was only 0.29. Bodies are larger and hence easier to detect than heads, so the proposed algorithm appropriately uses the pigs' bodies for tracking.

Tracking evaluation
We compared the proposed method with the state-of-the-art methods [17], [36], which track moving animals of similar appearance without lighting control. The global optimization (GO) method in [17] is a deterministic method using the Hungarian algorithm, whereas the simple online and real-time tracking (SORT) method in [36] is a probabilistic method combining the Hungarian algorithm with a Kalman filter. Both existing methods are competitive trackers.
The results of individual pig tracking are shown in Table 4, where the best performance for each metric is shown in bold. The FNRs of the proposed method and the GO method are zero for every test video, while the FNRs of the SORT method are 0.10 for Videos 1-2 and 0.00 for Videos 3-4. That is, the proposed and GO methods track all N pigs in every video frame, while the SORT method fails to track 10% of the pigs, i.e., 10 of the 100 ground-truth pig bounding boxes listed in Table 1, in each of Videos 1 and 2. With a large TPR and a small FPR, the proposed method places correct bounding boxes on the areas where the pigs with the intended IDs are located. The TPRs of the proposed method are 0.70, 0.99, and 1.00, the largest values in test Videos 2, 3, and 4, respectively. In Video 1, the TPR of the proposed method is 0.64, the second largest after the TPR of 0.90 achieved by the SORT method. The overall tracking accuracy is captured by MOTA. The proposed method has the largest MOTA in three of the four test videos and a perfect MOTA of 1.00 in Video 4. The proposed method performs exceptionally well on Video 4, which contains different pigs in a different pen from those used to train the Faster R-CNN model. This performance indicates that the proposed method is a robust pig tracker in an unfamiliar environment. Overall, the proposed method outperformed the state-of-the-art methods on most videos.
In the GO method, tracking pigs by their heads is inaccurate. As shown in Table 3, detecting a pig's head is more difficult than detecting its body, as confirmed by the low TPR of head detection. For example, in Video 4, the TPR for detecting the pigs' heads was 0.17. Because a pig must be detected before it can be tracked, the tracking TPR of a head-based tracker cannot exceed 0.17 on this video. In Table 4, the tracking TPR of the GO method … 3 are the upper bounds of the tracking TPRs. As shown in Table 4, tracking pigs by their bodies, as in the proposed and SORT methods, generally outperforms tracking them by their heads.
Figures 6 and 7 show examples of the trajectories obtained from the manually labeled ground truth (dashed lines) and from the various trackers (solid lines) for selected pigs and videos. The x- and y-axes are the image x- and y-coordinates in pixels. The trajectories of the ground truth and of the proposed and SORT methods are taken from the centroids of the pigs' bodies, whereas the trajectory of the GO method is taken from the centroids of the pigs' heads. The proposed method tracks the pig accurately in Figures 6(a) and 7(a), as its trajectories agree with the ground-truth trajectories. A break in a trajectory indicates a loss of tracking, which contributes to a false negative (FN). A sharp transition in a trajectory indicates an unusual pig movement and is caused by ID switches. The proposed method has no tracking loss on these example trajectories. In contrast, the SORT method suffers from a tracking loss in Figures 6(b … 4 for the SORT method on Videos 3-4, because the FNR in Table 4 is the average of the FNRs of the N pigs and is rounded to two decimal places, given the 100 available ground-truth bounding boxes in each test dataset. Furthermore, the GO method has low tracking accuracy and produces trajectories far from the ground truth. The discrepancy comes from ID switches, i.e., a different pig is tracked in Figure 6(c) (also observed in Figures 6(b) and 7(c)), and from errors in the positions of the head bounding boxes in Figure 7(c). The poor trajectory in Figure 7(c) is consistent with the large FPR of head detection on Video 4 shown in Table 3. Indeed, in Table 4, the GO method has an MTT of zero: it does not correctly track the trajectory of any pig over the ground-truth frames listed in Table 1. The proposed method is the most accurate tracker.

Several factors increase the tracking accuracy of the proposed method. The proposed method uses a rectangle to mark the boundary of each pig, whereas existing methods mark each pig by its centroid alone. As pigs move, the centroids of two pigs may come close to each other and cause an ID switch. A rectangular boundary captures a larger portion of the body and is more robust to ID switching. These advantages improve the tracking performance of the proposed method.
There are several directions to extend this research.The proposed method for pig tracking is sufficiently general to be applied to other animals, provided that the animal's head can be distinguished from its body.Future research can entail tracking other animals that have great economic, societal, or cultural importance.In addition, the process of head matching and body matching can be achieved simultaneously across several frames, as opposed to being done on each pair of adjacent frames.To reduce the complexity of the matching, a metaheuristic algorithm such as the giant trevally optimizer (GTO) [54] can be applied.Moreover, the number of animals in each pen was fixed and known in this study.Future research may cover unknown or changing numbers of animals.Finally, future research can use knowledge of the animal's tracked location as a feature to determine its activity.These future studies will extend the proposed tracking method to a broader context.

CONCLUSION
Farm animals living in closed pens are prone to fighting and injury. Individual pig tracking can increase animal welfare and provide a basis for behavioral monitoring and disorder diagnosis. However, accurate individual pig tracking is difficult to achieve because pigs look alike and tend to remain in groups, creating occlusions. A method that can accurately track individual pigs therefore benefits both the animals and farm owners.
Using top-view video, this study developed a method to track each pig in a realistic farm environment. To detect pigs in a given video, we created and labeled a dataset of pigs on an actual farm and trained a Faster R-CNN to recognize an object as a pig. The key idea in dataset preparation is to label only the visible pigs to increase detection accuracy. To evaluate the proposed detection method, we tested the model on separate videos and found that detection performance increased significantly when the pigs' bodies were used for identification. This finding matches the intuition that a pig's body occupies a large area and therefore identifies the pig better. The developed model detected pigs well across all test videos and indicated that the body is the main feature for pig identification.
The proposed tracking method builds on the strength of the pig detection model and specific tracking ideas.The position of a pig in the next frame is difficult to forecast.To mitigate this difficulty, the proposed method detects both the head and body of each pig and uses only the frames in which the pig under consideration is detectable.Furthermore, to improve tracking accuracy, the proposed method uses a rectangle, as opposed to a centroid, to locate each pig.The proposed method is superior to state-of-the-art methods, namely, the SORT and GO methods.The identity of an individual pig can be tracked, even in the case of pig-to-pig occlusion.Given this advantage, the complete trajectory of each pig can be obtained and used for behavior monitoring, pig activity classification, and disease detection.

Figure 1. A system to track individual pigs takes a video as input

Figure 2. Video frames were labeled and used to train a neural network to recognize pigs' bodies and heads

Figure 3. Blob repairing maps body and head blobs among consecutive frames, shown for N=3 pigs. The lines indicate body or head blobs that belong to the same pig

Figure 4. The matching stage (a) considers the previous frame, (b) selects each body bounding box in the current frame, (c) matches the body to a head, and (d) assigns the same ID to the matching head and body

Figure 5. The bounding boxes cover the detected pigs' bodies and heads, shown for example frames taken from (a) VDO1, (b) VDO2, (c) VDO3, and (d) VDO4

Figure 6. The trajectories of pig ID 6 on test Video 3 show that (a) the proposed method is more accurate than (b) the SORT and (c) GO methods
Figure 7.

… in the kth video frame. Variable  body[n, k] is the structure of the bounding boxes over the body of the nth detectable pig in the kth video frame, where 1≤n≤n̂body[k]; and variable  head[n, k] is the analogous structure over the pig's head, where 1≤n≤n̂head[k]. Structures  body [n, k] and  head [n, k] contain the points (xmin, ymin) at the upper-left corner, (xmax, ymax) at the lower-right corner, and (cx, cy) at the centroid of the bounding box. Line 2 determines the index kst of the first video frame that contains N pigs' bodies and N pigs' heads. We assume that the number of frames is large enough for kst to exist.
… match[1..N, 1..K] of integers indicates the matching. For each frame k, the value match[n, k]=j indicates that the body blob  body [n, k] and the head blob  head [j, k] belong to the same pig. The algorithm matching the body blobs and head blobs in the same frame is … The index of the frame under matching is denoted by variable k, which runs from kst to K, as shown in line 1. The inner loop of the algorithm iterates over variable i, the index of the body blob, which ranges from 1 to N, as shown in lines 2-32. In line 5, set U of unmatched head indices is initialized to the full set {1, 2, 3, ..., N}. Lines 6-9 calculate the IoU of each pair of body and head blobs. The three largest IoUs, namely IoU[j1]≥IoU[j2]≥IoU[j3], are selected for further matching, as shown in line 10. If frame k is the start frame, the matching is based only on IoU[j1] …

Table 1. Test datasets consist of four videos with different characteristics (bbox=bounding box)

Table 3. Detecting the pigs' bodies is more accurate than detecting the pigs' heads (Prec.=Precision)

Table 4. In tracking, the proposed method outperforms the state-of-the-art methods in most test datasets