Train Detector
Purpose
Use the Training panel and the Training Progress window to create and train object detectors.
Once training is complete, the detector can be saved and used in the Track workflow.
Protocol
Active Learning Protocol
Active Learning annotation is an iterative process in which a detector is incrementally improved: train the detector with the current annotation data, test it, correct the detection errors, annotate more data, and re-train with the updated data.
1. In the Train Detector step of the Train workflow, train a detector with existing annotation data:
   - Select the object definition from the Detector dropdown menu.
   - Click the train button to start training.
2. Finish training by doing one of the following in the Training Progress window:
   - Enable Auto Stop and wait for training to stop automatically.
   - Click the stop button once True Positive, False Positive, and Regression Error have stabilized.
   - Allow the training process to end once the number of training iterations or epochs defined in the Advanced settings is reached.
3. Close the Training Progress window.
4. Use the trained detector to perform object detection on a new frame:
   - Select a frame by using the video playback controls.
   - Click Detect Object.
5. Return to the Annotate Data step of the Train workflow and make any of the following annotation edits:
   - Select incorrect detections and delete them by pressing Delete on the keyboard.
   - For single point annotations only, move them to a new location or resize them.
   - Click and annotate any missed objects.
6. Repeat steps 1-5 to incrementally increase the amount of training data and improve detector performance.
Detector
New detector
To create a new detector:
1. Select New from the Detector dropdown menu.
2. Select the object definition from the Target Object dropdown menu.
3. Click the create button.
4. In the New Detector dialog window, type in the desired name and confirm.
Existing detector
Select an existing detector from the Detector dropdown menu to continue training.
Detector functions
- Detector: From the dropdown menu, choose New to create a new detector or choose an existing detector to continue training.
- Target Object: Choose the target object type for the new detector from the dropdown menu. If using an existing detector, the Target Object will automatically populate with the object associated with that detector.
- Click to create the new detector.
- Click to delete the detector selected in the Detector dropdown menu.
Training
Start training
- Click to start training the selected detector.
- Click to save the detector under its current name.
- Click to save the detector under a different name.
- Input Mode: Choose Greyscale or RGB for the type of image sequence being used.
- Training Region: Choose a training region from the dropdown menu: all annotated labels in an image, an individual label within an image, or the full image.
Stop training
Training can be stopped manually or automatically. See Training Progress window.
- Click to immediately stop the training process. It is recommended to stop the training process once True Positive, False Positive, and Regression Error have stabilized.
- Auto Stop: Check the box to automatically stop training once mAP50 performance has converged.
- If training is not stopped manually, and Auto Stop is not enabled, training will terminate after the number of training iterations or epochs defined in the Advanced settings is reached.
Test detector
Once the training session is finished, you can test how well the detector works in the current frame using Detect Point or Detect Object:
- Detect Point: Click to show only individual keypoints as they are detected. The graphical objects created cannot be edited, nor can they be used for further training.
- Detect Object: Click to perform point and connection detection. Detected objects can be edited by the user and then used for further training (see Active Learning).
- Click to remove the object or point detections.
- Threshold: Use the slider to adjust the threshold for excluding less likely detections.
- IoU Threshold: The Intersection over Union (IoU) threshold defines the minimum overlap required between two object bounding boxes to consider them as referring to the same object. This threshold is used during the non-maximum suppression (NMS) step to filter redundant detections (see the sketch after this list).
  - Lower values (e.g., 0.3): More aggressive suppression with fewer overlapping boxes retained. This reduces duplicate detections but risks removing distinct, nearby objects.
  - Higher values (e.g., 0.7): Less aggressive suppression with more overlapping boxes retained. This can preserve closely spaced objects but may leave duplicate detections.
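For illustration, the following minimal Python sketch shows how an IoU threshold is typically applied during non-maximum suppression. The box format, function names, and default threshold are assumptions for the example, not the application's internal API.

```python
import numpy as np

def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, then drop every remaining
    box whose IoU with a kept box exceeds the threshold."""
    order = list(np.argsort(scores)[::-1])  # indices, best score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep  # indices of retained detections
```

With a lower threshold (e.g., 0.3), more boxes are treated as duplicates and suppressed; with a higher threshold (e.g., 0.7), more overlapping boxes survive.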
Settings
The following sections contain additional configuration options. Contact Technical Services for help with these settings by completing the Request Support form.
Classifier Design
Classifier Design contains additional configuration options. Contact Technical Services for help with these settings by completing the Request Support form.
- Classification Head: Specifies the number of units in the hidden layer immediately preceding the classification output layer. This layer processes features extracted from the backbone before predicting class scores for each anchor. The size affects the network’s capacity to model complex class distinctions; larger sizes may improve accuracy but increase computational cost (see the sketch after this list).
- Regression Head: Specifies the number of units in the hidden layer immediately preceding the bounding box regression output layer. This layer transforms extracted features into offset predictions for each anchor. Larger hidden layers can improve the precision of localization but increase computation.
- Feature Batch Normalization: Enables batch normalization on feature maps before the detection heads. Normalization stabilizes training, accelerates convergence, and can improve overall detection accuracy.
- Kernel Radius: Defines the radius of convolutional kernels used in the detection heads. Larger kernels capture more spatial context but require more memory and computation.
- Network Base: Specifies the backbone feature extractor used by the SSD detector. These variants of VGG16 feature extraction networks differ in the pooling step size (4, 8, 16, 32), which determines the spatial resolution of the extracted feature maps. Smaller steps preserve more spatial detail; larger steps reduce resolution but improve efficiency. The system automatically selects the optimal network base based on the size of the objects in the training dataset. Users can override this selection manually, but this is not recommended, as mismatched feature resolution may degrade detection accuracy.
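As a rough illustration of how these settings relate, here is a minimal PyTorch sketch of a generic SSD-style detection head. The module structure, channel counts, and anchor/class numbers are assumptions for the example and do not reflect the application's actual architecture.

```python
import torch.nn as nn

def make_head(in_channels, hidden_units, out_channels,
              kernel_radius=1, batch_norm=True):
    """Generic detection head: one hidden conv layer of configurable width
    (the Classification/Regression Head size), optional feature batch
    normalization, and kernel size derived from the radius (k = 2r + 1)."""
    k = 2 * kernel_radius + 1
    layers = []
    if batch_norm:  # Feature Batch Normalization
        layers.append(nn.BatchNorm2d(in_channels))
    layers += [
        nn.Conv2d(in_channels, hidden_units, kernel_size=k, padding=kernel_radius),
        nn.ReLU(inplace=True),
        nn.Conv2d(hidden_units, out_channels, kernel_size=1),
    ]
    return nn.Sequential(*layers)

# Hypothetical sizes: 512 backbone channels, 4 anchors per cell, 10 classes.
cls_head = make_head(512, hidden_units=256, out_channels=4 * 10)  # class scores
reg_head = make_head(512, hidden_units=256, out_channels=4 * 4)   # box offsets
```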
Training System
GPU: GPU is automatically selected if a compatible graphics processor is found.
CPU: CPU is automatically selected if no compatible graphics card is found. You can also manually select CPU for troubleshooting or if the system GPU is already busy with another application.
Augmentation
Augmentation contains additional configuration options. Contact Technical Services for help with these settings by completing the Request Support form.
- Rotation Augmentation: Specifies the maximum rotation angle applied to training images for data augmentation. During training, images may be rotated randomly within the range ± the specified number of degrees. This helps the network generalize to objects in arbitrary orientations (see the sketch after this list).
  - Degrees: Type in the text box, or use the up and down arrows to select the number of degrees.
  - Transpose: When enabled, the training images are randomly transposed during augmentation. This increases robustness to orientation variations without altering object scale.
- Scaling Augmentation: Specifies the maximum scaling factor applied to training images. Images are randomly scaled up or down by a factor within ± the specified percentage, allowing the network to better handle objects at varying sizes.
  - Percent: Type in the text box, or use the up and down arrows to select the percentage.
  - Mirror: When enabled, training images are randomly mirrored (horizontally and/or vertically) during augmentation. This increases model invariance to reflection transformations.
- Click to apply the current augmentation values to the current image frame.
- Intensity: When enabled, pixel intensity values of training images are randomly adjusted in brightness to simulate lighting variations and improve generalization to diverse imaging conditions.
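The sketch below illustrates how these augmentation settings could combine on a single image, using Pillow. The parameter values are hypothetical, and a real pipeline must apply the same geometric transforms to the annotations as well.

```python
import random
from PIL import Image, ImageEnhance, ImageOps

MAX_DEGREES = 15       # Rotation Augmentation: ±15 degrees (hypothetical)
MAX_SCALE_PCT = 20     # Scaling Augmentation: ±20 percent (hypothetical)
MIRROR = True          # Mirror enabled
INTENSITY = True       # Intensity enabled

def augment(img: Image.Image) -> Image.Image:
    """One random augmentation draw for a training image."""
    # Random rotation within ± MAX_DEGREES.
    img = img.rotate(random.uniform(-MAX_DEGREES, MAX_DEGREES))
    # Random scaling within ± MAX_SCALE_PCT percent.
    scale = 1.0 + random.uniform(-MAX_SCALE_PCT, MAX_SCALE_PCT) / 100.0
    img = img.resize((max(1, int(img.width * scale)),
                      max(1, int(img.height * scale))))
    # Random horizontal and/or vertical mirroring.
    if MIRROR and random.random() < 0.5:
        img = ImageOps.mirror(img)   # horizontal flip
    if MIRROR and random.random() < 0.5:
        img = ImageOps.flip(img)     # vertical flip
    # Random brightness change to simulate lighting variation.
    if INTENSITY:
        img = ImageEnhance.Brightness(img).enhance(random.uniform(0.8, 1.2))
    return img
```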
Advanced
Advanced contains additional configuration options. Contact Technical Services for help with these settings by completing the Request Support form.
- Train Part Affinity Field: Enables training of the Part Affinity Field (PAF) branch, which learns vector fields representing the spatial relationships between detected keypoints. When enabled, the network is optimized to predict both keypoint locations and their connectivity, improving multi-point grouping into coherent object instances. Typically required when the model must associate multiple points into structured objects (e.g., skeletons, articulated parts). Disabling reduces computation time but removes learned connectivity information.
- Train Base: Controls whether the backbone (base feature extractor) weights are updated during training.
  - Enabled: The backbone is fine-tuned jointly with the detection heads. This option typically yields higher accuracy when sufficient data is available.
  - Disabled: The backbone is frozen and only the task-specific heads are trained. This option can reduce training time on small datasets.
- Point Size Regression: Check the box to enable.
  - Enabled: Enables an additional regression head that predicts the spatial extent (size or radius) associated with each detected keypoint. This allows the model to estimate not only the point location but also its scale. Useful when downstream processing depends on point footprint or object thickness.
  - Disabled: Reduces model complexity when only point locations are required.
- Training focus: Determines how training samples are generated from the annotated dataset, controlling whether the network is trained on full images or cropped regions centered on annotated targets. Choose one of the following from the dropdown menu:
  - Image: Uses the full training images as input. Objects are learned within their natural scene context.
    - Slower convergence
    - Better background modeling
    - Lower risk of false positives
    - Recommended when sufficient training time and data are available
  - Object: Generates cropped training patches centered on each annotated object. The network focuses on local object appearance with reduced background context.
    - Faster convergence
    - Higher effective sample count
    - Increased risk of false positives due to reduced background exposure
  - Point: Similar to Object mode but centered on annotated keypoints. Small patches are extracted around each point instance.
    - Fastest convergence for keypoint-centric tasks
    - Strong focus on local features
    - Most prone to false positives if background variability is insufficient

  Convergence may be accelerated by first training for a few epochs using Object or Point focus, then switching to Image focus to refine background discrimination and reduce false positives. This staged strategy often combines fast initial learning with improved final robustness, as sketched below.
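As an illustration only, such a staged schedule might look like the following sketch. The `trainer` object and its methods are hypothetical stand-ins for the controls described above, not a real API.

```python
# Hypothetical staged schedule: fast initial learning with Object focus,
# then Image focus to improve background discrimination.
schedule = [
    {"focus": "Object", "epochs": 5},
    {"focus": "Image", "epochs": 45},
]

def run(trainer, schedule):
    for stage in schedule:
        trainer.set_training_focus(stage["focus"])  # hypothetical call
        trainer.train(epochs=stage["epochs"])       # hypothetical call
```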
- Training epoch: Specifies the total number of global training passes performed by the model. Each epoch represents one full training cycle using the currently sampled and augmented dataset.
- Training frames per epoch: Defines the number of training frames randomly selected from the labeled dataset for use in each epoch. When the full dataset is large, training on all frames may degrade convergence; instead, a random subset of the specified size is sampled each cycle (see the sketch after this list).
  - Sampling is performed randomly at each re-sampling step.
  - Lower values increase stochasticity and speed but may slow final convergence.
  - Higher values increase dataset coverage but may reduce training efficiency.
- Iteration per epoch: Specifies how many times the model is trained over the sampled dataset within a single epoch. In most workflows this should remain set to 1, meaning each sampled frame contributes once per epoch.
  - Values greater than 1 effectively reuse the same sampled data multiple times per epoch.
  - Increasing this value may increase over-fitting risk without additional data diversity.
- Sampling frequency: Determines how often the training dataset is re-sampled and augmented from the original labeled inputs. Every specified number of epochs, a new random subset of frames is selected and augmentation is regenerated.
  - Lower values increase data diversity during training.
  - Higher values keep the same sampled set longer, improving stability but reducing variability.
  - Typically balanced with Training frames per epoch.
- Augmentation per frames: Specifies the number of augmented samples generated from each original training frame. Higher values increase dataset diversity by applying multiple random augmentations to the same source image, which can improve generalization.
  - Increases effective dataset size without requiring additional labeled data.
  - Higher values increase training time proportionally.
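A minimal Python sketch of how these sampling settings interact is shown below. The constant values and helper callables are hypothetical; the real trainer's behavior may differ in detail.

```python
import random

TRAINING_EPOCHS = 50         # Training epoch (hypothetical value)
FRAMES_PER_EPOCH = 200       # Training frames per epoch
ITERATIONS_PER_EPOCH = 1     # Iteration per epoch
SAMPLING_FREQUENCY = 5       # Sampling frequency: re-sample every 5 epochs
AUGMENTATIONS_PER_FRAME = 4  # Augmentation per frames

def train_loop(all_frames, augment, train_one_pass):
    """Sketch of the epoch/sampling interaction described above."""
    sampled = []
    for epoch in range(TRAINING_EPOCHS):
        # Every SAMPLING_FREQUENCY epochs, draw a fresh random subset of
        # frames and regenerate their augmentations.
        if epoch % SAMPLING_FREQUENCY == 0:
            subset = random.sample(all_frames,
                                   min(FRAMES_PER_EPOCH, len(all_frames)))
            sampled = [augment(f)
                       for f in subset
                       for _ in range(AUGMENTATIONS_PER_FRAME)]
        # Train over the sampled set; more than one iteration per epoch
        # reuses the same data and raises over-fitting risk.
        for _ in range(ITERATIONS_PER_EPOCH):
            train_one_pass(sampled)
```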
- Optimizer: Selects the optimization algorithm used to update network weights during training.
  - Adam: Adaptive learning-rate method with momentum terms; typically converges faster and requires less manual tuning.
  - SGD: Stochastic Gradient Descent (SGD) often provides better final generalization but may require careful learning rate scheduling.
- Learning rate: Defines the step size used by the optimizer when updating model weights. The learning rate strongly influences convergence speed and training stability.
  - Too high: training may diverge or oscillate.
  - Too low: training may converge very slowly or get stuck in suboptimal minima.
  - Often used in conjunction with learning rate scheduling (see the sketch below).
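For readers familiar with deep-learning frameworks, this choice maps onto standard optimizer setup, sketched here in PyTorch with illustrative values that are not the application's defaults.

```python
import torch

model = torch.nn.Linear(10, 2)  # stand-in for the detector network

# Adam: adaptive learning rates with momentum terms; little manual tuning.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# SGD alternative: often better final generalization, but typically needs
# momentum and a learning-rate schedule.
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# Example schedule: halve the learning rate every 10 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)
```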
Training progress window
Object detectors perform two functions: classification and regression. Classification determines if something is a target object and regression determines the location of the object. The training progress window provides real-time feedback on the status and quality of detector training. It enables users to monitor detection performance, classification behavior, regression accuracy, and overall training progress. Training can be stopped manually at any time, or automatically when training performance stabilizes.
mAP50 (percent): This graph shows mAP50, a standard metric that summarizes object detection quality.
- mAP50 measures how accurately objects are detected and how well the detector ranks its predictions.
- A detection is considered correct if its predicted location overlaps a true object by at least 50% (IoU ≥ 0.5; see the sketch after this list).
- The curve shows how mAP50 evolves as training progresses.
- A rising curve indicates improving detection performance.
- A flat or oscillating curve suggests the model is approaching convergence.
- Small fluctuations are normal during training.
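The following Python sketch shows the idea behind the 50%-overlap rule and the resulting metric. It is a simplified illustration (real mAP implementations typically interpolate the precision-recall curve); `iou_fn` is any IoU function, such as the one sketched earlier.

```python
import numpy as np

def average_precision_50(detections, gt_boxes, iou_fn):
    """detections: list of (box, confidence); gt_boxes: list of boxes.
    A detection counts as a true positive if it overlaps a not-yet-matched
    ground-truth box with IoU >= 0.5."""
    detections = sorted(detections, key=lambda d: d[1], reverse=True)
    matched, tp, fp = set(), [], []
    for box, _conf in detections:
        ious = [iou_fn(box, g) for g in gt_boxes]
        best = int(np.argmax(ious)) if ious else -1
        if best >= 0 and ious[best] >= 0.5 and best not in matched:
            matched.add(best)
            tp.append(1); fp.append(0)
        else:
            tp.append(0); fp.append(1)
    tp, fp = np.cumsum(tp), np.cumsum(fp)
    recall = tp / max(len(gt_boxes), 1)
    precision = tp / np.maximum(tp + fp, 1)
    # Simple numeric integration of precision over recall.
    return float(np.trapz(precision, recall)) if len(tp) else 0.0
```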
Point Classification Rate (percent): This graph reports classification-related statistics for detected objects.
- Class Error: The raw classifier error.
- True Positive: The fraction of ground-truth objects correctly detected.
- False Positive: The fraction of detections that do not correspond to a real object.
Point Regression (pixels): This graph shows the regression accuracy.
- Regression Error: Measures how accurately the model predicts object size and position.
It is recommended to terminate the training process by clicking the stop button once True Positive, False Positive, and Regression Error have stabilized.
Progress: The status bar indicates how far training has progressed.
- Overall Training: The percentage of training that has been performed.
Training:
- Click to immediately stop the training process. It is recommended to stop the training process once True Positive, False Positive, and Regression Error have stabilized.
- Auto Stop: Check the box to automatically stop training once mAP50 performance has converged.
  - Min mAP50 Delta: The smallest improvement in mAP50 considered meaningful. If mAP50 does not improve by at least the specified delta over the selected window, performance is considered converged and training is automatically stopped (see the sketch below).
  - Window Size: Number of recent epochs over which improvement is evaluated.
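A minimal sketch of the convergence test these two settings imply is given below. The exact rule used by the application may differ, and the default values shown are assumptions.

```python
def has_converged(map50_history, min_delta=0.5, window_size=5):
    """Stop when mAP50 has not improved by at least `min_delta`
    (percentage points) over the last `window_size` epochs."""
    if len(map50_history) <= window_size:
        return False  # not enough history yet
    best_before_window = max(map50_history[:-window_size])
    best_in_window = max(map50_history[-window_size:])
    return (best_in_window - best_before_window) < min_delta

# Example: training stops once recent epochs show no meaningful gain.
history = [40.1, 52.3, 58.0, 60.2, 60.4, 60.3, 60.5, 60.4, 60.2, 60.3]
print(has_converged(history))  # True: < 0.5 improvement in last 5 epochs
```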