Integrating MLflow into ultralytics/YOLOv5

The MLflow Tracking component is an API and UI for logging parameters, code versions, metrics, and output files when running your machine learning code and for later visualizing the results. MLflow Tracking lets you log and query experiments using the Python, REST, R, and Java APIs.

Automatic Logging

In most cases, if the model is created using well-known libraries, it should be simple to integrate MLflow by just calling the method:

mlflow.autolog()

This will automatically initiate MLflow and collect the necessary data (params, artifacts, metrics, etc.), which are then sent to the tracking server.
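
As a minimal sketch of autologging (using scikit-learn, one of the supported libraries, purely for illustration; YOLOv5 itself cannot use this path, as explained below):

import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

mlflow.autolog()  # enable automatic logging for all supported libraries

X, y = load_iris(return_X_y=True)
with mlflow.start_run():
    # Parameters, metrics and the fitted model are captured automatically
    LogisticRegression(max_iter=200).fit(X, y)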

The following libraries support autologging: scikit-learn, TensorFlow, Keras, PyTorch Lightning, XGBoost, LightGBM, Gluon, Statsmodels, Spark, and Fastai.

Integration with YOLOv5

YOLOv5 is an open-source library that is available at: https://github.com/ultralytics/yolov5

As YOLOv5 already integrates with commercial tracking tools such as W&B (wandb), we wanted to try its integration with the freely available MLflow and compare the two.

Unfortunately, in the case of YOLOv5, we were not able to use autologging, as automatic logging for PyTorch requires the model to be trained with PyTorch Lightning. This is not the case for YOLOv5, which meant we had to dig deeper into the code and call the MLflow APIs manually.

YOLOv5's training loop is written largely from scratch and only uses PyTorch's built-in facilities in certain areas. This helped us here, as it was easy to pinpoint the locations where we had to call the MLflow APIs.

Step 1: Start and Stop MLflow tracking

Note: using “with” in Python also ends the MLflow run automatically when its scope ends.
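
A minimal sketch of what this can look like around YOLOv5's training entry point (the call to train() is simplified; the real function in train.py takes more arguments):

import mlflow

# The "with" block starts an MLflow run and ends it automatically when training finishes
with mlflow.start_run(run_name="yolov5_training"):
    train(hyp, opt, device)  # existing YOLOv5 training call (arguments simplified)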

Step 2: When MLflow is started, log the necessary params
Logging all non-None input params as MLflow params
Logging every YOLOv5 hyperparameter as an MLflow param with a “hyper_” prefix
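
A sketch of the parameter logging, assuming the opt (command-line arguments) and hyp (hyperparameter dictionary) objects from YOLOv5's train.py:

import mlflow

# Log every non-None input argument of train.py as an MLflow param
for key, value in vars(opt).items():
    if value is not None:
        mlflow.log_param(key, value)

# Log every YOLOv5 hyperparameter (from the hyp.*.yaml file) with a "hyper_" prefix
for key, value in hyp.items():
    mlflow.log_param(f"hyper_{key}", value)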

Step 3: When MLflow is started, log the necessary metrics at each step. The metrics are later shown in a chart with the step number on its axis; in this case, each epoch is treated as one step.

Logging Precision, Recall, mAP, box loss, obj loss and class loss as MLflow metrics
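
A sketch of the per-epoch metric logging; the metric names and the results tuple below mirror what YOLOv5 computes after each validation pass, but the exact variable names are assumptions:

import mlflow

# results: (precision, recall, mAP@0.5, mAP@0.5:0.95, box loss, obj loss, class loss)
metric_names = ["precision", "recall", "mAP_0.5", "mAP_0.5_0.95",
                "box_loss", "obj_loss", "cls_loss"]
for name, value in zip(metric_names, results):
    # The step argument lets the MLflow UI plot each metric against the epoch number
    mlflow.log_metric(name, float(value), step=epoch)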

Step 4: When MLflow is started, log the model, if necessary.
Note: scikit-learn, torch and keras models can be logged with log_model, but this was not possible with YOLOv5. Instead, we log the models as MLflow artifacts.
Logging the best current model as an MLflow artifact
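
A sketch of the artifact logging; best stands for the path to YOLOv5's best.pt checkpoint (the variable name follows train.py, the "weights" subfolder is our own choice):

import mlflow

# Upload the current best checkpoint (best.pt) to the tracking server
mlflow.log_artifact(str(best), artifact_path="weights")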

MLflow Tracking Server

By default, MLflow places the tracked information in a local folder (relative to the working directory) named “mlruns”, and a local tracking UI can be started on top of these data by running this command:

mlflow ui
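
For a shared remote setup, a tracking server can also be started explicitly; a typical invocation (the backend store and artifact root below are illustrative choices, not the values from our setup) looks like this:

mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./mlruns --host 0.0.0.0 --port 5000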

Step 5: Before MLflow is started, set its tracking server.
The experiment name is used to group the experiments
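
A sketch of how the training code is pointed at the remote server; the URI and experiment name are placeholders:

import mlflow

# Point the client at the remote tracking server before any run is started
mlflow.set_tracking_uri("http://mlflow-server.example.com:5000")

# Runs sharing the same experiment name are grouped together in the UI
mlflow.set_experiment("yolov5-experiments")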

Conclusion

MLflow is a very handy tool when it comes to monitoring and comparing the results of our training experiments. The remote tracking server is easy to set up, does not require considerable resources, and is very powerful.

Moreover, by unifying the experiment names, several teams can collaborate on the same remote tracking server, which is of course very useful for our purposes.

Hossein Alizadeh Moghaddam
Full Stack Software Developer
SMS group GmbH