16 · Inspect the Training Results
When trainer.fit() finishes, it returns a Result object.
This object contains:
Final metrics → the most recent values reported from the training loop (e.g., loss at the last epoch).
Checkpoint → a reference to the latest saved checkpoint, including its path in cluster storage.
Metrics dataframe → a history of all reported metrics across epochs (accessible with result.metrics_dataframe).
Best checkpoints → Ray automatically tracks checkpoints associated with their reported metrics.
In the output above, you can see:
The final reported loss at epoch 1.
The location where checkpoints are stored (/mnt/cluster_storage/training/distributed-mnist-resnet18/...).
A list of best checkpoints with their corresponding metrics.
This makes it easy to both analyze training performance and restore the trained model later for inference.
# 16. Show the training results
result # contains metrics, checkpoints, and run history
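The fields listed above can also be read individually. The following is a minimal sketch, assuming the object exposes the `metrics`, `checkpoint`, and `best_checkpoints` attributes described in this section. The helper name `summarize_result` and the stand-in object are hypothetical and exist only so the snippet runs without a cluster; in the notebook you would pass the `result` returned by trainer.fit() instead.

```python
from types import SimpleNamespace


def summarize_result(result):
    """Collect the main fields of a Result-like object into a plain dict."""
    # best_checkpoints is a list of (checkpoint, metrics) pairs; it may be empty or None.
    best = result.best_checkpoints or []
    return {
        "final_metrics": result.metrics,
        "latest_checkpoint": result.checkpoint,
        "num_best_checkpoints": len(best),
    }


# Hypothetical stand-in for the real Result object (assumption, not Ray output).
# The metric values and checkpoint strings are placeholders, not results from this run.
fake_result = SimpleNamespace(
    metrics={"loss": 0.25, "epoch": 1},
    checkpoint="<latest checkpoint placeholder>",
    best_checkpoints=[("<checkpoint placeholder>", {"loss": 0.25, "epoch": 1})],
)

summary = summarize_result(fake_result)
print(summary["final_metrics"])
print(summary["num_best_checkpoints"])
```

Keeping the summary as a plain dictionary makes it simple to log or compare across runs without holding on to the full result object.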
17 · View Metrics as a DataFrame
The Result object also includes a metrics_dataframe, which stores the full history of metrics reported during training.
Each row corresponds to one reporting step (here, each epoch).
The columns show the metrics you logged in the training loop (e.g., loss, epoch).
This makes it easy to plot learning curves or further analyze training progress.
In the example below, you can see the training loss steadily decreasing across two epochs.
# 17. Display the full metrics history as a pandas DataFrame
result.metrics_dataframe
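As a rough sketch of the kind of further analysis mentioned above, the helper below checks whether the loss dropped at every reporting step. It works on plain row dictionaries, which is the shape that `result.metrics_dataframe.to_dict("records")` would produce. The helper name `loss_deltas` and the sample rows are hypothetical placeholders, not output from this training run.

```python
def loss_deltas(rows, metric="loss"):
    """Return the change in `metric` between consecutive reporting steps."""
    values = [row[metric] for row in rows]
    return [later - earlier for earlier, later in zip(values, values[1:])]


# Placeholder rows standing in for result.metrics_dataframe.to_dict("records").
# The numbers are illustrative only and are not results from this run.
rows = [
    {"epoch": 0, "loss": 0.50},
    {"epoch": 1, "loss": 0.25},
]

deltas = loss_deltas(rows)
# A negative delta at every step means the metric decreased each epoch.
steadily_decreasing = all(delta < 0 for delta in deltas)
print(deltas, steadily_decreasing)
```

The same check applies to any other column you report from the training loop by passing its name as `metric`.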
To learn more about the training results, see the Ray Train documentation on inspecting training results.