Similar images

Load a new image, embed it, and retrieve the most similar images from the larger dataset you just computed batch embeddings for. Matches are ranked by the cosine similarity between embeddings.

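from io import BytesIO

import numpy as np
import requests
from PIL import Image
from doggos.embed import get_top_matches, display_top_matches


def url_to_array(url):
    """Download an image and return it as an RGB NumPy array."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Fail early on HTTP errors instead of decoding an error page.
    return np.array(Image.open(BytesIO(response.content)).convert("RGB"))
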
# Embed input image.
# EmbedImages is assumed to be available from the batch-embeddings step earlier in this notebook.
url = "https://doggos-dataset.s3.us-west-2.amazonaws.com/samara.png"
image = url_to_array(url=url)
embedding_generator = EmbedImages(model_id="openai/clip-vit-base-patch32", device="cpu")
embedding = embedding_generator({"image": [image]})["embedding"][0]
np.shape(embedding)

(512,)

# Top matches by embedding similarity.
# ray and embeddings_path are assumed to carry over from the earlier cells of this notebook.
embeddings_ds = ray.data.read_parquet(embeddings_path)
top_matches = get_top_matches(embedding, embeddings_ds, n=5)
display_top_matches(url, top_matches)
(autoscaler +12m47s) [autoscaler] [4xT4:48CPU-192GB] Attempting to add 1 node to the cluster (increasing from 0 to 1).
(autoscaler +12m52s) [autoscaler] [4xT4:48CPU-192GB|g4dn.12xlarge] [us-west-2a] [on-demand] Launched 1 instance.
(autoscaler +13m37s) [autoscaler] Cluster upscaled to {104 CPU, 8 GPU}.
(autoscaler +15m32s) [autoscaler] [8CPU-32GB] Attempting to add 1 node to the cluster (increasing from 0 to 1).
(autoscaler +15m32s) [autoscaler] [8CPU-32GB|m5.2xlarge] [us-west-2a] [on-demand] Launched 1 instance.
(autoscaler +16m2s) [autoscaler] [4xT4:48CPU-192GB] Attempting to add 1 node to the cluster (increasing from 1 to 2).
(autoscaler +16m2s) [autoscaler] [4xT4:48CPU-192GB|g4dn.12xlarge] [us-west-2a] [on-demand] Launched 1 instance.
(autoscaler +16m7s) [autoscaler] Cluster upscaled to {112 CPU, 8 GPU}.
(autoscaler +16m52s) [autoscaler] Cluster upscaled to {160 CPU, 12 GPU}.
(autoscaler +19m52s) [autoscaler] Downscaling node i-0e941ed71ef3480ee (node IP: 10.0.34.27) due to node idle termination.
(autoscaler +19m52s) [autoscaler] Cluster resized to {112 CPU, 8 GPU}.
(autoscaler +20m42s) [autoscaler] [1xT4:8CPU-32GB] Attempting to add 1 node to the cluster (increasing from 0 to 1).
(autoscaler +20m47s) [autoscaler] [1xT4:8CPU-32GB|g4dn.2xlarge] [us-west-2a] [on-demand] Launched 1 instance.
(autoscaler +20m47s) [autoscaler] [4xT4:48CPU-192GB] Attempting to add 1 node to the cluster (increasing from 1 to 2).
(autoscaler +20m52s) [autoscaler] [4xT4:48CPU-192GB|g4dn.12xlarge] [us-west-2a] [on-demand] Launched 1 instance.
(autoscaler +21m32s) [autoscaler] Cluster upscaled to {120 CPU, 9 GPU}.
(autoscaler +21m37s) [autoscaler] Cluster upscaled to {168 CPU, 13 GPU}.
(autoscaler +25m22s) [autoscaler] Downscaling node i-0ffe5abae6e899f5a (node IP: 10.0.60.138) due to node idle termination.
(autoscaler +25m27s) [autoscaler] Cluster resized to {120 CPU, 9 GPU}.
(autoscaler +28m22s) [autoscaler] Downscaling node i-0aa72cef9b8921af5 (node IP: 10.0.31.199) due to node idle termination.
(autoscaler +28m27s) [autoscaler] Cluster resized to {112 CPU, 8 GPU}.
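
For intuition about the ranking step, here is a minimal NumPy sketch of top-n retrieval by cosine similarity. It is not the implementation of `get_top_matches` in `doggos.embed`: the function name `top_n_by_cosine` and the random toy data are assumptions made for illustration, and the sketch assumes every vector has a nonzero norm.

```python
import numpy as np


def top_n_by_cosine(query, candidates, n=5):
    """Return (indices, scores) of the n candidates most similar to query.

    Hypothetical helper for illustration only; assumes nonzero vectors.
    """
    query = np.asarray(query, dtype=np.float64)
    candidates = np.asarray(candidates, dtype=np.float64)
    # Normalize to unit length so the dot product equals cosine similarity.
    query_unit = query / np.linalg.norm(query)
    candidates_unit = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    scores = candidates_unit @ query_unit
    # Sort by descending similarity and keep the n best.
    order = np.argsort(scores)[::-1][:n]
    return order, scores[order]


# Toy data standing in for real 512-dimensional embeddings.
rng = np.random.default_rng(seed=0)
candidates = rng.normal(size=(100, 512))
# Build a query that is a slightly perturbed copy of row 42.
query = candidates[42] + 0.01 * rng.normal(size=512)

indices, scores = top_n_by_cosine(query, candidates, n=5)
print(indices, scores)
```

Because the query is a near-duplicate of row 42, that row ranks first with a similarity close to 1, while the unrelated random vectors score near 0. On a dataset too large to fit in memory, the same scoring would typically be applied batch by batch, keeping only the best candidates seen so far.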

🚨 Note: Reset this notebook using the “🔄 Restart” button located in the notebook’s menu bar. This frees up the variables, utilities, and other resources used in this notebook.