Created at 1pm, Dec 29
firstbatchOther
FERRET: REFER AND GROUND ANYTHING ANYWHERE AT ANY GRANULARITY
wPY-YelqhdTEcvh8TzNkMlwWSeleHSiaZQdOgCeeHM0
File Type: PDF
Entry Count: 135
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Ferret is a new Multimodal Large Language Model (MLLM) capable of understanding spatial referring of any shape or granularity within an image and accurately grounding open-vocabulary descriptions.

Random (one column per compared model)
Accuracy     90.24   86.90   88.57   79.67   50.37   50.10   53.97
Precision    97.72   94.40   84.09   78.24   50.19   50.05   52.07
Recall       83.00   79.26   95.13   82.20   99.13  100.00   99.60
F1 Score     89.76   86.19   89.27   80.17   66.64   66.71   68.39
Yes          43.78   43.26   56.57   52.53   98.77   99.90   95.63

Popular (one column per compared model)
Accuracy     84.90   83.97   82.77   69.73   49.87   50.00   50.90
Precision    88.24   87.55   76.27   65.86   49.93   50.00   50.46
Recall       80.53   79.20   95.13   81.93   99.27  100.00   99.40
F1 Score     84.21   83.16   84.66   73.02   66.44   66.67   66.94
Yes          45.63   45.23   62.37   62.20   99.40  100.00   98.57
id: 2e382351171474dfd7456fa47948ef37 - page: 13
Adversarial (one column per compared model)
Accuracy     82.36   83.10   72.10   65.17   49.70   50.00   50.67
Precision    83.60   85.60   65.13   61.19   49.85   50.00   50.34
Recall       80.53   79.60   95.13   82.93   99.07  100.00   99.33
F1 Score     82.00   82.49   77.32   70.42   66.32   66.67   66.82
Yes          48.18   46.50   73.03   67.77   99.37  100.00   98.67

Here are the bounding boxes for the traffic lights in the image:
(x_min, y_min, x_max, y_max) = (255, 94, 324, 183)
(x_min, y_min, x_max, y_max) = (321, 95, 390, 183)
(x_min, y_min, x_max, y_max) = (388, 97, 457, 185)
Note: These coordinates represent the top-left and bottom-right corners of the bounding boxes around the traffic lights.
id: eecd24dc7c001fdbb73bcec9590846e6 - page: 13
Grounding:
What is the object in red circle used for?
The object circled in red on the motorcycle is the exhaust pipe or muffler. It is used to channel and expel exhaust gases from the engine after combustion. The muffler also helps reduce the noise produced by the engine.
The object in the red circle appears to be a disc brake on the motorcycle. Disc brakes are used to slow down or stop the motorcycle when the brake lever is applied.
What is the object in red circle used for?
The object is a pipe, typically used for transporting exhaust gases from a motorcycle.
Localize each traffic light in the image using bounding box. The image size is (width, height) = (545, 813)
traffic light [obj0], traffic light [obj1], traffic light [obj2], traffic light [obj3].
Detect all the objects among {traffic light}.
Input: Visualized Output: Visualized Output: GPT-4V:
id: f2ac38b4c7c37a54748950c890994569 - page: 13
What is the object [region0] used for?
What is the object [region1] used for?
The size of the input image is 775 × 797 pixels. Can you tell me what is the object in the bounding box (x1, y1) = (10, 520), (x2, y2) = (190, 600) used for?
The object within the specified bounding box appears to be the exhaust or muffler of the motorcycle. It is used to direct and expel exhaust gases from the engine after combustion, reduce the noise produced by the engine.
The size of the input image is 775 × 797 pixels. Can you tell me what is the object in the bounding box (x1, y1) = (570, 420), (x2, y2) = (620, 480) used for?
The object within this area seems to be an engine cover or a badge on the engine. It's commonly used for brand identification and decorative purposes.
Ferret: GPT-4V with Colored Region: GPT-4V with Coordinate in Text: Referring:
id: 6de84a669148858f80592ea5dd758186 - page: 13
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
  -H "x-api-key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"rerank": true, "top_n": 10, "contract_id": "wPY-YelqhdTEcvh8TzNkMlwWSeleHSiaZQdOgCeeHM0", "query": "What is alexanDRIA library?"}'

# Query
# The two-element vector below is a placeholder; a real query vector
# presumably has to match the dimensionality of the embedding model.

curl -X POST "https://search.dria.co/hnsw/query" \
  -H "x-api-key: <YOUR_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "wPY-YelqhdTEcvh8TzNkMlwWSeleHSiaZQdOgCeeHM0", "level": 2}'
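
# Reusable search helper (sketch)

The search request above can be wrapped in a small shell helper so that the query text and result count are passed as arguments. This is a minimal sketch, not part of the Dria documentation: the function names `build_search_payload` and `dria_search` and the `DRIA_API_KEY` environment variable are hypothetical, and the payload builder assumes the query text contains no double quotes or backslashes, since it does no JSON escaping.

```shell
#!/bin/sh
# Contract ID of this knowledge base, copied from the page above.
CONTRACT_ID="wPY-YelqhdTEcvh8TzNkMlwWSeleHSiaZQdOgCeeHM0"

# Hypothetical helper: print the JSON body for the /hnsw/search endpoint.
# $1 = query text (assumed free of double quotes and backslashes)
# $2 = number of results to return (defaults to 10)
build_search_payload() {
  printf '{"rerank": true, "top_n": %d, "contract_id": "%s", "query": "%s"}' \
    "${2:-10}" "$CONTRACT_ID" "$1"
}

# Hypothetical wrapper around the curl call shown above.
# DRIA_API_KEY is an assumed variable name; the shell aborts with a
# message if it is unset.
dria_search() {
  curl -sS -X POST "https://search.dria.co/hnsw/search" \
    -H "x-api-key: ${DRIA_API_KEY:?set DRIA_API_KEY first}" \
    -H "Content-Type: application/json" \
    -d "$(build_search_payload "$@")"
}

# Example call (performs a network request, so it is left commented out):
# dria_search "How does Ferret represent free-form regions?" 5
```

Keeping the payload construction separate from the network call makes it possible to inspect the request body before sending it, for example with `build_search_payload "some query" 5`.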