Does the model struggle more with abstract concepts (art/logos) than with natural images?
Highlight the reduction in model size (e.g., from ~300 MB to ~30 MB).
Analyze if 4-bit (P4) is the "Goldilocks zone" or if information loss in the vision encoder outweighs the memory savings.
Measure the cosine-similarity drift between the embeddings of the original CLIP and those of the P4 version.
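
The drift measurement could be sketched roughly as follows. This is a minimal illustration, not the project's actual evaluation code: it assumes the embeddings from both models have already been extracted as plain lists of floats for the same inputs in the same order, and it defines drift as 1 minus the cosine similarity, which is one reasonable choice among several. The names `cosine_similarity` and `embedding_drift` are hypothetical helpers introduced here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def embedding_drift(original, quantized):
    """Summarize drift between paired embeddings.

    Assumption: drift is defined as 1 - cosine similarity, so 0.0 means
    the quantized embedding points in the same direction as the original.
    """
    if not original or len(original) != len(quantized):
        raise ValueError("need two non-empty lists of equal length")
    drifts = [1.0 - cosine_similarity(o, q) for o, q in zip(original, quantized)]
    return {"mean": sum(drifts) / len(drifts), "max": max(drifts)}


# Toy vectors standing in for real embeddings (illustrative values only).
original = [[1.0, 0.0], [0.0, 1.0]]
quantized = [[1.0, 0.0], [1.0, 1.0]]
result = embedding_drift(original, quantized)
print(result)
```

Running the same function separately on embeddings of abstract images (art/logos) and of natural images would give one drift summary per category, which is one way to approach the first question above.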