12 Tutorial VI: Reasoning models and images
12.1 Virtual RStudio
12.2 Materials for this session
Reasoning models
- Reasoning model: OpenAI gpt-oss
Image analysis
- Vision model: ministral-3
- Paper with an interesting application of vision models: Meltzer et al. (2025)
We used state-of-the-art zero-shot classification with multimodal large language models (MLLM) in order to classify individual video frames. Based on the aforementioned literature of manual content analyses of music videos, we selected three major dimensions: (1) revealing or suggestive clothes, (2) sexually suggestivemoves (including dancing), and (3) sexually suggestive poses and facial expr essions.