12 Tutorial VI: Reasoning models and images

12.1 Virtual RStudio

https://simplevm-proxy.bihealth.org/causalmink_100/

12.2 Materials for this session

Ollama API reference; Chat endpoint

Reasoning models

Reasoning model: OpenAI gpt-oss

Image analysis

Vision model: ministral-3
Paper with an interesting application of vision models: Meltzer et al. (2025)

We used state-of-the-art zero-shot classification with multimodal large language models (MLLM) in order to classify individual video frames. Based on the aforementioned literature of manual content analyses of music videos, we selected three major dimensions: (1) revealing or suggestive clothes, (2) sexually suggestivemoves (including dancing), and (3) sexually suggestive poses and facial expr essions.

Example videos