ITALIC

ITALIC is a benchmark developed by the University of Milano-Bicocca to evaluate how well language models can understand and follow instructions in Italian. It includes tasks like question answering, summarization, translation, and reasoning, with a focus on zero-shot and few-shot settings. The goal is to test models in realistic, instruction-based scenarios that reflect practical language use.

Italian Language Instructional Comprehension

| Model Name | ORTHOGRAPHY | SYNTAX | LITERATURE | CIVIC EDUCATION | ART HISTORY | LEXICON | GEOGRAPHY | TOURISM | MORPHOLOGY | CURRENT EVENTS | SYNONYMS | HISTORY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Vitruvian_Scientist-14B | 65,30% | 64,20% | 73,50% | 73,70% | 67,70% | 82,70% | 76,90% | 71,90% | 52,10% | 79,30% | 85,20% | 77,00% |
| Vitruvian_Explainer-14B | 66,8% | 68,1% | 71,40% | 71,70% | 68,40% | 82,70% | 78,70% | 68,70% | 56,40% | 82,60% | 85,30% | 77,90% |
| Vitruvian_Smart-12B | 63,70% | 65,20% | 74,40% | 77,00% | 72,20% | 84,40% | 81,60% | 72,90% | 52,90% | 80,40% | 85,50% | 77,70% |
| LLamAntino-3-8B | 52,83% | 54,47% | 67,68% | 67,32% | 68,67% | 76,61% | 76,92% | 70,82% | 40,00% | 79,35% | 76,42% | 74,85% |
| Llama-3.1-8b-Ita | 53,04% | 53,65% | 67,17% | 71,22% | 70,10% | 81,51% | 79,26% | 71,73% | 52,14% | 82,61% | 81,15% | 77,40% |
| maestrale-chat-v0.4 | 54,17% | 55,40% | 70,93% | 70,20% | 67,35% | 78,65% | 76,61% | 69,08% | 44,29% | 80,43% | 70,34% | 66,22% |
| Almavave-Velvet-14B | 44,08% | 45,63% | 69,21% | 72,56% | 67,86% | 77,32% | 76,81% | 71,43% | 42,86% | 83,70% | 70,03% | 66,40% |
| iGenius-Italia-9b | 32,13% | 31,35% | 54,47% | 52,00% | 55,20% | 59,14% | 62,82% | 59,18% | 27,14% | 65,22% | 43,46% | 50,00% |
| Fastweb-MIIA-7B | 40,31% | 44,60% | 62,70% | 59,92% | 57,96% | 63,13% | 65,58% | 57,55% | 29,29% | 71,74% | 52,68% | 55,34% |
| Minerva-7B | 25,75% | 27,13% | 40,35% | 44,50% | 46,33% | 45,45% | 49,13% | 51,33% | 32,86% | 52,17% | 38,31% | 42,66% |
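
To compare models at a glance, the twelve category scores can be collapsed into a single number. The sketch below is only illustrative: it assumes an unweighted mean across categories, which is not an aggregate the benchmark itself defines, and the helper names `parse_pct` and `mean_score` are hypothetical. It parses the percentages in the decimal-comma form used in the scores above (a decimal point is accepted too).

```python
def parse_pct(text: str) -> float:
    """Convert a percentage string such as '65,30%' or '65.30%' to a float."""
    return float(text.strip().rstrip("%").replace(",", "."))


def mean_score(row: str) -> float:
    """Unweighted mean of a whitespace-separated row of percentages.

    Assumption: every category counts equally. The benchmark does not
    state an official aggregate, so treat this as a rough summary only.
    """
    values = [parse_pct(token) for token in row.split()]
    return sum(values) / len(values)


# Two rows copied from the scores listed above.
rows = {
    "Vitruvian_Scientist-14B": (
        "65,30% 64,20% 73,50% 73,70% 67,70% 82,70% "
        "76,90% 71,90% 52,10% 79,30% 85,20% 77,00%"
    ),
    "Minerva-7B": (
        "25,75% 27,13% 40,35% 44,50% 46,33% 45,45% "
        "49,13% 51,33% 32,86% 52,17% 38,31% 42,66%"
    ),
}

for name, row in rows.items():
    print(f"{name}: {mean_score(row):.2f}%")
```

A plain mean hides large per-category gaps (MORPHOLOGY, for instance, is well below the other columns for most models), so it is best read alongside the individual scores rather than in place of them.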

Question Example

Because the benchmark tasks are in Italian, both questions and answers are shown in their original language to preserve fidelity and meaning.