ITALIC

ITALIC is a benchmark developed by the University of Milano-Bicocca to evaluate how well language models can understand and follow instructions in Italian. It includes tasks like question answering, summarization, translation, and reasoning, with a focus on zero-shot and few-shot settings. The goal is to test models in realistic, instruction-based scenarios that reflect practical language use.

Italian Language Instructional Comprehension

Model Name	ORTHOGRAPHY	SYNTAX	LITERATURE	CIVIC EDUCATION	ART HISTORY	LEXICON	GEOGRAPHY	TOURISM	MORPHOLOGY	CURRENT EVENTS	SYNONYMS	HISTORY
Vitruvian_Scientist-14B	65,30%	64,20%	73,50%	73,70%	67,70%	82,70%	76,90%	71,90%	52,10%	79,30%	85,20%	77,00%
Vitruvian_Explainer-14B	66,80%	68,10%	71,40%	71,70%	68,40%	82,70%	78,70%	68,70%	56,40%	82,60%	85,30%	77,90%
Vitruvian_Smart-12B	63,70%	65,20%	74,40%	77,00%	72,20%	84,40%	81,60%	72,90%	52,90%	80,40%	85,50%	77,70%
LLamAntino-3-8B	52,83%	54,47%	67,68%	67,32%	68,67%	76,61%	76,92%	70,82%	40,00%	79,35%	76,42%	74,85%
Llama-3.1-8b-Ita	53,04%	53,65%	67,17%	71,22%	70,10%	81,51%	79,26%	71,73%	52,14%	82,61%	81,15%	77,40%
maestrale-chat-v0.4	54,17%	55,40%	70,93%	70,20%	67,35%	78,65%	76,61%	69,08%	44,29%	80,43%	70,34%	66,22%
Almawave-Velvet-14B	44,08%	45,63%	69,21%	72,56%	67,86%	77,32%	76,81%	71,43%	42,86%	83,70%	70,03%	66,40%
iGenius-Italia-9b	32,13%	31,35%	54,47%	52,00%	55,20%	59,14%	62,82%	59,18%	27,14%	65,22%	43,46%	50,00%
Fastweb-MIIA-7B	40,31%	44,60%	62,70%	59,92%	57,96%	63,13%	65,58%	57,55%	29,29%	71,74%	52,68%	55,34%
Minerva-7B	25,75%	27,13%	40,35%	44,50%	46,33%	45,45%	49,13%	51,33%	32,86%	52,17%	38,31%	42,66%
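
The scores above use Italian number formatting (decimal comma), so they need a small conversion step before any arithmetic. The following is a minimal sketch, not part of the benchmark itself, of how the tab-separated rows could be parsed and averaged per model. The helper names `parse_pct` and `mean_scores` are hypothetical, and the unweighted mean is an assumption: the benchmark's official aggregate, if any, may weight categories differently.

```python
def parse_pct(cell: str) -> float:
    """Convert an Italian-formatted percentage such as '65,30%' to a float (65.3)."""
    return float(cell.strip().rstrip("%").replace(",", "."))


def mean_scores(table_text: str) -> dict:
    """Map each model name to the unweighted mean of its category scores.

    Assumption: every category counts equally; this is an illustrative
    aggregate, not a metric defined by the benchmark.
    """
    lines = [ln for ln in table_text.strip().splitlines() if ln.strip()]
    means = {}
    for line in lines[1:]:  # skip the header row
        name, *cells = line.split("\t")
        scores = [parse_pct(c) for c in cells]
        means[name] = sum(scores) / len(scores)
    return means


# Two rows copied from the table above, truncated to three categories for brevity.
SAMPLE = (
    "Model Name\tLITERATURE\tGEOGRAPHY\tHISTORY\n"
    "Vitruvian_Smart-12B\t74,40%\t81,60%\t77,70%\n"
    "Minerva-7B\t40,35%\t49,13%\t42,66%\n"
)

if __name__ == "__main__":
    # Print models from highest to lowest mean score.
    ranked = sorted(mean_scores(SAMPLE).items(), key=lambda kv: kv[1], reverse=True)
    for model, avg in ranked:
        print(f"{model}: {avg:.2f}%")
```

Passing the full table text instead of `SAMPLE` would average all twelve categories for each model.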

Question Example

Because the benchmark tasks are in Italian, both questions and answers are shown in their original language to preserve fidelity and meaning.