The PAISÀ Corpus of Italian Web Texts
Verena Lyding, Egon Stemle, Claudia Borghetti, Marco Brunello, Sara Castagnoli, Felice Dell'Orletta, Henrik Dittmann, Alessandro Lenci, Vito Pirrelli
April 2014
Abstract
PAISÀ is a large, Creative Commons-licensed web corpus of contemporary Italian. We describe the design, harvesting, and processing steps involved in its creation.
Publication
Proceedings of the 9th Web as Corpus Workshop (WaC-9)