Information extraction with active learning : a case study in legal text
Affiliation: Cardellino, Cristian Adrián. Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía y Física, Argentina.
Main Authors: | Cardellino, Cristian Adrián; Villata, Serena; Alonso i Alemany, Laura; Cabrio, Elena |
---|---|
Format: | acceptedVersion |
Language: | eng |
Published: | 2022 |
Subjects: | Active learning; Natural language processing; Ontology-based information extraction |
Online Access: | http://hdl.handle.net/11086/27448 |
author | Cardellino, Cristian Adrián; Villata, Serena; Alonso i Alemany, Laura; Cabrio, Elena |
collection | Repositorio Digital Universitario |
id | rdu-unc.27448 |
institution | Universidad Nacional de Córdoba |
record_format | dspace |
affiliation | Cardellino, Cristian Adrián: Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía y Física, Argentina. Villata, Serena: Institut National de Recherche en Informatique et en Automatique, France. Alonso i Alemany, Laura: Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía y Física, Argentina. Cabrio, Elena: Institut National de Recherche en Informatique et en Automatique, France. |
abstract | Active learning has been successfully applied to a number of NLP tasks. In this paper, we present a study on information extraction from natural language licenses that need to be translated into RDF. The final goal of our work is to automatically extract, from a natural language document specifying a license, a machine-readable description of the terms of use and reuse identified in that license. This task has some peculiarities that make it especially interesting to study: highly repetitive text, few annotated or unannotated examples available, and very high precision required. In this paper we compare different active learning settings for this particular application. We show that the most straightforward approach to instance selection, uncertainty sampling, performs poorly in this setting, even worse than passive learning. Density-based methods are the usual alternative to uncertainty sampling in contexts with very few labelled instances. We show that an effect similar to that of density-based methods can be obtained with uncertainty sampling simply by reversing the ranking criterion, that is, by choosing the most certain instead of the most uncertain instances. |
subject_area | Other Computer and Information Sciences |
date_issued | 2015 |
type | article |
date_available | 2022-07-25T14:43:53Z |
last_modified | 2022-10-13T11:08:41Z |
published_version | https://doi.org/10.1007/978-3-319-18117-2_36 |
rights | Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/) |
issn_print | 0302-9743 |
topic | Active learning; Natural language processing; Ontology-based information extraction |
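The abstract in this record contrasts three ways of choosing which unlabeled instances to annotate next: uncertainty sampling, the reversed "most certain first" criterion, and density-based selection. The Python sketch below illustrates those ranking criteria over a toy pool. It is not the authors' implementation: the entropy uncertainty measure, the information-density score (entropy times the instance's average cosine similarity to the rest of the pool, raised to `beta`), and the names `select_batch` and `density_ranking` are assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical inputs: for each unlabeled instance id, the class distribution
# predicted by whatever classifier is currently trained (not the paper's model).
Probs = Sequence[float]
Vector = Sequence[float]


def entropy(probs: Probs) -> float:
    """Shannon entropy of a predicted distribution; higher means more uncertain."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_batch(pool: Dict[str, Probs], k: int, most_certain: bool = False) -> List[str]:
    """Pick k instances to send to the annotator.

    most_certain=False: classic uncertainty sampling (highest entropy first).
    most_certain=True: the reversed criterion (lowest entropy first).
    Ties are broken by instance id so the order is deterministic.
    """
    def key(i: str):
        h = entropy(pool[i])
        return (h if most_certain else -h, i)

    return sorted(pool, key=key)[:k]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def density_ranking(pool: Dict[str, Probs], feats: Dict[str, Vector],
                    beta: float = 1.0) -> List[str]:
    """Assumed information-density ranking: entropy weighted by the average
    cosine similarity to the other pool instances, raised to beta.
    Assumes non-negative features (e.g. term counts), so similarities lie in [0, 1]."""
    scores = {}
    for i in pool:
        others = [j for j in pool if j != i]
        avg_sim = (sum(cosine(feats[i], feats[j]) for j in others) / len(others)
                   if others else 0.0)
        scores[i] = entropy(pool[i]) * avg_sim ** beta
    # Highest score first; ties broken by id.
    return sorted(scores, key=lambda i: (-scores[i], i))


if __name__ == "__main__":
    # Toy pool of four instances with two-class predictions and 2-d features.
    pool = {"a": [0.5, 0.5], "b": [0.9, 0.1], "c": [0.99, 0.01], "d": [0.7, 0.3]}
    feats = {"a": [1.0, 0.0], "b": [1.0, 0.1], "c": [0.0, 1.0], "d": [1.0, 0.2]}
    print(select_batch(pool, 2))                     # ['a', 'd']
    print(select_batch(pool, 2, most_certain=True))  # ['c', 'b']
    print(density_ranking(pool, feats))              # ['a', 'd', 'b', 'c']
```

Reversing the criterion only flips the sort direction; which instances are picked first still depends entirely on the current classifier's probability estimates, so the same pool can yield very different batches as the model is retrained.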