HESML V1R4 Java software library of ontology-based semantic similarity measures and information content models
datasetposted on 19.07.2019 by Juan J. Lastra-Díaz, Ana Garcia-Serrano
Datasets usually provide raw data for analysis. This raw data often comes in spreadsheet form, but can be any collection of data, on which analysis can be performed.
HESML V1R4 is the fourth release of the Half-Edge Semantic Measures Library (HESML) detailed in , which is a new, linerarly scalable and efficient Java software library of ontology-based semantic similarity measures and Information Content (IC) models based on WordNet. HESML V1R4 implements most ontology-based semantic similarity measures and Information Content (IC) models based on WordNet reported in the literature, as well as the evaluation of three pre-trained word embedding models. It also provides a XML-based input file format in order to specify the execution of reproducible experiments on WordNet-based similarity, even with no software coding. HESML V1R4 introduces the following novelties: (1) a software implementation for the evaluation of three pre-trained word embedding file formats which support most of state-of--the-art models reported in the literature; (2) a software implementation of an intrinsic IC model and two new IC-based semantic similarity measures introduced by Cai et al. (2017); (3) a software implementation of a fast approximation of the Wu&Palmer (1994) measure commonly used in the literature; (4) the integration of a very large set of word similarity benchmarks; and finally (5), the correction of an error in our software implementation of the Leacock&Chodorow (1998) measure in previous HESML versions. HESML library is freely distributed for any non-commercial purpose under a CC By-NC-SA-4.0 license, subject to the citing of the main HESML paper  as attribution requirement. On other hand, the commercial use of the similarity measures introduced in , as well as part of the intrinsic IC models introduced in  and , is protected by a patent application . In addition, any user of HESML must fulfill other licensing terms described in  related to other resources distributed with the library. References:  Lastra-Díaz, J. J., García-Serrano, A., Batet, M., Fernández, M., & Chirigati, F. (2017). HESML: a scalable ontology-based semantic similarity measures library with a set of reproducible experiments and a replication dataset. Information Systems, 66, 97–118.  Lastra-Díaz, J. J., & García-Serrano, A. (2015). A novel family of IC-based similarity measures with a detailed experimental survey on WordNet. Engineering Applications of Artificial Intelligence Journal, 46, 140–153.  Lastra-Díaz, J. J., & García-Serrano, A. (2015). A new family of information content models with an experimental survey on WordNet. Knowledge-Based Systems, 89, 509–526.  Lastra-Díaz, J. J., & García-Serrano, A. (2016). A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet. Universidad Nacional de Educación a Distancia (UNED).  Lastra Díaz, J. J., & García Serrano, A. (2016). System and method for the indexing and retrieval of semantically annotated data using an ontology-based information retrieval model. USPTO App, US2016/0179945 A1.