TabbyXL2: Experiment Data
Dataset posted on 19.07.2019 by Alexey Shigarov and Vasiliy Khristyuk
This dataset is designed to evaluate TabbyXL2 v1.0.1, a tool for the rule-based transformation of spreadsheet data from arbitrary tables to relational ones. The tool is freely available on GitHub (https://github.com/cellsrg/tabbyxl2/releases/tag/v1.0.1). Our source data are based on Troy_200 (http://tc11.cvc.uab.es/datasets/Troy_200_1), an existing dataset of 200 arbitrary tables stored as CSV files and collected from 10 different government statistical websites. We use its earlier version, which stores the original tables with their style features (fonts, alignment, and indentation) as Excel spreadsheets (available at http://tango.byu.edu/data).

The dataset contains the following material:
1. All Troy_200 tables, with their style features, put into a single spreadsheet file;
2. The ground-truth data we prepared for the automatic performance evaluation of TabbyXL2 in the role and structural stages of table analysis;
3. CRL and CLP rulesets designed for transforming the arbitrary Troy_200 tables into relational form;
4. Log files with the results of running the program and the results of the performance evaluation of TabbyXL2.

The dataset provides all the data required to reproduce the automatic performance evaluation of TabbyXL2 using any of the following three options:
1. TabbyXL2 automatically generates Java source code from the CRL rules with our CRL interpreter, compiles it to Java bytecode, and then runs the generated program with the JRE.
2. TabbyXL2 automatically maps the CRL rules to DRL rules with the DSL specification and executes them with the Drools Expert (https://www.drools.org) rule engine.
3. TabbyXL2 executes the CLP ruleset corresponding to our CRL ruleset with the JESS (http://www.jessrules.com) rule engine.

The performance evaluation confirms that the implemented rulesets are applicable to a variety of arbitrary tables of the same genre (tables from government statistical websites).
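As a rough illustration of how an automatic evaluation against ground truth can work, the following sketch computes precision and recall by comparing the set of items a program extracts (for example, entry-label pairs from the structural stage) with the corresponding ground-truth set. This is an assumption for illustration only: the function name `precision_recall` and the toy pairs are hypothetical, and the actual metrics, item types, and file formats used in this dataset are those documented in README.md and the log files.

```python
from typing import Hashable, Set, Tuple


def precision_recall(extracted: Set[Hashable], ground_truth: Set[Hashable]) -> Tuple[float, float]:
    """Compare extracted items with ground-truth items (hypothetical helper).

    precision = |extracted ∩ ground_truth| / |extracted|
    recall    = |extracted ∩ ground_truth| / |ground_truth|
    An empty denominator yields 0.0 by convention in this sketch.
    """
    correct = len(extracted & ground_truth)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Hypothetical entry-label pairs; not taken from the Troy_200 ground truth.
truth = {("120", "North"), ("120", "2017"), ("135", "North"), ("135", "2018")}
found = {("120", "North"), ("120", "2017"), ("135", "2017")}

p, r = precision_recall(found, truth)
print(f"precision={p:.3f}, recall={r:.3f}")  # precision=0.667, recall=0.500
```

Here two of the three extracted pairs are correct (precision 2/3), and two of the four expected pairs are found (recall 1/2). Set-based comparison assumes each item is uniquely identifiable, which is why the pairs are tuples of hashable values.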
The experiment demonstrates that our tool, TabbyXL2, can be used to develop programs that transform spreadsheet data into relational form. The README.md file included in this dataset provides a detailed description of the data and the steps to reproduce the experiment.
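To make the notion of "relational form" concrete, here is a minimal Python sketch that flattens a simple cross-tab into (entry, row label, column label) tuples. It assumes, hypothetically, that the first row holds column labels, the first column holds row labels, and every other cell is an entry. It is not TabbyXL2's CRL rules or API: arbitrary tables such as those in Troy_200 can be considerably more irregular (for example, multi-level headers or labels whose hierarchy is conveyed by indentation), and that irregularity is what the rulesets in this dataset address.

```python
from typing import List, Tuple


def unpivot(table: List[List[str]]) -> List[Tuple[str, str, str]]:
    """Flatten a simple cross-tab into (entry, row_label, column_label) tuples.

    Hypothetical layout assumption: row 0 holds column labels,
    column 0 holds row labels, and all remaining cells are entries.
    """
    header = table[0]
    records = []
    for row in table[1:]:
        row_label = row[0]
        # Pair each data cell with the column label above it.
        for col_label, cell in zip(header[1:], row[1:]):
            if cell.strip():  # skip empty cells
                records.append((cell, row_label, col_label))
    return records


# Hypothetical toy table, not taken from Troy_200.
cross_tab = [
    ["Region", "2017", "2018"],
    ["North", "120", "135"],
    ["South", "98", ""],
]

for record in unpivot(cross_tab):
    print(record)
```

Each printed tuple is one relational record, for example `('120', 'North', '2017')`. Empty cells are skipped, so the South/2018 cell produces no record.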