UML2PROV

UML2PROV is a framework that gives any application the ability to generate provenance information (i.e., it makes applications provenance-aware). It relies on the application's UML design to create the artefacts needed to generate provenance. In this way, UML2PROV bridges the gap between application design and provenance design, minimising software engineers' intervention without requiring them to have provenance skills.

Designers can follow their preferred software engineering methodology to create the UML diagrams representing an application's design. UML2PROV then takes those diagrams and automatically generates:

  • A set of PROV-templates expressing the design of the provenance to be generated. PROV-templates describe the provenance graph, specifying variables that act as placeholders for values to be captured while the application is executing.
  • A Java software module, called the Bindings Generation Module (BGM), for collecting values of interest as the application is running (encoded as variable-value associations called bindings). The BGM can be deployed in the application with minimal developer intervention.

The combination of the PROV-templates with the bindings, which provide values for the templates' variables, generates provenance (PROV documents) ready to be exploited.
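
To make the idea of combining templates with bindings concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions: the template is a single PROV-N-like string, placeholders are written as var:name, and the class TemplateExpansionSketch, its expand method, and the ex: identifiers are all hypothetical names invented for this example. It is neither the PROV-Template expansion algorithm nor the code that UML2PROV generates.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Illustrative only: a toy textual "template" whose var:NAME placeholders are
 * replaced by values taken from a bindings map. All names here are assumptions
 * made for the example; this is not the PROV-Template algorithm.
 */
final class TemplateExpansionSketch {

    // Matches placeholders of the (assumed) form var:name
    private static final Pattern VAR =
            Pattern.compile("var:([A-Za-z_][A-Za-z0-9_]*)");

    /** Replaces every var:name in the template with its bound value. */
    static String expand(String template, Map<String, String> bindings) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = bindings.get(name);
            if (value == null) {
                // A variable without a binding cannot be instantiated.
                throw new IllegalArgumentException("No binding for variable: " + name);
            }
            // quoteReplacement guards against '$' or '\' in captured values.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Design time: a template with two variables (hypothetical example).
        String template = "wasGeneratedBy(var:output, var:operation)";

        // Run time: values captured while the application executes.
        Map<String, String> bindings = new LinkedHashMap<>();
        bindings.put("output", "ex:invoice42");
        bindings.put("operation", "ex:createInvoice_1");

        System.out.println(expand(template, bindings));
    }
}
```

The sketch separates the two concerns in the same way the text does: the template is fixed at design time, while the bindings map is filled in only when the application runs.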


Publications

Automated and non-intrusive provenance capture with UML2PROV

Carlos Sáenz-Adán, Francisco J. García-Izquierdo, Beatriz Pérez, Trung Dong Huynh, Luc Moreau

Data provenance is a form of knowledge graph providing an account of what a system performs, describing the data involved, and the processes carried out over them. It is crucial to ascertaining the origin of data, validating their quality, auditing applications' behaviours, and, ultimately, making them accountable. However, instrumenting applications, especially legacy ones, to track the provenance of their operations remains a significant technical hurdle, hindering the adoption of provenance technology. UML2PROV is a software-engineering methodology that facilitates the instrumentation of provenance recording in applications designed with UML diagrams. It automates the generation of (1) templates for the provenance to be recorded and (2) the code to capture values required to instantiate those templates from an application at run time, both from the application's UML diagrams. By so doing, UML2PROV frees application developers from manual instrumentation of provenance capturing while ensuring the quality of recorded provenance.

Computing

Integrating Provenance Capture and UML with UML2PROV: Principles and Experience

Carlos Sáenz-Adán, Beatriz Pérez, Francisco J. García-Izquierdo, Luc Moreau

In response to the increasing calls for algorithmic accountability, UML2PROV is a novel approach to address the existing gap between application design, where models are described by UML diagrams, and provenance design, where generated provenance is meant to describe an application's flows of data, processes and responsibility, enabling greater accountability of this application. The originality of UML2PROV is that designers are allowed to follow their preferred software engineering methodology to create the UML Diagrams for their application, while UML2PROV takes the UML diagrams as a starting point to automatically generate: (1) the design of the provenance to be generated (expressed as PROV templates); and (2) the software library for collecting runtime values of interest (encoded as variable-value associations known as bindings), which can be deployed in the application without developer intervention. At runtime, the PROV templates combined with the bindings are used to generate high-quality provenance suitable for subsequent consumption. UML2PROV is rigorously defined by an extensive set of 17 patterns mapping UML diagrams to provenance templates, and is accompanied by a reference implementation based on Model Driven Development techniques. A systematic evaluation of UML2PROV uses quantitative data and qualitative arguments to show the benefits and trade-offs of applying UML2PROV for software engineers seeking to make applications provenance-aware. In particular, as the UML design drives both the design and capture of provenance, we discuss how the levels of detail in UML designs affect aspects such as provenance design generation, application instrumentation, provenance capability maintenance, storage and run-time overhead, and quality of the generated provenance. 
Some key lessons are learned such as: starting from a non-tailored UML design leads to the capture of more provenance than required to satisfy provenance requirements and therefore, increases the overhead unnecessarily; alternatively, if the UML design is tailored to focus on addressing provenance requirements, only relevant provenance gets to be collected, resulting in lower overheads.

IEEE TSE

Automating Provenance Capture in Software Engineering with UML2PROV

Carlos Sáenz-Adán, Luc Moreau, Beatriz Pérez, Simon Miles, Francisco José García Izquierdo

UML2PROV is an approach to address the gap between application design, through UML diagrams, and provenance design, using PROV-Template. Its original design (i) provides a mapping strategy from UML behavioural diagrams to templates, (ii) defines a code generation technique based on the Proxy pattern to deploy suitable artefacts for provenance generation in an application, and (iii) is implemented in Java, using XSLT as a first attempt to implement our mapping patterns. In this paper, we complement and improve this original design in three different ways, providing a more complete and accurate solution for provenance generation. First, UML2PROV now supports UML structural diagrams (Class Diagrams), defining a mapping strategy from such diagrams to templates. Second, the UML2PROV prototype is improved by using a Model Driven Development-based approach which not only implements the overall mapping patterns, but also provides a fully automatic way to generate the artefacts for provenance collection, based on Aspect Oriented Programming as a more expressive and compact technique for capturing provenance than the Proxy pattern. Finally, there is an analysis of the potential benefits of our overall approach.

IPAW 2018

UML2PROV: Automating Provenance Capture in Software Engineering

Carlos Sáenz-Adán, Beatriz Pérez, Trung Dong Huynh, Luc Moreau

In this paper we present UML2PROV, an approach addressing the gap between application design, through UML diagrams, and provenance design, using PROV-Template. PROV-Template is a declarative approach that enables software engineers to develop programs that generate provenance following the PROV standard. The main contributions of this paper are: (i) a mapping strategy from UML diagrams (UML State Machine and Sequence diagrams) to templates, (ii) a code generation technique that creates libraries, which can be deployed in an application by creating suitable artefacts for provenance generation, and (iii) a demonstration of the feasibility of UML2PROV implemented with Java, and a preliminary quantitative evaluation that shows benefits regarding aspects such as design, development and provenance capture.

SOFSEM 2018


Contact

Carlos Sáenz-Adán (Universidad de La Rioja / University of La Rioja)

  • Web page
  • carlos.saenz [at] unirioja [dot] es