On the dynamics of active documents for distributed data management

Pierre Bourhis 1
1 DAHU - Verification in databases
CNRS - Centre National de la Recherche Scientifique : UMR8643, Inria Saclay - Ile de France, ENS Cachan - École normale supérieure - Cachan, LSV - Laboratoire Spécification et Vérification [Cachan]
Abstract : One of the major issues faced by Web applications is the management of evolving of data. In this thesis, we consider this problem and in particular the evolution of active documents. Active documents is a formalism describing the evolution of XML documents by activating Web services calls included in the document. It has already been used in the context of the management of distributed data \cite{axml}. The main contributions of this thesis are theoretical studies motivated by two systems for managing respectively stream applications and workflow applications. In a first contribution, we study the problem of view maintenance over active documents. The results served as the basis for an implementation of stream processors based on active documents called Axlog widgets. In a second one, we see active documents as the core of data centric workflows and consider various ways of expressing constraints on the evolution of documents. The implementation, called Axart, validated the approach of a data centric workflow system based on active documents. The hidden Web (also known as deep or invisible Web), that is, the partof the Web not directly accessible through hyperlinks, but through HTMLforms or Web services, is of great value, but difficult to exploit. Wediscuss a process for the fully automatic discovery, syntacticand semantic analysis, and querying of hidden-Web services. We proposefirst a general architecture that relies on a semi-structured warehouseof imprecise (probabilistic) content. We provide a detailed complexityanalysis of the underlying probabilistic tree model. We describe how wecan use a combination of heuristics and probing to understand thestructure of an HTML form. We present an original use of a supervisedmachine-learning method, namely conditional random fields,in an unsupervised manner, on an automatic, imperfect, andimprecise, annotation based on domain knowledge, in order to extractrelevant information from HTML result pages. So as to obtainsemantic relations between inputs and outputs of a hidden-Web service, weinvestigate the complexity of deriving a schema mapping between databaseinstances, solely relying on the presence of constants in the twoinstances. We finally describe a model for the semantic representationand intensional indexing of hidden-Web sources, and discuss how toprocess a user's high-level query using such descriptions.
Document type :
Complete list of metadatas

Cited literature [16 references]  Display  Hide  Download

Contributor : Abes Star <>
Submitted on : Monday, June 6, 2011 - 10:23:13 AM
Last modification on : Tuesday, February 5, 2019 - 1:46:02 PM
Long-term archiving on : Friday, November 9, 2012 - 2:30:29 PM


Version validated by the jury (STAR)


  • HAL Id : tel-00598299, version 1



Pierre Bourhis. On the dynamics of active documents for distributed data management. Other [cs.OH]. Université Paris Sud - Paris XI, 2011. English. ⟨NNT : 2011PA112003⟩. ⟨tel-00598299⟩



Record views


Files downloads