An Efficient Framework for Processing and Analyzing Unstructured Text to Discover Delivery Delay and Optimization of Route Planning in Realtime

Mohammad Alshaer 1, 2
2 BD - Base de Données
LIRIS - Laboratoire d'InfoRmatique en Image et Systèmes d'information
Abstract : Internet of Things (IoT) is leading to a paradigm shift within the logistics industry. The advent of IoT has been changing the logistics service management ecosystem. Logistics services providers today use sensor technologies such as GPS or telemetry to collect data in realtime while the delivery is in progress. The realtime collection of data enables the service providers to track and manage their shipment process efficiently. The key advantage of realtime data collection is that it enables logistics service providers to act proactively to prevent outcomes such as delivery delay caused by unexpected/unknown events. Furthermore, the providers today tend to use data stemming from external sources such as Twitter, Facebook, and Waze. Because, these sources provide critical information about events such as traffic, accidents, and natural disasters. Data from such external sources enrich the dataset and add value in analysis. Besides, collecting them in real-time provides an opportunity to use the data for on-the-fly analysis and prevent unexpected outcomes (e.g., such as delivery delay) at run-time. However, data are collected raw which needs to be processed for effective analysis. Collecting and processing data in real-time is an enormous challenge. The main reason is that data are stemming from heterogeneous sources with a huge speed. The high-speed and data variety fosters challenges to perform complex processing operations such as cleansing, filtering, handling incorrect data, etc. The variety of data – structured, semi-structured, and unstructured – promotes challenges in processing data both in batch-style and real-time. Different types of data may require performing operations in different techniques. A technical framework that enables the processing of heterogeneous data is heavily challenging and not currently available. In addition, performing data processing operations in real-time is heavily challenging; efficient techniques are required to carry out the operations with high-speed data, which cannot be done using conventional logistics information systems. Therefore, in order to exploit Big Data in logistics service processes, an efficient solution for collecting and processing data in both realtime and batch style is critically important. In this thesis, we developed and experimented with two data processing solutions: SANA and IBRIDIA. SANA is built on Multinomial Naïve Bayes classifier whereas IBRIDIA relies on Johnson's hierarchical clustering (HCL) algorithm which is hybrid technology that enables data collection and processing in batch style and realtime. SANA is a service-based solution which deals with unstructured data. It serves as a multi-purpose system to extract the relevant events including the context of the event (such as place, location, time, etc.). In addition, it can be used to perform text analysis over the targeted events. IBRIDIA was designed to process unknown data stemming from external sources and cluster them on-the-fly in order to gain knowledge/understanding of data which assists in extracting events that may lead to delivery delay. According to our experiments, both of these approaches show a unique ability to process logistics data. However, SANA is found more promising since the underlying technology (Naïve Bayes classifier) out-performed IBRIDIA from performance measuring perspectives. It is clearly said that SANA was meant to generate a graph knowledge from the events collected immediately in realtime without any need to wait, thus reaching maximum benefit from these events. Whereas, IBRIDIA has an important influence within the logistics domain for identifying the most influential category of events that are affecting the delivery. Unfortunately, in IBRIRDIA, we should wait for a minimum number of events to arrive and always we have a cold start. Due to the fact that we are interested in re-optimizing the route on the fly, we adopted SANA as our data processing framework
Complete list of metadatas

https://tel.archives-ouvertes.fr/tel-02310852
Contributor : Abes Star <>
Submitted on : Thursday, October 10, 2019 - 2:48:07 PM
Last modification on : Monday, December 9, 2019 - 2:13:06 PM

File

TH2019ALSHAERMOHAMMAD.pdf
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02310852, version 1

Citation

Mohammad Alshaer. An Efficient Framework for Processing and Analyzing Unstructured Text to Discover Delivery Delay and Optimization of Route Planning in Realtime. Data Structures and Algorithms [cs.DS]. Université de Lyon; Université Libanaise, école doctorale des sciences et technologies, 2019. English. ⟨NNT : 2019LYSE1105⟩. ⟨tel-02310852⟩

Share

Metrics

Record views

273

Files downloads

103