Context-Based Method Using Bayesian Network in Multimodal Fission System

Current technological advancement has created the need for machines that are more powerful, easier to use, and better matched to the needs of users. To achieve this goal, multimodal systems have been developed to combine multiple modalities depending on the task, preferences and communicative intentions of the users. In this paper, we present a new approach to overcome uncertain or ambiguous knowledge during the fission process in a multimodal system. This approach is a context-based method using a Bayesian network. We also present a modular and distributed architecture, which is well suited to multimodal systems.


International Journal of Computational Intelligence Systems, Vol. 8, No. 6 (2015) 1076-1090
Co-published by Atlantis Press and Taylor & Francis
Copyright: the authors

Introduction
Human-machine interaction is the discipline devoted to the design, implementation and evaluation of interactive systems for human users, as well as the study of the major phenomena surrounding them. An interactive system is a system whose operation depends on information provided by an external environment that the system does not control. Interactive systems are also known as open systems, or autonomous systems, the functioning of which can be completely illustrated by algorithms (Refs 1 and 2). Moreover, interactive systems are the outcome of scientific needs. Their popularity is due to their use by a great number of people. This fact led researchers to develop systems and machines according to the users' needs and to spread their use on a large scale. Current technological advancement has created the need for machines that are more powerful, easier to use, and better matched to the needs of users. To achieve these objectives, these machines should interact harmoniously with the user. This is possible only if the systems are able to understand human communication. Human communication is conducted via many natural modalities such as speech, gesture, eye gaze and facial expressions. Inspired by human communication, multimodal systems were developed to combine many modalities according to the task, preferences and communicative intentions. End users can take advantage of natural modalities (e.g. eye gaze, speech, gesture, etc.) to communicate or exchange information with applications and machines. These machines, integrated with natural modalities, are an effective solution for users who cannot use a keyboard or a mouse, for users who have a visual handicap, for mobile users equipped with wireless telephones or mobile devices, for weakened users, etc.
Therefore, the machines must be able to recognize the user's commands and communicate as humans do, using different means of communication such as words, gestures, voice, etc. Multimodal systems are an effective solution to this problem. They allow the use of different modalities (gesture, voice, etc.) to interact with machines, systems, applications, etc. Nowadays, we are seeing human-machine and machine-machine interactive systems converge towards multimodal systems that enhance the interaction. These systems use the adequate modalities according to the user's preferences, his or her skill level and the nature of the task. Generally, these systems have a process of understanding. They must also have the ability to interpret data from multiple input modalities generated by sensors and devices (camera, microphone, etc.). This is known as the fusion process (Refs 3 and 5). The resulting command is then executed on the output device (screen, speakers, projector, etc.). This is known as the fission process: the role of the fission module is to subdivide the requests or commands made by the user into elementary subtasks, then associate them with the appropriate and available output modalities (Refs 5-7). A well-known example of these systems is Bolt's "Put That There" system (Ref. 8), where the author used gesture and speech to move objects. Other simple examples of multimodal systems are a phone that uses the ringer and the screen brightness to indicate a call, or a GPS that provides visual and audible indications. These systems improve the machine's recognition and understanding of commands coming from the environment (user, robot, machine, etc.).
This paper focuses on the fission process. We present a new methodological solution by modeling an architecture that facilitates the work of a fission module, and by defining an ontology that contains different applicable scenarios and describes the environment in which a multimodal system exists. We detail in this article the use of a probabilistic method, namely the Bayesian network (BN), to perform multimodal fission.
The BN is helpful in the case of ambiguity or uncertainty. This paper discusses these solutions by highlighting the architectural design of the proposed solution. The rest of this paper is organized as follows. Section 2 presents the most important research works related to ours. Section 3 presents the proposed architecture. Section 4 describes the fission algorithm. Section 5 discusses the BN. Section 6 presents a concrete simulation example using CPN-Tools. The paper is concluded in section 7.

Related Work
Multimodal systems represent a remarkable departure from conventional interfaces, such as windows and icons, towards a man/machine interaction that gives the user more naturalness, flexibility and portability. The first multimodal system, "Put-That-There", was created in 1980 by Richard Bolt (Ref. 8). The system is equipped with a microphone and a screen. It allows moving or changing the display of objects on the screen, using the voice accompanied by pointing at the screen. Since Bolt's work, the academic world has provided models and designs offering multimodal interaction techniques (Refs 9-12).
A multimodal system is mainly composed of two indispensable modules: the fission module (Ref. 13) and the fusion module (Ref. 14).
When two or more modalities are invoked at the same time (e.g. speech and clicking a mouse button), the user relies on the complementarity of these modalities. This provides a rationale for multimodal fusion and for the fusion engine, which is responsible for determining the meaning of such complementarities.
Fission is the way to segment the data that will be presented to the user according to the available modalities and context (Ref. 15). The fission module is a crucial component of multimodal systems. However, most research on multimodal systems focuses more on fusion than on fission. This point is supported by "There isn't much research done on fission of output modalities because most applications use few different output modalities therefore simple and direct output mechanisms are often used" (Ref. 5) and by "Multimodal fission is a research topic that is not often addressed in the scientific community." (Ref. 14). This module is the main subject of our work. We focus especially on 1) the services connected to the output (multimodal fission) and 2) the creation of a multimodal interaction system.
The few examples of the fission process presented in the literature are very simple or are question-response based processes that provide limited choices to the user. The user is therefore restricted to some particular commands (predefined fission). For example, in the system presented in Ref. 17, the fission process is very simple. It detects the state of the driver: it captures information from sensors installed in the car, tests whether the captured values are in a specific range, and, depending on this test, generates alerts such as sound alerts or vibration of the wheel. Most of the architectures presented for multimodal systems are dedicated to very specific modalities (Refs 18-20). Also, most of the multimodal systems presented in the literature do not mention solutions to overcome the problem of uncertainty or ambiguity during the fission or fusion process. In Ref. 17, the authors use the BN in the fusion process to model human fatigue and to predict fatigue based on the visual cues obtained while a driver is driving a car. The BN presented by the authors does not deal directly with the fusion process; it only determines a parameter (the level of fatigue) that affects the decision in the fusion process. In this work we introduce probabilistic reasoning to integrate uncertain and ambiguous knowledge. We use a BN that we adapted using contextual information such as temperature, history, user status, etc. to resolve the problem of ambiguity and uncertainty in the fission system.

Architectural design
In this section, we present our proposed architectural design as a solution to the described problems of previous architectures.
A general overview of the architectural design is shown in Fig. 1. The proposed system architecture is modular and distributed. All the components can be installed in different places in the network (robot, computer, server, etc.). The architecture is composed mainly of 7 modules/components. These components are as follows:

Command Extraction: this module takes as input different XML files from applications. The output corresponds to the command that our system will process. The system interacts with many applications. It exchanges data (information about the environment, command, user preferences, etc.) using XML files. The purpose of this module is to extract the command from the XML files and send it to the fission module;

Context Acquisition Module: the goal of this module is to select the adequate modality(ies) according to the user's preferences, the status of the media and the status of the environment. This module takes as input:
The state of the environment obtained from the sensors. The module selects the appropriate modalities depending on information collected from the sensors, such as the noise level, which affects the use of the audio modality. For example, if the noise level is high, the audio modality is disabled. This is called the environment context (Ref. 21);
The user location and the status of the user. This defines the user's ability to use certain modalities. For example, if the user is blind the display modality is disabled, and if the user is in a library the audio modality is disabled. This is called the user context (Ref. 21);
The state of the media. This determines the capacity and the type of the system in use. For instance, if the battery of the cell phone is low, the system disables the audio modality and decreases the level of brightness. This is called the system context (Ref. 21).

Modality(ies)-subtasks Association: the goal of this module is to select the adequate modality(ies) for a given subtask. Its inputs are the selected modality(ies) and the list of subtasks, obtained from the Context Acquisition Module and the Fission Module, respectively. The output contains the list of subtasks with the adequate modality(ies). This module sends queries to the ontology to find the matching pattern (Ref. 22) for subtask selection. In our case, a pattern is defined by two parts: a problem, which presents the model of the command, and a solution, which contains the suitable subtasks;

Ontology: the knowledge base that describes every detail of the environment (objects, events, etc.). It also contains all possible patterns (Ref. 6) for subtask selection and modality selection;

Fission module: the crucial module in our architecture. It decomposes a command into elementary subtasks. This module takes as input the command obtained from Command Extraction. The output contains the list of subtasks, which are then sent to the subtasks execution module; or a feedback message if the command is incorrect; or, in the uncertain case, the command forwarded to the BN module. This module is detailed in section 4.

For more details about the previous modules, the reader can refer to Ref. 6 and Ref. 3.

Bayesian Network Module: in case of ambiguity or uncertainty, this module interacts with the ontology to make a decision using a probabilistic method. The input of this module is the ambiguous command. The output provides the decision to the feedback module.
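The context rules given for the Context Acquisition Module can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the `Context` fields, the 70 dB threshold, the `QUIET_PLACES` set and the function name `select_modalities` are all assumptions introduced here, and the brightness adjustment mentioned for the system context is left out.

```python
from dataclasses import dataclass


@dataclass
class Context:
    noise_level_db: float  # environment context (unit is an assumption)
    user_is_blind: bool    # user context: status of the user
    user_location: str     # user context: location of the user
    battery_low: bool      # system context: state of the media


NOISE_THRESHOLD_DB = 70.0   # hypothetical cut-off for a "high" noise level
QUIET_PLACES = {"library"}  # hypothetical places where audio is not allowed


def select_modalities(ctx, available=("audio", "display")):
    """Return the set of output modalities left enabled by the context."""
    selected = set(available)
    if ctx.noise_level_db > NOISE_THRESHOLD_DB:  # environment context
        selected.discard("audio")
    if ctx.user_is_blind:                        # user context
        selected.discard("display")
    if ctx.user_location in QUIET_PLACES:        # user context
        selected.discard("audio")
    if ctx.battery_low:                          # system context
        selected.discard("audio")
    return selected


in_library = Context(noise_level_db=40.0, user_is_blind=False,
                     user_location="library", battery_low=False)
print(select_modalities(in_library))  # only the display remains enabled
```

Keeping each rule as a separate test mirrors the three context types (environment, user, system), so a rule can be added or removed without touching the others.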

Fission algorithm
The fission process is the crucial part of the multimodal system. Fission is the way to subdivide commands into elementary subtasks and present them to the user according to the available modalities and context. Consequently, the goal of multimodal fission is to pass from a presentation independent of modalities to a coordinated and coherent multimodal presentation. In general, the fission rule is simple (Fig. 2): if a complex command (CC) is presented, then a list of subtasks with suitable modalities (and their parameters) is deduced.

In equation (1), the symbols denote, respectively, a sub-task, an output modality and the complex command. The indices l and k differ from m and n because the number of modalities used depends on the sub-task. For example, some sub-tasks use only two terms even if three modalities are available.

In equation (1), one symbol indicates that either one or several modalities can be used to present a sub-task. For example, if we present a text to the user, we use audio or display. The other symbol indicates that the available modalities are used together to present a sub-task. The stages of the fission process are described in Fig. 3. The fission process starts when the system receives an XML file from a given application. The system then extracts the command from the XML file and interacts with the ontology to get the meaning of every word (this part is detailed in Ref. 3).
If the decision is ambiguous or uncertain, the system uses the BN to resolve the problem (this part is detailed in section 5); otherwise it gets the model of the command from the ontology. The system then sends a query (containing the problem parameters) to the ontology to find the matching pattern. A pattern is defined by two parts: a problem and a solution. In our case the pattern problem presents the model of the command and the pattern solution presents the list of subtasks. If the pattern is found, the system gets the list of subtasks and selects the adequate modality(ies) for every subtask (detailed in Ref. 6); otherwise the system tries to find a similar command in the ontology. If no similar command is found, the system creates elementary subtasks and asks for feedback from the user; if one is found, the system finds the missing subtask using a BN (detailed in section 3) and asks for feedback from the user.
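The branching described in this paragraph can be summarised as a short Python sketch. It is a simplified reading under stated assumptions, not the authors' code: the function `fission`, its callable parameters (`disambiguate`, `find_similar`, `missing_subtask`), the dictionary used in place of the ontology, and the example subtask names are all hypothetical, and modality selection is omitted.

```python
from typing import Callable, Dict, List, Optional, Tuple


def fission(
    command: str,
    patterns: Dict[str, List[str]],                # stand-in for the ontology patterns
    disambiguate: Callable[[str], str],            # stand-in for the BN on ambiguous words
    find_similar: Callable[[str], Optional[str]],  # similar command in the ontology, if any
    missing_subtask: Callable[[str, str], str],    # stand-in for the BN on missing subtasks
) -> Tuple[List[str], bool]:
    """Return (subtasks, needs_user_feedback) for one command."""
    command = disambiguate(command)
    if command in patterns:                  # matching pattern found
        return list(patterns[command]), False
    similar = find_similar(command)
    if similar is None:                      # nothing similar: elementary subtasks
        return [f"elementary:{word}" for word in command.split()], True
    extra = missing_subtask(command, similar)  # similar command: complete it
    return list(patterns[similar]) + [extra], True


# Hypothetical data: the subtask names below are invented for illustration.
patterns = {"prepare a coffee": ["take cup", "pour coffee"]}
subtasks, feedback = fission(
    "prepare a coffee with milk",
    patterns,
    disambiguate=lambda c: c,
    find_similar=lambda c: next((p for p in patterns if c.startswith(p)), None),
    missing_subtask=lambda c, s: "add milk",
)
print(subtasks, feedback)
```

Passing the BN steps in as callables keeps the control flow readable and lets each step be replaced by a real ontology query or inference routine.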

Bayesian network module
BN are particularly suitable for "collecting and representing knowledge on uncertain domains but also enable to perform probabilistic calculus and statistical analyses in an efficient manner. The difference of BN, in comparison with other classical methods, is their polyvalence. They allow dealing with issues such as prediction or diagnosis, optimization, data analysis of feedback experience, deviation detection and model updating" (Refs 23 and 24). The Bayesian model has several important advantages that make it suitable for our purpose. We employ Bayesian networks to determine, in case of uncertainty, the suitable subtasks to select during the process of multimodal fission. The fission process can therefore take decisions autonomously in case of uncertainty (this case is illustrated with a real example in section 6.1) and keep the knowledge base updated in case of missing subtasks (self-learning) (Ref. 25); this case is illustrated with a real example in section 6.2. As mentioned in the previous section, during the fission process the system may have to decide between several equally plausible choices. In such uncertain or ambiguous cases, the BN is an effective solution. In this section, we give the definition of the BN, then present an example of how it is used in data mining, and then show how we adapt this method, using the context, to overcome the problem of uncertainty in multimodal fission.


Definition
A BN provides a mechanism for the graphical representation of uncertain knowledge and for inferring high-level activities from observed data.
A BN is a graph in which nodes represent random variables and links represent influences between variables. The graph is acyclic: it contains no directed cycle (Ref. 26).

Fig. 4. Relationship between two nodes in a BN
The arc defines a probability distribution in which the parent has the a priori probability P(p) and the child has the conditional probability P(c|p) (Fig. 4). BN are used in many fields for diagnosis (medical and industrial), risk analysis, spam detection and data mining.
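As a small worked illustration of the parent/child relationship, the sketch below applies Bayes' rule to a two-node network: given P(p) and P(c|p), it returns P(p|c) for an observed child value. The function name `posterior_parent` and the rain/wet numbers are invented for illustration only.

```python
def posterior_parent(prior, likelihood, observed_child):
    """Bayes' rule for a two-node network parent -> child.

    prior:      {p: P(p)}
    likelihood: {p: {c: P(c|p)}}
    Returns     {p: P(p | observed_child)}.
    """
    # Joint probability P(p, c) = P(p) * P(c|p) for the observed child value.
    joint = {p: prior[p] * likelihood[p][observed_child] for p in prior}
    evidence = sum(joint.values())  # P(c), the marginal of the child
    return {p: value / evidence for p, value in joint.items()}


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
prior = {"rain": 0.2, "no_rain": 0.8}
likelihood = {
    "rain": {"wet": 0.9, "dry": 0.1},
    "no_rain": {"wet": 0.1, "dry": 0.9},
}
post = posterior_parent(prior, likelihood, "wet")
print(post)  # P(rain | wet) = 0.18 / 0.26
```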

BN in Data Mining
For an effective search, we use a semantic Bayesian graph that combines semantic inference and probability.
As we can see in Fig. 5, a semantic Bayesian graph consists of several layers of nodes: queries, keywords, concepts and target resources.
At the first level of abstraction, keywords are used in user queries. Each keyword is connected to one or more concepts with a certain probability. Finally, each concept refers to a resource. The relationships between concepts and resources are semantic links.

Fig. 5. Semantic Bayesian network
The search for resources based on a semantic Bayesian graph is performed as follows. First, the query is parsed and keywords are selected.
Then the probabilities linking the keyword layer to the concept layer are calculated. The idea in data mining is as follows: for each concept, its probability is calculated from the conditional probabilities P(M|c) of the keywords of the user query. The system then chooses the concept that has the highest probability. Since each concept node points to a target resource, the resource related to the chosen concept is returned as the response.
The conditional probability P(M|c) is assigned manually by consulting an expert in each field, and P(c) generally equals 1 to give the same weight to each concept.
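The selection step can be sketched as follows. The sketch assumes that the score of a concept is P(c) multiplied by the product of P(M|c) over the query keywords (a naive-Bayes style combination, which is an assumption made here and is not stated in the text); the function `best_concept`, the concept names and the probabilities are hypothetical.

```python
from math import prod


def best_concept(keywords, likelihood, prior=None):
    """Score each concept and return (best concept, all scores).

    likelihood: {concept: {keyword: P(keyword | concept)}}
    prior:      {concept: P(concept)}; defaults to 1 for every concept,
                which gives the same weight to each concept.
    """
    scores = {}
    for concept, table in likelihood.items():
        p_c = 1.0 if prior is None else prior[concept]
        # A keyword unknown to a concept contributes probability 0.
        scores[concept] = p_c * prod(table.get(k, 0.0) for k in keywords)
    return max(scores, key=scores.get), scores


# Hypothetical expert-assigned probabilities for two concepts.
likelihood = {
    "jaguar_animal": {"jaguar": 0.6, "speed": 0.3},
    "jaguar_car": {"jaguar": 0.4, "speed": 0.7},
}
concept, scores = best_concept(["jaguar", "speed"], likelihood)
print(concept)  # the resource linked to this concept would be returned
```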
From this perspective, we present in the next paragraph the adaptation of this method to our case with the use of context.

BN applied in fission process
We can use the same idea to select data from the ontology in the case of uncertainty or ambiguity. We can represent the adaptation of the BN with context by equation (4.3), where n is the number of contexts, m is the number of ambiguous concepts, Con = context, C = concept and P = probability. Each context is connected to one or more concepts with a certain probability.
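One way to read this adaptation, assuming the contexts are conditionally independent given the concept (an assumption made here, not stated in the text), is the following naive-Bayes form written with the symbols defined above. It is a sketch of a plausible formulation and not necessarily the authors' exact equation:

```latex
P(C_j \mid Con_1, \dots, Con_n) \;\propto\; P(C_j) \prod_{i=1}^{n} P(Con_i \mid C_j),
\qquad j = 1, \dots, m
```

Under this reading, the concept retained is the C_j with the largest value, which matches the rule of choosing the concept with the highest probability.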

As mentioned previously, we set P(C1) = P(C2) = P(C3) = 1 to give the same weight to every concept. As we can see, P(C3|Con) has the highest value. This means that the concept C3 = {Street} is the most probable meaning of the user's request. The system will seek feedback from the user: "What is the number of the building in the Montreal Street?"

Robot control example 1
Suppose that the user gives the robot the order "give me water". As we can see in the ontology (Fig. 8), there are three possible cases: add water in glass, add water in casserole, or add water in pail (in real life there can be many other choices, but we limit ourselves to what is in our ontology). This is a case of uncertainty, and we use the BN to resolve it. As shown in Figure 4.9, different contexts can influence the decision making. Each context is associated with all the ambiguous concepts with a corresponding probability.

Fig. 8. Example water ontology
Using equation (1), we calculate the a posteriori probabilities for each concept {glass, casserole, pail}. Table 4.1 shows all the likelihood probabilities. Suppose that the sensors installed in the robot detect that the user is in the backyard, that he is not thirsty, that the temperature is higher than 25 degrees and that he is gardening.
We calculate the a posteriori probabilities for every concept using equation (4.1), as in the previous example. The concept pail has the highest probability, so the system will choose the pail to bring the water.
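The calculation for this example can be sketched as below. The likelihood values are invented placeholders (the values of the original table are not reproduced here), the function `rank_concepts` is hypothetical, and the sketch assumes that the observed contexts are combined by multiplying their likelihoods, with equal priors of 1.

```python
from math import prod


def rank_concepts(observed, likelihood, prior=None):
    """Return normalised scores {concept: P(concept | observed contexts)}.

    likelihood: {concept: {context: P(context | concept)}}
    prior:      {concept: P(concept)}; defaults to 1 for every concept.
    """
    scores = {
        concept: (1.0 if prior is None else prior[concept])
        * prod(table[ctx] for ctx in observed)
        for concept, table in likelihood.items()
    }
    total = sum(scores.values())
    return {concept: score / total for concept, score in scores.items()}


# Placeholder likelihoods, for illustration only.
likelihood = {
    "glass": {"backyard": 0.2, "not thirsty": 0.1, "temp>25": 0.5, "gardening": 0.1},
    "casserole": {"backyard": 0.1, "not thirsty": 0.4, "temp>25": 0.2, "gardening": 0.1},
    "pail": {"backyard": 0.7, "not thirsty": 0.5, "temp>25": 0.3, "gardening": 0.8},
}
observed = ["backyard", "not thirsty", "temp>25", "gardening"]
posterior = rank_concepts(observed, likelihood)
chosen = max(posterior, key=posterior.get)
print(chosen)
```

With these placeholder values the pail dominates because the backyard and gardening contexts both favour it strongly; different table values could of course change the ranking.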
At this stage, we have presented two different ways to use the BN for two different applications. In the next example, we present the case of a similar command and show how the system finds the missing subtask, as presented in section 4.

Robot control example 2
Assume that the user asks a robot "prepare a coffee with milk". Suppose that the ontology already contains a pattern for "prepare a coffee" with all the necessary subtasks. The two commands are similar, and we have to find the subtask missing for the new command ("prepare a coffee with milk"). In this case we select the pattern for the command "prepare a coffee" and look for the missing subtask. Milk is a liquid in our ontology, and we have already defined some possible elementary subtasks for liquids and objects, as shown in Fig. 10. There are three possible decisions: drink liquid, add liquid in content, or pour liquid in content. To make a decision, the system uses the BN (Fig. 11). We have three concepts: C1 {drink}, C2 {add} and C3 {pour}. Two keywords are related to these concepts: "with" and "liquid". In this case we use the keywords as in data mining. As shown in Fig. 11, the two keywords are related to the concepts with likelihood probabilities. The subtask selected by the BN is added to the subtasks of the command "prepare a coffee" already existing in the ontology. The ontology is then updated with the new pattern.

Simulation
To understand the mechanism of fission using the BN, we simulate a scenario.
In this scenario, we assume that, in our multimodal system, a GPS gives directions to a specific location, for instance in response to "I want to go to Washington". We first model and simulate the architecture defined in Fig. 1. The steps of each strategy are modeled using the colored Petri Net formalism (Ref. 27) and simulated using CPN Tools (Ref. 28).
Fig. 12 shows the general view of the architecture. It is composed mainly of 8 modules:

Generator: this module generates events as random numbers to select a command in the place "command".
T_parser: this module decomposes the command into words.
T_Fission: its role is to divide the command into elementary subtasks.
T_Grammar: it verifies the meaning and the grammar of the command.
Modality(ies) Selection: this module selects the available modalities depending on the state of the environment.
T_Subtask-Modality_Association: it associates the appropriate modality(ies) with each subtask.
Ontology: it is a container that stores the patterns as ontology concepts, the models and the vocabulary.
Bayesian Network: this is the main module presented in this paper. Its role is to handle uncertainty.

For more details about the modules other than the Bayesian Network, the reader can refer to Ref. 3.
As we can see in Fig. 13, the system receives the list of words of the command and the transition list of the command. The meaning of every word is retrieved from the ontology. This is performed in the transitions Decom List and meaning. The role of Decom List is to send every word separately to the meaning transition. The meaning transition interacts with the ontology to get the meaning.
As shown in Fig. 13, in our example the place meaning word shows that Washington may have 7 possible meanings: {city, restaurant, basketball team, president, street, map}. In this case, the system cannot choose the right meaning on its own. In the transition test of incertitude, the system then calculates the size of the meaning list. In our case, the size of the meaning list is greater than one. This is where the BN starts.
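The test performed by the transition test of incertitude amounts to checking the size of the meaning list for each word. A minimal sketch, assuming a plain dictionary in place of the ontology (the function `route_words` and the entry for "go" are hypothetical):

```python
def route_words(words, ontology):
    """Split words into resolved ones and ambiguous ones.

    ontology: {word: [possible meanings]}
    A word with exactly one meaning is resolved directly; a word with more
    than one meaning is handed to the BN. Unknown words are ignored here.
    """
    resolved, ambiguous = {}, {}
    for word in words:
        meanings = ontology.get(word, [])
        if len(meanings) > 1:      # uncertainty: the BN must decide
            ambiguous[word] = meanings
        elif meanings:             # single meaning: no BN needed
            resolved[word] = meanings[0]
    return resolved, ambiguous


ontology = {
    "Washington": ["city", "restaurant", "basketball team",
                   "president", "street", "map"],
    "go": ["move"],  # hypothetical single-meaning entry
}
resolved, ambiguous = route_words(["go", "Washington"], ontology)
print(resolved, list(ambiguous))
```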

The role of Random context is to simulate random information from the sensors. As shown in Fig. 14, the transition "collect information from captors" collects information from the places location captors, user state, Application and time. In our example, the captured information is {GPS, 17, Washington, not hungry}. The transition likelihood probabilities contains the different likelihood probabilities. As shown in Fig. 15, this transition has two inputs: from the place get probabilities, the word Washington and its meanings {city, restaurant, basketball-team, president, street, map}; and from the place captors data, the context {GPS, 17, Washington, not hungry}. Its output is the likelihood probabilities, which in our case are shown in Table 3.

Conclusion
This paper presents an architecture for multimodal fission that makes use of a Bayesian network to determine the list of subtasks to assign to different modalities. The fission module segments the command generated by the machine into elementary subtasks and presents them on the available output modalities. However, most of the multimodal systems studied in the literature use predefined modalities, and the available architectures are targeted at specific applications. None of them seems to use probabilistic models when working with uncertain, fuzzy or ambiguous real-world data in multimodal systems.
In this paper we presented a new context-based method using a Bayesian network to resolve the uncertainty problem during the fission process in a multimodal system. This method is useful for overcoming the problem of ambiguity or uncertainty. A concrete example, simulated with CPN Tools, illustrated the effectiveness of the contribution. As future work, we aim to develop a real application using the methods presented in this paper.

Fig. 10. Some possible elementary subtasks for liquid and objects

Fig. 11. Bayesian network presenting

The concept add is therefore chosen as it has the highest probability. The system adds the subtask "add milk" to

Table 2. The captured information

Fig. 14. Collection of information from captors