• R, a reward function, which we assume is defined over states; R(s) is the reward obtained by the agent for being in state s;
• γ, a discount factor in [0, 1], which controls the importance given to rewards expected in the future.

In the RL setting, the (learning) agent initially knows only the state space S and the action space A, together with the factor γ. At every time step t, it knows the current state s_t of the environment and chooses an action a_t.

Many algorithms have been proposed in the literature for RL problems with numerical rewards [Gilbert et al.]; for the setting of qualitative rewards, which the standard approaches do not handle, we use the SSB Q-learning approach. Note that our contribution consists first in modelling the continuous improvement of a processing chain as an RL problem, and that other algorithms could be used. To keep this document self-contained, we present both algorithms below in a general way, but we encourage the reader [...].

[...] different chains for different types of document. Even with this larger state/action space, computation time will not be an obstacle with a Q-learning-type algorithm, where the computations at each time step are instantaneous. The BIMBO platform underlies our application, but it can also be used to study other continuous-improvement methods.
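As an illustration of the discussion above, the following minimal Python sketch implements tabular Q-learning for the standard numerical-reward setting (the SSB Q-learning variant used for qualitative rewards replaces this update with pairwise preference information). The environment interface (reset/step), the action set and the hyper-parameter values are illustrative assumptions, not part of the BIMBO platform.

import random
from collections import defaultdict

# Minimal tabular Q-learning sketch; env, actions and hyper-parameters are assumed,
# with env.reset() -> state and env.step(action) -> (next_state, reward, done).
def q_learning(env, actions, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1):
    Q = defaultdict(float)                         # Q[(s, a)], initialised to 0
    for _ in range(episodes):
        s = env.reset()                            # current state s_t
        done = False
        while not done:
            if random.random() < epsilon:          # epsilon-greedy choice of a_t
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda x: Q[(s, x)])
            s_next, r, done = env.step(a)          # observe the reward and the next state
            # move Q(s, a) towards r + gamma * max_a' Q(s', a')
            target = r + gamma * max(Q[(s_next, x)] for x in actions)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s_next
    return Q

Each step touches a single table entry, which is consistent with the remark above that the per-step computation of a Q-learning-type algorithm remains negligible even with a larger state/action space.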

R. Akrour, M. Schoenauer, and M. Sebag, Preference-Based Policy Learning, Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, p.16, 2011.
DOI : 10.1007/978-3-642-23780-5_11

URL : https://hal.archives-ouvertes.fr/inria-00625001

R. Akrour, M. Schoenauer, and M. Sebag, APRIL: Active Preference Learning-Based Reinforcement Learning, Machine Learning and Knowledge Discovery in Databases, pp.116-131, 2012.
DOI : 10.1007/978-3-642-33486-3_8

URL : https://hal.archives-ouvertes.fr/hal-00722744

R. Akrour, M. Schoenauer, and M. Sebag, Interactive robot education, ECML/PKDD Workshop on Reinforcement Learning with Generalized Feedback: Beyond Numeric Rewards, p.16, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00931347

B. Amann, C. Constantin, C. Caron, and P. Giroux, WebLab PROV, Proceedings of the Joint EDBT/ICDT 2013 Workshops, EDBT '13, pp.298-306, 2013.
DOI : 10.1145/2457317.2457367

URL : https://hal.archives-ouvertes.fr/hal-01219732

A. Azaria, Z. Rabinovich, S. Kraus, C. V. Goldman, and Y. Gal, Strategic advice provision in repeated human-agent interactions, Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence, p.16, 2012.
DOI : 10.1007/s10458-015-9284-6

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.261.414

L. Baird, Residual Algorithms: Reinforcement Learning with Function Approximation, Proceedings of the Twelfth International Conference on Machine Learning, pp.30-37, 1995.
DOI : 10.1016/B978-1-55860-377-6.50013-X

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.114.5034

C. Bejan and S. Harabagiu, Unsupervised Event Coreference Resolution, Computational Linguistics, vol.40, issue.2, pp.311-347, 2014.
DOI : 10.3115/1072399.1072405

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.671.7552

A. Bellenger, S. Gatepaille, H. Abdulrab, and J. Kotowicz, An Evidential Approach for Modeling and Reasoning on Uncertainty in Semantic Applications, URSW, pp.27-38, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00639325

B. Bonet and J. Pearl, Qualitative MDPs and POMDPs: An order-of-magnitude approximation, Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, UAI '02, 2002.

C. Boutilier, R. Dearden, and M. Goldszmidt, Stochastic dynamic programming with factored representations, Artificial Intelligence, vol.121, issue.1-2, pp.49-107, 2000.
DOI : 10.1016/S0004-3702(00)00033-3

URL : http://doi.org/10.1016/s0004-3702(00)00033-3

R. I. Brafman and M. Tennenholtz, R-max: a general polynomial time algorithm for near-optimal reinforcement learning, The Journal of Machine Learning Research, vol.3, pp.213-231, 2003.

I. Bratko and D. Šuc, Learning qualitative models, Artificial Intelligence, vol.24, pp.107-121, 2003.

A. Buades, B. Coll, and J. Morel, Non-Local Means Denoising, Image Processing On Line, vol.1, 2011.
DOI : 10.5201/ipol.2011.bcm_nlm

URL : http://doi.org/10.5201/ipol.2011.bcm_nlm

R. Busa-Fekete, B. Szörényi, P. Weng, W. Cheng, and E. Hüllermeier, Preference-based reinforcement learning: evolutionary direct policy search using a preference-based racing algorithm, Machine Learning, pp.327-351, 2014.
DOI : 10.1007/s10994-014-5458-8

URL : https://hal.archives-ouvertes.fr/hal-01079370

K. Byrne, Populating the semantic web: combining text and relational databases as RDF graphs, 2009.

X. Chai, B. Vuong, A. Doan, and J. F. Naughton, Efficiently incorporating user feedback into information extraction and integration programs, Proceedings of the 35th SIGMOD international conference on Management of data, SIGMOD '09, pp.87-100, 2009.
DOI : 10.1145/1559845.1559857

W. Cheng and E. Hüllermeier, Learning Similarity Functions from Qualitative Feedback, Advances in Case-Based Reasoning, pp.120-134, 2008.
DOI : 10.1007/978-3-540-85502-6_8

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.381.3719

L. Chiticariu, Y. Li, and F. R. Reiss, Rule-based information extraction is dead! Long live rule-based information extraction systems, Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, EMNLP 2013, meeting of SIGDAT, a Special Interest Group of the ACL, pp.827-832, 2013.

P. Cichosz, Truncating temporal differences: On the efficient implementation of TD (lambda) for reinforcement learning. CoRR, cs.AI/9501103, 1995.

M. Coggan, Exploration and exploitation in reinforcement learning, Canada, 2004.

W. Cohen, P. Ravikumar, and S. Fienberg, A comparison of string metrics for matching names and records, Kdd workshop on data cleaning and object consolidation, pp.73-78, 2003.

W. W. Cohen, B. B. Dalvi, and B. D. Cohen, Very Fast Similarity Queries on Semi-Structured Data from the Web, SDM, pp.512-520, 2013.

J. Cowie and W. Lehnert, Information extraction, Communications of the ACM, vol.39, issue.1, pp.80-91, 1996.
DOI : 10.1145/234173.234209

A. Culotta, T. Kristjansson, A. Mccallum, and P. Viola, Corrective feedback and persistent learning for information extraction, Artificial Intelligence, vol.170, issue.14-15, pp.14-151101, 2006.
DOI : 10.1016/j.artint.2006.08.001

URL : http://doi.org/10.1016/j.artint.2006.08.001

H. Cunningham, D. Maynard, K. Bontcheva, V. Tablan, N. Aswani et al., Developing Language Processing Components with GATE Version 8 (a User Guide), 2014.

F. A. Dahl and O. M. Halck, Learning While Exploring: Bridging the Gaps in the Eligibility Traces.

T. Dean and R. Givan, Model minimization in Markov decision processes, AAAI/IAAI, pp.73-84, 1997.

T. Degris, O. Sigaud, and P. Wuillemin, Learning the structure of Factored Markov Decision Processes in reinforcement learning problems, Proceedings of the 23rd international conference on Machine learning , ICML '06, pp.257-264, 2006.
DOI : 10.1145/1143844.1143877

URL : https://hal.archives-ouvertes.fr/hal-01336925

S. Dini and M. Serrano, Combining q-learning with artificial neural networks in an adaptive light seeking robot, 2012.

G. Doddington, A. Mitchell, M. Przybocki, L. Ramshaw, S. Strassel et al., The automatic content extraction (ACE) programtasks , data, and evaluation, 2004.

F. Doshi-Velez and Z. Ghahramani, A comparison of human and agent reinforcement learning in partially observable domains, CogSci, 2011.

J. Doucy, H. Abdulrab, P. Giroux, and J. Kotowicz, Méthodologie pour l'orchestration sémantique de services dans le domaine de la fouille de documents multimédia, Proceedings of MajecSTIC, p.12, 2008.

J. Dutkiewicz, C. Jędrzejek, J. Cybulka, and M. Falkowski, Knowledge-based highly-specialized terrorist event extraction, RuleML 2013 Challenge, Human Language Technology and Doctoral Consortium, pp.1-11, 2013.

A. Epshteyn and G. Dejong, Qualitative reinforcement learning, Proceedings of the 23rd international conference on Machine learning , ICML '06, pp.305-312, 2006.
DOI : 10.1145/1143844.1143883

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.73.7503

P. C. Fishburn, SSB Utility theory: an economic perspective, Mathematical Social Sciences, vol.8, issue.1, pp.63-94, 1984.
DOI : 10.1016/0165-4896(84)90061-1

L. Formiga, A. Barrón-Cedeño, L. Màrquez, C. A. Henríquez, and J. B. Mariño, Leveraging online user feedback to improve statistical machine translation, Journal of Artificial Intelligence Research, vol.54, pp.159-192, 2015.

K. Fort, Les ressources annotées, un enjeu pour l'analyse de contenu: vers une méthodologie de l'annotation manuelle de corpus, 2012.

M. P. Fromherz, D. G. Bobrow, D. Kleer, and J. , Model-based computing for design and control of reconfigurable systems, AI magazine, vol.24, issue.203, pp.120-132, 2003.

J. Fürnkranz and E. Hüllermeier, Preference Learning, 2011.
DOI : 10.1007/978-1-4899-7502-7_667-1

J. Fürnkranz, E. Hüllermeier, C. Rudin, R. Slowinski, and S. Sanner, Preference learning (Dagstuhl Seminar 14101), Dagstuhl Reports, pp.1-27, 2014.

M. Gardner, P. Talukdar, J. Krishnamurthy, and T. Mitchell, Incorporating Vector Space Similarity in Random Walk Inference over Knowledge Bases, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
DOI : 10.3115/v1/D14-1044

H. Gilbert, O. Spanjaard, P. Viappiani, and P. Weng, Reducing the Number of Queries in Interactive Value Iteration, pp.139-152, 2015.
DOI : 10.1007/978-3-319-23114-3_9

URL : https://hal.archives-ouvertes.fr/hal-01213280

H. Gilbert, O. Spanjaard, P. Viappiani, and P. Weng, Solving MDPs with Skew Symmetric Bilinear Utility Functions, 24th International Joint Conference on Artificial Intelligence (IJCAI-15), pp.1989-1995, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01212802

The Ginstrom Blog, Fuzzy substring matching with Levenshtein distance in Python, http://ginstrom.com/scribbles ... fuzzy-substring-matching-with-levenshtein-distance-in-python, 2007.

J. Guan and G. Qiu, Modeling User Feedback Using a Hierarchical Graphical Model for Interactive Image Retrieval, Advances in Multimedia Information Processing-PCM 2007, pp.18-29, 2007.
DOI : 10.1007/978-3-540-77255-2_3

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.154.4159

J. R. Hobbs and E. Riloff, Information extraction, Handbook of Natural Language Processing, 2010.

J. Hoey, R. St-aubin, A. Hu, and C. Boutilier, SPUDD: Stochastic planning using decision diagrams, Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence, pp.279-288, 1999.

F. Hogenboom, F. Frasincar, U. Kaymak, and F. de Jong, An Overview of Event Extraction from Text, Workshop on Detection, Representation, and Exploitation of Events in the Semantic Web, Tenth International Semantic Web Conference, 2011.

H. Ji, Challenges from information extraction to information fusion, Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp.507-515, 2010.

H. Ji and R. Grishman, Refining Event Extraction through Cross-Document Inference, ACL, pp.254-262, 2008.

P. H. Kanani, Resource-bounded Information Acquisition and Learning, p.13, 2012.

A. B. Karami, K. Sehaba, and B. Encelle, Apprentissage de connaissances d'adaptation à partir des feedbacks des utilisateurs, 25es Journées francophones d'Ingénierie des Connaissances, pp.125-136, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01015966

W. B. Knox and P. Stone, Framing reinforcement learning from human reward: Reward positivity, temporal discounting, episodicity, and performance, Artificial Intelligence, vol.225, issue.203, pp.24-50, 2015.
DOI : 10.1016/j.artint.2015.03.009

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.692.9231

G. Kumaran and J. Allan, Text classification and named entities for new event detection, Proceedings of the 27th annual international conference on Research and development in information retrieval , SIGIR '04, pp.297-304, 2004.
DOI : 10.1145/1008992.1009044

URL : http://ciir.cs.umass.edu/pubfiles/ir-340.pdf

G. Kumaran and J. Allan, Using names and topics for new event detection, Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing , HLT '05, pp.121-128, 2005.
DOI : 10.3115/1220575.1220591

URL : http://acl.ldc.upenn.edu/H/H05/H05-1016.pdf

A. Kurakin, I. J. Goodfellow, and S. Bengio, Adversarial examples in the physical world, 2016.

G. LaFree, The Global Terrorism Database: Accomplishments and Challenges, Perspectives on Terrorism, vol.4, p.59, 2010.

N. Lao and W. W. Cohen, Relational retrieval using a combination of path-constrained random walks, Machine Learning, vol.81, issue.1, pp.53-67, 2010.
DOI : 10.1007/s10994-010-5205-8

N. Lao, A. Subramanya, F. Pereira, and W. W. Cohen, Reading the web with learned syntactic-semantic inference rules, Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp.1017-1026, 2012.

A. Lee, M. Passantino, H. Ji, G. Qi, and T. Huang, Enhancing multi-lingual information extraction via cross-media inference and fusion, Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pp.630-638, 2010.

G. Lejeune, Veille épidémiologique multilingue: une approche parcimonieuse au grain caractère fondée sur le genre textuel, 2013.

V. I. Levenshtein, Binary codes capable of correcting deletions, insertions and reversals, Soviet Physics Doklady, vol.10, issue.8, pp.707-710, 1965.

L. Lin, Programming robots using reinforcement learning and teaching, AAAI, pp.781-786, 1991.

R. Loftin, B. Peng, J. MacGlashan, M. L. Littman, M. E. Taylor et al., Learning behaviors via human-delivered discrete feedback: modeling implicit feedback strategies to speed up learning, Autonomous Agents and Multi-Agent Systems, vol.4, issue.4, pp.30-59, 2015.
DOI : 10.1007/s10458-015-9283-7

L. Jean-Louis, Approches supervisées et faiblement supervisées pour l'extraction d'événements complexes et le peuplement de bases de connaissances, 2011.

J. Ma and W. B. Powell, A convergent recursive least squares approximate policy iteration algorithm for multi-dimensional Markov decision process with continuous state and action spaces, 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp.66-73, 2009.
DOI : 10.1109/ADPRL.2009.4927527

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.4938

Z. Ma and M. Kwiatkowska, Modelling with PRISM of intelligent system, PhD thesis, 2008.

G. Matthew, Using technology recycling to develop a named entity recogniser for Afrikaans, Southern African Linguistics and Applied Language Studies, pp.199-216, 2015.
DOI : 10.2989/16073614.2015.1061893

V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou et al., Playing Atari with deep reinforcement learning, p.40, 2013.

V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness et al., Human-level control through deep reinforcement learning, Nature, vol.518, issue.7540, pp.529-533, 2015.
DOI : 10.1038/nature14236

A. W. Moore and C. G. Atkeson, Prioritized sweeping: Reinforcement learning with less data and less time, Machine Learning, pp.103-130, 1993.
DOI : 10.1007/BF00993104

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.134.8196

E. Moreau, F. Yvon, and O. Cappé, Robust similarity measures for named entities matching, Proceedings of the 22nd International Conference on Computational Linguistics, COLING '08, pp.593-600, 2008.
DOI : 10.3115/1599081.1599156

URL : https://hal.archives-ouvertes.fr/hal-00487084

A. Y. Ng and S. J. Russell, Algorithms for inverse reinforcement learning, Proceedings of the Seventeenth International Conference on Machine Learning, ICML '00, pp.663-670, 2000.

A. M. Nguyen, J. Yosinski, and J. Clune, Deep neural networks are easily fooled: High confidence predictions for unrecognizable images, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
DOI : 10.1109/CVPR.2015.7298640

URL : http://arxiv.org/abs/1412.1897

E. Nicart, B. Zanuttini, B. Grilhères, and F. Praca, Dora Q-learning: making better use of explorations, Proc. 11es Journées Francophones sur la Planification, la Décision et l'Apprentissage pour la conduite de systèmes, p.203, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01356078

H. M. Nydal, Deep Q-learning for stock trading, 2016.

M. Ogrodniczuk and A. Przepiórkowski, Linguistic Processing Chains as Web Services: Initial Linguistic Considerations, Proceedings of the Workshop on Web Services and Processing Pipelines in HLT: Tool Evaluation, LR Production and Validation (WSPP 2010) at the Language Resources and Evaluation Conference, pp.1-7, 2010.

S. C. Ong, S. W. Png, D. Hsu, and W. S. Lee, Planning under Uncertainty for Robotic Tasks with Mixed Observability, The International Journal of Robotics Research, vol.21, issue.3, pp.1053-1068, 2010.
DOI : 10.1177/0278364910369861

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.186.6365

G. Orwell, Tandem Library, centennial edition, 1950.

S. Pandit and S. Gupta, A Comparative Study on Distance Measuring Approaches for Clustering, International Journal of Research in Computer Science, vol.2, issue.1, pp.29-31, 2011.
DOI : 10.7815/ijorcs.21.2011.011

URL : http://doi.org/10.7815/ijorcs.21.2011.011

W. Pang and G. M. Coghill, Learning Qualitative Differential Equation models: a survey of algorithms and applications, The Knowledge Engineering Review, vol.32, issue.01, pp.69-107, 2010.
DOI : 10.1023/A:1007317323969

J. Peng and R. J. Williams, Efficient learning and planning within the Dyna framework, IEEE International Conference on Neural Networks, pp.437-454, 1993.
DOI : 10.1109/ICNN.1993.298551

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.55.6194

J. Peng and R. J. Williams, Incremental multi-step q-learning, Machine Learning, pp.283-290, 1996.
DOI : 10.1016/b978-1-55860-335-6.50035-0

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.7356

M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, p.34, 1994.
DOI : 10.1002/9780470316887

K. Rao and S. Whiteson, V-MAX: A General Polynomial Time Algorithm for Probably Approximately Correct Reinforcement Learning, p.96, 2011.

A. Reyes, P. H. Ibargüengoytia, L. E. Sucar, and E. F. Morales, Abstraction and refinement for solving continuous Markov decision processes, Probabilistic Graphical Models, pp.263-270, 2006.

E. Riloff, Automatically generating extraction patterns from untagged text, Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp.1044-1049, 1996.

F. Rodrigues, N. Oliveira, and L. Barbosa, Towards an engine for coordination-based architectural reconfigurations, Computer Science and Information Systems, vol.12, issue.2, pp.607-634, 2015.
DOI : 10.2298/CSIS140912019R

T. L. Saaty, Relative measurement and its generalization in decision making: why pairwise comparisons are central in mathematics for the measurement of intangible factors: the analytic hierarchy/network process, RACSAM-Revista de la Real Academia de Ciencias Exactas, Fisicas y Naturales, Serie A, Matematicas, pp.251-318, 2008.
DOI : 10.1007/BF03191825

R. Sabbadin, A possibilistic model for qualitative sequential decision problems under uncertainty in partially observable environments, Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, UAI'99, pp.567-574, 1999.

S. Adam, L. Buşoniu, and R. Babuška, Experience replay for real-time reinforcement learning control, IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, vol.42, issue.2, pp.201-212, 2012.

A. Saval, Temporal, spatial and semantic patterns for the discovery of relationships between events, PhD thesis, 2011.
URL : https://hal.archives-ouvertes.fr/tel-01140316

T. Schaul, J. Quan, I. Antonoglou, and D. Silver, Prioritized experience replay. CoRR, abs/1511.05952, p.135, 2015.

L. Serrano, Vers une capitalisation des connaissances orientée utilisateur: extraction et structuration automatiques de l'information issue de sources ouvertes, pp.58-205, 2014.

L. Serrano, M. Bouzid, T. Charnois, S. Brunessaux, and B. Grilheres, Events Extraction and Aggregation for Open Source Intelligence: From Text to Knowledge, 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, pp.518-523, 2013.
DOI : 10.1109/ICTAI.2013.83

L. Serrano, T. Charnois, S. Brunessaux, B. Grilheres, and M. Bouzid, Combinaison d'approches pour l'extraction automatique d'événements, 19e conférence sur le Traitement Automatique des Langues Naturelles, pp.423-430, 2012.

C. R. Shelton, Balancing multiple sources of reward in reinforcement learning, Neural Information Processing Systems-2000, pp.1082-1088, 2000.

H. Shteingart and Y. Loewenstein, Reinforcement learning and human behavior, Current Opinion in Neurobiology, vol.25, pp.93-98, 2014.
DOI : 10.1016/j.conb.2013.12.004

O. Sigaud and O. Buffet, Markov Decision Processes in Artificial Intelligence, 2010.
DOI : 10.1002/9781118557426

URL : https://hal.archives-ouvertes.fr/inria-00432735

S. P. Singh, Transfer of learning by composing solutions of elemental sequential tasks, Machine Learning, pp.323-339, 1992.

M. Snover, B. Dorr, R. Schwartz, L. Micciulla, and J. Makhoul, A study of translation edit rate with targeted human annotation, Proceedings of Association for Machine Translation in the Americas, pp.223-231, 2006.

M. M. So and L. C. Thomas, Modelling the profitability of credit cards by Markov decision processes, European Journal of Operational Research, vol.212, issue.1, pp.123-130, 2011.
DOI : 10.1016/j.ejor.2011.01.023

R. Southey, The story of the three bears. The Doctor, 1837.

R. D. Steele, The importance of open source intelligence to the military, International Journal of Intelligence and CounterIntelligence, vol.8, issue.4, pp.457-470, 1995.
DOI : 10.1080/08850609508435298

R. Steinberger, B. Pouliquen, and E. van der Goot, An introduction to the Europe Media Monitor family of applications, CoRR, abs/1309, 2013.

I. Stewart, Concepts of Modern Mathematics, 1995.

A. L. Strehl, C. Diuk, and M. L. Littman, Efficient structure learning in factored-state MDPs, AAAI, pp.645-650, 2007.

A. L. Strehl, L. Li, E. Wiewiora, J. Langford, and M. L. Littman, PAC model-free reinforcement learning, Proceedings of the 23rd international conference on Machine learning , ICML '06, pp.881-888, 2006.
DOI : 10.1145/1143844.1143955

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.120.326

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, pp.136-156, 1998.
DOI : 10.1109/TNN.1998.712192

C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan et al., Intriguing properties of neural networks, 2013.

C. Szepesvári, Algorithms for Reinforcement Learning, Synthesis Lectures on Artificial Intelligence and Machine Learning, vol.4, issue.1, pp.1-103, 2010.
DOI : 10.2200/S00268ED1V01Y201005AIM009

T. Matiisen, Guest post: Demystifying deep reinforcement learning, Nervana. https://www.nervanasys.com/demystifying-deep-reinforcement-learning, 2016.

M. E. Taylor and P. Stone, Transfer learning for reinforcement learning domains: A survey, The Journal of Machine Learning Research, vol.10, pp.1633-1685, 2009.

F. Tramèr, F. Zhang, A. Juels, M. K. Reiter, and T. Ristenpart, Stealing machine learning models via prediction APIs, 2016.

M. Trampus and D. Mladenic, Constructing Domain Templates with Concept Hierarchy as Background Knowledge, Information Technology And Control, vol.43, issue.4, pp.414-432, 2014.
DOI : 10.5755/j01.itc.43.4.6899

L. Travé-Massuyès, L. Ironi, and P. Dague, Mathematical foundations of qualitative reasoning, AI Magazine, p.91, 2003.

A. Tversky and I. Gati, Studies of similarity, Cognition and categorization, vol.1, issue.217, pp.79-98, 1978.

W. R. van Hage, V. Malaisé, R. Segers, L. Hollink, and G. Schreiber, Design and use of the Simple Event Model (SEM), Web Semantics: Science, Services and Agents on the World Wide Web, pp.128-136, 2011.

H. van Hasselt, Double Q-learning, NIPS, pp.2613-2621, 2010.

H. van Hasselt, A. Guez, and D. Silver, Deep reinforcement learning with double Q-learning, CoRR, abs/1509, 2015.

K. Veeramachaneni, I. Arnaldo, A. Cuesta-infante, V. Korrapati, C. Bassia et al., AI2: Training a big data machine to defend, 2016 IEEE 2nd International Conference on Big Data Security on Cloud IEEE International Conference on High Performance and Smart Computing (HPSC), and IEEE International Conference on Intelligent Data and Security (IDS), pp.49-54, 2016.
DOI : 10.1109/bigdatasecurity-hpsc-ids.2016.79

J. Wang, G. Li, J. X. Yu, and J. Feng, Entity matching, Proceedings of the VLDB Endowment, pp.622-633, 2011.
DOI : 10.14778/2021017.2021020

W. Wang, C. Xiao, X. Lin, and C. Zhang, Efficient approximate entity extraction with edit distance constraints, Proceedings of the 35th SIGMOD international conference on Management of data, SIGMOD '09, pp.759-770, 2009.
DOI : 10.1145/1559845.1559925

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.865

Y. Wang, T. S. Li, and C. Lin, Backward Q-learning: The combination of Sarsa algorithm and Q-learning, Engineering Applications of Artificial Intelligence, vol.26, issue.9, pp.2184-2193, 2013.
DOI : 10.1016/j.engappai.2013.06.016

C. J. Watkins, Learning From Delayed Rewards, p.135, 1989.

P. Weng, Markov Decision Processes with Ordinal Rewards: Reference Point-Based Preferences, ICAPS, 2011.
URL : https://hal.archives-ouvertes.fr/hal-01285812

P. Weng, R. Busa-Fekete, and E. Hüllermeier, Interactive Q-Learning with Ordinal Rewards and Unreliable Tutor, ECML/PKDD Workshop on Reinforcement Learning with Generalized Feedback, 2013.

P. Weng and B. Zanuttini, Interactive value iteration for Markov decision processes with unknown rewards, Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pp.2415-2421, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00942290

M. Wiering and J. Schmidhuber, Speeding up Q(λ)-learning, Machine Learning: ECML-98, pp.352-363, 1998.
DOI : 10.1007/BFb0026706

A. Wilson, A. Fern, and P. Tadepalli, A bayesian approach for policy learning from trajectory preference queries, Advances in neural information processing systems, pp.1133-1141, 2012.

C. Wirth and J. Fürnkranz, EPMC: Every visit preference Monte Carlo for reinforcement learning, Asian Conference on Machine Learning, ACML 2013, pp.483-497, 2013.

C. Wirth and J. Fürnkranz, Preference-based reinforcement learning: A preliminary survey, Proceedings of the ECML/PKDD-13 Workshop on Reinforcement Learning from Generalized Feedback: Beyond Numeric Rewards, 2013.

C. Wirth, J. Fürnkranz, and G. Neumann, Model-free preference-based reinforcement learning, Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI-16), pp.2222-2228, 2016.