“artificial intelligence, state of the art”
Written by Claudio G. Giancaterino
The 16th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2017) took place in Bari, Italy, from 14 to 17 November 2017. It was hosted by the Department of Computer Science at the University of Bari and organized by the Italian Association for Artificial Intelligence (AIIA, Associazione Italiana per l’Intelligenza Artificiale), with Carla P. Gomes and Peter W.J. Staar as keynote speakers.
Many topics were covered, from Natural Language Processing to Facial Expression Recognition, and from Chatbots to Semantic Data Analytics.
Healthcare is a field that data science has radically changed, giving rise to digital healthcare solutions. The conference included an interesting workshop, “Chatbots meet eHealth: Automatizing healthcare”, which presented HOLMeS (Health On-Line Medical Suggestions), a novel eHealth recommendation system that leverages a chatbot acting as a human physician.
It is made up of several modules:
-HOLMeS Application: developed in Python, it interacts with the user through the chatbot and interprets patient requests by means of the Watson Conversation API.
-HOLMeS Chat-Bot: based on deep learning, it is designed to interact with the patient.
-IBM Watson: the service used to carry on a written conversation example-by-example, simulating human interactions.
-Computational Cluster: implements the decision-making logic. It uses an Apache Spark cluster running on the Databricks infrastructure.
HOLMeS relies on a dataset of clinical records collected from 16,733 patients, covering 13 different illnesses, who underwent disease prevention pathways, together with the related evaluations.
HOLMeS is able to handle four possible use-case scenarios:
1) Providing general information about itself or the affiliated medical centre.
2) Collecting general patient information with the aim of providing general prevention pathways.
3) Collecting detailed patient information in order to evaluate the suitability of specific prevention pathways.
4) Booking a medical check-up with the affiliated medical centre.
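To make the four scenarios more concrete, here is a minimal Python sketch of how a recognized intent could be routed to the matching handler. The intent names, handler functions and reply texts are all hypothetical: the article does not describe how HOLMeS dispatches requests internally, and the real system obtains the intent from the Watson Conversation API rather than from a plain string.

```python
from typing import Callable, Dict

# Hypothetical handlers, one per use-case scenario listed above.
def general_info(patient: dict) -> str:
    return "Here is some information about the service and the medical centre."

def general_prevention(patient: dict) -> str:
    return f"General prevention pathways for age {patient.get('age', 'unknown')}."

def detailed_evaluation(patient: dict) -> str:
    return "Detailed data received; evaluating specific prevention pathways."

def book_checkup(patient: dict) -> str:
    return "Booking request forwarded to the medical centre."

# Hypothetical intent labels (assumed, not taken from the HOLMeS paper).
HANDLERS: Dict[str, Callable[[dict], str]] = {
    "info": general_info,
    "general_prevention": general_prevention,
    "detailed_evaluation": detailed_evaluation,
    "booking": book_checkup,
}

def route(intent: str, patient: dict) -> str:
    """Dispatch a recognized intent to its handler, with a fallback reply."""
    handler = HANDLERS.get(intent)
    if handler is None:
        return "Sorry, I did not understand the request."
    return handler(patient)

if __name__ == "__main__":
    print(route("general_prevention", {"age": 52}))
    print(route("booking", {}))
```

A lookup table keeps each scenario isolated, so a new use case only needs a new handler and one more entry in the table.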
Understanding natural language is not enough to run a chatbot: a machine learning algorithm is also needed so that HOLMeS can give the right suggestions, extrapolated from the stored dataset records. Spark is used as the cluster-computing framework, deployed on the Databricks infrastructure, and the Spark ML library provides all the data-preparation functionality and the machine learning algorithms needed to train the Random Forest models on the collected training data. The results? An Area Under the ROC Curve (AUC) of 74.65% when first-level features are used to assess the occurrence of different prevention pathways. When other features are added, HOLMeS reaches an AUC of 86.78%.
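For readers unfamiliar with the metric, the sketch below shows one way to compute AUC in plain Python. It is not the Spark code used by HOLMeS. It uses the fact that AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The labels and scores are made-up toy values, not data from the paper.

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

if __name__ == "__main__":
    # Toy example: 1 = pathway recommended, 0 = not recommended.
    toy_labels = [1, 0, 1, 0, 1]
    toy_scores = [0.9, 0.4, 0.35, 0.2, 0.8]
    print(f"AUC = {roc_auc(toy_labels, toy_scores):.4f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, which helps put the two figures reported above in context.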
See the paper for details.
Another appealing workshop was “Applying Natural Language Processing to Speech Transcriptions for Automated Analysis of Educational Video Broadcasts”. It presented the results of work carried out by RAI within the project “La Città Educante”, which aims to implement new models of learning and teaching by exploring, developing and evaluating innovative technologies for knowledge extraction, management and sharing. NLP (natural language processing) plays a central role in building a framework for the automatic annotation of educational video broadcasts. The analysis moves from the video domain to the text domain by working on text automatically derived from the speech content of the broadcast material. The Apache OpenNLP library is used to solve two problems: document categorization and named entity recognition on spoken documents. For categorization, the Scientific Disciplinary Sectors (SSD) classification system, which is used in Italy for the organization of higher education, was adopted as the reference for subject classification. It is a hierarchical classification, and the OpenNLP Document Categorizer tool is used to automatically categorize input texts according to this taxonomy.
The taxonomy is implemented with the Simple Knowledge Organization System (SKOS), an area of work developing specifications and standards to support the representation of thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web.
The main SKOS elements are:
-Concept, a basic unit of information;
-Labels and Notations, a set of words or sentences;
-Relationships between concepts in different spaces of analysis.
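The three elements above can be sketched as a small Python data structure. This is only an illustration of the SKOS ideas (concept, labels and notation, broader relationship), not the representation used in the RAI project. The two entries are illustrative examples written for this sketch, and the notation "AREA-01" is a made-up identifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Concept:
    notation: str                                        # plays the role of skos:notation
    pref_label: str                                      # plays the role of skos:prefLabel
    alt_labels: List[str] = field(default_factory=list)  # plays the role of skos:altLabel
    broader: Optional[str] = None                        # notation of the parent (skos:broader)

def path_to_root(scheme: Dict[str, Concept], notation: str) -> List[str]:
    """Follow 'broader' links from a concept up to the top of the hierarchy."""
    path: List[str] = []
    seen = set()  # guards against accidental cycles in the data
    current = scheme.get(notation)
    while current is not None and current.notation not in seen:
        seen.add(current.notation)
        path.append(current.pref_label)
        current = scheme.get(current.broader) if current.broader else None
    return path

# Illustrative entries only; not taken from the project's taxonomy files.
scheme = {
    "AREA-01": Concept("AREA-01", "Mathematics and computer science"),
    "INF/01": Concept("INF/01", "Computer science",
                      alt_labels=["Informatics"], broader="AREA-01"),
}

if __name__ == "__main__":
    print(" > ".join(reversed(path_to_root(scheme, "INF/01"))))
```

Walking the broader links is what lets a category assigned at the leaf level be reported at any coarser level of the hierarchy.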
For named entity recognition (NER), a semi-supervised approach was adopted, a machine learning technique used when only a subset of a large dataset is annotated. The NER models were built in three steps. First, a set of web articles was selected from a larger set of news articles tagged with named-entity information by an automatic system already in use at RAI. The text was then cleansed by removing punctuation and capital letters, producing a definitive training dataset of around 47,000 sentences. In the second step, the TokenNameFinderTrainer tool from the Apache OpenNLP library was used to generate the new NER models. In the last step, the new models were run on a set of automatic speech transcriptions of RAI broadcasts and manually validated by assessing the entities found in the input material.
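The cleansing step described above (removing punctuation and capital letters) might look roughly like the following Python sketch. It is an assumption about the general idea, not the project's actual preprocessing code. The function name and the sample sentence are invented for illustration, and the handling of OpenNLP's entity annotation markup is deliberately left out.

```python
import unicodedata

def normalize(sentence: str) -> str:
    """Replace punctuation with spaces, lowercase, and collapse whitespace."""
    # Unicode categories starting with "P" cover ASCII punctuation as well as
    # typographic quotes and apostrophes that are common in Italian text.
    chars = [" " if unicodedata.category(ch).startswith("P") else ch
             for ch in sentence]
    return " ".join("".join(chars).lower().split())

if __name__ == "__main__":
    # Invented sample sentence, not from the RAI dataset.
    print(normalize("Il Presidente Mattarella, a Roma!"))
```

After this kind of normalization the training sentences contain only lowercase words separated by single spaces.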
See the paper for details.
Author: Claudio Giancaterino
Data science is the new gold
Actuary & Data science enthusiast