Knowledge database management systems (KDBMS) have become more prevalent over time because of the flexibility and scalability they provide. Traditionally, RDBMS relied on the native support of the application languages used by programmers and data analysts. However, this was not fully adequate: some features were difficult to implement directly in a general-purpose programming language (e.g., vectorization), and developers had little control over the implementation of advanced algorithms (e.g., multi-dimensional spatial indices).
- Extracting knowledge from unstructured text is complicated by the fact that current methods cannot understand context well enough for general reasoning tasks, such as answering questions about the contents of a picture or summarizing a news article into a short description. There are several approaches to understanding at different levels of complexity. One is to process text by analyzing the words and how they combine, reusing existing semantic data from knowledge bases such as Freebase. Another is to learn about objects described in text by finding visual patterns in large collections of images with textual annotations. Such inference can be performed even when the algorithms cannot fully understand the phrases within a sentence; extracting the relevant information from a sentence into a knowledge base remains the challenging step.
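The "sentence into a knowledge base" step above can be sketched in a deliberately minimal way. The pattern and relation name below are illustrative, not taken from any system in the text: a surface pattern pulls "X is a Y" statements out of raw text as triples.

```python
import re

# Toy surface pattern for "X is a/an Y" statements; real extractors
# use far richer parsing, but the output shape (triples) is the same.
IS_A = re.compile(r"\b([A-Z][a-z]+) is an? ([a-z]+)\b")

def extract_triples(text):
    """Return (subject, relation, object) triples found by the pattern."""
    return [(subj, "isa", obj) for subj, obj in IS_A.findall(text)]

facts = extract_triples("Freebase is a database. Paris is a city. It rained.")
# facts now holds the two isa statements; "It rained." yields nothing.
```

The point of the sketch is that even without understanding the full sentence, a narrow pattern can populate a knowledge base, which is exactly why this style of inference works despite shallow phrase understanding.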
- Understanding human language is a difficult task that requires mapping natural language expressions into formal representations, identifying entities and their interrelationships, and reasoning over those entities using larger bodies of information available on the Web. Formalizing natural language has been an important topic since antiquity, but at present there are many competing theories of semantics because of its subtlety.
- Perspectives on knowledge representation for semantics fall broadly into three types: taxonomies, ontologies, and event models. A taxonomy is a hierarchical organization of the terms associated with a concept. An ontology can be viewed as an explicit specification of a conceptualization in terms of concepts and the relations among them, while an event model represents events occurring at specific instants in time. There are several ways to map natural language expressions to logical formalisms (e.g., entailment rules), following two broad approaches: bottom-up versus top-down knowledge acquisition. The task requires processing text with machine learning techniques that identify entities and the relationships between them, yielding meaningful semantic representations; non-functional requirements must also be met, for example, finding an appropriate balance between recall and precision in a semantic analysis system.
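The three representation types named above can be made concrete with a small sketch. All the concept names and structures here are invented for illustration:

```python
# Taxonomy: purely hierarchical is-a organization of terms.
taxonomy = {"animal": ["mammal", "bird"], "mammal": ["dog", "cat"]}

# Ontology: concepts plus typed relations between them.
ontology = [("dog", "is_a", "mammal"), ("dog", "chases", "cat")]

# Event model: events anchored at specific time instants.
events = [{"event": "chase", "agent": "dog", "patient": "cat", "time": "t1"}]

def subconcepts(tax, concept):
    """All descendants of a concept in the taxonomy (depth-first)."""
    out = []
    for child in tax.get(concept, []):
        out.append(child)
        out.extend(subconcepts(tax, child))
    return out
```

The contrast is visible in the data shapes: the taxonomy carries only hierarchy, the ontology adds arbitrary typed relations, and the event model adds a time dimension.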
- The most common way to build ontologies is with hand-crafted rules that define hierarchies of concepts or relations at different levels of abstraction, but this process is time-consuming. Alternatives include frame-based approaches, in which objects are defined in terms of their properties (i.e., slots) and how they interact with each other (e.g., F-Logic). Frame-based ontologies can be represented as networks whose nodes stand for classes, properties, roles, individuals, numbers, strings, and so on, and whose edges relate entities at different levels of specificity (i.e., sub-concepts vs. super-concepts). Such networks can also represent event models and taxonomies.
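A hedged sketch of the frame idea, with made-up frames: each frame has slots and a parent link, and slot values are inherited along the sub-/super-concept edges.

```python
# Frames: slots hold local property values; "parent" is the
# super-concept edge in the network described above.
frames = {
    "vehicle": {"slots": {"wheels": None}, "parent": None},
    "car":     {"slots": {"wheels": 4},   "parent": "vehicle"},
    "sedan":   {"slots": {"doors": 4},    "parent": "car"},
}

def inherit(frames, name, slot):
    """Look a slot up on the frame, walking up super-concept edges."""
    while name is not None:
        value = frames[name]["slots"].get(slot)
        if value is not None:
            return value
        name = frames[name]["parent"]
    return None
```

Inheritance along the parent edges is what makes the network form of a frame-based ontology more than a flat record store: `sedan` answers queries about `wheels` via `car`.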
- Ontologies and semantic annotations are often used to improve search, but they can also support knowledge navigation, letting users explore large amounts of data through shortcuts or alternative paths such as a semantic tree structure (e.g., OpenCyc). Automatically extracting such information from text is generally done with machine learning techniques, including graph-based, probabilistic, and syntactic methods, that identify entities and their relationships, where higher-level abstractions can indicate relations between lower-level concepts (e.g., an entity type).
- Automatic language processing has many applications, such as question answering, where answers are verified by retrieving candidate passages from text corpora; this task requires separating relevant from irrelevant information in order to answer a user's query. One way to do this is to build an overview of passage content and look for patterns indicating entities, their relationships, and the degree of association, using knowledge graphs with semantic annotations.
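The relevant-vs-irrelevant filtering step above can be sketched crudely. The corpus and scoring function are illustrative, not from any real QA system: passages are ranked by how many query terms they share.

```python
def rank_passages(query, passages):
    """Rank candidate passages by query-term overlap; drop zero-overlap ones."""
    q = set(query.lower().split())
    scored = [(len(q & set(p.lower().split())), p) for p in passages]
    scored.sort(key=lambda sp: -sp[0])
    return [p for score, p in scored if score > 0]

passages = [
    "the capital of France is Paris",
    "cats sleep most of the day",
]
best = rank_passages("what is the capital of France", passages)
```

Real systems replace raw term overlap with the knowledge-graph and annotation signals the text describes, but the pipeline shape, retrieve candidates, score, keep the relevant ones, is the same.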
- Another approach is to identify key phrases within sentences through unsupervised feature learning, such as clustering-style algorithms for natural language processing (e.g., Latent Dirichlet Allocation), or to use word-embedding models, where a context vector represents a document based on the frequencies of the words it contains; similarity can then be measured with cosine distance after projecting each word into a high-dimensional space.
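The context-vector idea with cosine similarity can be shown concretely. This is a minimal bag-of-words sketch, not a learned embedding: each document becomes a term-frequency vector, and similarity is the cosine of the angle between the vectors.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two documents as term-frequency vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0
```

Identical documents score 1.0 and documents with no shared words score 0.0; learned embeddings improve on this by also scoring related-but-different words as similar.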
- Semantic reasoning systems extract knowledge from text by performing inferences over the semantic relations between entities using symbolic representations. For example, natural language inference systems can be trained on annotated data that helps identify patterns for matching documents and sentences, but they can also use statistical methods that induce (i.e., learn) rules from training corpora to map input semantic graphs onto target graphs (i.e., a knowledge base).
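Symbolic inference over such relations can be sketched as forward chaining. The rule here (transitivity of a hypothetical `is_a` relation) stands in for whatever rules a real system induces from corpora:

```python
def infer(triples):
    """Apply is_a transitivity to a triple store until fixpoint."""
    kb = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(kb):
            for (c, r2, d) in list(kb):
                if r1 == r2 == "is_a" and b == c and (a, "is_a", d) not in kb:
                    kb.add((a, "is_a", d))
                    changed = True
    return kb

kb = infer([("dog", "is_a", "mammal"), ("mammal", "is_a", "animal")])
```

Running to fixpoint is what lets the system derive facts never stated explicitly in the text, here, that a dog is an animal.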
- These systems try to mimic human cognitive processes by building models of how humans understand texts, but they can also benefit from more implicit learning techniques such as reinforcement learning or active learning, in which (semi-)supervised methods provide feedback on the quality of candidate annotations to improve model accuracy.
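The active-learning feedback loop mentioned above can be sketched with uncertainty sampling. The candidate annotations and probabilities are invented for illustration: the model routes the annotations it is least sure about to a human for feedback.

```python
def most_uncertain(scored, k=1):
    """scored: (item, probability) pairs; closest to 0.5 = most uncertain.

    Returns the k items whose predicted probability is nearest 0.5,
    i.e., the candidate annotations worth sending for human review.
    """
    return [item for item, p in
            sorted(scored, key=lambda ip: abs(ip[1] - 0.5))[:k]]

queries = most_uncertain([("a", 0.95), ("b", 0.52), ("c", 0.10)], k=1)
```

Labels obtained for these uncertain cases are then fed back into training, which is the feedback-on-candidate-annotations loop the text describes.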
The study of semantic reasoning, knowledge representation, and ontologies that incorporate linguistic information for natural language processing (NLP) offers valuable approaches for improving the understanding of texts.