Rule-Embedded Neural Network for Strong-AI

Abstract: Based on the definition and previous studies of artificial intelligence (AI), this article analyzes the shortcomings of current technologies that are based on artificial neural networks (ANN) and deep learning. It argues that knowledge will be the key factor in progressing from ANN to Strong-AI. Next, this article analyzes the existing technologies for combining ANN with knowledge and puts forward the Rule-Embedded Neural Network (ReNN) approach. Finally, this article shows the effect of ReNN on target detection in time series and points to the corresponding research paper.

I began studying pattern recognition for QR codes and digital verification codes in early 2005. Since then, I have focused on machine learning and artificial intelligence (AI) and have learned a lot about this field. Through reflection and experiments over these years, I have recently gained more insight and made some breakthroughs, and I would like to share my thoughts in this article, with the aim of introducing some new methods for AI development. Your comments, suggestions and criticism would all be much appreciated.

To frame my thoughts, we can turn to the classical “three questions” of philosophy:

  • What’s AI?
  • Where is AI from?
  • Where is AI going?

To answer the first question thoroughly, we would need to draw on knowledge from many fields and disciplines. To keep it simple, we can define AI as a tool that is created by people and can be used to resolve uncertainties in people’s daily life and production. Here, we mainly focus on the latter two questions.

The second question: Where is AI from? The earliest clue can be traced back to ancient Greek mythology. Hephaestus, the god of fire and the forge on Olympus, was able to make robots as working assistants [1]. Of course, the true father of AI is Alan Mathison Turing (a great British scientist), who designed the Turing machine in 1936 and proposed the Turing Test in 1950 [2], which is still used today. Since Turing, AI has experienced ups and downs.

At the Dartmouth meeting in 1956, famous scientists including Minsky and Shannon thought, optimistically or even presumptuously, that AI would be fully realized by their generation and that “machines will be capable, within 20 years, of doing any work a man can do”. However, despite some great successes, it turned out that they had underestimated “the difficulty of replicating the feats of which the human brain is capable”. The first wave of AI development then entered its first “cold winter” (1974) [1].

With the development of computer technology, the expert systems developed in the 1980s made AI popular once again, causing the second wave of AI. However, expert systems are based on symbolic computing, and the complexity of symbolic computing grows exponentially with the number of symbols. As a result, implementing a practical expert system was far beyond what venture capital could fund, so the approach could not be iterated further.

The third wave arose from statistical machine learning around the 1990s, such as Vapnik's SVM and Li Kaifu's statistics-based speech recognition. From then on, AI began to create real value. At this stage, however, AI could only deal with relatively simple problems. Key steps such as feature extraction still needed a lot of manual work, and the results were often no more accurate than those of a middle school student. In view of this, people regained a rational view of AI.

In 2006, Hinton, known as the “Godfather of AI”, published his paper in Science [3], bringing about a new wave of AI based on ANN and deep learning technologies. In the following decade, AI technology reached new heights with deep learning techniques. In some areas, such as image recognition and speech recognition, it has reached human level, while in chess and card games it has even surpassed humans. Although some researchers have warned that the “bubble” in so-called AI has reached as high as 90%, industry and academia remain highly enthusiastic about AI.

This article is relevant to the third question: Where is AI going? Hinton concluded that AI failed to make the expected breakthrough in 2017. Perhaps what Hinton had expected was some kind of Strong-AI, or a clear path to Strong-AI. So, what is Strong-AI? In a sense, Strong-AI is deemed to be able to pass the Turing Test, which was proposed in 1950. Put simply, in the Turing Test a person and a machine are placed in two separate rooms. If human questioners cannot tell which is the human and which is the machine by asking a series of questions, the machine is considered to have passed the Turing Test [2].

If we make an analogy to Darwinian evolution, deep learning technology has probably evolved to the stage of animal intelligence. That is, given a specific input and output task (for example, identifying species from photographs), it can reach the best performance by using the powerful fitting and computing ability of neural networks, just like the eagle’s powerful vision or the dog’s strong senses of smell and hearing, which evolved from their nervous systems. In addition, deep learning technology can handle transfer learning tasks, adapting to and solving new problems (e.g. identification from photographs) by replicating the structure and parameters of an ANN. This replication can be regarded as the inheritance of the animal nervous system. Obviously, animal intelligence can only deal with limited and specific problems and cannot pass the Turing Test. At present, academia’s focus on ANN still lies in achieving more powerful animal intelligence through research on the mathematical optimization of ANN models.

Then we face the next question: how can we achieve the “evolution” from animal intelligence to human intelligence (Strong-AI)? It is not easy to achieve this through mathematical optimization in a short time, since it took millions of years in biological evolution.

Therefore, to design Strong-AI, we decided to refer directly to the key factors of human intelligence and introduce them, the most important of which should be “knowledge”: when our ancestors created symbols to record the laws of heaven and earth, sun and moon, they could be said to have truly evolved from animals into human beings. In short, knowledge is the set of laws concluded from our abstract recognition and definitions of the objective world. Yet current deep learning technology does not share abstract knowledge; it shares only the structure and parameters of ANN (corresponding to very specific and detailed cognition). We consider this the main obstacle to the realization of Strong-AI. Thus, the introduction of knowledge into ANN may be a key step forward from deep learning to Strong-AI.

There are two ways to combine ANN and knowledge:

  • Pre-processing: to make ANN-inferences with the features extracted from domain knowledge
  • Post-processing: to supplement or modify the ANN-inferences with domain knowledge

In fact, these two ways only add knowledge to ANN formally, while the ANN itself does not reach the semantic level of cognition. The first way goes back to traditional statistical machine learning and requires a lot of human work in feature engineering. The second way uses only the final outputs of the ANN and related knowledge to make an inference. It neglects the ANN’s cognition of local patterns, and thus its inferences seem somewhat stiff. In essence, knowledge is not really introduced into the ANN, and the inferences remain at the level of animal intelligence.

Different from the above two ways, we use an embedding method, i.e. the Rule-Embedded Neural Network (ReNN). Rules are used to represent domain knowledge in a computable manner. ReNN disassembles the “black box” of ANN into two parts: local-based inference and global-based inference. The local-based inference handles only local patterns, which can be easily learned from empirical datasets. It is analogous to the abstraction of raw data in human cognition, which aims to extract semantic elements from the original high-dimensional data. For example, in face recognition we often extract mouths and ears first. The global-based inference first models the semantic elements with a rule-modulating block, and then makes the final inference by synthesizing the rule-modulated map and the local patterns. The computational graph of ReNN is shown in Fig. 1.

Figure 1. Computational graph of ReNN

Hereinafter, we take R-peak detection in the electrocardiogram (ECG) as an example. The ECG, which has been used in medicine for more than 100 years, provides first-hand information for analyzing the structure and function of the heart. In recent years, ECG has become popular in smart wearable devices for monitoring the status of users. The R-peak of the ECG is a key time point in the rapid depolarization of the right and left ventricles. R-peaks are very useful when we need to estimate the exact time of each heartbeat, e.g. for calculating heart rate variability and pulse wave transit time. However, R-peak detection is not an easy task, since there is much noise from sources such as changes in skin-contact resistance, motion artifacts, muscular activation interference and AC interference. In particular, the noise becomes even more serious in signals collected from smart wearable devices when we try to optimize the physical design of the device to reduce users’ discomfort.
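To make concrete why accurate R-peak times matter, the following minimal sketch derives RR intervals, mean heart rate and two common heart rate variability measures (SDNN and RMSSD) from a list of R-peak times. The function names and the sample times are hypothetical and chosen only for illustration; they are not taken from the research paper.

```python
import numpy as np

def rr_intervals_ms(r_peak_times_s):
    """RR intervals in milliseconds from R-peak times given in seconds."""
    return np.diff(np.asarray(r_peak_times_s, dtype=float)) * 1000.0

def heart_rate_bpm(rr_ms):
    """Mean heart rate in beats per minute."""
    return 60000.0 / np.mean(rr_ms)

def sdnn(rr_ms):
    """Sample standard deviation of the RR intervals (ms)."""
    return np.std(rr_ms, ddof=1)

def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    return np.sqrt(np.mean(np.diff(rr_ms) ** 2))

# Hypothetical R-peak times (seconds) for five consecutive heartbeats.
r_peak_times = [0.00, 0.80, 1.62, 2.40, 3.22]
rr = rr_intervals_ms(r_peak_times)

print("RR intervals (ms):", np.round(rr, 1))
print("Heart rate (bpm):", round(heart_rate_bpm(rr), 1))
print("SDNN (ms):", round(sdnn(rr), 1))
print("RMSSD (ms):", round(rmssd(rr), 1))
```

Because every measure is built from differences between neighbouring R-peak times, a single missed or falsely detected peak distorts the RR intervals on both sides of it.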

Fig. 2 shows ECG samples acquired by smart wearable devices. The upper curve shows an ECG with less noise, while the lower curve contains severe noise.

Figure 2. ECG samples recorded by smart wearable devices

For the ECG sample shown in the lower curve, the voltage range is about 0.2 mV, while a standard ECG is generally 5 times higher. Even cardiologists and technicians cannot reliably distinguish R-peaks from noise in a short time (they generally refuse to analyze such noisy signals). As far as we know, traditional computer algorithms are unable to detect all the R-peaks accurately. Nevertheless, human experts can distinguish the R-peaks from noise with knowledge about the owner of the ECG. This ECG is from a 25-year-old woman without heart disease, and it was recorded at rest. Therefore, we can judge that the R-peaks are quasi-periodic and that there are no premature beats or other arrhythmias in the ECG, and we can label the real R-peaks accurately.

Fig. 3 illustrates the results of ReNN detection on the noisy ECG sample above. The top line is the input ECG time series. The other three lines, from top to bottom, are the outputs of the feature-mapping block (line-F), the rule-modulating block (line-R) and the global-mapping block (line-O), respectively. The four lines are aligned on the time axis. The solid circles anchored on the lines mark the time points labeled as R-peaks. The downward triangle marks a false positive (FP) from the local-based inference (high value on line-F); the global-based inference (lower value on line-O) reduces the probability of an R-peak at this time point because it has little support from the heart rhythm (low value on line-R). In addition, the first R-peak before the triangle is distorted by noise. It is detected with higher probability thanks to support from the heart rhythm (high value on line-R). As a result, ReNN is able to improve detection accuracy: a noisy peak (a false positive of the local-based model) has been suppressed, and a distorted R-peak has been enhanced in the global-based inference.

Figure 3. ReNN detection on an ECG sample.
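The behaviour described for Fig. 3 can be imitated with a small NumPy toy. This is only a sketch under stated assumptions, not the implementation from [6]: a matched filter stands in for the learned feature-mapping block (line-F), a quasi-periodicity rule with a known `period` stands in for the rule-modulating block (line-R), and a geometric mean stands in for the learned global-mapping block (line-O). All function names, the synthetic signal and the threshold are hypothetical.

```python
import numpy as np

def feature_mapping(signal, pulse_shape):
    """Local-based inference (line-F): matched-filter score in [0, 1]."""
    kernel = pulse_shape / np.sum(pulse_shape ** 2)
    return np.clip(np.correlate(signal, kernel, mode="same"), 0.0, 1.0)

def rule_modulating(local_scores, period, tol):
    """Rule modulation (line-R): rhythm support from one period before and after."""
    n = len(local_scores)
    support = np.zeros(n)
    for t in range(n):
        sides = []
        for centre in (t - period, t + period):
            lo, hi = max(centre - tol, 0), min(centre + tol + 1, n)
            if lo < hi:  # skip neighbours that fall outside the record
                sides.append(local_scores[lo:hi].max())
        support[t] = np.mean(sides) if sides else 0.0
    return support

def global_mapping(local_scores, rule_support):
    """Global-based inference (line-O): needs local evidence and rhythm support."""
    return np.sqrt(local_scores * rule_support)

def pick_peaks(scores, threshold):
    """Indices of local maxima whose score reaches the threshold."""
    idx = np.arange(1, len(scores) - 1)
    is_peak = ((scores[idx] >= threshold)
               & (scores[idx] > scores[idx - 1])
               & (scores[idx] >= scores[idx + 1]))
    return idx[is_peak].tolist()

# Synthetic record: beats every 100 samples, the beat at 250 is distorted
# (half amplitude), and a noise peak sits at 300.
pulse = np.array([0.25, 0.5, 1.0, 0.5, 0.25])
signal = np.zeros(500)
for centre, amp in [(50, 1.0), (150, 1.0), (250, 0.5), (350, 1.0), (450, 1.0), (300, 0.9)]:
    signal[centre - 2:centre + 3] += amp * pulse

line_f = feature_mapping(signal, pulse)
line_r = rule_modulating(line_f, period=100, tol=5)
line_o = global_mapping(line_f, line_r)

local_peaks = pick_peaks(line_f, threshold=0.6)
global_peaks = pick_peaks(line_o, threshold=0.6)
print("local-only detections:", local_peaks)
print("rule-modulated detections:", global_peaks)
```

In this toy record, local-only picking accepts the noise peak at sample 300 and misses the distorted beat at sample 250. The rule-modulated output does the opposite, because the noise peak has no neighbours one period away while the distorted beat does.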

Details about ReNN can be found in the preprint paper [6].

Besides improving detection accuracy, our experimental results also lead us to the following findings:

  • ReNN can reduce the computational complexity of the model in that neural connections in ReNN only handle short-term dependencies (local patterns). The long-term dependencies are mainly modeled by the rule-modulating block, which can be implemented with low complexity.
  • ReNN has better interpretability, as its inferences can be explained from three aspects: local-based inference, rule modulation and global-based inference. We can identify the main supporting evidence as well as the contradictory evidence (if any) for each inference.

It should be noted that several problems remain to be solved for ReNN, as follows:

  • In the application of ECG analysis, we plan to construct a more abundant rule base according to medical knowledge about cardiology. The rule base could then be used in the rule-modulating block, and thus ReNN can learn the behavior of human experts to recognize and to model abnormal rhythm such as premature beats and bigeminal beats.
  • For the rule bases from different domains, some of the rules may be domain-specific (e.g. rules for premature beat) and some may be common for different domains (e.g. rules for quasi-periodicity can be used for R-peaks, pulse waves, respiratory signals and flu trends). It is necessary for us to summarize and classify the rules, so Strong-AI can get the applicable rules automatically and quickly from the rule bases.
  • Referring to the learning and growing-up process of human beings, we need to design a learning mechanism for Strong-AI, which can continuously accumulate, reconstruct, and optimize its rule base to achieve the best level of intelligence with the given limited resources.
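As a rough illustration of the second point, a rule base could tag each rule with the domains it applies to, so that applicable rules can be looked up per domain and common rules can be separated from domain-specific ones. The sketch below is purely hypothetical: the `RuleBase` class, its methods and the domain tags are invented for illustration and do not describe an existing ReNN component.

```python
class RuleBase:
    """Hypothetical rule base that indexes rule names by application domain."""

    def __init__(self):
        self._domains_by_rule = {}

    def add(self, rule_name, domains):
        """Register a rule together with the domains it applies to."""
        self._domains_by_rule[rule_name] = frozenset(domains)

    def applicable(self, domain):
        """Names of all rules that apply to the given domain."""
        return sorted(name for name, domains in self._domains_by_rule.items()
                      if domain in domains)

    def common_rules(self):
        """Names of rules shared by more than one domain."""
        return sorted(name for name, domains in self._domains_by_rule.items()
                      if len(domains) > 1)

rule_base = RuleBase()
# Domain tags follow the examples given in the list above.
rule_base.add("quasi_periodicity", ["ecg", "pulse_wave", "respiration", "flu_trend"])
rule_base.add("premature_beat", ["ecg"])

print(rule_base.applicable("ecg"))
print(rule_base.applicable("flu_trend"))
print(rule_base.common_rules())
```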

We hope that the above work could introduce new methods for the development of machine learning and Strong-AI. By the way, interns who are interested in AI and willing to contribute to this field are always welcome.

In particular, I would like to thank Gamp Zhu from LOHAS tech. for his generous support of this research, and Prof. Jue Wang from the Institute of Automation, Chinese Academy of Sciences, for introducing the philosophy of “structure + average” for artificial intelligence, which inspired the idea of this research.

This blog was first published in LOHAS. Please move to ZHIHU for technical discussions. Contact us for research collaboration: (Research) (Business)




[3] G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, pp. 504-507, 2006.

[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.

[5] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, et al., "Mastering the game of Go without human knowledge," Nature, vol. 550, p. 354, 2017.

[6] H. Wang, "ReNN: Rule-embedded Neural Networks," arXiv preprint arXiv:1801.09856, 2018.