The purpose of this chapter is to introduce a powerful class of mathematical models: artificial neural networks. This is a very general term covering many different systems and approaches, from both statistics and computer science. Our aim is not to examine them all (it would be a very long discussion), but to understand the basic functionality and the possible implementations of this powerful tool. We first introduce neural networks by analogy with the human brain; the analogy is not very detailed, but it serves to introduce the concept of parallel and distributed computing. We then analyze in detail a widely applied type of artificial neural network, the feed-forward network with the error back-propagation algorithm, illustrating the architecture of the models, the main learning methods and data representation, and showing how to build a typical artificial neural network.




DOI: 10.4018/978-1-4666-5888-2.ch626








Artificial neural networks (Bandy, 1997; Haykin, 1999) are information processing structures that provide the (often unknown) connection between input and output data (Honkela, Duch, & Girolami, 2011) by artificially simulating the physiological structure and functioning of the human brain.

A natural neural network, by contrast, consists of a very large number of nerve cells (about ten billion in humans), called neurons, linked together in a complex network. Intelligent behavior is the result of extensive interaction between these interconnected units. The input of a neuron is composed of the output signals of the neurons connected to it. When the contribution of these inputs exceeds a certain threshold, the neuron, through a suitable transfer function, generates a bioelectric signal, which propagates through the synapses to other neurons.

Significant features of this network, which artificial neural models intend to simulate, are:

  • The parallel processing, due to the fact that neurons process the information simultaneously;
  • The twofold function of the neuron, which acts simultaneously as memory and signal processor;
  • The distributed nature of the data representation, i.e. knowledge is distributed throughout the network, not circumscribed or predetermined;
  • The network's ability to learn from experience.

This last, fundamental capacity enables neural networks to self-organize, adapt to new incoming information, and extract from known examples the input-output connections that are the basis of their organization. An artificial neural network captures this attitude in an appropriate "learning" stage.

Despite the great success achieved by artificial neural networks, it is better to remain aware of the limits of this technology, due to the necessary simplification of the real system being examined.





Artificial neural networks are composed of elementary computational units called neurons (McCulloch & Pitts, 1943), combined according to different architectures: for example, they can be arranged in layers (multi-layer networks), or they may have an arbitrary connection topology.

Layered networks consist of:

  • An input layer, made of n neurons (one for each network input);
  • One or more hidden (or intermediate) layers, each consisting of m neurons;










  • An output layer, consisting of p neurons (one for each network output).

The connection mode distinguishes two types of architecture:

  • The feedback architecture, with connections between neurons of the same or a previous layer;
  • The feedforward architecture (Hornik, Stinchcombe, & White, 1989), without feedback connections (signals go only to the next layer's neurons).

Each neuron (Figure 1) receives n input signals x_i (with connection weights w_i), which sum to an "activation" value y = Σ_i w_i x_i. A suitable transfer (or activation) function F transforms it into the output F(y).

The operational capability of a network (namely, its knowledge) is contained in the connection weights, which assume their values during the training phase.
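To make this concrete, here is a minimal Python sketch of the basic neuron of Figure 1; the function name, the example values and the choice of a logistic transfer function F are our own illustrative assumptions:

```python
import math

def neuron_output(x, w, bias=0.0):
    """Basic artificial neuron: weighted sum of the inputs, then transfer function.
    x, w: sequences of n input signals and connection weights; bias: threshold term.
    """
    y = sum(w_i * x_i for w_i, x_i in zip(w, x)) - bias   # activation value y
    return 1.0 / (1.0 + math.exp(-y))                     # logistic transfer F(y)

# Example: a neuron with three inputs
print(neuron_output([0.5, -1.0, 2.0], [0.4, 0.1, 0.7]))
```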





The different configuration possibilities are endless, so the choice of the optimal configuration must primarily be a function of the application target.



The simplest network (Rosenblatt, 1958; Minsky & Papert, 1969), the perceptron, consists of a single neuron with n inputs and a single output. The basic perceptron learning algorithm analyzes the input configuration (pattern) and, weighting the variables through the synapses, decides which output is associated with the configuration.

This type of architecture has the major limitation of being able to solve only linearly separable problems.

Figure 1. A basic artificial neuron
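As an illustration of the perceptron just described, the following Python sketch implements the classic Rosenblatt learning rule; the function names, learning rate and the AND example are our own assumptions, not taken from the chapter:

```python
def train_perceptron(patterns, targets, eta=0.1, epochs=100):
    """Classic perceptron rule: adjust weights only on misclassified patterns."""
    n = len(patterns[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, t in zip(patterns, targets):          # t is 0 or 1
            out = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = t - out                            # nonzero only when wrong
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
            b += eta * err
    return w, b

# AND function: linearly separable, so a single perceptron can learn it
w, b = train_perceptron([(0, 0), (0, 1), (1, 0), (1, 1)], [0, 0, 0, 1])
```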











The neural network with an input layer, one or more intermediate layers of neurons and an output layer is called a Multi-Layer Perceptron, or MLP (Hornik, Stinchcombe, & White, 1989). This network (see Figure 2) is of feed-forward type and uses, in most cases, the backpropagation learning algorithm: the weights between layers start from random values and undergo small, gradual and progressive changes driven by the network's output errors, until the learning algorithm converges to an acceptable error approximation.

There are many other, more complex types of structure, but these lie beyond our purposes. Indeed, the feedforward supervised backpropagation architecture is the most popular, and is widely used because of the capacity of this set of models to generalize their results to a large number of problems.



Neural network applications (Wong, Mendis, & Bouzerdoum, 2011) can be grouped into three major areas:

  • Classification;
  • Time series forecasting;
  • Function approximation.





In the case of function approximation (or regression), networks are applied to all situations lacking a precise functional form describing the input-output relations. The particular (and useful) area of time series prediction aims to predict future value(s) from past available data periods.

Consider, for example, forecasting an exchange rate: one may be interested in the exact value, or only in the tendency over the period. In the first case the neural network's output is used for implementing the trading system; in the second case, it suffices to know whether a market is rising or falling, in order to compare it with the expected dynamics of other markets.

In any case, in each of the possible applications, from an operational viewpoint it is necessary to divide the series into two parts: one, made from the so-called in-sample observations, acts as a basis for training (training set); the other, made from the out-of-sample observations, has the purpose of verifying its validity (validation set). The network can also be trained to provide forecasts over broader horizons, using its own short-term forecasts as input to long-term forecasts.

Figure 2. Structure of a feedforward multi-layer neural network
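As an illustration of this operational scheme, here is a Python sketch of an in-sample/out-of-sample split and of the iterated forecasting strategy (short-term forecasts fed back as inputs); the `model.predict` interface and the 80% ratio are hypothetical assumptions, not prescribed by the chapter:

```python
def split_series(series, in_sample_ratio=0.8):
    """Divide a time series into in-sample (training) and out-of-sample
    (validation) observations, preserving temporal order."""
    cut = int(len(series) * in_sample_ratio)
    return series[:cut], series[cut:]

def iterated_forecast(model, history, horizon):
    """Extend the horizon by feeding short-term forecasts back as inputs.
    `model` is any one-step-ahead predictor exposing a .predict(window) method
    (a hypothetical interface used only for this sketch)."""
    window = list(history)
    forecasts = []
    for _ in range(horizon):
        next_value = model.predict(window)   # one-step-ahead forecast
        forecasts.append(next_value)
        window.append(next_value)            # reuse the forecast as a new input
    return forecasts
```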









For the efficiency of this type of application, the assessment of particular technical aspects is important, such as:

  • Choice of the Input Variables: This must be made considering that the network is unable to provide any explanatory function, so it could use non-significant variables; in fact, the relationships between variables change over time, so an input that is significant today may not be so in the future;
  • Optimal Learning Level: It is necessary to take into account that too short a training process does not allow the network to capture the relationships between variables, while too long a training could make the network unable to generalize (overfitting);
  • Choice of the Time Horizon for the Forecast: This is an important factor, in that very short forecast horizons increase the number of correct predictions; in contrast, indications over long forecasting horizons are on average less correct, but the correct ones result in a higher average profit.



Neural networks can also be used to classify data (Haykin, 1999; Honkela, Duch, & Girolami, 2011). Unlike regression problems, where the goal is to produce the output corresponding to a given input, classification problems require labeling each data point as belonging to one of n given classes. Neural networks can be trained to provide a discriminant function separating the classes. In a typical classification problem with two classes, a straight line can be used as the discriminant; the classes are then called linearly separable. Such problems can be learned without hidden units; occasionally, however, a nonlinear function is required to ensure the separation of the classes, and this can only be realized by a neural network with one or more hidden layers.

Typical classification applications are, for example, credit rating, trust decisions or biological risk assessment. In these cases the network has the task of assigning each input to one of a number of predefined categories.

To use a neural network for classification, you need to build an equivalent function approximation problem by assigning a target value to each class. For a binary problem (two classes) you can use a neural network with a single output y and a binary target value: 0 for one class, 1 for the other. You can therefore interpret the output of the network as an estimate of the probability that a given sample belongs to one of the two classes. In these cases an activation function is often used which saturates at the two target values, i.e. the logistic function:

f(x) = 1/(1 + e^(-x))   (1)
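A small Python sketch of this binary set-up follows, using the logistic function (1) to read the single output as a class probability; the 0.5 threshold and the function names are our own assumptions:

```python
import math

def logistic(x):
    """Equation (1): saturates at the two target values 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def classify(activation, threshold=0.5):
    """Read the single output neuron as an estimate of P(class 1)."""
    p = logistic(activation)          # network output, interpreted as probability
    return (1 if p >= threshold else 0), p

label, prob = classify(0.8)           # e.g. label 1 with probability ~0.69
```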





A neural network is not programmed directly: it is explicitly trained, through a learning algorithm, to solve a given task, a process that leads to "learning through experience". The learning algorithm helps to define the specific configuration of a neural network and, therefore, conditions and determines the ability of the network itself to provide correct answers to specific problems.

We distinguish at least three types of learning: unsupervised, supervised and by reinforcement. In the first case, the network is trained only on the basis of a set of inputs, without providing the corresponding outputs. Supervised learning requires, instead, a set of examples consisting of appropriate inputs and corresponding outputs, presented to the network so that it "learns" from them. Learning by reinforcement is used in cases where it is not possible to specify input-output patterns: a reinforcement signal is provided to the system, which interprets it as positive/negative feedback about its behavior and adjusts its settings accordingly.

The dataset used for network learning constitutes the so-called learning or training set.



The backpropagation algorithm is used in supervised learning. It modifies the connection weights in such a way as to minimize a certain error function E. This function depends on the i-th network output (vector) net_i given the i-th input (vector) x_i and the i-th desired output (vector) y_i. The training set is thus a set of N input/output pairs. The error function to be minimized can be written as:

E(w) = (1/2) Σ_i Σ_k (net_ik - y_ik)²   (2)

where the index k represents the value corresponding to the k-th output neuron. E(w) is a function of the weights (which, in general, vary over time). To minimize it you can use the gradient-descent algorithm, which starts from a generic point and calculates the gradient ∇E(w), which gives the direction in which to move in order to obtain the maximum increase (or decrease, if you consider -∇E(w)). Once the direction is defined, you move by a preset step η and find a new point, at which the gradient is recalculated. The algorithm continues iteratively until the gradient is zero. The backpropagation algorithm can be divided into two steps:

  • Forward Step: The input data is propagated through the network, layer by layer, up to the output; then the network error E(w) is computed.
  • Backward Step: The error made by the network is propagated backwards, and the weights are updated appropriately.
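In code, one gradient-descent iteration of the kind just described might look like the following Python sketch (the names, tolerance and step budget are our own assumptions):

```python
def gradient_descent_step(w, grad_E, eta):
    """Move each weight by a preset step eta against the gradient of E(w)."""
    return [wi - eta * gi for wi, gi in zip(w, grad_E)]

def minimize(w, gradient_fn, eta=0.1, tol=1e-6, max_steps=10_000):
    """Iterate until the gradient is (numerically) zero or the step budget ends."""
    for _ in range(max_steps):
        g = gradient_fn(w)
        if max(abs(gi) for gi in g) < tol:   # gradient ~ zero: stop
            break
        w = gradient_descent_step(w, g, eta)
    return w
```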

The logical steps for training a neural network with supervised learning are the following:

  • Create a set of input patterns and the associated set of (desired) output patterns.
  • Initialize the weights of the neural network to random values.
  • Learning cycle (this cycle ends only when the error is generally less than a predefined threshold, or after a specified number of iterations):
    1. Feedforward phase (from the input to the output layer):
       a. extract an input pattern at random from those available;
       b. compute the value of all the neurons of the next layer;
       c. subtract from the result the neuron's threshold activation value (if this value has not already been simulated through the addition of a fixed unit input, the bias);
       d. filter the neuron's output through a logistic function, making this value the input of the next neuron.
    2. Compare the network result with the corresponding output pattern and derive the current network error.
    3. Backpropagation phase (from the output to the input layer):
       a. calculate the correction to the weights according to the chosen minimum-locating rule;
       b. apply the correction to the weights of the layer.

In order to have good training, the whole set of input patterns must be examined (an epoch). So, after randomly choosing a pattern, it should not be considered again when extracting the next one: at each step only inputs not yet processed participate in training, thus achieving a complete processing of the entire training set (a complete training loop along these lines is sketched below).
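The following Python sketch puts the whole procedure together for a single-hidden-layer network with logistic units and the quadratic error (2); the class and function names, the XOR example and all parameter values are our own illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

class MLP:
    """Minimal one-hidden-layer feedforward network trained with backpropagation."""
    def __init__(self, n_in, n_hidden, n_out):
        # Small random initial weights; the extra row acts as the bias unit
        self.W1 = rng.normal(0, 0.5, (n_in + 1, n_hidden))
        self.W2 = rng.normal(0, 0.5, (n_hidden + 1, n_out))

    def forward(self, x):
        x = np.append(x, 1.0)                  # fixed unit input = bias
        self.h = logistic(x @ self.W1)         # hidden-layer outputs
        h = np.append(self.h, 1.0)
        self.out = logistic(h @ self.W2)       # network outputs
        self._x, self._h = x, h
        return self.out

    def backward(self, y, eta):
        # Deltas for E = 1/2 * sum (net - y)^2 with logistic units
        d_out = (self.out - y) * self.out * (1.0 - self.out)
        d_hid = (self.W2[:-1] @ d_out) * self.h * (1.0 - self.h)
        self.W2 -= eta * np.outer(self._h, d_out)
        self.W1 -= eta * np.outer(self._x, d_hid)

def train(net, X, Y, eta=0.5, max_epochs=10_000, threshold=1e-3):
    for epoch in range(max_epochs):
        order = rng.permutation(len(X))        # each pattern once per epoch
        E = 0.0
        for i in order:
            out = net.forward(X[i])
            E += 0.5 * np.sum((out - Y[i]) ** 2)
            net.backward(Y[i], eta)
        if E < threshold:                      # stopping criterion on the error
            return epoch
    return max_epochs

# XOR: not linearly separable, so the hidden layer is essential
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
Y = np.array([[0], [1], [1], [0]], float)
train(MLP(2, 3, 1), X, Y)
```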





The construction of a neural network (Bandy, 1997; Haykin, 1999) necessarily involves a number of steps, each requiring you to set appropriate parameters, described below.

In order to obtain an optimal training and network implementation, it is necessary to divide the dataset into subsets, which determine the learning environment: the training and validation sets. The latter is in turn divided into a test set and a generalization set.

In essence, the network learns by trying to recognize the dynamics of the training set, checks how well it fits on the test set, and is then applied to a set of data (the generalization set) never observed before.

There are no universally valid rules for the subdivision of the dataset: commonly adopted splits are (60%, 20%, 20%) and (60%, 30%, 10%) for the training, test and generalization sets respectively.
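A Python sketch of such a split, using the (60%, 20%, 20%) proportions mentioned above; the function name and the splitting strategy (a uniform random shuffle) are our own assumptions:

```python
import numpy as np

def split_dataset(data, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly split a dataset into training, test and generalization sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    n_train = int(fractions[0] * len(data))
    n_test = int(fractions[1] * len(data))
    return (data[idx[:n_train]],                      # training set
            data[idx[n_train:n_train + n_test]],      # test set
            data[idx[n_train + n_test:]])             # generalization set

train_set, test_set, gen_set = split_dataset(np.arange(100))
```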











Numerous empirical studies use a single hidden layer, as it is sufficient to approximate non-linear functions with a high degree of accuracy. However, this approach may require a high number of neurons, thus limiting the learning process. Sometimes networks with two hidden layers are more effective, especially for the prediction of high-frequency data. This choice, in addition to being suggested by specific theory, is supported by experience, which shows that, on the other hand, more than two hidden layers do not produce significant improvements in the results obtained by the network.

It should also be noted that an excessive number of neurons can generate overlearning, i.e. the neurons are not able to generate a reliable prediction because they reduce the contribution of the inputs; in these cases it is as if the network had "memorized" the correct answers, without being able to generalize. On the contrary, too low a number of neurons reduces the network's learning potential.

The formulas proposed in the literature are very different, and in some cases contradictory:

h = 2·n + 1   (3)
h = 2·n   (4)
h = n   (5)
h = (n + m)/2 + √t   (6)

where:

h = number of hidden neurons;
n = number of input neurons;
m = number of output neurons;
t = number of observations contained in the training set.

Empirical results show that none of these rules is generalizable to every problem, although a non-negligible number of positive results seem to favor (5).
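The four rules can be compared directly; the following Python sketch (with hypothetical names and example values) computes the candidate hidden-layer sizes of Equations (3)-(6):

```python
import math

def hidden_neuron_rules(n, m, t):
    """Candidate hidden-layer sizes according to Equations (3)-(6)."""
    return {
        "h = 2n + 1":            2 * n + 1,                   # (3)
        "h = 2n":                2 * n,                       # (4)
        "h = n":                 n,                           # (5)
        "h = (n+m)/2 + sqrt(t)": (n + m) / 2 + math.sqrt(t),  # (6)
    }

# e.g. 10 inputs, 1 output, 500 training observations
print(hidden_neuron_rules(10, 1, 500))
```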



There are several connection modes between the different layers:

  • Standard Connections: Direct connections, without feedback, between input and output, passing through one or more hidden layers;
  • Jump Connections: The net can assign connection weights even between neurons in non-adjacent layers;
  • Multiple Connections: Neurons assigned to the hidden layers can return to the input variables through iterative processes, in order to precisely quantify the connection weights.



We can distinguish several different functional forms (see Table 1) that characterize the activation of the neurons (linear, sinusoidal, Gaussian, etc.). You can define a different activation function for each layer.

There is no theoretically grounded rule for defining the activation function of the various layers. Although many studies adopt different functions for the individual layers, some use the same function for the input, hidden and output layers.

The linear function is typically used for the output layer. Its use is less effective, however, in the hidden layers, especially if these are characterized by a high number of neurons.

The logistic and symmetric logistic functions vary, respectively, in the intervals [0, 1] and [-1, 1]. Some problems have dynamic characteristics that are captured more precisely by the symmetric function, especially in the input and hidden layers. Most of the empirical literature uses this function in the hidden layer, although without a strong theoretical motivation.

The hyperbolic tangent function allows adapting the network in a reliable manner in the hidden layers (particularly in networks with three layers), especially when the analyst has chosen a logistic or linear function for the output.

The sinusoidal function is generally used in research, and it is suggested to normalize input and output in the range [-1, 1].

The Gaussian function lends itself to the identification of specific dynamic processes, captured by architectures with two parallel hidden layers and a tangent function in the second layer.
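For reference, the functional forms of Table 1 can be collected in a small Python dictionary; the scale c of the corrected tangent is a free parameter (the value 0.5 here is arbitrary), and the per-layer assignment shown is only an example:

```python
import math

# The functional forms of Table 1
ACTIVATIONS = {
    "linear":             lambda x: x,
    "logistic":           lambda x: 1.0 / (1.0 + math.exp(-x)),        # range [0, 1]
    "logistic_symmetric": lambda x: 2.0 / (1.0 + math.exp(-x)) - 1.0,  # range [-1, 1]
    "tanh":               lambda x: math.tanh(x),
    "corrected_tanh":     lambda x, c=0.5: math.tanh(c * x),
    "sinusoidal":         lambda x: math.sin(x),
    "gaussian":           lambda x: math.exp(-x * x),
    "inverse_gaussian":   lambda x: 1.0 - math.exp(-x * x),
}

# A different activation function can be assigned to each layer, e.g.:
layer_activation = {"hidden": ACTIVATIONS["logistic_symmetric"],
                    "output": ACTIVATIONS["linear"]}
```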



After defining the initial characteristics of the neural network, you must define the criteria for stopping learning. If you want to build a neural network for forecasting purposes, it is preferable to assess its learning on the test set; if, instead, you want to adequately describe the studied phenomenon, it is preferable to assess learning on the training set. Furthermore, the learning parameters are related to the network's error indicators (average error, maximum error, number of epochs with no error improvement).



A further problem is the choice of the learning rule: in particular, it is necessary to decide the rate at which the network changes its weights in response to the error. Consider Figure 3: on the left you can see progress both in learning and in oscillation, with a frequency that achieves the desired result more quickly; on the right, the network learns by changing its "path" at a lower rate and takes longer to reach the target.

The learning rate η is normally initialized to the value:

η = [max(x) - min(x)] / N   (7)

where x represents the training set with N input records. By compensating for the input values x and their number N, this initial value of η should be adequate for many classification problems, regardless of the number of training samples and the range of their values. However, although higher values of η may accelerate the learning process, they can induce oscillations that slow convergence.

Alternatively, it is possible to make use of momentum, i.e. the proportion with which the last weight variation attained by the neural network is added to the new weight update. In this way, the network can learn at a higher rate without being likely to oscillate, since it carries over a high proportion of the last weight change.

The analyst must then decide the level of the initial connection weights between neurons, and assess whether high noise levels characterize the observations; in that case, the momentum value should be kept high. The network's learning process naturally passes through the progressive identification of the most suitable values for these parameters. It therefore requires a long series of attempts, which can generate significantly different results. The suggestion is to avoid radical changes in the parameters.
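A Python sketch of both devices follows: the initialization (7) of η and a momentum-style weight update. The momentum coefficient mu = 0.9 and all names are our own assumptions, and the weights and gradients are assumed to be NumPy arrays:

```python
import numpy as np

def initial_learning_rate(x, N):
    """Equation (7): eta = [max(x) - min(x)] / N for a training set x of N records."""
    return (np.max(x) - np.min(x)) / N

def momentum_update(w, grad, prev_dw, eta, mu=0.9):
    """Add a proportion mu of the last weight variation to the new update,
    damping oscillations while allowing a higher learning rate."""
    dw = -eta * grad + mu * prev_dw
    return w + dw, dw        # new weights, plus the variation to reuse next step
```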



Learning parameters are generally related to the network's error indicators:

  • Mean error;
  • Maximum error;
  • Number of epochs with no error improvement.

Table 1. Activation functions

Function               Formula
Linear                 f(x) = x
Logistic (sigmoid)     f(x) = 1/(1 + e^(-x))
Logistic symmetric     f(x) = 2/(1 + e^(-x)) - 1
Hyperbolic tangent     f(x) = (e^x - e^(-x))/(e^x + e^(-x))
Corrected tangent      f(x) = tanh(c·x)
Sinusoidal             f(x) = sin(x)
Gaussian               f(x) = e^(-x²)
Inverse Gaussian       f(x) = 1 - e^(-x²)

By setting a value for these parameters, the network will stop when it reaches the desired value. It is usually easiest to fix a large number of epochs (influenced by the number of observations in the training set) that the network may analyze without improving the error. A value between 10,000 and 50,000 epochs can be assumed to be safe enough to stop a network that can hardly learn more than it has done up to that point. Of course, the choice also depends on the speed with which the network reaches these values.
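A patience-style stopping check along these lines can be sketched as follows (the function name and logic are our own illustrative assumptions):

```python
def should_stop(error_history, patience):
    """Stop when the error has not improved for `patience` consecutive epochs."""
    if len(error_history) <= patience:
        return False
    best_before = min(error_history[:-patience])
    return min(error_history[-patience:]) >= best_before
```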

A further criterion for accepting a network is convergence. If errors are modest but oscillations have led to a high divergence, it is advisable to check the adequacy of the parameters (in particular, the learning rate and momentum).

Once the correct temporal dynamics of the error has been verified, at least in graphical terms, you must measure it quantitatively. Among the various error indicators developed in statistics, we point out:

1. Determination Index (R²)
2. Mean Absolute Error (MAE)
3. Mean Absolute Percentage Error (MAPE)
4. Mean Square Error (MSE)
5. Root Mean Square Error (RMSE)

These indicators measure, in various ways, the spread between the original output and the output estimated by the network. Only if the input neurons, activation functions and parameters described above were perfectly able to identify the original phenomenon would the difference between actual and estimated output be zero, thus optimizing the above-mentioned error indicators.
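These five indicators are straightforward to compute; a Python sketch follows (the names are ours, and the MAPE formula assumes no zero target values):

```python
import numpy as np

def error_indicators(actual, estimated):
    """Spread between the original and the network-estimated output."""
    actual, estimated = np.asarray(actual, float), np.asarray(estimated, float)
    err = actual - estimated
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return {
        "R2":   1.0 - ss_res / ss_tot,
        "MAE":  np.mean(np.abs(err)),
        "MAPE": 100.0 * np.mean(np.abs(err / actual)),  # assumes nonzero targets
        "MSE":  mse,
        "RMSE": np.sqrt(mse),
    }
```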



Once the neural network has been built properly, you need to verify its forecasting "goodness". It is possible that a model describes the training and test sets well, but then appears entirely inadequate as regards its generalization, i.e., in the forecasting case, prediction.

You therefore have to test the neural network on the generalization set with the same techniques already described. First, you should measure the above-described error indicators on the series never observed by the network. If these prove to be significantly worse, and therefore not acceptable on the basis of the original objectives, the network must be further tuned and tested until it reaches an acceptable forecasting ability.

Figure 3. Network processing with different learning rates









A neural network is a sophisticated statistical system (operating in parallel) with good noise immunity (Liu, Zhang, & Polycarpou, 2011): if some units of the system malfunction, the network as a whole suffers some degradation but hardly crashes altogether.

The models produced by neural networks, although very efficient, cannot be explained in human symbolic language: the result must be accepted "as is" (black box). In other words, as opposed to an algorithmic system, a neural network is able to generate a valid result (or at least one with a high likelihood of being acceptable), but it is not possible to explain how and why this result has been generated.

As with any modeling algorithm, neural networks are efficient only if the predictor variables are chosen with care, and they are not able to deal well with categorical variables (for example, the name of a city) with many different values. They require a training phase that fixes the weights of the individual connections, and this may take a long time if the number of samples and variables analyzed is very large. There are no theorems or models to define the optimal network, so the success of a network depends very much on the experience of its maker.



Neural networks are typically used in contexts where the data may be partially incorrect, or where no analytical models are able to deal with the problem (Haykin, 1999). Typical uses are in OCR software, in face recognition systems and, more generally, in systems that deal with the processing of data subject to errors or noise. They are also one of the most used tools in data mining analysis. Neural networks are a promising and powerful means for financial analysis (Gately, 1995) or weather forecasting (Zhang & Patuwo, 1998). In recent years their importance has greatly increased also in the field of bioinformatics (Herrero et al., 2001), in which they can be used to search for functional and/or structural patterns in proteins and nucleic acids, and to analyze EEG data. If suitably fed a long series of inputs (the learning or training phase), the network is able to provide the most likely output. Recent studies (Panakkat & Adeli, 2007) have shown a good potential of neural networks in seismology, for locating earthquake epicenters and predicting their intensity.



Bandy, H. (1997). Developing a neural network system. AAII Working Paper.

Gately, E. (1995). Neural Networks for Financial Forecasting. New York: John Wiley & Sons.

Haykin, S. (1999). Neural Networks: A Comprehensive Foundation (2nd ed.). Upper Saddle River, NJ: Prentice Hall PTR.

Herrero, J., Valencia, A., & Dopazo, J. (2001). A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics, 17(2), 126–136. doi:10.1093/bioinformatics/17.2.126 PMID:11238068

Honkela, T., Duch, W., & Girolami, M. (Eds.). (2011). Artificial Neural Networks and Machine Learning: ICANN 2011, Parts 1-2. Springer.

Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2, 359–366. doi:10.1016/0893-6080(89)90020-8

Liu, D., Zhang, H., & Polycarpou, M. (Eds.). (2011). Advances in Neural Networks - ISNN 2011: 8th International Symposium on Neural Networks, Guilin, China, May 29-June 1, 2011, Proceedings. Springer.

McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5, 115–133. doi:10.1007/BF02478259

Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge, MA: MIT Press.

Panakkat, A., & Adeli, H. (2007). Neural network models for earthquake magnitude prediction using multiple seismicity indicators. International Journal of Neural Systems, 17(1), 13–33. doi:10.1142/S0129065707000890 PMID:17393560

Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408. doi:10.1037/h0042519 PMID:13602029

Wong, K. W., Mendis, B. S. U., & Bouzerdoum, A. (Eds.). (2011). Neural Information Processing: Models and Applications. Springer.

Zhang, G. P., & Patuwo, B. E. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1), 35–62.






Artificial Intelligence: The term "artificial intel-

ligence" generally refers to the ability of a computer

to perform functions and reasoning typical of the

human mind. It covers the theory and techniques for

the development of algorithms that allow computers

to show an ability and/or intelligent activity, at least

in specific domains.

Artificial Neural Network (ANN): An artificial

neural network defines a mathematical model for the

simulation of a network of biological neurons (e.g.

human nervous system). It simulates different aspects

related to the behavior and capacity of the human brain,

such as: intelligent information processing; distributed

processing; high level of parallelism; faculty of learn-

ing, generalization and adaptation; high tolerance to

inaccurate (or wrong) information.

Learning: The acquisition or modification of

knowledge (new or existing), behaviors, skills, values,

or preferences and may involve the synthesis of differ-

ent types of information.

Mathematical Model: A mathematical model is a

model built using the language and tools of mathemat-

ics. A mathematical model is often constructed with

the aim to provide predictions on the future 'state' of

a phenomenon or a system.

Neuron: The neuron is the unit cell, which con-

stitutes the nervous tissue, contributing to the forma-

tion (together with the fabric of the neuroglia and the

vascular tissue) of the nervous system.

Perceptron: A type of binary classifier that maps

its inputs (a vector of real type) to an output value (a

scalar real type). The perceptron may be considered

as the simplest model of feed-forward neural network,

as the inputs directly feeding the output units through

weighted connections.

Synapse: The synapse (or synaptic junction) is a

highly specialized structure that enables communica-

tion of nerve cells together (neurons) or with other cells

(muscle cells, sensory or endocrine glands).


Introduction: Artificial Neural Networks (ANN) are inspired by the way biological neural system works, such as the brain process information. The information processing system is composed of a large number of highly interconnected processing elements (neurons) working together to solve specific problems. ANNs, just like people, learn by example. Similar to learning in biological systems, ANN learning involves adjustments to the synaptic connections that exist between the neurons.