Artificial Neural Networks: Tutorial
Crescenzio Gallo
Department of Clinical and Experimental Medicine, University of Foggia, Italy
The purpose of this chapter is to introduce a powerful class of mathematical models: the artificial neural networks. This is a very general term that includes many different systems and various types of approaches, both from statistics and computer science. Our aim is not to examine them all (it would be a very long discussion), but to understand the basic functionality and the possible implementations of this powerful tool. We initially introduce neural networks, by analogy with the human brain. The analogy is not very detailed, but it serves to introduce the concept of parallel and distributed computing. Then we analyze in detail a widely applied type of artificial neural network: the feed-forward network with error back-propagation algorithm. We illustrate the architecture of the models, the main learning methods and data representation, showing how to build a typical artificial neural network.
DOI: 10.4018/978-1-4666-5888-2.ch626
Artificial neural networks (Bandy, 1997; Haykin, 1999) are information processing structures that provide an (often unknown) connection between input and output data (Honkela, Duch, & Girolami, 2011) by artificially simulating the physiological structure and functioning of the human brain.
The natural neural network, by contrast, consists of a very large number of nerve cells (about ten billion in humans), called neurons, linked together in a complex network. Intelligent behavior is the result of extensive interaction between these interconnected units. The input of a neuron is composed of the output signals of the neurons connected to it. When the contribution of these inputs exceeds a certain threshold, the neuron generates, through a suitable transfer function, a bioelectric signal that propagates through the synapses to other neurons.
Significant features of this network, which artificial
neural models intend to simulate, are:
• Parallel processing, due to the fact that neurons process information simultaneously;
• The twofold function of the neuron, which acts simultaneously as memory and signal processor;
• The distributed nature of the data representation, i.e. knowledge is spread throughout the network, not circumscribed or predetermined;
• The network's ability to learn from experience.
This last, fundamental capacity enables neural networks to self-organize, adapt to new incoming information, and extract from known examples the input-output connections that are the basis of their organization. An artificial neural network captures this ability in an appropriate "learning" stage.
Despite the great success achieved by artificial neural networks, it is nevertheless wise to remain aware of the limits of this technology, which stem from the necessary simplification of the real system under examination.
Artificial neural networks are composed of elementary computational units called neurons (McCulloch & Pitts, 1943), combined according to different architectures: for example, they can be arranged in layers (multi-layer network) or connected according to other topologies.
Layered networks consist of:
• Input layer, made of n neurons (one for each
network input);
• One or more hidden (or intermediate) layers, each consisting of m neurons;
• Output layer, consisting of p neurons (one for
each network output).
The connection mode allows distinguishing
between two types of architectures:
• The feedback architecture, with connections
between neurons of the same or previous layer;
• The feedforward architecture (Hornik,
Stinchcombe, & White, 1989), without feed-
back connections (signals go only to the next
layer's neurons).
Each neuron (Figure 1) receives n input signals xi (with connection weights wi), which are summed into an "activation" value y = Σi wi·xi. A suitable transfer (or activation) function F transforms it into the output F(y).
The operational capability of a network (namely, its knowledge) is contained in the connection weights, which assume their values through the training phase. The possible configurations are endless, so the choice of the optimal configuration must primarily be a function of the application target.
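As a concrete illustration, the following is a minimal Python sketch of the single-neuron computation just described: the inputs xi are combined with the weights wi into the activation y, which is then passed through a transfer function F (here the logistic function of Equation (1) below). The weight and input values are illustrative assumptions, not from the chapter.

```python
import math

def neuron_output(x, w, bias=0.0):
    """Compute a single neuron's output F(y), where y = sum(w_i * x_i) + bias.
    F is the logistic transfer function used later in the chapter."""
    y = sum(wi * xi for wi, xi in zip(w, x)) + bias  # activation value y
    return 1.0 / (1.0 + math.exp(-y))                # transfer function F(y)

# Hypothetical example: three inputs with arbitrary weights.
print(neuron_output(x=[0.5, -1.0, 2.0], w=[0.4, 0.3, -0.2], bias=0.1))
```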
The simplest network (Rosenblatt, 1958; Minsky & Papert, 1969), the perceptron, consists of a single neuron with n inputs and a single output. The basic learning algorithm of the perceptron analyzes the input configuration (pattern) and, weighting the variables through the synapses, decides which output is associated with the configuration.
This type of architecture has the major limitation of being able to solve only linearly separable problems.
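A minimal sketch of the perceptron learning rule on a linearly separable toy problem (the logical AND) may help; the threshold output, learning rate and number of epochs are assumed illustrative values, not taken from the chapter.

```python
# Perceptron learning on the (linearly separable) AND problem.
patterns = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w = [0.0, 0.0]   # synaptic weights
b = 0.0          # bias (threshold term)
eta = 0.1        # learning rate (assumed value)

for epoch in range(20):
    for x, target in patterns:
        # Threshold output: fire if the weighted sum exceeds zero.
        y = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
        error = target - y
        # Weight update: move the weights toward the correct output.
        w = [wi + eta * error * xi for wi, xi in zip(w, x)]
        b += eta * error

print(w, b)  # weights and bias defining the separating line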
Figure 1. A basic artificial neuron
A neural network with an input layer, one or more intermediate layers of neurons, and an output layer is called a Multi-Layer Perceptron, or MLP (Hornik, Stinchcombe, & White, 1989). This network (see Figure 2) is of feed-forward type and uses, in most cases, the backpropagation learning algorithm, which calculates the weights between layers starting from random values and making small, gradual changes based on the network's output errors, until the algorithm converges to an acceptable error.
There are many other, more complex types of structures, but these lie beyond our purposes. Indeed, the feedforward supervised backpropagation architecture is the most popular and is widely used because of the capacity of this class of models to generalize across a large number of problems.
Neural network applications (Wong, Mendis, & Bouzerdoum, 2011) can be grouped into three major areas:
• Classification;
• Time series forecasting;
• Function approximation.
In the case of function approximation (or regression), networks are applied to all situations lacking a precise functional form describing the input-output relations. The particular (and useful) area of time series prediction aims to predict future value(s) from past available data periods.
Consider, for example, forecasting an exchange rate: the interest may lie in the exact value or only in the tendency over the period. In the first case the neural network's output is used directly for implementing the trading system; in the second, it suffices to know that a market is rising or falling in order to compare it with the expected dynamics in other markets.
In any case, whatever the application, from an operational viewpoint it is necessary to divide the series into two parts: one, made from the so-called in-sample observations, acts as a base for training (training set); the other, made from the out-of-sample observations, serves to verify the model's validity (validation set). The network can also be trained to provide forecasts over broader horizons, using its own short-term forecasts as input to the long-term ones, as sketched below.
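The multi-horizon scheme just described can be outlined as follows. The function `predict_next` is a hypothetical stand-in for a trained one-step-ahead network; the sliding-window mechanics are an illustrative assumption, not code from the chapter.

```python
def recursive_forecast(history, predict_next, horizon):
    """Extend a series `horizon` steps ahead by feeding each one-step
    forecast back into the input window of the (trained) model."""
    window = list(history)
    forecasts = []
    for _ in range(horizon):
        y_next = predict_next(window)   # one-step-ahead model output
        forecasts.append(y_next)
        window = window[1:] + [y_next]  # slide the window over the forecast
    return forecasts

# Hypothetical usage with a trivial stand-in model (mean of the window):
print(recursive_forecast([1.0, 1.2, 1.1], lambda w: sum(w) / len(w), horizon=3))
```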
Figure 2. Structure of a feedforward multi-layer neural network
For the efficiency of this type of application the
assessment of particular technical aspects is important,
such as:
• Choice of the Input Variables: This must be made bearing in mind that the network cannot provide any explanatory function, so it could end up using non-significant variables; moreover, the relationships between variables change with time, so an input that is significant today may not be so in the future;
• Optimal Learning Level: It is necessary to take into account that too short a training process does not allow the network to capture the relationships between variables, while too long a training process can make the network unable to generalize (overfitting);
• Choice of Time Horizon for the Forecast: This is an important factor in that very short forecast horizons increase the number of correct predictions; in contrast, indications over long forecasting horizons are on average less often correct, but the correct ones yield a higher average profit.
Neural networks can also be used to classify data (Haykin, 1999; Honkela, Duch, & Girolami, 2011). Unlike regression problems, where the goal is to produce the output corresponding to a given input, classification problems require labeling each data point as belonging to one of n given classes. Neural networks can be trained to provide a discriminant function separating the classes. In a typical two-class problem a straight line can serve as the discriminant; such classes are called linearly separable. These problems can be learned without hidden units; when a nonlinear function is required to ensure the separation of the classes, however, the problem can only be solved by a neural network with one or more hidden layers.
Typical classification applications are, for example,
credit ratings, trust decisions or biological risk as-
sessment. In these cases the network has the task of
assigning the input to a corresponding output among
a number of predefined categories.
To use a neural network for classification, you need to build an equivalent function approximation problem by assigning a target value to each class. For a binary problem (two classes) you can use a neural network with a single output y and a binary target value: 0 for one class, 1 for the other. You can then interpret the output of the network as an estimate of the probability that a given sample belongs to one of the two classes. In these cases an activation function is often used which saturates at the two target values, i.e. the logistic function:

f(x) = 1 / (1 + e^(-x)) (1)
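A brief sketch of this encoding follows, assuming a trained network whose single output passes through the logistic function of Equation (1); the pre-activation value and the 0.5 decision threshold are the usual conventions, assumed here rather than specified in the chapter.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))  # Equation (1)

# Hypothetical network output (pre-activation) for one sample:
y = logistic(0.8)
print(y)                     # estimated probability of class 1
print(1 if y >= 0.5 else 0)  # predicted class label
```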
A neural network is not programmed directly; rather, it is explicitly trained through a learning algorithm to solve a given task, a process that leads to "learning through experience". The learning algorithm helps to define the specific configuration of a neural network and therefore conditions and determines the ability of the network itself to provide correct answers to specific problems.
We distinguish at least three types of learning: unsupervised, supervised and reinforcement learning. In the first case, the network is trained only on the basis of a set of inputs, without providing the corresponding outputs. Supervised learning, instead, requires a set of examples consisting of appropriate inputs and corresponding outputs, presented to the network so that it "learns" from them. Reinforcement learning is used in cases where it is not possible to specify input-output patterns: a reinforcement signal is provided to the system, which interprets it as positive/negative feedback about its behavior and adjusts its settings accordingly.
The dataset used for network learning constitutes
the so-called learning or training set.
The backpropagation algorithm is used in supervised learning. It modifies the connection weights so as to minimize an error function E, which depends on the network's output (vector) net_i produced for the i-th input (vector) x_i and on the corresponding desired output (vector) y_i. The training set is thus a set of N input/output pairs. The error function to be minimized can be written as:

E(w) = (1/2) Σ_i Σ_k (net_ik − y_ik)² (2)
where the index k ranges over the output neurons. E(w) is a function of the weights, which in general vary over time. To minimize it you can use the gradient-descent algorithm, which starts from a generic point and computes the gradient ∇E, the direction of maximum increase (or of maximum decrease, if you consider −∇E). Once the direction is defined, you take a step of preset size η and find a new point at which to recalculate the gradient; that is, the weights are updated as w ← w − η∇E(w). The algorithm continues iteratively until the gradient is zero. The backpropagation algorithm can be divided into two steps.
• Forward Step: The input data are propagated through the network, layer by layer, and the network error E(w) is then computed.
• Backward Step: The error made by the network is propagated backwards, and the weights are updated appropriately.
The logical steps for training a neural network with
supervised learning are the following.
Create a set of input patterns and the associated set of
(desired) output patterns.
Initialize the weights of the neural network to random
values.
Learning cycle (this cycle ends only when the error
is generally less than a predefined threshold or
after a specified number of iterations).
1. Feedforward phase (from the input to the output layer):
a. extract an input pattern at random
from those available;
b. compute the activation value of the neurons in the next layer;
c. subtract from the result the neuron's threshold activation value (unless this value has already been simulated by adding a fixed unit input, the bias);
d. filter the neuron's output through a logistic function, making this value the input of the next layer.
2. Compare the network result with the cor-
responding output pattern and derive the
current network error.
3. Backpropagation phase (from the output
to the input layer):
a. calculate the correction to the weights according to the chosen minimum-search rule;
b. apply the correction to the weights of the layer.
For good training, the set of input patterns must be completely examined in each epoch. So, after a pattern is randomly chosen, it should be excluded from subsequent extractions: at each step only the inputs not yet processed participate in training, giving a complete pass over the whole training set.
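The whole procedure can be condensed into a short sketch. The following illustrative NumPy implementation trains a one-hidden-layer feedforward network with the backpropagation steps listed above; the network size, learning rate, number of epochs and the XOR toy data are assumptions for illustration, not values from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Training set: XOR, a classic non-linearly-separable problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))

# Initialize the weights (and biases) to random values.
W1 = rng.normal(0, 1, (2, 4)); b1 = np.zeros(4)   # input -> hidden
W2 = rng.normal(0, 1, (4, 1)); b2 = np.zeros(1)   # hidden -> output
eta = 0.5                                          # learning rate (assumed)

for epoch in range(5000):
    # One epoch = every pattern examined once, in random order
    # (sampling without replacement, as described above).
    for i in rng.permutation(len(X)):
        x, y = X[i:i+1], Y[i:i+1]
        # Feedforward phase.
        h = logistic(x @ W1 + b1)        # hidden layer outputs
        net = logistic(h @ W2 + b2)      # network output
        # Error E = (1/2)(net - y)^2 per pattern, as in Equation (2).
        # Backpropagation phase: chain rule, using f'(x) = f(x)(1 - f(x))
        # for the logistic function.
        delta_out = (net - y) * net * (1 - net)
        delta_hid = (delta_out @ W2.T) * h * (1 - h)
        # Gradient-descent weight corrections.
        W2 -= eta * h.T @ delta_out; b2 -= eta * delta_out.ravel()
        W1 -= eta * x.T @ delta_hid; b1 -= eta * delta_hid.ravel()

print(logistic(logistic(X @ W1 + b1) @ W2 + b2).round(2))  # ~ [0, 1, 1, 0]
```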
The construction of a neural network (Bandy, 1997; Haykin, 1999) necessarily involves some steps requiring you to set appropriate parameters, described below.
In order to obtain optimal training and network implementation, it is necessary to divide the dataset into subsets, which determine the learning environment: the training and validation sets. The latter is in turn divided into a test set and a generalization set.
In essence, the network learns by trying to recognize
the dynamics of the training set, checks how it fits
on the test set and then applies itself to a set of data
(generalization) never observed before.
There are no universally valid rules for the subdivision of the dataset: commonly adopted splits are (60% − 20% − 20%) and (60% − 30% − 10%) for the training, test and generalization sets respectively, as in the sketch below.
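One such subdivision (the 60/20/20 scheme) can be sketched with NumPy as follows; the random shuffling and the stand-in dataset are illustrative assumptions.

```python
import numpy as np

data = np.arange(100)                       # stand-in for the full dataset
idx = np.random.default_rng(0).permutation(len(data))

n_train = int(0.6 * len(data))              # 60% training set
n_test = int(0.2 * len(data))               # 20% test set
train = data[idx[:n_train]]
test = data[idx[n_train:n_train + n_test]]
generalization = data[idx[n_train + n_test:]]  # remaining 20%

print(len(train), len(test), len(generalization))
```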
Numerous empirical studies use a single hidden layer, as it is sufficient to approximate non-linear functions with a high degree of accuracy. However, this approach requires a high number of neurons, which slows the learning process. Networks with two hidden layers are sometimes more effective, especially for the prediction of high-frequency data.
This choice, besides being suggested by specific theory, is supported by experience, which shows that more than two hidden layers do not produce significant improvements in the results obtained by the network.
It should also be noted that an excessive number of neurons can generate overlearning, i.e. the neurons are unable to produce a reliable prediction because they reduce the contribution of the inputs; in these cases it is as if the network had "retained" the correct answers without being able to generalize. On the contrary, too low a number of neurons reduces the network's learning potential.
The formulas proposed in the literature are quite different, and in some cases contradictory:

h = 2·n + 1 (3)
h = 2·n (4)
h = n (5)
h = (n + m)/2 + √t (6)
where
h = number of hidden neurons;
n = number of input neurons;
m = number of output neurons;
t = number of observations contained in the training set.
Empirical results show that none of these rules is generalizable to every problem, although a non-negligible number of positive results seem to favor (5).
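For illustration, the four rules of thumb can be computed side by side; this small helper and the example values for n, m and t are assumptions, not from the chapter.

```python
import math

def hidden_neuron_rules(n, m, t):
    """Candidate hidden-layer sizes per Equations (3)-(6):
    n inputs, m outputs, t training observations."""
    return {
        "(3) 2n+1": 2 * n + 1,
        "(4) 2n": 2 * n,
        "(5) n": n,
        "(6) (n+m)/2 + sqrt(t)": (n + m) / 2 + math.sqrt(t),
    }

print(hidden_neuron_rules(n=8, m=1, t=400))
```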
There are several connection modes between the dif-
ferent layers.
• Standard Connections: Provide direct con-
nections, without feedback, between input and
output that pass through one or more hidden
layers.
• Jump Connections: The net can assign con-
nective weights even between neurons not in
adjacent layers.
• Multiple Connections: Provide for the possibility that neurons in the hidden layers return to the input variables through iterative processes, in order to quantify the connection weights precisely.
Several different functional forms (see Table 1) can govern the behavior of the neurons (linear, sinusoidal, Gaussian, etc.), and a different activation function can be defined for each layer.
There is no theoretically grounded rule for defining the activation function of the various layers. Although many studies adopt different functions for the individual layers, some use the same function for the input, hidden and output layers.
The linear function is typically used for the output
layer. Its use is less effective, however, in the hidden
layers, especially if these are characterized by a high
number of neurons.
The logistic and symmetric logistic functions vary, respectively, in the intervals [0, 1] and [−1, 1]. Some problems have dynamic characteristics that are captured more precisely by the symmetric function, especially in the input and hidden layers. Most of the empirical literature uses this function in the hidden layer, although without a strong theoretical motivation.
The hyperbolic tangent function allows the network to adapt reliably in the hidden layers (particularly in three-layer networks), especially when the analyst has chosen a logistic or linear function for the output.
The sinusoidal function is generally used in research, and it is suggested to normalize input and output in the range [−1, 1].
The Gaussian function lends itself to the identification of specific dynamic processes captured by architectures with two parallel hidden layers, with a tangent function in the second layer.
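The functional forms discussed above (collected in Table 1 below) translate directly into code. The following definitions are a sketch; the constant c of the "corrected tangent" is left as a free parameter with an assumed default.

```python
import numpy as np

activations = {
    "linear": lambda x: x,
    "logistic": lambda x: 1 / (1 + np.exp(-x)),                # range (0, 1)
    "logistic_symmetric": lambda x: 2 / (1 + np.exp(-x)) - 1,  # range (-1, 1)
    "tanh": np.tanh,
    "corrected_tanh": lambda x, c=1.5: np.tanh(c * x),         # c is a free parameter
    "sinusoidal": np.sin,
    "gaussian": lambda x: np.exp(-x**2),
    "inverse_gaussian": lambda x: 1 - np.exp(-x**2),
}

x = np.linspace(-2, 2, 5)
for name, f in activations.items():
    print(f"{name:>20}: {np.round(f(x), 3)}")
```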
After defining the initial characteristics of the neural network, you must define the criteria for stopping learning. If you want to build a neural network for forecasting purposes, it is preferable to assess its learning on the test set; if, instead, you want to adequately describe the studied phenomenon, it is preferable to assess learning on the training set. Furthermore, the learning parameters are related to the network's error indicators (average error, maximum error, number of epochs with no error improvement).
A further problem is the choice of the learning rule: in particular, it is necessary to decide the rate at which the network changes its weights in response to the error. Consider Figure 3: on the left, the network progresses with both learning and oscillation, at a rate that achieves the desired result more quickly; on the right, the network learns by changing its "path" at a lower rate and reaches the target in a longer time.
The learning rate η is normally initialized to the value:

η = [max(x) − min(x)] / N (7)

where x represents the training set of N input records. By compensating for the range of the input values x and their number N, this initial value of η should be adequate for many classification problems, regardless of the number of training samples and the range of their values. Note, however, that although higher values of η may accelerate the learning process, they can induce oscillations that slow convergence.
Alternatively, it is possible to make use of momentum, i.e. the proportion with which the last weight change made by the neural network is added to the new one. In this way, the network can learn at a higher rate without being likely to oscillate, since it carries over a high proportion of the last weight change; a sketch follows.
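The two ideas can be combined as follows: the learning rate is initialized per Equation (7), and a momentum term adds a proportion α of the previous weight change to the new one. The momentum coefficient, the quadratic stand-in gradient and all numeric values are illustrative assumptions.

```python
import numpy as np

x = np.array([0.2, 0.5, 1.3, 2.0])   # stand-in training inputs
eta = (x.max() - x.min()) / len(x)   # initial learning rate, Equation (7)
alpha = 0.9                          # momentum coefficient (assumed value)

w = np.zeros(3)                      # some weight vector
velocity = np.zeros_like(w)          # previous weight change

def grad_E(w):
    """Hypothetical gradient of the error with respect to w."""
    return 2 * (w - np.array([1.0, -0.5, 0.3]))

for step in range(200):
    # New change = momentum * previous change - eta * gradient.
    velocity = alpha * velocity - eta * grad_E(w)
    w = w + velocity

print(w)  # converges toward the (hypothetical) minimum [1.0, -0.5, 0.3]
```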
The analyst must then decide the level of the initial connection weights between neurons and assess whether the observations are characterized by high noise levels; in that case, the momentum value should be kept high. The network's learning process naturally proceeds through the progressive identification of the most suitable values for these parameters. It therefore requires a long series of attempts that can generate significantly different results; the suggestion is to avoid radical changes in the parameters.
Table 1. Activation functions

Function              Formula
Linear                f(x) = x
Logistic (sigmoid)    f(x) = 1 / (1 + e^(-x))
Symmetric logistic    f(x) = 2 / (1 + e^(-x)) − 1
Hyperbolic tangent    f(x) = (e^x − e^(-x)) / (e^x + e^(-x))
Corrected tangent     f(x) = tanh(c·x)
Sinusoidal            f(x) = sin(x)
Gaussian              f(x) = e^(-x²)
Inverse Gaussian      f(x) = 1 − e^(-x²)

Learning parameters are generally related to the network's error indicators:

• Mean error;
• Maximum error;
• Number of epochs with no error improvement.
By setting a value for these parameters, the network will stop when it reaches the desired value. It is usually easier to fix a large number of epochs (which are influenced by the number of observations in the training set) that the network may analyze without improving the error. A value between 10,000 and 50,000 epochs can be assumed safe enough to stop a network that can hardly learn more than it already has by that point. Of course, the choice also depends on the speed with which the network reaches these values.
A criterion for the acceptance of a network is convergence. If the errors are modest but oscillation has led to high divergence, it is advisable to check the adequacy of the parameters (in particular, the learning rate and momentum).
Once the correct temporal dynamics of the error
has been verified, at least in graphical terms, you must
measure it quantitatively. Among the various error
indicators developed in statistics, we point out:
1. Determination Index (R2)
2. Mean Absolute Error (MAE)
3. Mean Absolute Percentage Error (MAPE)
4. Mean Square Error (MSE)
5. Root Mean Square Error (RMSE)
These indicators measure, in various ways, the spread between the original output and the output estimated by the network. Only if the input neurons, activation functions and parameters described above were perfectly able to identify the original phenomenon would the difference between actual and estimated output be zero, thus optimizing the above-mentioned error indicators.
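These indicators can be computed directly from the actual and estimated outputs. The following sketch uses standard textbook formulas; the exact variants implemented in any given software package may differ, and the sample values are illustrative.

```python
import numpy as np

def error_indicators(actual, estimated):
    actual, estimated = np.asarray(actual, float), np.asarray(estimated, float)
    err = actual - estimated
    mse = np.mean(err ** 2)
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return {
        "R2": 1 - ss_res / ss_tot,                    # determination index
        "MAE": np.mean(np.abs(err)),                  # mean absolute error
        "MAPE": 100 * np.mean(np.abs(err / actual)),  # mean absolute % error
        "MSE": mse,                                   # mean square error
        "RMSE": np.sqrt(mse),                         # root mean square error
    }

print(error_indicators([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8]))
```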
Once the neural network has been properly built, you need to verify its forecasting "goodness". It is possible for a model to describe the training and test sets well but then prove entirely inadequate as regards generalization, i.e., in the forecasting case, prediction.
You therefore have to test the neural network on the generalization set with the techniques already described. First, you should measure the error indicators described above on the series never observed by the network. If these prove to be significantly worse and not acceptable on the basis of the original objectives, the network must be trained further until it reaches an acceptable forecasting ability.
Figure 3. Network processing with different learning rates
A neural network is a sophisticated statistical system (operating in parallel) with good noise immunity (Liu, Zhang, & Polycarpou, 2011): if some unit of the system were to malfunction, the network as a whole would suffer some performance degradation but would hardly crash outright.
The models produced by neural networks, although very efficient, cannot be explained in human symbolic language: the result must be accepted "as is" (black box). In other words, as opposed to an algorithmic system, a neural network is able to generate a valid result (or at least one with a high likelihood of being acceptable), but it is not possible to explain how and why this result has been generated.
As with any modeling algorithm, neural networks are efficient only if the predictor variables are chosen with care; moreover, they are not able to deal well with categorical variables (for example, the name of a city) with many distinct values. They require a training phase that fixes the weights of the individual connections, and this may take a long time if the number of samples and variables analyzed is very large. There are no theorems or models for defining the optimal network, so the success of a network depends very much on the experience of its maker.
Neural networks are typically used in contexts where the data may be partially incorrect, or where no analytical models are able to deal with the problem (Haykin, 1999). A typical use is in OCR software, in face recognition systems and, more generally, in systems that process data subject to errors or noise. They are also one of the most widely used tools in data mining analysis. Neural networks are a promising, powerful means for financial analysis (Gately, 1995) or weather forecasting (Zhang, Patuwo, & Hu, 1998). In recent years their importance has also greatly increased in the field of bioinformatics (Herrero et al., 2001), in which they can be used to search for functional and/or structural patterns in proteins and nucleic acids, and to analyze EEG data. If suitably presented with a long series of inputs (learning or training phase), the network is able to provide the most likely output. Recent studies (Panakkat & Adeli, 2007) have shown a good potential of neural networks in seismology, for locating earthquake epicenters and predicting their intensity.
Bandy, H. (1997). Developing a Neural Network Sys-
tem. AAII Working Paper.
Gately, E. (1995). Neural Networks for Financial
Forecasting. New York: John Wiley & Sons.
Haykin, S. (1999). Neural Networks: a comprehensive
foundation (2nd ed.). Upper Saddle River, NJ, USA:
Prentice Hall PTR.
Herrero, J., Valencia, A., & Dopazo, J. (2001). A hierarchical unsupervised growing neural network for clustering gene expression patterns. Bioinformatics, 17(2), 126–136. doi:10.1093/bioinformatics/17.2.126 PMID:11238068
Honkela, T., Duch, W., & Girolami, M. (Eds.). (2011). Artificial Neural Networks and Machine Learning: ICANN 2011, Parts 1-2. Springer.
Hornik, K., Stinchcombe, M., & White, H. (1989). Mul-
tilayer feedforward networks are universal approxima-
tors. Neural Networks, 2, 359–366. doi:10.1016/0893-
6080(89)90020-8
Liu, D., Zhang, H., & Polycarpou, M. (2011). Advances
in Neural Networks. ISNN 2011: 8th International
Symposium on Neural Networks, Guilin, China, May
29–June 1, 2011, Proceedings, Springer.
McCulloch, W. S., & Pitts, W. (1943). A logical calculus
of the ideas immanent in nervous activity. The Bulletin
of Mathematical Biophysics, 5, 115–133. doi:10.1007/
BF02478259
Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. Cambridge, MA, USA: MIT Press.
Panakkat, A., & Adeli, H. (2007). Neural network models for earthquake magnitude prediction using multiple seismicity indicators. International Journal of Neural Systems, 17(1), 13–33. doi:10.1142/S0129065707000890 PMID:17393560
Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65, 386–408. doi:10.1037/h0042519 PMID:13602029
Wong, K. W., Mendis, B. S. U., & Bouzerdoum, A. (Eds.). (2011). Neural Information Processing: Models and Applications. Springer.
Zhang, G., Patuwo, B. E., & Hu, M. Y. (1998). Forecasting with artificial neural networks: The state of the art. International Journal of Forecasting, 14(1), 35–62.
Artificial Intelligence: The term "artificial intel-
ligence" generally refers to the ability of a computer
to perform functions and reasoning typical of the
human mind. It covers the theory and techniques for
the development of algorithms that allow computers
to show an ability and/or intelligent activity, at least
in specific domains.
Artificial Neural Network (ANN): An artificial
neural network defines a mathematical model for the
simulation of a network of biological neurons (e.g.
human nervous system). It simulates different aspects
related to the behavior and capacity of the human brain,
such as: intelligent information processing; distributed
processing; high level of parallelism; faculty of learn-
ing, generalization and adaptation; high tolerance to
inaccurate (or wrong) information.
Learning: The acquisition or modification of knowledge (new or existing), behaviors, skills, values, or preferences; it may involve the synthesis of different types of information.
Mathematical Model: A mathematical model is a
model built using the language and tools of mathemat-
ics. A mathematical model is often constructed with
the aim to provide predictions on the future 'state' of
a phenomenon or a system.
Neuron: The neuron is the unit cell of nervous tissue, contributing to the formation of the nervous system (together with the neuroglial tissue and the vascular tissue).
Perceptron: A type of binary classifier that maps its inputs (a real-valued vector) to an output value (a real scalar). The perceptron may be considered the simplest model of feed-forward neural network, with the inputs directly feeding the output unit through weighted connections.
Synapse: The synapse (or synaptic junction) is a highly specialized structure that enables nerve cells (neurons) to communicate with one another or with other cells (muscle cells, sensory cells, or endocrine glands).