Neural network implementation and training

The purpose of this task is to help you understand how neural networks work and how to train them. The task is not meant to produce software that is competitive on the market: the neural network implementation to be prepared is not intended for any practical use beyond the sub-tasks given here.

Knowledge required to solve the problem: the structure of an artificial neuron, computing the output of a multi-layer perceptron (neural network), the error backpropagation algorithm, training and test sample sets, error calculation, and minimizing the cost function with the gradient method.

The task

1. Create a program NNSolutionOne that initializes the weights and bias values of a given neural network architecture. The weights should be initialized with random numbers drawn from a normal distribution with zero mean and 0.1 standard deviation. The bias values are initialized to 0. (1 point)

2. Create a program NNSolutionTwo that receives a neural network (MLP) architecture description as input and computes the network's output based on the input values and the weights. (2 points)

3. Implement the error backpropagation algorithm, completing the program NNSolutionThree. (2 points)

4. Implement neural network training. Complete the program NNSolutionFour so that it is able to train and validate the network based on an architecture description, training parameters, neural network weights, and a set of training samples read from its input. (2 points)

5. Based on the published spam email classification training set and using sub-tasks 1 and 4, create and train a neural network that is able to classify emails similar to the training samples with as few errors as possible. The training parameters, the neural network architecture, and the initial weights are free to choose; their proper configuration is necessary to solve the task successfully (see the details). The trained neural network must be submitted in a file named nn_solution_five.txt, which will be evaluated as the solution of this sub-task. (8 points - only if all previous sub-tasks were also solved)

Guide to solving the problem

Reviewing the guidelines below is recommended, since they help in understanding the task and greatly facilitate its solution.

Input (standard input, stdin)

The programs read their input from the standard input. Depending on the sub-task, the input may consist of several parts of different types.

Architecture

One line describes the neural network architecture: 2-10 whole numbers, separated by commas, giving the sizes of the neural network layers. The first value is the input dimension N (1-100), followed by the neuron counts L_i (1-100) of the hidden layers, and the last value is the output dimension M (1-100).

Example:

2,3,2,1
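As an illustration, a minimal Java sketch of parsing such a line (the class and variable names are our own, not prescribed by the task):

import java.util.Scanner;

public class ArchitectureDemo {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        // e.g. "2,3,2,1" -> layerSizes = [2, 3, 2, 1]
        String[] parts = in.nextLine().trim().split(",");
        int[] layerSizes = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            layerSizes[i] = Integer.parseInt(parts[i].trim());
        }
        int n = layerSizes[0];                     // input dimension N
        int m = layerSizes[layerSizes.length - 1]; // output dimension M
        System.out.println("N=" + n + ", M=" + m);
    }
}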

Neural network weights

The neural network weights are given in the following form. The weights of each neuron are listed in one row, separated by commas. Each neuron has as many values as the size of the previous layer plus 1: the i-th value is the weight of the output of the i-th neuron of the previous layer, and the last value is the bias. The rows of the neurons follow each other layer by layer, in the direction from the input to the output, including the output layer.

Example:

0.1,0.2,0
-0.3,0.4,-1
-1.1,2,0.5
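A minimal sketch of reading weights in this layout, assuming the architecture has already been parsed into an int[] layerSizes array as in the sketch above (all names are illustrative):

import java.util.Scanner;

public class WeightsDemo {
    /** w[l][j][i]: weight of neuron j of layer l+1 for input i; the last value is the bias. */
    static double[][][] readWeights(Scanner in, int[] layerSizes) {
        double[][][] w = new double[layerSizes.length - 1][][];
        for (int l = 1; l < layerSizes.length; l++) {
            // each neuron has (previous layer size + 1) values, bias last
            w[l - 1] = new double[layerSizes[l]][layerSizes[l - 1] + 1];
            for (int j = 0; j < layerSizes[l]; j++) {
                String[] parts = in.nextLine().trim().split(",");
                for (int i = 0; i < parts.length; i++) {
                    w[l - 1][j][i] = Double.parseDouble(parts[i].trim());
                }
            }
        }
        return w;
    }
}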

Input values, output values, and training samples

The description of the neural network inputs or training samples looks as follows. The first row contains an integer S (1 to 10,000), the number of input values or training samples.

The next S lines contain the input values or training samples, depending on the task. In the case of input values (without the desired answer), each line is an N-element vector (see N above), that is, exactly N real numbers separated by commas.

In the case of training samples, each line contains N + M real numbers. The first N of them are the input values of the training sample, and the remaining M values are the desired output.

If the sub-task expects neural network outputs as its result, the output must contain, after the value S, S further lines holding the outputs computed for the inputs, in the order of the inputs: each line is exactly one M-element vector, that is, M real numbers separated by commas.

Training parameters

One line describes the parameters of the training process. It lists three values, separated by commas: the number of training epochs E (1-10000), i.e. the number of times the entire training set is used during training; the learning rate μ, which scales the weight changes; and the training sample ratio R, a real number in (0-1), the fraction of training samples among all samples. The remaining samples are the validation samples (see sub-task 4).

Output (standard output, stdout)

The programs write their output to the standard output. Depending on the sub-task, the output may consist of several parts of different types. The output formats are identical to the input formats.

Mean squared errors

During training, the generalization capability of the network is measured on the validation samples. Training lasts exactly E epochs; the mean squared error on the validation set is calculated at the end of each epoch and printed on a new line. That is, exactly E lines are to be printed, each containing exactly one real number.

Task details

NNSolutionOne

The program receives its inputs in the following order: architecture

The program must print its output in the following order: architecture, weights

Initializing the weights of the neural network is a very important question. Think about what would happen if every weight were initialized to the same value: there would be no difference between the individual neurons, since every neuron would compute its own output by weighting all outputs of the previous layer with the same values, so there would be no differences among them. The gradient method would not be able to distinguish between them either: the same weight changes would be applied to each of them. For this reason, it is important that every weight of every neuron be initialized to a different value.

The magnitude of the initial weight values strongly influences the convergence speed of training. Initialize the weights so that the weighted sum falls near the nonlinear region of the activation function. A rule of thumb for this is initialization with normally distributed random numbers of zero mean and 0.1 standard deviation. Of course, this is only appropriate if the distribution of the elements of the input sample vectors also has zero mean and unit variance; in this problem we set aside checking and normalizing the input samples. Beyond this method there are more sophisticated initialization procedures that can be beneficial (especially Xavier initialization in the case of networks with many hidden layers), but we do not deal with them now.
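A minimal sketch of this initialization, assuming the int[] layerSizes array and the weight layout from the sketches above (the explicit seed is our own choice, added only for reproducibility):

import java.util.Random;

public class InitDemo {
    static double[][][] initWeights(int[] layerSizes, long seed) {
        Random rng = new Random(seed);
        double[][][] w = new double[layerSizes.length - 1][][];
        for (int l = 1; l < layerSizes.length; l++) {
            w[l - 1] = new double[layerSizes[l]][layerSizes[l - 1] + 1];
            for (int j = 0; j < layerSizes[l]; j++) {
                for (int i = 0; i < layerSizes[l - 1]; i++) {
                    // zero-mean normal random number with 0.1 standard deviation
                    w[l - 1][j][i] = 0.1 * rng.nextGaussian();
                }
                w[l - 1][j][layerSizes[l - 1]] = 0.0; // bias initialized to 0
            }
        }
        return w;
    }
}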

Example input and the corresponding output:

Input:
2,3,1

Output:
2,3,1
0.07694805,-0.04505534,0.0
0.07036076,-0.008341249,0.0
-0.064064674,0.1687909,0.0
0.049475513,0.05999728,0.059059735,0.0

NNSolutionTwo

The program receives its inputs in the following order: architecture, weights, inputs

The program must print its output in the following order: output values

When computing the output of the neural network (in this and in all other sub-tasks as well), the ReLU activation function (f(x) = max(0, x)) is applied to the hidden layer neurons, and linear neurons are used in the output layer.
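A minimal sketch of this forward pass under the weight layout above (ReLU on the hidden layers, linear output; the network in main matches the example below):

public class ForwardDemo {
    /** Computes the MLP output for one input vector. */
    static double[] forward(double[][][] w, double[] input) {
        double[] a = input;
        for (int l = 0; l < w.length; l++) {
            double[] next = new double[w[l].length];
            for (int j = 0; j < w[l].length; j++) {
                double z = w[l][j][a.length]; // the bias is the last value of the row
                for (int i = 0; i < a.length; i++) {
                    z += w[l][j][i] * a[i];
                }
                // ReLU on hidden layers, linear neurons on the output layer
                next[j] = (l < w.length - 1) ? Math.max(0.0, z) : z;
            }
            a = next;
        }
        return a;
    }

    public static void main(String[] args) {
        double[][][] w = {
            {{1, 0, -0.5}, {0, 1, -0.5}, {1, 1, -1}}, // hidden layer, bias last
            {{2, 2, -2, 0}}                           // output neuron
        };
        System.out.println(forward(w, new double[]{1, 1})[0]); // prints 0.0
    }
}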

Example input and the corresponding output:

Input:
2,3,1
1,0,-0.5
0,1,-0.5
1,1,-1
2,2,-2,0
4
0,0
0,1
1,0
1,1

Output:
4
0.0
1.0
1.0
0.0

NNSolutionThree

The program receives its inputs in the following order: architecture (M = 1), weights, input (S = 1)

The program must print its output in the following order: architecture, weights - with the partial derivatives in place of the weights and biases

The task is to calculate the partial derivatives of the neural network output with respect to the weights and bias values at the specified input. Think of the output calculation of the neural network as a function y = MLP(x, w, b). In this sub-task, the partial derivatives of this function with respect to w and b (weights and biases) have to be calculated at the given value of x. These partial derivatives will be used in the next sub-task for adjusting the weights.

Definitely perform the calculation of the partial derivatives analytically. Take advantage of the chain rule: for faster calculation, store intermediate results in accordance with the backpropagation algorithm.

Use the finite difference method to verify that the analytical calculation is correct. This is optional, but recommended. The derivative of ReLU does not exist at x = 0; in practice this is not a problem, for the sake of uniformity let ReLU'(0) = 0.
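A minimal sketch of the backward pass for this M = 1 case, reusing the conventions of the sketches above (the gradient array mirrors the weight layout, bias derivative last; the main method also shows the optional finite-difference check, using the hypothetical ForwardDemo.forward helper from the earlier sketch):

public class BackpropDemo {
    /** Partial derivatives of the scalar output y = MLP(x, w, b) w.r.t. all weights and biases. */
    static double[][][] gradients(double[][][] w, double[] x) {
        int layers = w.length;
        double[][] a = new double[layers + 1][]; // a[0] is the input, a[l+1] the layer outputs
        a[0] = x;
        for (int l = 0; l < layers; l++) {       // forward pass, storing activations
            a[l + 1] = new double[w[l].length];
            for (int j = 0; j < w[l].length; j++) {
                double z = w[l][j][a[l].length];
                for (int i = 0; i < a[l].length; i++) z += w[l][j][i] * a[l][i];
                a[l + 1][j] = (l < layers - 1) ? Math.max(0.0, z) : z;
            }
        }
        double[][][] g = new double[layers][][];
        double[] delta = {1.0};                  // dy/dz of the single linear output neuron
        for (int l = layers - 1; l >= 0; l--) {  // backward pass (chain rule)
            g[l] = new double[w[l].length][];
            double[] prev = new double[a[l].length];
            for (int j = 0; j < w[l].length; j++) {
                g[l][j] = new double[a[l].length + 1];
                for (int i = 0; i < a[l].length; i++) {
                    g[l][j][i] = delta[j] * a[l][i];  // dy/dw_ji
                    prev[i] += delta[j] * w[l][j][i]; // accumulate dy/da_i
                }
                g[l][j][a[l].length] = delta[j];      // dy/db_j
            }
            if (l > 0) // multiply by ReLU'(z) of the previous hidden layer; ReLU'(0) := 0
                for (int i = 0; i < prev.length; i++)
                    if (a[l][i] <= 0.0) prev[i] = 0.0;
            delta = prev;
        }
        return g;
    }

    public static void main(String[] args) {
        double[][][] w = {{{1, 0, -0.5}, {0, 1, -0.5}, {1, 1, -1}}, {{2, 2, -2, 0}}};
        double[] x = {0.75, 0.75};
        double g0 = gradients(w, x)[0][0][0];
        double eps = 1e-6;                       // finite-difference check of one derivative
        w[0][0][0] += eps; double yp = ForwardDemo.forward(w, x)[0];
        w[0][0][0] -= 2 * eps; double ym = ForwardDemo.forward(w, x)[0];
        System.out.println(g0 + " ~ " + (yp - ym) / (2 * eps)); // both approximately 1.5
    }
}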

Example input and the corresponding output:

Input:
2,3,1
1,0,-0.5
0,1,-0.5
1,1,-1
2,2,-2,0
1
0.75,0.75

Output:
2,3,1
1.5,1.5,2.0
1.5,1.5,2.0
-1.5,-1.5,-2.0
0.25,0.25,0.5,1.0

NNSolutionFour

The program receives its inputs in the following order: training parameters, architecture, weights, training samples

The program must print its output in the following order: mean squared errors, architecture, weights

During training, use only a portion of the input samples for actual training. The number of training samples is the total number of samples (S) multiplied by the training sample ratio (R), rounded down: S_t = floor(S * R). Select the first S_t of the input samples as training samples. The remaining S_v = S - S_t samples are the validation samples. (For example, S = 8 and R = 0.5 gives S_t = 4 training and S_v = 4 validation samples, as in the worked example below.)

Go through the training samples line by line, in order, exactly once per epoch. Perform the weight change according to the delta rule for each training sample one by one: that is, change each weight by the learning rate times the partial derivative of the error with respect to that weight, in the direction that decreases the error. This is the simplest, so-called stochastic gradient algorithm; it has certain drawbacks, but getting to know more sophisticated methods (e.g. minibatch) is not possible in the limited time available.

At the end of the epoch, i.e. when a training step has been performed for all S_t training samples, validate the neural network on the validation samples: calculate the network output for every validation sample, then calculate the mean squared error between the outputs and the desired responses of the validation samples, averaging both over the validation samples (S_v) and over the output dimensions (M).
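A minimal sketch of such a training run, reusing the hypothetical forward and gradients helpers from the sketches above. For brevity it assumes M = 1 (the general case sums the error over all M outputs) and a per-sample cost E = (y - d)^2, whose gradient is 2(y - d) * dy/dw; this convention is consistent with the worked example below.

import java.util.Arrays;

public class TrainDemo {
    /** One stochastic-gradient training run; returns the validation MSE of each epoch. */
    static double[] train(double[][][] w, double[][] samples, int n,
                          int epochs, double mu, double r) {
        int s = samples.length;
        int st = (int) Math.floor(s * r); // the first St samples train, the rest validate
        double[] mse = new double[epochs];
        for (int e = 0; e < epochs; e++) {
            for (int p = 0; p < st; p++) { // delta rule, one training sample at a time
                double[] x = Arrays.copyOfRange(samples[p], 0, n);
                double err = ForwardDemo.forward(w, x)[0] - samples[p][n];
                double[][][] g = BackpropDemo.gradients(w, x);
                for (int l = 0; l < w.length; l++)
                    for (int j = 0; j < w[l].length; j++)
                        for (int i = 0; i < w[l][j].length; i++)
                            w[l][j][i] -= mu * 2.0 * err * g[l][j][i]; // dE/dw, E = (y-d)^2
            }
            double sum = 0.0; // validation at the end of the epoch
            for (int p = st; p < s; p++) {
                double[] x = Arrays.copyOfRange(samples[p], 0, n);
                double diff = ForwardDemo.forward(w, x)[0] - samples[p][n];
                sum += diff * diff;
            }
            mse[e] = sum / (s - st); // averaged over Sv (and over M, here M = 1)
        }
        return mse;
    }
}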

Example input and the corresponding output:

Input:
1,0.1,0.5
2,3,1
1,0,-0.4
0,1,-0.5
0.9,1,-1
2,1.9,-2,0
8
0,0,0
0,1,1
1,0,1
1,1,0
0.2,0.2,0
0.2,0.8,1
0.8,0.2,1
0.8,0.8,0

Output:
0.17536075
2,3,1
0.8978304,-0.018169597,-0.5021696
-0.017527387,1.0014727,-0.4985274
0.9184014,1.0184015,-0.98159856
1.9708253,1.9000499,-2.0082808,-0.041200735

A detailed, step-by-step solution is available for the above example.

nn_solution_five.txt

The file must contain the parameters of the trained neural network in the following order: architecture, weights (practically, the end of the output of sub-task 4).

The published data set (training set) is derived from the classic Spambase data set; the task is a spam email classification problem: decide, based on given properties of an e-mail, whether that e-mail is spam or not. The properties on which the classification is based are not important for us, but those who are curious can read about them here. The website of the Spambase data set can be found here.

The published file contains all 4500 samples of the data set; each line corresponds to a single e-mail and contains its 57 properties, separated by commas (the first 57 columns). The last, 58th column contains the class of the e-mail, which tells whether the e-mail is spam or not. The task is to train a neural network that can classify e-mails not included in the training set, according to these properties, with as good results as possible.

The solution will be checked by using the submitted neural network on test samples not included in the training set. A maximum of 8 points is available for this sub-task; the score obtained depends on the performance of the neural network. A misclassification rate of 22% is worth 1 point, and every further 2% improvement is worth +1 point; the maximum 8 points is given for an error rate of 8% or less.

Tips for the solution

Solving the task is not difficult, but choosing the network size and the training parameters may require several attempts. Examine neural networks of various sizes (number of weights: between 10 and 10000, spread over 1-3 hidden layers) and train them on the training samples.

During training, watch out for the phenomenon of overtraining/overfitting. To do this, divide the published samples into training and validation samples, for example in an 80%-20% ratio. Train on the resulting training samples; the validation samples must not be used for training, but for verifying the performance of the network. If you notice that the error on the validation samples no longer decreases, the network is probably in the overfitting phase; in this case you should stop training, because the generalization capability of the network will not improve further, and in unfortunate cases it may even decrease.

Typically 250-5000 epochs may be required for training. This value also depends on the learning rate. Such a long training may take from a few minutes up to a quarter of an hour, depending on the speed of your computer and the efficiency of the implementation.

When choosing the value of the learning rate, keep two things in mind: too small a learning rate makes training unreasonably slow, while too large a learning rate may even make training diverge (increasing error values, then NaN). A typical learning rate value: 0.01.

Example of a file to be submitted:

57,2,1
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0
2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,0
1,-0.5,0

Evaluation

The submitted solutions are checked automatically, and feedback about their correctness or their errors is given within a short time. After that, it is possible to fix the solution and submit a new program. A perfect solution is worth 15 points. Any number of solutions may be submitted before the task deadline. The score of the last submitted solution will be the final score that counts toward the end-of-semester grade.

Technical details

The program must be written in Java and must be submitted on the HF portal by the specified deadline.

The submitted Java source code must contain classes with the same names as the program names (i.e. NNSolutionOne, etc.), in which the main function solving the given sub-task can be found. The program may consist of any number of source files. The program may not use any resources other than the standard input and output, so it may not perform file operations and may not open network connections.

Submit the source code of the programs and the nn_solution_five.txt file describing the neural network of sub-task 5 compressed into a single zip file. The programs are tested automatically on several different inputs, i.e. on several different neural network architectures and training sets: the output values of the submitted neural network implementation must match the reference values within a given error tolerance.

The source code must be written entirely by yourself; no external source may be used (including another student's work or class libraries from the web). An exception is the Apache Commons Math library, which may be used for linear algebra operations. If you use this library, submit the source code of the necessary classes rather than the .jar file.

Attention! The submitted solutions will be checked for plagiarism at the end of the semester. The authors of solutions that are identical in part or in whole will be denied the signature for the subject.

Engedy István Tamás, September 5, 2016 10:54 | Last updated: October 2, 2016, 20:18
