International Journal of Academic Information Systems Research (IJAISR) ISSN: 2000-002X Vol. 2 Issue 10, October – 2018, Pages: 5-23 www.ijeais.org/ijaisr

ANN for Predicting MPG Automobile

Mohammed N. Jamala
Department of Information Technology, Faculty of Engineering & Information Technology, Al-Azhar University Gaza, Palestine

Abstract: In this research, an artificial neural network (ANN) was trained to predict the miles per gallon (MPG) rating of present and future automobiles, aiming for the most accurate possible estimate of the actual value in order to support the later design and manufacture of automobiles. The ANN was trained to learn the relationship between the attributes described below and MPG, that is, the mathematical combination of the inputs from which the system can derive the MPG value. Training explicitly used the gradient descent algorithm with the normalized squared error, and implicitly the parameters-norm regularization, a scaling layer and a bounding layer. The resulting system should be able to produce robust approximations of the actual output.

Keywords: ANN, artificial neural network, automobile, fuel efficiency, predict fuel consumption

1. INTRODUCTION
While the thermal efficiency (mechanical output relative to the chemical energy in the fuel) of petroleum engines has increased since the beginning of the automotive era to a current maximum of 36.4%, this is not the only factor in fuel economy. The design of the automobile as a whole and its usage pattern also affect fuel economy. Published fuel economy figures vary between jurisdictions because of differences in testing protocols. One of the first studies to determine fuel economy in the United States was the Mobil Economy Run, an event that took place every year from 1936 (except during World War II) to 1968.
It was designed to provide real fuel efficiency numbers during a coast-to-coast test on real roads, with regular traffic and weather conditions. The Mobil Oil Corporation sponsored it, and the United States Auto Club (USAC) sanctioned and operated the run. In more recent studies, the average fuel economy of new passenger cars in the United States rose from 17 mpg in 1978 to more than 22 mpg in 1982 [2]. The average fuel economy in 2008 for new cars, light trucks and SUVs in the United States was 26.4 mpgUS (8.9 L/100 km) [3]. 2008 model year cars classified as "midsize" by the US EPA ranged from 11 to 46 mpgUS (21 to 5 L/100 km) [4]. However, because of environmental concerns about CO2 emissions, new EU regulations were introduced to reduce the average emissions of cars sold from 2012 onward to 130 g/km of CO2, equivalent to 4.5 L/100 km (52 mpgUS, 63 mpgimp) for a diesel-fueled car and 5.0 L/100 km (47 mpgUS, 56 mpgimp) for a gasoline (petrol)-fueled car [5]. The average consumption across the fleet is not immediately affected by new-vehicle fuel economy: for example, Australia's car fleet average in 2004 was 11.5 L/100 km (20.5 mpgUS) [6], compared with an average new-car consumption in the same year of 9.3 L/100 km (25.3 mpgUS) [7].

Problem Statement
In this project we train an Artificial Neural Network (ANN) to give better estimates of MPG ratings across the automobile industry. The network used is a feed-forward network. Such estimates would help to reduce the effort needed to design and analyze automobile fuel consumption from a limited set of attributes: cylinders, displacement, horsepower, weight, acceleration, model year and origin of manufacture.

2. ARTIFICIAL NEURAL NETWORKS "ANN"
The historical perspective
Warren McCulloch and Walter Pitts [8] (1943) created a computational model for neural networks based on mathematics and algorithms, called threshold logic.
This model paved the way for neural network research to split into two approaches: one focused on biological processes in the brain, while the other focused on the application of neural networks to artificial intelligence. This work led to research on nerve networks and their link to finite automata [9].

Artificial Neural Network
Artificial Neural Networks are computing algorithms that can solve complex problems by imitating animal brain processes in a simplified manner [10]. Perceptron-type neural networks consist of artificial neurons, or nodes, which are information processing units arranged in layers and interconnected by synaptic weights (connections). Neurons filter and transmit information in a supervised fashion in order to build a predictive model that classifies data stored in memory. A typical ANN model is a three-layered network of interconnected nodes: the input layer, the hidden layer, and the output layer. The nodes between the input and output layers can form one or more hidden layers. Every neuron in one layer has a link to every neuron in the next layer, but neurons belonging to the same layer have no connections between them (Figure 1). The input layer receives information from the outside world, the hidden layer performs the information processing, and the output layer produces the class label or predicts continuous values. The values from the input layer entering a hidden node are multiplied by weights, a set of numbers determined during training, and the products are then added to produce a single number. This number is passed as an argument to a nonlinear mathematical function, the activation function, which returns a number between 0 and 1 [11].

Figure 1. Neural network architecture.
Figure 2. Neural network active node.
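The layer-by-layer computation just described (a weighted sum of the previous layer's outputs, followed by a sigmoid activation) can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: weights are stored as nested lists with `weights[j][i]` linking input i to neuron j, each node has a bias term (as described in Section 4), and the example weights are made up. It is not the code used by EasyNN-Plus or Neural Designer.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights[j][i], biases[j])


def sigmoid(x: float) -> float:
    """Logistic activation; returns a value between 0 and 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # numerically safer branch for large negative x
    return z / (1.0 + z)


def layer_forward(inputs: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer: every input feeds every neuron of this layer."""
    weights, biases = layer
    outputs = []
    for w_row, b in zip(weights, biases):
        net = sum(w * x for w, x in zip(w_row, inputs)) + b  # weighted sum plus bias
        outputs.append(sigmoid(net))                          # activation
    return outputs


def network_forward(inputs: Sequence[float], layers: Sequence[Layer]) -> List[float]:
    """Propagate one input pattern through the layers in order (no intra-layer links)."""
    activations = list(inputs)
    for layer in layers:
        activations = layer_forward(activations, layer)
    return activations


# Hypothetical 2:2:1 network with hand-picked weights, for illustration only.
example_layers = [
    ([[0.5, -0.3], [0.8, 0.1]], [0.0, -0.2]),  # hidden layer (2 neurons)
    ([[1.0, -1.0]], [0.1]),                     # output layer (1 neuron)
]
print(network_forward([1.0, 2.0], example_layers))
```

Because neurons in the same layer are not connected, each layer can be computed entirely from the previous layer's outputs, which is why a simple loop over layers suffices.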
In Fig. 2, the net sum of the weighted inputs entering a node j and the output activation function that converts a neuron's weighted input into its output activation (the most commonly used is the sigmoid function) are given by the following equations, respectively:

$$net_j = \sum_{i} w_{ij}\, x_i + b_j$$

$$o_j = \frac{1}{1 + e^{-net_j}}$$

where x_i are the inputs, w_{ij} is the weight of the connection from input i to node j, and b_j is the bias of node j.

The neuron, and therefore the ANN, has two modes of operation: the training mode and the using mode. During the training phase, a data set with actual inputs and outputs is used as examples to teach the system how to predict outputs. This supervised learning begins with random weights and, by using gradient descent search algorithms such as backpropagation, adjusts the weights to the task at hand. The difference between the target output values and the obtained values is used in the error function to drive learning [12]. The error function depends on the weights, which need to be modified in order to minimize the error. For a given training set {(x_1, t_1), (x_2, t_2), ..., (x_k, t_k)} consisting of k ordered pairs of n- and m-dimensional vectors (n inputs, m outputs), which are called the input and output patterns, the error for the output of each neuron can be defined by the equation:

$$E_j = \frac{1}{2}\left(o_j - t_j\right)^2$$

while the error function of the network that must be minimized is given by:

$$E = \sum_{j=1}^{k} E_j = \frac{1}{2}\sum_{j=1}^{k}\left(o_j - t_j\right)^2$$

where o_j is the output produced when the input pattern x_j from the training set enters the network, and t_j is the target value [13]. During the training mode, each weight is changed by adding to its previous value the quantity

$$\Delta w_{ij} = -\gamma \frac{\partial E}{\partial w_{ij}}$$

where γ is a constant called the learning rate. The higher the learning rate, the faster the convergence, but the search path may oscillate around the optimal solution, and convergence may become impossible.
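The training mode above can be illustrated, under simplifying assumptions, with a single sigmoid neuron trained by batch gradient descent on E = ½Σ(o_j − t_j)², using the update Δw = −γ ∂E/∂w. The toy patterns, the learning rate of 0.5, the initialization range and the function names are hypothetical choices for illustration; a multi-layer network would additionally back-propagate the error terms through its hidden layers.

```python
import math
import random
from typing import List, Sequence, Tuple

Pattern = Tuple[Sequence[float], float]


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def output(w: List[float], b: float, x: Sequence[float]) -> float:
    """o = sigmoid(net), with net = sum_i w_i * x_i + b."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def network_error(patterns: Sequence[Pattern], w: List[float], b: float) -> float:
    """E = 1/2 * sum_j (o_j - t_j)^2 over all k training patterns."""
    return 0.5 * sum((output(w, b, x) - t) ** 2 for x, t in patterns)


def train_neuron(patterns: Sequence[Pattern], learning_rate: float = 0.5,
                 epochs: int = 2000, seed: int = 0) -> Tuple[List[float], float]:
    """Batch gradient descent for one sigmoid neuron, starting from random weights."""
    rng = random.Random(seed)
    n = len(patterns[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n)]
    b = rng.uniform(-0.5, 0.5)
    for _ in range(epochs):
        grad_w = [0.0] * n
        grad_b = 0.0
        for x, t in patterns:
            o = output(w, b, x)
            delta = (o - t) * o * (1.0 - o)  # dE_j/dnet, using sigmoid' = o(1 - o)
            for i in range(n):
                grad_w[i] += delta * x[i]
            grad_b += delta
        # Each weight gets Delta w = -gamma * dE/dw added to its previous value.
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b


# Toy patterns (a soft logical OR), not data from this study.
patterns = [([0.0, 0.0], 0.1), ([0.0, 1.0], 0.9), ([1.0, 0.0], 0.9), ([1.0, 1.0], 0.9)]
w0, b0 = train_neuron(patterns, epochs=0)  # initial random weights only
w, b = train_neuron(patterns)              # same seed, 2000 epochs
print(network_error(patterns, w0, b0), network_error(patterns, w, b))
```

Training lowers the network error from its random-start value; the exact numbers depend on the seed and the learning rate.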
Once a set of good weights has been found, the neural network model can take another dataset with unknown output values and automatically predict the corresponding outputs.

3. DEFINITIONS
Cylinders
A cylinder is the central working part of a reciprocating engine or pump, the space in which a piston travels [12]. Multiple cylinders are commonly arranged side by side in a bank, or engine block, which is typically cast from aluminum or cast iron before receiving precision machine work. Cylinders may be sleeved (lined with a harder metal) or sleeveless (with a wear-resistant coating such as Nikasil). A sleeveless engine may also be referred to as a "parent-bore engine" [13].
A cylinder's displacement, or swept volume, can be calculated by multiplying its cross-sectional area (the square of half the bore multiplied by pi) by the distance the piston travels within the cylinder (the stroke). The engine displacement can be calculated by multiplying the swept volume of one cylinder by the number of cylinders. Presented symbolically:

$$\text{Displacement} = \pi \left(\frac{\text{bore}}{2}\right)^2 \times \text{stroke} \times \text{number of cylinders}$$

A piston is seated inside each cylinder by several metal piston rings [12] fitted around its outside surface in machined grooves, typically two for compression sealing and one to seal the oil. The rings make near contact with the cylinder walls (sleeved or sleeveless), riding on a thin layer of lubricating oil, which is essential to keep the engine from seizing and requires a durable cylinder-wall surface.
During the earliest stage of an engine's life, its initial breaking-in or running-in period, small irregularities in the metals are encouraged to gradually form congruent grooves by avoiding extreme operating conditions. Later in its life, after mechanical wear has increased the spacing between the piston and the cylinder (with a consequent decrease in power output), the cylinders may be machined to a slightly larger diameter to receive new sleeves (where applicable) and piston rings, a process sometimes known as reboring.
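The displacement formula can be evaluated directly. The following is a minimal sketch, assuming bore and stroke are given in the same length unit; the engine dimensions in the usage line are hypothetical, and the cubic-inch factor is included only to convert to CID, one of the units listed in Section 3.1.

```python
import math

CC_PER_CUBIC_INCH = 16.387064  # exact: 1 in^3 = 16.387064 cm^3


def engine_displacement(bore: float, stroke: float, cylinders: int) -> float:
    """Total swept volume: pi * (bore / 2)^2 * stroke * number of cylinders.

    The result is in the cube of the length unit used for bore and stroke
    (cm gives cm^3, i.e. cc; inches give cubic inches)."""
    swept_per_cylinder = math.pi * (bore / 2.0) ** 2 * stroke
    return swept_per_cylinder * cylinders


# Hypothetical engine: 8.6 cm bore, 8.6 cm stroke, 4 cylinders.
cc = engine_displacement(8.6, 8.6, 4)
print(round(cc), "cc =", round(cc / CC_PER_CUBIC_INCH, 1), "cubic inches")
```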
3.1 Displacement
Engine displacement is the swept volume of all the pistons inside the cylinders of a reciprocating engine in a single movement from top dead center (TDC) to bottom dead center (BDC). It is commonly specified in cubic centimeters (cc or cm³), liters (L), or cubic inches (CID). Engine displacement does not include the total volume of the combustion chamber. Engine displacement is determined from the bore and stroke of an engine's cylinders. The bore is the diameter of the circular chambers cut into the cylinder block.
Horsepower
Horsepower is a unit of measurement of power (the rate at which work is done). There are many different standards and types of horsepower. Two common definitions in use today are the mechanical horsepower (or imperial horsepower), which is about 745.7 watts, and the metric horsepower, which is approximately 735.5 watts.
3.2 Weight
The weight of an object is related to the amount of force acting on the object, either due to gravity or to a reaction force that holds it in place. 1 lb = 0.4535923 kg.
3.3 Acceleration
Acceleration is the rate of change of the velocity of an object with respect to time. An object's acceleration is the net result of any and all forces acting on the object, as described by Newton's Second Law [14]. The SI unit for acceleration is the meter per second squared (m/s²).
3.4 Model Year
The (Gregorian) year in which the car was built.
3.5 Origin
The country in which the automobile was manufactured.
3.6 MPG
The fuel economy of an automobile is the relationship between the distance traveled and the amount of fuel consumed by the vehicle. Consumption can be expressed in terms of the volume of fuel needed to travel a distance, or the distance traveled per unit volume of fuel consumed.
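Because MPG (distance per volume) and L/100 km (volume per distance) are reciprocal, converting between them is a division rather than a simple scaling. The sketch below collects the conversion factors behind the figures quoted in this paper; the constants are the exact definitional values, while the function names are our own.

```python
# Conversion factors, exact by definition.
LITRES_PER_US_GALLON = 3.785411784
LITRES_PER_IMPERIAL_GALLON = 4.54609
KM_PER_MILE = 1.609344
KG_PER_LB = 0.45359237
WATTS_PER_MECHANICAL_HP = 745.69987158227022  # 550 ft*lbf/s
WATTS_PER_METRIC_HP = 735.49875               # 75 kgf*m/s


def mpg_us_to_l_per_100km(mpg: float) -> float:
    """Miles per US gallon (distance per fuel) -> litres per 100 km (fuel per distance)."""
    return 100.0 * LITRES_PER_US_GALLON / (mpg * KM_PER_MILE)


def l_per_100km_to_mpg_us(l_per_100km: float) -> float:
    """Inverse conversion; the relation is reciprocal, not linear."""
    return 100.0 * LITRES_PER_US_GALLON / (l_per_100km * KM_PER_MILE)


def l_per_100km_to_mpg_imp(l_per_100km: float) -> float:
    """Litres per 100 km -> miles per imperial gallon."""
    return 100.0 * LITRES_PER_IMPERIAL_GALLON / (l_per_100km * KM_PER_MILE)


def hp_to_watts(hp: float, metric: bool = False) -> float:
    return hp * (WATTS_PER_METRIC_HP if metric else WATTS_PER_MECHANICAL_HP)


def lb_to_kg(lb: float) -> float:
    return lb * KG_PER_LB


# Figures quoted in the Introduction, recomputed.
print(round(mpg_us_to_l_per_100km(26.4), 1))
print(round(l_per_100km_to_mpg_us(5.0)), round(l_per_100km_to_mpg_imp(5.0)))
```

The two print statements recompute the 8.9 L/100 km equivalent of the 2008 US average of 26.4 mpgUS, and the 47 mpgUS / 56 mpgimp equivalents of 5.0 L/100 km quoted in the Introduction.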
Since fuel consumption of vehicles is a significant factor in air pollution, and since importation of motor fuel can be a large part of a nation's foreign trade, many countries impose requirements for fuel economy. Different methods are used to approximate the actual performance of the vehicle. The energy in fuel is required to overcome various losses (wind resistance, tire drag, and others) encountered while propelling the vehicle, and to provide power to vehicle systems such as ignition or air conditioning. Various strategies can be employed to reduce losses at each of the conversions between the chemical energy in the fuel and the kinetic energy of the vehicle. Driver behavior can also affect fuel economy; maneuvers such as sudden acceleration and heavy braking waste energy.
Fuel economy is the relationship between the distance traveled and the fuel consumed. Fuel economy can be expressed in two ways:
Units of fuel per fixed distance: generally expressed as liters per 100 kilometers (L/100 km), used in most European countries, China, South Africa, Australia and New Zealand. British and Canadian law allow the use of either liters per 100 kilometers or miles per imperial gallon [15]. Recently, the window sticker on new US cars has started displaying the vehicle's fuel consumption in US gallons per 100 miles, in addition to the traditional MPG number [16].
The Corporate Average Fuel Economy (CAFE) regulations in the United States, first enacted by Congress in 1975 [17], are federal regulations intended to improve the average fuel economy of cars and light trucks (trucks, vans and sport utility vehicles) sold in the US in the wake of the 1973 Arab Oil Embargo.
Historically, it is the sales-weighted average fuel economy of a manufacturer's fleet of current model year passenger cars or light trucks manufactured for sale in the United States. Under the Truck CAFE standards for 2008–2011, this changes to a "footprint" model, in which larger trucks are allowed to consume more fuel. The standards were limited to vehicles under a certain weight, but those weight classes were expanded in 20S'.

4. MATERIALS AND DESIGN METHOD
The EasyNN-Plus program was used to develop the ANN model. The structure of the neural network was set as feedforward, in which each layer connects only to the previous layer. ANN training used 80% of the 398 cases, and 20% of the cases were selected for the validation set. All cases were randomly selected by the EasyNN-Plus program. The parameters considered in the input layer for the neural network training can be seen in the following adjacent figures, combined with other parameters describing the combinations of the tic-tac-toe game regression. A total of eight input parameters were used in the development of the ANN model (see Appendix). Several configurations of ANN were tested in order to find the best-performing combination of number of hidden layers and nodes per layer. In all configurations, the following sigmoid function was used as the activation function to smooth the output signal of each node:

$$f(x) = \frac{1}{1 + e^{-x}}$$

where x is the sum of the weighted inputs from each previous node plus the bias of the node itself. ANN results were evaluated based on the coefficient of determination of all strategies possible to the game mind and the mean bias error. The role of uncertainty in the ANN predictions was investigated with the support of additional ANNs, in which the uncertain parameters were not included as input variables.
The accuracy of these ANNs was then compared to the accuracy of the original ANN (which includes all eight input parameters). Once the role of uncertainty was clarified, the sensitivity of the ANN to the different input parameters was investigated. For this purpose, the ANN was used to calculate one additional variation for each of the 937 original simulation cases. In these variations, only one parameter was changed at a time, using the 8 discrete input values. This procedure allowed the calculation of individual changes in the single output due to individual changes in each of the inputs. Uncertainty was not taken into account in the sensitivity analysis.
Technique and Description
The gradient descent formula was used to measure and evaluate each candidate update step.
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point. If instead one takes steps proportional to the positive of the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent. The gradient descent formula is:

$$x_{n+1} = x_n - \gamma \nabla F(x_n)$$

where γ is the step size (learning rate).

5. DATA ANALYSIS (USING NEURAL DESIGNER)
5.1 Data Set
5.1.1 Task description
The data set contains the information for creating the predictive model. It comprises a data matrix in which columns represent variables and rows represent instances. Variables in a data set can be of three types: the inputs are the independent variables; the targets are the dependent variables; the unused variables are used neither as inputs nor as targets.
Additionally, instances can be: training instances, which are used to construct the model; selection instances, which are used for selecting the optimal order; testing instances, which are used to validate the functioning of the model; and unused instances, which are not used at all.
5.1.2 Data preview table
The next table shows a preview of the data matrix contained in the file PAutoMpg.xlsx. Here, the number of variables is 8, and the number of instances is 477.
5.1.3 Variables table
The following table depicts the names, units, descriptions and uses of all the variables in the data set. The numbers of inputs, targets and unused variables here are 7, 1, and 0, respectively.
5.1.4 Variables bars chart
The next chart illustrates the use of the variables. It depicts the numbers of inputs (7), targets (1) and unused variables (0).
5.1.5 Instances pie chart
The following pie chart details the uses of all the instances in the data set. The total number of instances is 477. The number of training instances is 287 (60.2%), the number of selection instances is 95 (19.9%), the number of testing instances is 95 (19.9%), and the number of unused instances is 0 (0%).
5.1.6 Missing values results
There are no missing values in the data set.
5.2 Data statistics
5.2.1 Task description
Basic statistics are very valuable information when designing a model, since they might alert to the presence of spurious data. It is essential to check the correctness of the most important statistical measures of every single variable.
5.2.2 Data statistics results
The table below shows the minimums, maximums, means and standard deviations of all the variables in the data set.
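A minimal sketch of the two steps described above, the instance split and the per-variable statistics. It assumes a simple shuffled split (Neural Designer's own selection procedure is not described here) and the sample standard deviation (n − 1 denominator), which the text does not specify; the function names are hypothetical. With 477 instances and a 60/20/20 split, rounding the selection and testing shares and giving the remainder to training yields the 287/95/95 counts reported in 5.1.5.

```python
import random
import statistics
from typing import Dict, List, Sequence, Tuple


def split_instances(n: int, train: float = 0.6, selection: float = 0.2,
                    seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle instance indices and split them into training/selection/testing."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_selection = round(selection * n)
    n_testing = round((1.0 - train - selection) * n)
    n_training = n - n_selection - n_testing  # training takes the remainder
    return (indices[:n_training],
            indices[n_training:n_training + n_selection],
            indices[n_training + n_selection:])


def column_statistics(values: Sequence[float]) -> Dict[str, float]:
    """Minimum, maximum, mean and (sample) standard deviation of one variable."""
    return {
        "minimum": min(values),
        "maximum": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
    }


training, selection, testing = split_instances(477)
print(len(training), len(selection), len(testing))
```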
5.3 Data histograms
5.3.1 Task description
Histograms show how the data is distributed over its entire range. In approximation problems, a uniform distribution for all the variables is, in general, desirable. If the data is very irregularly distributed, the model will probably be of poor quality.
5.3.2 MPG histogram
The following chart shows the histogram for the variable mpg. The abscissa represents the bin centers, and the ordinate their corresponding frequencies. The minimum frequency is 7, which corresponds to the bin with center 44.25. The maximum frequency is 113, which corresponds to the bin with center 16.05.
5.3.3 Cylinders histogram
The following chart shows the histogram for the variable cylinders. The abscissa represents the bin centers, and the ordinate their corresponding frequencies. The minimum frequency is 3, which corresponds to the bin with center 5. The maximum frequency is 240, which corresponds to the bin with center 4.
5.3.4 Displacement histogram
The following chart shows the histogram for the variable displacement. The abscissa represents the bin centers, and the ordinate their corresponding frequencies. The minimum frequency is 11, which corresponds to the bin with center 430.8. The maximum frequency is 162, which corresponds to the bin with center 92.19.
5.3.5 Horsepower histogram
The following chart shows the histogram for the variable horsepower. The abscissa represents the bin centers, and the ordinate their corresponding frequencies. The minimum frequency is 7, which corresponds to the bin with center 14.38. The maximum frequency is 168, which corresponds to the bin with center 100.6.
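Each histogram in this section reports bin centers together with the least and most populated bins. A minimal equal-width binning sketch follows; the default of 10 bins and the binning rule are our assumptions, since the text does not state how Neural Designer builds its bins, so this sketch will not necessarily reproduce the reported centers.

```python
from typing import List, Sequence, Tuple


def histogram(values: Sequence[float], bins: int = 10) -> Tuple[List[float], List[int]]:
    """Equal-width histogram over [min, max]; returns (bin centers, frequencies)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # degenerate case: a single bin holds everything
        return [lo], [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        # The maximum lies on the right edge of the last bin, so clamp the index.
        counts[min(int((v - lo) / width), bins - 1)] += 1
    centers = [lo + (i + 0.5) * width for i in range(bins)]
    return centers, counts


def extreme_bins(centers: List[float], counts: List[int]):
    """(frequency, center) of the least and of the most populated bins."""
    i_min = min(range(len(counts)), key=counts.__getitem__)
    i_max = max(range(len(counts)), key=counts.__getitem__)
    return (counts[i_min], centers[i_min]), (counts[i_max], centers[i_max])


centers, counts = histogram([1, 2, 2, 3, 3, 3, 4], bins=3)
print(centers, counts, extreme_bins(centers, counts))
```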
5.3.6 Weight histogram
The following chart shows the histogram for the variable weight. The abscissa represents the bin centers, and the ordinate their corresponding frequencies. The minimum frequency is 10, which corresponds to the bin with center 4920. The maximum frequency is 112, which corresponds to the bin with center 2274.
5.3.7 Acceleration histogram
The following chart shows the histogram for the variable acceleration. The abscissa represents the bin centers, and the ordinate their corresponding frequencies. The minimum frequency is 4, which corresponds to the bin with center 23.75. The maximum frequency is 160, which corresponds to the bin with center 15.35.
5.3.8 Model year histogram
The following chart shows the histogram for the variable model year. The abscissa represents the bin centers, and the ordinate their corresponding frequencies. The minimum frequency is 28, which corresponds to the bin with center 72.25. The maximum frequency is 89, which corresponds to the bin with center 70.75.
5.3.9 Origin histogram
The following chart shows the histogram for the variable origin. The abscissa represents the bin centers, and the ordinate their corresponding frequencies. The minimum frequency is 84, which corresponds to the bin with center 2. The maximum frequency is 298, which corresponds to the bin with center 1.
5.4 NEURAL NETWORK
5.4.1 Task description
The neural network represents the predictive model. In Neural Designer, neural networks allow deep architectures, which are a class of universal approximators.
5.4.2 Scaling layer
The size of the scaling layer is 7, the number of inputs. The scaling method for this layer is Automatic.
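The exported expressions in Section 7 show that each input is mapped linearly from its [minimum, maximum] range onto [-1, 1]. Below is a minimal sketch of that mapping, of its inverse (as applied by the un-scaling layer of 5.4.5), and of a simple clip standing in for the bounding layer of 5.4.6; the clip behaviour and the function names are our assumptions, not Neural Designer's API.

```python
def scale_minmax(x: float, lo: float, hi: float) -> float:
    """Map [lo, hi] onto [-1, 1]: 2*(x - lo)/(hi - lo) - 1."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def unscale_minmax(y: float, lo: float, hi: float) -> float:
    """Inverse map from [-1, 1] back to [lo, hi] (what an un-scaling layer does)."""
    return lo + (y + 1.0) * (hi - lo) / 2.0


def bound(y: float, lower: float, upper: float) -> float:
    """Clip an output into [lower, upper] (assumed behaviour of a bounding layer)."""
    return min(max(y, lower), upper)


# Cylinder range 3..8, as used in the exported expressions of Section 7.
print([scale_minmax(c, 3, 8) for c in (3, 4, 6, 8)])
```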
The following table shows the values used for scaling the inputs, which include the minimum, maximum, mean and standard deviation.
5.4.3 Neural network
The number of layers in the neural network is 4. The following table depicts the size of each layer and its corresponding activation function. The architecture of this neural network can be written as 7:3:3:5:1.
5.4.4 Neural network parameters
The following table shows the statistics of the parameters of the neural network. The total number of parameters is 62, i.e., the weights and biases of the four layers: (7×3 + 3) + (3×3 + 3) + (3×5 + 5) + (5×1 + 1) = 24 + 12 + 20 + 6 = 62.
5.4.5 Un-scaling layer
The size of the un-scaling layer is 1, the number of outputs. The un-scaling method for this layer is minimum and maximum. The following table shows the values used for un-scaling the output, which include the minimum, maximum, mean and standard deviation.
5.4.6 Bounding layer
The size of the bounding layer is 1, the number of outputs. The following table shows the values of the bounds for each output.
5.4.7 Outputs table
The number of outputs is 1. The next table depicts some basic information about it, including the name, the units and the description.
5.4.8 Neural network graph
A graphical representation of the network architecture is depicted next. It contains a scaling layer, a neural network and an un-scaling layer. The yellow circles represent scaling neurons, the green circles the principal components, the blue circles perceptron neurons, the red circles un-scaling neurons, and the purple circles bounding neurons. The number of inputs is 7, the number of principal components is 7, and the number of outputs and of bounding neurons is 1. The complexity, represented by the numbers of hidden neurons, is 3:3:5.
5.4.9 Loss index
5.4.9.1 Task Description
The loss index plays an important role in the use of a neural network.
It defines the task the neural network is required to do and provides a measure of the quality of the representation that it is required to learn. The choice of a suitable loss index depends on the particular application.
5.4.9.2 Error method
The normalized squared error is used here as the error method. It divides the squared error between the outputs from the neural network and the targets in the data set by a normalization coefficient, the sum of squared deviations of the targets from their mean:

$$\text{NSE} = \frac{\sum_{j}\left(o_j - t_j\right)^2}{\sum_{j}\left(t_j - \bar{t}\right)^2}$$

If the normalized squared error has a value of unity, the neural network is predicting the data 'in the mean', while a value of zero means perfect prediction of the data.
5.4.9.3 Regularization method
The neural parameters norm is used as the regularization method. It is applied to control the complexity of the neural network by reducing the value of the parameters. The following table shows the weight of this regularization term in the loss expression.
5.4.9.4 Training algorithm
The training algorithm chosen for this application is gradient descent. With this method, the neural parameters are updated in the direction of the negative gradient of the loss function.
5.4.10 Training
5.4.10.1 Task description
The procedure used to carry out the learning process is called the training (or learning) strategy. The training strategy is applied to the neural network in order to obtain the best possible loss. The type of training is determined by the way in which the adjustment of the parameters in the neural network takes place.
5.4.10.2 Training algorithm
Gradient descent is used here for training. With this method, the neural parameters are updated in the direction of the negative gradient of the loss function.
5.4.10.3 Gradient descent losses history
The following plot shows the losses in each iteration.
The initial value of the training loss is 0.433694, and the final value after 1000 iterations is 0.171441. The initial value of the selection loss is 0.481546, and the final value after 1000 iterations is 0.167606.
5.4.10.4 Gradient descent results
Some important results from training are shown next. They include some final states of the neural network, the loss functional and the training algorithm.
5.5 Target balancing
5.5.1 Task description
This task balances the distribution of targets in a data set for function regression. It marks as unused a given percentage of the instances whose values belong to the most populated bins. After this process, the distribution of the data will be more uniform and, in consequence, the resulting model will probably be of better quality.
5.5.2 mpg histogram
The percentage of unused instances has been 10%, which corresponds to 47 instances. The following chart shows the histogram for the target variable mpg. The abscissa represents the bin centers, and the ordinate their corresponding frequencies. The minimum frequency is 6, which corresponds to the bin with center 41. The maximum frequency is 71, which corresponds to the bin with center 22.2.
5.6 Correlation matrix
5.6.1 Task description
This task calculates the absolute values of the linear correlations among all inputs. The correlation is a numerical value between 0 and 1 that expresses the strength of the relationship between two variables. A value close to 1 indicates a strong relationship, and a value close to 0 indicates that there is no relationship.
5.6.2 Correlation matrix
The following table shows the absolute values of the correlations between all input variables. The minimal correlation is 0.137324, between the variables model year and origin.
The maximal correlation is 0.951917, between the variables cylinders and displacement.
5.7 Linear correlations
5.7.1 Task description
It might be interesting to look for linear dependencies between single input and single target variables. This task calculates the absolute values of the correlation coefficient between all inputs and all targets. Correlations close to 1 mean that a single target is linearly correlated with a single input. Correlations close to 0 mean that there is no linear relationship between an input and a target variable. Note that, in general, the targets depend on many inputs simultaneously.
5.7.2 Linear correlations
The following table shows the absolute values of the linear correlations between all input and target variables. The maximum correlation (0.83584) is obtained between the input variable weight and the target variable mpg.
5.7.3 mpg bars chart
The next chart illustrates the dependency of the target mpg on all the input variables.
5.8 Repeated instances
5.8.1 Task description
Repeated instances are those rows in the data matrix having the same values as other rows. They provide redundant information to the model and should not be used for training, selection or testing.
5.8.2 Repeated instances table
The number of repeated instances in the data set is 75. They have been set to 'Unused'. The next table lists the indices of all the repeated instances, arranged in rows of ten.
5.9 Instances splitting
5.9.1 Task description
When designing a predictive model, the general practice is to first divide the data into three subsets.
The first subset is the training set, which is used for constructing different candidate models. The second subset is the selection set, which is used to select the model exhibiting the best properties. The third subset is the testing set, which is used for validating the final model.
5.9.2 Instances pie chart
The following pie chart details the uses of all the instances in the data set. There are 142 instances for training (29.8%), 46 instances for selection (9.64%), 46 instances for testing (9.64%) and 243 unused instances (50.9%).
5.10 Inputs importance
5.10.1 Task description
This task calculates the selection loss when removing one input at a time. This shows which inputs have the most influence on the outputs.
5.10.2 Inputs importance results
The next table shows the importance of each input. If the importance of an input is greater than 1, the selection error without that input is greater than with it. If the importance is lower than 1, the selection error is lower without that input. Finally, if the importance is 1, there is no difference between using the input and not using it. The most important variable is horsepower, with an importance of 120.5%.
5.11 Order selection
5.11.1 Incremental order results
The next table shows the order selection results of the incremental order algorithm. They include some final states of the neural network, the loss functional and the order selection algorithm.
5.11.2 Final architecture
A graphical representation of the resulting deep architecture is depicted next. It contains a scaling layer, a neural network and an un-scaling layer.
The yellow circles represent scaling neurons, the green circles the principal components, the blue circles perceptron neurons, the red circles unscaling neurons, and the purple circles bounding neurons. There are 7 inputs, 7 principal components, 1 output and 1 bounding neuron. The complexity, represented by the numbers of hidden neurons, is 3:3:1.

5.12 Linear regression analysis

5.12.1 Task description

A standard method to test the performance of a model is to perform a linear regression analysis between the scaled neural network outputs and the corresponding targets for an independent testing subset. This analysis leads to 3 parameters for each output variable. The first two parameters, a and b, correspond to the y-intercept and the slope of the best linear regression relating scaled outputs and targets. The third parameter, R2, measures the correlation between the scaled outputs and the targets. For a perfect fit (outputs exactly equal to targets), the slope would be 1 and the y-intercept would be 0. If R2 is equal to 1, there is a perfect correlation between the outputs of the neural network and the targets in the testing subset.

5.12.2 mpg linear regression parameters

The next plot lists the linear regression parameters for the scaled output mpg.

5.12.3 mpg linear regression chart

The next chart illustrates the linear regression for the scaled output mpg. The predicted values are plotted versus the actual ones as squares. The colored line indicates the best linear fit. The grey line would indicate a perfect fit.

6. DISCUSSIONS AND CONCLUSION

Data pre-processing helps to produce clean data, which can be used to build a more accurate architecture. The table provided in the testing section shows the comparison between the results obtained on the auto mpg data set using various algorithms on the same data.
Algorithms used (Normalized Squared Error + Gradient Descent)

It is evident from the figure that the results obtained after pre-processing the auto mpg data and applying the gradient descent algorithm outperformed all the others. This supports the view that a neural network is a better tool for building this model than its statistical counterparts. The neural network training parameters are:

Data allocation: 80:10:10;
Weight Seed: 6;
No of Hidden unit: 6;
Learning Rate: 0.8;
Momentum Rate: 0.90.5;
Activation function: Sigmoid;
No of Epoch/Iterations: 1000;
Importance of Attributes: Applying the Mathematical Expression

7. RESULTING CODE (FORMULA)

scaled_cylinders=2*(cylinders-3)/(8-3)-1;
scaled_displacement=2*(displacement-68)/(455-68)-1;
scaled_horsepower=2*(horsepower-0)/(225-0)-1;
scaled_weight=2*(weight-1613)/(4951-1613)-1;
scaled_acceleration=2*(acceleration-8)/(24.8-8)-1;
scaled_model_year=2*(model_year-70)/(82-70)-1;
scaled_origin=2*(origin-1)/(3-1)-1;
principal_component_1=(0.427521*scaled_cylinders+0.452171*scaled_displacement+0.427935*scaled_horsepower+0.418394*scaled_weight-0.34759*scaled_acceleration-0.225596*scaled_model_year-0.287875*scaled_origin);
principal_component_2=(-0.0787931*scaled_cylinders-0.109669*scaled_displacement+0.127761*scaled_horsepower-0.222788*scaled_weight-0.452102*scaled_acceleration-0.599929*scaled_model_year+0.592871*scaled_origin);
principal_component_3=(0.0175998*scaled_cylinders+0.0025906*scaled_displacement-0.190201*scaled_horsepower+0.0209734*scaled_weight+0.550977*scaled_acceleration-0.756956*scaled_model_year-0.294125*scaled_origin);
principal_component_4=(-0.248812*scaled_cylinders-0.199804*scaled_displacement-0.122994*scaled_horsepower-0.363747*scaled_weight-0.523859*scaled_acceleration-0.101714*scaled_model_year-0.682611*scaled_origin);
principal_component_5=(0.649467*scaled_cylinders+0.164912*scaled_displacement-0.6846*scaled_horsepower-0.214513*scaled_weight-0.170949*scaled_acceleration+0.026248*scaled_model_year+0.0799405*scaled_origin);
principal_component_6=(-0.325269*scaled_cylinders-0.11025*scaled_displacement-0.500366*scaled_horsepower+0.745723*scaled_weight-0.259915*scaled_acceleration-0.0716249*scaled_model_year+0.053751*scaled_origin);
principal_component_7=(0.470311*scaled_cylinders-0.839187*scaled_displacement+0.173784*scaled_horsepower+0.201099*scaled_weight-0.00122541*scaled_acceleration-0.00670506*scaled_model_year-0.0623285*scaled_origin);
y_1_1=(-1.23486-0.537891*principal_component_1-0.382974*principal_component_2-0.107825*principal_component_3+0.200487*principal_component_4-0.00508268*principal_component_5+0.320787*principal_component_6-0.26649*principal_component_7);
y_1_2=(-0.00588725-0.0503506*principal_component_1+0.0115376*principal_component_2+0.137881*principal_component_3+1.30914*principal_component_4+0.288058*principal_component_5-0.210124*principal_component_6+0.0880405*principal_component_7);
y_1_3=(0.236763-0.276728*principal_component_1-0.00122843*principal_component_2+1.013*principal_component_3-0.627519*principal_component_4-0.678489*principal_component_5-0.551077*principal_component_6-0.523779*principal_component_7);
y_2_1=tanh(-0.471813+0.596172*y_1_1-0.13173*y_1_2+0.405407*y_1_3);
y_2_2=tanh(-0.0302446-0.770611*y_1_1-0.14882*y_1_2+0.930934*y_1_3);
y_2_3=tanh(-0.593283-0.248479*y_1_1-0.630853*y_1_2-0.295339*y_1_3);
y_3_1=tanh(-0.631952-0.686797*y_2_1+0.212151*y_2_2-1.04976*y_2_3);
scaled_mpg=(-0.0647791-1.07014*y_3_1);
mpg=(0.5*(scaled_mpg+1.0)*(44.6-11)+11);
mpg=max(9, mpg);
mpg=min(60, mpg);

References

1. Abu-Naser, S. S. (2012). "Predicting learners performance using artificial neural networks in linear programming intelligent tutoring system." International Journal of Artificial Intelligence & Applications 3(2): 65.
2. Afana, M., et al. (2018). "Artificial Neural Network for Forecasting Car Mileage per Gallon in the City." International Journal of Advanced Science and Technology 124: 51-59.
3. Ahmed, A., et al. (2019). "Knowledge-Based Systems Survey." International Journal of Academic Engineering Research (IJAER) 3(7): 1-22.
4. Alajrami, E., et al. (2019). "Blood Donation Prediction using Artificial Neural Network." Blood 3(10): 1-7.
5. Alghoul, A., et al. (2018). "Email Classification Using Artificial Neural Network." International Journal of Academic Engineering Research (IJAER) 2(11): 8-14.
6. Alkronz, E. S., et al. (2019). "Prediction of Whether Mushroom is Edible or Poisonous Using Back-propagation Neural Network." International Journal of Academic and Applied Research (IJAAR) 3(2): 1-8.
7. Al-Massri, R., et al. (2018). "Classification Prediction of SBRCTs Cancers Using Artificial Neural Network." International Journal of Academic Engineering Research (IJAER) 2(11).
8. Al-Mubayyed, O. M., et al. (2019). "Predicting Overall Car Performance Using Artificial Neural Network." International Journal of Academic and Applied Research (IJAAR) 3(1): 1-5.
9. Al-Shawwa, M. and S. S. Abu-Naser (2019). "Predicting Birth Weight Using Artificial Neural Network." International Journal of Academic Health and Medical Research (IJAHMR) 3(1): 9-14.
10. Al-Shawwa, M. and S. S. Abu-Naser (2019). "Predicting Effect of Oxygen Consumption of Thylakoid Membranes (Chloroplasts) from Spinach after Inhibition Using Artificial Neural Network." International Journal of Academic Engineering Research (IJAER) 3(2): 15-20.
11. Al-Shawwa, M., et al. (2018). "Predicting Temperature and Humidity in the Surrounding Environment Using Artificial Neural Network." International Journal of Academic Pedagogical Research (IJAPR) 2(9): 1-6.
12. Anderson, J., et al. (2005). "Adaptation of Problem Presentation and Feedback in an Intelligent Mathematics Tutor." Information Technology Journal 5(5): 167-207.
13. Ashqar, B. A. M. and S. S. Abu-Naser (2019). "Identifying Images of Invasive Hydrangea Using Pre-Trained Deep Convolutional Neural Networks." International Journal of Academic Engineering Research (IJAER) 3(3): 28-36.
14. Ashqar, B. A. M. and S. S. Abu-Naser (2019). "Image-Based Tomato Leaves Diseases Detection Using Deep Learning." International Journal of Academic Engineering Research (IJAER) 2(12): 10-16.
15. Ashqar, B. A., et al. (2019). "Plant Seedlings Classification Using Deep Learning." International Journal of Academic Information Systems Research (IJAISR) 3(1): 7-14.
16. Elzamly, A., et al. (2015). "Predicting Software Analysis Process Risks Using Linear Stepwise Discriminant Analysis: Statistical Methods." Int. J. Adv. Inf. Sci. Technol 38(38): 108-115.
17. Elzamly, A., et al. (2016). "A New Conceptual Framework Modelling for Cloud Computing Risk Management in Banking Organizations." International Journal of Grid and Distributed Computing 9(9): 137-154.
18. Elzamly, A., et al. (2017). "Predicting Critical Cloud Computing Security Issues using Artificial Neural Network (ANNs) Algorithms in Banking Organizations." International Journal of Information Technology and Electrical Engineering 6(2): 40-45.
19. Elzamly, A., et al. (2019). "Critical Cloud Computing Risks for Banking Organizations: Issues and Challenges." Religación. Revista de Ciencias Sociales y Humanidades 4(18): 673-682.
20. Heriz, H. H., et al. (2018). "English Alphabet Prediction Using Artificial Neural Networks." International Journal of Academic Pedagogical Research (IJAPR) 2(11): 8-14.
21. Jamala, M. N. and S. S. Abu-Naser (2018). "Predicting MPG for Automobile Using Artificial Neural Network Analysis." International Journal of Academic Information Systems Research (IJAISR) 2(10): 5-21.
22. Kashf, D. W. A., et al. (2018). "Predicting DNA Lung Cancer using Artificial Neural Network." International Journal of Academic Pedagogical Research (IJAPR) 2(10): 6-13.
23. Khalil, A. J., et al. (2019). "Energy Efficiency Predicting using Artificial Neural Network." International Journal of Academic Pedagogical Research (IJAPR) 3(9): 1-8.
24. Li, L., et al. (2011). "Hybrid Quantum-inspired genetic algorithm for extracting association rule in data mining." Information Technology Journal 12(4): 1437-1441.
25. Abu Naser, S. S., et al. (2016). "Design and Development of Mobile University Student Guide." Journal of Multidisciplinary Engineering Science Studies (JMESS) 2(1): 193-197.
26. Abu Naser, S. S., et al. (2016). "Design and Development of Mobile Blood Donor Tracker." Journal of
27. Marouf, A. and S. S. Abu-Naser (2018). "Predicting Antibiotic Susceptibility Using Artificial Neural Network." International Journal of Academic Pedagogical Research (IJAPR) 2(10): 1-5.
28. Masri, N., et al. (2019). "Survey of Rule-Based Systems." International Journal of Academic Information Systems Research (IJAISR) 3(7): 1-23.
29. Nasser, I. M. and S. S. Abu-Naser (2019). "Artificial Neural Network for Predicting Animals Category." International Journal of Academic and Applied Research (IJAAR) 3(2): 18-24.
30. Nasser, I. M. and S. S. Abu-Naser (2019). "Lung Cancer Detection Using Artificial Neural Network." International Journal of Engineering and Information Systems (IJEAIS) 3(3): 17-23.
31. Nasser, I. M. and S. S. Abu-Naser (2019). "Predicting Books' Overall Rating Using Artificial Neural Network." International Journal of Academic Engineering Research (IJAER) 3(8): 11-17.
32. Nasser, I. M. and S. S. Abu-Naser (2019). "Predicting Tumor Category Using Artificial Neural Networks." International Journal of Academic Health and Medical Research (IJAHMR) 3(2): 1-7.
33. Nasser, I. M., et al. (2019). "A Proposed Artificial Neural Network for Predicting Movies Rates Category." International Journal of Academic Engineering Research (IJAER) 3(2): 21-25.
34. Nasser, I. M., et al. (2019). "Artificial Neural Network for Diagnose Autism Spectrum Disorder." International Journal of Academic Information Systems Research (IJAISR) 3(2): 27-32.
35. Nasser, I. M., et al. (2019). "Developing Artificial Neural Network for Predicting Mobile Phone Price Range." International Journal of Academic Information Systems Research (IJAISR) 3(2): 1-6.
36. Ng, S., et al. (2010). "Ad hoc networks based on rough set distance learning method." Information Technology Journal 10(9): 239-251.
37. Owaied, H. H., et al. (2009). "Using rules to support case-based reasoning for harmonizing melodies." Journal of Applied Sciences 11(14): 31-41.
38. Sadek, R. M., et al. (2019). "Parkinson's Disease Prediction Using Artificial Neural Network." International Journal of Academic Health and Medical Research (IJAHMR) 3(1): 1-8.
39. Salah, M., et al. (2018). "Predicting Medical Expenses Using Artificial Neural Network." International Journal of Engineering and Information Systems (IJEAIS) 2(20): 11-17.
40. Sulisel, O., et al. (2005). "Growth and Maturity of Intelligent Tutoring Systems." Information Technology Journal 7(7): 9-37.
41. Zaqout, I., et al. (2015). "Predicting Student Performance Using Artificial Neural Network: in the Faculty of Engineering and Information Technology." International Journal of Hybrid Information Technology 8(2): 221-228.
42. Abu-Nasser, B. (2017). "Medical Expert Systems Survey." International Journal of Engineering and Information Systems (IJEAIS) 1(7): 218-224.
43. Abu-Nasser, B. S. and S. S. Abu-Naser (2018). "Cognitive System for Helping Farmers in Diagnosing Watermelon Diseases." International Journal of Academic Information Systems Research (IJAISR) 2(7): 1-7.
44. Abu-Nasser, B. S. and S. S. Abu Naser (2018). "Rule-Based System for Watermelon Diseases and Treatment." International Journal of Academic Information Systems Research (IJAISR) 2(7): 1-7.
45. Baker, J., et al. & Heller, R. (1996). "Information Visualization." Information Technology Journal 7(2).
46. Baker, J., et al. (1996). "Information Visualization." Information Technology Journal 7(2): 403-404.
47. Baraka, M. H., et al. (2008). "A Proposed Expert System For Guiding Freshman Students In Selecting A Major In Al-Azhar University, Gaza." Journal of Theoretical & Applied Information Technology 4(9).
48. Barhoom, A. M., et al. (2019). "Predicting Titanic Survivors using Artificial Neural Network." International Journal of Academic Engineering Research (IJAER) 3(9): 8-12.
49. Chand, P. S., et al. (2008). "MADAMS: Mining and Acquisition of Data by ANT-MINER Samples." Journal of Theoretical & Applied Information Technology 4(10).
50. Chen, R.-S., et al. (2008). "Evaluating structural equation models with unobservable variables and measurement error." Information Technology Journal 10(2): 1055-1060.
51. Dalffa, M. A., et al. (2019). "Tic-Tac-Toe Learning Using Artificial Neural Networks." International Journal of Engineering and Information Systems (IJEAIS) 3(2): 9-19.
52. El_Jerjawi, N. S. and S. S. Abu-Naser (2018). "Diabetes Prediction Using Artificial Neural Network." International Journal of Advanced Science and Technology 121: 55-64.
53. El-Khatib, M. J., et al. (2019). "Glass Classification Using Artificial Neural Network." International Journal of Academic Pedagogical Research (IJAPR) 3(2): 25-31.
54. Elzamly, A., et al. (2015). "Classification of Software Risks with Discriminant Analysis Techniques in Software planning Development Process." International Journal of Advanced Science and Technology 81: 35-48.
55. Naser, S. S. A. "TOP 10 NEURAL NETWORK PAPERS: RECOMMENDED READING–ARTIFICIAL INTELLIGENCE RESEARCH."