I am new to tensorflow and am learning the basics at the moment so please bear with me.

My problem concerns strange non-convergent behaviour of neural networks when presented with the supposedly simple task of finding a regression function for a small training set consisting only of m = 100 data points {(x_1, y_1), (x_2, y_2), ..., (x_100, y_100)}, where x_i and y_i are real numbers.

I first constructed a function that automatically generates a computational graph corresponding to a classical fully connected feedforward neural network:

    import numpy as np
    import tensorflow as tf
    import matplotlib.pyplot as plt
    import math

    def neural_network_constructor(arch_list = [1,3,3,1],
                                   act_func = tf.nn.sigmoid,
                                   w_initializer = tf.contrib.layers.xavier_initializer(),
                                   b_initializer = tf.zeros_initializer(),
                                   loss_function = tf.losses.mean_squared_error,
                                   training_method = tf.train.GradientDescentOptimizer(0.5)):

        n_input = arch_list[0]
        n_output = arch_list[-1]

        X = tf.placeholder(dtype = tf.float32, shape = [None, n_input])

        layer = tf.contrib.layers.fully_connected(
            inputs = X,
            num_outputs = arch_list[1],
            activation_fn = act_func,
            weights_initializer = w_initializer,
            biases_initializer = b_initializer)

        for N in arch_list[2:-1]:
            layer = tf.contrib.layers.fully_connected(
                inputs = layer,
                num_outputs = N,
                activation_fn = act_func,
                weights_initializer = w_initializer,
                biases_initializer = b_initializer)

        Phi = tf.contrib.layers.fully_connected(
            inputs = layer,
            num_outputs = n_output,
            activation_fn = tf.identity,
            weights_initializer = w_initializer,
            biases_initializer = b_initializer)

        Y = tf.placeholder(tf.float32, [None, n_output])

        loss = loss_function(Y, Phi)
        train_step = training_method.minimize(loss)

        return [X, Phi, Y, train_step]

With the above default values for the arguments, this function would construct a computational graph corresponding to a neural network with 1 input neuron, 2 hidden layers with 3 neurons each and 1 output neuron.
The activation function is by default the sigmoid function. X corresponds to the input tensor, Y to the labels of the training data and Phi to the feedforward output of the neural network. The operation train_step performs one gradient-descent step when executed in the session environment.

So far, so good. If I now test a particular neural network (constructed with this function and the exact default values for the arguments given above) by making it learn a simple regression function for artificial data extracted from a sine wave, strange things happen:

Before training, the network's output seems to be a flat line. After 100,000 training iterations, it manages to partially learn the function, but only the part which is closer to 0. After this, it becomes flat again. Further training does not decrease the loss function anymore.

This gets even stranger when I take the exact same data set, but shift all x-values by adding 500: here, the network completely refuses to learn.

I cannot understand why this is happening. I have tried changing the architecture of the network and its learning rate, but have observed similar effects: the closer the x-values of the data cloud are to the origin, the more easily the network can learn. Beyond a certain distance from the origin, learning stops completely. Changing the activation function from sigmoid to ReLU has only made things worse; here, the network tends to just converge to the average, no matter what position the data cloud is in.

Is there something wrong with my implementation of the neural-network constructor? Or does this have something to do with the initialization values? I have been trying to get a deeper understanding of this problem for quite a while now and would greatly appreciate some advice. What could be the cause of this? All thoughts on why this behaviour is occurring are very much welcome!

Thanks,
Joker
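The setup described above can be sketched without TensorFlow, which may help anyone trying to reproduce the effect. The following is a minimal NumPy-only sketch, not the code used for the experiments in the post. The helper names `make_sine_data`, `standardize` and `first_layer_sigmoid_grad`, the x-range [0, 2π], the noise level and the seeds are all illustrative assumptions. It generates 100 sine-wave points, optionally shifts x by 500, and measures the mean sigmoid derivative in a Xavier-uniform-initialised first layer of 3 units with zero biases. A derivative close to zero would indicate saturated units. Whether saturation is the actual cause of the behaviour reported here is not established by this sketch.

```python
import numpy as np

def make_sine_data(m=100, shift=0.0, seed=0):
    """Hypothetical data generator: m noisy samples of sin(x), with x moved by `shift`."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 2.0 * np.pi, size=(m, 1))
    y = np.sin(x) + 0.1 * rng.normal(size=(m, 1))  # assumed noise level
    return (x + shift).astype(np.float32), y.astype(np.float32)

def standardize(x):
    """Rescale each input column to zero mean and unit standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

def first_layer_sigmoid_grad(x, n_hidden=3, seed=0):
    """Mean sigmoid derivative over all samples and units of a first hidden layer.

    Weights are drawn from the Xavier/Glorot uniform range and biases are zero,
    mirroring the defaults in the post; the concrete draw depends on `seed`.
    """
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    limit = np.sqrt(6.0 / (n_in + n_hidden))    # Glorot uniform bound
    w = rng.uniform(-limit, limit, size=(n_in, n_hidden))
    z = x @ w                                   # pre-activations, biases are zero
    a = 0.5 * (1.0 + np.tanh(0.5 * z))          # numerically stable sigmoid
    return float(np.mean(a * (1.0 - a)))        # sigmoid'(z) = a * (1 - a)

x_near, y_near = make_sine_data(shift=0.0)
x_far, y_far = make_sine_data(shift=500.0)

grad_near = first_layer_sigmoid_grad(x_near)
grad_far = first_layer_sigmoid_grad(x_far)
grad_far_std = first_layer_sigmoid_grad(standardize(x_far))

print("mean sigmoid derivative, x near origin:      ", grad_near)
print("mean sigmoid derivative, x shifted by 500:   ", grad_far)
print("mean sigmoid derivative, shifted + rescaled: ", grad_far_std)
```

Comparing the three printed values for a few different seeds gives a quick check of whether the position of the data cloud changes how much gradient can pass through the first layer at initialisation, independently of the rest of the training loop.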