Abstract: Neural network models for dynamic systems can be trained in either a parallel or a series-parallel configuration. Influenced by early arguments, several papers justify choosing the series-parallel configuration over the parallel one by claiming that it has a lower computational cost and better stability properties during training, and that it provides more accurate results. The purpose of this work is to review some of those arguments and to present both methods in a unifying framework, showing that parallel and series-parallel training actually result from optimal predictors that use different noise models. A numerical example illustrates that each method provides better results when the noise model it implicitly considers is consistent with the error in the data. Furthermore, it is argued that, for feedforward networks with bounded activation functions, the possible lack of stability does not jeopardize the training, and a novel complexity analysis indicates that the computational cost of the two configurations is not significantly different. This is confirmed through numerical examples.
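To make the distinction between the two configurations concrete, the following is a minimal sketch and not the implementation used in this work. It assumes a hypothetical first-order model with a single tanh neuron; the names `model`, `series_parallel_predict`, `parallel_predict`, `mse` and the parameter tuple `theta` are illustrative only. The series-parallel predictor feeds back the measured past output, whereas the parallel predictor feeds back the model's own past output.

```python
import math


def model(y_prev, u_prev, theta):
    """Hypothetical one-step model: a single tanh neuron (bounded activation)."""
    w_y, w_u, w_out = theta
    return w_out * math.tanh(w_y * y_prev + w_u * u_prev)


def series_parallel_predict(y_meas, u, theta):
    """One-step-ahead prediction: the *measured* output y_meas[k-1] is fed back."""
    return [model(y_meas[k - 1], u[k - 1], theta) for k in range(1, len(y_meas))]


def parallel_predict(y0, u, theta):
    """Free-run simulation: the model's *own* previous prediction is fed back."""
    y_hat = [y0]
    for k in range(1, len(u)):
        y_hat.append(model(y_hat[-1], u[k - 1], theta))
    return y_hat[1:]  # predictions for k = 1, ..., len(u) - 1


def mse(y_meas, y_pred):
    """Mean squared error between measurements (from k = 1 on) and predictions."""
    errors = [(ym - yp) ** 2 for ym, yp in zip(y_meas[1:], y_pred)]
    return sum(errors) / len(errors)
```

Training in either configuration would amount to minimizing `mse` over `theta` with the corresponding predictor. On noise-free data generated by the model itself the two predictors coincide; they differ as soon as the measured outputs deviate from the simulated ones.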