Tracking times in temporal patterns embodied in intra-cortical data for controlling a neural prosthesis: an animal simulation study

ABSTRACT


INTRODUCTION
In this work, we aim to design a new hand-trajectory decoder from brain signals in order to control a motor neuroprosthesis, or at least to infer a desired motor action. The brain-computer interface described in this paper is known in the literature as a system that restores motor functions lost because of a disease or an injury by activating paralyzed muscles. A number of previous studies have shown that desired actions can be extracted from the motor cortex and, in turn, used as a sufficiently reliable command signal to control a computer interface, for instance to realize actions with a robotic limb [1,2]. Many technical improvements in microelectronics have been introduced to help patients, e.g. for epilepsy suppression [3], spinal cord injuries [4,5], or cochlear implants [6]. Research groups have previously confirmed, in several papers, that monkeys and humans are able to learn how to control a robot limb or move a wheelchair [7,8], or more simply to maintain communication with the external environment, by stimulating neuron groups that participate in normal arm movement [9,10]. Technical stability, a limited power source, flexibility in critical system parameters, the restricted space available to host the system, and real-time control are essential for these prostheses to become clinically viable. Therefore, neuroprosthetic strategies are developing toward closed-loop command systems consisting of three functional blocks: neural signal recording, processing, and neuro-muscular stimulation, as depicted in Figure 1.
To obtain and comprehend the signal processed in the biological network of neurons, spikes should be assigned to specific neural sources. This classification procedure is termed spike sorting. Properly sorting spikes with respect to their origin considerably improves the decoding operation. Thus, before representing the position of the animal in terms of firing patterns, spike sorting must be accomplished and the spike rate of each neuron calculated [11,12]. Nevertheless, signal instabilities can arise from physical factors such as implant movement, tissue reaction, or material degradation. Whatever the cause, signal changes can be significant, affecting up to 60% of the waveforms as reported by Dickey et al. [13]. In prior studies, we used a multi-scale seriation approach for clustering spike trains [14].
The decoding function that links neural activity to behavior may itself be relatively unstable, thereby degrading decoding performance. In [15], the authors decode right- and left-hand movements with a probabilistic neural network applied to EEG signals, using the wavelet transform for feature extraction and classification. Perge et al. evaluated in [16,17] the nature and extent of instability in spiking populations recorded during an ongoing pilot clinical trial of people with tetraplegia: they found that systematic rate changes occur commonly and can cause estimation errors in the decoded kinematic parameters, leading to degraded performance that presents itself as a directional bias. Adaptive filters attempting to mitigate these instabilities exist, but their efficacy has not yet been established [18].
Decoding brain activity has been extensively studied in the literature, giving birth to various methods, see e.g. [9,19]. A state-space representation is well suited to this problem. In [20], for example, Wu et al. modeled with a Kalman filter (KF) the hand coordinates (acceleration, velocity and position on the two axes) through a probabilistic association between these motions and the brain signal. Kostov et al. [21] improved the possibility of real-time control in the presence of a partial spinal cord injury by using an adaptive logic network. In [22], Gage et al. realized control of cortical tasks by training a rat with a co-adaptive KF. These methods can estimate hidden states even in the absence of an accurate model of the system. In [23], Brockwell applied a particle filter in order to estimate the hidden states from cortical signals using a Monte Carlo algorithm, with no hypothesis on the distribution of the observations. In summary, state-space methods offer a coherent framework for modeling stochastic dynamical systems. Note that although the man-made neuroprosthesis will control a complex living system, it need not obey biological control principles: estimation procedures can be used to translate the brain signal into desired motor actions that serve as a suitable command for a robot arm or another prosthetic interface. The final goal is not necessarily perfectly natural control, but control natural enough to be suitable for clinically viable systems. Times of action potentials are estimated by optimally choosing the width of the analysis window TW and are assigned to neural units. A decoder is then used to translate the neural activity into control signals for the prosthetic system. In this article, we aim to design a rat hand-trajectory decoder from brain signals by analyzing the motor cortical spiking activity recorded from two freely moving rats during the first two months after electrode implantation, as described in section 2.1.
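To make the state-space idea concrete, the following minimal sketch runs a discrete Kalman filter over binned firing rates to estimate hand kinematics, in the spirit of Wu et al. [20]. The function and matrix names are ours, chosen for illustration; this is not the authors' implementation.

```python
import numpy as np

def kalman_decode(A, C, Q, R, z, x0, P0):
    """Decode a kinematic state sequence from firing-rate observations z
    under a linear-Gaussian state-space model:
        x[t] = A x[t-1] + w,  w ~ N(0, Q)   (hand kinematics)
        z[t] = C x[t]   + v,  v ~ N(0, R)   (binned firing rates)
    """
    x, P = x0, P0
    estimates = []
    for zt in z:
        # Predict step: propagate state and covariance
        x = A @ x
        P = A @ P @ A.T + Q
        # Update step: correct with the new observation
        S = C @ P @ C.T + R
        K = P @ C.T @ np.linalg.inv(S)
        x = x + K @ (zt - C @ x)
        P = (np.eye(len(x)) - K @ C) @ P
        estimates.append(x.copy())
    return np.array(estimates)
```

On a toy constant-velocity system the filter tracks the hidden position while smoothing the noisy observations; in a BMI setting x would hold position/velocity/acceleration on the two axes and C would map kinematics to firing rates.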
The experiments were carried out at the National Tsing Hua University (NTHU) in Taiwan. Section 3 describes several kinds of specific spatio-temporal models and the manner of their detection. In addition, we explore statistical approaches for optimizing the probability of pattern detection. Section 6 gives summary statistics on the motor cortex responses recorded simultaneously with movement, demonstrating the mapping that links these models to a specific behavioral task using a time delay neural network.

EXPERIMENTATION AND DATA ACQUISITION
2.1. Animal training and behavioral tasks
The study, approved by the Institutional Animal Care and Use Committee at the National Chiao Tung University, was conducted according to the standards established in the Guide for the Care and Use of Laboratory Animals. Four male Wistar rats weighing 250-300 g (BioLASCO Taiwan Co., Ltd.) were housed individually on a 12 h light/dark cycle, with access to food and water ad libitum. Data were recorded from the motor cortex of awake animals performing a simple reward task. In this task, the rats were trained to press a lever to initiate a trial in return for a water reward, as shown in Figure 2(a). The animals were water-restricted 8 hours/day during training and recording sessions, but food was always available ad libitum.

2.2. Chronic animal preparation and neural ensemble
Pentobarbital was used to anesthetize the animals (50 mg/kg i.p.). They were then positioned in a standard stereotaxic apparatus (Model 9000, David Kopf, USA). Electrode arrays were implanted after careful dura retraction. Pairs of 8 micro-wire electrode arrays (no. 15140/13848, 50 µm diameter; California Fine Wire Co., USA) were implanted in the primary motor cortex (M1) at the level of layer V. The area related to forelimb movement is located 2-4 mm anterior and 2-4 mm lateral to Bregma. After implantation, the exposed brain was sealed with dental acrylic and the animal was given a week of recovery time. During the recording sessions the animal moved freely in the behavioral task box (30 cm × 30 cm × 60 cm), as depicted in Figure 2(a); to receive 1 ml of water reward, the rat only had to press the lever with its right forelimb. Neural signals were logged using a Multi-Channel Acquisition Processor (MAP, Plexon Inc., USA). The recorded signals were passed from the headstage to an amplifier through a band-pass filter (spike pre-amp filter: 450 Hz-5 kHz; gain: 15,000-20,000) and sampled at 40 kHz per channel, as shown in Figure 2. At the same time, the animal's activity was recorded by a video tracking system (CinePlex, Plexon Inc., USA) and inspected to ensure that it was consistent for all trials included in a given analysis [24]. During a daily research session, neural activity was typically recorded in several short periods of two to six minutes during which the rat performed one behavioral task, interleaved with rest periods when no activity was recorded. Finally, we obtained a dataset composed of 48 channels (or neurons), each described by successions of '1' separated by long silences of '0'. An additional representation was obtained by smoothing the spike rate with a Gaussian window.
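The Gaussian-window rate smoothing mentioned above can be sketched as follows. The bin width and kernel width here are illustrative choices of ours, not values taken from the recordings.

```python
import numpy as np

def smoothed_rate(spike_times, duration, bin_ms=25.0, sigma_ms=50.0):
    """Bin spike times into a count train and smooth it with a Gaussian
    window to obtain a continuous firing-rate estimate (spikes/s)."""
    bin_s = bin_ms / 1000.0
    n_bins = int(np.ceil(duration / bin_s))
    counts, _ = np.histogram(spike_times, bins=n_bins,
                             range=(0.0, n_bins * bin_s))
    # Gaussian kernel truncated at +/- 3 sigma, normalized to unit area
    sigma_bins = sigma_ms / bin_ms
    half = int(np.ceil(3 * sigma_bins))
    t = np.arange(-half, half + 1)
    kernel = np.exp(-0.5 * (t / sigma_bins) ** 2)
    kernel /= kernel.sum()
    return np.convolve(counts, kernel, mode="same") / bin_s
```

Because the kernel integrates to one, the smoothed trace preserves the total spike count while spreading each '1' over neighboring bins.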

DECODER CALIBRATION AND CLOSED LOOP CONTROLLER
A statistical model is developed to estimate the decoding process. Figure 3 shows a generic block diagram of the neural controller, which operates as a sampled-data feedback system whose behavior depends on an ensemble of allowed inputs and an optimization criterion. It generates the controls, and the trajectories are the responses to the control signals.
The neural signals captured with micro-electrodes pass through a signal-processing chain (amplification, band-pass filtering) and are then transmitted to the spike-localization stage, which determines the spike arrival times for each channel. This stage yields spike trains that are passed to the spike-sorting stage, composed of two principal parts: (i) detecting spikes (waveforms) against the background noise [25] and (ii) classifying each detected spike to generate multiple single-unit spike trains [26]. The related movement information (kinematic parameters) is reconstructed from the simultaneous rates of the resulting spike trains. This procedure is usually called neuronal spike decoding [9] and is performed by the decoding-filter stage in Figure 1. The performance of this decoding filter is of great importance since it allows the quantitative information encoded in the spike trains to be estimated. In [27], Bialek reported the possibility of trajectory reconstruction using a linear finite impulse response (FIR) filter applied to spike trains.
In this paper we are concerned with a dynamic nonlinear discrete-time model with time-invariant parameters [28]. The parameters are treated as random variables adjusted iteratively until the responses of the adaptive model best match the measured outputs in the sense specified by the minimization criterion. The unmeasured signals are mostly described by parameters. The observables, or output variables, are the time histories of the dynamic system. The statistics of the non-measured signals, e.g. the noise, are assumed to have known form, but their values must be estimated from the observation data.

RECURRENT NEURAL NETWORK
4.1. The NARX models
Although linear filters such as Wiener filters [29] deliver a simple analytic solution with low training complexity and online feasibility, the reconstructed trajectories may be suboptimal, since the output is limited to mappings in the input space while the mapping from neural features to motor behavior may involve nonlinear transformations. Therefore, we study here the modeling of hand movements using artificial neural networks. The NARX (Nonlinear AutoRegressive with eXogenous inputs) model is an essential class of nonlinear discrete-time systems for time-series forecasting [30,31]. The recurrent NARX model can be written

y(t) = f(x(t)),  x(t) = (y(t − Δ), …, y(t − n_y Δ), u(t − Δ), …, u(t − n_u Δ)),  (1)

where x(t) is the regressor, y(t) the output of the system at time t, u(t) the exogenous (neural) input, Δ, 2Δ, … the time lags of the system variables, and f(·) an unknown nonlinear function. The general prediction (1) computes the next value of the time series y(t) from past observations u and past outputs y. Note that u is multidimensional, sized by the number of neural channels. Viewed in this way, the prediction becomes a function-estimation problem, and the technique therefore focuses on approximating the function f, e.g. with a multi-layer perceptron (MLP).
The resulting system is a compact embedded-memory model called a NARX network, as sketched in Figure 4: a tapped delay line on the inputs together with delayed connections from the output back to the input. A Time Delay Neural Network (TDNN) is a NARX with zero output-memory order [32], of the form y(t) = f(u(t − Δ), …, u(t − n_u Δ)). This formulation is simpler, but it significantly restricts the representational abilities of the NARX as a dynamic network. Simulation results by Lee et al. [33] show that NARX networks are often much better than conventional recurrent neural networks at discovering long-term dependences. An explanation of why output delays help with long-term dependences can be found by considering how gradients are computed in gradient-descent learning: for gradient-based training algorithms, the information about the gradient contribution from m steps in the past vanishes for large m, i.e. lim_{m→∞} ∂y(t)/∂y(t − m) = 0.
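To make the regressor x(t) concrete, here is a small sketch that builds NARX training pairs from a multichannel rate series; the function and argument names are ours, for illustration. Setting n_y = 0 yields the TDNN case (input delays only).

```python
import numpy as np

def tapped_delay_regressor(u, y, n_u, n_y):
    """Build NARX training pairs with regressor
    x(t) = (y(t-1), ..., y(t-n_y), u(t-1), ..., u(t-n_u)).
    u: (T, d) exogenous inputs (e.g. firing rates); y: (T,) target."""
    T = len(y)
    lag = max(n_u, n_y)
    X = []
    for t in range(lag, T):
        past_y = y[t - n_y:t][::-1]          # y(t-1), ..., y(t-n_y)
        past_u = u[t - n_u:t][::-1].ravel()  # u(t-1), ..., u(t-n_u)
        X.append(np.concatenate([past_y, past_u]))
    return np.array(X), y[lag:]
```

Each row of X can then be fed to any static approximator of f, such as an MLP, turning the dynamic prediction problem into ordinary function estimation.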

4.2. Learning algorithms
The number of inputs scales the number of parameters of the model and thus creates problems of generalization. Moreover, the size of the training set required for good approximation increases with the number of parameters. To overcome the model-order problem, a dynamic back-propagation algorithm is required to compute the gradients; it defers the memory structure to the hidden, recurrent layer instead of a time embedding, but it is more computationally intensive than static back-propagation and takes more time. In addition, training is more likely to be trapped in local minima [30]. The simple back-propagation algorithm is inappropriate since parameter convergence needs many iterations and a learning rate to tune. Hence we propose to use the conjugate gradient algorithm (CGA) to accelerate the weight update. The weight vector w of a feed-forward neural network is a point in the real Euclidean space ℝ^P, w = (w_{ij}^{(l)}), where w_{ij}^{(l)} is the weight from unit i in layer l to unit j in layer l + 1 and P is the total number of weights.
We assume a global error function E(w) depending on the weights and biases of the network, chosen here as the standard least-squares function.
E(w) is evaluated with one forward pass and the gradient E′(w) with one backward pass. The back-propagation update rule at time instance t is

w(t + 1) = w(t) − η E′(w(t)),

where −E′(w(t)) is the descent direction and η is the learning rate, which must be chosen carefully to keep the algorithm stable. Since the rule is based on the linear approximation E(w + h) ≈ E(w) + E′(w)ᵀh, the algorithm shows poor convergence. Hence a Conjugate Gradient Algorithm (CGA) is generally a better choice and converges faster to the minimum of E(w). The optimization strategy consists in choosing a search direction h and then deciding how far to go in that direction, i.e. determining the step size α. The CGA chooses the search direction and the step size using a second-order approximation.
where h is the line-search direction [34]. From the current search direction, the next search direction is determined so that it is conjugate to the previous ones. The critical points of E(w) are the solutions of the linear system

E′′(w) h* = −E′(w).  (8)
Let Ẽ(h) denote the quadratic approximation to E in the neighborhood of a point w, so that E(w + h) ≈ Ẽ(h). Let the vectors p_1, …, p_P form a conjugate system of ℝ^P; the step from a starting point h_1 to a critical point h* can then be expressed as a linear combination of p_1, …, p_P:

h* − h_1 = Σ_{k=1}^{P} α_k p_k.  (9)

Multiplying (9) by p_kᵀ E′′(w) and substituting E′(w) for −E′′(w) h* from (8) gives the coefficients α_k. The intermediate points h_{k+1} = h_k + α_k p_k produced by the iterative determination of h* are in fact minima of Ẽ(h) restricted to the plane described by h = h_1 + α_1 p_1 + ⋯ + α_k p_k. The conjugate direction vectors p_1, …, p_P are determined recursively (13), with h_{k+1} = h_k + α_k p_k, α_k = μ_k / δ_k, μ_k = p_kᵀ r_k, r_k = −Ẽ′(h_k) and δ_k = p_kᵀ E′′(w) p_k. At each iteration, the above algorithm is applied to the quadratic approximation of the global error function. In most cases, preconditioning is necessary to ensure fast convergence of the conjugate gradient method, which takes the form given in algorithm 1. The conjugate gradient algorithm works very well on well-conditioned matrices; in practice, however, most matrices are ill-conditioned, which reduces the efficiency of the algorithm. Møller showed that algorithm 1 might restart when p_{k+1} ≈ r_{k+1} or r_k ≈ r_{k+1}. Hence he proposed in [35] variants of the conjugate gradient method that regulate the indefiniteness of E′′(w) with a Lagrange multiplier λ_k, by setting δ_k = p_kᵀ E′′(w) p_k + λ_k |p_k|² and adjusting λ_k at each iteration according to the sign of δ_k. This results in algorithm 2. MLPs trained by back-propagation have previously been employed for BMIs (see e.g. Wessberg et al. [36] or Wang [37]); however, some difficulties appeared in the training process, such as the number of parameters, i.e. the number of connections or weights contained in the network, and the optimal selection of the inputs.
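For reference, the basic conjugate-direction iteration on a quadratic error surface can be sketched as follows. This is the plain linear conjugate gradient method, without Møller's scaling or preconditioning; the names are ours.

```python
import numpy as np

def conjugate_gradient(H, b, x0, tol=1e-10, max_iter=100):
    """Minimize E(x) = 0.5 x^T H x - b^T x for symmetric positive-definite H,
    i.e. solve the critical-point system H x = b by conjugate directions."""
    x = x0.astype(float)
    r = b - H @ x          # residual = negative gradient -E'(x)
    p = r.copy()           # first search direction
    for _ in range(max_iter):
        Hp = H @ p
        alpha = (r @ r) / (p @ Hp)        # step size along p
        x = x + alpha * p
        r_new = r - alpha * Hp
        if np.linalg.norm(r_new) < tol:
            break
        beta = (r_new @ r_new) / (r @ r)  # makes the next direction conjugate
        p = r_new + beta * p
        r = r_new
    return x
```

On an exact quadratic in ℝ^P, conjugacy of the directions guarantees convergence in at most P steps; on the true (non-quadratic) error function the same iteration is reapplied to each local quadratic approximation.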
In the linear case, several approaches have been proposed for order determination (see [38] for a review): penalizing parameter growth (see [39] and [40] for reviews), the delay-damage algorithm [33], (subset) feature selection [38] or extraction [41], etc. These answers are part of a more general question: how is time embodied in temporal patterns?

HOW IS TIME EMBODIED IN TEMPORAL PATTERNS?
5.1. Time feature selection in neural channels
Reducing the dimension of a model can improve the precision of its predictive inference because, in high dimensions, prediction or inference may be computationally costly. Moreover, reducing the number of features can offer simple insights into the function of the system. Temporal processing of neural data is challenging because the information is embedded in (i) the temporal order, i.e. the ordering among the components of a sequence, and (ii) the time duration, i.e. the time that separates two time stamps [29].
Generally, the selection of time lags for the prediction setting and of the best variables is done manually by experts. Input variable selection methods can be classified into two classes: filter methods and wrapper methods (see Guyon and Elisseeff [42] for a review). Filter methods use statistical measures to rank the variables according to their influence and relevance on the target variable. Wrapper methods, on the other hand, use the learning model itself as the basis for selection. Notice that the latter often achieve more accurate predictions because variable selection takes the approximation model into account. The correlation method, the partial correlation method, Mallows' coefficient and PLS-based methods are often used for time-lag selection. Shakil et al. propose a linear time-lag model optimized by a genetic algorithm for delay selection [43]. Performing the selection of the time lags for each variable before variable selection improves the final prediction performance.
For a better understanding of which variables affect the target variable, filter methods that use the information content of the data are preferable to methods based on the correlation coefficient. Indeed, for systems involving nonlinear interactions among variables, linear filter methods fail. Mutual information, which can measure nonlinear interaction between candidate input variables and the target output, is a better choice [41]. The main difficulty, however, lies in estimating the mutual information in a multidimensional space, which is moreover computationally expensive [44]. Building a predictor from a selection of only the most relevant variables is usually suboptimal; conversely, a subset of useful variables may exclude many redundant, but relevant, variables. This contrasts with the problem of ranking all potentially relevant variables.
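A crude illustration of mutual-information-based lag ranking follows. The histogram estimator, bin count and function names are ours; real neural data would call for a more careful estimator, as noted above.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Coarse histogram estimate of I(X;Y) in bits."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of X
    py = pxy.sum(axis=0, keepdims=True)   # marginal of Y
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def rank_lags(u, y, max_lag):
    """Rank candidate time-lags of input u by their MI with target y."""
    scores = {}
    for lag in range(1, max_lag + 1):
        scores[lag] = mutual_information(u[:-lag], y[lag:])
    return sorted(scores, key=scores.get, reverse=True)
```

Unlike a linear correlation coefficient, this score also responds to nonlinear dependences between a lagged channel and the kinematic target.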

5.2. Basic elements in the decoding algorithm
Neuronal assemblies contribute to neural decoding. Accordingly, synchronized firing is considered a property of neuronal signals that can be detected and that depends on the stimulus and the behavioral context. There is a need for analysis tools that reliably detect correlated spiking activity between multiple neurons that is not explained by the firing rates of the neurons alone. First, the algorithm performs a coarse preliminary spike detection, channel by channel, to estimate the spike morphology. The raw signal x(t) is raised to a power, y(t) = |x(t)|^p, with p = 2; higher values may improve performance by increasing the influence of large-amplitude events. We then apply a Blackman low-pass filter to y(t), with a cut-off frequency in the range of 100-250 Hz, to decrease the effect of noise at high frequencies and to reduce the number of signal maxima that will be considered as candidate spikes. Finally, we take the p-th root of the filtered signal to obtain the output of this stage, z(t) = (h(t) * y(t))^{1/p}, where h(t) is the FIR impulse response of the low-pass filter and (*) the convolution operator. After this transformation, z(t) presents a sharp increase when a spike occurs and only soft maxima when no spike appears (e.g. noise). We denote the sequence of candidate spike arrival times, i.e. the maxima of z(t), as 0 ≤ t_1 ≤ t_2 ≤ … ≤ t_K ≤ T, and the amplitudes of the maxima as {a_1, …, a_K}, where a_k = z(t_k) and K is the number of maxima in z(t).
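A sketch of this preprocessing stage is given below. The sampling rate, tap count and cut-off are illustrative, and the windowed-sinc design stands in for whatever Blackman filter the authors actually used.

```python
import numpy as np

def blackman_lowpass(numtaps, cutoff_hz, fs):
    """Windowed-sinc low-pass FIR with a Blackman window, unity DC gain."""
    n = np.arange(numtaps) - (numtaps - 1) / 2.0
    h = np.sinc(2.0 * cutoff_hz / fs * n) * np.blackman(numtaps)
    return h / h.sum()

def spike_emphasis(x, fs, p=2, cutoff_hz=200.0, numtaps=101):
    """z(t) = (h * |x|^p)^(1/p): the power law emphasizes large events,
    the low-pass removes high-frequency noise, the p-th root restores
    the original amplitude scale."""
    y = np.abs(x) ** p
    h = blackman_lowpass(numtaps, cutoff_hz, fs)
    z = np.convolve(y, h, mode="same")   # centered, so peaks stay aligned
    return np.maximum(z, 0.0) ** (1.0 / p)
```

The maxima of z(t) then serve as the candidate spike times t_1, …, t_K described above.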
After time discretization, the simultaneously recorded spiking activities are represented as parallel sequences of ones and zeros; '1' indicates the presence of a spike and '0' its absence, as shown in Figure 5: N parallel binary processes lasting T time steps, where each row is a realization of a single process, the 1s mark the occurrences of spike events, and the vector a(t) expresses the joint activity across the processes. With N simultaneously observed neurons there are at most 2^N different coincidence patterns (in practice far fewer, since zeros dominate). The probability density function of the maxima amplitudes has two prominent modes: a mode close to zero for the ECoG background noise and a mode distant from zero due to the spike activity, which can easily be separated with a threshold calculated by a simple Neyman-Pearson test; see [45] for a reminder on signal detection. After selecting the threshold, the set {t_1, …, t_{K′}} represents the times of detected spikes (K′ < K). The latter are used to estimate the templates by a simple mean; we estimate these templates over a time range spanning between 0.45 ms and 0.55 ms before and after the peak of the template. The normalized cross-correlation, used for finding matches of the reference template, is obtained by sliding the template window over the signal; correlation is likewise captured by sliding an analysis window through the binary data representation of the events. The observed number of patterns is governed by a binomial distribution of neurons firing jointly.
It is useful to think of the template Γ as a (known) signal originating from a source: the measurement vector is x = AΓ + w, and the question is whether A = 0 or A > 0; w is a noise vector added to the signal scaled by the constant A. The detection problem is phrased as H_0: A = 0 versus H_1: A > 0 in the model x ∼ N(AΓ, σ²I). The distribution of the matched-filter statistic T(x) = Γᵀx / (σ(ΓᵀΓ)^{1/2}) is normal, T(x) ∼ N(A(ΓᵀΓ)^{1/2}/σ, 1). This means that we can set a threshold γ_0 for testing H_0 vs H_1: decide H_1 when T(x) > γ_0. The threshold γ_0 is chosen so that the false alarm probability PFA is kept small while the detection probability PD is maximized. The detection probability is given by PD = 1 − Φ(γ_0 − A(ΓᵀΓ)^{1/2}/σ) and the false alarm probability by PFA = 1 − Φ(γ_0). Fixing PFA = 10% in our experiment, we obtain γ_0 = 1.28.
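The threshold γ_0 = 1.28 can be reproduced directly from the standard normal CDF. This pure-stdlib sketch (function names are ours) inverts PFA = 1 − Φ(γ_0) by bisection:

```python
import math

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def np_threshold(pfa, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve 1 - Phi(gamma0) = pfa for gamma0 by bisection
    (1 - Phi is decreasing, so raise lo while the tail is too heavy)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - phi(mid) > pfa:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def detection_probability(gamma0, deflection):
    """PD = 1 - Phi(gamma0 - d) with deflection d = A*sqrt(Gamma'Gamma)/sigma."""
    return 1.0 - phi(gamma0 - deflection)
```

With PFA = 10% this yields γ_0 ≈ 1.2816, and the PD formula then gives the detection probability for any signal-to-noise deflection.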
Data are organized and aligned with the corresponding behavioral event or stimulus into "trials". Figure 6 gives the detection probability PD over time for a set of selected channels: the plots suggest that the patterns of variation are very similar and that the series are positively correlated at lag 0, suggesting synchrony. Finally, we assess the performance of this signal-detection technique by computing the sensitivity and specificity of the channels for a range of descriptors. Table 1 presents the ten "best" neural channels, for which six descriptors have been computed: the template standard deviation and mean, σ and μ respectively; the coefficient of variation CV = σ/μ; the early firing time ∆t by which the behavioral event is detected in advance; the maximum rate of missing events; and the maximal amplitude of the template. The ideal channel has a detection lead time ∆t as large as possible, a small variability, and a low missed-detection rate. This electrode selection strategy, originally suggested in [41], was based on our motivating rationale that eliminating channels before the main processing would further simplify the decoding of motor cortical recordings. In the following, our experiments retain at most 4 channels.

5.3. Metaneuron and nonparametric optimal detector of joint-spike events
The instantaneous observations of spike actions from the N channels can be described by N parallel binary sequences, and the input to the detector is the vector a(t) = (a_1(t), …, a_N(t)). The simplest decision rule counts the number of 1s in a sliding window. Suppose that the a_i's are independent but not necessarily identically distributed [46]. The detection problem may be written as testing H_0 versus H_1: when H_0 is true, each a_i is driven by zero-median noise, so P(a_i = 1) = 1/2, whilst if H_1 is true, P(a_i = 1) > 1/2. The likelihood ratio Λ(a) can now be formed and compared to a threshold. Let k_+ = Σ_{i=1}^{N} a_i denote the number of 1s; taking natural logarithms of both sides of the likelihood-ratio test and reducing, the test compares k_+ with a threshold γ′: decide H_1 when k_+ > γ′. The threshold is fixed by setting the false alarm probability P(H_1 | H_0) (as in section 5.2) to be less than or equal to α = 10%. Since k_+ is a sum of Bernoulli random variables, under H_0 it follows the binomial distribution k_+ ∼ B(N, 1/2) [47]. The false alarm probability is therefore

PFA = Σ_{k=γ′+1}^{N} C(N, k) 2^{−N},

where C(N, k) is the binomial coefficient, and we take the smallest value of γ′ such that PFA ≤ α. Once γ′ is fixed, the probability of detection (POD) is derived in the same way from the binomial distribution B(N, p), with p = P(a_i = 1) under H_1:

POD = Σ_{k=γ′+1}^{N} C(N, k) p^k (1 − p)^{N−k}.

The non-parametric decision rule is thus easy to implement and only requires summing the components of the vector a(t). As an illustration, consider a case with N = 10 channels and α = 25%. The threshold γ′ = 6 is the smallest value for which the false alarm probability does not exceed α: PFA = Σ_{k=7}^{10} C(10, k) 2^{−10} = 176/1024 ≈ 0.17. Suppose that we obtain the set of observations a = (0111110010)ᵀ; for this set, k_+ = 6, which does not exceed γ′, so we decide H_0.
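The worked example (N = 10, α = 25%) can be checked in a few lines, using the strict-inequality convention above (decide H_1 when k_+ > γ′, matching the PFA sum starting at γ′ + 1). The function names are ours.

```python
from math import comb

def binomial_threshold(N, alpha):
    """Smallest gamma such that P(k+ > gamma | H0) <= alpha,
    where k+ ~ Binomial(N, 1/2) under H0."""
    def pfa(gamma):
        return sum(comb(N, k) for k in range(gamma + 1, N + 1)) / 2 ** N
    gamma = 0
    while pfa(gamma) > alpha:
        gamma += 1
    return gamma, pfa(gamma)

def decide(a, gamma):
    """Nonparametric rule: decide H1 when the coincidence count exceeds gamma."""
    return sum(a) > gamma
```

For N = 10 and α = 0.25 this returns γ′ = 6 with an actual false alarm probability of 176/1024 ≈ 0.17, as in the text.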

PERFORMANCE ASSESSMENT
6.1. Experiment assessment with one neuron
First, we start the simulation with a single input, as in the similar work presented in [48], where the best channel (channel 21) was selected with the correlation method and served as the unique input to the TDNN, but with a trial-and-error time regression. This time, the best channel is selected using the coefficient of variation, and the time shift is defined by template matching. A single input channel was used with different numbers of hidden-layer neurons (2; 4; 7; 9; 12; 15; 20; 25); the training time, its logarithm, and the estimation error are given in Table 2 with respect to the number of neurons.

Experiment with multiple neurons
For multiple neurons (channels), we use the first two and then the first three selected neurons, in their ranked order.

Two inputs
For two input neurons, we ran the time delay neural network with (2; 4; 7; 9; 12; 15; 20; 25; 30) hidden neurons. The best selected inputs are channels 21 and 25, in that order. Table 3 gives the training time of the TDNN.

Three inputs
In this case we have three selected inputs: the two selected in the double-input case plus channel 43, ranked third. The TDNN was trained on half of the data and tested on the rest; the results are given in Table 4. After spike detection and sorting we obtained 48 ECoG channels, each represented by runs of '1' pulses separated by long silences of '0'. Various data representations have been suggested to exploit the information hidden in the distances between spikes; a brief overview is given in [49]. In this work, we compute the spike rate using a sliding window at the same frequency as the camera, so as to obtain the hand coordinates simultaneously with the brain signal. We calculated correlation coefficients to choose the channels used as features of the brain-machine-interface decoder and to remove input variables that are redundant or uncorrelated with the output. The best cross-correlation values give the delays of the explanatory variables that are used as network inputs.
Figure 4 shows the TDNN model. In this experiment, the input layer comprises 10 delayed cortical signals from the 3 most significant neurons (among the 46 neurons recorded), with between 2 and 7 hidden neurons using the hyperbolic tangent activation function. Section 4.2 describes how we trained the TDNN with the two algorithms. The output variables (hand position) are centered and scaled to follow a Gaussian law N(0; 1). The layer weights are initialized randomly in [−1; +1]. We set the learning rate to 0.05 in both algorithms.
Training was terminated when the cross-validation error increased continuously for more than 10 steps. The network specifications are listed in Table 5. While the choice of the right architecture is mainly intuitive and involves arbitrary decisions, an attempt to apply an ANN directly fails due to the dimensionality of the inputs. Therefore the input dimension was reduced drastically by feature selection. The change in the number of units in the TDNN affects the decoding performance, as shown in Table 5; the mean AIC across the 50 data sets is used to quantify the performance of each model for the two decoding algorithms. For clarity, there are 7 peaks, which represent 10% of the data. In Figure 7(a), the red line describes the model output. It can be seen that the hand coordinates in the plane are tracked and the trajectory is well reconstructed throughout the experiments. When the movement is sufficiently complex (presence of peaks), the algorithm shows the ability to catch these peaks well, as shown in the zoom onto the fit in Figure 7(b-g).
The Akaike information criterion (AIC) is used to measure performance and as a means of model selection. It is calculated as AIC = 2k − 2 ln(L), where k is the number of model parameters and L denotes the maximized value of the model's likelihood function. The AIC expresses the compromise between the quality of the model's fit and its complexity; the best model is the one with the lowest AIC value.
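The criterion is straightforward to compute; for a model with Gaussian residuals the maximized log-likelihood has a closed form. This is a generic sketch, not the paper's exact likelihood.

```python
import math

def aic(k, log_likelihood):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def gaussian_aic(residuals, k):
    """AIC for a model with Gaussian residuals: the maximized
    log-likelihood is -n/2 * (ln(2*pi*sigma2) + 1), where sigma2
    is the mean squared residual."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    ll = -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)
    return aic(k, ll)
```

Comparing decoders with the mean AIC across data sets, as done here, rewards better fit (smaller residuals) while penalizing extra parameters.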
When a reasonable number of units is used in decoding (more than three), the preconditioned CGA shows its superiority over the classic CGA. Figure 7 illustrates the result obtained with the MLP 27-5-2, where a trial-and-error technique was used for training. The performance of the MLP 27-5-2 was much worse than that of the CGA, and the performance differences between the CGA and the preconditioned CGA were larger above 5 units.

CONCLUSION
Determining a dynamical neural network with the appropriate architecture is a difficult yet critical task. Not only the nonlinear architecture affects performance: the memory architecture of dynamic models can also have an important impact on their dynamical behavior. Using the NARX network in brain-computer interfaces produces many free parameters, even if training is faster with the conjugate gradient algorithm: a NARX network with 20 neurons in the hidden layer and 10 time delays on 46 ECoG input channels yields more than 1,160 free parameters, with only 10,000 samples for training and cross-validation. We proposed in this paper an algorithm to define the embedded memory order of NARX recurrent neural networks, and showed that it can deliver improved performance on inference tasks. Furthermore, optimizing the memory architecture and the nonlinear function through input selection often results in sparsely connected architectures, with long time windows, which are able to model the global features of the underlying system quite efficiently. Simulation results on experimental brain signals showed improved performance in the hand-trajectory decoding process. A perspective is the possibility of additionally using a Kalman pre-filter to detect pulses and group them, thereby simplifying the activation measurement.