A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Concept of regularization

DOI: 10.1615/thermopedia.000238

CONCEPT OF REGULARIZATION

Sergey A. Rukolaine

The principal difficulty in solving ill-posed problems is instability of their solutions with respect to small variations of input data. Methods for solving ill-posed problems (i.e., obtaining stable solutions of unstable problems) are called regularization methods.

1 SHORT HISTORICAL EXCURSUS

The concept of regularization is conveniently explained by the example of the linear Fredholm integral equation of the first kind (Wing, 1991)

(1)

where K is a given kernel of the integral equation (the kernel can be considered as the apparatus function of a measuring device), f is a given function (input data), and u is a sought-for solution, -∞≤ a < b ≤∞ and -∞≤ c < d ≤∞. The functions u and f belong to linear functional spaces U and F with norms ||u || = (∫ab|u (s)|2 ds)1/2 and ||f || = (∫cd|f (x)|2 dx)1/2, respectively.

The integral equation (1) is conveniently expressed in the form of the linear operator equation of the first kind

(2)

Note, that the kernel K and, consequently, the operator (mapping) A are usually known with an error, because they depend on radiative properties of medium whose numerical values are obtained from experimental data. But here they are supposed, for simplicity, to be exact.

In reality, the input data f contain an error (are disturbed by noise), i.e.,

where f is an exact function, which is unknown, and e is an error (noise). Due to instability of the problem, arbitrarily small error (noise) e can generally lead to arbitrarily large difference u -u, where u and u are obtained and exact solutions, respectively.

2 TIKHONOV’S THEOREM AND QUASISOLUTIONS

The concept of the regularization of ill-posed (unstable) inverse problems goes back to Tikhonov’s theorem (Tikhonov, 1943; Tikhonov and Arsenin, 1977; Denisov, 1999; Ivanov et al., 2002):

Theorem (A. N. Tikhonov). If A [in Eq. (2)] is a continuous one-to-one mapping of a compact set M of the space U onto a set N of the space F, then the inverse mapping A-1 is continuous on the set N.

Compact sets are sufficiently “thin” sets: examples are the sets of nondecreasing or nonincreasing functions, the sets of convex or concave functions, the sets of functions of bounded variation, and bounded finite–parametric sets of functions (Tikhonov et al., 1995). The continuity of the inverse mapping A-1 means, that small errors in the input data f cause small errors in the solution u (i. e., stability of u with respect to errors in f). In other words, if one knows that the solution u of Eq. (2) belongs to a sufficiently thin (compact) set M, and the input data f belong to the image N = AM of the set M under the mapping A, then the inverse problem (2) is well posed.

Tikhonov’s theorem substantiates the necessity of the use of a priori information about a sought-for solution in solving ill-posed problems. A priori (or prior) information is additional information about the sought-for solution u, which expresses some qualitative and/or quantitative properties of the solution. A priori information cannot be derived from the input data f or from the properties of the operator (mapping) A. It is necessary to emphasize here, that ill-posed (unstable) problems cannot generally be solved without a priori information about the sought-for solution.

The problem is that the input data f may not belong (due to errors) to the image N = AM of the compact set M (in practice, this is usually the case). To overcome this difficulty, Ivanov introduced a notion of quasisolution (Ivanov, 1962, 1963; Ivanov et al., 2002; Tikhonov and Arsenin, 1977; Denisov, 1999), which generalized the notion of the solution of Eq. (2). Quasisolution is a solution of the problem

(3)

where M is a compact set, ||·|| is a norm in a functional (e.g., Hilbert) space (Balakrishnan, 1976). If the operator (mapping) A is linear, then quasisolution always exists, is unique, and depends continuously on the input data f [i.e., the problem (3) is well posed]. The notion of quasi-solution is a generalization of the least-squares solution (Björck, 1996; Vogel, 2002).

3 TIKHONOV REGULARIZATION

A priori information, that the solution belongs to a compact set, is by no means always available. Phillips (1962) proposed a variational method of “picking out” the smoothest approximate solution from a set of functions u, corresponding to input data f within a given noise level. Subsequently, Tikhonov (1963a,b) proposed a more general variational method, which was based on the idea of a “stabilizing” functional Ω (u) ≥ 0 Tikhonov and Arsenin (1977). One of the properties of Ω is that the sets Ω (u) ≤ C, C > 0, are compact ones.

The starting point in Phillips’ (1962) and Tikhonov’s (1963a; 1963b) reasonings is the assumption that the estimate of the noise level is known,

The exact solution u is unknown, whereas the input data f are known. One can consider the set of functions u such that

(4)

This set contains the exact solution. However, due to ill-posedness of the problem, the set (4) is excessively large and contains both “good” solutions (approximating the exact one) and bad solutions (e.g., unphysical solutions). A way for selection of good solutions is needed. Phillips (1962) assumed that the exact solution was “reasonably smooth function”. His idea was to select the smoothest function u from the set (4). In more general terms of Tikhonov’s stabilizing functional Ω (u), this means that the solution of the minimization problem

(5)

is considered as a solution of Eq. (2). Under sufficiently general conditions, the solution of Eq. (5) belongs to the boundary of the set (4) (Tikhonov and Arsenin, 1977). Therefore, the minimization problem (5) can be replaced by the minimization problem

(6)

Equation (6) can be solved by the method of Lagrange multipliers, i.e., one can solve the minimization problem

(7)

(the Lagrange multiplier is α-1), the parameter α > 0 being determined here from the condition

(8)

where u α is the solution of Eq. (7). The functional Mα is Tikhonov’s smoothing functional, and the parameter α is called the regularization parameter. Determination of the regularization parameter from the condition (8) is Morozov’s discrepancy principle (Morozov, 1966a,b).

Note, that the functional Mα can be considered independently of the minimization problems (5) and (6), and the minimization problem (7) can be considered independently of the discrepancy principle (8). In this case, a problem of determination of the regularization parameter α arises.

In Tikhonov (1963a), Tikhonov proposes the stabilizing functional of the form

(9)

with p (s) ≥ 0 and q (s) ≥ q 0 > 0 [see also Tikhonov and Arsenin (1977)]. If p (s) = 1 and q (s) = 1, then the functional (9) takes the form

(10)

which is better known in the literature. The functional (10) leads to the well-known first-order Tikhonov regularization. In Tikhonov (1963b), Tikhonov proposed the k th-order stabilizing functional

(11)

In particular, the zezoth-order stabilizing functional, leading to weak regularization, is

(12)

It is very important to emphasize that the role of the functional Mα is not always to select smooth solutions. Its role depends on the stabilizing (or penalizing) functional Ω. If Ω is given by Eqs. (10) or (11) with k > 0, then the functional Mα does select smooth solutions. However, the functional Ω can be chosen in quite a lot of ways, depending on available a priori information about the sought-for solution. For example, in Daubechies et al., the penalizing functional Ω (u) is not quadratic, but a weighted lp-norm of the coefficients of u with respect to a particular orthonormal basis in a Hilbert space, with 1 ≤ p ≤ 2. Such a penalizing functional Ω can be useful, for example, if the solution u is sparse in the original space domain (spectrumlike solution) or in the Fourier domain or with respect to the singular value decomposition (SVD).

Note also that regularization in the form of the minimization problem (7) allows incorporating additional a priori information about the sought-for solution. For example, a priori information that the solution is nonnegative, which is often the case, leads to the minimization problem

4 REGULARIZATION METHODS REVISITED

Equations (1) and (2) are continuous mathematical models of inverse problems. To solve an inverse problem numerically, one has to discretize the equation in one way or another (Wing, 1991; Hansen, 1998). Note that, in inverse problems of interpretation of experimental data, the input data f are always given in a discrete form because they are obtained experimentally. In these problems, the input data are available in the form of the vector f = (f1, ... ,fm), where m is the number of experimental measurements.

4.1 Quasisolutions on Finite-Parametric Sets of Functions (Least-Squares Solutions)

In some cases (particularly in inverse problems of interpretation of experimental data), it is natural to state inverse problems in a semi-discrete form, where the input data f are given in a discrete form, while the solution u is sought for in a continuous form (Bertero and Boccacci, 1998). Then, instead of the integral Eq. (1), one has the system of equations

(13)

(the points xi are different).

The semi-discrete form (13) is suitable, for example, if there is a priori information that the solution u is completely determined by a finite number of parameters p1, ... ,pl, where l is known and much less than m, |pi|≤ Mi < ∞, i = 1, ... ,l (i.e., the solution belongs to a bounded finite-parametric set of functions, which is a compact set).

In the simplest case, the solution depends on the parameters linearly, i.e.,

(14)

where the basis functions φ1, ... , φl are linearly independent (Balakrishnan, 1976). In this case, the system (13) is reduced, with the use of the representation (14), to the linear algebraic system

(15)

where B m×l [i.e., B is a real (m ×l)-matrix, or a matrix with m rows and l columns], p = (p1, ... ,pl), the elements of the matrix B are bij = ∫abK (xi,sj(s) ds, i = 1, ... ,m, j = 1, ... ,l.

Because the number l of unknowns (parameters) is less than the number m of equations, the solution of the system (15) does not generally exist and the system is reduced to the least-squares problem

(16)

where is the Euclidean norm. The solution of this least-squares problem is given by the system of normal equations (Forsythe and Moler, 1967; Golub and Van Loan, 1996; Björck, 1996)

(17)

where the superscript T indicates the transposition.

The function (14) with the parameters p, obtained from the system (17), is nothing but the well-known least-squares solution of the integral equation (1), based on the representation (14). This example shows that the notion of quasisolution generalizes least-squares solutions.

In general, case dependence of the solution u on the parameters may be nonlinear,

In this case, the system (13) is reduced to the system of nonlinear equations,

(18)

where B is a nonlinear operator (mapping) from l to m. The system (18) is reduced to the nonlinear least-squares problem

4.2 Discrete Tikhonov Regularization

Discretization of the integral equation (1) [or the operator equation (2)], both with respect to the input data f and solution u, results in a linear algebraic system (Wing, 1991; Hansen, 1998)

(19)

where A m×n [i.e., A is a real (m × n)-matrix, or a matrix with m rows and n columns], f ≡ (f1, ... ,fm) m is a given vector, u ≡ (u1, ... ,un) n is a sought-for solution (we suppose for simplicity that nm). If the discretization is fine enough, then the matrix A is ill-conditioned.

Because the solution of the system (19) does not generally exist, the system is reduced to the least-squares problem

(20)

The solution of this least-squares problem is given by the system of normal equations,

(21)

However, because the matrix A is ill-conditioned, the solution u of the system (21) is practically unstable with respect to variations of the input data f; therefore, it cannot be considered as a stable solution of the system (19).

It is necessary to note that the normal system (17) differs from the system (21) essentially, and the difference is in the condition of the matrices A and B: the matrix B is well-conditioned, while the matrix A is ill-conditioned. Therefore, the solution p of the system (17) is stable with respect to the input data f.

For the system (19), the Tikhonov regularization (7), in the case when the penalizing functional Ω is quadratic, takes the form

(22)

where L is a real square matrix. Note that

where

0 is the zero vector of the length n. Therefore, the minimization problem (22) is equivalent to the least-squares problem,

The solution of this least-squares problem is given by the system of normal equations,

Thus, the solution of the minimization problem (22) is given by the linear algebraic system,

(23)

Note again that regularization in the form of the minimization problem (22) allows incorporating additional a priori information about the sought-for solution. For example, a priori information that the solution is nonnegative leads to the minimization problem

(24)

The solution of Eq. (24) is not generally given by the system (23).

The discrete zeroth-order Tikhonov regularization [corresponding to the zeroth-order Tikhonov regularization, i.e., with the stabilizing functional (12)] is given by L = In, where In is the n th-order identity matrix. In this case the system (23) takes the form

(25)

Of interest is application of the SVD of the matrix A to the system (25). Recall that the SVD of the matrix A m×n is that it can be represented in the following form (Forsythe and Moler, 1967; Golub and Van Loan, 1996; Björck, 1996; Bertero and Boccacci, 1998; Hansen, 1998) (we suppose for simplicity that nm):

where the matrices V = [v1 ... vm] m×m and W = [w1 ... wn] n×n are orthogonal, i.e.,

Σ is a rectangular “diagonal” matrix,

where σ1 ≥ σ2... ≥ σn ≥ 0. Henceforth, it is supposed for simplicity that σn > 0.

Application of the SVD to the matrix A yields, that the solution of the system (25) is

(26)

where

(27)

Note that the coefficients ci are bounded and nonincreasing: 1 ≥ c1...cn ≥ 0. Besides,

(28)

Note also that ci = 1 for α = 0, i = 1, ... ,n. This gives the solution of the system (21).

4.3 SVD-Based Regularization, Truncated SVD (TSVD)

The representation (26) can be rewritten as

(29)

where the coefficients ci are not given by the formula (27), but just satisfy the condition 0 ≤ ci ≤ 1. The coefficients ci in the representation (29) are called filter factors.

The property (28) prompts the simplest way of filtering, when the coefficients ci are equal to unity for smaller i and to zero for larger i: ci = 1 for 1 ≤ ik and ci = 0 for k < in. In this case, the representation (29) takes the form

(30)

the integer parameter k, 1 ≤ kn, playing the role of the regularization parameter. This way of regularization is called the truncated SVD (TSVD) (Hansen, 1998; Bertero and Boccacci, 1998).

The TSVD (30) is not the unique way of SVD-based regularization. Other ways of filtering are possible (Bertero et al., 1988; Hansen, 1998), including nonlinear ones (Daubechies et al.).

4.4 Iterative Regularization

Some iterative methods, developed for solution of well-conditioned linear algebraic systems, can be used for stable solution of ill-conditioned systems. In this case, the number of iterations plays the role of the regularization parameter.

Note that we do not mean here the systems (17) and (23), which are well-conditioned and can be solved by any method, including iterative ones. However, iterative methods applied to the systems (17) and (23) are not used as a regularization tool.

Iterative regularization methods for the system (19) stem mainly from the gradient-based minimization of the discrepancy functional

The gradient of the functional J is (Bertero and Boccacci, 1998)

The iterative scheme of the gradient-based minimization of the functional J can be written in general form as

where τk > 0 is a relaxation parameter. Different ways of the choice of the relaxation parameter lead to different minimization methods.

In the simplest case, the relaxation parameter is fixed so that τk = τ and 0 < τ < 1/ σ12, where σ1 is the largest singular value of the matrix A (the latter condition ensures the convergence of the method). This leads to the iterative scheme

(31)

which is known as the Landweber method (Landweber, 1951; Engl et al., 1996; Bertero and Boccacci, 1998; Hansen, 1998; Vogel, 2002). The convergence of the Landweber method is slow, and it is rarely used in practice. However, it can serve as a good illustration of why and how iterative regularization methods work.

With the use of the SVD of the matrix A and the equality ΣTΣ = diag(σ12, ... , σn2), where diag(σ12, ... , σn2) is the diagonal matrix with the diagonal elements σ12, ... , σn2, the relation (31) takes the form

Suppose, for simplicity, that the initial approximation is u(0) = 0. Then, it is easy to deduce by induction, that

Therefore, in the Landweber method (31)

(32)

where the filter factors are

Because 0 < τσ12 < 1 (the convergence condition), the filter factors ci(k) → 1 as k →∞, i = 1, ... ,n. Therefore, the successive approximations u(k) (32) converge to the solution of the system (21), which is unacceptable. However, for some finite k, the filter factors ci(k) ≈ 1 for small i and ci(k) ≈ 0 for large i, and corresponding u(k) can be considered as regularized solutions of the system (19) (Bertero and Boccacci, 1998; Hansen, 1998). Thus, the number of iterations k plays the role of the regularization parameter.

Other ways of the choice of the relaxation parameter lead to other iterative regularization methods: the steepest descent method, the quite efficient conjugate gradient method, etc. (Alifanov, 1994; Alifanov et al., 1995; Hanke, 1995; Engl et al., 1996; Bertero and Boccacci, 1998; Hansen, 1998; Vogel, 2002).

5 CHOICE OF REGULARIZATION PARAMETER

The choice of the value of the regularization parameter is of fundamental importance in the Tikhonov, SVD-based, and iterative regularization. The classical method is the discrepancy principle (Morozov, 1966a,b; Tikhonov and Arsenin, 1977; Hanke and Hansen, 1993; Bakushinskii and Goncharskii, 1994; Engl et al., 1996; Bertero and Boccacci, 1998; Hansen, 1998; Vogel, 2002). The discrepancy principle is completely (mathematically) substantiated. Unfortunately, in practice, the discrepancy principle often gives oversmoothed solutions, partially due to the fact that the noise level (data error) is usually overestimated.

The discrepancy principle explicitly uses information about the noise level (data error). Note that there is a mathematical statement: “There is no regularizer independent of the noise level to a linear ill-posed problem. If such a regularizer exists, then the problem is well-posed.” (Ramm, 2005). This statement means that regularizers, or regularizing algorithms, for ill-posed problems must depend on the noise level (cannot be error-free). See also (Bakushinskii and Goncharskii, 1994; Engl et al., 1996; Yagola et al., 2002).

There are other methods of the choice of the regularization parameter: variants of the discrepancy principle, the quasi-optimality criterion, generalized cross-validation (GCV), the L-curve criterion, etc. (Tikhonov and Arsenin, 1977; Hanke and Hansen, 1993; Engl et al., 1996; Bertero and Boccacci, 1998; Hansen, 1998; Vogel, 2002).

The L-curve criterion has gained popularity because this is an error-free method. It is necessary to note that the L-curve criterion has limitations and, in some problems, fails (Engl and Grever, 1994; Hanke, 1996; Vogel, 1996; Engl et al., 1996; Hansen, 2001).

6 NONLINEAR PROBLEMS

In real engineering applications heat and mass transfer, including radiation, is usually a combined-mode one and corresponding problems are highly nonlinear. In the case of nonlinear problems, the concept of regularization does not undergo appreciable changes. Regularization is related to instability of inverse problems and not to their nonlinearity, but instability and nonlinearity are essentially different notions. Nonlinearity of unstable inverse problems may greatly complicate their solving due to the presence of multiple local minima of an objective functional, etc. However, these difficulties are related to nonlinearity rather than to instability; the same difficulties are present in well-posed (stable) nonlinear problems as well. Thus, in nonlinear problems, the concept of regularization remains basically the same.

REFERENCES

Alifanov, O. M., Inverse Heat Transfer Problems, Springer–Verlag, Berlin, 1994.

Alifanov, O. M., Artyukhin, E. A. and Rumyantsev, S. V., Extreme Methods for Solving Ill-Posed Problems with Applications to Inverse Heat Transfer Problems, Begell House, New York, and Redding, CT, 1995.

Bakushinskii, A. B. and Goncharskii, A. V., Ill-posed Problems: Theory and Applications, Kluwer, Dordrecht, 1994.

Balakrishnan, A. V., Applied Functional Analysis, Springer–Verlag, New York, 1976.

Bertero, M., De Mol, C., and Pike, E. R., Linear inverse problems with discrete data: II. Stability and regularization, Inverse Prob., vol. 4, pp. 573–594, 1988.

Bertero, M. and Boccacci, P., Introduction to Inverse Problems in Imaging, Institute of Physics, Bristol and Philadelphia, 1998.

Björck, Å., Numerical Methods for Least Squares Problems, SIAM, Philadelphia, 1996.

Daubechies, I., Defrise, M. and De Mol, C. (2004) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint, Commun. Pure Appl. Math., vol. 57, pp. 1413–1457, 2004.

Denisov, A. M., Elements of the Theory of Inverse Problems, VSP, Utrecht, 1999.

Engl, H. W. and Grever, W., Using the L–curve for determining optimal regularization parameters. Numer. Math., vol. 69, pp. 25–31, 1994.

Engl, H. W., Hanke, M., and Neubauer, A., Regularization of Inverse Problems, Kluwer, Dordrecht, 1996.

Forsythe, G. E. and Moler, C. B., Computer Solution of Linear Algebraic Systems, Prentice Hall, Englewood Cliffs, NJ, 1967.

Golub, G. H. and Van Loan, C. F., Matrix Computations, John Hopkins University Press, Baltimore, 1996.

Hanke, M. and Hansen, P. C., Regularization methods for large-scale problems. Surv. Math. Ind., vol. 3, pp. 253–315, 1993.

Hanke, M., Conjugate Gradient Type Methods for Ill-Posed Problems, Wiley, Hoboken, NJ, 1995.

Hanke, M., Limitations of the L–curve method in ill-posed problems, BIT, vol. 36, pp. 287–301, 1996.

Hansen, P. C., Rank-Deficient and Discrete Ill-Posed Problems: Numerical Aspects of Linear Inversion, SIAM, Philadelphia, 1998.

Hansen, P. C., The L-curve and its use in the numerical treatment of inverse problems. Computational Inverse Problems in Electrocardiology, pp. 119–142, WIT Press, Southampton, 2001.

Ivanov, V. K., On linear ill-posed problems, Dokl. Akad. Nauk SSSR, vol. 145, pp. 270–272 (In Russian); English transl.: Soviet Math. Dokl., vol. 3, pp. 981–983, 1962.

Ivanov, V. K., On ill-posed problems, Matem. sbornik, vol. 61, pp. 211–223, 1963 (In Russian).

Ivanov, V. K., Vasin, V. V., and Tanana, V. P., Theory of Linear Ill-Posed Problems and Its Applications, VSP, Utrecht, 2002.

Landweber, L., An iteration formula for Fredholm integral equations of the first kind, Am. J. Math., vol. 73, pp. 615–624, 1951.

Morozov, V. A., On the solution of functional equations by the method of regularization, Dokl. Akad. Nauk SSSR, vol. 167, pp. 510–512 (In Russian); English transl.: Soviet Math. Dokl., vol. 7, pp. 414–417, 1966a.

Morozov, V. A., On the regularization of ill-posed problems and the choice of the regularization parameter, Zh. Vychisl. Matem. Matem. Fiz., vol. 6, pp. 170–175 (In Russian); English transl.: USSR Comput. Math. and Math. Phys., vol. 6, pp. 242–251, 1966b.

Phillips, D. L., A technique for the numerical solution of certain integral equations of the first kind, J. Associat. Comput. Mach., vol. 9, pp. 84–97, 1962.

Ramm, A. G., Inverse Problems: Mathematical and Analytical Techniques with Applications to Engineering, Springer, New York, 2005.

Tikhonov, A. N., On the stability of inverse problems. Dokl. Akad. Nauk SSSR, vol. 39, no. 5, pp. 195–198, 1943 (in Russian).

Tikhonov, A. N., Solution of incorrectly formulated problems and the regularization method, Dokl. Akad. Nauk SSSR, vol. 151, no. 3, pp. 501–504 (in Russian); English transl.: Soviet Math. Dokl., vol. 4, no. 4, pp. 1035–1038, 1963a.

Tikhonov, A. N., Regularization of incorrectly posed problems, Dokl. Akad. Nauk SSSR, vol. 153, no. 1, pp. 49–52 (in Russian); English transl.: Soviet Math. Dokl., vol. 4, no. 6, pp. 1624–1627, 1963b.

Tikhonov, A. N. and Arsenin, V. Y., Solution of Ill-Posed Problems, Wiley, Hoboken, NJ, 1977.

Tikhonov, A. N., Goncharsky, A. V., Stepanov, V. V., and Yagola, A. G., Numerical Methods for Solving Ill-Posed Problems, Kluwer, Dordrecht, 1995.

Vogel, C. R., Non-convergence of the L–curve regularization parameter selection method, Inverse Prob., vol. 12, pp. 535–547, 1996.

Vogel, C. R., Computational Methods for Inverse Problems, SIAM, Philadelphia, 2002.

Wing, G. M., A Primer on Integral Equations of the First Kind: The Problem of Deconvolution and Unfolding, SIAM, Philadelphiaá 1991.

Yagola, A. G., Leonov, A. S., and Titarenko, V. N., Data errors and an error estimation for ill-posed problems, Inverse Problems Eng., vol. 10, pp. 117–129, 2002.

Number of views: 23633 Article added: 7 September 2010 Article last modified: 1 March 2011 © Copyright 2010-2017 Back to top