\documentclass[runningheads]{lncse}
\usepackage{psfig}

%
\begin{document}


\title{Challenges and Opportunities in Using Automatic Differentiation with 
Object-Oriented Toolkits for Scientific Computing\thanks{ 
\tiny
The submitted manuscript has been created by the University of Chicago as
Operator of Argonne National Laboratory (``Argonne'') under Contract
No. W-31-109-ENG-38 with the U.S. Department of Energy.  The U.S. Government
retains for itself, and others acting on its behalf, a paid-up, nonexclusive,
irrevocable worldwide license in said article to reproduce, prepare derivative
works, distribute copies to the public, and perform publicly and display
publicly, by or on behalf of the Government.}}


\titlerunning{Using Automatic Differentiation with Object-Oriented Toolkits}

\author{Paul Hovland\inst{1} \and Steven Lee\inst{2} \and Lois McInnes\inst{1}
\and Boyana Norris\inst{1} \and Barry Smith\inst{1}}
%
\authorrunning{P. Hovland, S. Lee, L. McInnes, B. Norris, B. Smith}   
%
\institute{
Mathematics and Computer Science Division, 
Argonne National Laboratory, 
9700 S. Cass Ave., 
Argonne, IL 60439-4844.
\and
Center for Applied Scientific Computing,
Lawrence Livermore National Laboratory,
Box 808, L-560,
Livermore, CA 94551.
}

\maketitle

\begin{abstract}
The increased use of object-oriented toolkits in large-scale scientific
simulation presents new opportunities and challenges for the use of automatic
(or algorithmic) differentiation (AD) techniques, especially in the context of
optimization.  Because object-oriented toolkits use well-defined interfaces
and data structures, there is potential for simplifying the AD process.
Furthermore, derivative computation can be improved by exploiting
high-level information about numerical and computational abstractions.
However, challenges to the successful use of AD with these toolkits also
exist.  Among the greatest challenges is balancing the desire to limit the
scope of the AD process with the desire to minimize the work required of a
user.  We discuss our experiences in integrating AD with the PETSc, PVODE, and
TAO toolkits and our plans for future research and development in this area.
\end{abstract}

\section{Introduction}

The ever-increasing complexity of advanced computational science applications
has led to an increase in the use of object-oriented software practices in the
development of scientific applications and toolkits.  A good object-oriented
design increases productivity by allowing developers to focus on a small
component of a complex system.  Furthermore, increased code reuse provides
justification for expending significant effort in the development of highly
optimized object-oriented toolkits.

Many high-performance numerical toolkits include components designed to be
combined with an application-specific nonlinear function.  Examples include
optimization components, nonlinear equation solvers, and differential
algebraic equation solvers.  Often the numerical methods implemented by these
components also require first and possibly second derivatives of the function.
Frequently, the toolkit is able to approximate these derivatives by using
finite differences; however, the convergence rate and robustness are
often improved if the derivatives are computed analytically.

Developing correct parallel code for computing the analytic derivatives of a
complicated nonlinear function can be an onerous task, especially
when second derivatives are required.  An automated alternative such as
automatic differentiation (AD)~\cite{Griewank89a,Griewank-book} is therefore
very attractive as a mechanism for providing these analytic derivatives.
Furthermore, because object-oriented toolkits provide well-defined interfaces
for nonlinear functions, the AD process can potentially be simplified.

We examine the use of AD in conjunction with object-oriented numerical
toolkits.  In particular, we describe the use of AD to provide first and,
where appropriate, second derivatives in conjunction with the Portable, Extensible
Toolkit for Scientific Computing (PETSc), the Toolkit for Advanced
Optimization (TAO), and a parallel ODE solver for computing sensitivities
(SensPVODE).  

This paper is organized as follows.  Section~\ref{sec:AD} provides a brief
intoduction to automatic differentiation.  Section~\ref{sec:toolkits}
introduces the toolkits considered in this paper.  Section~\ref{sec:issues}
discusses the opportunities and challenges that arise in using AD with these
toolkits.  Section~\ref{sec:experiments} provides a synopsis of experimental
results.  Section~\ref{sec:conclusions} summarizes our experiences and
describes our expectations for using AD in the context of PDE-constrained
optimization.

\section{Automatic Differentiation}
\label{sec:AD}

Automatic, or algorithmic, differentiation is a technique for augmenting
arbitrarily complex computer subprograms with instructions for the computation
of derivatives.  The technique combines rules for analytically differentiating
the finite number of elemental functions in a programming language with the
chain rule of differential calculus.  

The two principal approaches to implementing AD are operator overloading and
compiler-based source transformation.  Each method has its advantages and
disadvantages~\cite{OOvsST}.  There are also two basic modes of AD, the
forward mode and the reverse mode.  The forward mode is particularly
appropriate when the number of independent variables is small or when a small
number of directional derivatives are required.  The reverse mode is
attractive when the number of dependent variables is small or when a
Jacobian-transpose-vector product is required.  We focus on the source
transformation approach using the forward mode, but most of the issues
discussed in this paper are also applicable to the operator overloading
approach and the reverse mode.  More details of how AD works and various
methods for exploiting chain rule associativity can be found
in~\cite{Griewank-book}.

Figure~\ref{fig:adifor-example} shows a simple example of the code generated
by the ADIFOR~\cite{ADIFOR_V2} source transformation tool for Fortran~77.
While this example is not indicative of the power of automatic differtiation,
which is equally applicable to applications spanning thousands or millions of
lines of code and utilizing complex control structures, it does illustrate
some important concepts.  

\begin{figure}[tbhp]
% \vspace*{.2in}
% \hspace*{.5in}
\begin{minipage}[t]{2.0in}
{
\begin{verbatim}
y(1) = 10.0 * (x2-x1*x1)
y(2) = 1.0 - x1
\end{verbatim}
}
\end{minipage} \hfill \begin{minipage}[t]{4.25in}
{
\begin{verbatim}
 d2_b = dble(10.0)
 d5_b = (-d2_b) * x1 + (-d2_b) * x1
 do g_i_ = 1, g_p_
   g_y(g_i_, 1) = d5_b * g_x1(g_i_) + 
+                 d2_b * g_x2(g_i_)
 enddo
 y(1) = dble(10.0) * (x2 - x1 * x1)

 do g_i_ = 1, g_p_
   g_y(g_i_, 2) = -g_x1(g_i_)
 enddo
 y(2) = 1.0d0 - x1
\end{verbatim}
}
\end{minipage}
\caption{A simple example of automatic differentiation.  The code on the right
was generated by the ADIFOR tool from the code fragment on the left.}
\label{fig:adifor-example}
\end{figure}

Each scalar variable in the original program has associated with it a
derivative vector (sometimes generically referred to as a derivative object).
In the case of ADIFOR-generated code, this association is by name.  For
example, the derivative vector associated with \verb+x1+ is
\verb+g_x1+.  In the case of an AD tool for C, such as ADIC~\cite{ADIC},
association by name is not possible because of aliasing.  Instead, association
must be by address.  ADIC accomplishes this association by converting all
floating-point variables to a new datatype, \verb+DERIV_TYPE+.
\begin{verbatim}
typedef struct {
        double value;
        double  grad[ad_GRAD_MAX];
} DERIV_TYPE;
\end{verbatim}
The \verb+value+ field carries the original floating-point value, while the
\verb+grad+ field contains the associated derivative vector.

The derivative vectors associated with independent variables are collectively
referred to as the seed matrix.  The seed matrix can be initialized such that
upon completion of the computation the derivative vectors associated with the
dependent variables contain the Jacobian, a Jacobian-vector product, or an
arbitrary Jacobian-matrix product.  For example, if in our simple example we
initialize \verb+g_x1+ to $[1.0, 0.0]$ and \verb+g_x2+ to $[0.0, 1.0]$, then
\verb+g_y(i,j)+ will contain $\partial{y_j}/\partial{xi}$.
% Explain transpose?

\section{Toolkits}
\label{sec:toolkits}

We have investigated the use of AD in conjunction with three object-oriented
toolkits for scientific computing: PETSc, TAO, and SensPVODE.  All of these
toolkits employ an object-oriented design but are implemented in C.
%Explain why?
In the next sections, we briefly describe these toolkits and the role
derivatives play.

\subsection{Portable, Extensible Toolkit for Scientific Computing}

% Next 2 paragraphs should be rephrased

PETSc~\cite{bgms97,petsc-user-ref} is a suite of data structures and routines
for the scalable solution of scientific applications modeled by partial
differential equations.  The software integrates a hierarchy of components
that range from low-level distributed data structures for vectors and matrices
through high-level linear, nonlinear, and timestepping solvers.  The
algorithmic source code is written in high-level abstractions so that it can
be easily understood and modified.  This approach promotes code reuse and
flexibility and, in many cases, helps to decouple issues of parallelism from
algorithm choices.

PETSc provides a suite of nonlinear solvers that couple a Newton-based method,
offering the advantage of rapid convergence when an iterate is near to a
problem's solution, with a line search, trust region, and pseudo-transient
continuation strategy to extend the radius of convergence of the Newton
techniques.  The linearized systems are typically solved inexactly with
preconditioned Krylov methods.  The basic Newton method requires the Jacobian
matrix, $J = F'(u)$, of a nonlinear function $F(u)$.  Matrix-free
Newton-Krylov methods require Jacobian-vector products, $F'(u) v$, and may
require an approximate Jacobian for preconditioning.

PETSc also provides components for managing computations on structured grids,
including hierarchies of grids for use in multigrid methods.  Among the
functions provided by these components are generalized gather-scatter
operations for communicating ghost values, colorings for use in finite
difference (and AD) Jacobian computations, and simplified facilities for
mapping between local and global indices.

\subsection{Toolkit for Advanced Optimization}

% Next 2 paragraphs should be rephrased

TAO~\cite{bmm99,tao-user-ref} focuses on scalable optimization
software, including nonlinear least squares, unconstrained
minimization, bound-constrained optimization, and general nonlinear
optimization.  The TAO optimization algorithms use high-level
abstractions for matrices and vectors and emphasize the
reuse of external tools where appropriate, including support for using
the linear algebra components provided by PETSc and related tools.

Many of the algorithms employed by TAO require first and sometimes second
derivatives.  For example, unconstrained minimization solvers that require the
gradient, $f'(u)$, of an objective function, $f(u)$, include a limited-memory
variable metric method and a conjugate gradient method, while solvers that
require both the gradient, $f'(u)$, and Hessian, $f''(u)$, (or
Hes\-sian-vector products) include line search and trust region variants of
Newton methods.  In addition, algorithms for nonlinear least squares and
constrained optimization often require the Jacobian of the constraint
functions.
% or at least the computation of Jacobian-vector and Jacobian-transpose-vector products.

\subsection{SensPVODE}

PVODE~\cite{BH99} is a high-performance ordinary differential equation solver
for the types of initial value problems (IVPs) that arise in large-scale
computational simulations.
\begin{equation}
y^{\prime}(t) = f(t,y,p), \; \; y(t_0) = y_0(p), \; \; y \in {\bf R}^N, \; \; p \in
{\bf R}^m. \label{origODE}
\end{equation}
Often, one wants to compute sensitivities with respect to certain
parameters $p_i$ in the IVP, $s_i = \partial{y}/\partial{p_i}$.  SensPVODE~\cite{LHB00} 
is a variant of PVODE that simultaneously solves the original ODE and 
the sensitivity ODEs.
\begin{equation}
s_i^{\prime}(t) = 
\frac{\partial f}{\partial y} s_i(t) + \frac{\partial f}{\partial p_i}, \; 
s_i(t_0) = \frac{\partial y_{0}(p)}{\partial p_{i}}, 
\; i = 1,\cdots, m. \label{sensODE}
\end{equation}
%Improve this next sentence.
Thus, SensPVODE requires the derivatives $\frac{\partial f}{\partial
y}s_i(t)+\frac{\partial f}{\partial p_{i}}$.  

\section{Using Automatic Differentiation with Object-Oriented Toolkits}
\label{sec:issues}

Automatic differentiation can be used in conjunction with object-oriented
toolkits at many different levels.  At the highest level, AD can be applied to
a toolkit to facilitate sensitivity analysis and optimization of models
constructed with the toolkit.  Another option is to use AD to provide the
derivatives required by the toolkit.  To provide these derivatives, AD can be
applied directly to the parallel nonlinear function.  Alternatively, AD can be
applied to the building blocks of the nonlinear function, such as the
nonlinear function on a local subdomain or an element or vertex function.  In
the following sections we consider some of the opportunities and the
challenges in applying AD at these various levels.

\subsection{Toolkit Level}

Applying AD to an object-oriented toolkit for scientific computing offers the
opportunity to compute derivatives (also called sensitivities) of scientific
applications that use the toolkit.  These derivatives can be used for
sensitivity analysis, to understand the sensitivity of the simulation results
to uncertainties in model parameters, or for optimization.  Applying AD to a
toolkit also provides an opportunity to employ so-called computational
differentiation techniques exploiting high-level mathematical and algorithmic
features.  Using computational differentiation techniques can also
circumvent some of the challenges that arise in applying AD to a toolkit,
including the fact that functions and derivatives may not converge at the same
rate in an iterative method and that certain numerical methods, including ODE
solvers with adaptive stepsize control, introduce feedback into a computation
that may seriously influence the derivatives.  

One simple example of a computational differentiation technique arises in the
differentiation of a linear solver.  Rather than differentiate through a
preconditioner and iterative Krylov solver, one can solve directly a linear
system with multiple right-hand sides, perhaps using block Krylov
methods~\cite{Buecker-Europar,block-CG} or a projection
method~\cite{Fischer-restarts}.  We have conducted preliminary research
in this area in developing a differentiated version of
PETSc~\cite{Norris-OO98}.  A more sophisticated example of computational
differentiation, applicable to other types of iterative solvers, 
% including nonlinear solvers, 
is described elsewhere in this
volume~\cite{Griewank-Piggyback}.  We also note that the development of
SensPVODE may be interpreted as applying computational differentiation
techniques to the PVODE toolkit.

\subsection{Parallel Nonlinear Function Level}

Automatic differentiation is often not quite ``automatic.''  Often, the user
of an AD tool needs to specify the independent and dependent variables and
possibly initialize the seed matrix to indicate exactly which derivatives are
to be computed.  The use of AD in conjunction with a numerical toolkit
promises to make full automation possible because of the use
of well-defined interfaces.  For example, the nonlinear solver in PETSc
requires a nonlinear function adhering to the interface
\begin{verbatim}
int Function(SNES, Vec, Vec, void *);
\end{verbatim}
while SensPVODE requires a function adhering to the interface
\begin{verbatim}
void function(integer, Real, N_Vector, N_Vector, void *);
\end{verbatim}
In both cases, all of the required information regarding the derivatives to be
computed is known in advance, making automation of the AD process possible.
Other work~\cite{NEOS-AD,Gertz,DASPK} has also demonstrated the benefits of
well-defined interfaces for automating the AD process.

However, applying AD directly to the parallel nonlinear function required by
the toolkit is not without challenges.  The function may include many calls to
toolkit support functions.  Thus, differentiated versions of these functions
must be developed using automatic or computational differentiation techniques.
In many cases, these support functions are used for communicating ghost values
or performing similar problem setup or data movement functions.  Consequently,
the differentiated versions of the functions may end up performing unnecessary
work.  Parallelism is also an issue.  The nonlinear function may use OpenMP
or make calls to MPI functions, so the AD tool must support the parallel
programming paradigm being used.  Furthermore, the parameters to the function
are likely parallel objects, as is the case with the \verb+Vec+ and
\verb+N_Vector+ objects in the examples above.  Therefore, care must be taken
in the way seed matrices are initialized; additional communication may also be
required.

% There is also the issue of copying between an ad_vector and a matrix or
% array of vectors.
 
An important issue in the differentiation of application functions arises from
the benign-looking \verb+void *+ that appears as the final argument in both of
the examples above.  This argument is used to pass a pointer to an
application-specific data structure that contains various data objects of use
to the application.  Figure~\ref{fig:userdata} contains several examples of
these data structures.

% Have not defined active/passive variables, so avoid dicussion or define.

As discussed earlier, in order to provide association by address between
variables and their derivative vectors, ADIC changes the datatypes of all
floating-point variables.  The consequence for these application-specific data
structures is that if AD is naively applied, the type of several fields within
the data structure will be changed.  Therefore, all application functions that
access the data structure, not just the nonlinear function, must be modified
to use the new datatypes.

An alternative is to use two data structures, one with the original datatypes
and one with the derivative datatypes, and to copy data between them.  However, as
the final example in Figure~\ref{fig:userdata} illustrates, some care must be
taken with this approach.  If the data structure includes workspace, such as
the \verb+uext+ field in this example, it may be inefficient to copy such unneeded
data.  Furthermore, it is usually not possible to generate
code automatically for copying from one data structure to another.  Therefore, at this point
in our work coupling ADIC with SensPVODE, we require the user to provide the
routine for copying data between the application-specific data structures and
their differentiated counterparts.

% Could discuss having separate variables as a preferable solution,
% using either association by name or the hash table approach

\begin{figure}
\begin{center}
\begin{verbatim}
typedef struct {
   double      param;          /* test problem parameter */
   int         mx,my;          /* discretization in x, y directions */
   Vec         localX,localF;  /* ghosted local vectors */
   DA          da;             /* distributed array data structure */
   int         rank;           /* processor rank */
} AppCtx;

typedef struct {
   double     lidvelocity,prandtl,grashof;  
   PetscTruth draw_contours;                
} AppCtx;

typedef struct {
  real om, dx, dy, q4;
  real uext[NVARS*(MXSUB+2)*(MYSUB+2)];
  integer my_pe, isubx, isuby, nvmxsub, nvmxsub2;
  real *p;
  real *pbar;
  MPI_Comm comm;
} *UserData;
\end{verbatim}
\end{center}
\caption{Examples of application-specific data structures used in PETSc, TAO,
and SensPVODE applications.}
\label{fig:userdata}
\end{figure}

\subsection{Local Subdomain Level}

As alluded to in the preceding section, many parallel nonlinear functions
follow the pattern gather/scatter ghost values, compute function on local
subdomain, and assemble/map function from local to global indices.  Therefore, an
approach that avoids many of the complications associated with applying AD to
the full parallel nonlinear function is to apply AD only to the local
subdomain function, with manual modifications to the setup and assembly
phases.  We have examined this approach in conjunction with PETSc and TAO by
manually extracting the local subdomain function, as illustrated
in Figure~\ref{fig:manual-subdomain} and described
in~\cite{ad2000-toolkits,AD-PETSC-TM}.

\begin{figure}[hbt]
% \centerline{\psfig{figure=app-petsc-ad-details.ps,width=3.5in}}
\mbox{\centerline{\psfig{figure=app-petsc-ad-details.eps,width=.6\textwidth}}}
\caption{\small Schematic diagram of the use of automatic differentiation
tools to generate the Jacobian routine for a nonlinear PDE computation.}
\label{fig:manual-subdomain}
\end{figure}

Recent work has examined how we can use the structured grid components to
simplify the process.  The setup and assembly phases are essentially the same
for most nonlinear functions on a structured grid.  Therefore, instead of
extracting the subdomain computation from the parallel nonlinear function, we
ask the user to provide only the subdomain function.  Instead of providing a
nonlinear function with the interface
\begin{verbatim}
int Function(SNES, Vec, Vec, void *);
\end{verbatim}
the user provides, for example, a local subdomain function with the interface
\begin{verbatim}
int LocalFormFunction2d(Field  **, Field  **, Coords, void *);
\end{verbatim}
where \verb+Coords+ contains information about the corners of the subdomain
and \verb+Field+ is a structure with a number of scalar fields corresponding
to the number of degrees of freedom at each vertex.  The setup and assembly are
handled by the structured grid component.  Given a differentiated version of
\verb+LocalFormFunction2d+, the structured grid component can also perform the
necessary setup, including seed matrix intialization, and assembly of the
Jacobian.  The setup and assembly phases use
coloring~\cite{Goldfarb-Toint} to reduce the length of the derivative vectors
to the stencil size times the number of degrees of freedom.  Future research
will extend this work to Hessian computations and unstructured meshes.

\subsection{Element or Vertex Function Level}

A final option is to differentiate the element or vertex function and then
assemble the full Jacobian from the element (or vertex) Jacobians.  This is
common practice in finite element computations, especially when the element
Jacobian is simple to derive by hand.  For complex element or vertex
functions, or for higher-order derivatives, AD can be employed.  This approach
eliminates the need for a matrix coloring, reduces the memory requirements,
and is easily extended to second and higher-order derivatives.

The principal impediment to this approach is that not all functions are easily
decomposed to this level.  It may be computationally more efficient to
precompute fluxes for all of the edges/faces in the subdomain, then use these
fluxes in computing the element functions.  Furthermore, boundary conditions
may introduce many special cases.  It is computationally more efficient to
handle these special cases separately, then loop over the remaining elements,
than to test every element to determine whether a special case applies.  
% Clincher sentence?

\section{Experimental Results}
\label{sec:experiments}

We provide a brief synopsis of experimental results from the use of AD in
conjunction with the PETSc and SensPVODE toolkits.  More detailed descriptions
of these and related experiments can be found
in~\cite{ad2000-toolkits,Norris-OO98,hm99,ad2000-pvode}.

\subsection{Toolkit Level}

In~\cite{Norris-OO98} we describe research into the application of automatic
and computational differentiation to PETSc.  Here we briefly discuss some of
the relevant experimental results.

We have tested the differentiated version of PETSc, and in particular its
linear solver component, with an example involving the solution of a linear
system of equations $A\,x = b$ where $A$ is the $256 \times 256$ matrix whose
sparsity pattern corresponds to a five-point stencil discretization of a $16
\times 16$ computational domain.  In all of the experiments, we have used a
GMRES solver in combination with an incomplete LU factorization
preconditioner.

\begin{figure}[tbh]
\centerline{\psfig{file=acc_256_256_var.ps,width=.45\textwidth}%
	    \hspace{.08\textwidth} %
            \psfig{file=time_256_256_var.ps,width=.45\textwidth}}
\caption{Gradient error and execution time with varying convergence tolerances.}
\label{fig:ad_PETSc}
\end{figure}

Figure~\ref{fig:ad_PETSc} shows the accuracy and performance results for
various convergence tolerances.  DD (for ``divided differences'') designates
finite difference approximation, AD designates black-box automatic
differentiation, and CD stands for computational differentiation using
successive linear solves on the multiple right-hand sides.  The termination
condition of the Krylov subspace methods is based on the relative decrease of
the $l_2$-norm of the residual and the convergence tolerance value, which is
plotted along the $x$-axis.  The $y$-axis of the accuracy plot is the
$l_2$-norm of the matrix representing the difference between the derivatives
produced by the various approaches and the actual solution, $\nabla\!x =
A^{-1} b$, which we compute separately up to machine precision for
verification purposes.  For the finite difference and AD approaches the
convergence tolerance refers to the convergence of $x$, whereas in the
computational differentiation approach it refers to the convergence of $\nabla
\! x$. In this example, the computational differentiation approach exhibits
significant performance improvement over finite differences and AD.


\subsection{Parallel Nonlinear Function Level}

We applied SensPVODE to a simple test case, a two-species diurnal
kinetics advection-diffusion system in two space dimensions.
The PDEs can be written as
\[
\frac{\partial c_i}{\partial t}=K_h\frac{\partial^2 c_i}{\partial x^2}
 +V \frac{\partial c_i}{\partial x}
 + \frac{\partial}{\partial y} \left(K_v(y) \frac{\partial
 c_i}{\partial y}\right) 
 + R_i(c_1,c_2,t) ~~~ (i=1,2),  \label{PDE2}
\]
where the subscripts $i$ are used to distinguish the chemical species.
The reaction terms are given by 
\begin{eqnarray*}
R_1(c_1,c_2,t) &=&-q_1c_1c_3-q_2c_1c_2+2q_3(t)c_3+q_4(t)c_2~{\rm
and} 
\label{react} \\
R_2(c_1,c_2,t) &=&q_1c_1c_3-q_2c_1c_2-q_4(t)c_2,  \nonumber
\end{eqnarray*}
and $K_v(y) = K_{0}e^{(y/5)}$.
The scalar constants for this problem are $K_h=4.0\times 10^{-6}$, 
$V=10^{-3}$, $K_{0}=10^{-8}$, $q_1=1.63\times 10^{-16}$,
$q_2=4.66\times 10^{-16}$, and $c_3=3.7\times 10^{16}$.
The diurnal rate constants are 
\[
\begin{array}{rclr}
q_i(t) & = & e^{[-a_i/\sin \omega t]} & \mbox{for }\sin \omega t>0, \\
q_i(t) & = & 0                         & \mbox{for }\sin \omega t\leq 0,
\end{array}
\]
where $i=3~{\rm and}~4$, $\omega=\pi /43200$, $a_3=22.62$, and $a_4=7.601$.
The time interval of integration is $[0, 86400]$, representing $24$
hours measured in seconds.

The problem is posed on the square $0 \le x \le 20$, $30 \le y \le 50$
(all in km), with homogeneous Neumann boundary conditions.
The PDE system is treated by central differences on a uniform $100 \times 100$
grid,
with simple polynomial initial profiles.
See~\cite{LHB00} for more details.
For the purpose of sensitivity analysis, we identify the following $8$
parameters associated with this problem:
$p_1=q_1$, $p_2=q_2$, $p_3=c_3$, $p_4=a_3$, $p_5=a_4$, $p_6=K_h$,
$p_7=V$, and $p_8=K_{0}$.
% In solving for $k$ sensitivities, we are computing the ODE
% solution together with the scaled sensitivities with respect to
% the first $k$ parameters; that is, $y(t)$ and $w_1(t),\ldots,w_5(t)$.

We varied the number of sensitivities from $1$ to $8$ and compared the use of
AD with three finite difference methods to compute the derivatives
$\frac{\partial f}{\partial y}s_i(t)+\frac{\partial f}{\partial p_{i}}$
required by SensPVODE.  In the combined central difference method, $p_i$ and
$y$ are perturbed simultaneously to obtain both terms in the derivative
expression.  In the other finite difference methods, the terms are approximated separately.
See~\cite{LHB00} for a complete description of the finite difference
strategies.  

\begin{figure}[bth]
\mbox{\centerline{\psfig{figure=senspvode-time.eps,width=.75\textwidth}}}
\caption{\small Comparison of SensPVODE performance (total time to solution) 
for various derivative computation strategies.  Results are the average of
three runs on 16 processors of a Linux cluster.}
\label{fig:senspvode-time}
\end{figure}

\begin{figure}[htb]
\mbox{\centerline{\psfig{figure=senspvode-nst.eps,width=.45\textwidth}%
\hfill %
\psfig{figure=senspvode-timeperstep.eps,width=.45\textwidth}}}
\caption{\small Number of timesteps and time per timestep 
for various derivative computation strategies in SensPVODE.  Results are the average of
three runs on 16 processors of a Linux cluster.}
\label{fig:senspvode-nstepstimeperstep}
\end{figure}

Figures~\ref{fig:senspvode-time} and \ref{fig:senspvode-nstepstimeperstep}
summarize our results on 16 nodes of a Linux cluster.  Each node in the
cluster has two 550 MHz Pentium III processors (only one processor per node was
used) and Myrinet interconnect.  AD shows a significant performance advantage
over the finite difference methods.  However, as
Figure~\ref{fig:senspvode-nstepstimeperstep} illustrates, the performance
improvements are due to a reduction in the number of timesteps required; the
runtime per timestep actually increases.
% Explain this increase.

\subsection{Local Subdomain Level}

We used AD to provide the directional derivatives required by PETSc
to solve the steady-state, three-dimen\-sio\-nal compressible Euler equations
on mapped, structured meshes using a second-order, Roe-type, finite-volume
discretization.  We solved in parallel a nonlinear system, using matrix-free
Newton-Krylov-Schwarz algorithms with pseudo-transient continuation to model
transonic flow over an ONERA M6 airplane wing.  See \cite{NKS-petsc-98} for
details about the problem formulation and algorithmic approach.  The
linearized Newton correction equations were solved by using restarted GMRES
preconditioned with the restricted additive Schwarz method with one degree of
overlap.

\begin{figure}[htbp]
\mbox{\centerline{\psfig{figure=petsc-its.eps,width=.75\textwidth}}}
\caption{\small Algorithmic performance of automatic differentiation
derivatives versus finite difference approximations}
\label{fig:petsc-its}
\end{figure}

As discussed in depth in \cite{hm99} and summarized in
Figure~\ref{fig:petsc-its}, our results indicate that, for matrix-free
Newton-Krylov methods, AD offers significantly greater robustness and provides
better algorithmic performance than do finite difference approximations (FD).
However, because the directional derivatives required by a matrix-free method
can be computed less expensively by FD than by AD, AD does not always provide
a performance advantage in terms of runtime~\cite{hm99}.  We are therefore
investigating hybrid AD-FD strategies, which combine the robustness of AD with
the reduced cost of FD.

% Include Jong's work as example of element function?

\section{Conclusions and Expectations}
\label{sec:conclusions}

The combination of AD and object-oriented toolkits has proven to be an
effective instrument for scientific computing.  The analytic derivatives of AD
enhance robustness and accelerate the convergence of Newton methods.  The
well-defined interfaces and data encapsulation of object-oriented toolkits
simplify the AD process.  There are many options as to the level at which AD
can be applied.  Each level has its advantages and disadvantages.

AD offers great promise as a useful tool in PDE-constrained optimization.
Analytic derivatives are often necessary to ensure robust and efficient
convergence to the true minimum. For complex PDE-based simulations, however,
developing analytic derivatives, especially second derivatives, by hand is
often intractible.  Judicious use of AD can overcome these obstacles.
Furthermore, the reverse mode of AD enables gradients,
Jacobian-transpose-vector, and Hessian-vector products to be computed at
significantly lower cost than is possible with finite difference
approximations.  The ability to compute accurately and efficiently a full
suite of first and second derivative matrices and directional derivatives
should facilitate the algorithmic experimentation necessary for the
advancement of PDE-constrained optimization.  Thus, AD can play an important
role in advancing both the science and the practice of PDE-contrained
optimization.
 
% Could also mention that AD is useful when the code is evolving; hand-coded
% derivatives will have trouble tracking changes.

\section*{Acknowledgments}
The work of S. Lee was performed under the auspices of the U.S. Department of Energy by
University of California Lawrence Livermore National Laboratory under contract
No. W-7405-Eng-48.  The other authors were supported by the Mathematical, Information, and
Computational Sciences Division subprogram of the Office of Advanced
Scientific Computing Research, U.S. Department of Energy, under Contract
W-31-109-Eng-38.

We thank Peter Brown, Alan Hindmarsh, David Keyes, Matt Knepley, Jorge
Mor\'{e}, and Linda Petzold for informative discussions and Gail Pieper for
proofreading an early draft of this manuscript.

\bibliographystyle{abbrv}
\bibliography{argonne}

\end{document}




