[RASMB] RE: DC/Dt vs. sedfit

Peter Schuck pschuck at helix.nih.gov
Thu Feb 16 15:01:51 PST 2006


Hi Walter,

I have several comments related to your message.  Sorry this is even longer 
than your original message, but you raise several issues at once.

1) I think one should not dismiss the assumption of a constant frictional 
ratio.  This is nothing other than the traditional power law s ~ M^(2/3), 
just applied in a different form.  If you look at the literature, in the 
absence of additional knowledge, the s ~ M^(2/3) rule has been applied 
abundantly and very successfully in many studies, long before the c(s) 
approach was available.  If you use that approximation to scale diffusion 
and deconvolute it from the sedimentation coefficient distribution, you will 
get a clear idea of the s-values and the 
homogeneity of the material.  This is very important, since without knowing 
about the homogeneity of the sample, building any more detailed model from 
Lamm equation solutions can go wrong.
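To make the connection explicit: for a compact sphere, the Stokes radius grows as M^(1/3), so at fixed f/f0 the Svedberg relation gives s ~ M^(2/3). A minimal numerical sketch (the solvent and vbar values are assumed, typical numbers for water at 20 C and a folded protein, not values taken from SEDFIT):

```python
import math

NA = 6.022e23        # Avogadro's number, 1/mol
eta = 1.002e-3       # viscosity of water at 20 C, Pa*s
rho = 998.2          # density of water at 20 C, kg/m^3
vbar = 0.73e-3       # typical protein partial specific volume, m^3/kg

def s_from_mass(M, ff0):
    """Sedimentation coefficient (in Svedbergs) for molar mass M (kg/mol)
    at a fixed frictional ratio f/f0."""
    # radius of the equivalent compact sphere grows as M^(1/3)
    R0 = (3.0 * M * vbar / (4.0 * math.pi * NA)) ** (1.0 / 3.0)
    f = ff0 * 6.0 * math.pi * eta * R0                # frictional coefficient
    s = M * (1.0 - vbar * rho) / (NA * f)             # Svedberg equation
    return s * 1e13                                   # in units of 1e-13 s

# Doubling M at constant f/f0 scales s by exactly 2**(2/3) ~ 1.587:
s1 = s_from_mass(50.0, 1.25)    # a 50 kDa protein, f/f0 = 1.25
s2 = s_from_mass(100.0, 1.25)
print(s2 / s1)                  # ratio equals 2**(2/3)
```

Because f grows as M^(1/3), the M in the numerator leaves s proportional to M^(2/3), which is exactly the traditional power law.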

The constant frictional ratio approximation does not affect these pieces of 
information.  There can be slight effects on the precise location of peaks 
from trace components.  The frictional ratio assumption does enter the 
estimated molar masses from c(M), and will affect the values if there are 
multiple major peaks.

A detailed example of this question can be found in the second chapter of 
the new AUC book that Dave announced recently.


2) If one actually looks at the residuals and the quality of fit with c(s), 
one will readily be able to identify situations where c(s) fails.  In this 
case, there are several models available to improve the fit, such as c(s) 
with different f/f0 in different s-regions.  With an intelligent design of 
the more advanced c(s) analysis, one can frequently overcome that 
problem.  Another alternative is the hybrid discrete/continuous model of 
SEDPHAT (see below).


3)  Since you bring up this issue of the constant frictional ratio so 
strongly, I want to announce an as-yet-unpublished approach:

For cases where c(s) with the constant frictional coefficient does not fit, 
and where neither the segmented c(s) nor the hybrid discrete/continuous 
models seem to make sense, we have developed a new, two-dimensional 
size-and-shape distribution c(s,f/f0) (or c(s,M) or other incarnations of 
the same thing).  Within the limits of information from the experimental 
data, it estimates a full molar mass distribution for each s-value.  Like 
the c(s) approach, it uses regularization to avoid over-interpreting 
the data, and there is obviously a limit on the ability to get molar mass 
information for trace components.  However, from suitably informative data 
it will give very useful molar mass information without invoking 
scale-relationships.  The paper with the details is not yet in press, and 
the website information is not yet updated (we will post the supporting 
information asap), but you can already get a preview of this function by 
downloading the new SEDFIT version at
http://www.analyticalultracentrifugation.com/download.htm
For those who are familiar with c(s) and are interested in this, it should 
not be too complicated to figure out how to use it.  The computation takes 
somewhat longer than c(s), i.e. a few minutes on a fast PC, but then again 
we don't need to fit for f/f0 anymore.  Some slight practical improvements 
will still have to be implemented, for example, to make it work better with 
signals from buffer salts, which are inconvenient to account for in the 
c(s,M) distribution.
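One way to see why c(s,f/f0) and c(s,M) are incarnations of the same thing: for a given vbar and buffer density/viscosity, each (s, f/f0) pair maps one-to-one onto an (s, M) pair through the Svedberg and Stokes relations. A sketch of that coordinate change (the constants are assumed typical values for water at 20 C, not parameters of SEDFIT):

```python
import math

NA = 6.022e23        # 1/mol
eta = 1.002e-3       # Pa*s, water at 20 C
rho = 998.2          # kg/m^3, water at 20 C
vbar = 0.73e-3       # m^3/kg, typical protein

def mass_from_s_ff0(s_sved, ff0):
    """Molar mass (kg/mol) implied by an s-value (in S) and a frictional
    ratio f/f0 -- the coordinate change that turns a c(s, f/f0) grid
    point into a c(s, M) grid point."""
    s = s_sved * 1e-13
    # combined Stokes/Svedberg prefactor; R0 = (3*M*vbar/(4*pi*NA))**(1/3)
    k = NA * ff0 * 6.0 * math.pi * eta \
        * (3.0 * vbar / (4.0 * math.pi * NA)) ** (1.0 / 3.0)
    # s = M**(2/3) * (1 - vbar*rho) / k, solved for M:
    return (s * k / (1.0 - vbar * rho)) ** 1.5

# A boundary at ~3.92 S with f/f0 = 1.25 corresponds to roughly 50 kg/mol
print(mass_from_s_ff0(3.9155, 1.25))
```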

Based on c(s,f/f0) (or c(s,M)), we can subsequently give up the additional 
M (or f/f0, or diffusion) dimension, and determine again something very 
closely related to the c(s) distribution.  Both have very similar properties.

This entirely eliminates all f/f0 assumptions from the deconvolution of 
diffusion, and the determination of a high resolution sedimentation 
coefficient distribution.
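As a sketch of what giving up the extra dimension means numerically: once a two-dimensional c(s, f/f0) grid is in hand, integrating it over f/f0 collapses it to a one-dimensional sedimentation coefficient distribution that required no f/f0 assumption (the grid values below are made up purely for illustration):

```python
import numpy as np

# Hypothetical c(s, f/f0) on a discrete grid: a single broad feature
# near s = 4 S, f/f0 = 1.3 (fabricated for illustration)
s_grid = np.linspace(1.0, 10.0, 91)          # s in Svedbergs, 0.1 S steps
ff0_grid = np.linspace(1.0, 2.0, 21)         # frictional ratios
c2d = (np.exp(-((s_grid[:, None] - 4.0) ** 2) / 0.5)
       * np.exp(-((ff0_grid[None, :] - 1.3) ** 2) / 0.02))

# Integrating out f/f0 (simple Riemann sum) gives the general,
# assumption-free sedimentation coefficient distribution
d_ff0 = ff0_grid[1] - ff0_grid[0]
cs_general = c2d.sum(axis=1) * d_ff0

print(s_grid[np.argmax(cs_general)])   # peak stays at 4.0 S
```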

I should mention that this has so far not prevented me from still using the 
c(s) distribution as a standard tool for examining folded proteins, simply 
because for most protein samples that I study the assumption of constant 
frictional ratio is very good compared to the information content of the 
data, and gives an excellent fit.  In the absence of any other knowledge, 
if the c(s) with constant f/f0 constraint gives a great fit, not much 
additional information can be gained from relaxing the constraints in 
c(s,M), except as a way to assess how well-defined or ill-defined the molar 
mass (or diffusion) information is for a given set of experimental 
data.  Obviously, c(s,M) should be very useful for other cases, for 
example, systems with very heterogeneous material with regard to shape such 
as you mentioned, or chemically heterogeneous mixtures,  where there are 
obvious advantages for this new c(s,M) distribution (and this was the 
motivation for the development).

Sorry to bring this up as a point that is difficult for you to argue 
against, since it is unpublished, but you can give it a try, and I suggest 
we discuss this further when all details are in the public domain (hopefully 
very shortly; it will also be described as part of the upcoming workshops).


4)  Regarding the "effective time-point" problem in ls-g*(s), John Philo 
raised this issue in the summer of 2004, while I was on vacation and unable 
to respond.  The question of the correct time-point is indeed a good one, 
and I can only suggest using, as a first approximation, the 
midpoint.  With respect to dcdt, however, this problem exists in very much 
the same way, even though it has not been advertised.

This is simply due to the fact that, strictly, dcdt - as an infinitesimal 
quantity - cannot be determined from experimental data! You can only 
approximate this with delta-c/delta-t with a finite time interval between 
scans.  In fact, the relevant time-interval is not that between neighboring 
scans, but the time elapsed between the first and last scan loaded (or half 
that if you are not considering the averaging of the transformed 
delta-c/delta-t traces).  Among all points in time during this delta-t, you 
assign a particular choice of t - a single value to be "associated" with 
the pattern.  It is quite obvious that this is exactly the same problem as 
in ls-g*(s).  The only g*(s) method which would not have that problem is 
dc/dr, the radial derivative taken truly from a single time-point.
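The arithmetic makes the point directly: with equally spaced scans, the average of the pairwise delta-c/delta-t traces telescopes to (c_last - c_first)/(t_last - t_first), so the effective time interval really is the span from the first to the last loaded scan. A small sketch (the boundary profiles are fabricated just to show the arithmetic, not real AUC data):

```python
import numpy as np

# Fabricated scans: a boundary moving outward, sampled at four times
times = np.array([3000.0, 3300.0, 3600.0, 3900.0])      # seconds
radii = np.linspace(6.0, 7.2, 200)                      # cm
scans = np.array([0.5 * (1.0 + np.tanh((radii - 6.3 - 1e-4 * t) / 0.05))
                  for t in times])

# Pairwise delta-c/delta-t between neighboring scans, then averaged
dcdt_pairs = np.diff(scans, axis=0) / np.diff(times)[:, None]
dcdt_avg = dcdt_pairs.mean(axis=0)

# For equal spacing, the averaged trace is identical to the
# first-to-last difference quotient: the delta-t that matters is 900 s,
# not the 300 s between neighboring scans
span = times[-1] - times[0]
t_mid = 0.5 * (times[0] + times[-1])   # midpoint, as a first approximation
print(np.allclose(dcdt_avg, (scans[-1] - scans[0]) / span))
```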

For this reason, I believe your suggestion that dcdt gives a snapshot of 
the boundary is incorrect, since it is derived as delta-c/delta-t, usually 
with a quite long time interval.  While ls-g*(s) can in fact take longer 
time-intervals and steeper boundaries without getting into trouble with 
artificial broadening of g*(s), this does not address the limitations of 
regarding the diffusion as if it came from a single point in time, which 
one needs to do when fitting Gaussians to get molar masses, and which 
affects ls-g*(s) and dcdt equally.

As I've shown in Anal Biochem 320 (2003) 104-124, 
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=12895474&query_hl=1&itool=pubmed_DocSum
you can (and I suggest dcdt users do this) take the g(s*) distribution, 
re-transform it into the original data space, and compare it with the 
original data that were used to generate g(s*).  In that way, you can see 
how the g(s*) distribution relates to the original scans at the different 
time points, and get a quantitative idea of the quality of fit of the g(s*) 
to the RAW data at different time-points.  The quality of fit of the 
Gaussians to the g(s*) curve does not give you that information, since by 
then the many delta-c/delta-t traces have already been averaged.

[Another statement made by John Philo on the same occasion, to which I also 
didn't get to respond, was that there was no theory showing that ls-g*(s) 
should be Gaussian for a single species.  In fact, this theory can now be 
found on the website tutorial about ls-g*(s) 
http://www.analyticalultracentrifugation.com/lsgofs_distribution.htm  - it 
is based on the same Faxen solution of the Lamm equation that is used in 
the argument that g(s*) from dcdt is Gaussian in a first approximation.]

To be sure, I am not advocating fitting g*(s) at all - be it from ls-g*(s) 
or dcdt - since the Gaussian approximation is simply much less precise than 
what is available with Lamm equation modeling, and there is no reason I can 
see to not take advantage of that.

However, simply fitting one or a few discrete Lamm equation solutions can 
be wrong if there is unrecognized heterogeneity in the boundary.  c(s) can 
reveal this heterogeneity, and this is when c(M) gives a more reliable mass 
value than a simple Lamm equation solution.  (This example is described in 
detail in Methods Enzymol. 384 (2004) 185-212.)

If there are multiple species in significant amounts, and if the question 
at hand is "what is the molar mass of these species", the most general 
approach would be the new c(s,M) distribution.  However, this model is 
quite flexible in that it actually gives you a full molar mass distribution 
(at each s-value).  If the sample in question is a truly discrete mixture 
of species, this knowledge can be used to your advantage, and in this case, 
another good approach is the hybrid discrete/continuous model of 
SEDPHAT.  In this method, you use the direct Lamm equation modeling to 
describe the main species, but you still have the flexibility to account 
for the trace species using segments of the continuous distribution.


5) For the issue of applying c(s) to reacting mixtures, the theoretical 
foundation of this can be found in the two recent papers Biophys. J. 89 
(2005) p 619-634 (http://www.biophysj.org/cgi/content/full/89/1/619) and p 
651-666 (http://www.biophysj.org/cgi/content/full/89/1/651).  The first 
paper deals with your first question - why deconvolution of diffusion 
works, and the second one with the question of how the c(s) peaks can be 
interpreted.

That discrete non-interacting Lamm equation solutions can be fit to reacting 
systems is based on the constant bath theory, which was originally proposed 
by Riesner and re-discovered by Claus Urbanke.  If you are concerned with 
the frictional ratio assumption, try the two-dimensional c(s,f/f0).  In our 
hands, so far, the experience is that both give quite similar 
sedimentation coefficient distributions, although c(s) seems to be somewhat 
more stable in this regard.  Regarding the quality of fit to reacting 
systems, I would generally propose an rmsd of 0.01 fringes or 0.01 OD, or 
at high concentration about 1% of the loading signal.  In my experience 
that should get you into the right ballpark.  Note this is not quite as 
stringent as in the c(s) analysis of non-interacting systems, and the 
reason is that we're after the main peaks of the sedimentation coefficient 
distribution.  In fact, if you can find that the main peaks of c(s) are 
robust and well-represented for given data (for example, test different 
P-values, data subsets), that would be a rational criterion for what is 
good enough.  It would be useful to share experience in this 
regard.  Clearly, don't use c(s) if even the best possible fit is still 
very bad, since this usually indicates something else is going on that is 
not accounted for (like signals from small degradation products or buffer 
salts).  One should be careful, though, not to apply c(s) to situations 
where there is hydrodynamic non-ideality; c(s) cannot handle that.
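The rule of thumb above is easy to encode. This is just a hypothetical helper for checking one's own fits, not a function of SEDFIT or SEDPHAT:

```python
import numpy as np

def fit_quality_ok(data, fit, loading_signal):
    """Rough acceptance check for c(s) fits of reacting systems:
    rmsd of about 0.01 (fringes or OD), or, at high concentration,
    roughly 1% of the loading signal, whichever is larger."""
    data = np.asarray(data, dtype=float)
    fit = np.asarray(fit, dtype=float)
    rmsd = float(np.sqrt(np.mean((data - fit) ** 2)))
    threshold = max(0.01, 0.01 * loading_signal)
    return rmsd, rmsd <= threshold

# A uniform 0.005-fringe misfit at 1.0 fringe loading passes;
# a 0.05-fringe misfit does not:
print(fit_quality_ok(np.ones(100), np.ones(100) + 0.005, 1.0))
print(fit_quality_ok(np.ones(100), np.ones(100) + 0.05, 1.0))
```

Robustness of the main peaks to different P-values and data subsets, as mentioned above, remains the better criterion; the threshold is only a first screen.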

Basically, what c(s) gives you is peaks that approximate the asymptotic 
reaction boundary, which can be quantitatively interpreted using 
Gilbert-Jenkins theory.  As described in the first paper, applying 
c(s) to the reaction boundary is a way to avoid having to interpret the 
diffusional spread of the reacting system, which can be a significant 
advantage over the direct Lamm equation modeling with reaction scheme, as 
available also in SEDPHAT, or BCPFIT, or SEDANAL.  Reasons for that are 
laid out in both papers - basically the diffusion spread can be 
ill-conditioned to analyze and very susceptible to experimental 
imperfections.  In any case, the underlying sedimentation coefficient 
distribution from c(s) is very useful for the quantitative interpretation 
of peak heights and peak locations.

By the way, for many reacting systems you will see even without 
deconvolution of diffusion that there are the different boundary 
components, as predicted by Gilbert-Jenkins theory.  After all, this effect 
of reaction boundaries was already observed in the 1950s.  One could easily 
extract the boundary amplitudes and s-values as determined from any other 
boundary model that gives faithful boundary heights and weight-average 
s-values of the reaction boundary (including DCDT), and plug it into the 
Gilbert-Jenkins isotherm model of SEDPHAT.  This is just to clarify that 
diffusional deconvolution and interpretation of the effective s-values you 
get from c(s) applied to a reacting boundary are two different things.
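Extracting those isotherm inputs from any boundary model amounts to a simple integration over a peak. A generic sketch (the grid, peak shape, and integration limits are invented for illustration; SEDPHAT expects the resulting amplitude/s-value pairs as ASCII tables):

```python
import numpy as np

def peak_amplitude_and_sw(s_grid, cs, s_lo, s_hi):
    """Boundary amplitude and weight-average s-value of the part of a
    sedimentation coefficient distribution between s_lo and s_hi --
    the kind of numbers one would tabulate for a Gilbert-Jenkins
    isotherm analysis.  Assumes an equally spaced s_grid."""
    ds = s_grid[1] - s_grid[0]
    mask = (s_grid >= s_lo) & (s_grid <= s_hi)
    amplitude = cs[mask].sum() * ds                       # integral of c(s)
    sw = (cs[mask] * s_grid[mask]).sum() * ds / amplitude # signal-weighted s
    return amplitude, sw

# Example: a single Gaussian reaction-boundary peak centered at 5 S
s_grid = np.linspace(0.5, 10.0, 951)
cs = np.exp(-((s_grid - 5.0) ** 2) / (2 * 0.3 ** 2))
amp, sw = peak_amplitude_and_sw(s_grid, cs, 3.0, 7.0)
print(sw)   # weight-average s of the peak, ~5.0 S
```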

Again, the question of what to make of these s-values from reacting 
systems is at the core of Gilbert-Jenkins theory, and was solved, 
for example, for a 1:1 reaction long ago in Nature 177 (1956) 
853-854, and for more general systems in the 1960s and 70s.  The only new 
aspect about this part is that SEDPHAT can now fit these isotherms to 
experimentally determined data (i.e. ASCII tables of amplitudes and 
s-values of the reaction boundaries).


Regards,
Peter



