FW: [RASMB] Origin software and vbar
Peter Schuck
pschuck at helix.nih.gov
Wed May 11 14:49:00 PDT 2005
Hi all,
this is a delayed contribution to the issue of sedimentation equilibrium
analysis (which by accident was sent originally to
rasmb-admin at server1.bbri.org instead of RASMB at server1.bbri.org).
Writing your own code is very nice, of course, and also can be fun as it
has a way of taking you quickly beyond simple scripts or C code for a
sedimentation equilibrium function, to include more features of particular
interest. I also completely agree with John that AUC users are smart
enough to figure out what they need in order to answer their questions.
If a simple program does the job - great, it means it was a good experiment
and you have a nice system! There's obviously no reason to change
something if it works.
For me, the most important question for sedimentation equilibrium in
practice is usually to determine the stoichiometry and the binding
constants of self-associating and hetero-associating systems. We switch
biological systems frequently and some of them don't lend themselves
particularly well to AUC analysis (even though in some cases AUC may still
be the best tool). For the software, the key for me is the range of binding
constants one can determine, and whether one can exploit all the information
from the experiment. Convenience and a slick software interface are nice,
but the bottom line is the result. Obviously I'm not a believer in NONLIN,
mostly because there are a number of features more recently developed which
it doesn't have, but which will substantially improve the capability of
sedimentation equilibrium analysis for associating systems. Unfortunately,
they are also beyond quickly writing a line of code.
* flexible mass conservation analysis
* global multi-signal and multi-wavelength analysis for heterogeneous
protein interactions
* elimination of baseline signals (including radial-dependent features from
IF optics)
* correlation of data from dilution or titration series in different cells
Taken together, these allow you to extend the analysis to weaker and
stronger interactions, and for heterogeneous interactions eliminate the
problem of too similar molar mass, or of low extinction coefficients.
Mass conservation analysis goes back quite a bit; many researchers have
used it in the past, for example Marc Lewis quite intensively. More
recently John Philo has shown how it can be used as a soft, scalable
constraint in the analysis (Methods Enzymol 321 (2000) 100-120), and he has
published a number of sophisticated applications with it. He also
originally suggested using constraints on the fitted total amount of
material between different cells, dependent on the experimental
design. We've combined this idea with an approach of Roark of using data
from multiple rotor speeds from each cell (Biophys Chem 5 (1976) 185-196),
which we modified to allow fitting for an unknown bottom position
(constrained among some of the data sets, e.g. absorbance data of the same
cell at different wavelengths). This combination allows us to solve the
problem that we may know neither the real loading concentration nor the
bottom position of the cell precisely, by restricting the analysis to the
material that is redistributing between equilibria at different rotor
speeds. Baseline signals, including radial-dependent offsets, may be
determined, too. In the end, we can design experiments in different cells
as dilution series or titration series, and use our knowledge of the
constancy of the molar ratio or the constancy of the 'effective' loading
concentration of one component. Technically, it relieves us a bit from the
typical nightmare of fitting noisy exponentials. This can substantially
stabilize the analysis, meaning one can get a better handle on the binding
constants, strong or weak.
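
As a rough illustration of the mass conservation idea, here is a minimal
Python sketch for the simplest possible case: one ideal, non-interacting
species in a sector-shaped cell. It is not the SEDPHAT implementation, and
it applies mass conservation as a hard constraint rather than the soft
constraint discussed above. The function names (sigma, equilibrium_profile,
amount_in_cell) and all numerical values are assumptions chosen only for
the example. The point is that once the loading concentration and the
bottom position are parameters, the meniscus concentration at every rotor
speed is fixed by them and is no longer a free exponential pre-factor.

```python
import math

R_GAS = 8.314462e7  # gas constant in erg/(mol K); CGS units throughout


def sigma(molar_mass, vbar, rho, rpm, temp_k):
    """Reduced buoyant molar mass M(1 - vbar*rho)*omega^2/(RT), in 1/cm^2."""
    omega = 2.0 * math.pi * rpm / 60.0  # rotor speed in rad/s
    return molar_mass * (1.0 - vbar * rho) * omega ** 2 / (R_GAS * temp_k)


def equilibrium_profile(radii, c_load, meniscus, bottom, sig):
    """Concentration at each radius for one ideal species (sig must be > 0).

    The meniscus concentration is not fitted: it follows from requiring
    that the integral of c(r) r dr from meniscus to bottom equals the
    loaded amount c_load * (bottom^2 - meniscus^2) / 2.
    """
    span = 0.5 * sig * (bottom ** 2 - meniscus ** 2)
    c_meniscus = c_load * span / math.expm1(span)
    return [c_meniscus * math.exp(0.5 * sig * (r ** 2 - meniscus ** 2))
            for r in radii]


def amount_in_cell(radii, conc):
    """Trapezoidal estimate of the integral of c(r) r dr over the grid."""
    total = 0.0
    for i in range(1, len(radii)):
        left = conc[i - 1] * radii[i - 1]
        right = conc[i] * radii[i]
        total += 0.5 * (left + right) * (radii[i] - radii[i - 1])
    return total


# Hypothetical cell geometry (cm) and loading concentration (arbitrary units).
meniscus, bottom, c_load = 6.0, 7.2, 0.5
radii = [meniscus + (bottom - meniscus) * i / 2000 for i in range(2001)]

# The same material redistributes at each rotor speed, so the integrated
# amount stays close to c_load * (bottom^2 - meniscus^2) / 2 at all speeds.
for rpm in (10000, 15000, 25000):
    sig = sigma(50000.0, 0.73, 1.0, rpm, 293.15)
    conc = equilibrium_profile(radii, c_load, meniscus, bottom, sig)
    print(rpm, amount_in_cell(radii, conc))
```

In a real fit, c_load and bottom would be shared among the data sets from
one cell and adjusted by the least-squares routine, together with the
baseline terms; that part is left out here.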
This approach is described in detail in Anal. Biochem. 326 (2004) 234-256, and
a couple of applications can be found already in the literature. How to go
from the design of an experiment to conducting the AUC run and to the data
analysis with SEDPHAT is described in our step-by-step protocol (in press
in CSHLP; preprints are available from the sedfit website), and we'll
shortly have a tutorial for this on the sedphat website.
However, this approach is not completely SEDPHAT specific. Most recently,
some of these ideas have been incorporated in other shared software, too
(with or without reference), and I expect more will join soon. In the end
it doesn't matter what the name of the exe-file is you use to fit, but it's
the conceptual approach of the analysis that makes the difference.
Therefore, if you are contemplating migrating to a different equilibrium
analysis platform with self-associating or hetero-associating systems in
mind, mass conservation and the ability to correlate multiple signals from
different cells (and for convenience perhaps systematic noise decomposition
if you are using interference optical data as well) is what I would
recommend looking for.
Peter
P.S. Regarding Borries' idea of open source collaborative development of
software, for me it does not make sense. 1) SEDPHAT and SEDFIT both have more
than 1,000,000 lines of code, and it will be a nightmare to migrate to a
different platform. 2) The overhead in getting familiar with common
structures, module interfaces, languages, or libraries, and in the end
documenting for others what I wrote will be much larger for me than simply
to add stuff to my existing code, even if it would mean writing some
aspects from scratch (there's nothing wrong with duplicating things, if it
takes less time). 3) Every once in a while there are ideas for how to do
something differently that require a fundamental change in how data and
functions are organized. There would be either a significantly higher
hurdle doing this, or the requirement of an enormous amount of overhead
(and impossible foresight) to provide general enough structures from the
outset. 4) Frankly, I've had bad experiences sharing code. 5) Despite
what I said above about going beyond quickly writing a fitting function,
after all, we're not writing a piece of code for sending somebody to the
moon. Seriously, I think the development of multiple independent software
packages will be helpful for the community, even if they duplicate some functions.
If a task can be easily modularized, one may be able to write a separate
standalone program. Or perhaps allow the result to be output in the common
XLA format which everybody can read (e.g. after editing files). An example
of something that can be done as a module is the prediction of buffer
density and viscosity. SEDNTERP is a great utility which most of us use
quite extensively. Another example of a stand-alone utility that we use a
lot is WINMATCH to assess if equilibrium is reached. Considering that
setting up an experiment takes me a couple of hours (not including the time
to think about it), and running it a couple of days, I'll be happy taking
10 sec to copy the values from SEDNTERP over to another software or even
writing it by hand into a notebook.