Validity of power law fit (and analytical fit choice)

Question

Context

I am carrying out molecular dynamics simulations of trapped ions in a radio-frequency field of amplitude $U_{RF}$ + DC (see details https://arxiv.org/abs/2102.04098). I am studying radio-frequency heating of ions under no cooling. I initialise them at low temperature, crystal state, and run the simulation until temperature raises up to hundreds of K. Phase transition occured to fluid state. I am interested in the duration of the crystal state before temperature suddenly raise to high level. The duration from beginning of simulation to the temperature runaway is $t_0$ and I am studying the value of $t_0$ as a function of the radio-frequency trapping ampitude $U_{RF}$. An example of temperature as a function of time for one given condition $U_{RF}$. Critical timing is when temperature suddenly increases.

I vary $U_{RF}$ and measure $t_0$. I have no analytical expression to compare with, but I would like at least an empirical expression of $t_0$ as a function of $U_{RF}$, retrieved with a fit. Here I present you the data, my question and my attempt using power law. Could you tell me if this seems reasonable to you? Do you have any proposition or see any discrepancy in my work ? Finally, is it a good idea to try a fit without any analytical physical expression?

Question

I would like to know:

is there a principle in order to choose a realistic function for the fit? I mean, is it possible to discuss the choice in advance or is it just some kind of try and error process? (try an expression, see if it gives reasonably good fit ...).
so finally, should I just work the maths and physics and derive a valid analytical expression for my problem and then fit?
what do you think about power law? (see below)

Power law -- Method

General description

I tried to fit the above data ($t_0$ as a function of $U_{RF}$) with a power law. Power law is expressed as $f(x)=ax^{-k}$ or in my case $f(U_{RF})=aU_{RF}^{-k}$. Given the shape of the curve this seems natural to me. Plot in log-log the above mentionned data looks like this <span class= $t_0$ as a function of $U_{RF}$ in horizontal and vertical logarithm." />

I also tried a second method using logarithms to determine the coefficients. Both methods give different results but the first one seems reasonably close to the data.

'homemade' log program (Python example)

I detail here how I used logarithms:

Given $f(U_{RF})=aU_{RF}^{-k}$, I want the coefficient $k$ and $a$.

I fit $\log(t_0) = A\log(U_{RF})+B$ with a linear expression.
I compute $a=10^A$ and $k=A$.
I fit my data with the power law with those coefficients.

With python this is as follows :

def my_power_law(x,a,b):
    return a*(x**-b)
linear fit of log data
popt, pcov = curve_fit(my_lin,log(xx),log(yy))
k_fromlog = popt[0]
B_fromlog = popt[1]
ax.plot(xx_fit,my_power_law(array(xx_fit),10**B_fromlog,k_fromlog),label='From log')
Direct power law fit
popt, pcov = curve_fit(my_power_law,xx,yy,p0=[1e23,12.5],maxfev=2500)
ax.plot(xx_fit,my_power_law(array(xx_fit),popt[0],popt[1]),
color='r',ls='--',label='Direct pow. Law fit')

In the direct power law fit (with my_power_law()) the coefficients are $a=5.2e+44$ and $k=24$, this looks quite nice after all. In the logarithm method I have $a=5.6e+64$ and $k=64.7$ and the curve is 37 orders of magnitude above the data ! The data with direct fit is presented below.

The same information under the form of a table is available below.

$U_{RF}$ [V]	$t_0$ [ms]
60.000	16.378
61.000	11.316
62.000	7.087
62.200	6.259
62.400	4.089
62.600	5.668
62.800	4.922
63.000	3.815
63.200	4.895
63.400	4.156
63.600	4.063
63.800	4.046
64.000	3.650
65.000	2.931
66.000	2.474
67.000	2.065
68.000	1.487
70.000	1.471

Another problem is that error bars are not crossing the fit curve (error bars smaller than the crosses).

Conclusion

What do you think about that ? Is the direct power law fitting satisfying to you? Is the power law a good choice ? Do you think this is a valid choice to try to fit data with an analytical expression guessed with no physical argument? In that last question I mean : should I try to determine an analytical form for the formula with analytical physics? My empirical method is due to the fact that I have no expression a priori, but maybe this is not a proper way to do science? Also I just saw the answer from ~fixed point~ https://physics.stackexchange.com/a/115137/248092 about non-least square. Do you think this is the main problem?

We could give a much more helpful answer to your first question if you gave us some more context about the physical nature of your problem. I would however suggest plotting your data on a log-log scale; If it really is a power law it should appear as a straight line. If it is not a straight line that should give you an immediate indication that you need a new approach (proving that it really is straight given that it looks roughly straight is harder). I would double check you work/code for the fit with the logarithm method. 37 orders of magnitude sounds like a silly mistake somewhere — By Symmetry, Feb 08 '22 at 15:11
If you don't get an answer here in a few days, you may want to try [stats.se] on the data-fitting component of your question. — Kyle Kanos, Feb 08 '22 at 15:23
Could you upload the data on a log-log plot? You noted that with log-log you should find a linear fit. Does the data look linearly on a log-log plot? If not then it's probably not a power-law. — SmallPieceOfBread, Feb 08 '22 at 17:07
You need to indicate as to what the errors in the readings are, eg via error bars. — Farcher, Feb 08 '22 at 17:11
I updated the main text (@BySymmetry), putting more physics context and showing a log-log figure of my data (@user3725600). As for the error bars ( @Farcher ) they are smaller than the points (0.01 ms). — Aldehyde, Feb 08 '22 at 17:47
I would go with an exponential drop to begin with because of the leveling off effect, and the drop % seems similar per fixed interval of voltage. — JAlex, Feb 09 '22 at 15:35
Can you add a table with data for our own benefit to try out different ideas/scenarios. — JAlex, Feb 09 '22 at 15:39
Most likely here you need to flip it to get $V = f(t)$. This is because time is always the independent quantity in real life. — JAlex, Feb 09 '22 at 16:21

score 1 · Answer 1 · answered Feb 08 '22 at 16:23

Here are a couple of pointers on fitting data to functions, both to power laws and to any other function:

What information do you already have about your system? Given the quantities that you know you're measuring, how are they "usually" related in simple models of your system? (For example, if you're measuring voltage in a circuit as a function of time, you might expect the function to be exponential, if only because it is exponential in most simple circuits.) Importantly, this points you to the class of functions you're selecting a "best representative" from.
How do you judge if a function is a better fit than another? You have to transform your visual intuition into a mathematical criterion that determines how good a fit is, and then use that criterion to select the best possible fit for your data by optimizing the value of that criterion. The simplest and best-behaved example of selecting a fit in this way is least squares fitting, which most coding languages have specialized functions to do.

If you believe your system should behave in accordance to a power law relationship, then use least squares fitting and you'll have a specific power law function as a solution that is mathematically guaranteed to be optimal (with respect to a specific criterion). If you don't, then I recommend you analyze your system to investigate what kind of function you should expect to see, which can also involve testing your data—as By Symmetry recommended, you could make a log-log plot and see if the data falls on a straight line.

score 1 · Answer 2 · answered Feb 08 '22 at 18:35

The task of choosing which function to use to fit data is an example of model selection. There are many statistical tools to decide which out of a collection of possible models is best supported by the data. In most cases you still need to supply the functions to try. The model selection test will tell you which is better, but that doesn't mean it is the best. There could always be a better model out there that you hadn't thought to try.

In many cases the physics at play can help you decide. If you are fitting position v. time data, you might try a straight line and a parabola. The straight line corresponds to the object having constant velocity and the parabola to the object having constant, non-zero acceleration. The result that one of these functions fits better tells us about the physics.

If we have voltage data collected from an RC circuit discharging, we can do some theoretical analysis and determine $V(t)=V_0 e^{-t/\tau}$. So we should fit an exponential decay to voltage v. time data. This type of theoretical analysis is the place to start. If you don't have good a priori information about the physics, it is a guessing game. But which function fits, can help you figure out what physics is going on. In this situation most people will start with simple functions: lines, parabolas, exponentials, powerlaws.

One of the key properties of a powerlaw relation is scale invariance. A consequence of this is that when you plot your data on a log-log plot ($\log f$ v. $\log x$), it will form a straight line. The slope of that line is the exponent ($-k$ if $f = a x^{-k}$) of the powerlaw.

In addition to a log-log plot, you could try making semi-log plots ($f$ v. $\log x$ and $\log f$ v. $x$). Plotting your data in different ways can help give you some intuition about what fits to try. But "by-eye" fitting isn't a sound strategy.

JAlex · Answer 3 · 2022-02-10T13:43:35.997

It is best to have some kind of physical basis for the relationship between quantities. Even if it is just dimensional analysis in order to understand in principle how things should relate. The ultimate goal here is predictive power. To be able to predict future results. So you train your model on existing data, estimate what a different scenario would produce with the fit and then test again to see how good your prediction is.

There are several pitfalls in this process. One being the hidden variables (such as size or time) inside the coefficients of any fit. I prefer to only fit dimensionless quantities with functions such as exponents and logs etc.

So if there is such a thing as a reference voltage, and critical time which are independent of the test conditions use them to find dimensionless quantities for use in relationships. Again dimensional analysis is your friend here as you can combine other fixed quantities to get what you want. For example some reference distance and speed to find a time constant.

Next stare at the data and try different things, or use Eureqa to find the relationships for you with the best fit and least complexity.

To illustrate the importance of complexity, and how is is mostly a human judgment call consider the following two fits (red and blue) of some data

fig1

To a human it is obvious which is the better fit, although mathematically both have zero error. Unless, the underlaying process excludes the straight line (red fit) as not possible physically, then you are forced to choose something else. This does not mean that blue is a good fit, but that both of them are a bad fit for the situation.

Specifically for your data, out of experience of doing this kind of work for many years, time quantities vary at a constant rate (duh) and therefore any expression that contains log(t) would lead to an extrapolation problem, where the model will fail at a very reasonable $t=0$. This is a strike against the power law for the time quantity.

Most likely you have a exp(t) type of term and a linear relationship with voltage.

To summarize, power fit works if the domain of the variables makes sense, and the arguments are dimensionless. Because if you find for example that $$t = 8.37642 \, V^{0.34762534} $$ how do you make physical sense of an irrational exponent.

Reference article on the need for dimensionless quantities in fitting experimental data.

I look at the behavior at the limits, the slope and any intercepts to guess of what I am looking at. Also a lot of nice smooth curves may not have a simple functions to represent them. You need to be quite creative to come up with some good fits.

...therefore any expression that contains log(t) would be physically impossible. $y\propto t^2\implies\log y\sim2\log t$ and does not seem to be physically impossible. — Kyle Kanos, Feb 09 '22 at 16:52
@KyleKanos - So what does the model predict when $t=0$ in this case? This is typical when a model works well with expected values, but completely fails beyond that (the extrapolation problem). Here you have to impose the constraint that $y>0$ for the model, and that constrain may not fit the situation. This is why having a physical understanding of the situation is critical for making a good fit model. — JAlex, Feb 09 '22 at 18:54
Methinks your disregard for $\log(t)$ needs to be amended to include the extrapolation problem, otherwise blanket remarks like that would regard Newtonian kinematics as physically impossible. — Kyle Kanos, Feb 09 '22 at 19:25
@KyleKanos thank you for the suggestion. I brought it down a few notches in the posting now. — JAlex, Feb 09 '22 at 19:47
@KyleKanos I understand your argument on the irrational exponents, and about dimensionless quantities. I am going to work out something for my case. Nevertheless, the critical timing $t_0$ is not just the time during an experiment, it is a specific duration measuring the time required for a given condition to be reached during one experiment with given voltage $U_{RF}$. So $t_0$ explicitely varies as a function of $U_{RF}$. So I can use $t_0$ as any other parameter right ? (And 'make' it dimensionless). — Aldehyde, Feb 09 '22 at 21:29
@Aldehyde - I suppose the above comment was for me, so I am going to answer here. I understand that in the experiment time is measure, like the time to drop an object from a height. But in the underlying model, time should be an independent variable to goes on at a constant rate. What I am saying is the if you had to choose between two models for a drop test, a) $t \propto \sqrt{h}$ and b) $h = \propto t^2$ go with b) due to the domain issue. — JAlex, Feb 10 '22 at 13:40
@Aldehyde - Read Dimensional Analysis: From Atomic Bombs to Wind Tunnel Testing on some historical notes with dimensional analysis and experimental data. — JAlex, Feb 10 '22 at 13:42