Extending the Algebraic Manipulability of Differentials
arXiv [math.GM]
Jonathan Bartlett and Asatur Zh. Khurshudyan
The Blyth Institute; Institute of Mechanics, NAS of Armenia
August 21, 2018
Abstract
Treating differentials as independent algebraic units has a long history of use and abuse. It is generally considered problematic to treat the derivative as a fraction of differentials rather than as a holistic unit acting as a limit, though for practical reasons it is often done for the first derivative. However, a revised notation for the second and higher derivatives allows differentials to be treated as independent units in a much larger number of cases.
Differential calculus has had a long, rich history, with many competing notations and interpretations. The fluxion was the original concept of the derivative, invented by Isaac Newton, and it even had a notation similar to the modern Lagrange notation. A competing notation for the derivative is the Leibniz notation, in which the derivative is expressed as a ratio of differentials, representing arbitrarily small (possibly infinitesimal) differences in each variable.

The calculus was originally thought of as examining infinitely small quantities. When these infinitely small quantities were put into ratio with each other, the result could be a real number (a likely result for smooth, continuous functions). On their own, however, these infinitesimals were thought of as infinitely close to zero.

The concept of an infinitesimal caused a great deal of difficulty within mathematics, and therefore calculus was revised so that the derivative represents the limit of a ratio. In this conception, $dx$ and $dy$ are not really independent units; when placed in ratio with each other, they represent the limit of that ratio as the changes become smaller and smaller. However, many were not pleased with the limit notion and preferred to view $dx$ and $dy$ as distinct mathematical objects.

This question over the ontological status of differentials was somewhat paralleled by preferences in notation. Those favoring the validity of infinitesimals generally preferred the Leibniz notation, where $dx$ and $dy$ are at least visually represented as individual units, while those favoring the limit conception of the derivative generally preferred the Lagrange notation, where the derivative is a holistic unit.

In an interesting turn of events, by the late 19th century the Leibniz notation for the derivative had largely won out, while the Lagrangian conception of the derivative has remained its favored intellectual interpretation. Essentially, this means that equations are generally written as if there were distinct differentials available, but they are manipulated as if they only represent limits of a ratio which cannot be taken apart.

This dichotomy has led to an unfortunate lack of development of the notation. Because it is generally assumed that differentials are not independent algebraic units, the issues that arise when treating them as such have not caused great concern; they have simply reinforced the idea that differentials should not be treated algebraically. Therefore, there has been little effort to improve the notation to allow for a more algebraic treatment of individual differentials.

However, as will be shown, the algebraic manipulability of differentials can be greatly expanded if the notation for higher-order derivatives is revised. This leads to an overall simplification in working with calculus for both students and practitioners, as it allows items which are written as fractions to be treated as fractions. It prevents students from making mistakes, since their natural inclination is to treat differentials as fractions. Additionally, there are several little-known but extremely helpful formulas which are straightforwardly deducible from this new notation.

Footnote: Since many in the engineering disciplines are not formally trained mathematicians, this can also prevent professionals in applied fields from making similar mistakes.

Even absent these practical concerns, we find that reconceptualizing differentials as algebraically manipulable terms is an interesting project in its own right, and it may help us see the derivative in a new way and adapt it to new uses in the future. There may also be additional formulas which can in the future be more directly connected to the algebraic formulation of the derivative.

When dealing with the first derivative, there are generally few practical problems in treating differentials algebraically. If $y$ is a function of $x$, then $\frac{dy}{dx}$ is the first derivative of $y$ with respect to $x$. This can generally be treated as a fraction.

For instance, since $\frac{dx}{dy}$ is the first derivative of $x$ with respect to $y$, it is easy to see that these values are merely reciprocals of each other. The inverse function theorem of calculus states that $\frac{dx}{dy} = 1 \big/ \frac{dy}{dx}$. The generalization of this theorem to the multivariable domain essentially provides for fraction-like behavior within the first derivative.

Likewise, in preparation for integration, both sides of the equation can be multiplied by $dx$. Even in multivariate equations, differentials can essentially be multiplied and divided freely, as long as the manipulations deal only with the first derivative.

Even the chain rule goes along with this. Let $x$ depend on a parameter $u$. If one has the derivative $\frac{dy}{du}$ and multiplies it by the derivative $\frac{du}{dx}$, then the result will be $\frac{dy}{dx}$. This is identical to the chain rule in Lagrangian notation.

It is well recognized that problems occur if one tries to extend this technique to the second derivative [1]. Take for a simple example the function $y = x^3$. The first derivative is $\frac{dy}{dx} = 3x^2$. The second derivative is $\frac{d^2y}{dx^2} = 6x$.
Say that it is later discovered that $x$ is a function of $t$, so that $x = t^2$. The problem here is that the chain rule for the second derivative is not the same as what would be implied by the algebraic representation.

Here we arrive at one of the major problems with using the current notation of the second derivative algebraically. To demonstrate the problem explicitly: if one were to take the second derivative seriously as a set of algebraic units, one should be able to multiply $\frac{d^2y}{dx^2}$ by $\frac{dx^2}{dt^2}$ to get the second derivative of $y$ with respect to $t$. However, this does not work. If the differentials are being treated as algebraic units, then $\frac{dx^2}{dt^2}$ is the same as $\left(\frac{dx}{dt}\right)^2$, which is just the square of the first derivative of $x$ with respect to $t$. The first derivative of $x$ with respect to $t$ is $\frac{dx}{dt} = 2t$. Therefore, treating the second derivative algebraically would imply that, to convert the second derivative of $y$ with respect to $x$ into the second derivative of $y$ with respect to $t$, all that is needed is to multiply by $(2t)^2$.

However, this reasoning leads to the false conclusion that $\frac{d^2y}{dt^2} = 6x\,(2t)^2 = 24t^4$. If, instead, the substitution is done at the beginning, it can easily be seen that the result should be $30t^4$:
$$
y = x^3, \quad x = t^2 \;\Rightarrow\; y = (t^2)^3 = t^6, \quad y' = 6t^5, \quad y'' = 30t^4
$$
This is also shown by the true chain rule for the second derivative, based on Faà di Bruno's formula [2]. This formula says that the chain rule for the second derivative should be:
$$
\frac{d^2y}{dt^2} = \frac{d^2y}{dx^2}\left(\frac{dx}{dt}\right)^2 + \frac{dy}{dx}\,\frac{d^2x}{dt^2} \tag{1}
$$
Indeed, $6t^2\,(2t)^2 + 3t^4 \cdot 2 = 24t^4 + 6t^4 = 30t^4$. This, however, is extremely unintuitive, and essentially makes a mockery of the concept of using the differential as an algebraic unit.

It is generally assumed that this is a problem for the idea that second differentials should be treated as algebraic units.
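To make the failure concrete, here is a minimal Python sketch (standard library only) that stores polynomials as `{power: coefficient}` dictionaries and computes the second derivative of $y = x^3$ with respect to $t$, where $x = t^2$, in three ways: the naive "cancel the differentials" product, the chain rule (1), and substituting first. The helper names (`deriv`, `compose`, `naive`, and so on) are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

# Polynomials are stored as {power: coefficient} dictionaries.


def deriv(p):
    """Differentiate a polynomial once."""
    return {k - 1: c * k for k, c in p.items() if k != 0}


def evaluate(p, v):
    return sum(c * v**k for k, c in p.items())


def mul(a, b):
    out = {}
    for ka, ca in a.items():
        for kb, cb in b.items():
            out[ka + kb] = out.get(ka + kb, 0) + ca * cb
    return out


def compose(outer, inner):
    """Return outer(inner(t)) as a polynomial in t."""
    out = {}
    for k, c in outer.items():
        term = {0: 1}
        for _ in range(k):
            term = mul(term, inner)
        for kt, ct in term.items():
            out[kt] = out.get(kt, 0) + c * ct
    return out


y_of_x = {3: 1}                                       # y = x^3
x_of_t = {2: 1}                                       # x = t^2
dy_dx, d2y_dx2 = deriv(y_of_x), deriv(deriv(y_of_x))  # 3x^2, 6x
dx_dt, d2x_dt2 = deriv(x_of_t), deriv(deriv(x_of_t))  # 2t, 2


def naive(t):
    """Multiply d2y/dx2 by (dx/dt)^2 as if the differentials simply cancelled."""
    x = evaluate(x_of_t, t)
    return evaluate(d2y_dx2, x) * evaluate(dx_dt, t) ** 2


def faa_di_bruno(t):
    """The second-order chain rule, equation (1)."""
    x = evaluate(x_of_t, t)
    return (evaluate(d2y_dx2, x) * evaluate(dx_dt, t) ** 2
            + evaluate(dy_dx, x) * evaluate(d2x_dt2, t))


def substitute_first(t):
    """Substitute x = t^2 into y, then differentiate twice in t."""
    return evaluate(deriv(deriv(compose(y_of_x, x_of_t))), t)


t = Fraction(3, 2)
print(naive(t), faa_di_bruno(t), substitute_first(t))  # 243/2 1215/8 1215/8
```

The naive product gives $24t^4$, while both the chain rule and direct substitution give $30t^4$; the missing $6t^4$ is exactly the $\frac{dy}{dx}\,\frac{d^2x}{dt^2}$ term of (1).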
However, it is possible that the real problem is that the notation for second differentials has not been given the careful attention it deserves.

The habits of mind that have come from this have even affected nonstandard analysis, whose practitioners, despite their appreciation for the algebraic properties of differentials, have left the algebraic nature of the second derivative either unexamined (as in [3]) or poorly examined (i.e., leaving out the problematic nature of the second derivative, as in [4, pg. 4]).

Most calculus students glaze over the notation for higher derivatives, and few if any books bother to give any reasons behind what the notation means. It is important to go back and consider why the notation is what it is, and what the pieces are supposed to represent.

In modern calculus, the derivative is always taken with respect to some variable. However, this is not strictly required, as the differential operation can be used in a context-free manner. The processes of taking a differential and solving for a derivative (i.e., some ratio of differentials) can be separated into logically distinct operations. Instead of applying $\frac{d}{dx}$ (taking the derivative with respect to the variable $x$), one would perform the differential and divide by $dx$ as separate steps.

Footnote: The idea that finding a differential (i.e., something similar to a derivative, but not taken with respect to any particular variable) can be separated from the operation of finding a derivative (i.e., differentiating with respect to some particular variable) is considered anathema by some, but this concept can be inferred directly from the practice of treating derivatives as fractions of differentials. The rules for taking a differential are identical to those for taking an implicit derivative, except that the final differential is not divided by the differential of the independent variable.

Footnote: For those uncomfortable with taking a differential without a derivative (i.e., without specifying an independent variable), imagine the differential operator $d(\cdot)$ as combining two operations: taking an implicit derivative with respect to a variable that does not appear in the expression (such as $q$), followed by multiplication by the differential of that variable ($dq$ in this example). So, the differential of $e^x$ is written as $d(e^x)$, and the result of this operation is $e^x\,dx$. This is the same as taking the derivative with respect to the non-present variable $q$, $\frac{d}{dq}(e^x) = e^x\,\frac{dx}{dq}$, and then multiplying by $dq$, yielding just $e^x\,dx$. Doing this yields the standard set of differential rules, but allows them to be applied separately from (and prior to) a full derivative. Also note that because the rules have no dependency on any variable present in the equation, they work in both the single-variable and the multivariable case. Solving for a derivative is then merely solving for a ratio of the differentials that arise after performing the differential. This unifies explicit and implicit differentiation into a single process that is easier to teach, use, and understand, and requires few if any special cases, save the standard requirements of continuity and smoothness.

Originally, in the Leibnizian conception of the differential, one did not even bother solving for derivatives, as they made little sense given the original geometric construction of differentials [5, pgs. 8, 59]. For a simple example, the differential of $x^2$ can be found using a basic differential operator such that $d(x^2) = 2x\,dx$.
The derivative is simply the differential divided by $dx$. This yields $\frac{d(x^2)}{dx} = 2x$.

For implicit derivatives, separating the taking of the differential from the finding of a particular derivative greatly simplifies the process. Given a function (say, $z^2 = \sin(q)$), the differential can be applied to both sides just like any other algebraic manipulation:
$$
z^2 = \sin(q) \quad\Rightarrow\quad d(z^2) = d(\sin(q)) \quad\Rightarrow\quad 2z\,dz = \cos(q)\,dq
$$
From there, the equation can be manipulated to solve for $\frac{dz}{dq}$ or $\frac{dq}{dz}$, or it can just be left as-is.

The basic differential of a variable is normally written simply as $d(x) = dx$. In fact, $dx$ can be viewed merely as shorthand for $d(x)$.

The second differential is merely the differential operator applied twice [5, pg. 17]:
$$
d(d(x)) = d(dx) = d^2x \tag{2}
$$
Therefore, the second differential of a function is obtained by applying the differential operator twice. However, one must be careful when doing this, as the product rule affects products of differentials as well.

For instance, $d(x^2\,dx)$ will be found using the product rule, where $u = x^2$ and $v = dx$. In other words:
$$
d(x^2\,dx) = x^2\,d(d(x)) + d(x^2)\,dx = x^2\,d^2x + 2x\,dx\,dx = x^2\,d^2x + 2x\,dx^2
$$
The point of all of this is to realize that the notation $\frac{d^2y}{dx^2}$ is not some arbitrary arrangement of symbols, but has a deep (if, as will be shown, slightly incorrect or misleading) meaning. The notation means that the equation is showing the ratio of the second differential of $y$ (i.e., $d(d(y))$) to the square of $dx$ (i.e., $dx^2$). In other words, starting with $y$, applying the differential operator twice, and then dividing by $dx$ twice yields $\frac{d^2y}{dx^2}$. Unfortunately, that is not the same sequence of steps that happens when two derivatives are performed, and thus it leads to a faulty formulation of the second derivative.

Footnote: In Leibniz notation, $dx^2$ is equivalent to $(dx)^2$. If the differential of $x^2$ were wanted, it would be written as $d(x^2)$. The rules are given in [5, pg. 24].
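The product-rule bookkeeping above can be mimicked numerically. The following is a minimal Python sketch, not the authors' method: a hypothetical class `Diff2` carries a quantity together with its first and second differentials, and the differentials of the base variable ($dx$ and $d^2x$) are left as arbitrary exact numbers, so no independent variable is chosen. It checks that $d(d(x^3)) = 3x^2\,d^2x + 6x\,dx^2$, which is three times the expression $d(x^2\,dx)$ computed above, and shows that the plain ratio $d(d(y))/dx^2$ depends on whatever value is assigned to $d^2x$.

```python
from fractions import Fraction


class Diff2:
    """Hypothetical helper: value u with d(u) and d(d(u)) attached.

    The base variable's differentials are free numbers, so no independent
    variable (no "progression of variables") is chosen.
    """

    def __init__(self, val, d1=0, d2=0):
        self.val, self.d1, self.d2 = val, d1, d2

    def __mul__(self, other):
        if not isinstance(other, Diff2):
            other = Diff2(other)  # constants have zero differentials
        u, v = self, other
        return Diff2(
            u.val * v.val,
            u.val * v.d1 + u.d1 * v.val,                    # d(uv)
            u.val * v.d2 + 2 * u.d1 * v.d1 + u.d2 * v.val,  # d(d(uv))
        )

    __rmul__ = __mul__

    def __pow__(self, n):
        result = Diff2(1)
        for _ in range(n):
            result = result * self
        return result


x0, dx = Fraction(2), Fraction(1, 10)
for ddx in (Fraction(0), Fraction(1, 100), Fraction(-7, 50)):
    x = Diff2(x0, dx, ddx)  # x with arbitrary dx and d^2x
    y = x ** 3
    # d(x^3) = 3x^2 dx, and d(d(x^3)) = 3 * d(x^2 dx) = 3(x^2 d^2x + 2x dx^2)
    assert y.d1 == 3 * x0**2 * dx
    assert y.d2 == 3 * (x0**2 * ddx + 2 * x0 * dx**2)
    # The naive ratio d(d(y)) / dx^2 equals 6x = 12 only when d^2x is 0
    print(ddx, y.d2 / dx**2)
# Output lines: "0 12", "1/100 24", "-7/50 -156"
```

Exact rational arithmetic is used so that the identities can be checked with `==` rather than floating-point tolerances.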
As a matter of fact, order of operations is very important when taking derivatives. When taking a derivative, one first takes the differential and then divides by $dx$. The second derivative is the derivative of the first, so the next differential occurs after the first derivative is complete, and the process finishes by dividing by $dx$ again.

However, what does it look like to take the differential of the first derivative? Basic calculus rules tell us that the quotient rule should be used:
$$
\begin{aligned}
d\left(\frac{dy}{dx}\right) &= \frac{dx\,d(dy) - dy\,d(dx)}{(dx)^2} = \frac{dx\,d^2y - dy\,d^2x}{dx^2} \\
&= \frac{dx\,d^2y}{dx^2} - \frac{dy\,d^2x}{dx^2} = \frac{dx}{dx}\,\frac{d^2y}{dx} - \frac{dy}{dx}\,\frac{d^2x}{dx} \\
&= \frac{d^2y}{dx} - \frac{dy}{dx}\,\frac{d^2x}{dx}
\end{aligned}
$$
Then, for the second step, this can be divided by $dx$, yielding:
$$
\frac{d\left(\frac{dy}{dx}\right)}{dx} = \frac{d^2y}{dx^2} - \frac{dy}{dx}\,\frac{d^2x}{dx^2} \tag{3}
$$
This, in fact, yields a notation for the second derivative which is just as algebraically manipulable as the first derivative. It is not very pretty or compact, but it works algebraically.

The chain rule for the second derivative fits this algebraic notation correctly, provided we replace each instance of the second derivative in (1) with its full form:
$$
\frac{d^2y}{dt^2} - \frac{dy}{dt}\,\frac{d^2t}{dt^2} = \left(\frac{d^2y}{dx^2} - \frac{dy}{dx}\,\frac{d^2x}{dx^2}\right)\left(\frac{dx}{dt}\right)^2 + \frac{dy}{dx}\left(\frac{d^2x}{dt^2} - \frac{dx}{dt}\,\frac{d^2t}{dt^2}\right) \tag{4}
$$
Expanding the right-hand side, the two terms containing $d^2x$ cancel and the last term becomes $\frac{dy}{dt}\,\frac{d^2t}{dt^2}$, so the equation works out perfectly algebraically.

One objection given to the present authors by early reviewers of the formula presented in (3) is that the ratio $\frac{d^2x}{dx^2}$ reduces to zero. However, this is not necessarily true. The concern is that, since $\frac{dx}{dx}$ is always 1 (i.e., a constant), $\frac{d^2x}{dx^2}$ should be zero. The problem with this concern is that we are no longer taking $\frac{d^2x}{dx^2}$ to be the derivative of $\frac{dx}{dx}$. Using the notation in (3), the derivative of $\frac{dx}{dx}$ would be:
$$
\frac{d\left(\frac{dx}{dx}\right)}{dx} = \frac{d^2x}{dx^2} - \frac{dx}{dx}\,\frac{d^2x}{dx^2} \tag{5}
$$
In this case, since $\frac{dx}{dx}$ reduces to 1, the expression is obviously zero.
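Formula (3) can be checked with the same kind of bookkeeping. In the sketch below (a hypothetical `Diff2` helper with exact rational arithmetic, offered as an illustration rather than anything from the paper), $t$ is given arbitrary values for $dt$ and $d^2t$, $x = t^2$ and $y = x^3$ are built from it, and `D2(y, x)` evaluates the right-hand side of (3). Under these assumptions the checks confirm that (3) returns $6x$ and $30t^4$ whatever $d^2t$ is, that the chain rule (4) holds exactly, and that (5) is zero even though $\frac{d^2x}{dx^2}$ is not.

```python
from fractions import Fraction


class Diff2:
    """Hypothetical helper: a value u together with d(u) and d(d(u))."""

    def __init__(self, val, d1=0, d2=0):
        self.val, self.d1, self.d2 = val, d1, d2

    def __mul__(self, other):
        other = other if isinstance(other, Diff2) else Diff2(other)
        return Diff2(
            self.val * other.val,
            self.val * other.d1 + self.d1 * other.val,
            self.val * other.d2 + 2 * self.d1 * other.d1 + self.d2 * other.val,
        )

    __rmul__ = __mul__

    def __pow__(self, n):
        result = Diff2(1)
        for _ in range(n):
            result = result * self
        return result


def D1(y, x):
    """First derivative as a plain ratio of differentials, dy/dx."""
    return y.d1 / x.d1


def D2(y, x):
    """Right-hand side of equation (3): d^2y/dx^2 - (dy/dx)(d^2x/dx^2)."""
    return y.d2 / x.d1**2 - (y.d1 / x.d1) * (x.d2 / x.d1**2)


t0, dt = Fraction(3, 2), Fraction(1, 5)
for ddt in (Fraction(0), Fraction(2, 7), Fraction(-4)):
    t = Diff2(t0, dt, ddt)  # d^2t is arbitrary: no independent variable
    x = t ** 2
    y = x ** 3
    assert D2(y, x) == 6 * x.val   # y'' with respect to x
    assert D2(y, t) == 30 * t0**4  # y'' with respect to t
    # Chain rule (4), with every second derivative in the full form (3)
    assert D2(y, t) == D2(y, x) * D1(x, t) ** 2 + D1(y, x) * D2(x, t)
    # Equation (5): the derivative of dx/dx vanishes ...
    assert D2(x, x) == 0
    # ... even though d^2x/dx^2 itself is nonzero here (2/9 when d^2t = 0)
    print(ddt, x.d2 / x.d1**2)
```

Note that $\frac{d^2x}{dx^2}$ is nonzero even when $d^2t = 0$, because $x = t^2$ is not being treated as the independent variable.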
Note, however, that in (5) the term $\frac{d^2x}{dx^2}$ is not itself necessarily zero, since it is not the second derivative of $x$ with respect to $x$.

The notation for the third and higher derivatives can be found using the same techniques as for the second derivative. To find the third derivative of $y$ with respect to $x$, one starts with the second derivative and takes the differential:
$$
\begin{aligned}
d\left(\frac{d\left(\frac{dy}{dx}\right)}{dx}\right) &= d\left(\frac{d^2y}{dx^2} - \frac{dy}{dx}\,\frac{d^2x}{dx^2}\right) = d\left(\frac{dx\,d^2y - dy\,d^2x}{dx^3}\right) \\
&= \frac{(dx^3)\,d\left(dx\,d^2y - dy\,d^2x\right) - \left(dx\,d^2y - dy\,d^2x\right)d\left(dx^3\right)}{\left(dx^3\right)^2} \\
&= \frac{dx^3\left(dx\,d^3y - dy\,d^3x\right) - 3\,dx^2\,d^2x\left(dx\,d^2y - dy\,d^2x\right)}{dx^6} \\
&= \frac{d^3y}{dx^2} - \frac{dy}{dx}\,\frac{d^3x}{dx^2} - 3\,\frac{d^2x}{dx}\,\frac{d^2y}{dx^2} + 3\,\frac{dy}{dx}\,\frac{(d^2x)^2}{dx^3}
\end{aligned}
$$
Finally, this result is divided by $dx$:
$$
\frac{d\left(\frac{d\left(\frac{dy}{dx}\right)}{dx}\right)}{dx} = \frac{d^3y}{dx^3} - \frac{dy}{dx}\,\frac{d^3x}{dx^3} - 3\,\frac{d^2x}{dx^2}\,\frac{d^2y}{dx^2} + 3\,\frac{dy}{dx}\,\frac{(d^2x)^2}{dx^4} \tag{6}
$$
This expression includes several terms not normally seen, so some explanation is worthwhile. In this expression, $d^2x$ represents the second differential of $x$, or $d(d(x))$. Therefore, $(d^2x)^2$ represents $(d(d(x)))^2$. Likewise, $dx^4$ represents $(d(x))^4$.

Because the expanded notation for the second and higher derivatives is much more verbose than that of the first derivative, it is often useful to adopt a slight modification of Arbogast's D notation (see [6, pgs. 209, 218–219]) for the total derivative instead of writing it as algebraic differentials:
$$
D^2_x y = \frac{d^2y}{dx^2} - \frac{dy}{dx}\,\frac{d^2x}{dx^2} \tag{7}
$$
$$
D^3_x y = \frac{d^3y}{dx^3} - \frac{dy}{dx}\,\frac{d^3x}{dx^3} - 3\,\frac{d^2x}{dx^2}\,\frac{d^2y}{dx^2} + 3\,\frac{dy}{dx}\,\frac{(d^2x)^2}{dx^4} \tag{8}
$$
This becomes even more important as the number of derivatives increases, since each expansion is more unwieldy than the previous one.
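A minimal sketch checking (6)/(8) under the same assumptions as before, now with a hypothetical `Diff3` helper that tracks $u$, $du$, $d^2u$ and $d^3u$ and multiplies them with the Leibniz rule (standard-library Python; `math.comb` needs Python 3.8 or later). Whatever values are chosen for $dx$, $d^2x$ and $d^3x$, evaluating (8) should return the ordinary third derivative.

```python
from fractions import Fraction
from math import comb


class Diff3:
    """Hypothetical helper: parts[k] holds d^k(u) for k = 0..3 (parts[0] = u)."""

    def __init__(self, *parts):
        self.parts = list(parts) + [0] * (4 - len(parts))

    def __add__(self, other):
        other = other if isinstance(other, Diff3) else Diff3(other)
        return Diff3(*(a + b for a, b in zip(self.parts, other.parts)))

    __radd__ = __add__

    def __mul__(self, other):
        # Leibniz rule: d^n(uv) = sum_k C(n, k) d^k(u) d^(n-k)(v)
        other = other if isinstance(other, Diff3) else Diff3(other)
        u, v = self.parts, other.parts
        return Diff3(*(sum(comb(n, k) * u[k] * v[n - k] for k in range(n + 1))
                       for n in range(4)))

    __rmul__ = __mul__

    def __pow__(self, n):
        result = Diff3(1)
        for _ in range(n):
            result = result * self
        return result


def D3(y, x):
    """Equation (8): the third derivative written with free differentials."""
    y1, y2, y3 = y.parts[1:]
    x1, x2, x3 = x.parts[1:]
    return (y3 / x1**3 - (y1 / x1) * (x3 / x1**3)
            - 3 * (x2 / x1**2) * (y2 / x1**2)
            + 3 * (y1 / x1) * (x2**2 / x1**4))


# Arbitrary values for dx, d^2x and d^3x; exact arithmetic via Fraction.
x = Diff3(Fraction(2), Fraction(1, 3), Fraction(5, 4), Fraction(-7, 2))
assert D3(x ** 3, x) == 6
assert D3(x ** 4, x) == 24 * x.parts[0]

# x need not be "independent": build it from t with arbitrary differentials.
t = Diff3(Fraction(3, 2), Fraction(1, 5), Fraction(2), Fraction(1, 9))
x = t ** 2 + 5 * t
assert D3(x, t) == 0                 # a quadratic in t has zero third derivative
assert D3(x ** 3, x) == 6            # y = x^3 again, now with x depending on t
assert D3(t ** 6, t) == 120 * t.parts[0] ** 3
```

The inputs are `Fraction` values on purpose: with plain integers, Python's `/` would silently switch to floating point.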
Each level can, however, be converted into differential notation from the previous one as follows:
$$
D^n_x y = \frac{d\left(D^{n-1}_x y\right)}{dx} \tag{9}
$$
The advantages of Arbogast's notation over Lagrangian notation are that (1) this modification of Arbogast's notation clearly specifies both the top and bottom differentials, and (2) for very high order derivatives, Lagrangian notation takes up $n$ superscript spaces to write the $n$th derivative, while Arbogast's notation takes up only about $\log_{10} n$ spaces (the number of digits of $n$).

Therefore, when a compact representation of higher order derivatives is needed, this paper will use Arbogast's notation for its clarity and succinctness.

Footnote: The difference between this notation and that of Arbogast is that we subscript the $D$ with the variable with respect to which the derivative is being taken. Additionally, we always supply in the superscript the number of derivatives being taken. Therefore, where Arbogast would write simply $D$, this notation would be written as $D^1_x$.

Footnote: It may be surprising to find a paper on the algebraic notation of differentials using a non-algebraic notation. The goal, however, is to use ratios only when they act as ratios. When writing a ratio that works like a ratio is too cumbersome, we prefer to avoid the ratio notation altogether, to prevent unwarranted leaps based on notation that may mislead the intuition.

In fact, just as the algebraic manipulation of the first derivative can be used to convert the derivative of $y$ with respect to $x$ into the derivative of $x$ with respect to $y$, combining it with Arbogast's notation for the second derivative can be used to generate the formula for doing this on the second derivative. Multiplying both sides of (7) by $\frac{dx^2}{dy}$, and then by $\frac{dx}{dy^2}$, gives:
$$
\begin{aligned}
D^2_x y &= \frac{d^2y}{dx^2} - \frac{dy}{dx}\,\frac{d^2x}{dx^2} \\
D^2_x y\,\frac{dx^2}{dy} &= \frac{d^2y}{dx^2}\,\frac{dx^2}{dy} - \frac{dy}{dx}\,\frac{d^2x}{dx^2}\,\frac{dx^2}{dy} \\
D^2_x y\left(\frac{dx}{dy}\right)^3 &= \frac{d^2y}{dy^2}\,\frac{dx}{dy} - \frac{d^2x}{dy^2} \\
-D^2_x y\left(\frac{dx}{dy}\right)^3 &= \frac{d^2x}{dy^2} - \frac{dx}{dy}\,\frac{d^2y}{dy^2} \\
-D^2_x y\left(\frac{1}{\frac{dy}{dx}}\right)^3 &= \frac{d^2x}{dy^2} - \frac{dx}{dy}\,\frac{d^2y}{dy^2} \\
-D^2_x y\left(\frac{1}{D^1_x y}\right)^3 &= \frac{d^2x}{dy^2} - \frac{dx}{dy}\,\frac{d^2y}{dy^2}
\end{aligned}
$$
It can be seen that the right-hand side of this final equation is the second derivative of $x$ with respect to $y$, in the form of (3). Therefore, it can generally be stated that the second derivative of $y$ with respect to $x$ can be transformed into the second derivative of $x$ with respect to $y$ with the following formula:
$$
-D^2_x y\left(\frac{1}{D^1_x y}\right)^3 = D^2_y x \tag{10}
$$
To see this formula in action on a simple equation, consider $y = x^3$. Performing two derivatives gives us:
$$
y = x^3 \tag{11}
$$
$$
D^1_x y = 3x^2 \tag{12}
$$
$$
D^2_x y = 6x \tag{13}
$$
According to (10), $D^2_y x$ (or $x''$ in Lagrangian notation) can be found by performing the following:
$$
D^2_y x = -(6x)\left(\frac{1}{3x^2}\right)^3 = -\frac{6x}{27x^6} = -\frac{2}{9}x^{-5} \tag{14}
$$
This can be checked by taking successive derivatives of the inverse function of (11):
$$
x = y^{1/3}, \qquad D^1_y x = \frac{1}{3}y^{-2/3}, \qquad D^2_y x = -\frac{2}{9}y^{-5/3} \tag{15}
$$
(15) can be seen to be equivalent to (14) by substituting for $y$ using (11):
$$
D^2_y x = -\frac{2}{9}\left(x^3\right)^{-5/3} = -\frac{2}{9}x^{-5} \tag{16}
$$
This is the same result achieved by using the inversion formula (10).

While the inversion formula (10) is not original, it is a tool that many mathematicians are unaware of, and it is rarely considered for solving higher-order differential equations.

Footnote: The authors of this paper, as well as several early reviewers, had originally thought that the inversion formula was a new finding. Again, that is the usefulness of the notation. Specific formulas such as the inversion formula do not need to be taught, as they simply flow naturally out of the notation. Even though the inversion formula is not new with this paper, the way the present authors were able to use it to good benefit demonstrates the value of an improved notation: practitioners need not memorize endless formulas, but can develop them straightforwardly as needed from basic intuitions.

As an example of how to apply (10), consider second order ordinary nonlinear differential equations of the form $F(y'', y', y) = 0$. Equations of this form can be solved implicitly for $F(a, b, c) = a - b^3 f(c)$ for a generic function $f$. Indeed, consider the equation
$$
D^2_x y = f(y)\left(\frac{dy}{dx}\right)^3 \tag{17}
$$
Then, by virtue of (10), we derive $D^2_y x = -f(y)$. Integrating this equation twice with respect to $y$ yields
$$
x(y) = -\int\!\!\int f(y)\,dy\,dy \tag{18}
$$
For simplicity, let $f(y) = y$, so that (17) reduces to $D^2_x y = y\left(\frac{dy}{dx}\right)^3$, whose real exact solution is
$$
y(x) = \sqrt[3]{3(c_2 - x) + \sqrt{9(x - c_2)^2 - 8c_1^3}} + \sqrt[3]{3(c_2 - x) - \sqrt{9(x - c_2)^2 - 8c_1^3}} \tag{19}
$$
(valid where $9(x - c_2)^2 \ge 8c_1^3$, where the cubic has a single real root), and where $c_1$ and $c_2$ are integration constants that must be determined from given boundary or Cauchy conditions. On the other hand, (18) results in $x(y) = -\frac{y^3}{6} + c_1 y + c_2$, the real inverse of which exactly coincides with (19).

The view of differentials presented by Leibniz and those following in his footsteps differed significantly from the modern-day view of calculus. The modern view of calculus focuses on functions, which have defined independent and dependent variables. The Leibniz view, however, according to [5], is a much more geometric view, with no preferred independent or dependent variable.

The modern concept of the derivative generally implies a dependent and an independent variable: the numerator is the dependent variable and the denominator is the independent variable. In the geometric view, however, there are only relationships, and these relationships do not necessarily have an implied dependency.

Therefore, Leibnizian differentiation does not occur with respect to any independent variable; there is no preferred independent variable. Likewise, as we have seen in Sections 6 and 7, the version of the differential presented here allows for the reversal of variable dependency relationships.
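The reversal of dependency relationships mentioned here can be checked against the worked examples of the inversion formula. The Python sketch below is illustrative only (standard library; the function names and the constants `C1`, `C2` are hypothetical choices). It first confirms (14) exactly for $y = x^3$, then takes $f(y) = y$ in (17), builds $y(x)$ as the real root of the cubic $x = -\frac{y^3}{6} + c_1 y + c_2$ obtained from (18) via Cardano's formula, and checks $y'' = y\,(y')^3$ with central finite differences, which is a numerical spot check rather than a proof.

```python
import math
from fractions import Fraction


def invert_second_derivative(d1, d2):
    """Equation (10): D^2_y x = -D^2_x y * (1 / D^1_x y) ** 3."""
    return -d2 * (1 / d1) ** 3


# Worked example (11)-(16): y = x^3, D^1_x y = 3x^2, D^2_x y = 6x.
for x in (Fraction(1, 2), Fraction(2), Fraction(5)):
    assert invert_second_derivative(3 * x**2, 6 * x) == Fraction(-2, 9) * x**-5
    # Compare with (15), computed from the inverse function x = y^(1/3)
    y = float(x) ** 3
    assert math.isclose(float(invert_second_derivative(3 * x**2, 6 * x)),
                        -2 / 9 * y ** (-5 / 3))

# ODE (17) with f(y) = y; C1 and C2 are illustrative integration constants.
C1, C2 = -1.0, 0.5


def cbrt(v):
    """Real cube root (math.cbrt only exists from Python 3.11 on)."""
    return math.copysign(abs(v) ** (1 / 3), v)


def x_of_y(y):
    """Equation (18) with f(y) = y: x = -y^3/6 + C1*y + C2."""
    return -y**3 / 6 + C1 * y + C2


def y_of_x(x):
    """Real root of x_of_y(y) = x via Cardano's formula (one real root here)."""
    a = 3 * (C2 - x)
    s = math.sqrt(9 * (x - C2) ** 2 - 8 * C1**3)
    return cbrt(a + s) + cbrt(a - s)


h = 1e-4
for x in (-1.0, 0.3, 2.0):
    y = y_of_x(x)
    assert math.isclose(x_of_y(y), x, abs_tol=1e-12)
    # Central differences give y' and y''; check y'' = y * (y')^3
    yp = (y_of_x(x + h) - y_of_x(x - h)) / (2 * h)
    ypp = (y_of_x(x + h) - 2 * y + y_of_x(x - h)) / h**2
    assert math.isclose(ypp, y * yp**3, abs_tol=1e-5)
```

Choosing $c_1 < 0$ keeps $9(x - c_2)^2 - 8c_1^3$ positive for every $x$, so the square root in Cardano's formula is always real in this example.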
Similarly, the procedure of differentiation given in Section 3, which allows us to formulate the new notation for the second derivative in (3), follows the Leibnizian methodology, in which differentiation is done mechanically without considering variable dependencies.

Leibniz did, however, consider certain kinds of variables which map very directly to what we would consider "independent" variables. In the Leibniz conception, what we would consider an "independent" variable is a variable whose first differential is considered constant. This leads to numerous simplifications because, if a differential is constant, then by standard differential rules its differential is zero. Therefore, if $x$ is the independent variable (using modern terminology), then $dx$ is constant. If $dx$ is a constant (even if it is an infinitely small, unknowable constant), then its differential is zero. Therefore, $d^2x$ and higher differentials of $x$ reduce to zero, simplifying the equation.

Footnote: As a way of understanding this, imagine the common independent variable used in physics, especially prior to relativity: time. Consider especially the way that time flows in a pre-relativistic era. It flows in a continual, constant fashion. Therefore, if the flow of time (i.e., $dt$) is constant, then by the rules of differentiation the second differential of time must be zero. Thus, an independent variable is one which acts in a similar fashion to time. Another way to see this is to consider the independence of the independent variable. Its changes (i.e., differences) are, by definition, independent of anything else. Therefore, we may not assign a rule about the differences between the values. Thus, because there is no valid rule, the second differential may not be zero, but it is at most undefinable by definition.

As an example, take the equation $xy = c$, where $c$ is a constant. Taking the differential twice gives
$$
x\,dy + y\,dx = 0
$$
$$
x\,d^2y + 2\,dx\,dy + y\,d^2x = 0
$$
This equation can then be simplified by choosing any single differential to hold constant. This is referred to in Leibnizian thought as choosing a "progression of variables," and it is identical to choosing an independent variable [5, pg. 71]. If one chooses $x$ as the independent variable, then $dx$ is constant, and therefore $d^2x = 0$. Thus, the equation reduces to $x\,d^2y + 2\,dx\,dy = 0$. However, if $y$ is the independent variable, then $dy$ is held constant and therefore its differential, $d^2y$, is zero, leaving $2\,dx\,dy + y\,d^2x = 0$.

This understanding explains the success of the modern notation of the second derivative. The notation given in (3) is $D^2_x y = \frac{d^2y}{dx^2} - \frac{dy}{dx}\,\frac{d^2x}{dx^2}$. However, if we assume that $x$ is truly the independent variable, then $d^2x = 0$, which means that $\frac{dy}{dx}\,\frac{d^2x}{dx^2}$ reduces to 0 as well. This reduces (3) to the modern notation $\frac{d^2y}{dx^2}$. Additionally, if we take the assumption that $x$ is the independent variable, then the problems identified in Section 2 disappear, because $x$, as an independent variable, cannot then be dependent on $t$.

In addition to (3) being reducible to $\frac{d^2y}{dx^2}$ under the assumption that $x$ is the independent variable, the Leibnizian view also gives a set of tools that allows us to reinflate instances of $\frac{d^2y}{dx^2}$ into (3).
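The two progressions of variables for $xy = c$ can be checked mechanically. The sketch below again relies on a hypothetical `Diff2` helper with exact rational arithmetic; the constant $c = 6$, the point $(2, 3)$ and the differential values are arbitrary illustrative choices. It verifies the reduced equation under each choice of independent variable, the full equation when no choice is made, and that (3) coincides with $\frac{d^2y}{dx^2}$ once $d^2x = 0$.

```python
from fractions import Fraction


class Diff2:
    """Hypothetical helper: a value u with d(u) and d(d(u))."""

    def __init__(self, val, d1=0, d2=0):
        self.val, self.d1, self.d2 = val, d1, d2

    def __mul__(self, other):
        other = other if isinstance(other, Diff2) else Diff2(other)
        return Diff2(
            self.val * other.val,
            self.val * other.d1 + self.d1 * other.val,
            self.val * other.d2 + 2 * self.d1 * other.d1 + self.d2 * other.val,
        )

    __rmul__ = __mul__


def reciprocal(v):
    """1/v with d(1/v) = -dv/v^2 and d(d(1/v)) = -d^2v/v^2 + 2 dv^2/v^3."""
    return Diff2(1 / v.val, -v.d1 / v.val**2,
                 -v.d2 / v.val**2 + 2 * v.d1**2 / v.val**3)


def D2(y, x):
    """Equation (3): d^2y/dx^2 - (dy/dx)(d^2x/dx^2)."""
    return y.d2 / x.d1**2 - (y.d1 / x.d1) * (x.d2 / x.d1**2)


c = Fraction(6)                     # the constant in xy = c (illustrative)
x0, y0 = Fraction(2), Fraction(3)   # a point on xy = c

# Progression 1: x independent, so dx is constant and d^2x = 0.
x = Diff2(x0, Fraction(1, 10), 0)
y = c * reciprocal(x)
assert x.val * y.d2 + 2 * x.d1 * y.d1 == 0        # x d^2y + 2 dx dy = 0
assert y.d2 / x.d1**2 == D2(y, x)                 # (3) reduces to d^2y/dx^2

# Progression 2: y independent, so d^2y = 0.
y = Diff2(y0, Fraction(-1, 4), 0)
x = c * reciprocal(y)
assert 2 * x.d1 * y.d1 + y.val * x.d2 == 0        # 2 dx dy + y d^2x = 0

# No progression chosen: the full equation holds for any d^2x.
x = Diff2(x0, Fraction(1, 10), Fraction(3, 7))
y = c * reciprocal(x)
assert x.val * y.d2 + 2 * x.d1 * y.d1 + y.val * x.d2 == 0
print(D2(y, x), y.d2 / x.d1**2)  # prints 3/2 -879/14
```

In the last case, (3) still gives $2c/x^3 = 3/2$, whereas the bare ratio $\frac{d^2y}{dx^2}$ picks up a spurious contribution from the arbitrary $d^2x$.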
Euler showed that, given an equation from a specific "progression of variables" (i.e., a particular choice of an independent variable), we can modify that equation in order to see what it would have been if no choice of independent variable had been made. According to [5, pg. 75], the substitution for reinflating a differential from a particular progression of variables (i.e., a particular independent variable) into one that is independent of the progression of variables (i.e., no independent variable chosen) is an expansion practically identical to (3).

The notation presented here provides a vast improvement in the ability of higher order differentials to be manipulated algebraically. This improved notation yields several potential areas for study. These include:

1. developing a general formula for the algebraic expansion of higher derivatives,
2. identifying additional second order differential equations that are solvable by swapping the dependent and independent variables,
3. finding other ways that differential equations can be rendered solvable using insights from the new notation,
4. finding further reductions in special formulas that can be rendered by using algebraically manipulable notations,
5. investigating the conjecture that the second differentials of independent variables are always zero, and its potential implications, and
6. extending this project to allow partial differentials to be algebraically manipulable.

Footnote: To be clear, there is nothing preventing someone from making an independent variable dependent on a parameter. However, doing so then brings them around to needing to use the form of the second derivative defined here (which does not presume a particular choice of independent variable), or a compensating mechanism such as Faà di Bruno's formula.
10 Acknowledgements
The authors would like to thank Aleks Kleyn, Chris Burba, Daniel Lichtblau, George Montañez, and others who read early versions of this manuscript and provided important feedback and suggestions.
References
[1] E. W. Swokowski, Calculus with Analytic Geometry. PWS Publishers, alternate ed., 1983.
[2] W. P. Johnson, "The curious history of Faà di Bruno's formula," American Mathematical Monthly, pp. 217–234, 2002.
[3] J. M. Henle and E. M. Kleinberg, Infinitesimal Calculus. Dover Publications, 1979.
[4] H. J. Keisler, Elementary Calculus: An Infinitesimal Approach. PWS Publishers, second ed., 1985.
[5] H. J. M. Bos, "Differentials, higher-order differentials, and the derivative in the Leibnizian calculus," Archive for History of Exact Sciences, vol. 14, no. 1, pp. 1–90, 1974.
[6] F. Cajori,