Profiling

Sometimes computing the likelihood is not as fast as we would like. Theano provides handy profiling tools, which PyMC3 wraps in model.profile. This function returns a ProfileStats object that reports timing information for the underlying Theano operations. Here we’ll profile the likelihood and gradient for the stochastic volatility example.

First we build the model.

[1]:
import pandas as pd
from pymc3 import *
from pymc3.math import exp
from pymc3.distributions.timeseries import *

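# S&P 500 returns, indexed by date
returns = pd.read_csv(get_data('SP500.csv'), index_col=0, parse_dates=True)

n = returns.shape[0]

with Model() as model:
    # Step size of the log-volatility random walk
    sigma = Exponential('sigma', 1. / .02, testval=.1)
    # Degrees of freedom of the Student-T likelihood
    nu = Exponential('nu', 1. / 10)
    # Latent log-volatility; the second positional argument is the precision tau
    s = GaussianRandomWalk('s', sigma ** -2, shape=n)
    # Observed returns, with precision lam = exp(-2 * s)
    r = StudentT('r', nu, lam=exp(-2 * s), observed=returns)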

Then we call the profile function and print a summary of the ProfileStats object it returns.

[2]:
# Profiling of the logp call
model.profile(model.logpt).summary()
Function profiling
==================
  Message: /home/junpenglao/Documents/pymc3/pymc3/model.py:921
  Time in 1000 calls to Function.__call__: 2.225540e+00s
  Time in Function.fn.__call__: 2.201248e+00s (98.908%)
  Time in thunks: 2.190614e+00s (98.431%)
  Total compile time: 7.304576e+00s
    Number of Apply nodes: 26
    Theano Optimizer time: 5.187931e-01s
       Theano validate time: 1.646996e-03s
    Theano Linker time (includes C, CUDA code generation/compiling): 6.748426e+00s
       Import time 3.453946e-02s
       Node make_thunk time 6.747270e+00s
           Node Elemwise{add,no_inplace}(TensorConstant{(1, 1) of 1.0}, InplaceDimShuffle{x,x}.0) time 8.496084e-01s
           Node Elemwise{Composite{((i0 + (i1 * log(((i2 * i3) / i4)))) - i5)}}(Elemwise{Composite{scalar_gammaln((i0 * i1))}}.0, TensorConstant{(1, 1) of 0.5}, TensorConstant{(1, 1) of ..8861837907}, Elemwise{Composite{exp((i0 * i1))}}.0, InplaceDimShuffle{x,x}.0, Elemwise{Composite{scalar_gammaln((i0 * i1))}}.0) time 7.832110e-01s
           Node Elemwise{Composite{Switch(i0, (i1 - (i2 * i3 * log1p(((i4 * i5) / i6)))), i7)}}(Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, Elemwise{Composite{((i0 + (i1 * log(((i2 * i3) / i4)))) - i5)}}.0, TensorConstant{(1, 1) of 0.5}, Elemwise{add,no_inplace}.0, Elemwise{Composite{exp((i0 * i1))}}.0, TensorConstant{[[7.637491..7877e-07]]}, InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of -inf}) time 7.819328e-01s
           Node Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}(Elemwise{Composite{exp((i0 * i1))}}.0, TensorConstant{(1, 1) of 0}, Elemwise{gt,no_inplace}.0) time 7.238307e-01s
           Node Elemwise{Composite{exp((i0 * i1))}}(TensorConstant{(1, 1) of -2.0}, InplaceDimShuffle{x,0}.0) time 6.877706e-01s

Time in all call to theano.grad() 0.000000e+00s
Time since theano import 22.829s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  91.6%    91.6%       2.008s       1.18e-04s     C    17000      17   theano.tensor.elemwise.Elemwise
   8.2%    99.9%       0.180s       5.99e-05s     C     3000       3   theano.tensor.elemwise.Sum
   0.1%    99.9%       0.002s       6.36e-07s     C     3000       3   theano.tensor.elemwise.DimShuffle
   0.0%   100.0%       0.001s       4.08e-07s     C     2000       2   theano.tensor.subtensor.Subtensor
   0.0%   100.0%       0.001s       5.57e-07s     C     1000       1   theano.tensor.opt.MakeVector
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  90.7%    90.7%       1.987s       1.99e-03s     C     1000        1   Elemwise{Composite{Switch(i0, (i1 - (i2 * i3 * log1p(((i4 * i5) / i6)))), i7)}}
   8.2%    98.9%       0.180s       5.99e-05s     C     3000        3   Sum{acc_dtype=float64}
   0.4%    99.3%       0.009s       9.14e-06s     C     1000        1   Elemwise{Composite{((i0 + (i1 * log(((i2 * i3) / i4)))) - i5)}}
   0.1%    99.5%       0.003s       3.24e-06s     C     1000        1   Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}
   0.1%    99.6%       0.002s       2.49e-06s     C     1000        1   Elemwise{Composite{exp((i0 * i1))}}
   0.1%    99.7%       0.001s       1.22e-06s     C     1000        1   Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}
   0.0%    99.7%       0.001s       4.27e-07s     C     2000        2   Elemwise{Composite{scalar_gammaln((i0 * i1))}}
   0.0%    99.7%       0.001s       8.06e-07s     C     1000        1   InplaceDimShuffle{x}
   0.0%    99.8%       0.001s       7.41e-07s     C     1000        1   InplaceDimShuffle{x,x}
   0.0%    99.8%       0.001s       3.24e-07s     C     2000        2   Elemwise{exp,no_inplace}
   0.0%    99.8%       0.001s       5.57e-07s     C     1000        1   MakeVector{dtype='float64'}
   0.0%    99.8%       0.001s       5.35e-07s     C     1000        1   Subtensor{int64::}
   0.0%    99.9%       0.001s       5.20e-07s     C     1000        1   Elemwise{add,no_inplace}
   0.0%    99.9%       0.000s       4.45e-07s     C     1000        1   Elemwise{Composite{log((i0 * i1))}}
   0.0%    99.9%       0.000s       3.96e-07s     C     1000        1   Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - (i3 * i0)), i4) + i5)}}[(0, 0)]
   0.0%    99.9%       0.000s       3.63e-07s     C     1000        1   Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - (i3 * i0)), i4) + i5)}}
   0.0%    99.9%       0.000s       3.61e-07s     C     1000        1   InplaceDimShuffle{x,0}
   0.0%   100.0%       0.000s       3.58e-07s     C     1000        1   Elemwise{gt,no_inplace}
   0.0%   100.0%       0.000s       2.81e-07s     C     1000        1   Subtensor{:int64:}
   0.0%   100.0%       0.000s       2.80e-07s     C     1000        1   Elemwise{Composite{Cast{int8}(GT(i0, i1))}}
   ... (remaining 2 Ops account for   0.02%(0.00s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  90.7%    90.7%       1.987s       1.99e-03s   1000    19   Elemwise{Composite{Switch(i0, (i1 - (i2 * i3 * log1p(((i4 * i5) / i6)))), i7)}}(Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, Elemwise{Composite{((i0 + (i1 * log(((i2 * i3) / i4)))) - i5)}}.0, TensorConstant{(1, 1) of 0.5}, Elemwise{add,no_inplace}.0, Elemwise{Composite{exp((i0 * i1))}}.0, TensorConstant{[[7.637491..7877e-07]]}, InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of -inf})
   8.2%    98.9%       0.179s       1.79e-04s   1000    22   Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (i1 - (i2 * i3 * log1p(((i4 * i5) / i6)))), i7)}}.0)
   0.4%    99.3%       0.009s       9.14e-06s   1000    17   Elemwise{Composite{((i0 + (i1 * log(((i2 * i3) / i4)))) - i5)}}(Elemwise{Composite{scalar_gammaln((i0 * i1))}}.0, TensorConstant{(1, 1) of 0.5}, TensorConstant{(1, 1) of ..8861837907}, Elemwise{Composite{exp((i0 * i1))}}.0, InplaceDimShuffle{x,x}.0, Elemwise{Composite{scalar_gammaln((i0 * i1))}}.0)
   0.1%    99.5%       0.003s       3.24e-06s   1000    14   Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}(Elemwise{Composite{exp((i0 * i1))}}.0, TensorConstant{(1, 1) of 0}, Elemwise{gt,no_inplace}.0)
   0.1%    99.6%       0.002s       2.49e-06s   1000     6   Elemwise{Composite{exp((i0 * i1))}}(TensorConstant{(1, 1) of -2.0}, InplaceDimShuffle{x,0}.0)
   0.1%    99.6%       0.001s       1.22e-06s   1000    21   Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}(Elemwise{Composite{Cast{int8}(GT(i0, i1))}}.0, TensorConstant{(1,) of 0.5}, Elemwise{Composite{inv(sqr(i0))}}[(0, 0)].0, Subtensor{int64::}.0, Subtensor{:int64:}.0, Elemwise{Composite{log((i0 * i1))}}.0, TensorConstant{(1,) of -inf})
   0.0%    99.7%       0.001s       8.06e-07s   1000     8   InplaceDimShuffle{x}(sigma)
   0.0%    99.7%       0.001s       7.41e-07s   1000     5   InplaceDimShuffle{x,x}(nu)
   0.0%    99.7%       0.001s       6.94e-07s   1000    23   Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (i1 * ((-(i2 * sqr((i3 - i4)))) + i5)), i6)}}.0)
   0.0%    99.7%       0.001s       5.88e-07s   1000    10   Elemwise{Composite{scalar_gammaln((i0 * i1))}}(TensorConstant{(1, 1) of 0.5}, InplaceDimShuffle{x,x}.0)
   0.0%    99.8%       0.001s       5.57e-07s   1000    24   MakeVector{dtype='float64'}(__logp_sigma_log__, __logp_nu_log__, __logp_s, __logp_r)
   0.0%    99.8%       0.001s       5.35e-07s   1000     4   Subtensor{int64::}(s, Constant{1})
   0.0%    99.8%       0.001s       5.20e-07s   1000     9   Elemwise{add,no_inplace}(TensorConstant{(1, 1) of 1.0}, InplaceDimShuffle{x,x}.0)
   0.0%    99.8%       0.001s       5.03e-07s   1000     2   Elemwise{exp,no_inplace}(sigma_log__)
   0.0%    99.9%       0.000s       4.45e-07s   1000    18   Elemwise{Composite{log((i0 * i1))}}(TensorConstant{(1,) of 0...4309189535}, Elemwise{Composite{inv(sqr(i0))}}[(0, 0)].0)
   0.0%    99.9%       0.000s       3.96e-07s   1000    20   Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - (i3 * i0)), i4) + i5)}}[(0, 0)](nu, TensorConstant{0}, TensorConstant{-2.3025850929940455}, TensorConstant{0.1}, TensorConstant{-inf}, nu_log__)
   0.0%    99.9%       0.000s       3.63e-07s   1000     7   Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 - (i3 * i0)), i4) + i5)}}(sigma, TensorConstant{0}, TensorConstant{3.912023005428146}, TensorConstant{50.0}, TensorConstant{-inf}, sigma_log__)
   0.0%    99.9%       0.000s       3.61e-07s   1000     1   InplaceDimShuffle{x,0}(s)
   0.0%    99.9%       0.000s       3.58e-07s   1000    11   Elemwise{gt,no_inplace}(InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of 0})
   0.0%    99.9%       0.000s       2.81e-07s   1000     3   Subtensor{:int64:}(s, Constant{-1})
   ... (remaining 6 Apply instances account for 0.05%(0.00s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the Theano flag floatX=float32
We don't know if amdlibm will accelerate this scalar op. scalar_gammaln
We don't know if amdlibm will accelerate this scalar op. scalar_gammaln
  - Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
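
A quick way to read the logp summary above is to turn its headline numbers into a per-call cost and a share for the dominant operation. The snippet below is only a back-of-the-envelope sketch: the figures are copied by hand from the output above, and the variable names are made up for illustration rather than taken from PyMC3 or Theano.

```python
# Figures copied by hand from the logp profile summary above.
n_calls = 1000
total_call_time = 2.225540  # "Time in 1000 calls to Function.__call__", in seconds
thunk_time = 2.190614       # "Time in thunks", in seconds
top_op_time = 1.987         # apply time of the Switch/log1p Elemwise op, in seconds

# Average wall-clock cost of a single logp evaluation, in milliseconds.
per_call_ms = total_call_time / n_calls * 1e3

# Fraction of the time spent in thunks that goes to the single heaviest op.
top_op_share = top_op_time / thunk_time

print(f"time per logp call: {per_call_ms:.2f} ms")
print(f"share of thunk time in the top op: {top_op_share:.1%}")
```

On these numbers a single logp evaluation costs a little over 2 ms, and roughly nine tenths of the thunk time goes to one Elemwise node, which corresponds to the `r` (StudentT) term in the model.
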
[3]:
# Profiling of the gradient call dlogp/dx
model.profile(gradient(model.logpt, model.vars)).summary()
Function profiling
==================
  Message: /home/junpenglao/Documents/pymc3/pymc3/model.py:921
  Time in 1000 calls to Function.__call__: 4.595304e+00s
  Time in Function.fn.__call__: 4.530428e+00s (98.588%)
  Time in thunks: 4.328702e+00s (94.198%)
  Total compile time: 1.276115e+01s
    Number of Apply nodes: 49
    Theano Optimizer time: 9.874594e-01s
       Theano validate time: 5.513906e-03s
    Theano Linker time (includes C, CUDA code generation/compiling): 1.172469e+01s
       Import time 5.937672e-02s
       Node make_thunk time 1.172257e+01s
           Node Elemwise{Composite{exp((i0 * i1))}}(TensorConstant{(1,) of -2.0}, s) time 1.043019e+00s
           Node Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}(Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, Elemwise{mul,no_inplace}.0, InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of 0}) time 9.094841e-01s
           Node Elemwise{mul,no_inplace}(TensorConstant{(1, 1) of -0.5}, Elemwise{add,no_inplace}.0, TensorConstant{[[7.637491..7877e-07]]}) time 8.909242e-01s
           Node Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}(Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1, 1) of 0.5}, Elemwise{add,no_inplace}.0, InplaceDimShuffle{x,0}.0, TensorConstant{[[7.637491..7877e-07]]}, Elemwise{Add}[(0, 1)].0, TensorConstant{(1, 1) of 0}) time 8.635607e-01s
           Node Elemwise{Composite{Switch(i0, (i1 / i2), i3)}}[(0, 2)](Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, Elemwise{mul,no_inplace}.0, Elemwise{Add}[(0, 1)].0, TensorConstant{(1, 1) of 0}) time 7.023010e-01s

Time in all call to theano.grad() 4.422141e+00s
Time since theano import 46.207s
Class
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Class name>
  67.3%    67.3%       2.914s       1.17e-04s     C    25000      25   theano.tensor.elemwise.Elemwise
  24.2%    91.5%       1.049s       1.50e-04s     C     7000       7   theano.tensor.elemwise.Sum
   7.6%    99.1%       0.328s       1.64e-04s     C     2000       2   theano.tensor.basic.Alloc
   0.4%    99.5%       0.019s       9.30e-06s     C     2000       2   theano.tensor.subtensor.IncSubtensor
   0.2%    99.7%       0.007s       7.24e-06s     C     1000       1   theano.tensor.basic.Join
   0.1%    99.9%       0.006s       1.30e-06s     C     5000       5   theano.tensor.elemwise.DimShuffle
   0.1%    99.9%       0.002s       1.18e-06s     C     2000       2   theano.tensor.basic.Reshape
   0.1%   100.0%       0.002s       1.11e-06s     C     2000       2   theano.tensor.subtensor.Subtensor
   0.0%   100.0%       0.001s       3.68e-07s     C     2000       2   theano.compile.ops.Rebroadcast
   0.0%   100.0%       0.001s       5.68e-07s     C     1000       1   theano.compile.ops.Shape_i
   ... (remaining 0 Classes account for   0.00%(0.00s) of the runtime)

Ops
---
<% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <Op name>
  40.8%    40.8%       1.767s       1.77e-03s     C     1000        1   Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}
  15.7%    56.5%       0.678s       3.39e-04s     C     2000        2   Sum{axis=[0], acc_dtype=float64}
  10.0%    66.5%       0.431s       4.31e-04s     C     1000        1   Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}
   9.7%    76.1%       0.419s       2.10e-04s     C     2000        2   Elemwise{mul,no_inplace}
   8.6%    84.7%       0.371s       7.42e-05s     C     5000        5   Sum{acc_dtype=float64}
   7.6%    92.3%       0.328s       1.64e-04s     C     2000        2   Alloc
   4.5%    96.8%       0.197s       1.97e-04s     C     1000        1   Elemwise{Composite{Switch(i0, (i1 / i2), i3)}}[(0, 2)]
   1.5%    98.3%       0.064s       6.44e-05s     C     1000        1   Elemwise{Add}[(0, 1)]
   0.4%    98.7%       0.016s       1.56e-05s     C     1000        1   IncSubtensor{InplaceInc;int64::}
   0.2%    98.8%       0.007s       7.24e-06s     C     1000        1   Join
   0.1%    98.9%       0.004s       1.43e-06s     C     3000        3   InplaceDimShuffle{x}
   0.1%    99.0%       0.004s       3.94e-06s     C     1000        1   Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}
   0.1%    99.1%       0.004s       3.82e-06s     C     1000        1   Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 * i0), i1) + i3 + (i4 * i5 * psi((i4 * (i6 + i0))) * i0) + (i7 * i8 * i9) + (i10 * i4 * i11 * psi((i4 * i0)) * i0) + (i4 * i12 * i0) + i13)}}[(0, 0)]
   0.1%    99.2%       0.003s       3.45e-06s     C     1000        1   Elemwise{Composite{exp((i0 * i1))}}
   0.1%    99.3%       0.003s       3.02e-06s     C     1000        1   IncSubtensor{InplaceInc;:int64:}
   0.1%    99.3%       0.003s       2.94e-06s     C     1000        1   Elemwise{Composite{Switch(i0, (i1 * i2 * (i3 - i4)), i5)}}
   0.1%    99.4%       0.003s       2.85e-06s     C     1000        1   Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 * i0), i1) + i3 + (i4 * (((i5 * i6 * Composite{inv(Composite{(sqr(i0) * i0)}(i0))}(i7)) / i8) - (i9 * Composite{inv(Composite{(sqr(i0) * i0)}(i0))}(i7))) * (i10 ** i11) * inv(Composite{(sqr(i0) * i0)}(i0)) * i0))}}[(0, 0)]
   0.1%    99.5%       0.003s       2.63e-06s     C     1000        1   Elemwise{Composite{(i0 * ((i1 * i2) + (i3 * i4)))}}[(0, 2)]
   0.1%    99.5%       0.003s       1.28e-06s     C     2000        2   Elemwise{switch,no_inplace}
   0.1%    99.6%       0.002s       1.18e-06s     C     2000        2   Reshape{1}
   ... (remaining 15 Ops account for   0.42%(0.02s) of the runtime)

Apply
------
<% time> <sum %> <apply time> <time per call> <#call> <id> <Apply name>
  40.8%    40.8%       1.767s       1.77e-03s   1000    21   Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}(Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, Elemwise{mul,no_inplace}.0, InplaceDimShuffle{x,x}.0, TensorConstant{(1, 1) of 0})
  10.2%    51.0%       0.442s       4.42e-04s   1000    32   Sum{axis=[0], acc_dtype=float64}(Alloc.0)
  10.0%    61.0%       0.431s       4.31e-04s   1000    30   Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}(Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, TensorConstant{(1, 1) of 0.5}, Elemwise{add,no_inplace}.0, InplaceDimShuffle{x,0}.0, TensorConstant{[[7.637491..7877e-07]]}, Elemwise{Add}[(0, 1)].0, TensorConstant{(1, 1) of 0})
   9.6%    70.6%       0.417s       4.17e-04s   1000    12   Elemwise{mul,no_inplace}(InplaceDimShuffle{x,0}.0, TensorConstant{[[7.637491..7877e-07]]})
   7.5%    78.2%       0.326s       3.26e-04s   1000    27   Alloc(Elemwise{switch,no_inplace}.0, TensorConstant{401}, Shape_i{0}.0)
   5.5%    83.6%       0.236s       2.36e-04s   1000    37   Sum{axis=[0], acc_dtype=float64}(Elemwise{Composite{Switch(i0, (i1 / i2), i3)}}[(0, 2)].0)
   4.5%    88.2%       0.197s       1.97e-04s   1000    33   Elemwise{Composite{Switch(i0, (i1 / i2), i3)}}[(0, 2)](Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}.0, Elemwise{mul,no_inplace}.0, Elemwise{Add}[(0, 1)].0, TensorConstant{(1, 1) of 0})
   4.3%    92.4%       0.184s       1.84e-04s   1000    28   Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}.0)
   4.2%    96.7%       0.184s       1.84e-04s   1000    35   Sum{acc_dtype=float64}(Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}.0)
   1.5%    98.1%       0.064s       6.44e-05s   1000    22   Elemwise{Add}[(0, 1)](InplaceDimShuffle{x,x}.0, Elemwise{mul,no_inplace}.0)
   0.4%    98.5%       0.016s       1.56e-05s   1000    43   IncSubtensor{InplaceInc;int64::}(Elemwise{Composite{(i0 * ((i1 * i2) + (i3 * i4)))}}[(0, 2)].0, Elemwise{Composite{Switch(i0, (i1 * i2 * (i3 - i4)), i5)}}.0, Constant{1})
   0.2%    98.7%       0.007s       7.24e-06s   1000    48   Join(TensorConstant{0}, Rebroadcast{1}.0, Rebroadcast{1}.0, (d__logp/ds))
   0.1%    98.8%       0.004s       3.94e-06s   1000    15   Elemwise{Composite{Cast{int8}((GT(i0, i1) * i2 * GT(inv(sqrt(i0)), i1)))}}(InplaceDimShuffle{x,0}.0, TensorConstant{(1, 1) of 0}, Elemwise{gt,no_inplace}.0)
   0.1%    98.9%       0.004s       3.82e-06s   1000    39   Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 * i0), i1) + i3 + (i4 * i5 * psi((i4 * (i6 + i0))) * i0) + (i7 * i8 * i9) + (i10 * i4 * i11 * psi((i4 * i0)) * i0) + (i4 * i12 * i0) + i13)}}[(0, 0)](nu, TensorConstant{0}, TensorConstant{-0.1}, TensorConstant{1.0}, TensorConstant{0.5}, Sum{acc_dtype=float64}.0, TensorConstant{1.0}, TensorConstant{3.141592653589793}, TensorConstant{-0.15915494309189535}, Sum{acc_dtype=float64}.0, TensorConstant{
   0.1%    98.9%       0.003s       3.45e-06s   1000     5   Elemwise{Composite{exp((i0 * i1))}}(TensorConstant{(1,) of -2.0}, s)
   0.1%    99.0%       0.003s       3.02e-06s   1000    46   IncSubtensor{InplaceInc;:int64:}(IncSubtensor{InplaceInc;int64::}.0, Elemwise{Composite{Switch(i0, (i1 * (i2 - i3)), i4)}}.0, Constant{-1})
   0.1%    99.1%       0.003s       2.94e-06s   1000    24   Elemwise{Composite{Switch(i0, (i1 * i2 * (i3 - i4)), i5)}}(Elemwise{Composite{Cast{int8}(GT(i0, i1))}}.0, TensorConstant{(1,) of -1.0}, InplaceDimShuffle{x}.0, Subtensor{int64::}.0, Subtensor{:int64:}.0, TensorConstant{(1,) of 0})
   0.1%    99.1%       0.003s       2.85e-06s   1000    41   Elemwise{Composite{(Switch(Cast{int8}(GE(i0, i1)), (i2 * i0), i1) + i3 + (i4 * (((i5 * i6 * Composite{inv(Composite{(sqr(i0) * i0)}(i0))}(i7)) / i8) - (i9 * Composite{inv(Composite{(sqr(i0) * i0)}(i0))}(i7))) * (i10 ** i11) * inv(Composite{(sqr(i0) * i0)}(i0)) * i0))}}[(0, 0)](sigma, TensorConstant{0}, TensorConstant{-50.0}, TensorConstant{1.0}, TensorConstant{-2.0}, TensorConstant{0.5}, Sum{acc_dtype=float64}.0, Elemwise{Composite{inv(sqrt(i0))}}.0
   0.1%    99.2%       0.003s       2.63e-06s   1000    40   Elemwise{Composite{(i0 * ((i1 * i2) + (i3 * i4)))}}[(0, 2)](TensorConstant{(1,) of -2.0}, TensorConstant{(1,) of 0.5}, Sum{axis=[0], acc_dtype=float64}.0, Sum{axis=[0], acc_dtype=float64}.0, Elemwise{Composite{exp((i0 * i1))}}.0)
   0.1%    99.3%       0.002s       2.49e-06s   1000    18   InplaceDimShuffle{x}(Elemwise{Composite{inv(sqr(i0))}}.0)
   ... (remaining 29 Apply instances account for 0.75%(0.03s) of the runtime)

Here are tips to potentially make your code run faster
                 (if you think of new ones, suggest them on the mailing list).
                 Test them first, as they are not guaranteed to always provide a speedup.
  - Try the Theano flag floatX=float32
We don't know if amdlibm will accelerate this scalar op. psi
We don't know if amdlibm will accelerate this scalar op. psi
  - Try installing amdlibm and set the Theano flag lib.amdlibm=True. This speeds up only some Elemwise operation.
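
When the summaries get long, it can be easier to pull the "Class" or "Ops" tables into Python and sort or filter them there. The following is a minimal sketch using only the standard library. It assumes the summary text has already been captured as a string; the `ROW` pattern and the `parse_rows` helper are hypothetical names written for this example, not part of PyMC3 or Theano, and the sample rows are copied from the gradient profile above.

```python
import re

# One row of the "Class" or "Ops" table:
# <% time> <sum %> <apply time> <time per call> <type> <#call> <#apply> <name>
ROW = re.compile(
    r"^\s*(?P<pct>[\d.]+)%\s+(?P<cum>[\d.]+)%\s+(?P<apply>[\d.]+)s\s+"
    r"(?P<per_call>[\d.eE+-]+)s\s+(?P<type>\w+)\s+(?P<ncall>\d+)\s+"
    r"(?P<napply>\d+)\s+(?P<name>.+)$"
)


def parse_rows(text):
    """Return (name, percent of time, apply time in seconds) for each table row."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            rows.append(
                (match["name"].strip(), float(match["pct"]), float(match["apply"]))
            )
    return rows


# Three rows copied from the "Ops" table of the gradient profile.
sample = """\
  40.8%    40.8%       1.767s       1.77e-03s     C     1000        1   Elemwise{Composite{Switch(i0, (-log1p((i1 / i2))), i3)}}
  15.7%    56.5%       0.678s       3.39e-04s     C     2000        2   Sum{axis=[0], acc_dtype=float64}
  10.0%    66.5%       0.431s       4.31e-04s     C     1000        1   Elemwise{Composite{Switch(i0, ((i1 * i2 * i3 * i4) / i5), i6)}}
"""

rows = parse_rows(sample)

# Keep only the ops that take at least 10% of the runtime.
heavy = [name for name, pct, _ in rows if pct >= 10.0]
for name in heavy:
    print(name)
```

The pattern deliberately targets the eight-column "Class" and "Ops" tables; rows of the "Apply" table have a different column layout and are skipped.
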