This notebook illustrates amassing a medium-sized dataset with the record_history decorator (essentially 1 second of 32- or 64-bit float mono audio: a simple sine wave sampled 44,100 times), and then using the stats.history_as_DataFrame attribute to obtain that dataset as a Pandas DataFrame.
After that basic (and naive) example, we compare the performance of this approach to one that uses numpy's ability to vectorize the same computation, and conclude that if you can vectorize, certainly you should do so. (The example is naive precisely because no one would call the function f below in a for loop when it's possible to use numpy universal functions (ufuncs). When that alternative is unavailable, however, record_history can come in handy.)
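To make the loop-versus-ufunc distinction concrete before any timing is done, here is a minimal sketch showing that both approaches compute the same values. The undecorated function mirrors the formula used in this notebook; the shortened 100-sample time axis and the names t_axis, looped and vectorized are assumptions chosen only to keep the example small.

```python
import numpy as np

def g(freq, t):
    # Same formula as the notebook's sine function, with no decorator.
    return np.sin(freq * 2 * np.pi * t)

# A short, hypothetical time axis (100 samples) keeps the sketch quick.
t_axis = np.linspace(0.0, 1.0, 100, endpoint=False)

# Loop version: one Python-level call per sample.
looped = np.array([g(17, t) for t in t_axis])

# Vectorized version: a single call; np.sin is applied elementwise.
vectorized = g(17, t_axis)

# Both routes should agree to floating-point tolerance.
same = np.allclose(looped, vectorized)
print(same)
```

The only difference is how many times the Python interpreter has to enter the function: once per sample in the loop, once in total for the ufunc call.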
stats.history_as_DataFrame attribute
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from log_calls import record_history
@record_history()
def f(freq, t):
    return np.sin(freq * 2 * np.pi * t)
ran_t = np.arange(0.0, 1.0, 1/44100, dtype=np.float32)
ran_t
Now, naively, call f 44,100 times in a for loop, and obtain its call history as a Pandas DataFrame:
#f.stats.clear_history()
for t in ran_t:
    f(17, t)
df = f.stats.history_as_DataFrame
Examine and do stuff with it:
df.info()
from IPython.display import display
display(df.head())
display(df.tail())
len(f.stats.history)
df[['t', 'retval']].head()
plt.plot(df.t, df.retval);
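For readers who want a feel for what "recording a call history" involves, the following is a rough sketch of a hand-rolled recording decorator. It is an illustrative stand-in, not the implementation or API of log_calls: the names record_calls, f_sketch and the history attribute are hypothetical, and it uses only the standard library.

```python
import functools
import math

def record_calls(func):
    """Hypothetical stand-in for illustration; NOT how log_calls works."""
    history = []

    @functools.wraps(func)
    def wrapper(freq, t):
        retval = func(freq, t)
        # Store one record per call: the arguments and the return value.
        history.append({"freq": freq, "t": t, "retval": retval})
        return retval

    wrapper.history = history  # expose the records on the wrapper
    return wrapper

@record_calls
def f_sketch(freq, t):
    return math.sin(freq * 2 * math.pi * t)

# Call the function 8 times, as the notebook does 44,100 times.
for i in range(8):
    f_sketch(17, i / 8)

num_records = len(f_sketch.history)
first = f_sketch.history[0]
print(num_records, first)
```

A list of dicts like this can be passed straight to pd.DataFrame to get one row per call, which is roughly the shape of table examined above.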
record_history vs vectorization with numpy ufuncs
def g(freq, t):
    return np.sin(freq * 2 * np.pi * t)
nodeco_vectorized_secs = %timeit -o Hz_17 = g(17, ran_t)
nodeco_vectorized_secs.best
Hz_17
plt.plot(Hz_17);
for loop:
nodeco_loop_secs = %timeit -o for t in ran_t: g(7, t)
nodeco_loop_secs.best
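The %timeit magic only works inside IPython or Jupyter. Outside a notebook, a comparable "best time per loop" figure can be approximated with the standard library's timeit module, as in the sketch below. The scalar function g_scalar, the 1,000-sample list, and the repeat and number values are all assumptions for illustration; the resulting figure is not guaranteed to match what %timeit -o reports.

```python
import math
import timeit

def g_scalar(freq, t):
    # Scalar stand-in for the notebook's function (hypothetical name).
    return math.sin(freq * 2 * math.pi * t)

# A smaller, hypothetical sample list keeps the sketch fast to run.
samples = [i / 1000 for i in range(1000)]

def loop_over_samples():
    for t in samples:
        g_scalar(7, t)

NUMBER = 10  # executions of loop_over_samples per timing run
REPEAT = 5   # independent timing runs

# timeit.repeat returns one total time (in seconds) per timing run.
timings = timeit.repeat(loop_over_samples, repeat=REPEAT, number=NUMBER)

# Take the fastest run and divide by the executions per run.
best_per_call = min(timings) / NUMBER
print(best_per_call)
```

Taking the minimum rather than the mean follows the timeit documentation's advice that the lowest figure is the least disturbed by other activity on the machine.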
def comparison(slower, faster):
    'slower, faster: seconds'
    ratio = slower/faster
    order_of_magnitude = np.log10(ratio)
    return ratio, order_of_magnitude
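As a quick sanity check of what comparison returns, here is a worked example with made-up timings. The values 0.5 s and 0.0005 s are hypothetical, not measurements from this notebook, and math.log10 is used in place of np.log10 only to keep the sketch dependency-free (for a scalar the two give the same value).

```python
import math

def comparison(slower, faster):
    'slower, faster: seconds'
    ratio = slower / faster
    order_of_magnitude = math.log10(ratio)  # stands in for np.log10 here
    return ratio, order_of_magnitude

# Hypothetical timings in seconds, NOT results from this notebook.
loop_best = 0.5
vectorized_best = 0.0005

ratio, order_of_magnitude = comparison(slower=loop_best, faster=vectorized_best)

# %.0f rounds the ratio; %d would truncate it instead.
message = ("Vectorized approach is %.0f times (about %.1f orders of magnitude) faster"
           % (ratio, order_of_magnitude))
print(message)
```

A ratio of about 1000 corresponds to about 3 orders of magnitude, since the second value is simply the base-10 logarithm of the first.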
print("Without the record_history decorator:\n"
      "Vectorized approach is %d times (about %.1f orders of magnitude) faster"
      % comparison(slower=nodeco_loop_secs.best, faster=nodeco_vectorized_secs.best))
Now let's compare the performance of the record_history-decorated version (f) of the same function, first with recording disabled and then with it enabled.
record_history disabled
f.stats.clear_history()
f.record_history_settings.enabled = False
vectorized_secs_rh_disabled = %timeit -o Hz_17 = f(17, ran_t)
vectorized_secs_rh_disabled.best
for loop:
loop_secs_rh_disabled = %timeit -o for t in ran_t: f(7, t)
loop_secs_rh_disabled.best
print("With record_history disabled:\n"
"Vectorized approach is %d times (about %.1f orders of magnitude) faster"
% comparison(slower=loop_secs_rh_disabled.best, faster=vectorized_secs_rh_disabled.best))
print("Called in a for-loop, the no-decorator version is %d times (about %.1f orders of magnitude) faster"
% comparison(slower=loop_secs_rh_disabled.best, faster=nodeco_loop_secs.best))
record_history enabled
f.record_history_settings.enabled = True
f.stats.clear_history()
vectorized_secs_rh_enabled = %timeit -o f.stats.clear_history(); Hz_17 = f(17, ran_t)
vectorized_secs_rh_enabled.best
len(f.stats.history)
def size_of_t_for_row(row):
    return f.stats.history[row].argvals[1].size
size_of_t_for_row(0)
f.stats.history[0].retval.size
f.stats.history
f.stats.clear_history()
for loop:
loop_secs_rh_enabled = %timeit -o for t in ran_t: f(7, t); f.stats.clear_history()
loop_secs_rh_enabled.best
print("With record_history enabled:\n"
"Vectorized approach is %d times (about %.1f orders of magnitude) faster"
% comparison(slower=loop_secs_rh_enabled.best, faster=vectorized_secs_rh_enabled.best))