fdtd-z API documentation
- fdtdz_jax.fdtdz(epsilon, dt, source_field, source_waveform, source_position, absorption_mask, pml_kappa, pml_sigma, pml_alpha, pml_widths, output_steps, use_reduced_precision, launch_params)
Execute an FDTD simulation.
fdtd-z is an implementation of the finite-difference time-domain (FDTD) method that is efficiently mapped to the GPU via a systolic update scheme [1]. This function exposes, as a JAX primitive, a relatively low-level API to the underlying CUDA kernel.
fdtd-z targets nanophotonic applications with a heavy prioritization on simulation throughput, as insufficient throughput is currently the bottleneck in many workflows (such as nanophotonic inverse design). As such, the flexibility and features of the engine have been kept to a bare minimum in the name of performance:
- Reduced-precision mode is available and recommended. It uses 16-bit floating point internally (input and output arrays are always single-precision) and allows for a ~2 times larger z-extent. Requires (Nvidia) GPUs of compute capability >= 6.0.
- The z-extent (zz) of the simulation domain is fixed to a specific value to allow for a natural mapping to the 32 threads/warp architecture of Nvidia GPUs. The required value of zz is either 128 - p or 64 - p for reduced precision or full precision respectively, where p is the total number of PML layers for the simulation.
- PML boundary conditions [2] are limited to the z-direction; adiabatic absorbing boundaries [3] must be used in the x- and y-directions.
- Absorption (apart from absorbing boundary conditions) and dispersion are not implemented. The simulated structure must consist of a real-valued permittivity that depends only on the E-field component and spatial location.
- The permeability is fixed at a value of 1 everywhere in the simulation domain.
- Only current sources as hyperplanes along the y- and z-axes are implemented.
- fdtd-z uses a “dimensionless” system [4] where the permittivity and permeability of vacuum (and therefore the speed of light), as well as the size of the Yee cell, are all set to a (dimensionless) value of 1 (although the size of the cell along the z-axis can be varied).
- Parameters:
epsilon – (3, xx, yy, zz)-shaped array of floats representing the permittivity values at the Ex, Ey, and Ez nodes of the Yee cell respectively. We use the convention that these components are located at (0.5, 0, 0), (0, 0.5, 0), and (0, 0, 0.5) respectively, for a Yee cell of side-length 1.
dt – Scalar float representing the amount of time elapsed in one update step.
source_field – Array of shape (2, 1, yy, zz), (2, xx, 1, zz), or (2, 2, xx, yy, 1) for a source at x = source_position, y = source_position, or z = source_position respectively. The (2, 1, yy, zz) source features Ey and Ez components in that order, while the (2, xx, 1, zz) source features Ex and Ez components in that order. The (2, 2, xx, yy, 1) shape allows for the specification of two separate source fields at [0, :, :, :, :] and [1, :, :, :, :], which each contain Ex and Ey components in that order.
source_waveform – (tt, 2)-shaped array of floats denoting the temporal variation to apply to each of the source fields, where tt is the total number of update steps needed (note that the first update occurs at step 0). Specifically, the subarray at (tt, i) applies a temporal variation to the (i, 2, xx, yy, 1) subarray of a source at z = source_position, while for a source at x = source_position or y = source_position the temporal variation is applied to the source field at source_position - i.
source_position – Integer representing the position of the source along either the y- or z-axes. For the case of a source at y = source_position the source field is applied (with the corresponding waveform) at both y = source_position and y = source_position - 1, with the additional performance constraint that source_position must be even.
absorption_mask – (3, xx, yy)-shaped array of floats representing a z-invariant conductivity intended to allow for adiabatic absorbing boundary conditions along the x- and y-axes according to conductivity(x, y, z) = absorption_mask(x, y) * epsilon(x, y, z) for the Ex, Ey, and Ez components respectively.
pml_kappa – (zz, 2)-shaped array of floats denoting the distance along the z-axis between adjacent Yee cells. This is primarily intended to be used as a stretching parameter for the PML, but equivalently also determines the unit cell length along the z-axis throughout the simulation domain. Specifically, (zz, 0) represents the distance between successive layers of Ex, Ey, and Hz nodes, while (zz, 1) represents the distance between layers of Hx, Hy, and Ez nodes.
pml_sigma – (zz, 2)-shaped array of floats for the conductivity of the PML region, where (zz, 0) and (zz, 1) are the values at the (Ex, Ey, Hz) and (Hx, Hy, Ez) layers respectively. Must be set to 0 outside of the PML regions.
pml_alpha – (zz, 2)-shaped array of floats similar to pml_sigma. Must also be set to 0 outside of the PML regions.
pml_widths – (bot, top) integers specifying the number of cells which are to be designated as PML layers at the bottom and top of the simulation respectively. For performance reasons, the total number of PML layers used in the simulation (bot + top) is required to be a multiple of 4.
output_steps – (start, stop, interval) tuple of integers denoting the update step at which to start recording output fields, the step at which to stop recording (not included), and the number of update steps separating successive output fields.
use_reduced_precision – If True, uses 16-bit (IEEE 754) precision for the simulation, which allows for a maximum of 128 cells along the z-axis. Otherwise, uses 32-bit single-precision with a maximum of 64 cells along the z-axis. Both inputs and results are always expected as 32-bit arrays.
launch_params – Integers in the form ((blocku, blockv), (gridu, gridv), spacing, (cc_major, cc_minor)), specifying the structure of the systolic update to use on the GPU, where (blocku, blockv) determines the layout of warps in the u- and v-directions within a block and should be (2, 4) or (4, 2); (gridu, gridv) specifies the layout of blocks on the GPU and must be equal to or less than the number of streaming multiprocessors on the GPU because of the need for grid-wide synchronization; spacing controls the number of buffers used between each block and its downstream neighbor and should be tuned to balance between reducing grid synchronization overhead and staying within the limits of the L2 cache; and (cc_major, cc_minor) gives the major and minor compute capability of the device, which is used to determine which precompiled kernel to use. Currently allowed values are (3, 7), (6, 0), (7, 0), (7, 5), and (8, 0). It is recommended to use the latest compute capability kernel possible that does not exceed the compute capability of the device.
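
The shape constraints in the parameter list can be checked before any arrays are built. The following is a minimal sketch in plain Python, not part of fdtdz_jax: the helper names required_zz and expected_shapes are hypothetical, and the shapes shown assume a source at z = source_position.

```python
def required_zz(pml_widths, use_reduced_precision):
    """Return the z-extent implied by the PML widths and the precision mode.

    Hypothetical helper (not part of fdtdz_jax) that applies the documented
    rule zz = 128 - p (reduced precision) or zz = 64 - p (full precision),
    where p = bot + top must be a multiple of 4.
    """
    bot, top = pml_widths
    p = bot + top  # total number of PML layers
    if p % 4 != 0:
        raise ValueError("bot + top must be a multiple of 4")
    return (128 if use_reduced_precision else 64) - p


def expected_shapes(xx, yy, tt, pml_widths, use_reduced_precision):
    """Collect the documented array shapes for a source at z = source_position."""
    zz = required_zz(pml_widths, use_reduced_precision)
    return {
        "epsilon": (3, xx, yy, zz),
        "source_field": (2, 2, xx, yy, 1),
        "source_waveform": (tt, 2),
        "absorption_mask": (3, xx, yy),
        "pml_kappa": (zz, 2),
        "pml_sigma": (zz, 2),
        "pml_alpha": (zz, 2),
    }


shapes = expected_shapes(
    xx=200, yy=200, tt=1000, pml_widths=(8, 8), use_reduced_precision=True
)
print(shapes["epsilon"])  # (3, 200, 200, 112)
```

Computing zz from pml_widths rather than hard-coding it keeps the permittivity and PML arrays consistent when the number of PML layers changes.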
- Returns:
(n, 3, xx, yy, zz) array of floats representing n output fields, where each output field consists of the values of the Ex, Ey, and Ez nodes (in that order) over the simulation domain, at a specific update step.
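
As a rough illustration of how n relates to output_steps, the sketch below assumes the recorded steps follow the semantics of Python's range(start, stop, interval), which matches the description of stop as "not included" but is not confirmed by this page. The helper name output_shape is hypothetical and not part of fdtdz_jax.

```python
def output_shape(output_steps, xx, yy, zz):
    """Estimate the shape of the returned array.

    Assumption: output fields are recorded at the steps given by
    range(start, stop, interval), so stop itself is excluded.
    """
    start, stop, interval = output_steps
    n = len(range(start, stop, interval))  # number of recorded output fields
    return (n, 3, xx, yy, zz)


# Record every 10th step from step 900 up to (but not including) step 1000.
print(output_shape((900, 1000, 10), xx=200, yy=200, zz=112))
# (10, 3, 200, 200, 112)
```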