kedro.runner.ParallelRunner¶
-
class
kedro.runner.ParallelRunner(max_workers=None, is_async=False)[source]¶ Bases:
kedro.runner.runner.AbstractRunnerParallelRunneris anAbstractRunnerimplementation. It can be used to run thePipelinein parallel groups formed by toposort.Methods
ParallelRunner.__init__([max_workers, is_async])Instantiates the runner by creating a Manager. ParallelRunner.create_default_data_set(ds_name)Factory method for creating the default data set for the runner. ParallelRunner.run(pipeline, catalog[, run_id])Run the Pipelineusing theDataSet``s provided by ``catalogand save results back to the same objects.ParallelRunner.run_only_missing(pipeline, …)Run only the missing outputs from the Pipelineusing theDataSet``s provided by ``catalogand save results back to the same objects.-
__init__(max_workers=None, is_async=False)[source]¶ Instantiates the runner by creating a Manager.
Parameters: - max_workers (
Optional[int]) – Number of worker processes to spawn. If not set, calculated automatically based on the pipeline configuration and CPU core count. On windows machines, the max_workers value cannot be larger than 61 and will be set to min(61, max_workers). - is_async (
bool) – If True, the node inputs and outputs are loaded and saved asynchronously with threads. Defaults to False.
Raises: ValueError– bad parameters passed- max_workers (
-
create_default_data_set(ds_name)[source]¶ Factory method for creating the default data set for the runner.
Parameters: ds_name ( str) – Name of the missing data setReturn type: _SharedMemoryDataSetReturns: An instance of an implementation of _SharedMemoryDataSet to be used for all unregistered data sets.
-
run(pipeline, catalog, run_id=None)¶ Run the
Pipelineusing theDataSet``s provided by ``catalogand save results back to the same objects.Parameters: - pipeline (
Pipeline) – ThePipelineto run. - catalog (
DataCatalog) – TheDataCatalogfrom which to fetch data. - run_id (
Optional[str]) – The id of the run.
Raises: ValueError– Raised whenPipelineinputs cannot be satisfied.Return type: Dict[str,Any]Returns: Any node outputs that cannot be processed by the
DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.- pipeline (
-
run_only_missing(pipeline, catalog)¶ Run only the missing outputs from the
Pipelineusing theDataSet``s provided by ``catalogand save results back to the same objects.Parameters: - pipeline (
Pipeline) – ThePipelineto run. - catalog (
DataCatalog) – TheDataCatalogfrom which to fetch data.
Raises: ValueError– Raised whenPipelineinputs cannot be satisfied.Return type: Dict[str,Any]Returns: Any node outputs that cannot be processed by the
DataCatalog. These are returned in a dictionary, where the keys are defined by the node outputs.- pipeline (
-