2.0.11

Databricks

Package: flyteplugins.databricks

Configuration for a Databricks task.

Tasks configured with this class execute natively on Databricks as distributed PySpark jobs. It extends Spark with Databricks-specific cluster and authentication settings.

Parameters

class Databricks(
    spark_conf: typing.Optional[typing.Dict[str, str]],
    hadoop_conf: typing.Optional[typing.Dict[str, str]],
    executor_path: typing.Optional[str],
    applications_path: typing.Optional[str],
    driver_pod: typing.Optional[flyte._pod.PodTemplate],
    executor_pod: typing.Optional[flyte._pod.PodTemplate],
    databricks_conf: typing.Optional[typing.Dict[str, typing.Union[str, dict]]],
    databricks_instance: typing.Optional[str],
    databricks_token: typing.Optional[str],
)

| Parameter | Type | Description |
| --- | --- | --- |
| spark_conf | `typing.Optional[typing.Dict[str, str]]` | Spark configuration key-value pairs, e.g. `{"spark.executor.memory": "4g"}`. |
| hadoop_conf | `typing.Optional[typing.Dict[str, str]]` | Hadoop configuration key-value pairs. |
| executor_path | `typing.Optional[str]` | Path to the Python binary used for PySpark execution. Defaults to the interpreter path from the serialization context. |
| applications_path | `typing.Optional[str]` | Path to the main application file. Defaults to the task entrypoint path. |
| driver_pod | `typing.Optional[flyte._pod.PodTemplate]` | Pod template applied to the Spark driver pod. |
| executor_pod | `typing.Optional[flyte._pod.PodTemplate]` | Pod template applied to the Spark executor pods. |
| databricks_conf | `typing.Optional[typing.Dict[str, typing.Union[str, dict]]]` | Databricks job configuration dict compliant with the Databricks Jobs API v2.1 (also supports v2.0 use cases). Typically includes `new_cluster` or `existing_cluster_id`, `run_name`, and other job settings. |
| databricks_instance | `typing.Optional[str]` | Domain name of your Databricks deployment, e.g. `"myorg.cloud.databricks.com"`. |
| databricks_token | `typing.Optional[str]` | Name of the Flyte secret containing the Databricks API token used for authentication. |
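
The parameters above can be assembled as in the following sketch. It is illustrative only: the cluster sizing, runtime version, node type, and secret name are placeholder values, and `cluster_source` and `make_databricks_config` are hypothetical helpers, not part of the plugin. The check that exactly one of `new_cluster` or `existing_cluster_id` is present is an assumption based on how the Jobs API is typically used; the plugin itself may not enforce it.

```python
from typing import Dict, Union

# Job settings in the shape of the Databricks Jobs API.
# All values here are placeholders chosen for illustration.
databricks_conf: Dict[str, Union[str, dict]] = {
    "run_name": "flyte-databricks-example",
    "new_cluster": {
        "spark_version": "13.3.x-scala2.12",
        "node_type_id": "m5.xlarge",
        "num_workers": 2,
    },
}


def cluster_source(conf: Dict[str, Union[str, dict]]) -> str:
    """Return which cluster key the conf uses (hypothetical helper).

    Assumes exactly one of "new_cluster" or "existing_cluster_id" is set.
    """
    present = [k for k in ("new_cluster", "existing_cluster_id") if k in conf]
    if len(present) != 1:
        raise ValueError(
            "expected exactly one of new_cluster / existing_cluster_id, "
            f"found: {present}"
        )
    return present[0]


# Keyword arguments matching the documented constructor parameters.
# Every parameter is passed explicitly; unused ones are set to None.
databricks_kwargs = {
    "spark_conf": {"spark.executor.memory": "4g"},
    "hadoop_conf": None,
    "executor_path": None,      # None: use the documented default
    "applications_path": None,  # None: use the documented default
    "driver_pod": None,
    "executor_pod": None,
    "databricks_conf": databricks_conf,
    "databricks_instance": "myorg.cloud.databricks.com",
    # Name of a Flyte secret (placeholder), not the token value itself.
    "databricks_token": "databricks-api-token",
}


def make_databricks_config():
    """Build the config object; requires flyteplugins.databricks installed."""
    from flyteplugins.databricks import Databricks

    cluster_source(databricks_conf)  # fail early on an ambiguous cluster spec
    return Databricks(**databricks_kwargs)
```

Calling `make_databricks_config()` in an environment where the plugin is installed should yield an object that can be used as the task's Databricks configuration. Note that `databricks_token` carries the secret's name, so the token value never appears in code.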