Yupiik Chords intends to simplify Kubernetes-native workflow execution. Compared to Apache Airflow or other alternatives, it is mainly self-contained: it doesn't need a database and reuses common Kubernetes infrastructure, including the monitoring stack, instead of enforcing yet another one.

Design

Yupiik Chords is very flexible, but its default design relies on storing metadata in the labels and annotations of Job descriptors. This can potentially be extended to other resources, such as SparkApplication.

The prefix can be configured (see the CLI documentation); by default it is yupiik.chords/.

The overall execution generally starts with an external trigger, i.e. Chords rarely takes ownership of the initial nodes. Concretely, 99% of the time it is a CronJob with the proper labels/annotations. The rest of the time it is a Job triggered by an external tool (another Job or an external orchestrator), again with the right labels/annotations.

The execution graph (normally a directed acyclic graph, or DAG) is provided through the io.yupiik.kubernetes.chords.api.spi.DagContributor API:

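import io.yupiik.fusion.framework.api.scope.DefaultScoped;
import io.yupiik.kubernetes.chords.api.model.Dag;
import io.yupiik.kubernetes.chords.api.spi.DagContributor;

import java.util.List;

// Node and Edge are the Chords model types describing the graph (their imports are omitted here).
@DefaultScoped // Yupiik Fusion bean scope, so the contributor is registered as a bean
public class MyDagsProvider implements DagContributor {
    @Override
    public List<Dag> get() {
        return List.of(new Dag(
                "my-dag",                    // DAG identifier, matched against the dag-id label
                List.of(new Node(...), ...), // nodes of the graph
                List.of(new Edge(...), ...)  // edges linking upstream nodes to downstream nodes
        ));
    }
}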
NOTE
There is a pending evaluation to carry the DAG automatically in annotations but, for now, it must be provided by adding the definition explicitly.
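Since the execution graph is expected to be acyclic, it can be worth validating a definition before exposing it. The following minimal sketch uses Kahn's algorithm over a hypothetical representation: node identifiers are strings and edges are from/to pairs. `DagCheck`, `SimpleEdge` and `topologicalOrder` are illustrative names, not part of the Chords API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DagCheck {
    // Hypothetical edge type for illustration only; the Chords model has its own Edge class.
    record SimpleEdge(String from, String to) {}

    // Returns the nodes so that every upstream node precedes its downstream nodes,
    // or throws if the graph contains a cycle (Kahn's algorithm).
    static List<String> topologicalOrder(List<String> nodes, List<SimpleEdge> edges) {
        final Map<String, Integer> inDegree = new HashMap<>();
        final Map<String, List<String>> downstream = new HashMap<>();
        for (final String node : nodes) {
            inDegree.put(node, 0);
            downstream.put(node, new ArrayList<>());
        }
        for (final SimpleEdge edge : edges) {
            if (!inDegree.containsKey(edge.from()) || !inDegree.containsKey(edge.to())) {
                throw new IllegalArgumentException("Edge references an unknown node: " + edge);
            }
            downstream.get(edge.from()).add(edge.to());
            inDegree.merge(edge.to(), 1, Integer::sum);
        }
        // Nodes without incoming edges are the roots, i.e. the ones triggered externally (CronJob...).
        final Deque<String> ready = new ArrayDeque<>();
        for (final String node : nodes) {
            if (inDegree.get(node) == 0) {
                ready.add(node);
            }
        }
        final List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            final String current = ready.poll();
            order.add(current);
            for (final String next : downstream.get(current)) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        // Nodes left with a positive in-degree are part of a cycle.
        if (order.size() != nodes.size()) {
            throw new IllegalStateException("The graph contains a cycle, it is not a DAG");
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(topologicalOrder(
                List.of("extract", "transform", "load"),
                List.of(new SimpleEdge("extract", "transform"), new SimpleEdge("transform", "load"))));
        // [extract, transform, load]
    }
}
```

The same ordering also shows which nodes are roots, and therefore which ones need a CronJob or an external trigger.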

Alternatively, a DAG can be defined with environment variables or system properties through the configuration mechanism.

Annotations and labels

As mentioned in the previous section, labels and annotations are key for Yupiik Chords to follow the DAG and trigger the execution of downstream Jobs.

Here is the list of the ones used. Note that we omit the prefix since it is configurable (see the CLI documentation): with the default prefix, the name foo actually means yupiik.chords/foo.

Name            Type        Description
managed         Label       A boolean (true or false) specifying whether the reconciler must handle the related Job/descriptor.
reconcilier-id  Label       Optional if you run a single reconciler (reconcile command) without any identifier; otherwise it defines the subset of DAGs the command manages.
dag-id          Label       The identifier (name) of the DAG the Job relates to.
node-id         Label       Identifier of the node in the DAG.
execution-id    Label       Identifier of the execution.
status          Annotation  Whether the Job has already been handled and must be ignored by future iterations.
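To make the prefix mechanics of the table concrete, here is a small sketch. It resolves full keys from a configurable prefix and decides whether a descriptor should be handled based on the managed label. `ChordsLabels` and its methods are hypothetical helpers working on a plain labels map, not the Chords reconciler implementation. Treating only a value parsing as true as "managed" is an assumption.

```java
import java.util.Map;

final class ChordsLabels {
    // Default prefix; another one can be configured (see the CLI documentation).
    static final String DEFAULT_PREFIX = "yupiik.chords/";

    // Builds the full key from a short name, e.g. "dag-id" -> "yupiik.chords/dag-id".
    static String key(String prefix, String name) {
        return prefix + name;
    }

    // Reads a label by its short name, returns null when it is absent.
    static String value(Map<String, String> labels, String prefix, String name) {
        return labels.get(key(prefix, name));
    }

    // Assumption: a descriptor is handled only when its "managed" label parses as true.
    // Boolean.parseBoolean(null) is false, so a missing label means "not managed".
    static boolean isManaged(Map<String, String> labels, String prefix) {
        return Boolean.parseBoolean(value(labels, prefix, "managed"));
    }

    public static void main(String[] args) {
        final Map<String, String> labels = Map.of(
                "yupiik.chords/managed", "true",
                "yupiik.chords/dag-id", "1234");
        System.out.println(isManaged(labels, DEFAULT_PREFIX)); // true
        System.out.println(value(labels, DEFAULT_PREFIX, "dag-id")); // 1234
    }
}
```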

Generally a root CronJob will look like:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-root
  labels:
    yupiik.chords/managed: "true"
    yupiik.chords/dag-id: "1234"
    yupiik.chords/node-id: "56789"
    yupiik.chords/execution-id: "abcd-1234-tyui-szsz"
spec:
  ...
TIP

execution-id is often a UUID or similar, generated when the root nodes are submitted. It is then propagated by the Yupiik Chords runtime.
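As an illustration of this propagation, the sketch below generates an execution-id with java.util.UUID when a root node is submitted. It then derives the labels of a downstream node by keeping dag-id and execution-id and only swapping node-id. This is an assumption of how such propagation can look, not the Chords runtime code, which does it for you. `ExecutionLabels`, `rootLabels` and `downstreamLabels` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

final class ExecutionLabels {
    static final String PREFIX = "yupiik.chords/"; // default prefix, configurable in Chords

    // Labels of a root node: a fresh execution-id is generated for each submission.
    static Map<String, String> rootLabels(String dagId, String nodeId) {
        final Map<String, String> labels = new HashMap<>();
        labels.put(PREFIX + "managed", "true");
        labels.put(PREFIX + "dag-id", dagId);
        labels.put(PREFIX + "node-id", nodeId);
        // A random UUID is 36 characters, which fits the 63-character label value limit.
        labels.put(PREFIX + "execution-id", UUID.randomUUID().toString());
        return labels;
    }

    // Labels of a downstream node: same DAG and same execution, only the node changes.
    static Map<String, String> downstreamLabels(Map<String, String> parentLabels, String nodeId) {
        final Map<String, String> labels = new HashMap<>(parentLabels);
        labels.put(PREFIX + "node-id", nodeId);
        return labels;
    }

    public static void main(String[] args) {
        final Map<String, String> root = rootLabels("my-dag", "extract");
        final Map<String, String> next = downstreamLabels(root, "transform");
        System.out.println(root.get(PREFIX + "execution-id").equals(next.get(PREFIX + "execution-id"))); // true
    }
}
```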

IMPORTANT
If one of these three labels is missing, the execution can fail. This is intentional, to avoid misconfigured descriptors and silently ignored errors.
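This fail-fast behavior can be sketched as a validation step run before a descriptor is processed. The sketch assumes the three labels are dag-id, node-id and execution-id, as in the CronJob above. It uses hypothetical names (`RequiredLabels`, `requireLabels`) and is not the Chords implementation.

```java
import java.util.List;
import java.util.Map;

final class RequiredLabels {
    // Assumption: the three mandatory labels, without prefix.
    static final List<String> REQUIRED = List.of("dag-id", "node-id", "execution-id");

    // Throws instead of silently skipping a misconfigured descriptor.
    static void requireLabels(String descriptorName, Map<String, String> labels, String prefix) {
        final List<String> missing = REQUIRED.stream()
                .map(name -> prefix + name)
                .filter(key -> {
                    final String value = labels.get(key);
                    return value == null || value.isBlank();
                })
                .toList();
        if (!missing.isEmpty()) {
            throw new IllegalStateException(descriptorName + " is missing required labels: " + missing);
        }
    }

    public static void main(String[] args) {
        requireLabels("my-root", Map.of(
                "yupiik.chords/dag-id", "1234",
                "yupiik.chords/node-id", "56789",
                "yupiik.chords/execution-id", "abcd-1234-tyui-szsz"), "yupiik.chords/");
        System.out.println("my-root is valid");
    }
}
```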

Deployment

TBD: CronJob, no parallelism...

Only jobs

As of today, Yupiik Chords can only chain Jobs. If you want to chain something else, ensure it triggers the creation of a Job (even one that just runs echo bye) with the right labels, following the underlying resource lifecycle. We do this for SparkApplication, for example, using the driver lifecycle.

The code is open source, and contributions adding support for more resources are very welcome if you feel this workaround is not sufficient, for example because your resource is too easily evicted with a constrained (often memory-constrained) configuration.
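For instance, a resource that is not a Job can create a minimal "bridge" Job when it completes, so Chords sees a labeled Job for that node. The sketch below renders such a manifest as YAML from Java. The `BridgeJob` class, the busybox image and the exact spec fields are assumptions for illustration. Only the label names come from this page, with the default prefix.

```java
final class BridgeJob {
    // Renders a minimal Job manifest carrying the Chords labels (default prefix assumed).
    static String manifest(String name, String dagId, String nodeId, String executionId) {
        return """
                apiVersion: batch/v1
                kind: Job
                metadata:
                  name: %s
                  labels:
                    yupiik.chords/managed: "true"
                    yupiik.chords/dag-id: "%s"
                    yupiik.chords/node-id: "%s"
                    yupiik.chords/execution-id: "%s"
                spec:
                  backoffLimit: 0
                  template:
                    spec:
                      restartPolicy: Never
                      containers:
                        - name: bridge
                          image: busybox
                          command: ["echo", "bye"]
                """.formatted(name, dagId, nodeId, executionId);
    }

    public static void main(String[] args) {
        System.out.print(manifest("spark-done-20260215225626", "my-dag", "spark-step", "abcd-1234-tyui-szsz"));
    }
}
```

In practice the manifest would be submitted with a Kubernetes client once the driver (or equivalent) reaches its final state.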

Tips

  • Version your DAG so its identifier is awesome-etl-1 instead of awesome-etl. This matters because the DAG is provided by code and not stored within the descriptors,

  • Use ${now:yyyyMMddHHmmss} in the Job name of your templates (if you rely on the configuration definition; otherwise do it in your code) to make names unique, historized (in etcd/the Kubernetes API) and sortable,

  • Manage reference configuration (ConfigMap, Secret) outside the DAG and reference it from the Jobs of the DAG,
    • At a high level, the idea is to keep the DAG for dynamic triggering while configuration relies on something more static,

    • This doesn't prevent a node (Job) from using the Kubernetes API to update a ConfigMap for another Job, so the execution remains dynamic overall. It is even good practice to have one state descriptor per DAG (or several if executions run concurrently), generally a Secret, although a ConfigMap is sufficient in many cases,

    • It also means that a node generally has a single descriptor: the Job,

  • Triggering root nodes without relying on the CronJob schedule can be done with kubectl or a similar tool, for example: kubectl create job --from=cronjob/my-cron my-cron-manual-20260215225626. It is recommended to still register the root nodes as CronJobs with suspend: true set in their spec, and to add the date to the Job name.
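When Job names are built in code rather than with ${now:yyyyMMddHHmmss}, the same pattern can be produced with java.time. The minimal sketch below (`JobNames` is a hypothetical helper) also trims the base name so the result stays within the 63-character limit Kubernetes applies to Job names.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class JobNames {
    private static final DateTimeFormatter TIMESTAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
    private static final int MAX_JOB_NAME_LENGTH = 63;

    // Appends a sortable timestamp so every run gets a unique, historized Job name.
    static String uniqueName(String base, LocalDateTime now) {
        final String suffix = "-" + TIMESTAMP.format(now);
        // Trim the base (never the timestamp) when the full name would exceed the limit.
        final String trimmedBase = base.length() + suffix.length() > MAX_JOB_NAME_LENGTH
                ? base.substring(0, MAX_JOB_NAME_LENGTH - suffix.length())
                : base;
        return trimmedBase + suffix;
    }

    public static void main(String[] args) {
        System.out.println(uniqueName("my-cron-manual", LocalDateTime.of(2026, 2, 15, 22, 56, 26)));
        // my-cron-manual-20260215225626
    }
}
```

Because the timestamp is zero-padded and ordered from year to second, sorting the names lexicographically also sorts the runs chronologically.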

Documentation

Yupiik Chords integrates with Yupiik Tools to generate the DAG diagrams with Mermaid at build time.
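To give an idea of what such a diagram contains, here is a sketch that turns a list of edges into a Mermaid flowchart definition. It is not the DAGGenerator output, which may differ. `MermaidSketch`, `Link` and `toMermaid` are hypothetical names, and node identifiers are assumed to be plain alphanumeric ids.

```java
import java.util.List;
import java.util.stream.Collectors;

final class MermaidSketch {
    // Hypothetical edge representation for this sketch.
    record Link(String from, String to) {}

    // Emits a top-down Mermaid flowchart where each edge becomes an arrow "from --> to".
    static String toMermaid(List<Link> links) {
        return links.stream()
                .map(link -> "  " + link.from() + " --> " + link.to())
                .collect(Collectors.joining("\n", "graph TD\n", "\n"));
    }

    public static void main(String[] args) {
        System.out.print(toMermaid(List.of(
                new Link("extract", "transform"),
                new Link("transform", "load"))));
    }
}
```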

Use the preTask io.yupiik.kubernetes.chords.documentation.task.DAGGenerator and configure the related specific settings:

chords-documentation.daggenerator.dagFormat (env: CHORDS_DOCUMENTATION_DAGGENERATOR_DAGFORMAT)

Output DAG format to use. Default: io.yupiik.kubernetes.chords.documentation.task.DAGGenerator.DAGGeneratorConfiguration.DagFormat.MERMAID.

chords-documentation.daggenerator.dagInputFiles (env: CHORDS_DOCUMENTATION_DAGGENERATOR_DAGINPUTFILES)

Input files used to fill the generator properties, generally to share the DAG configuration instead of repeating it. Default: java.util.List.of().

chords-documentation.daggenerator.diagramOutput (env: CHORDS_DOCUMENTATION_DAGGENERATOR_DIAGRAMOUTPUT)

Output base path where the DAG will be generated. The DAG name will be used as base for the file name.

chords-documentation.daggenerator.indexDiagramBasePath (env: CHORDS_DOCUMENTATION_DAGGENERATOR_INDEXDIAGRAMBASEPATH)

Base path used to reference the diagrams from the index file. Mainly used in .adoc if you configure the diagrams to be generated in /opt/rmannibucau/dev/Github/yupiik-chords/documentation/src/main/minisite/content/_partials or similar: it is used as the prefix of the file path in include::xxxx/$file[], where xxxx/ is this exact value. Default: /opt/rmannibucau/dev/Github/yupiik-chords/documentation/src/main/minisite/content/_partials/dags/.

chords-documentation.daggenerator.indexFormat (env: CHORDS_DOCUMENTATION_DAGGENERATOR_INDEXFORMAT)

Output format for the index file. Default: io.yupiik.kubernetes.chords.documentation.task.DAGGenerator.DAGGeneratorConfiguration.IndexFormat.SIMPLE_ADOC.

chords-documentation.daggenerator.indexOutput (env: CHORDS_DOCUMENTATION_DAGGENERATOR_INDEXOUTPUT)

Output base path where the index will be generated if not disabled.

as well as your DAGs, using the same syntax as for the main job configuration (core module).
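The environment variable names listed above follow a visible pattern: the property key is upper-cased, and dots and dashes become underscores. The sketch below (`EnvNames` is a hypothetical helper) reproduces that mapping. It is inferred from these examples only, so check the configuration documentation before relying on it for other keys.

```java
import java.util.Locale;

final class EnvNames {
    // Inferred from the examples on this page:
    // chords-documentation.daggenerator.dagFormat -> CHORDS_DOCUMENTATION_DAGGENERATOR_DAGFORMAT
    static String toEnvName(String propertyKey) {
        return propertyKey.replaceAll("[^A-Za-z0-9]", "_").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvName("chords-documentation.daggenerator.diagramOutput"));
        // CHORDS_DOCUMENTATION_DAGGENERATOR_DIAGRAMOUTPUT
    }
}
```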

TIP

io.yupiik.kubernetes.chords.documentation.task.DAGGeneratorTest.run is a sample of that configuration, although the properties can also use the long form or the environment variable form.

Another interesting Runnable in the documentation-tasks module is io.yupiik.kubernetes.chords.documentation.ChordsCommandGenerator. It wraps any command in a Runnable, which avoids using exec-maven-plugin and enables direct integration with minisite, for example. The configuration is translated to a command line: the command entry is the command name and all other key/values are translated to options.
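The translation rule can be sketched as follows: the command entry becomes the command name and every other entry becomes an option. The --key value option syntax, the sorted option order and the `CommandLineSketch` name are assumptions for illustration. The actual format is defined by ChordsCommandGenerator and the Chords CLI, and some-option below is a hypothetical key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CommandLineSketch {
    // Converts a flat configuration into CLI arguments: "command" first, then one option per other entry.
    static List<String> toArgs(Map<String, String> configuration) {
        final String command = configuration.get("command");
        if (command == null || command.isBlank()) {
            throw new IllegalArgumentException("Missing 'command' entry");
        }
        final List<String> args = new ArrayList<>();
        args.add(command);
        // TreeMap gives a deterministic (sorted) option order, handy for reproducible builds.
        new TreeMap<>(configuration).forEach((key, value) -> {
            if (!"command".equals(key)) {
                args.add("--" + key);
                args.add(value);
            }
        });
        return args;
    }

    public static void main(String[] args) {
        System.out.println(toArgs(Map.of("command", "reconcile", "some-option", "value")));
        // [reconcile, --some-option, value]
    }
}
```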