Getting Started
Yupiik Chords intends to simplify workflow execution and make it Kubernetes native. Compared to Apache Airflow or other alternatives, it is mostly self-contained: it doesn't need a database and it reuses common Kubernetes infrastructure, including the monitoring stack, instead of enforcing yet another one.
Design
Yupiik Chords is very flexible, but the default design relies on storing metadata in the labels and annotations of `Job` descriptors. It can potentially be extended to other resources, such as `SparkApplication`.
The prefix can be configured (see the CLI documentation) but by default it is `yupiik.chords/`.
The overall execution generally starts from an external trigger, i.e. Chords rarely takes ownership of the initial nodes. Concretely, 99% of the time it is a `CronJob` with the proper labels/annotations; otherwise it is a job triggered by an external tool (another `Job` or an external orchestrator) with the right labels/annotations.
The execution graph (normally a directed acyclic graph, DAG) is provided by implementing the `io.yupiik.kubernetes.chords.api.spi.DagContributor` API:
import io.yupiik.fusion.framework.api.scope.DefaultScoped;
import io.yupiik.kubernetes.chords.api.model.Dag;
import io.yupiik.kubernetes.chords.api.spi.DagContributor;

import java.util.List;

// Node and Edge imports are omitted here (their package is not shown in this section).
@DefaultScoped
public class MyDagsProvider implements DagContributor {
    @Override
    public List<Dag> get() {
        return List.of(new Dag(
                "my-dag", // DAG identifier
                List.of(new Node(...), ...), // nodes of the graph
                List.of(new Edge(...), ...)  // edges between the nodes
        ));
    }
}
NOTE: Carrying the DAG automatically in annotations is being evaluated, but for now the definition must be provided explicitly.
Alternatively, a DAG can be defined through environment variables or system properties using the configuration mechanism.
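To make the graph shape more concrete, here is a minimal, self-contained sketch of what following a DAG involves: finding the nodes downstream of a finished node and checking that the graph has no cycle. The `SketchEdge` record and the `DagSketch` class are hypothetical stand-ins written for this illustration; they are not the Chords model classes and do not reflect how the reconcilier is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DAG edge; NOT the real Chords model class.
record SketchEdge(String from, String to) {}

final class DagSketch {
    private DagSketch() {}

    // Nodes directly reachable from nodeId, i.e. the candidates to trigger once nodeId is done.
    static List<String> downstream(List<SketchEdge> edges, String nodeId) {
        return edges.stream()
                .filter(e -> e.from().equals(nodeId))
                .map(SketchEdge::to)
                .toList();
    }

    // Kahn's algorithm: the graph is acyclic if every node can be consumed in topological order.
    static boolean isAcyclic(List<String> nodes, List<SketchEdge> edges) {
        Map<String, Integer> inDegree = new HashMap<>();
        nodes.forEach(n -> inDegree.put(n, 0));
        edges.forEach(e -> {
            inDegree.putIfAbsent(e.from(), 0);
            inDegree.merge(e.to(), 1, Integer::sum);
        });

        // Start from the nodes without any incoming edge (the roots).
        Deque<String> ready = new ArrayDeque<>();
        inDegree.forEach((node, degree) -> {
            if (degree == 0) {
                ready.add(node);
            }
        });

        int visited = 0;
        while (!ready.isEmpty()) {
            String current = ready.poll();
            visited++;
            for (String next : downstream(edges, current)) {
                if (inDegree.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        // Nodes caught in a cycle never reach an in-degree of zero, so they are never visited.
        return visited == inDegree.size();
    }

    public static void main(String[] args) {
        List<SketchEdge> edges = List.of(
                new SketchEdge("extract", "transform"),
                new SketchEdge("transform", "load"));
        System.out.println(downstream(edges, "extract"));
        System.out.println(isAcyclic(List.of("extract", "transform", "load"), edges));
    }
}
```

The node names (`extract`, `transform`, `load`) are made up for the example.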
Annotations and labels
As mentioned in the previous section, labels and annotations are key for Yupiik Chords to follow the DAG and trigger the execution of the downstream `Job`.
Here is the list of the ones in use. Note that the prefix is omitted since it is configurable (see the CLI documentation), so the name `foo` actually means `yupiik.chords/foo` with the default prefix:
| Name | Type | Description |
|---|---|---|
| managed | Label | A boolean (`true` or `false`) specifying whether the reconcilier must handle the related `Job`/descriptor. |
| reconcilier-id | Label | Optional if you do run a single reconcilier ( |
| dag-id | Label | The identifier (name) of the DAG the |
| node-id | Label | Identifier of the node in the DAG. |
| execution-id | Label | Identifier of the execution. |
| status | Annotation | Whether the job was already managed and should be ignored by future iterations. |
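As a small illustration of how the short names of this table map to actual metadata keys, the sketch below prepends the prefix and reads the `managed` label from a labels map. The `LabelKeys` class and its methods are hypothetical helpers written for this example, not part of the Chords API.

```java
import java.util.Map;

// Hypothetical helper; it only illustrates the "prefix + short name" convention.
final class LabelKeys {
    // Default prefix stated in this documentation; remember it is configurable.
    static final String DEFAULT_PREFIX = "yupiik.chords/";

    private LabelKeys() {}

    // "dag-id" with the default prefix becomes "yupiik.chords/dag-id".
    static String key(String prefix, String name) {
        return prefix + name;
    }

    // A descriptor is considered managed only if the label is present and equals "true".
    static boolean isManaged(Map<String, String> labels, String prefix) {
        return Boolean.parseBoolean(labels.get(key(prefix, "managed")));
    }

    public static void main(String[] args) {
        Map<String, String> labels = Map.of("yupiik.chords/managed", "true");
        System.out.println(key(DEFAULT_PREFIX, "dag-id"));
        System.out.println(isManaged(labels, DEFAULT_PREFIX));
    }
}
```

Treating an absent `managed` label as `false` is an assumption of this sketch.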
Generally a root CronJob will look like:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-root
  labels:
    yupiik.chords/managed: "true"
    yupiik.chords/dag-id: "1234"
    yupiik.chords/node-id: "56789"
    yupiik.chords/execution-id: "abcd-1234-tyui-szsz"
spec:
  ...
IMPORTANT: If one of these three labels is missing, the execution can fail. This is intentional, to avoid misconfigured descriptors and silently ignored errors.
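The fail-fast behavior described above can be approximated on your side before applying a descriptor. The following sketch reports which required labels are missing from a labels map. The `RequiredLabels` class is hypothetical, and the list of required names is passed by the caller because this section does not spell out which three labels are mandatory; using `dag-id`, `node-id` and `execution-id` in the example is an assumption.

```java
import java.util.List;
import java.util.Map;

// Hypothetical pre-flight check; NOT the validation performed by Chords itself.
final class RequiredLabels {
    private RequiredLabels() {}

    // Returns the short names of the required labels that are absent or blank.
    static List<String> missing(Map<String, String> labels, String prefix, List<String> required) {
        return required.stream()
                .filter(name -> {
                    String value = labels.get(prefix + name);
                    return value == null || value.isBlank();
                })
                .toList();
    }

    // Fails fast instead of letting a misconfigured descriptor be silently ignored.
    static void validate(Map<String, String> labels, String prefix, List<String> required) {
        List<String> missing = missing(labels, prefix, required);
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing labels: " + missing);
        }
    }

    public static void main(String[] args) {
        // Assumption: these three names are the mandatory ones.
        List<String> required = List.of("dag-id", "node-id", "execution-id");
        Map<String, String> labels = Map.of(
                "yupiik.chords/dag-id", "1234",
                "yupiik.chords/node-id", "56789");
        System.out.println(missing(labels, "yupiik.chords/", required));
    }
}
```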
Deployment
TBD: Cronjob, no parallelism...
Only jobs
As of today, Yupiik Chords only supports chaining `Job` resources. If you want to chain something else, ensure it triggers a job creation (even just an `echo bye`) with the right labels, following the underlying lifecycle (we do it for a `SparkApplication`, for example, using the driver lifecycle).
The code is open source and contributions adding support for more resources are welcome if you feel the workaround is not sufficient, for example because your resource is too easily evicted with your constrained (often memory-wise) configuration.
Tips
- Version your DAG so the identifier can be `awesome-etl-1` instead of `awesome-etl`. This is important since the DAG is provided by code and not stored within descriptors.
- Use `${now:yyyyMMddHHmmss}` (if you rely on the configuration definition, otherwise do it in your code) in the `Job` name of your templates to make them unique, historized (in etcd/Kubernetes API) and sortable.
- Handle referential configuration (`ConfigMap`, `Secret`) outside the DAG and reference it in the `Job` of the DAG.
  - At a high level, the idea is to keep the DAG for dynamic triggering while configuration relies on something more static.
  - It doesn't prevent a node (`Job`) from using the Kubernetes API to update a `ConfigMap` for another `Job`, so the execution is globally dynamic anyway. It is even good to have one state descriptor per DAG (if not concurrent, otherwise use multiple), generally a `Secret`, but a `ConfigMap` can be sufficient in several cases.
  - It also means that a node generally has a single descriptor, which is the `Job`.
- Triggering root nodes when not using a `CronJob` schedule can be done with `kubectl` or alike using the command `kubectl create job --from=cronjob/my-cron my-cron-manual-20260215225626`. It is recommended to still register the root nodes as `CronJob` with `spec.suspend` set to `true` and to add the date to the name.
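When the `Job` name is computed in code rather than through the configuration placeholder, the same `yyyyMMddHHmmss` suffix can be produced with `java.time`. This is a minimal sketch; the `JobNames` class and the `my-cron-manual` base name are illustrative only.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper producing unique, sortable Job names.
final class JobNames {
    // Same pattern as the ${now:yyyyMMddHHmmss} placeholder mentioned in the tips.
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    private JobNames() {}

    // Appends the timestamp so names sort chronologically and stay unique per second.
    static String uniqueName(String base, LocalDateTime now) {
        return base + "-" + STAMP.format(now);
    }

    public static void main(String[] args) {
        System.out.println(uniqueName("my-cron-manual", LocalDateTime.now()));
    }
}
```

Passing the date as a parameter keeps the helper deterministic in tests. Keep in mind that two jobs created within the same second would collide with this pattern.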
Documentation
Yupiik Chords integrates with Yupiik Tools to generate the DAG diagrams with Mermaid at build time.
Use the `preTask` `io.yupiik.kubernetes.chords.documentation.task.DAGGenerator` and configure its settings:
- `chords-documentation.daggenerator.dagFormat` (env: `CHORDS_DOCUMENTATION_DAGGENERATOR_DAGFORMAT`): Output dagFormat to use. Default: `io.yupiik.kubernetes.chords.documentation.task.DAGGenerator.DAGGeneratorConfiguration.DagFormat.MERMAID`.
- `chords-documentation.daggenerator.dagInputFiles` (env: `CHORDS_DOCUMENTATION_DAGGENERATOR_DAGINPUTFILES`): Source of input files used to fill the generator properties. It generally makes it possible to share the DAG configuration instead of repeating it. Default: `java.util.List.of()`.
- `chords-documentation.daggenerator.diagramOutput` (env: `CHORDS_DOCUMENTATION_DAGGENERATOR_DIAGRAMOUTPUT`): Output base path where the DAG will be generated. The DAG `name` will be used as base for the file name.
- `chords-documentation.daggenerator.indexDiagramBasePath` (env: `CHORDS_DOCUMENTATION_DAGGENERATOR_INDEXDIAGRAMBASEPATH`): Base path to reference diagrams in the files. Mainly used in `.adoc` if you configure the diagrams to be generated in `/opt/rmannibucau/dev/Github/yupiik-chords/documentation/src/main/minisite/content/_partials` or alike; it is used as the prefix of the file path `include::xxxx/$file` where `xxxx/` is this exact value. Default: `/opt/rmannibucau/dev/Github/yupiik-chords/documentation/src/main/minisite/content/_partials/dags/`.
- `chords-documentation.daggenerator.indexFormat` (env: `CHORDS_DOCUMENTATION_DAGGENERATOR_INDEXFORMAT`): Output format for the index file. Default: `io.yupiik.kubernetes.chords.documentation.task.DAGGenerator.DAGGeneratorConfiguration.IndexFormat.SIMPLE_ADOC`.
- `chords-documentation.daggenerator.indexOutput` (env: `CHORDS_DOCUMENTATION_DAGGENERATOR_INDEXOUTPUT`): Output base path where the index will be generated if not disabled.
In addition to these settings, configure your DAGs using the same syntax as for the main job configuration (core module).
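To give an idea of what a Mermaid rendering of a DAG looks like, the sketch below turns a list of edges into a Mermaid flowchart. It is only an illustration: `MermaidEdge` and `MermaidSketch` are hypothetical names, and the output is not claimed to match what `DAGGenerator` actually produces. It assumes node identifiers are plain alphanumeric words.

```java
import java.util.List;

// Hypothetical stand-in for a DAG edge; NOT the real Chords model class.
record MermaidEdge(String from, String to) {}

final class MermaidSketch {
    private MermaidSketch() {}

    // Renders one "from --> to" line per edge under a top-down flowchart header.
    static String render(List<MermaidEdge> edges) {
        StringBuilder out = new StringBuilder("flowchart TD\n");
        for (MermaidEdge edge : edges) {
            out.append("    ")
                    .append(edge.from())
                    .append(" --> ")
                    .append(edge.to())
                    .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(List.of(
                new MermaidEdge("extract", "transform"),
                new MermaidEdge("transform", "load"))));
    }
}
```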
Another interesting `Runnable` in the documentation-tasks module is `io.yupiik.kubernetes.chords.documentation.ChordsCommandGenerator`. It runs any command wrapped in a `Runnable`, which avoids relying on exec-maven-plugin and integrates directly with the minisite, for example. The configuration is translated to a command line: the `command` entry is the command name and all other key/value pairs are translated to options.
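The translation from configuration to command line can be pictured with the following sketch. It is not the implementation of `ChordsCommandGenerator`: the `CommandLineSketch` class is hypothetical, and the `--key value` option syntax, like the `my-command` and `dry-run` names, are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "command entry + other entries as options".
final class CommandLineSketch {
    private CommandLineSketch() {}

    static List<String> toArgs(Map<String, String> configuration) {
        String command = configuration.get("command");
        if (command == null) {
            throw new IllegalArgumentException("'command' entry is required");
        }
        List<String> args = new ArrayList<>();
        args.add(command); // the command name always comes first
        configuration.forEach((key, value) -> {
            if (!"command".equals(key)) {
                // Assumption: options are rendered as "--key value".
                args.add("--" + key);
                args.add(value);
            }
        });
        return args;
    }

    public static void main(String[] args) {
        System.out.println(toArgs(Map.of("command", "my-command", "dry-run", "true")));
    }
}
```

With several options, pass an ordered map such as `LinkedHashMap` if the order of the options matters.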