Overview
This node integrates with the Dataiku DSS API to perform operations on Dataiku DSS resources. For the Dataset resource, the Get Column Lineage operation retrieves the full lineage of a specified column in a dataset within a project. Lineage information helps with understanding data dependencies, tracing data flow, and analyzing the impact of changes in data pipelines.
Common scenarios include:
- Auditing data transformations by tracking the origin and usage of specific columns.
- Debugging data quality issues by examining upstream data sources.
- Documenting data lineage for compliance and governance purposes.
Example:
Suppose you want to find out which datasets and columns contribute to a particular column in the dataset "sales_data" in project "my_project". This operation fetches the complete lineage details, which you can then visualize or analyze.
Properties
Name | Meaning |
---|---|
Project Key | The unique identifier of the Dataiku project containing the dataset. |
Dataset Name | The name of the dataset for which to retrieve column lineage information. |
Query Parameters | Optional additional parameters as key-value pairs to customize the API request (e.g., filters). |
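As a rough illustration of how the three properties might combine into a single request, the sketch below builds a URL from them. The `/public/api/.../column-lineage` path, the `columnName` parameter, the host name, and the helper `build_lineage_url` are assumptions made for this example, not details confirmed by this page.

```python
from urllib.parse import quote, urlencode


def build_lineage_url(base_url, project_key, dataset_name, query_params=None):
    """Build a request URL from Project Key, Dataset Name, and Query Parameters.

    The endpoint path used here is an assumption for illustration only.
    """
    # Percent-encode the identifiers so unusual characters cannot break the path.
    path = "/public/api/projects/{}/datasets/{}/column-lineage".format(
        quote(project_key, safe=""), quote(dataset_name, safe="")
    )
    url = base_url.rstrip("/") + path
    # Optional key-value pairs become the query string.
    if query_params:
        url += "?" + urlencode(query_params)
    return url


url = build_lineage_url(
    "https://dss.example.com",   # hypothetical DSS server URL
    "my_project",
    "sales_data",
    {"columnName": "revenue"},   # hypothetical parameter name and column
)
print(url)
```

Encoding the project key and dataset name separately from the query string keeps a stray slash or space in either value from changing which resource the request targets.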
Output
The output JSON contains the full lineage information of the specified column in the dataset. This includes both automatically computed and manually defined lineage details, such as upstream datasets, columns, transformations, and possibly metadata about the lineage relationships.
Other operations of this node can return files or binary content (e.g., documentation downloads), in which case the binary output contains the file data. The "Get Column Lineage" operation returns only structured JSON describing the lineage.
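The split between JSON and binary output can be sketched as a small dispatch on the response's content type. This is a minimal sketch under assumptions: the helper `parse_body` is hypothetical, and the node's actual handling may differ.

```python
import json


def parse_body(content_type, body):
    """Return parsed JSON for JSON responses, and the raw bytes otherwise."""
    if "json" in (content_type or "").lower():
        try:
            return json.loads(body.decode("utf-8"))
        except (UnicodeDecodeError, json.JSONDecodeError) as exc:
            # Surface a clear error instead of passing malformed data downstream.
            raise ValueError("Response is not valid JSON: {}".format(exc)) from exc
    # Non-JSON content (e.g. a downloaded file) is passed through untouched.
    return body


lineage = parse_body("application/json; charset=utf-8", b'{"example": true}')
print(type(lineage).__name__)
```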
Dependencies
- Requires an active connection to a Dataiku DSS instance.
- Requires valid API credentials for authentication: the Dataiku DSS server URL and a user API key, both configured in the node's credentials.
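For reference, the Dataiku DSS public REST API accepts the API key as the HTTP Basic username with an empty password. The sketch below builds such a header; the helper name `basic_auth_header` and the sample key are illustrative, and whether the node constructs the header in exactly this way is an assumption.

```python
import base64


def basic_auth_header(api_key):
    """Build an HTTP Basic Authorization header with the API key as username."""
    # Basic auth encodes "username:password"; the password is left empty.
    token = base64.b64encode((api_key + ":").encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}


headers = basic_auth_header("my-secret-key")  # placeholder key for illustration
print(sorted(headers))
```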
Troubleshooting
- Missing Credentials Error: If the API credentials are not set or are invalid, the node throws an error indicating missing credentials. Make sure the API key credential is configured correctly.
- Required Parameter Missing: The node validates the required parameters Project Key and Dataset Name and raises an error if either is omitted. Always provide both.
- API Request Failures: Network issues, incorrect project or dataset names, or insufficient permissions may cause API call failures. Check connectivity, spelling, and user permissions.
- Parsing Errors: If the API returns unexpected data formats, the node might fail to parse the response. Verify the API version compatibility and response format.
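The checks above can be approximated before and after a request with a small pre-flight validator and a status-code lookup. This is a sketch only: the parameter names `projectKey` and `datasetName`, both helper functions, and the status-to-cause mapping are assumptions rather than the node's documented behavior.

```python
def validate_params(params):
    """Raise ValueError if a required parameter is missing or blank."""
    required = ("projectKey", "datasetName")  # hypothetical internal names
    missing = [name for name in required if not str(params.get(name) or "").strip()]
    if missing:
        raise ValueError("Missing required parameter(s): " + ", ".join(missing))
    return params


def diagnose(status_code):
    """Map an HTTP status code to a likely cause from the list above."""
    if 200 <= status_code < 300:
        return "OK"
    if status_code in (401, 403):
        return "Check the API key and the user's permissions on the project."
    if status_code == 404:
        return "Check the spelling of the project key and dataset name."
    if status_code >= 500:
        return "The DSS server reported an error; check connectivity and server logs."
    return "Unexpected status; check API version compatibility."


validate_params({"projectKey": "my_project", "datasetName": "sales_data"})
print(diagnose(404))
```

Validating inputs before the request is sent separates configuration mistakes from genuine API failures, which makes the remaining errors easier to attribute.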