
Error reading job file for Dataflow Flex Template with Unable to Open Template File error #9153

@belwalshubham

Hi @beccasaurus @nicain @lukesneeringer @hfwang, I'm encountering an error while running a custom Dataflow job from a Flex Template on Google Cloud Platform (GCP).
I created a custom pipeline from a custom template and provided a JSON file for it. The pipeline launched
successfully and was scheduled to run, but at some point during execution an error occurred and the pipeline failed.
Environment:
Apache Beam version: apache-beam[gcp]==2.44.0

The error message is as follows:

Failed to read the job file: gs://dataflow-staging-us-central1-713358881388/staging/template_launches/2023-02-20_18_27_45-8498022740013370621/job_object with error message: (c20b1cad16245ca5): Unable to open template file: gs://dataflow-staging-us-central1-713358881388/staging/template_launches/2023-02-20_18_27_45-8498022740013370621/job_object..
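A first diagnostic step could be to confirm whether the `job_object` named in the error exists at all. The sketch below pulls the `gs://` path out of the message and splits it into bucket and object name. `object_exists` is a hypothetical helper, not something from this issue: it assumes the `google-cloud-storage` client library is installed and that the default credentials can read the staging bucket.

```python
import re

# Matches the first gs:// URI in a log line; the URI ends at whitespace.
GCS_URI = re.compile(r"gs://([^/\s]+)/(\S+)")


def parse_gcs_uri(message):
    """Return (bucket, object_name) for the first gs:// URI in message."""
    match = GCS_URI.search(message)
    if match is None:
        raise ValueError("no gs:// URI found in message")
    bucket, name = match.groups()
    # Log lines often end a path with sentence punctuation; strip it.
    return bucket, name.rstrip(".,:;")


def object_exists(bucket, name):
    """Hypothetical check; needs google-cloud-storage and read access."""
    from google.cloud import storage  # imported lazily, optional dependency

    return storage.Client().bucket(bucket).blob(name).exists()
```

If `object_exists` returned `False` for the path in the error, that would suggest the job file was never written there, as opposed to a permissions problem on an existing object.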

I have also verified that the options for the job are set correctly. Here's an example of how I'm setting the options using the PipelineOptions class in Python:

from apache_beam.options.pipeline_options import PipelineOptions

# Pipeline options for the Dataflow job, built from a plain dictionary.
pipeline_options = PipelineOptions.from_dictionary({
    'runner': 'DataflowRunner',
    'project': 'testcircle-350611',
    'region': 'us-central1',
    'staging_location': 'gs://dataflow-staging-us-central1-713358881388/staging/',
    'temp_location': 'gs://dataflow-staging-us-central1-713358881388/tmp/',
    'template_location': 'gs://dataflow-staging-us-central1-713358881388/staging/template_launches/',
    'service_account_email': 'xxxx-compute@developer.gserviceaccount.com'
})
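One thing that may be worth checking, although nothing in this issue confirms it as the cause: the Flex Template launcher runs the pipeline file with its options supplied as command-line flags, so options hard-coded in a dictionary can diverge from what the service passes in. The sketch below shows the common pattern of parsing the pipeline's own flags with `argparse` and handing the remaining flags to `PipelineOptions`. The `--input_table` flag and the function names are hypothetical, chosen only for illustration.

```python
import argparse


def split_args(argv):
    """Separate this pipeline's own flags from flags meant for Beam."""
    parser = argparse.ArgumentParser()
    # Hypothetical pipeline-specific flag, not taken from the issue.
    parser.add_argument("--input_table", default=None)
    known_args, beam_args = parser.parse_known_args(argv)
    return known_args, beam_args


def build_options(argv):
    """Build PipelineOptions from command-line flags, not a fixed dict."""
    # Imported lazily so split_args can be used without Beam installed.
    from apache_beam.options.pipeline_options import PipelineOptions

    known_args, beam_args = split_args(argv)
    # Flags the launcher supplies (runner, project, region, ...) are kept.
    return known_args, PipelineOptions(beam_args, save_main_session=True)
```

In `etl.py` this would be called as `build_options(sys.argv[1:])`, so that any value the launcher sets is used instead of a hard-coded one.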

Here is my JSON file:

"resources": {
    "sdkPipelineOptions": {
      "description": "Apache Beam SDK pipeline options",
      "properties": {
        "saveMainSession": "true",
        "runner": "DataflowRunner",
        "project": "testcircle-350611",
        "region": "us-central1",
        "staging_location": "gs://dataflow-staging-us-central1-713358881388/staging/",
        "temp_location": "gs://dataflow-staging-us-central1-713358881388/tmp/",
        "template_location": "gs://dataflow-staging-us-central1-713358881388/staging/template_launches/",
        "service_account_email": "-xxxxcompute@developer.gserviceaccount.com"
      }
    }
  },
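Because the same settings appear both in the Python options and in the JSON file, a small comparison can flag values that have drifted apart. This is a minimal sketch: `diff_options` is a hypothetical helper, and the two dictionaries are abbreviated copies of the values shown in this report.

```python
def diff_options(code_options, json_properties):
    """Return {key: (code_value, json_value)} for keys whose values differ."""
    keys = set(code_options) | set(json_properties)
    return {
        key: (code_options.get(key), json_properties.get(key))
        for key in sorted(keys)
        if code_options.get(key) != json_properties.get(key)
    }


# Abbreviated copies of the two sets of options from this report.
code_options = {
    "region": "us-central1",
    "service_account_email": "xxxx-compute@developer.gserviceaccount.com",
}
json_properties = {
    "saveMainSession": "true",
    "region": "us-central1",
    "service_account_email": "-xxxxcompute@developer.gserviceaccount.com",
}

mismatches = diff_options(code_options, json_properties)
```

Here `mismatches` contains `saveMainSession` (present only in the JSON) and `service_account_email`. The two redacted addresses are written differently in this report, which may be nothing more than a redaction slip, but it is cheap to rule out.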

And here is my Dockerfile:


RUN pip install --upgrade pip

RUN apt-get update && apt-get install -y default-jdk postgresql-client

ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

COPY requirements.txt .

RUN pip install --no-cache-dir -r requirements.txt

COPY etl.py .

ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/etl.py"

ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]




Labels

priority: p2, samples, triage me, type: bug
