Closed as not planned
Labels: priority: p2, samples, triage me, type: bug
Description
Hi @beccasaurus @nicain @lukesneeringer @hfwang, I'm encountering an error while running a custom Dataflow job from a Flex Template on Google Cloud Platform (GCP).
I created a custom pipeline from a custom template, for which I provided a JSON file. The pipeline launched successfully and was scheduled to run, but at some point during execution an error occurred and the pipeline failed.
Environment:
Apache Beam version: apache-beam[gcp]==2.44.0
The error message is as follows:
Failed to read the job file: gs://dataflow-staging-us-central1-713358881388/staging/template_launches/2023-02-20_18_27_45-8498022740013370621/job_object with error message: (c20b1cad16245ca5): Unable to open template file: gs://dataflow-staging-us-central1-713358881388/staging/template_launches/2023-02-20_18_27_45-8498022740013370621/job_object..
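To check whether the object named in an error like this exists, it can help to pull the `gs://` path out of the message and split it into bucket and object name. The following is a minimal sketch using only the Python standard library; the helper names `gcs_paths` and `split_gcs_uri` and the bucket `my-bucket` are hypothetical and chosen for illustration only.

```python
import re
from urllib.parse import urlsplit


def gcs_paths(message):
    """Return the unique gs:// URIs found in a log message, in order of appearance."""
    seen = []
    for raw in re.findall(r"gs://\S+", message):
        # Log messages often end a path with punctuation ("job_object.."), so trim it.
        uri = raw.rstrip(".,:;")
        if uri not in seen:
            seen.append(uri)
    return seen


def split_gcs_uri(uri):
    """Split 'gs://bucket/some/object' into ('bucket', 'some/object')."""
    parts = urlsplit(uri)
    if parts.scheme != "gs" or not parts.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    return parts.netloc, parts.path.lstrip("/")


if __name__ == "__main__":
    # Hypothetical message shaped like the error above, with a placeholder bucket.
    message = (
        "Failed to read the job file: gs://my-bucket/staging/job_object "
        "with error message: Unable to open template file: "
        "gs://my-bucket/staging/job_object.."
    )
    for uri in gcs_paths(message):
        bucket, name = split_gcs_uri(uri)
        print(f"bucket={bucket} object={name}")
```

The extracted path can then be listed with a tool such as `gsutil ls` to see whether the job file was ever written.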
I have also verified that the options for the job are set correctly. Here's an example of how I'm setting the options using the PipelineOptions class in Python:
from apache_beam.options.pipeline_options import PipelineOptions

pipeline_options = PipelineOptions.from_dictionary({
    'runner': 'DataflowRunner',
    'project': 'testcircle-350611',
    'region': 'us-central1',
    'staging_location': 'gs://dataflow-staging-us-central1-713358881388/staging/',
    'temp_location': 'gs://dataflow-staging-us-central1-713358881388/tmp/',
    'template_location': 'gs://dataflow-staging-us-central1-713358881388/staging/template_launches/',
    'service_account_email': 'xxxx-compute@developer.gserviceaccount.com'
})
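Options built from a fixed dictionary like this do not pick up flags passed on the command line. Whether that matters for this error is an assumption and not something confirmed in this issue, but one way to rule it out is to treat the hard-coded values as defaults and let command-line flags take precedence. The sketch below uses only the standard library; `merge_flags` and `DEFAULTS` are hypothetical names, and the merged list is meant to be handed to Beam's `PipelineOptions(flags)` constructor.

```python
import sys

# Hypothetical defaults, mirroring some of the hard-coded options above.
DEFAULTS = {
    "runner": "DataflowRunner",
    "project": "testcircle-350611",
    "region": "us-central1",
}


def merge_flags(defaults, argv):
    """Return argv plus a --key=value flag for every default not already set in argv."""
    present = set()
    for arg in argv:
        if arg.startswith("--"):
            # "--region=us-central1" and "--region" both count as setting "region".
            present.add(arg[2:].split("=", 1)[0])
    extra = [f"--{key}={value}" for key, value in defaults.items() if key not in present]
    return list(argv) + extra


if __name__ == "__main__":
    flags = merge_flags(DEFAULTS, sys.argv[1:])
    print(flags)
    # With Apache Beam installed, the merged list could be used as:
    #   from apache_beam.options.pipeline_options import PipelineOptions
    #   pipeline_options = PipelineOptions(flags)
```

With this arrangement, any flag supplied at launch time overrides the matching default instead of being ignored.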
Here is my JSON file:
"resources": {
  "sdkPipelineOptions": {
    "description": "Apache Beam SDK pipeline options",
    "properties": {
      "saveMainSession": "true",
      "runner": "DataflowRunner",
      "project": "testcircle-350611",
      "region": "us-central1",
      "staging_location": "gs://dataflow-staging-us-central1-713358881388/staging/",
      "temp_location": "gs://dataflow-staging-us-central1-713358881388/tmp/",
      "template_location": "gs://dataflow-staging-us-central1-713358881388/staging/template_launches/",
      "service_account_email": "-xxxxcompute@developer.gserviceaccount.com"
    }
  }
},
And here is my Dockerfile:
RUN pip install --upgrade pip
RUN apt-get update && apt-get install -y default-jdk postgresql-client
ARG WORKDIR=/dataflow/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY etl.py .
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/etl.py"
ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]