
Commit f7e3320

vdusek and Pijukatel authored
refactor!: Adapt to the Crawlee v1.0 (#470)
### Description

- Integration of the Crawlee v1 changes, mostly new storages & storage clients (introduced in apify/crawlee-python#1194).

### Issues

- Closes: #469
- Closes: #540

### Testing

- The current test set covers the changes.

---------

Co-authored-by: Josef Prochazka <josef.prochazka@apify.com>
1 parent d3f0f85 commit f7e3320


62 files changed: +2077 / -1324 lines changed

.github/workflows/run_code_checks.yaml

Lines changed: 0 additions & 1 deletion
@@ -36,7 +36,6 @@ jobs:
 
   integration_tests:
     name: Integration tests
-    needs: [lint_check, type_check, unit_tests]
     uses: apify/workflows/.github/workflows/python_integration_tests.yaml@main
     secrets: inherit
     with:

Makefile

Lines changed: 3 additions & 3 deletions
@@ -26,13 +26,13 @@ type-check:
 	uv run mypy
 
 unit-tests:
-	uv run pytest --numprocesses=auto --verbose --cov=src/apify tests/unit
+	uv run pytest --numprocesses=auto -vv --cov=src/apify tests/unit
 
 unit-tests-cov:
-	uv run pytest --numprocesses=auto --verbose --cov=src/apify --cov-report=html tests/unit
+	uv run pytest --numprocesses=auto -vv --cov=src/apify --cov-report=html tests/unit
 
 integration-tests:
-	uv run pytest --numprocesses=$(INTEGRATION_TESTS_CONCURRENCY) --verbose tests/integration
+	uv run pytest --numprocesses=$(INTEGRATION_TESTS_CONCURRENCY) -vv tests/integration
 
 format:
 	uv run ruff check --fix

docs/03_concepts/code/03_dataset_exports.py

Lines changed: 2 additions & 2 deletions
@@ -11,14 +11,14 @@ async def main() -> None:
         await dataset.export_to(
             content_type='csv',
             key='data.csv',
-            to_key_value_store_name='my-cool-key-value-store',
+            to_kvs_name='my-cool-key-value-store',
         )
 
         # Export the data as JSON
         await dataset.export_to(
             content_type='json',
             key='data.json',
-            to_key_value_store_name='my-cool-key-value-store',
+            to_kvs_name='my-cool-key-value-store',
         )
 
         # Print the exported records
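For downstream users, a minimal sketch of calling `Dataset.export_to` with the renamed keyword argument; the dataset and key-value store names are illustrative values taken from the docs snippet, not part of this commit:

```python
from apify import Actor


async def main() -> None:
    async with Actor:
        dataset = await Actor.open_dataset(name='my-cool-dataset')

        # The keyword `to_kvs_name` replaces the former `to_key_value_store_name`;
        # the exported CSV ends up as a record in the named key-value store.
        await dataset.export_to(
            content_type='csv',
            key='data.csv',
            to_kvs_name='my-cool-key-value-store',
        )
```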

docs/03_concepts/code/conditional_actor_charge.py

Lines changed: 2 additions & 2 deletions
@@ -6,8 +6,8 @@ async def main() -> None:
         # Check the dataset because there might already be items
         # if the run migrated or was restarted
         default_dataset = await Actor.open_dataset()
-        dataset_info = await default_dataset.get_info()
-        charged_items = dataset_info.item_count if dataset_info else 0
+        metadata = await default_dataset.get_metadata()
+        charged_items = metadata.item_count
 
         # highlight-start
         if Actor.get_charging_manager().get_pricing_info().is_pay_per_event:
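A hedged sketch of the replacement call shown above: `get_metadata()` supersedes `get_info()`, and per the removed fallback it always returns a metadata object, so the `if dataset_info else 0` guard is no longer needed:

```python
from apify import Actor


async def main() -> None:
    async with Actor:
        # There might already be items if the run migrated or was restarted.
        default_dataset = await Actor.open_dataset()
        metadata = await default_dataset.get_metadata()
        charged_items = metadata.item_count

        Actor.log.info(f'Items already present in the default dataset: {charged_items}')
```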

docs/04_upgrading/upgrading_to_v2.md

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@ id: upgrading-to-v2
 title: Upgrading to v2
 ---
 
-This page summarizes most of the breaking changes between Apify Python SDK v1.x and v2.0.
+This page summarizes the breaking changes between Apify Python SDK v1.x and v2.0.
 
 ## Python version support
 
@@ -12,7 +12,7 @@ Support for Python 3.8 has been dropped. The Apify Python SDK v2.x now requires
 ## Storages
 
 - The SDK now uses [crawlee](https://github.com/apify/crawlee-python) for local storage emulation. This change should not affect intended usage (working with `Dataset`, `KeyValueStore` and `RequestQueue` classes from the `apify.storages` module or using the shortcuts exposed by the `Actor` class) in any way.
-- There is a difference in the `RequestQueue.add_request` method: it accepts an `apify.Request` object instead of a free-form dictionary.
+- There is a difference in the `RequestQueue.add_request` method: it accepts an `apify.Request` object instead of a free-form dictionary.
 - A quick way to migrate from dict-based arguments is to wrap it with a `Request.model_validate()` call.
 - The preferred way is using the `Request.from_url` helper which prefills the `unique_key` and `id` attributes, or instantiating it directly, e.g., `Request(url='https://example.tld', ...)`.
 - For simple use cases, `add_request` also accepts plain strings that contain an URL, e.g. `queue.add_request('https://example.tld')`.
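A minimal sketch of the two simplest forms described in the bullets above (URLs are placeholders); dict-based call sites can alternatively be wrapped in `Request.model_validate()` as noted:

```python
from apify import Actor, Request


async def main() -> None:
    async with Actor:
        queue = await Actor.open_request_queue()

        # Preferred: `Request.from_url` prefills the `unique_key` and `id` attributes.
        await queue.add_request(Request.from_url('https://example.tld/page-1'))

        # Simple cases: a plain URL string works as well.
        await queue.add_request('https://example.tld/page-2')
```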

docs/04_upgrading/upgrading_to_v3.md

Lines changed: 18 additions & 0 deletions
@@ -0,0 +1,18 @@
+---
+id: upgrading-to-v3
+title: Upgrading to v3
+---
+
+This page summarizes the breaking changes between Apify Python SDK v2.x and v3.0.
+
+## Python version support
+
+Support for Python 3.9 has been dropped. The Apify Python SDK v3.x now requires Python 3.10 or later. Make sure your environment is running a compatible version before upgrading.
+
+## Storages
+
+<!-- TODO -->
+
+## Storage clients
+
+<!-- TODO -->

pyproject.toml

Lines changed: 6 additions & 1 deletion
@@ -36,7 +36,8 @@ keywords = [
 dependencies = [
     "apify-client<2.0.0",
     "apify-shared<2.0.0",
-    "crawlee~=0.6.0",
+    "crawlee@git+https://github.com/apify/crawlee-python.git@master",
+    "cachetools>=5.5.0",
     "cryptography>=42.0.0",
     "httpx>=0.27.0",
     # TODO: ensure compatibility with the latest version of lazy-object-proxy
@@ -77,6 +78,7 @@ dev = [
     "pytest~=8.4.0",
     "ruff~=0.12.0",
     "setuptools", # setuptools are used by pytest but not explicitly required
+    "types-cachetools>=6.0.0.20250525",
     "uvicorn[standard]",
     "werkzeug~=3.1.3", # Werkzeug is used by httpserver
     "yarl~=1.20.0", # yarl is used by crawlee
@@ -85,6 +87,9 @@ dev = [
 [tool.hatch.build.targets.wheel]
 packages = ["src/apify"]
 
+[tool.hatch.metadata]
+allow-direct-references = true
+
 [tool.ruff]
 line-length = 120
 include = ["src/**/*.py", "tests/**/*.py", "docs/**/*.py", "website/**/*.py"]

src/apify/_actor.py

Lines changed: 4 additions & 4 deletions
@@ -30,11 +30,11 @@
 from apify._consts import EVENT_LISTENERS_TIMEOUT
 from apify._crypto import decrypt_input_secrets, load_private_key
 from apify._models import ActorRun
-from apify._platform_event_manager import EventManager, LocalEventManager, PlatformEventManager
 from apify._proxy_configuration import ProxyConfiguration
 from apify._utils import docs_group, docs_name, get_system_info, is_running_in_ipython
-from apify.apify_storage_client import ApifyStorageClient
+from apify.events import ApifyEventManager, EventManager, LocalEventManager
 from apify.log import _configure_logging, logger
+from apify.storage_clients import ApifyStorageClient
 from apify.storages import Dataset, KeyValueStore, RequestQueue
 
 if TYPE_CHECKING:
@@ -126,11 +126,11 @@ def __init__(
 
         # Create an instance of the cloud storage client, the local storage client is obtained
         # from the service locator.
-        self._cloud_storage_client = ApifyStorageClient.from_config(config=self._configuration)
+        self._cloud_storage_client = ApifyStorageClient()
 
         # Set the event manager based on whether the Actor is running on the platform or locally.
         self._event_manager = (
-            PlatformEventManager(
+            ApifyEventManager(
                 config=self._configuration,
                 persist_state_interval=self._configuration.persist_state_interval,
             )
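For code that imported these symbols from their previous locations, a hedged summary of the new import paths used in the diff above; the standalone instantiation at the end mirrors what `Actor.__init__` now does and assumes the configuration is resolved through the service locator rather than passed explicitly:

```python
# Event managers moved out of the private apify._platform_event_manager module,
# and PlatformEventManager is now named ApifyEventManager.
from apify.events import ApifyEventManager, EventManager, LocalEventManager

# The Apify storage client moved from apify.apify_storage_client.
from apify.storage_clients import ApifyStorageClient

# The client is now constructed without an explicit Configuration argument;
# it obtains the configuration from the service locator instead.
cloud_storage_client = ApifyStorageClient()
```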

src/apify/_configuration.py

Lines changed: 33 additions & 0 deletions
@@ -140,6 +140,39 @@ class Configuration(CrawleeConfiguration):
         ),
     ] = None
 
+    default_dataset_id: Annotated[
+        str,
+        Field(
+            validation_alias=AliasChoices(
+                'actor_default_dataset_id',
+                'apify_default_dataset_id',
+            ),
+            description='Default dataset ID used by the Apify storage client when no ID or name is provided.',
+        ),
+    ] = 'default'
+
+    default_key_value_store_id: Annotated[
+        str,
+        Field(
+            validation_alias=AliasChoices(
+                'actor_default_key_value_store_id',
+                'apify_default_key_value_store_id',
+            ),
+            description='Default key-value store ID for the Apify storage client when no ID or name is provided.',
+        ),
+    ] = 'default'
+
+    default_request_queue_id: Annotated[
+        str,
+        Field(
+            validation_alias=AliasChoices(
+                'actor_default_request_queue_id',
+                'apify_default_request_queue_id',
+            ),
+            description='Default request queue ID for the Apify storage client when no ID or name is provided.',
+        ),
+    ] = 'default'
+
     disable_outdated_warning: Annotated[
         bool,
         Field(
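A hedged sketch of how these new defaults resolve, assuming the usual pydantic-settings behaviour of `Configuration` (the environment variable name follows the validation aliases declared above; the value is made up):

```python
import os

from apify import Configuration

# On the Apify platform this variable is injected automatically; setting it
# here only illustrates the alias declared in the field definition above.
os.environ['APIFY_DEFAULT_DATASET_ID'] = 'xYzDatasetId42'

config = Configuration()
print(config.default_dataset_id)        # -> 'xYzDatasetId42'
print(config.default_request_queue_id)  # -> 'default' (fallback when the variable is unset)
```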

src/apify/_proxy_configuration.py

Lines changed: 2 additions & 1 deletion
@@ -20,7 +20,8 @@
 
 if TYPE_CHECKING:
     from apify_client import ApifyClientAsync
-    from crawlee import Request
+
+    from apify import Request
 
 APIFY_PROXY_VALUE_REGEX = re.compile(r'^[\w._~]+$')
 COUNTRY_CODE_REGEX = re.compile(r'^[A-Z]{2}$')
