Commit 3bee4e8

Merge pull request SpiderClub#12 from ResolveWang/master
add default settings when using docker; add test results at english readme
2 parents 8027034 + de2875b commit 3bee4e8

File tree

2 files changed: +31 -10 lines changed

README.md

Lines changed: 4 additions & 1 deletion

@@ -77,7 +77,10 @@ print(fetcher.get_proxies()) # or print(fetcher.pool)
 > pip install -U docker-compose

 - Edit the `SPLASH_URL` and `REDIS_HOST` parameters in [settings.py](config/settings.py)
-
+```python3
+SPLASH_URL = 'http://splash:8050'
+REDIS_HOST = 'redis'
+```
 - Use *docker-compose* to start each application component

 > docker-compose up
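For context, the `splash` and `redis` hostnames in the new defaults are docker-compose service names, which resolve via the compose network's internal DNS. A minimal, hypothetical sketch (not haipproxy's actual `config/settings.py`) of keeping these defaults overridable by environment variables:

```python
import os

# Hypothetical settings sketch, not haipproxy's actual config/settings.py:
# fall back to the docker-compose service names ('splash', 'redis') when
# no environment override is present.
SPLASH_URL = os.environ.get("SPLASH_URL", "http://splash:8050")
REDIS_HOST = os.environ.get("REDIS_HOST", "redis")

print(SPLASH_URL)
print(REDIS_HOST)
```

With no environment variables set, this prints the docker defaults shown in the diff; running outside docker you could export `REDIS_HOST=127.0.0.1` instead of editing the file.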

README_EN.md

Lines changed: 27 additions & 9 deletions

@@ -77,18 +77,21 @@ print(fetcher.get_proxies()) # or print(fetcher.pool)
 > pip install -U docker-compose

 - Change `SPLASH_URL` and `REDIS_HOST` in [settings.py](config/settings.py)
-
+```python3
+SPLASH_URL = 'http://splash:8050'
+REDIS_HOST = 'redis'
+```
 - Start all the containers using docker-compose

 > docker-compose up

 - Use [py_cli](client/py_cli.py) or Squid to get available proxy IPs.
-```python3
-from client.py_cli import ProxyFetcher
-args = dict(host='127.0.0.1', port=6379, password='123456', db=0)
-fetcher = ProxyFetcher('https', strategy='greedy', length=5, redis_args=args)
-print(fetcher.get_proxy())
-print(fetcher.get_proxies()) # or print(fetcher.pool)
-```
+```python3
+from client.py_cli import ProxyFetcher
+args = dict(host='127.0.0.1', port=6379, password='123456', db=0)
+fetcher = ProxyFetcher('https', strategy='greedy', length=5, redis_args=args)
+print(fetcher.get_proxy())
+print(fetcher.get_proxies())  # or print(fetcher.pool)
+```

 or
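The `strategy='greedy'` argument passed to `ProxyFetcher` above names a proxy-selection strategy. As a rough illustration only (a toy sketch with made-up scores, not haipproxy's implementation), a greedy selection simply returns the best-scored proxy in the pool:

```python
# Toy illustration of a score-greedy proxy choice; the pool contents are
# invented and this is not haipproxy's actual strategy code.
pool = {
    "https://1.2.3.4:8080": 12,  # higher score = historically more reliable
    "https://5.6.7.8:3128": 7,
}

def pick_greedy(pool):
    """Return the proxy with the highest score."""
    return max(pool, key=pool.get)

print(pick_greedy(pool))  # → https://1.2.3.4:8080
```

The trade-off of a greedy choice is that the best proxy gets hammered until its score drops; that is why a client exposes a `strategy` knob at all.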

@@ -107,10 +110,25 @@ print(resp.text)
 just do it at your own risk

 - If there is no Great Fire Wall in your country, set `proxy_mode=0` in both [gfw_spider.py](crawler/spiders/gfw_spider.py) and [ajax_gfw_spider.py](crawler/spiders/ajax_gfw_spider.py).
   If you don't want to crawl some websites, set `enable=0` in [rules.py](config/rules.py)
-- Becase of the Great Fire Wall in China, some proxy ip may can't be used to crawl some websites.You can extend the proxy pool by yourself in [spiders](crawler/spiders)
+- Because of the Great Fire Wall in China, some proxy IPs can't be used to crawl certain websites, such as Google. You can extend the proxy pool yourself in [spiders](crawler/spiders)
 - Issues and PRs are welcome
 - Just star it if it's useful to you

+# Test Result
+Here are the test results for crawling https://zhihu.com with `haipproxy`. The source code can be seen [here](examples/zhihu)
+
+| requests | timestamp | elapsed | strategy | client |
+|----------|-----------|---------|----------|--------|
+| 0 | 2018/03/03 22:03 | 0 | greedy | [py_cli](client/py_cli.py) |
+| 10000 | 2018/03/03 11:03 | 1 hour | greedy | [py_cli](client/py_cli.py) |
+| 20000 | 2018/03/04 00:08 | 2 hours | greedy | [py_cli](client/py_cli.py) |
+| 30000 | 2018/03/04 01:02 | 3 hours | greedy | [py_cli](client/py_cli.py) |
+| 40000 | 2018/03/04 02:15 | 4 hours | greedy | [py_cli](client/py_cli.py) |
+| 50000 | 2018/03/04 03:03 | 5 hours | greedy | [py_cli](client/py_cli.py) |
+| 60000 | 2018/03/04 05:18 | 7 hours | greedy | [py_cli](client/py_cli.py) |
+| 70000 | 2018/03/04 07:11 | 9 hours | greedy | [py_cli](client/py_cli.py) |
+| 80000 | 2018/03/04 08:43 | 11 hours | greedy | [py_cli](client/py_cli.py) |
+
 # Reference
 Thanks to all the contributors of the following projects.
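As a sanity check on the test-result table in the diff above, the implied crawl throughput can be derived from a few checkpoints; this is plain arithmetic on the table's own numbers, nothing more:

```python
# Throughput implied by the zhihu test-result checkpoints:
# cumulative requests divided by elapsed hours.
checkpoints = [(10000, 1), (40000, 4), (80000, 11)]
for requests, hours in checkpoints:
    print(f"{requests // hours} requests/hour")
```

It holds at roughly 10,000 requests/hour through the first four hours, then tapers to about 7,200 requests/hour by hour 11, which is consistent with the widening gaps between the later checkpoints.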
