
Commit 8b5c5f9

committed
test
1 parent 5df9c92 commit 8b5c5f9


28 files changed, +498 -20 lines changed

Captcha1/ReadMe.md

Lines changed: 17 additions & 10 deletions
@@ -1,27 +1,34 @@
+### Captcha Recognition Project, Version 1: Captcha1
+
 This project uses Tesseract V3.01 (V3.02 changed the training procedure by adding a shapeclustering step).
 
-Tesseract usage:
+**Tesseract usage:**
 * Set the environment variable TESSDATA_PREFIX = "D:\Tesseract-ocr\", i.e. the directory that contains tessdata. The source code looks under this path for the language data files used for recognition.
 * Command format:
 `tesseract imagename outputbase [-l lang] [-psm pagesegmode] [configfile...]`
 * Recognize digits only:
 `tesseract imagename outputbase -l eng digits`
 * To fix "empty page!!":
-**-psm N**
+**-psm N**
 
-7 = Treat the image as a single text line
-tesseract imagename outputbase -l eng -psm 7
+7 = Treat the image as a single text line
+tesseract imagename outputbase -l eng -psm 7
 * The configfile argument is the name of a file in the tessdata\configs or tessdata\tessconfigs directory:
 `tesseract imagename outputbase -l eng nobatch`
 
 
-**Using the captcha recognition project, method 1:**
-Put the downloaded images in the ./pic directory.
+**Using the captcha recognition project, method 1:**
+
+* Put the downloaded images in the ./pic directory.
 
 Captcha image name: get_random.jpg
-Price image name: get_price_img.png
-Command format:
+Price image name: get_price_img.png
+
+* Command format:
 
 Captcha image recognition: python tess_test.py ./pic/get_random.jpg
-Price image recognition: python tess_test.py ./pic/get_price_img.png
-The recognition result is printed. To keep the result in the temporary text file temp.txt, change "cleanup_scratch_flag = True" to "cleanup_scratch_flag = False" in pytessr_pro.py.
+Price image recognition: python tess_test.py ./pic/get_price_img.png
+
+The recognition result is printed.
+
+To keep the result in the temporary text file **temp.txt**, change "**cleanup_scratch_flag = True**" to "**cleanup_scratch_flag = False**" in pytessr_pro.py.
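The Tesseract options listed above (`-l lang`, `-psm N`, trailing config-file names such as `digits` or `nobatch`) compose into a single argument list. The Captcha1 project drives Tesseract from Python. The Java sketch below is only an illustration of how such a command line might be assembled before being handed to `ProcessBuilder`. The class `TesseractCommand` and its `build` method are hypothetical helpers, not part of Captcha1. The sketch assumes the Tesseract 3.0x flag spelling `-psm`; 4.x and later releases spell it `--psm`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: assembles the Tesseract 3.0x command lines from the ReadMe. */
public class TesseractCommand {

    /**
     * Builds: tesseract imagename outputbase [-l lang] [-psm N] [configfile...]
     * A null lang or psm means "leave that option out".
     */
    public static List<String> build(String image, String outputBase,
                                     String lang, Integer psm, String... configFiles) {
        List<String> cmd = new ArrayList<>();
        cmd.add("tesseract");
        cmd.add(image);
        cmd.add(outputBase);            // Tesseract appends ".txt" to this base name
        if (lang != null) {
            cmd.add("-l");
            cmd.add(lang);
        }
        if (psm != null) {
            cmd.add("-psm");            // 3.0x spelling; newer versions use "--psm"
            cmd.add(psm.toString());
        }
        for (String cfg : configFiles) {
            cmd.add(cfg);               // file name under tessdata\configs, e.g. "digits"
        }
        return cmd;
    }

    public static void main(String[] args) {
        // Single text line (-psm 7), digits only, result written to out.txt
        List<String> cmd = build("pic/get_random.jpg", "out", "eng", 7, "digits");
        System.out.println(String.join(" ", cmd));
        // To actually run it (requires tesseract on PATH):
        // new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }
}
```

Passing the arguments as a list rather than as one string avoids shell-quoting problems with Windows paths such as `D:\Tesseract-ocr\`.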

NewsSpider/ReadMe.md

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,9 @@
 ### Web Crawlers, the Most Basic Crawler: Scraping the [NetEase News Rankings](http://news.163.com/rank/)
 
-Notes:
+**Notes:**
+
 * Pages are fetched with the urllib2 or requests package.
+
 * First-level pages are parsed with regular expressions, and second-level pages with XPath.
+
 * The extracted titles and links are saved to a local file.
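The NewsSpider project itself is Python. The following Java sketch illustrates only the "regular expression on the first-level page, then save titles and links locally" step. The HTML shape it matches (`<a href="http://news.163.com/...">title</a>`) is an assumption about the ranking page, and `RankPageParser` is a hypothetical name. The second-level XPath step is left out: `javax.xml.xpath` needs well-formed XML, and real HTML usually is not.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical first-level parser for a news ranking page. */
public class RankPageParser {
    // Assumed markup: <a href="http://news.163.com/...">title</a>, extra attributes allowed
    private static final Pattern LINK = Pattern.compile(
            "<a\\s+href=\"(https?://news\\.163\\.com/[^\"]+)\"[^>]*>([^<]+)</a>");

    /** Returns one "title<TAB>link" entry per matching anchor, in page order. */
    public static List<String> extract(String html) {
        List<String> rows = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            rows.add(m.group(2).trim() + "\t" + m.group(1));
        }
        return rows;
    }

    /** Saves the entries to a local UTF-8 text file, one per line. */
    public static void save(List<String> rows, Path out) throws IOException {
        Files.write(out, rows, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for a downloaded page
        String sample = "<td><a href=\"http://news.163.com/15/0101/a.html\">Headline A</a></td>"
                + "<td><a href=\"http://news.163.com/15/0101/b.html\" target=\"_blank\">Headline B</a></td>";
        List<String> rows = extract(sample);
        rows.forEach(System.out::println);
        // save(rows, java.nio.file.Paths.get("news_rank.txt"));  // output file name is hypothetical
    }
}
```

Regular expressions are adequate for a flat list page like this. They become brittle on nested markup, which is presumably why the project switches to XPath for the article pages.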

QunarSpider/ReadMe.md

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,9 @@
 ### Web Crawling with Selenium, Logging In through a Proxy: Scraping the [Qunar](http://flight.qunar.com/) Website
 
-Notes:
+**Notes:**
+
 * Selenium drives a browser to log in and to handle pagination.
+
 * Proxies can be stored in a file, which the program reads and uses.
+
 * Multi-process crawling is supported.
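The notes say proxies are kept in a file that the program reads. The QunarSpider code and its file format are not shown in this commit, so the Java sketch below assumes a plain `host:port` per line, with blank lines and `#` comments ignored. The class `ProxyPool` is hypothetical. It hands out `java.net.Proxy` objects round-robin, and the counter is atomic so several worker threads can share one pool. This is a thread-based stand-in for the project's multi-process setup, and it does not use Selenium's own proxy API.

```java
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical round-robin pool built from "host:port" lines. */
public class ProxyPool {
    private final List<Proxy> proxies;
    private final AtomicInteger next = new AtomicInteger();

    public ProxyPool(List<String> lines) {
        List<Proxy> parsed = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;   // skip blanks and comments
            int colon = line.lastIndexOf(':');
            if (colon < 0) throw new IllegalArgumentException("expected host:port, got " + line);
            String host = line.substring(0, colon);
            int port = Integer.parseInt(line.substring(colon + 1));
            // createUnresolved: no DNS lookup at load time
            parsed.add(new Proxy(Proxy.Type.HTTP, InetSocketAddress.createUnresolved(host, port)));
        }
        if (parsed.isEmpty()) throw new IllegalArgumentException("no proxies configured");
        this.proxies = parsed;
    }

    /** Returns the next proxy in rotation; safe to call from several threads. */
    public Proxy nextProxy() {
        int i = Math.floorMod(next.getAndIncrement(), proxies.size());
        return proxies.get(i);
    }

    public int size() {
        return proxies.size();
    }

    public static void main(String[] args) {
        // In practice the lines might come from Files.readAllLines(Paths.get("proxies.txt"));
        // that file name is hypothetical.
        ProxyPool pool = new ProxyPool(Arrays.asList("# host:port per line", "127.0.0.1:8080", "127.0.0.1:8888"));
        for (int i = 0; i < 3; i++) {
            System.out.println(pool.nextProxy());
        }
    }
}
```

A `Proxy` from this pool can be passed to `URL.openConnection(Proxy)` for plain HTTP fetches. `floorMod` keeps the index valid even after the counter overflows.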

ReadMe.md

Lines changed: 1 addition & 1 deletion
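@@ -224,7 +224,7 @@ Selenium is an automated testing tool. It can drive a browser, including
 
 The open-source Tesseract-OCR system can be used to download and recognize captcha images, and the recognized characters are then passed to the crawler to simulate login. Alternatively, the captcha image can be uploaded to a captcha-solving platform for recognition. If recognition fails, fetch a new captcha and try again until it succeeds.
 
-Reference project: [Captcha1](https://github.com/lining0806/PythonSpiderNotes/tree/master/Captcha1)
+Reference project: [Captcha Recognition Project, Version 1: Captcha1](https://github.com/lining0806/PythonSpiderNotes/tree/master/Captcha1)
 
 **Two issues to watch out for when crawling:**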
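The "fetch a new captcha and retry until it succeeds" loop can be sketched in Java. The fetch, OCR, and login steps are passed in as functions, so no real site or OCR engine is assumed. `CaptchaLogin` and `loginWithRetry` are hypothetical names. Unlike the open-ended "until it succeeds" in the text, the sketch caps the number of attempts. That cap is a design choice added here so a broken recognizer cannot loop forever.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical captcha login loop with pluggable fetch / OCR / login steps. */
public class CaptchaLogin {

    /**
     * Each round fetches a fresh captcha, recognizes it and tries to log in.
     * Returns the attempt number that succeeded, or empty after maxAttempts failures.
     */
    public static <I> Optional<Integer> loginWithRetry(Supplier<I> fetchCaptcha,
                                                       Function<I, String> recognize,
                                                       Predicate<String> tryLogin,
                                                       int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            I image = fetchCaptcha.get();            // never reuse a captcha that already failed
            String text = recognize.apply(image);    // e.g. a Tesseract or captcha-platform result
            if (text != null && !text.isEmpty() && tryLogin.test(text)) {
                return Optional.of(attempt);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Stubs: captchas are simulated as strings, and the "OCR" misreads the first two.
        int[] counter = {0};
        Supplier<String> fetch = () -> "captcha-" + (++counter[0]);
        Function<String, String> ocr = img -> img.endsWith("3") ? "ABCD" : "AB?D";
        Predicate<String> login = text -> text.equals("ABCD");
        System.out.println(loginWithRetry(fetch, ocr, login, 5)); // prints Optional[3]
    }
}
```

Empty OCR output is rejected before calling `tryLogin`, which saves a login request when recognition clearly failed.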

Spider_Java/README.md

Lines changed: 4 additions & 2 deletions
@@ -1,3 +1,5 @@
-# Spider
+### Spider_Java
+
 Target site: Wallstreetcn (华尔街见闻) http://live.wallstreetcn.com/
-Single-threaded crawling
+Single-threaded crawling: Spider_Java1
+Multi-threaded crawling: Spider_Java2
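The crawler sources of Spider_Java1 and Spider_Java2 are not shown in this commit, so the following is only an illustration of the single- versus multi-threaded distinction. It is not the project's code. The sketch submits every URL to a fixed-size `ExecutorService` and collects results in input order. With `threads == 1` it behaves like a single-threaded crawler. `ParallelCrawler` is hypothetical, and the fetch function is a stub parameter where a real HTTP download would go.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical crawler skeleton: one task per URL on a fixed thread pool. */
public class ParallelCrawler {

    /** Fetches all URLs with `threads` workers; the result map keeps input order. */
    public static Map<String, String> crawl(List<String> urls, int threads,
                                            Function<String, String> fetch)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String url : urls) {
                futures.add(pool.submit(() -> fetch.apply(url)));   // Callable<String>
            }
            Map<String, String> results = new LinkedHashMap<>();
            for (int i = 0; i < urls.size(); i++) {
                results.put(urls.get(i), futures.get(i).get());     // blocks until that task finishes
            }
            return results;
        } finally {
            pool.shutdown();   // let worker threads exit once the queue drains
        }
    }

    public static void main(String[] args) throws Exception {
        // Stub fetcher (hypothetical): a real one would download the page body.
        Function<String, String> fakeFetch =
                url -> Thread.currentThread().getName() + " fetched " + url;
        List<String> urls = Arrays.asList("page-1", "page-2", "page-3", "page-4");
        crawl(urls, 2, fakeFetch).forEach((u, body) -> System.out.println(u + " -> " + body));
    }
}
```

Collecting through the `Future` list rather than a shared collection means the worker threads never write to common state, so no extra locking is needed.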
File renamed without changes.
File renamed without changes.

Spider_Java/Spider_Java2/.classpath

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<classpath>
	<classpathentry kind="src" path="src"/>
	<classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
	<classpathentry kind="lib" path="lib/mongo-java-driver-2.13.0-rc1.jar"/>
	<classpathentry kind="output" path="bin"/>
</classpath>

Spider_Java/Spider_Java2/.project

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
<?xml version="1.0" encoding="UTF-8"?>
<projectDescription>
	<name>Spider</name>
	<comment></comment>
	<projects>
	</projects>
	<buildSpec>
		<buildCommand>
			<name>org.eclipse.jdt.core.javabuilder</name>
			<arguments>
			</arguments>
		</buildCommand>
	</buildSpec>
	<natures>
		<nature>org.eclipse.jdt.core.javanature</nature>
	</natures>
</projectDescription>
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
/**
 *
 */
package synchronizetest;

/**
 * @author FIRELING
 *
 */
public class Test
{
    public static void main(String[] args)
    {
        Reservoir r = new Reservoir(100);
        // each Booth starts its own thread in its constructor
        Booth b1 = new Booth(r);
        Booth b2 = new Booth(r);
        Booth b3 = new Booth(r);
    }
}
/**
 * contain shared resource
 */
class Reservoir {
    private int total;
    public Reservoir(int t)
    {
        this.total = t;
    }
    /**
     * Thread safe method
     * serialized access to Reservoir.total
     */
    public synchronized boolean sellTicket() // the synchronized modifier locks the whole method on this Reservoir
    {
        if(this.total > 0) {
            this.total = this.total-1;
            return true; // successfully sell one
        }
        else {
            return false; // no more tickets
        }
    }
}
/**
 * create new thread by inheriting Thread
 */
class Booth extends Thread {
    private static int threadID = 0; // owned by Class object

    private Reservoir release; // sell this reservoir
    private int count = 0; // owned by this thread object
    /**
     * constructor
     */
    public Booth(Reservoir r) {
        super("ID:"+(++threadID));
        this.release = r; // all threads share the same reservoir
        this.start();
    }
    /**
     * convert object to string
     */
    @Override
    public String toString() {
        return super.getName();
    }
    /**
     * what does the thread do?
     */
    @Override
    public void run() {
        while(true) { // loop until the reservoir is sold out
            if(this.release.sellTicket()) {
                this.count = this.count+1;
                System.out.println(this.getName()+":sell 1");
                try {
                    // random pause of 0-99 ms; the cast must wrap the whole product,
                    // since (int) Math.random() alone is always 0
                    sleep((int) (Math.random()*100));
                    // sleep(100); // with equal sleep times, every booth sells roughly the same number of tickets
                }
                catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
            else {
                break;
            }
        }
        System.out.println(this.getName()+" I sold:"+count);
    }
}
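Test.java protects `Reservoir.total` with a `synchronized` method. One common alternative, swapped in here for comparison rather than taken from this commit, is a lock-free compare-and-set loop on `AtomicInteger`. The sketch below also joins the booth threads so the per-booth counts can be checked afterwards: however the threads interleave, the counts should sum to exactly the starting number of tickets. `TicketSale` and `AtomicReservoir` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical lock-free variant of the Reservoir/Booth example. */
public class TicketSale {

    static class AtomicReservoir {
        private final AtomicInteger total;

        AtomicReservoir(int t) {
            total = new AtomicInteger(t);
        }

        /** "Decrement if positive" as one atomic step, using a CAS retry loop. */
        boolean sellTicket() {
            while (true) {
                int current = total.get();
                if (current <= 0) return false;                              // sold out
                if (total.compareAndSet(current, current - 1)) return true;  // this thread won
                // another booth changed total in between; re-read and retry
            }
        }
    }

    /** Runs `booths` threads against one reservoir; returns how many tickets each sold. */
    public static int[] run(int tickets, int booths) throws InterruptedException {
        AtomicReservoir r = new AtomicReservoir(tickets);
        int[] sold = new int[booths];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < booths; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                while (r.sellTicket()) {
                    sold[id]++;          // each slot is written by exactly one thread
                }
            }, "ID:" + (i + 1));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();                    // join() also makes the sold[] writes visible here
        }
        return sold;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] sold = run(100, 3);
        int sum = 0;
        for (int i = 0; i < sold.length; i++) {
            System.out.println("ID:" + (i + 1) + " I sold:" + sold[i]);
            sum += sold[i];
        }
        System.out.println("total sold: " + sum); // prints "total sold: 100"
    }
}
```

The per-booth split varies from run to run, but the total is always 100. The CAS loop can never move `total` below zero, which is the same guarantee `synchronized` gives in the original.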
