Commit 77dcb1c

author
codebasics
committed
nltk vs spacy
1 parent e33dd73 commit 77dcb1c

1 file changed: +238 -0 lines changed

2_nltk_vs_spacy/Spacy vs NLTK.ipynb

Lines changed: 238 additions & 0 deletions
@@ -0,0 +1,238 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<h3>Installation instructions</h3>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "pip install spacy\n",
+    "\n",
+    "python -m spacy download en_core_web_sm"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "pip install nltk"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<h3>Sentence & Word Tokenization In spaCy</h3>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import spacy"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "nlp = spacy.load(\"en_core_web_sm\")\n",
+    "\n",
+    "doc = nlp(\"Dr. Strange loves pav bhaji of mumbai. Hulk loves chat of delhi\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Dr. Strange loves pav bhaji of mumbai.\n",
+      "Hulk loves chat of delhi\n"
+     ]
+    }
+   ],
+   "source": [
+    "for sentence in doc.sents:\n",
+    "    print(sentence)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Dr.\n",
+      "Strange\n",
+      "loves\n",
+      "pav\n",
+      "bhaji\n",
+      "of\n",
+      "mumbai\n",
+      ".\n",
+      "Hulk\n",
+      "loves\n",
+      "chat\n",
+      "of\n",
+      "delhi\n"
+     ]
+    }
+   ],
+   "source": [
+    "for sentence in doc.sents:\n",
+    "    for word in sentence:\n",
+    "        print(word)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "<h3>Sentence & Word Tokenization In NLTK</h3>"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "[nltk_data] Downloading package punkt to\n",
+      "[nltk_data] C:\\Users\\dhava\\AppData\\Roaming\\nltk_data...\n",
+      "[nltk_data] Unzipping tokenizers\\punkt.zip.\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "True"
+      ]
+     },
+     "execution_count": 4,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from nltk.tokenize import sent_tokenize\n",
+    "import nltk\n",
+    "nltk.download('punkt')"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['Dr.', 'Strange loves pav bhaji of mumbai.', 'Hulk loves chat of delhi']"
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "sent_tokenize(\"Dr. Strange loves pav bhaji of mumbai. Hulk loves chat of delhi\")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from nltk.tokenize import word_tokenize"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "['Dr',\n",
+       " '.',\n",
+       " 'Strange',\n",
+       " 'loves',\n",
+       " 'pav',\n",
+       " 'bhaji',\n",
+       " 'of',\n",
+       " 'mumbai',\n",
+       " '.',\n",
+       " 'Hulk',\n",
+       " 'loves',\n",
+       " 'chat',\n",
+       " 'of',\n",
+       " 'delhi']"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "word_tokenize(\"Dr. Strange loves pav bhaji of mumbai. Hulk loves chat of delhi\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**As the code above shows, spaCy is object-oriented (it returns `Doc`, `Span`, and `Token` objects), whereas NLTK is mainly a string-processing library (its tokenizers return plain lists of strings)**"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
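
The notebook's closing remark contrasts an object-oriented API with a string-processing one. The sketch below illustrates that design difference using only the standard library, so it runs without spaCy or NLTK installed. Everything in it is an assumption made for illustration: the `Token` class, its `start` and `is_punct` members, and the regex rule are hypothetical stand-ins, not the actual implementation of either library (spaCy, for instance, keeps `Dr.` as a single token, which this simple rule does not).

```python
import re
from dataclasses import dataclass

TEXT = "Dr. Strange loves pav bhaji of mumbai. Hulk loves chat of delhi"

# Stand-in tokenizer rule (an assumption, far simpler than either library):
# a run of word characters, or a single non-space punctuation character.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_to_strings(text):
    """String-processing style: a string goes in, plain strings come out."""
    return TOKEN_PATTERN.findall(text)


@dataclass(frozen=True)
class Token:
    """Hypothetical token object; not spaCy's actual Token class."""
    text: str
    start: int  # character offset of the token in the original text

    @property
    def is_punct(self):
        # True when the token contains no letters or digits.
        return not any(ch.isalnum() for ch in self.text)


def tokenize_to_objects(text):
    """Object-oriented style: each token carries its own metadata."""
    return [Token(m.group(), m.start()) for m in TOKEN_PATTERN.finditer(text)]


strings = tokenize_to_strings(TEXT)
tokens = tokenize_to_objects(TEXT)

# With objects, filtering uses token attributes instead of re-inspecting strings.
words_only = [t.text for t in tokens if not t.is_punct]

print(strings)
print(words_only)
# Each token object can be mapped back to its position in the source text.
print(TEXT[tokens[2].start:tokens[2].start + len(tokens[2].text)])
```

With plain strings, any later step (dropping punctuation, locating a word in the original text) has to recompute information from the string itself. With token objects, that information travels with the token, which is the trade-off the notebook's remark points at.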
