Possible Memory leak #295

Open

@LiKao

It seems that couchdb-python, or one of the libraries it uses, leaks memory. This can lead to memory exhaustion on big tasks. A simple example using only reads from a server with about 20k documents:

import psutil
import os
import gc

from couchdb import Server

def show_memory():
    process = psutil.Process(os.getpid())
    meminfo = process.memory_info()
    print('Memory usage:')
    print("\tResident: %d (kb)" %(meminfo[0]/1024))
    print("\tVirtual:  %d (kb)" %(meminfo[1]/1024))


url = 'http://localhost:5984/'  # placeholder; any CouchDB instance
server = Server(url)
db = server['testdb']           # placeholder name for the ~20k-document database

print("Before")
show_memory()

for x in db:
    pass

print("After")
show_memory()

print("After collect")
gc.collect()
show_memory()

print("DB deleted")
del db
show_memory()

print("Server deleted")
del server
show_memory()

Output:

Before
Memory usage:
    Resident: 16444 (kb)
    Virtual:  98680 (kb)
After
Memory usage:
    Resident: 18444 (kb)
    Virtual:  102928 (kb)
After collect
Memory usage:
    Resident: 17932 (kb)
    Virtual:  102416 (kb)
DB deleted
Memory usage:
    Resident: 17932 (kb)
    Virtual:  102416 (kb)
Server deleted
Memory usage:
    Resident: 17932 (kb)
    Virtual:  102416 (kb)

That is, the memory is retained even after all references have been deleted. During batch import of large datasets this can lead to memory exhaustion on some systems:

Testcase: import of 50k randomly generated documents, 50 fields per document, 50 bytes per field, with ids generated by Python's uuid4(). The import is done in batches of 20k documents using the db.update(docs) method; each batch is generated, uploaded, and then deleted before the next one is built.
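For reference, here is a minimal sketch of that testcase, assuming a local CouchDB instance and a hypothetical database name; the output below is from the original benchmark run, not from this sketch:

import binascii
import gc
import os
import uuid

import psutil
from couchdb import Server

def show_memory():
    meminfo = psutil.Process(os.getpid()).memory_info()
    print('Memory usage:')
    print("\tResident: %d (kb)" % (meminfo[0] / 1024))
    print("\tVirtual:  %d (kb)" % (meminfo[1] / 1024))

server = Server('http://localhost:5984/')  # placeholder URL
db = server['benchmark']                   # hypothetical database name

DOCS_TOTAL = 50000       # 50k documents in total
BATCH_SIZE = 20000       # uploaded in batches of 20k
FIELDS_PER_DOC = 50      # 50 fields per document
FIELD_BYTES = 50         # 50 bytes per field

def make_doc():
    # one document: uuid4-based _id plus 50 random 50-byte string fields
    doc = {'_id': uuid.uuid4().hex}
    for i in range(FIELDS_PER_DOC):
        doc['field%d' % i] = binascii.hexlify(os.urandom(FIELD_BYTES // 2)).decode('ascii')
    return doc

uploaded = 0
batch_nr = 0
while uploaded < DOCS_TOTAL:
    size = min(BATCH_SIZE, DOCS_TOTAL - uploaded)
    print("Generating batch nr. %d" % batch_nr)
    show_memory()
    docs = [make_doc() for _ in range(size)]
    print("Before Upload")
    show_memory()
    print("Uploading Batch nr %d" % batch_nr)
    db.update(docs)      # the second call is where the MemoryError occurs
    del docs             # batch is dropped before the next one is generated
    gc.collect()
    print("Upload done, docs deleted, gc.collect()")
    show_memory()
    uploaded += size
    batch_nr += 1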

Output:

Generating batch nr. 0
Memory usage:
    Resident: 19096 (kb)
    Virtual:  225364 (kb)
Before Upload
Memory usage:
    Resident: 221952 (kb)
    Virtual:  428276 (kb)
Uploading Batch nr 0
Upload done, docs deleted, gc.collect()
Memory usage:
    Resident: 87528 (kb)
    Virtual:  294720 (kb)
Generating batch nr. 1
Memory usage:
    Resident: 87532 (kb)
    Virtual:  294724 (kb)
Before Upload
Memory usage:
    Resident: 226804 (kb)
    Virtual:  433988 (kb)
Uploading Batch nr 1
Traceback (most recent call last):
  File "./benchmark.py", line 198, in <module>
    do_benchmark()
  File "./benchmark.py", line 153, in do_benchmark
    db.update(docs)
  File "/usr/local/lib/python2.7/site-packages/couchdb/client.py", line 785, in update
    _, _, data = self.resource.post_json('_bulk_docs', body=content)
  File "/usr/local/lib/python2.7/site-packages/couchdb/http.py", line 545, in post_json
    **params)
  File "/usr/local/lib/python2.7/site-packages/couchdb/http.py", line 564, in _request_json
    headers=headers, **params)
  File "/usr/local/lib/python2.7/site-packages/couchdb/http.py", line 560, in _request
    credentials=self.credentials)
  File "/usr/local/lib/python2.7/site-packages/couchdb/http.py", line 261, in request
    body = json.encode(body).encode('utf-8')
  File "/usr/local/lib/python2.7/site-packages/couchdb/json.py", line 69, in encode
    return _encode(obj)
  File "/usr/local/lib/python2.7/site-packages/couchdb/json.py", line 117, in <lambda>
    dumps(obj, allow_nan=False, ensure_ascii=False)
  File "/usr/lib64/python2.7/dist-packages/simplejson/__init__.py", line 386, in dumps
    **kw).encode(obj)
  File "/usr/lib64/python2.7/dist-packages/simplejson/encoder.py", line 275, in encode
    return u''.join(chunks)
MemoryError

That is, the first update works, while the second fails with a MemoryError. This indicates that the problem is not caused by the batch size alone: otherwise either both uploads would fail or both would succeed. Instead, some resource does not seem to be freed correctly between uploads.
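One way to narrow this down (a rough diagnostic sketch, not part of the original report; db and docs stand for the handles from the testcase above) is to snapshot live object counts with the gc module around each upload and diff them:

import gc
from collections import Counter

def live_object_counts():
    # count currently tracked objects by type, forcing a collection first
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())

def diff_after(action):
    # run `action` and report which object types grew afterwards
    before = live_object_counts()
    action()
    after = live_object_counts()
    for name, count in (after - before).most_common(10):
        print("%-20s +%d" % (name, count))

# usage sketch: diff_after(lambda: db.update(docs)) around each upload

Whatever type keeps accumulating between uploads would point at the resource that is not being released.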
