From b07441fe9bc9822110265fc2c367ae3236443cd7 Mon Sep 17 00:00:00 2001
From: Geoffrey Sneddon
Date: Sat, 27 Apr 2013 22:05:02 +0100
Subject: [PATCH 1/4] Update README, and move to using reST. Fixes #5, #22.

---
 README     |  39 ------------------
 README.rst | 114 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 114 insertions(+), 39 deletions(-)
 delete mode 100644 README
 create mode 100644 README.rst

diff --git a/README b/README
deleted file mode 100644
index 12a48f30..00000000
--- a/README
+++ /dev/null
@@ -1,39 +0,0 @@
-html5lib is a pure-python library for parsing HTML. It is designed to
-conform to the HTML 5 specification, which has formalized the error handling
-algorithms of popular web browsers.
-
- = Installation =
-
-html5lib is packaged with distutils. To install it use:
- $ python setup.py install
-
- = Tests =
-
-You may wish to check that your installation has been a success by
-running the testsuite. All the tests can be run by invoking
-runtests.py in the html5lib/tests/ directory
-
- = Usage =
-
-Simple usage follows this pattern:
-
-import html5lib
-f = open("mydocument.html")
-parser = html5lib.HTMLParser()
-document = parser.parse(f)
-
-
-More documentation is avaliable in the docstrings or from
-http://code.google.com/p/html5lib/wiki/UserDocumentation
-
- = Bugs =
-
-Please report any bugs on the issue tracker:
-http://code.google.com/p/html5lib/issues/list
-
- = Get Involved =
-
-Contributions to code or documenation are actively encouraged. Submit
-patches to the issue tracker or discuss changes on irc in the #whatwg
-channel on freenode.net
-
diff --git a/README.rst b/README.rst
new file mode 100644
index 00000000..e923619c
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,114 @@
+html5lib
+========
+
+html5lib is a pure-python library for parsing HTML. It is designed to
+conform to the HTML specification, which has formalized the error
+handling algorithms of legacy web browsers, and is now implemented by
+all major web browsers.
+
+
+Requirements
+------------
+
+Python 2.6 and above (including 3) are supported. Implementations
+known to work are CPython (as the reference implementation) and
+PyPy. Jython is known *not* to work due to various bugs in its
+implementation of the language. Others such as IronPython may or may
+not work; if you wish to try, you are strongly recommended to run the
+testsuite and report back!
+
+The only required library dependency is ``six``, this can be found
+packaged in PyPi.
+
+Optionally:
+
+ - ``datrie`` can be used to improve parsing performance (though in
+   almost all cases the improvement is trivial);
+
+ - ``lxml`` is supported as a tree format (for both building and
+   walking) under CPython (but *not* PyPy where it is known to cause
+   segfaults);
+
+ - ``genshi`` has a treewalker (but not builder); and
+
+ - ``chardet`` (note currently this is only packaged on PyPi for
+   Python 2, though several package managers include unofficial ports
+   to Python 3) can be used as a fallback when character encoding
+   cannot be determined.
+
+
+Installation
+------------
+
+html5lib is packaged with distutils. To install it use::
+
+  $ python setup.py install
+
+
+Usage
+-----
+
+Simple usage follows this pattern::
+
+  import html5lib
+  with open("mydocument.html", "r") as fp:
+      document = html5lib.parse(fp)
+
+or::
+
+  import html5lib
+  document = html5lib.parse("Hello World!")
+
+More documentation is available in the docstrings.
+
+
+Bugs
+----
+
+Please report any bugs on the `issue tracker
+`_.
+
+
+Tests
+-----
+
+These are nowadays contained in the html5lib-tests repository and
+included as a submodule, thus for git checkouts they must be
+initialized (for release tarballs this is unneeded)::
+
+  $ git submodule init
+  $ git submodule update
+
+And then they can be run once ``nose`` has been installed with
+``nosetests``. All should pass.
+
+
+Contributing
+------------
+
+Pull requests are more than welcome — both to the library and to the
+documentation. Some useful information:
+
+ - We aim to follow PEP 8 in the library, but ignoring the
+   79-character-per-line limit, instead following a soft limit of 99,
+   but allowing lines over this where it is the readable thing to do.
+
+ - We keep pyflakes reporting no errors or warnings at all times.
+
+ - We keep the master branch passing all tests at all times on all
+   supported versions.
+
+Travis CI is run against all pull requests and should enforce all of
+the above.
+
+We also use an external code-review tool, which uses your GitHub login
+to authenticate. You'll get emails for changes on the review.
+
+
+Questions?
+----------
+
+There's a mailing list available for support on Google Groups,
+`html5lib-discuss `_,
+though you may have more success (and get a far quicker response)
+asking on IRC in #whatwg on irc.freenode.net.

From 299538efe62d2ef101cf9656042b615a62bb4d14 Mon Sep 17 00:00:00 2001
From: Geoffrey Sneddon
Date: Sat, 27 Apr 2013 22:06:54 +0100
Subject: [PATCH 2/4] Ensure README and requirements files are included in sdist.

---
 MANIFEST.in | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MANIFEST.in b/MANIFEST.in
index 0bad7a6c..33b31140 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -1,3 +1,5 @@
 include LICENSE
+include README.rst
+include requirements*.txt
 graft html5lib/tests/testdata
 recursive-include html5lib/tests *.py

From 576be46074bc8d2729fefa16eb7c2729ff66c458 Mon Sep 17 00:00:00 2001
From: Geoffrey Sneddon
Date: Sat, 27 Apr 2013 22:10:46 +0100
Subject: [PATCH 3/4] fixup! Update README, and move to using reST. Fixes #5, #22.

---
 README.rst | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/README.rst b/README.rst
index e923619c..b057749e 100644
--- a/README.rst
+++ b/README.rst
@@ -22,19 +22,19 @@ packaged in PyPi.
 
 Optionally:
 
- - ``datrie`` can be used to improve parsing performance (though in
-   almost all cases the improvement is trivial);
+- ``datrie`` can be used to improve parsing performance (though in
+  almost all cases the improvement is trivial);
 
- - ``lxml`` is supported as a tree format (for both building and
-   walking) under CPython (but *not* PyPy where it is known to cause
-   segfaults);
+- ``lxml`` is supported as a tree format (for both building and
+  walking) under CPython (but *not* PyPy where it is known to cause
+  segfaults);
 
- - ``genshi`` has a treewalker (but not builder); and
+- ``genshi`` has a treewalker (but not builder); and
 
- - ``chardet`` (note currently this is only packaged on PyPi for
-   Python 2, though several package managers include unofficial ports
-   to Python 3) can be used as a fallback when character encoding
-   cannot be determined.
+- ``chardet`` (note currently this is only packaged on PyPi for
+  Python 2, though several package managers include unofficial ports
+  to Python 3) can be used as a fallback when character encoding
+  cannot be determined.
 
 
 Installation
@@ -89,14 +89,14 @@ Contributing
 Pull requests are more than welcome — both to the library and to the
 documentation. Some useful information:
 
- - We aim to follow PEP 8 in the library, but ignoring the
-   79-character-per-line limit, instead following a soft limit of 99,
-   but allowing lines over this where it is the readable thing to do.
+- We aim to follow PEP 8 in the library, but ignoring the
+  79-character-per-line limit, instead following a soft limit of 99,
+  but allowing lines over this where it is the readable thing to do.
 
- - We keep pyflakes reporting no errors or warnings at all times.
+- We keep pyflakes reporting no errors or warnings at all times.
 
- - We keep the master branch passing all tests at all times on all
-   supported versions.
+- We keep the master branch passing all tests at all times on all
+  supported versions.
 
 Travis CI is run against all pull requests and should enforce all of
 the above.

From 58f577e213bc4c11bff0d094f02b8948d55c3360 Mon Sep 17 00:00:00 2001
From: Geoffrey Sneddon
Date: Sun, 28 Apr 2013 09:59:45 +0100
Subject: [PATCH 4/4] fixup! Update README, and move to using reST. Fixes #5, #22.

---
 README.rst | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/README.rst b/README.rst
index b057749e..1c25df37 100644
--- a/README.rst
+++ b/README.rst
@@ -2,20 +2,19 @@ html5lib
 ========
 
 html5lib is a pure-python library for parsing HTML. It is designed to
-conform to the HTML specification, which has formalized the error
-handling algorithms of legacy web browsers, and is now implemented by
-all major web browsers.
+conform to the HTML specification, as is implemented by all major web
+browsers.
 
 
 Requirements
 ------------
 
-Python 2.6 and above (including 3) are supported. Implementations
-known to work are CPython (as the reference implementation) and
-PyPy. Jython is known *not* to work due to various bugs in its
-implementation of the language. Others such as IronPython may or may
-not work; if you wish to try, you are strongly recommended to run the
-testsuite and report back!
+Python 2.6 and above as well as Python 3.0 and above are
+supported. Implementations known to work are CPython (as the reference
+implementation) and PyPy. Jython is known *not* to work due to various
+bugs in its implementation of the language. Others such as IronPython
+may or may not work; if you wish to try, you are strongly encouraged
+to run the testsuite and report back!
 
 The only required library dependency is ``six``, this can be found
 packaged in PyPi.
@@ -23,7 +22,7 @@ packaged in PyPi.
 Optionally:
 
 - ``datrie`` can be used to improve parsing performance (though in
-  almost all cases the improvement is trivial);
+  almost all cases the improvement is marginal);
 
 - ``lxml`` is supported as a tree format (for both building and
   walking) under CPython (but *not* PyPy where it is known to cause
@@ -31,10 +30,10 @@ Optionally:
 
 - ``genshi`` has a treewalker (but not builder); and
 
-- ``chardet`` (note currently this is only packaged on PyPi for
+- ``chardet`` can be used as a fallback when character encoding cannot
+  be determined (note currently this is only packaged on PyPi for
   Python 2, though several package managers include unofficial ports
-  to Python 3) can be used as a fallback when character encoding
-  cannot be determined.
+  to Python 3).
 
 
 Installation
@@ -72,15 +71,15 @@ Please report any bugs on the `issue tracker
 Tests
 -----
 
-These are nowadays contained in the html5lib-tests repository and
-included as a submodule, thus for git checkouts they must be
-initialized (for release tarballs this is unneeded)::
+These are contained in the html5lib-tests repository and included as a
+submodule, thus for git checkouts they must be initialized (for
+release tarballs this is unneeded)::
 
   $ git submodule init
   $ git submodule update
 
-And then they can be run once ``nose`` has been installed with
-``nosetests``. All should pass.
+And then they can be run, with ``nose`` installed, using the
+``nosetests`` command in the root directory. All should pass.
 
 
 Contributing