diff --git a/.github/workflows/test_and_build.yaml b/.github/workflows/test_and_build.yaml new file mode 100644 index 0000000..b29de88 --- /dev/null +++ b/.github/workflows/test_and_build.yaml @@ -0,0 +1,31 @@ +name: Tests + +on: [push, pull_request] + +jobs: + build: + + runs-on: ubuntu-latest + strategy: + matrix: + python-version: ["3.8", "3.9", "3.10", "3.11", "3.12"] + + steps: + - uses: actions/checkout@v3 + - name: Set up Python ${{ matrix.python-version }} + uses: actions/setup-python@v4 + with: + python-version: ${{ matrix.python-version }} + - name: Install dependencies + run: | + python -m pip install --upgrade pip + pip install -r test_requirements.txt + - name: Lint with flake8 + run: | + make lint + - name: Check typing with mypy + run: | + make typing + - name: Test with pytest + run: | + make test diff --git a/.travis.yml b/.travis.yml deleted file mode 100644 index 0ce2828..0000000 --- a/.travis.yml +++ /dev/null @@ -1,21 +0,0 @@ -os: linux -dist: xenial -language: python -cache: pip - -python: -#- '2.7' -- '3.6' -- '3.7' -- '3.8' -- '3.9' - -install: -- make install - -script: -- make lint -- make test - -after_success: -- bash <(curl -s https://codecov.io/bash) diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 0000000..b2d3c55 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,168 @@ +2.2.0 (June 20, 2024) +--------------------- + +* Email addresses with internationalized local parts could, with rare Unicode characters, be returned as valid but actually be invalid in their normalized form (returned in the `normalized` field). In particular, it is possible to get a normalized address with a ";" character, which is not valid and could change the interpretation of the address. 
Local parts are now re-validated after Unicode NFC normalization to ensure that invalid characters cannot be injected into the normalized address and that characters with length-increasing NFC normalizations cannot cause a local part to exceed the maximum length after normalization. Thanks to khanh@calif.io from https://calif.io for reporting the issue. +* The length check for email addresses with internationalized local parts is now also applied to the original address string prior to Unicode NFC normalization, which may be longer and could exceed the maximum email address length, to protect callers who do not use the returned normalized address. +* Improved error message for IDNA domains that are too long or have invalid characters after Unicode normalization. +* A new option to parse `My Name ` strings, i.e. a display name plus an email address in angle brackets, is now available. It is off by default. +* Improvements to Python typing. +* Some additional tests added. + +2.1.2 (June 16, 2024) +--------------------- + +* The domain name length limit is corrected from 255 to 253 IDNA ASCII characters. I misread the RFCs. +* When a domain name has no MX record but does have an A or AAAA record, if none of the IP addresses in the response are globally reachable (i.e. not Private-Use, Loopback, etc.), the response is treated as if there was no A/AAAA response and the email address will fail the deliverability check. +* When a domain name has no MX record but does have an A or AAAA record, the mx field in the object returned by validate_email incorrectly held the IP addresses rather than the domain itself. +* Fixes in tests. + +2.1.1 (February 26, 2024) +------------------------- + +* Fixed typo 'marking' instead of 'marketing' in case-insensitive mailbox name list. +* When DNS-based deliverability checks fail, in some cases exceptions are now thrown with `raise ... from` for better nested exception tracking. +* Fixed tests to work when no local resolver can be configured. 
+* This project is now licensed under the Unlicense (instead of CC0). +* Minor improvements to tests. +* Minor improvements to code style. + +2.1.0 (October 22, 2023) +------------------------ + +* Python 3.8+ is now required (support for Python 3.7 was dropped). +* The old `email` field on the returned `ValidatedEmail` object, which in the previous version was superseded by `normalized`, will now raise a deprecation warning if used. See https://stackoverflow.com/q/879173 for strategies to suppress the DeprecationWarning. +* A `__version__` module attribute is added. +* The email address argument to validate_email is now marked as positional-only to better reflect the documented usage using the new Python 3.8 feature. + +2.0.0 (April 15, 2023) +---------------------- + +This is a major update to the library, but since email address specs haven't changed there should be no significant changes to which email addresses are considered valid or invalid with default options. There are new options for accepting unusual email addresses that were previously always rejected, some changes to how DNS errors are handled, many changes in error message text, and major internal improvements including the addition of type annotations. Python 3.7+ is now required. Details follow: + +* Python 2.x and 3.x versions through 3.6, and dnspython 1.x, are no longer supported. Python 3.7+ with dnspython 2.x are now required. +* The dnspython package is no longer required if DNS checks are not used, although it will install automatically. +* NoNameservers and NXDOMAIN DNS errors are now handled differently: NoNameservers no longer fails validation, and NXDOMAIN now skips checking for an A/AAAA fallback and goes straight to failing validation. +* Some syntax error messages have changed because they are now checked explicitly rather than as a part of other checks. +* The quoted-string local part syntax (e.g. multiple @-signs, spaces, etc. 
if surrounded by quotes) and domain-literal addresses (e.g. @[192.XXX...] or @[IPv6:...]) are now parsed but not considered valid by default. Better error messages are now given for these addresses since it can be confusing for a technically valid address to be rejected, and new allow_quoted_local and allow_domain_literal options are added to allow these addresses if you really need them. +* Some other error messages have changed to not repeat the email address in the error message. +* The `email` field on the returned `ValidatedEmail` object has been renamed to `normalized` to be clearer about its importance, but access via `.email` is also still supported. +* Some mailbox names like `postmaster` are now normalized to lowercase per RFC 2142. +* The library has been reorganized internally into smaller modules. +* The tests have been reorganized and expanded. Deliverability tests now mostly use captured DNS responses so they can be run off-line. +* The __main__ tool now reads options to validate_email from environment variables. +* Type annotations have been added to the exported methods and the ValidatedEmail class and some internal methods. +* The old dict-like pattern for the return value of validate_email is deprecated. + +Versions 2.0.0.post1 and 2.0.0.post2 corrected some packaging issues. 2.0.0.post2 also added a check for an invalid combination of arguments. + +Version 1.3.1 (January 21, 2023) +-------------------------------- + +* The new SPF 'v=spf1 -all' (reject-all) deliverability check is removed in most cases. It now is performed only for domains that do not have MX records but do have an A/AAAA fallback record. + +Version 1.3.0 (September 18, 2022) +---------------------------------- + +* Deliverability checks now check for 'v=spf1 -all' SPF records as a way to reject more bad domains. +* Special use domain names now raise EmailSyntaxError instead of EmailUndeliverableError since they are performed even if check_deliverability is off. 
+* New module-level attributes are added to override the default values of the keyword arguments and the special-use domains list. +* The keyword arguments of the public methods are now marked as keyword-only, ending support for Python 2.x. +* [pyIsEmail](https://github.com/michaelherold/pyIsEmail)'s test cases are added to the tests. +* Recommend that check_deliverability be set to False for validation on login pages. +* Added an undocumented globally_deliverable option. + +Version 1.2.1 (May 1, 2022) +--------------------------- + +* example.com/net/org are removed from the special-use reserved domain names list so that they do not raise exceptions if check_deliverability is off. +* Improved README. + +Version 1.2.0 (April 24, 2022) +------------------------------ + +* Reject domains with NULL MX records (when deliverability checks + are turned on). +* Reject unsafe unicode characters. (Some of these checks you should + be doing on all of your user inputs already!) +* Reject most special-use reserved domain names with EmailUndeliverableError. A new `test_environment` option is added for using `@*.test` domains. +* Improved safety of exception text by not repeating an unsafe input character in the message. +* Minor fixes in tests. +* Invoking the module as a standalone program now caches DNS queries. +* Improved README. + +Version 1.1.3 (June 12, 2021) +----------------------------- + +* Allow passing a custom dns_resolver so that a DNS cache and a custom timeout can be set. + +Version 1.1.2 (Nov 5, 2020) +--------------------------- + +* Fix invoking the module as a standalone program. +* Fix deprecation warning in Python 3.8. +* Code improvements. +* Improved README. + +Version 1.1.1 (May 19, 2020) +---------------------------- + +* Fix exception when DNS queries time-out. +* Improved README. 
+ +Version 1.1.0 (April 30, 2020) +------------------------------ + +* The main function now returns an object with attributes rather than a dict with keys, but accessing the object in the old way is still supported. +* Added overall email address length checks. +* Minor tweak to regular expressions. +* Improved error messages. +* Added tests. +* Linted source code files; changed README to Markdown. + +Version 1.0.5 (Oct 18, 2019) +---------------------------- + +* Prevent resolving domain names as if they were not fully qualified using local search domain settings. + +Version 1.0.4 (May 2, 2019) +--------------------------- + +* Added a timeout argument for DNS queries. +* The wheel distribution is now a universal wheel. +* Improved README. + +Version 1.0.3 (Sept 12, 2017) +----------------------------- + +* Added a wheel distribution for easier installation. + +Version 1.0.2 (Dec 30, 2016) +---------------------------- + +* Fix dnspython package name in Python 3. +* Improved README. + +Version 1.0.1 (March 6, 2016) +----------------------------- + +* Fixed minor errors. + +Version 1.0.0 (Sept 5, 2015) +---------------------------- + +* Fail domains with a leading period. +* Improved error messages. +* Added tests. + +Version 0.5.0 (June 15, 2015) +----------------------------- + +* Use IDNA 2008 instead of IDNA 2003 and use the idna package's UTS46 normalization instead of our own. +* Fixes for Python 2. +* Improved error messages. +* Improved README. + +Version 0.1.0 (April 21, 2015) +------------------------------ + +Initial release! diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index a0b40f9..76e88b9 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,7 +1,7 @@ -## Public domain +This project is in the public domain. Copyright and related +rights in the work are waived through the [LICENSE](LICENSE) +file in this directory. -This project is in the public domain. 
Copyright and related rights in the work worldwide are waived through the [CC0 1.0 Universal public domain dedication][CC0]. See the LICENSE file in this directory. - -All contributions to this project must be released under the same CC0 wavier. By submitting a pull request or patch, you are agreeing to comply with this waiver of copyright interest. - -[CC0]: http://creativecommons.org/publicdomain/zero/1.0/ +All contributions to this project must be released under the +same terms. By submitting a pull request or patch, you are +agreeing to comply with this. diff --git a/LICENSE b/LICENSE index 0e259d4..122e7a7 100644 --- a/LICENSE +++ b/LICENSE @@ -1,121 +1,27 @@ -Creative Commons Legal Code - -CC0 1.0 Universal - - CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE - LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN - ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS - INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES - REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS - PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM - THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED - HEREUNDER. - -Statement of Purpose - -The laws of most jurisdictions throughout the world automatically confer -exclusive Copyright and Related Rights (defined below) upon the creator -and subsequent owner(s) (each and all, an "owner") of an original work of -authorship and/or a database (each, a "Work"). - -Certain owners wish to permanently relinquish those rights to a Work for -the purpose of contributing to a commons of creative, cultural and -scientific works ("Commons") that the public can reliably and without fear -of later claims of infringement build upon, modify, incorporate in other -works, reuse and redistribute as freely as possible in any form whatsoever -and for any purposes, including without limitation commercial purposes. 
-These owners may contribute to the Commons to promote the ideal of a free -culture and the further production of creative, cultural and scientific -works, or to gain reputation or greater distribution for their Work in -part through the use and efforts of others. - -For these and/or other purposes and motivations, and without any -expectation of additional consideration or compensation, the person -associating CC0 with a Work (the "Affirmer"), to the extent that he or she -is an owner of Copyright and Related Rights in the Work, voluntarily -elects to apply CC0 to the Work and publicly distribute the Work under its -terms, with knowledge of his or her Copyright and Related Rights in the -Work and the meaning and intended legal effect of CC0 on those rights. - -1. Copyright and Related Rights. A Work made available under CC0 may be -protected by copyright and related or neighboring rights ("Copyright and -Related Rights"). Copyright and Related Rights include, but are not -limited to, the following: - - i. the right to reproduce, adapt, distribute, perform, display, - communicate, and translate a Work; - ii. moral rights retained by the original author(s) and/or performer(s); -iii. publicity and privacy rights pertaining to a person's image or - likeness depicted in a Work; - iv. rights protecting against unfair competition in regards to a Work, - subject to the limitations in paragraph 4(a), below; - v. rights protecting the extraction, dissemination, use and reuse of data - in a Work; - vi. database rights (such as those arising under Directive 96/9/EC of the - European Parliament and of the Council of 11 March 1996 on the legal - protection of databases, and under any national implementation - thereof, including any amended or successor version of such - directive); and -vii. other similar, equivalent or corresponding rights throughout the - world based on applicable law or treaty, and any national - implementations thereof. - -2. Waiver. 
To the greatest extent permitted by, but not in contravention -of, applicable law, Affirmer hereby overtly, fully, permanently, -irrevocably and unconditionally waives, abandons, and surrenders all of -Affirmer's Copyright and Related Rights and associated claims and causes -of action, whether now known or unknown (including existing as well as -future claims and causes of action), in the Work (i) in all territories -worldwide, (ii) for the maximum duration provided by applicable law or -treaty (including future time extensions), (iii) in any current or future -medium and for any number of copies, and (iv) for any purpose whatsoever, -including without limitation commercial, advertising or promotional -purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each -member of the public at large and to the detriment of Affirmer's heirs and -successors, fully intending that such Waiver shall not be subject to -revocation, rescission, cancellation, termination, or any other legal or -equitable action to disrupt the quiet enjoyment of the Work by the public -as contemplated by Affirmer's express Statement of Purpose. - -3. Public License Fallback. Should any part of the Waiver for any reason -be judged legally invalid or ineffective under applicable law, then the -Waiver shall be preserved to the maximum extent permitted taking into -account Affirmer's express Statement of Purpose. 
In addition, to the -extent the Waiver is so judged Affirmer hereby grants to each affected -person a royalty-free, non transferable, non sublicensable, non exclusive, -irrevocable and unconditional license to exercise Affirmer's Copyright and -Related Rights in the Work (i) in all territories worldwide, (ii) for the -maximum duration provided by applicable law or treaty (including future -time extensions), (iii) in any current or future medium and for any number -of copies, and (iv) for any purpose whatsoever, including without -limitation commercial, advertising or promotional purposes (the -"License"). The License shall be deemed effective as of the date CC0 was -applied by Affirmer to the Work. Should any part of the License for any -reason be judged legally invalid or ineffective under applicable law, such -partial invalidity or ineffectiveness shall not invalidate the remainder -of the License, and in such case Affirmer hereby affirms that he or she -will not (i) exercise any of his or her remaining Copyright and Related -Rights in the Work or (ii) assert any associated claims and causes of -action with respect to the Work, in either case contrary to Affirmer's -express Statement of Purpose. - -4. Limitations and Disclaimers. - - a. No trademark or patent rights held by Affirmer are waived, abandoned, - surrendered, licensed or otherwise affected by this document. - b. Affirmer offers the Work as-is and makes no representations or - warranties of any kind concerning the Work, express, implied, - statutory or otherwise, including without limitation warranties of - title, merchantability, fitness for a particular purpose, non - infringement, or the absence of latent or other defects, accuracy, or - the present or absence of errors, whether or not discoverable, all to - the greatest extent permissible under applicable law. - c. 
Affirmer disclaims responsibility for clearing rights of other persons - that may apply to the Work or any use thereof, including without - limitation any person's Copyright and Related Rights in the Work. - Further, Affirmer disclaims responsibility for obtaining any necessary - consents, permissions or other rights required for any use of the - Work. - d. Affirmer understands and acknowledges that Creative Commons is not a - party to this document and has no duty or obligation with respect to - this CC0 or use of the Work. +This is free and unencumbered software released into the public +domain. + +Anyone is free to copy, modify, publish, use, compile, sell, or +distribute this software, either in source code form or as a +compiled binary, for any purpose, commercial or non-commercial, +and by any means. + +In jurisdictions that recognize copyright laws, the author or +authors of this software dedicate any and all copyright +interest in the software to the public domain. We make this +dedication for the benefit of the public at large and to the +detriment of our heirs and successors. We intend this +dedication to be an overt act of relinquishment in perpetuity +of all present and future rights to this software under +copyright law. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES +OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR +ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF +CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN +CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN +THE SOFTWARE. 
+ +For more information, please refer to diff --git a/MANIFEST.in b/MANIFEST.in index 2f9bf23..b22c457 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -1,2 +1,2 @@ -include email_validator.py include LICENSE README.md +recursive-include tests *.json *.py diff --git a/Makefile b/Makefile index 71f8600..57df9da 100644 --- a/Makefile +++ b/Makefile @@ -11,9 +11,13 @@ lint: #python setup.py check -rms flake8 --ignore=E501,E126,W503 email_validator tests +.PHONY: typing +typing: + mypy email_validator/*.py tests/*.py + .PHONY: test test: - pytest --cov=email_validator + PYTHONPATH=.:$$PYTHONPATH pytest --cov=email_validator -k "not network" .PHONY: testcov testcov: test @@ -21,7 +25,7 @@ testcov: test @coverage html .PHONY: all -all: testcov lint +all: typing testcov lint .PHONY: clean clean: diff --git a/README.md b/README.md index 67885a5..5d4405c 100644 --- a/README.md +++ b/README.md @@ -2,47 +2,40 @@ email-validator: Validate Email Addresses ========================================= A robust email address syntax and deliverability validation library for -Python by [Joshua Tauberer](https://joshdata.me). +Python 3.8+ by [Joshua Tauberer](https://joshdata.me). -This library validates that a string is of the form `name@example.com`. This is -the sort of validation you would want for an email-based login form on -a website. +This library validates that a string is of the form `name@example.com` +and optionally checks that the domain name is set up to receive email. +This is the sort of validation you would want when you are identifying +users by their email address like on a registration form. Key features: -* Checks that an email address has the correct syntax --- good for - login forms or other uses related to identifying users. -* Gives friendly error messages when validation fails (appropriate to show - to end users). -* (optionally) Checks deliverability: Does the domain name resolve? And you can override the default DNS resolver. 
-* Supports internationalized domain names and (optionally) - internationalized local parts, but blocks unsafe characters. -* Normalizes email addresses (super important for internationalized - addresses! see below). - -The library is NOT for validation of the To: line in an email message -(e.g. `My Name `), which -[flanker](https://github.com/mailgun/flanker) is more appropriate for. -And this library does NOT permit obsolete forms of email addresses, so -if you need strict validation against the email specs exactly, use -[pyIsEmail](https://github.com/michaelherold/pyIsEmail). - -This library is tested with Python 3.6+ but should work in earlier versions: - -[![Build Status](https://app.travis-ci.com/JoshData/python-email-validator.svg?branch=main)](https://app.travis-ci.com/JoshData/python-email-validator) - ---- - -This library was first published in 2015. The current version is 1.2.1 -(posted May 1, 2022). The main changes in version 1.2 are: - -* Rejecting domains with NULL MX records (when deliverability checks - are turned on). -* Rejecting unsafe unicode characters. (Some of these checks you should - be doing on all of your user inputs already!) -* Rejecting most special-use reserved domain names. A new `test_environment` - option is added for using `@*.test` domains. -* Some fixes in the tests. +* Checks that an email address has the correct syntax --- great for + email-based registration/login forms or validating data. +* Gives friendly English error messages when validation fails that you + can display to end-users. +* Checks deliverability (optional): Does the domain name resolve? + (You can override the default DNS resolver to add query caching.) +* Supports internationalized domain names (like `@ツ.life`), + internationalized local parts (like `ツ@example.com`), + and optionally parses display names (e.g. `"My Name" `). 
+* Rejects addresses with invalid or unsafe Unicode characters, + obsolete email address syntax that you'd find unexpected, + special use domain names like `@localhost`, + and domains without a dot by default. + This is an opinionated library! +* Normalizes email addresses (important for internationalized + and quoted-string addresses! see below). +* Python type annotations are used. + +This is an opinionated library. You should definitely also consider using +the less-opinionated [pyIsEmail](https://github.com/michaelherold/pyIsEmail) +if it works better for you. + +[![Build Status](https://github.com/JoshData/python-email-validator/actions/workflows/test_and_build.yaml/badge.svg)](https://github.com/JoshData/python-email-validator/actions/workflows/test_and_build.yaml) + +View the [CHANGELOG / Release Notes](CHANGELOG.md) for the version history of changes in the library. Occasionally this README is ahead of the latest published package --- see the CHANGELOG for details. --- @@ -55,7 +48,7 @@ This package [is on PyPI](https://pypi.org/project/email-validator/), so: pip install email-validator ``` -`pip3` also works. +(You might need to use `pip3` depending on your local environment.) Quick Start ----------- @@ -66,27 +59,30 @@ account in your application, you might do this: ```python from email_validator import validate_email, EmailNotValidError -email = "my+address@mydomain.tld" +email = "my+address@example.org" try: - # Validate & take the normalized form of the email - # address for all logic beyond this point (especially - # before going to a database query where equality - # does not take into account normalization). - email = validate_email(email).email + + # Check that the email address is valid. Turn on check_deliverability + # for first-time validations like on account creation pages (but not + # login pages). 
+ emailinfo = validate_email(email, check_deliverability=False) + + # After this point, use only the normalized form of the email address, + # especially before going to a database query. + email = emailinfo.normalized + except EmailNotValidError as e: - # email is not valid, exception message is human-readable + + # The exception message is a human-readable explanation of why it's + # not a valid (or deliverable) email address. print(str(e)) ``` This validates the address and gives you its normalized form. You should **put the normalized form in your database** and always normalize before -checking if an address is in your database. - -The validator will accept internationalized email addresses, but not all -mail systems can send email to an addresses with non-English characters in -the *local* part of the address (before the @-sign). See the `allow_smtputf8` -option below. +checking if an address is in your database. When using this in a login form, +set `check_deliverability` to `False` to avoid unnecessary DNS queries. Usage ----- @@ -94,8 +90,7 @@ Usage ### Overview The module provides a function `validate_email(email_address)` which -takes an email address (either a `str` or `bytes`, but only non-internationalized -addresses are allowed when passing a `bytes`) and: +takes an email address and: - Raises an `EmailNotValidError` with a helpful, human-readable error message explaining why the email address is not valid, or @@ -104,7 +99,7 @@ addresses are allowed when passing a `bytes`) and: When an email address is not valid, `validate_email` raises either an `EmailSyntaxError` if the form of the address is invalid or an -`EmailUndeliverableError` if the domain name fails the DNS check. Both +`EmailUndeliverableError` if the domain name fails DNS checks. Both exception classes are subclasses of `EmailNotValidError`, which in turn is a subclass of `ValueError`. 
@@ -112,38 +107,49 @@ But when an email address is valid, an object is returned containing a normalized form of the email address (which you should use!) and other information. -The validator doesn't permit obsoleted forms of email addresses that no -one uses anymore even though they are still valid and deliverable, since +The validator doesn't, by default, permit obsoleted forms of email addresses +that no one uses anymore even though they are still valid and deliverable, since they will probably give you grief if you're using email for login. (See -later in the document about that.) - -The validator checks that the domain name in the email address has a -(non-null) MX DNS record indicating that it is configured for email. -There is nothing to be gained by trying to actually contact an SMTP -server, so that's not done here. For privacy, security, and practicality -reasons servers are good at not giving away whether an address is +later in the document about how to allow some obsolete forms.) + +The validator optionally checks that the domain name in the email address has +a DNS MX record indicating that it can receive email. (Except a Null MX record. +If there is no MX record, a fallback A/AAAA-record is permitted, unless +a reject-all SPF record is present.) DNS is slow and sometimes unavailable or +unreliable, so consider whether these checks are useful for your use case and +turn them off if they aren't. +There is nothing to be gained by trying to actually contact an SMTP server, so +that's not done here. For privacy, security, and practicality reasons, servers +are good at not giving away whether an address is deliverable or not: email addresses that appear to accept mail at first can bounce mail after a delay, and bounced mail may indicate a temporary failure of a good email address (sometimes an intentional failure, like -greylisting). (A/AAAA-record fallback is also checked.) +greylisting). 
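The deliverability rule described in this hunk (MX record required, Null MX rejected, A/AAAA fallback permitted unless a reject-all SPF record is present) can be sketched as a pure function over DNS answers that have already been fetched. This is an illustration only and not the library's implementation: the name `passes_deliverability_rule` and its parameters are hypothetical, the globally-reachable filter follows the 2.1.2 changelog entry, and real lookups would go through a DNS resolver rather than plain lists.

```python
import ipaddress
from typing import Sequence, Tuple


def passes_deliverability_rule(
    mx_records: Sequence[Tuple[int, str]],
    fallback_addresses: Sequence[str] = (),
    txt_records: Sequence[str] = (),
) -> bool:
    """Apply the MX / fallback / SPF rule to DNS answers fetched elsewhere.

    mx_records: (preference, exchange) pairs from an MX query.
    fallback_addresses: A/AAAA answers as strings, used only when there is no MX.
    txt_records: TXT answers as strings, searched for a reject-all SPF policy.
    """
    if mx_records:
        # A Null MX (RFC 7505) has the root "." as its exchange, which
        # signals that the domain accepts no mail at all.
        usable = [exchange for _pref, exchange in mx_records
                  if exchange.rstrip(".") != ""]
        return bool(usable)

    # No MX record: fall back to A/AAAA, keeping only addresses that are
    # globally reachable (not private-use, loopback, and so on).
    reachable = [addr for addr in fallback_addresses
                 if ipaddress.ip_address(addr).is_global]
    if not reachable:
        return False

    # A reject-all SPF record means the domain declares that it sends no mail.
    return not any(txt.strip().lower() == "v=spf1 -all" for txt in txt_records)
```

Keeping the rule separate from the DNS queries makes it easy to exercise with captured responses, which is the same idea the changelog mentions for the off-line deliverability tests.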
### Options The `validate_email` function also accepts the following keyword arguments (defaults are as shown below): +`check_deliverability=True`: If true, DNS queries are made to check that the domain name in the email address (the part after the @-sign) can receive mail, as described above. Set to `False` to skip this DNS-based check. It is recommended to pass `False` when performing validation for login pages (but not account creation pages) since re-validation of a previously validated domain in your database by querying DNS at every login is probably undesirable. You can also set `email_validator.CHECK_DELIVERABILITY` to `False` to turn this off for all calls by default. + +`dns_resolver=None`: Pass an instance of [dns.resolver.Resolver](https://dnspython.readthedocs.io/en/latest/resolver-class.html) to control the DNS resolver including setting a timeout and [a cache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html). The `caching_resolver` function shown below is a helper function to construct a dns.resolver.Resolver with a [LRUCache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html#dns.resolver.LRUCache). Reuse the same resolver instance across calls to `validate_email` to make use of the cache. + +`test_environment=False`: If `True`, DNS-based deliverability checks are disabled and `test` and `**.test` domain names are permitted (see below). You can also set `email_validator.TEST_ENVIRONMENT` to `True` to turn it on for all calls by default. + `allow_smtputf8=True`: Set to `False` to prohibit internationalized addresses that would require the [SMTPUTF8](https://tools.ietf.org/html/rfc6531) extension. You can also set `email_validator.ALLOW_SMTPUTF8` to `False` to turn it off for all calls by default. -`check_deliverability=True`: Set to `False` to skip the domain name MX DNS record check. You can also set `email_validator.CHECK_DELIVERABILITY` to `False` to turn it off for all calls by default. 
+`allow_quoted_local=False`: Set to `True` to allow obscure and potentially problematic email addresses in which the part of the address before the @-sign contains spaces, @-signs, or other surprising characters when the local part is surrounded in quotes (so-called quoted-string local parts). In the object returned by `validate_email`, the normalized local part removes any unnecessary backslash-escaping and even removes the surrounding quotes if the address would be valid without them. You can also set `email_validator.ALLOW_QUOTED_LOCAL` to `True` to turn this on for all calls by default. + +`allow_domain_literal=False`: Set to `True` to allow bracketed IPv4 and "IPv6:"-prefixed IPv6 addresses in the domain part of the email address. No deliverability checks are performed for these addresses. In the object returned by `validate_email`, the normalized domain will use the condensed IPv6 format, if applicable. The object's `domain_address` attribute will hold the parsed `ipaddress.IPv4Address` or `ipaddress.IPv6Address` object if applicable. You can also set `email_validator.ALLOW_DOMAIN_LITERAL` to `True` to turn this on for all calls by default. + +`allow_display_name=False`: Set to `True` to allow a display name and bracketed address in the input string, like `My Name `. It's implemented in the spirit but not the letter of RFC 5322 3.4, so it may be stricter or more relaxed than what you want. The display name, if present, is provided in the returned object's `display_name` field after being unquoted and unescaped. You can also set `email_validator.ALLOW_DISPLAY_NAME` to `True` to turn this on for all calls by default. `allow_empty_local=False`: Set to `True` to allow an empty local part (i.e. `@example.com`), e.g. for validating Postfix aliases. 
-`dns_resolver=None`: Pass an instance of [dns.resolver.Resolver](https://dnspython.readthedocs.io/en/latest/resolver-class.html) to control the DNS resolver including setting a timeout and [a cache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html). The `caching_resolver` function shown above is a helper function to construct a dns.resolver.Resolver with a [LRUCache](https://dnspython.readthedocs.io/en/latest/resolver-caching.html#dns.resolver.LRUCache). Reuse the same resolver instance across calls to `validate_email` to make use of the cache. - -`test_environment=False`: DNS-based deliverability checks are disabled and `test` and `subdomain.test` domain names are permitted (see below). You can also set `email_validator.TEST_ENVIRONMENT` to `True` to turn it on for all calls by default. ### DNS timeout and cache @@ -155,23 +161,23 @@ from email_validator import validate_email, caching_resolver resolver = caching_resolver(timeout=10) while True: - email = validate_email(email, dns_resolver=resolver).email + validate_email(email, dns_resolver=resolver) ``` ### Test addresses -This library rejects email addresess that use the [Special Use Domain Names](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml) `invalid`, `localhost`, `test`, and some others by raising `EmailUndeliverableError`. This is to protect your system from abuse: You probably don't want a user to be able to cause an email to be sent to `localhost`. However, in your non-production test environments you may want to use `@test` or `@myname.test` email addresses. There are three ways you can allow this: +This library rejects email addresses that use the [Special Use Domain Names](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml) `invalid`, `localhost`, `test`, and some others by raising `EmailSyntaxError`. 
This is to protect your system from abuse: You probably don't want a user to be able to cause an email to be sent to `localhost` (although they might be able to still do so via a malicious MX record). However, in your non-production test environments you may want to use `@test` or `@myname.test` email addresses. There are three ways you can allow this: 1. Add `test_environment=True` to the call to `validate_email` (see above). -2. Set `email_validator.TEST_ENVIRONMENT` to `True`. -3. Remove the special-use domain name that you want to use from `email_validator.SPECIAL_USE_DOMAIN_NAMES`: +2. Set `email_validator.TEST_ENVIRONMENT` to `True` globally. +3. Remove the special-use domain name that you want to use from `email_validator.SPECIAL_USE_DOMAIN_NAMES`, e.g.: ```python import email_validator email_validator.SPECIAL_USE_DOMAIN_NAMES.remove("test") ``` -It is tempting to use `@example.com/net/org` in tests. These domains are reserved to IANA for use in documentation so there is no risk of accidentally emailing someone at those domains. But beware that this library will reject these domain names if DNS-based deliverability checks are not disabled because these domains do not resolve to domains that accept email. In tests, consider using your own domain name or `@test` or `@myname.test` instead. +It is tempting to use `@example.com/net/org` in tests. They are *not* in this library's `SPECIAL_USE_DOMAIN_NAMES` list so you can, but shouldn't, use them. These domains are reserved to IANA for use in documentation so there is no risk of accidentally emailing someone at those domains. But beware that this library will nevertheless reject these domain names if DNS-based deliverability checks are not disabled because these domains do not resolve to domains that accept email. In tests, consider using your own domain name or `@test` or `@myname.test` instead. 
Internationalized email addresses --------------------------------- @@ -179,8 +185,12 @@ Internationalized email addresses The email protocol SMTP and the domain name system DNS have historically only allowed English (ASCII) characters in email addresses and domain names, respectively. Each has adapted to internationalization in a separate -way, creating two separate aspects to email address -internationalization. +way, creating two separate aspects to email address internationalization. + +(If your mail submission library doesn't support Unicode at all, then +immediately prior to mail submission you must replace the email address with +its ASCII-ized form. This library gives you back the ASCII-ized form in the +`ascii_email` field in the returned object.) ### Internationalized domain names (IDN) @@ -191,13 +201,8 @@ domain names are converted into a special IDNA ASCII "[Punycode](https://www.rfc form starting with `xn--`. When an email address has non-ASCII characters in its domain part, the domain part is replaced with its IDNA ASCII equivalent form in the process of mail transmission. Your mail -submission library probably does this for you transparently. Note that -most web browsers are currently in transition between IDNA 2003 (RFC -3490) and IDNA 2008 (RFC 5891) and [compliance around the web is not -very -good](http://archives.miloush.net/michkap/archive/2012/02/27/10273315.html) -in any case, so be aware that edge cases are handled differently by -different applications and libraries. This library conforms to IDNA 2008 +submission library probably does this for you transparently. ([Compliance +around the web is not very good though](http://archives.miloush.net/michkap/archive/2012/02/27/10273315.html).) This library conforms to IDNA 2008 using the [idna](https://github.com/kjd/idna) module by Kim Davies. 
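A minimal sketch of the domain conversion described above, using only Python's standard library. One assumption to keep in mind: the built-in `idna` codec implements the older IDNA 2003 rules, whereas this library relies on the third-party `idna` module for IDNA 2008, so the two can disagree on edge cases. For a simple domain like the one below they should agree.

```python
# Standard-library illustration only; this is not the library's own code path.
unicode_domain = "ツ.life"

# Convert each non-ASCII label to its "xn--" Punycode form (IDNA 2003 codec).
ascii_domain = unicode_domain.encode("idna").decode("ascii")
print(ascii_domain)

# Converting back recovers the Unicode form of the domain.
restored = ascii_domain.encode("ascii").decode("idna")
print(restored == unicode_domain)
```

In practice you would read the `ascii_domain` and `domain` fields of the object returned by `validate_email` rather than converting by hand.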
### Internationalized local parts @@ -208,90 +213,61 @@ email addresses, only English letters, numbers, and some punctuation (`._!#$%&'^``*+-=~/?{|}`) are allowed. In internationalized email address local parts, a wider range of Unicode characters are allowed. -A surprisingly large number of Unicode characters are not safe to display, -especially when the email address is concatenated with other text, so this -library tries to protect you by not permitting resvered, non-, private use, -formatting (which can be used to alter the display order of characters), -whitespace, and control characters, and combining characters -as the first character (so that they cannot combine with something outside -of the email address string). See https://qntm.org/safe and https://trojansource.codes/ -for relevant prior work. (Other than whitespace, these are checks that -you should be applying to nearly all user inputs in a security-sensitive -context.) - -These character checks are performed after Unicode normalization (see below), -so you are only fully protected if you replace all user-provided email addresses -with the normalized email address string returned by this library. This does not -guard against the well known problem that many Unicode characters look alike -(or are identical), which can be used to fool humans reading displayed text. - Email addresses with these non-ASCII characters require that your mail -submission library and the mail servers along the route to the destination, +submission library and all the mail servers along the route to the destination, including your own outbound mail server, all support the [SMTPUTF8 (RFC 6531)](https://tools.ietf.org/html/rfc6531) extension. -Support for SMTPUTF8 varies. See the `allow_smtputf8` parameter. - -### If you know ahead of time that SMTPUTF8 is not supported by your mail submission stack - -By default all internationalized forms are accepted by the validator. 
-But if you know ahead of time that SMTPUTF8 is not supported by your -mail submission stack, then you must filter out addresses that require -SMTPUTF8 using the `allow_smtputf8=False` keyword argument (see above). +Support for SMTPUTF8 varies. If you know ahead of time that SMTPUTF8 is not +supported by your mail submission stack, then you must filter out addresses that +require SMTPUTF8 using the `allow_smtputf8=False` keyword argument (see above). This will cause the validation function to raise a `EmailSyntaxError` if -delivery would require SMTPUTF8. That's just in those cases where -non-ASCII characters appear before the @-sign. If you do not set -`allow_smtputf8=False`, you can also check the value of the `smtputf8` -field in the returned object. +delivery would require SMTPUTF8. If you do not set `allow_smtputf8=False`, +you can also check the value of the `smtputf8` field in the returned object. -If your mail submission library doesn't support Unicode at all --- even -in the domain part of the address --- then immediately prior to mail -submission you must replace the email address with its ASCII-ized form. -This library gives you back the ASCII-ized form in the `ascii_email` -field in the returned object, which you can get like this: +### Unsafe Unicode characters are rejected -```python -valid = validate_email(email, allow_smtputf8=False) -email = valid.ascii_email -``` - -The local part is left alone (if it has internationalized characters -`allow_smtputf8=False` will force validation to fail) and the domain -part is converted to [IDNA ASCII](https://tools.ietf.org/html/rfc5891). -(You probably should not do this at account creation time so you don't -change the user's login information without telling them.) 
- -### UCS-4 support required for Python 2.7 +A surprisingly large number of Unicode characters are not safe to display, +especially when the email address is concatenated with other text, so this +library tries to protect you by not permitting reserved, non-, private use, +formatting (which can be used to alter the display order of characters), +whitespace, and control characters, and combining characters +as the first character of the local part and the domain name (so that they +cannot combine with something outside of the email address string or with +the @-sign). See https://qntm.org/safe and https://trojansource.codes/ +for relevant prior work. (Other than whitespace, these are checks that +you should be applying to nearly all user inputs in a security-sensitive +context.) This does not guard against the well known problem that many +Unicode characters look alike, which can be used to fool humans reading +displayed text. -This library hopefully still works with Python 2.7. -Note that when using Python 2.7, it is required that it was built with -UCS-4 support (see -[here](https://stackoverflow.com/questions/29109944/python-returns-length-of-2-for-single-unicode-character-string)); -otherwise emails with unicode characters outside of the BMP (Basic -Multilingual Plane) will not validate correctly. Normalization ------------- +### Unicode Normalization + The use of Unicode in email addresses introduced a normalization problem. Different Unicode strings can look identical and have the same -semantic meaning to the user. The `email` field returned on successful +semantic meaning to the user. The `normalized` field returned on successful validation provides the correctly normalized form of the given email -address: +address. + +For example, the CJK fullwidth Latin letters are considered semantically +equivalent in domain names to their ASCII counterparts. 
This library +normalizes them to their ASCII counterparts (as required by IDNA): ```python -valid = validate_email("me@Domain.com") -email = valid.ascii_email -print(email) -# prints: me@domain.com +emailinfo = validate_email("me@Domain.com") +print(emailinfo.normalized) +print(emailinfo.ascii_email) +# prints "me@domain.com" twice ``` Because an end-user might type their email address in different (but equivalent) un-normalized forms at different times, you ought to replace what they enter with the normalized form immediately prior to going into your database (during account creation), querying your database -(during login), or sending outbound mail. Normalization may also change -the length of an email address, and this may affect whether it is valid -and acceptable by your SMTP provider. +(during login), or sending outbound mail. The normalizations include lowercasing the domain part of the email address (domain names are case-insensitive), [Unicode "NFC" @@ -305,10 +281,26 @@ in the domain part, possibly other [UTS46](http://unicode.org/reports/tr46) mappings on the domain part, and conversion from Punycode to Unicode characters. +Normalization may change the characters in the email address and the +length of the email address, such that a string might be a valid address +before normalization but invalid after, or vice versa. This library only +permits addresses that are valid both before and after normalization. + (See [RFC 6532 (internationalized email) section 3.1](https://tools.ietf.org/html/rfc6532#section-3.1) and [RFC 5895 (IDNA 2008) section 2](http://www.ietf.org/rfc/rfc5895.txt).) +### Other Normalization + +Normalization is also applied to quoted-string local parts and domain +literal IPv6 addresses if you have allowed them by the `allow_quoted_local` +and `allow_domain_literal` options. In quoted-string local parts, unnecessary +backslash escaping is removed and even the surrounding quotes are removed if +they are unnecessary. 
For IPv6 domain literals, the IPv6 address is +normalized to condensed form. [RFC 2142](https://datatracker.ietf.org/doc/html/rfc2142) +also requires lowercase normalization for some specific mailbox names like `postmaster@`. + + Examples -------- @@ -316,23 +308,21 @@ For the email address `test@joshdata.me`, the returned object is: ```python ValidatedEmail( - email='test@joshdata.me', + normalized='test@joshdata.me', local_part='test', domain='joshdata.me', ascii_email='test@joshdata.me', ascii_local_part='test', ascii_domain='joshdata.me', - smtputf8=False, - mx=[(10, 'box.occams.info')], - mx_fallback_type=None) + smtputf8=False) ``` -For the fictitious address `example@ツ.life`, which has an +For the fictitious but valid address `example@ツ.ⓁⒾⒻⒺ`, which has an internationalized domain but ASCII local part, the returned object is: ```python ValidatedEmail( - email='example@ツ.life', + normalized='example@ツ.life', local_part='example', domain='ツ.life', ascii_email='example@xn--bdk.life', @@ -342,24 +332,20 @@ ValidatedEmail( ``` -Note that `smtputf8` is `False` even though the domain part is -internationalized because -[SMTPUTF8](https://tools.ietf.org/html/rfc6531) is only needed if the -local part of the address is internationalized (the domain part can be -converted to IDNA ASCII Punycode). Also note that the `email` and `domain` -fields provide a normalized form of the email address and domain name -(casefolding and Unicode normalization as required by IDNA 2008). +Note that `normalized` and other fields provide a normalized form of the +email address, domain name, and (in other cases) local part (see earlier +discussion of normalization), which you should use in your database. Calling `validate_email` with the ASCII form of the above email address, `example@xn--bdk.life`, returns the exact same information (i.e., the -`email` field always will contain Unicode characters, not Punycode). +`normalized` field always will contain Unicode characters, not Punycode). 
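To make the normalization steps discussed earlier more concrete, here is a rough sketch using Python's `unicodedata` and `ipaddress` modules. It only illustrates the underlying operations (Unicode NFC and the condensed IPv6 form); it is an assumption-laden stand-in, not how this library is implemented, and the sample strings are made up for the illustration.

```python
import ipaddress
import unicodedata

# U+212B ANGSTROM SIGN is canonically equivalent to U+00C5 (A with ring above),
# so NFC replaces one code point with another.
angstrom = "\u212b"
nfc_angstrom = unicodedata.normalize("NFC", angstrom)

# "e" followed by U+0301 COMBINING ACUTE ACCENT composes into the single code
# point U+00E9, so NFC can also change the length of a string.
decomposed = "e\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# The condensed (compressed) text form of an IPv6 address.
condensed = str(ipaddress.IPv6Address("2001:0db8:0000:0000:0000:0000:0000:0001"))

print(repr(nfc_angstrom), len(decomposed), len(composed), condensed)
```

Because the length and the characters can both change, comparing or storing addresses only makes sense after normalization, which is why the `normalized` field is the one to keep.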
For the fictitious address `ツ-test@joshdata.me`, which has an internationalized local part, the returned object is: ```python ValidatedEmail( - email='ツ-test@joshdata.me', + normalized='ツ-test@joshdata.me', local_part='ツ-test', domain='joshdata.me', ascii_email=None, @@ -369,10 +355,8 @@ ValidatedEmail( ``` Now `smtputf8` is `True` and `ascii_email` is `None` because the local -part of the address is internationalized. The `local_part` and `email` fields -return the normalized form of the address: certain Unicode characters -(such as angstrom and ohm) may be replaced by other equivalent code -points (a-with-ring and omega). +part of the address is internationalized. The `local_part` and `normalized` fields +return the normalized form of the address. Return value ------------ @@ -382,15 +366,18 @@ are: | Field | Value | | -----:|-------| -| `email` | The normalized form of the email address that you should put in your database. This merely combines the `local_part` and `domain` fields (see below). | -| `ascii_email` | If set, an ASCII-only form of the email address by replacing the domain part with [IDNA](https://tools.ietf.org/html/rfc5891) [Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt). This field will be present when an ASCII-only form of the email address exists (including if the email address is already ASCII). If the local part of the email address contains internationalized characters, `ascii_email` will be `None`. If set, it merely combines `ascii_local_part` and `ascii_domain`. | -| `local_part` | The local part of the given email address (before the @-sign) with Unicode NFC normalization applied. | +| `normalized` | The normalized form of the email address that you should put in your database. This combines the `local_part` and `domain` fields (see below). 
| +| `ascii_email` | If set, an ASCII-only form of the normalized email address by replacing the domain part with [IDNA](https://tools.ietf.org/html/rfc5891) [Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt). This field will be present when an ASCII-only form of the email address exists (including if the email address is already ASCII). If the local part of the email address contains internationalized characters, `ascii_email` will be `None`. If set, it merely combines `ascii_local_part` and `ascii_domain`. | +| `local_part` | The normalized local part of the given email address (before the @-sign). Normalization includes Unicode NFC normalization and removing unnecessary quoted-string quotes and backslashes. If `allow_quoted_local` is True and the surrounding quotes are necessary, the quotes _will_ be present in this field. | | `ascii_local_part` | If set, the local part, which is composed of ASCII characters only. | | `domain` | The canonical internationalized Unicode form of the domain part of the email address. If the returned string contains non-ASCII characters, either the [SMTPUTF8](https://tools.ietf.org/html/rfc6531) feature of your mail relay will be required to transmit the message or else the email address's domain part must be converted to IDNA ASCII first: Use `ascii_domain` field instead. | | `ascii_domain` | The [IDNA](https://tools.ietf.org/html/rfc5891) [Punycode](https://www.rfc-editor.org/rfc/rfc3492.txt)-encoded form of the domain part of the given email address, as it would be transmitted on the wire. | +| `domain_address` | If domain literals are allowed and if the email address contains one, an `ipaddress.IPv4Address` or `ipaddress.IPv6Address` object. | +| `display_name` | If no display name was present and angle brackets do not surround the address, this will be `None`; otherwise, it will be set to the display name, or the empty string if there were angle brackets but no display name. 
If the display name was quoted, it will be unquoted and unescaped. | | `smtputf8` | A boolean indicating that the [SMTPUTF8](https://tools.ietf.org/html/rfc6531) feature of your mail relay will be required to transmit messages to this address because the local part of the address has non-ASCII characters (the local part cannot be IDNA-encoded). If `allow_smtputf8=False` is passed as an argument, this flag will always be false because an exception is raised if it would have been true. | | `mx` | A list of (priority, domain) tuples of MX records specified in the DNS for the domain (see [RFC 5321 section 5](https://tools.ietf.org/html/rfc5321#section-5)). May be `None` if the deliverability check could not be completed because of a temporary issue like a timeout. | | `mx_fallback_type` | `None` if an `MX` record is found. If no MX records are actually specified in DNS and instead are inferred, through an obsolete mechanism, from A or AAAA records, the value is the type of DNS record used instead (`A` or `AAAA`). May be `None` if the deliverability check could not be completed because of a temporary issue like a timeout. | +| `spf` | Any SPF record found while checking deliverability. Only set if the SPF record is queried. | Assumptions ----------- @@ -400,18 +387,20 @@ strictly conform to the standards. Many email address forms are obsolete or likely to cause trouble: * The validator assumes the email address is intended to be - deliverable on the public Internet. The domain part - of the email address must be a resolvable domain name. - [Special Use Domain Names](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml) - and their subdomains are always considered invalid (except see - the `test_environment` parameter above). -* The "quoted string" form of the local part of the email address (RFC - 5321 4.1.2) is not permitted --- no one uses this anymore anyway. 
- Quoted forms allow multiple @-signs, space characters, and other - troublesome conditions. The unsual [(comment) syntax](https://github.com/JoshData/python-email-validator/issues/77) - in email addresses is also rejected. -* The "literal" form for the domain part of an email address (an - IP address) is not accepted --- no one uses this anymore anyway. + usable on the public Internet. The domain part + of the email address must be a resolvable domain name + (see the deliverability checks described above). + Most [Special Use Domain Names](https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml) + and their subdomains, as well as + domain names without a `.`, are rejected as a syntax error + (except see the `test_environment` parameter above). +* Obsolete email syntaxes are rejected: + The unusual ["(comment)" syntax](https://github.com/JoshData/python-email-validator/issues/77) + is rejected. Extremely old obsolete syntaxes are + rejected. Quoted-string local parts and domain-literal addresses + are rejected by default, but there are options to allow them (see above). + No one uses these forms anymore, and I can't think of any reason why anyone + using this library would need to accept them. Testing ------- @@ -423,6 +412,8 @@ pip install -r test_requirements.txt make test ``` +Tests run with mocked DNS responses. When adding or changing tests, temporarily turn on the `BUILD_MOCKED_DNS_RESPONSE_DATA` flag in `tests/mocked_dns_responses.py` to re-build the database of mocked responses from live queries. + For Project Maintainers ----------------------- @@ -430,19 +421,20 @@ The package is distributed as a universal wheel and as a source package. To release: -* Update the version number. -* Follow the steps below to publish source and a universal wheel to pypi. +* Update CHANGELOG.md. +* Update the version number in `email_validator/version.py`. +* Make & push a commit with the new version number and make sure tests pass. 
+* Make & push a tag (see command below). * Make a release at https://github.com/JoshData/python-email-validator/releases/new. +* Publish a source and wheel distribution to pypi (see command below). ```sh -pip3 install twine -rm -rf dist -python3 setup.py sdist -python3 setup.py bdist_wheel -twine upload dist/* -git tag v1.0.XXX # replace with version in setup.cfg +git tag v$(cat email_validator/version.py | sed "s/.* = //" | sed 's/"//g') git push --tags +./release_to_pypi.sh ``` -Notes: The wheel is specified as universal in the file `setup.cfg` by the `universal = 1` key in the -`[bdist_wheel]` section. +License +------- + +This project is free of any copyright restrictions per the [Unlicense](https://unlicense.org/). (Prior to Feb. 4, 2024, the project was made available under the terms of the [CC0 1.0 Universal public domain dedication](http://creativecommons.org/publicdomain/zero/1.0/).) See [LICENSE](LICENSE) and [CONTRIBUTING.md](CONTRIBUTING.md). diff --git a/email_validator/__init__.py b/email_validator/__init__.py index 3d295ec..d50a8d2 100644 --- a/email_validator/__init__.py +++ b/email_validator/__init__.py @@ -1,46 +1,40 @@ -# -*- coding: utf-8 -*- +from typing import TYPE_CHECKING -import sys -import re -import unicodedata -import dns.resolver -import dns.exception -import idna # implements IDNA 2008; Python's codec is only IDNA 2003 +# Export the main method, helper methods, and the public data types. +from .exceptions import EmailNotValidError, EmailSyntaxError, EmailUndeliverableError +from .types import ValidatedEmail +from .validate_email import validate_email +from .version import __version__ -# Default values for keyword arguments. 
+__all__ = ["validate_email", + "ValidatedEmail", "EmailNotValidError", + "EmailSyntaxError", "EmailUndeliverableError", + "caching_resolver", "__version__"] -ALLOW_SMTPUTF8 = True -CHECK_DELIVERABILITY = True -TEST_ENVIRONMENT = False -DEFAULT_TIMEOUT = 15 # secs +if TYPE_CHECKING: + from .deliverability import caching_resolver +else: + def caching_resolver(*args, **kwargs): + # Lazy load `deliverability` as it is slow to import (due to dns.resolver) + from .deliverability import caching_resolver -# Based on RFC 2822 section 3.2.4 / RFC 5322 section 3.2.3, these -# characters are permitted in email addresses (not taking into -# account internationalization): -ATEXT = r'a-zA-Z0-9_!#\$%&\'\*\+\-/=\?\^`\{\|\}~' + return caching_resolver(*args, **kwargs) -# A "dot atom text", per RFC 2822 3.2.4: -DOT_ATOM_TEXT = '[' + ATEXT + ']+(?:\\.[' + ATEXT + ']+)*' -# RFC 6531 section 3.3 extends the allowed characters in internationalized -# addresses to also include three specific ranges of UTF8 defined in -# RFC3629 section 4, which appear to be the Unicode code points from -# U+0080 to U+10FFFF. -ATEXT_INTL = ATEXT + u"\u0080-\U0010FFFF" -DOT_ATOM_TEXT_INTL = '[' + ATEXT_INTL + ']+(?:\\.[' + ATEXT_INTL + ']+)*' +# These global attributes are a part of the library's API and can be +# changed by library users. -# The domain part of the email address, after IDNA (ASCII) encoding, -# must also satisfy the requirements of RFC 952/RFC 1123 which restrict -# the allowed characters of hostnames further. The hyphen cannot be at -# the beginning or end of a *dot-atom component* of a hostname either. -ATEXT_HOSTNAME = r'(?:(?:[a-zA-Z0-9][a-zA-Z0-9\-]*)?[a-zA-Z0-9])' +# Default values for keyword arguments. -# Length constants -# RFC 3696 + errata 1003 + errata 1690 (https://www.rfc-editor.org/errata_search.php?rfc=3696&eid=1690) -# explains the maximum length of an email address is 254 octets. 
-EMAIL_MAX_LENGTH = 254 -LOCAL_PART_MAX_LENGTH = 64 -DOMAIN_MAX_LENGTH = 255 +ALLOW_SMTPUTF8 = True +ALLOW_EMPTY_LOCAL = False +ALLOW_QUOTED_LOCAL = False +ALLOW_DOMAIN_LITERAL = False +ALLOW_DISPLAY_NAME = False +GLOBALLY_DELIVERABLE = True +CHECK_DELIVERABILITY = True +TEST_ENVIRONMENT = False +DEFAULT_TIMEOUT = 15 # secs # IANA Special Use Domain Names # Last Updated 2021-09-21 @@ -106,597 +100,3 @@ # fail deliverability checks because "test" is not an actual TLD. "test", ] - -# ease compatibility in type checking -if sys.version_info >= (3,): - unicode_class = str -else: - unicode_class = unicode # noqa: F821 - - # turn regexes to unicode (because 'ur' literals are not allowed in Py3) - ATEXT = ATEXT.decode("ascii") - DOT_ATOM_TEXT = DOT_ATOM_TEXT.decode("ascii") - ATEXT_HOSTNAME = ATEXT_HOSTNAME.decode("ascii") - - -class EmailNotValidError(ValueError): - """Parent class of all exceptions raised by this module.""" - pass - - -class EmailSyntaxError(EmailNotValidError): - """Exception raised when an email address fails validation because of its form.""" - pass - - -class EmailUndeliverableError(EmailNotValidError): - """Exception raised when an email address fails validation because its domain name does not appear deliverable.""" - pass - - -class ValidatedEmail(object): - """The validate_email function returns objects of this type holding the normalized form of the email address - and other information.""" - - """The email address that was passed to validate_email. (If passed as bytes, this will be a string.)""" - original_email = None - - """The normalized email address, which should always be used in preferance to the original address. - The normalized address converts an IDNA ASCII domain name to Unicode, if possible, and performs - Unicode normalization on the local part and on the domain (if originally Unicode). 
It is the - concatenation of the local_part and domain attributes, separated by an @-sign.""" - email = None - - """The local part of the email address after Unicode normalization.""" - local_part = None - - """The domain part of the email address after Unicode normalization or conversion to - Unicode from IDNA ascii.""" - domain = None - - """If not None, a form of the email address that uses 7-bit ASCII characters only.""" - ascii_email = None - - """If not None, the local part of the email address using 7-bit ASCII characters only.""" - ascii_local_part = None - - """If not None, a form of the domain name that uses 7-bit ASCII characters only.""" - ascii_domain = None - - """If True, the SMTPUTF8 feature of your mail relay will be required to transmit messages - to this address. This flag is True just when ascii_local_part is missing. Otherwise it - is False.""" - smtputf8 = None - - """If a deliverability check is performed and if it succeeds, a list of (priority, domain) - tuples of MX records specified in the DNS for the domain.""" - mx = None - - """If no MX records are actually specified in DNS and instead are inferred, through an obsolete - mechanism, from A or AAAA records, the value is the type of DNS record used instead (`A` or `AAAA`).""" - mx_fallback_type = None - - """Tests use this constructor.""" - def __init__(self, **kwargs): - for k, v in kwargs.items(): - setattr(self, k, v) - - """As a convenience, str(...) on instances of this class return the normalized address.""" - def __self__(self): - return self.normalized_email - - def __repr__(self): - return "".format(self.email) - - """For backwards compatibility, some fields are also exposed through a dict-like interface. 
Note - that some of the names changed when they became attributes.""" - def __getitem__(self, key): - if key == "email": - return self.email - if key == "email_ascii": - return self.ascii_email - if key == "local": - return self.local_part - if key == "domain": - return self.ascii_domain - if key == "domain_i18n": - return self.domain - if key == "smtputf8": - return self.smtputf8 - if key == "mx": - return self.mx - if key == "mx-fallback": - return self.mx_fallback_type - raise KeyError() - - """Tests use this.""" - def __eq__(self, other): - if not isinstance(other, ValidatedEmail): - return False - return ( - self.email == other.email - and self.local_part == other.local_part - and self.domain == other.domain - and self.ascii_email == other.ascii_email - and self.ascii_local_part == other.ascii_local_part - and self.ascii_domain == other.ascii_domain - and self.smtputf8 == other.smtputf8 - and repr(sorted(self.mx) if self.mx else self.mx) - == repr(sorted(other.mx) if other.mx else other.mx) - and self.mx_fallback_type == other.mx_fallback_type - ) - - """This helps producing the README.""" - def as_constructor(self): - return "ValidatedEmail(" \ - + ",".join("\n {}={}".format( - key, - repr(getattr(self, key))) - for key in ('email', 'local_part', 'domain', - 'ascii_email', 'ascii_local_part', 'ascii_domain', - 'smtputf8', 'mx', 'mx_fallback_type') - ) \ - + ")" - - """Convenience method for accessing ValidatedEmail as a dict""" - def as_dict(self): - return self.__dict__ - - -def __get_length_reason(addr, utf8=False, limit=EMAIL_MAX_LENGTH): - diff = len(addr) - limit - reason = "({}{} character{} too many)" - prefix = "at least " if utf8 else "" - suffix = "s" if diff > 1 else "" - return reason.format(prefix, diff, suffix) - - -def caching_resolver(timeout=DEFAULT_TIMEOUT, cache=None): - resolver = dns.resolver.Resolver() - resolver.cache = cache or dns.resolver.LRUCache() - resolver.lifetime = timeout # timeout, in seconds - return resolver - - -def 
validate_email( - email, - allow_smtputf8=ALLOW_SMTPUTF8, - allow_empty_local=False, - check_deliverability=CHECK_DELIVERABILITY, - test_environment=TEST_ENVIRONMENT, - timeout=DEFAULT_TIMEOUT, - dns_resolver=None -): - """ - Validates an email address, raising an EmailNotValidError if the address is not valid or returning a dict of - information when the address is valid. The email argument can be a str or a bytes instance, - but if bytes it must be ASCII-only. - """ - - # Allow email to be a str or bytes instance. If bytes, - # it must be ASCII because that's how the bytes work - # on the wire with SMTP. - if not isinstance(email, (str, unicode_class)): - try: - email = email.decode("ascii") - except ValueError: - raise EmailSyntaxError("The email address is not valid ASCII.") - - # At-sign. - parts = email.split('@') - if len(parts) != 2: - raise EmailSyntaxError("The email address is not valid. It must have exactly one @-sign.") - - # Collect return values in this instance. - ret = ValidatedEmail() - ret.original_email = email - - # Validate the email address's local part syntax and get a normalized form. - local_part_info = validate_email_local_part(parts[0], - allow_smtputf8=allow_smtputf8, - allow_empty_local=allow_empty_local) - ret.local_part = local_part_info["local_part"] - ret.ascii_local_part = local_part_info["ascii_local_part"] - ret.smtputf8 = local_part_info["smtputf8"] - - # Validate the email address's domain part syntax and get a normalized form. - domain_part_info = validate_email_domain_part(parts[1], test_environment=test_environment) - ret.domain = domain_part_info["domain"] - ret.ascii_domain = domain_part_info["ascii_domain"] - - # Construct the complete normalized form. - ret.email = ret.local_part + "@" + ret.domain - - # If the email address has an ASCII form, add it. 
- if not ret.smtputf8: - ret.ascii_email = ret.ascii_local_part + "@" + ret.ascii_domain - - # If the email address has an ASCII representation, then we assume it may be - # transmitted in ASCII (we can't assume SMTPUTF8 will be used on all hops to - # the destination) and the length limit applies to ASCII characters (which is - # the same as octets). The number of characters in the internationalized form - # may be many fewer (because IDNA ASCII is verbose) and could be less than 254 - # Unicode characters, and of course the number of octets over the limit may - # not be the number of characters over the limit, so if the email address is - # internationalized, we can't give any simple information about why the address - # is too long. - # - # In addition, check that the UTF-8 encoding (i.e. not IDNA ASCII and not - # Unicode characters) is at most 254 octets. If the addres is transmitted using - # SMTPUTF8, then the length limit probably applies to the UTF-8 encoded octets. - # If the email address has an ASCII form that differs from its internationalized - # form, I don't think the internationalized form can be longer, and so the ASCII - # form length check would be sufficient. If there is no ASCII form, then we have - # to check the UTF-8 encoding. The UTF-8 encoding could be up to about four times - # longer than the number of characters. - # - # See the length checks on the local part and the domain. - if ret.ascii_email and len(ret.ascii_email) > EMAIL_MAX_LENGTH: - if ret.ascii_email == ret.email: - reason = __get_length_reason(ret.ascii_email) - elif len(ret.email) > EMAIL_MAX_LENGTH: - # If there are more than 254 characters, then the ASCII - # form is definitely going to be too long. 
- reason = __get_length_reason(ret.email, utf8=True) - else: - reason = "(when converted to IDNA ASCII)" - raise EmailSyntaxError("The email address is too long {}.".format(reason)) - if len(ret.email.encode("utf8")) > EMAIL_MAX_LENGTH: - if len(ret.email) > EMAIL_MAX_LENGTH: - # If there are more than 254 characters, then the UTF-8 - # encoding is definitely going to be too long. - reason = __get_length_reason(ret.email, utf8=True) - else: - reason = "(when encoded in bytes)" - raise EmailSyntaxError("The email address is too long {}.".format(reason)) - - if check_deliverability and not test_environment: - # Validate the email address's deliverability using DNS - # and update the return dict with metadata. - deliverability_info = validate_email_deliverability( - ret["domain"], ret["domain_i18n"], timeout, dns_resolver - ) - if "mx" in deliverability_info: - ret.mx = deliverability_info["mx"] - ret.mx_fallback_type = deliverability_info["mx-fallback"] - - return ret - - -def validate_email_local_part(local, allow_smtputf8=True, allow_empty_local=False): - # Validates the local part of an email address. - - if len(local) == 0: - if not allow_empty_local: - raise EmailSyntaxError("There must be something before the @-sign.") - else: - # The caller allows an empty local part. Useful for validating certain - # Postfix aliases. - return { - "local_part": local, - "ascii_local_part": local, - "smtputf8": False, - } - - # RFC 5321 4.5.3.1.1 - # We're checking the number of characters here. If the local part - # is ASCII-only, then that's the same as bytes (octets). If it's - # internationalized, then the UTF-8 encoding may be longer, but - # that may not be relevant. We will check the total address length - # instead. 
- if len(local) > LOCAL_PART_MAX_LENGTH: - reason = __get_length_reason(local, limit=LOCAL_PART_MAX_LENGTH) - raise EmailSyntaxError("The email address is too long before the @-sign {}.".format(reason)) - - # Check the local part against the regular expression for the older ASCII requirements. - m = re.match(DOT_ATOM_TEXT + "\\Z", local) - if m: - # Return the local part unchanged and flag that SMTPUTF8 is not needed. - return { - "local_part": local, - "ascii_local_part": local, - "smtputf8": False, - } - - else: - # The local part failed the ASCII check. Now try the extended internationalized requirements. - m = re.match(DOT_ATOM_TEXT_INTL + "\\Z", local) - if not m: - # It's not a valid internationalized address either. Report which characters were not valid. - bad_chars = ', '.join(sorted(set( - unicodedata.name(c, repr(c)) for c in local if not re.match(u"[" + (ATEXT if not allow_smtputf8 else ATEXT_INTL) + u"]", c) - ))) - raise EmailSyntaxError("The email address contains invalid characters before the @-sign: %s." % bad_chars) - - # It would be valid if internationalized characters were allowed by the caller. - if not allow_smtputf8: - raise EmailSyntaxError("Internationalized characters before the @-sign are not supported.") - - # It's valid. - - # RFC 6532 section 3.1 also says that Unicode NFC normalization should be applied, - # so we'll return the normalized local part in the return value. - local = unicodedata.normalize("NFC", local) - - # Check for unsafe characters. - # Some of this may be redundant with the range U+0080 to U+10FFFF that is checked - # by DOT_ATOM_TEXT_INTL. 
- for i, c in enumerate(local): - category = unicodedata.category(c) - if category[0] in ("L", "N", "P", "S"): - # letters, numbers, punctuation, and symbols are permitted - pass - elif category[0] == "M": - # combining character in first position would combine with something - # outside of the email address if concatenated to the right, but are - # otherwise permitted - if i == 0: - raise EmailSyntaxError("The email address contains an initial invalid character (%s)." - % unicodedata.name(c, repr(c))) - elif category[0] in ("Z", "C"): - # spaces and line/paragraph characters (Z) and - # control, format, surrogate, private use, and unassigned code points (C) - raise EmailSyntaxError("The email address contains an invalid character (%s)." - % unicodedata.name(c, repr(c))) - else: - # All categories should be handled above, but in case there is something new - # in the future. - raise EmailSyntaxError("The email address contains a character (%s; category %s) that may not be safe." - % (unicodedata.name(c, repr(c)), category)) - - # Try encoding to UTF-8. Failure is possible with some characters like - # surrogate code points, but those are checked above. Still, we don't - # want to have an unhandled exception later. - try: - local.encode("utf8") - except ValueError: - raise EmailSyntaxError("The email address contains an invalid character.") - - # Flag that SMTPUTF8 will be required for deliverability. - return { - "local_part": local, - "ascii_local_part": None, # no ASCII form is possible - "smtputf8": True, - } - - -def validate_email_domain_part(domain, test_environment=False): - # Empty? - if len(domain) == 0: - raise EmailSyntaxError("There must be something after the @-sign.") - - # Perform UTS-46 normalization, which includes casefolding, NFC normalization, - # and converting all label separators (the period/full stop, fullwidth full stop, - # ideographic full stop, and halfwidth ideographic full stop) to basic periods. 
- # It will also raise an exception if there is an invalid character in the input, - # such as "⒈" which is invalid because it would expand to include a period. - try: - domain = idna.uts46_remap(domain, std3_rules=False, transitional=False) - except idna.IDNAError as e: - raise EmailSyntaxError("The domain name %s contains invalid characters (%s)." % (domain, str(e))) - - # Now we can perform basic checks on the use of periods (since equivalent - # symbols have been mapped to periods). These checks are needed because the - # IDNA library doesn't handle well domains that have empty labels (i.e. initial - # dot, trailing dot, or two dots in a row). - if domain.endswith("."): - raise EmailSyntaxError("An email address cannot end with a period.") - if domain.startswith("."): - raise EmailSyntaxError("An email address cannot have a period immediately after the @-sign.") - if ".." in domain: - raise EmailSyntaxError("An email address cannot have two periods in a row.") - - # Regardless of whether international characters are actually used, - # first convert to IDNA ASCII. For ASCII-only domains, the transformation - # does nothing. If internationalized characters are present, the MTA - # must either support SMTPUTF8 or the mail client must convert the - # domain name to IDNA before submission. - # - # Unfortunately this step incorrectly 'fixes' domain names with leading - # periods by removing them, so we have to check for this above. It also gives - # a funky error message ("No input") when there are two periods in a - # row, also checked separately above. - try: - ascii_domain = idna.encode(domain, uts46=False).decode("ascii") - except idna.IDNAError as e: - if "Domain too long" in str(e): - # We can't really be more specific because UTS-46 normalization means - # the length check is applied to a string that is different from the - # one the user supplied. 
Also I'm not sure if the length check applies - # to the internationalized form, the IDNA ASCII form, or even both! - raise EmailSyntaxError("The email address is too long after the @-sign.") - raise EmailSyntaxError("The domain name %s contains invalid characters (%s)." % (domain, str(e))) - - # We may have been given an IDNA ASCII domain to begin with. Check - # that the domain actually conforms to IDNA. It could look like IDNA - # but not be actual IDNA. For ASCII-only domains, the conversion out - # of IDNA just gives the same thing back. - # - # This gives us the canonical internationalized form of the domain, - # which we should use in all error messages. - try: - domain_i18n = idna.decode(ascii_domain.encode('ascii')) - except idna.IDNAError as e: - raise EmailSyntaxError("The domain name %s is not valid IDNA (%s)." % (ascii_domain, str(e))) - - # RFC 5321 4.5.3.1.2 - # We're checking the number of bytes (octets) here, which can be much - # higher than the number of characters in internationalized domains, - # on the assumption that the domain may be transmitted without SMTPUTF8 - # as IDNA ASCII. This is also checked by idna.encode, so this exception - # is never reached. - if len(ascii_domain) > DOMAIN_MAX_LENGTH: - raise EmailSyntaxError("The email address is too long after the @-sign.") - - # A "dot atom text", per RFC 2822 3.2.4, but using the restricted - # characters allowed in a hostname (see ATEXT_HOSTNAME above). - DOT_ATOM_TEXT = ATEXT_HOSTNAME + r'(?:\.' + ATEXT_HOSTNAME + r')*' - - # Check the regular expression. This is probably entirely redundant - # with idna.decode, which also checks this format. 
- m = re.match(DOT_ATOM_TEXT + "\\Z", ascii_domain) - if not m: - raise EmailSyntaxError("The email address contains invalid characters after the @-sign.") - - # All publicly deliverable addresses have domain named with at least - # one period, and we'll consider the lack of a period a syntax error - # since that will match people's sense of what an email address looks - # like. We'll skip this in test environments to allow '@test' email - # addresses. - if "." not in ascii_domain and not (ascii_domain == "test" and test_environment): - raise EmailSyntaxError("The domain name %s is not valid. It should have a period." % domain_i18n) - - # Check special-use and reserved domain names. Raise these as - # deliverability errors since they are syntactically valid. - # Some might fail DNS-based deliverability checks, but that - # can be turned off, so we should fail them all sooner. - for d in SPECIAL_USE_DOMAIN_NAMES: - # See the note near the definition of SPECIAL_USE_DOMAIN_NAMES. - if d == "test" and test_environment: - continue - - if ascii_domain == d or ascii_domain.endswith("." + d): - raise EmailUndeliverableError("The domain name %s is a special-use or reserved name that cannot be used with email." % domain_i18n) - - # We also know that all TLDs currently end with a letter, and - # we'll consider that a non-DNS based deliverability check. - if not re.search(r"[A-Za-z]\Z", ascii_domain): - raise EmailUndeliverableError( - "The domain name %s is not valid. It is not within a valid top-level domain." % domain_i18n - ) - - # Return the IDNA ASCII-encoded form of the domain, which is how it - # would be transmitted on the wire (except when used with SMTPUTF8 - # possibly), as well as the canonical Unicode form of the domain, - # which is better for display purposes. This should also take care - # of RFC 6532 section 3.1's suggestion to apply Unicode NFC - # normalization to addresses. 
- return { - "ascii_domain": ascii_domain, - "domain": domain_i18n, - } - - -def validate_email_deliverability(domain, domain_i18n, timeout=DEFAULT_TIMEOUT, dns_resolver=None): - # Check that the domain resolves to an MX record. If there is no MX record, - # try an A or AAAA record which is a deprecated fallback for deliverability. - - # If no dns.resolver.Resolver was given, get dnspython's default resolver. - # Override the default resolver's timeout. This may affect other uses of - # dnspython in this process. - if dns_resolver is None: - dns_resolver = dns.resolver.get_default_resolver() - dns_resolver.lifetime = timeout - - def dns_resolver_resolve_shim(domain, record): - try: - # dns.resolver.Resolver.resolve is new to dnspython 2.x. - # https://dnspython.readthedocs.io/en/latest/resolver-class.html#dns.resolver.Resolver.resolve - return dns_resolver.resolve(domain, record) - except AttributeError: - # dnspython 2.x is only available in Python 3.6 and later. For earlier versions - # of Python, we maintain compatibility with dnspython 1.x which has a - # dnspython.resolver.Resolver.query method instead. The only difference is that - # query may treat the domain as relative and use the system's search domains, - # which we prevent by adding a "." to the domain name to make it absolute. - # dns.resolver.Resolver.query is deprecated in dnspython version 2.x. - # https://dnspython.readthedocs.io/en/latest/resolver-class.html#dns.resolver.Resolver.query - return dns_resolver.query(domain + ".", record) - - try: - # We need a way to check how timeouts are handled in the tests. So we - # have a secret variable that if set makes this method always test the - # handling of a timeout. - if getattr(validate_email_deliverability, 'TEST_CHECK_TIMEOUT', False): - raise dns.exception.Timeout() - - try: - # Try resolving for MX records and get them in sorted priority order - # as (priority, qname) pairs. 
- response = dns_resolver_resolve_shim(domain, "MX") - mtas = sorted([(r.preference, str(r.exchange).rstrip('.')) for r in response]) - mx_fallback = None - - # Do not permit delivery if there is only a "null MX" record (whose value is - # (0, ".") but we've stripped trailing dots, so the 'exchange' is just ""). - mtas = [(preference, exchange) for preference, exchange in mtas - if exchange != ""] - if len(mtas) == 0: - raise EmailUndeliverableError("The domain name %s does not accept email." % domain_i18n) - - except (dns.resolver.NoNameservers, dns.resolver.NXDOMAIN, dns.resolver.NoAnswer): - - # If there was no MX record, fall back to an A record. - try: - response = dns_resolver_resolve_shim(domain, "A") - mtas = [(0, str(r)) for r in response] - mx_fallback = "A" - except (dns.resolver.NoNameservers, dns.resolver.NXDOMAIN, dns.resolver.NoAnswer): - - # If there was no A record, fall back to an AAAA record. - try: - response = dns_resolver_resolve_shim(domain, "AAAA") - mtas = [(0, str(r)) for r in response] - mx_fallback = "AAAA" - except (dns.resolver.NoNameservers, dns.resolver.NXDOMAIN, dns.resolver.NoAnswer): - - # If there was no MX, A, or AAAA record, then mail to - # this domain is not deliverable. - raise EmailUndeliverableError("The domain name %s does not exist." % domain_i18n) - - except dns.exception.Timeout: - # A timeout could occur for various reasons, so don't treat it as a failure. - return { - "unknown-deliverability": "timeout", - } - - except EmailUndeliverableError: - # Don't let these get clobbered by the wider except block below. - raise - - except Exception as e: - # Unhandled conditions should not propagate. 
- raise EmailUndeliverableError( - "There was an error while checking if the domain name in the email address is deliverable: " + str(e) - ) - - return { - "mx": mtas, - "mx-fallback": mx_fallback, - } - - -def main(): - import json - - def __utf8_input_shim(input_str): - if sys.version_info < (3,): - return input_str.decode("utf-8") - return input_str - - def __utf8_output_shim(output_str): - if sys.version_info < (3,): - return unicode_class(output_str).encode("utf-8") - return output_str - - if len(sys.argv) == 1: - # Validate the email addresses pased line-by-line on STDIN. - dns_resolver = caching_resolver() - for line in sys.stdin: - email = __utf8_input_shim(line.strip()) - try: - validate_email(email, dns_resolver=dns_resolver) - except EmailNotValidError as e: - print(__utf8_output_shim("{} {}".format(email, e))) - else: - # Validate the email address passed on the command line. - email = __utf8_input_shim(sys.argv[1]) - try: - result = validate_email(email) - print(json.dumps(result.as_dict(), indent=2, sort_keys=True, ensure_ascii=False)) - except EmailNotValidError as e: - print(__utf8_output_shim(e)) - - -if __name__ == "__main__": - main() diff --git a/email_validator/__main__.py b/email_validator/__main__.py new file mode 100644 index 0000000..84d9fd4 --- /dev/null +++ b/email_validator/__main__.py @@ -0,0 +1,61 @@ +# A command-line tool for testing. +# +# Usage: +# +# python -m email_validator test@example.org +# python -m email_validator < LIST_OF_ADDRESSES.TXT +# +# Provide email addresses to validate either as a command-line argument +# or in STDIN separated by newlines. Validation errors will be printed for +# invalid email addresses. When passing an email address on the command +# line, if the email address is valid, information about it will be printed. +# When using STDIN, no output will be given for valid email addresses. 
+# +# Keyword arguments to validate_email can be set in environment variables +# of the same name but uppercase (see below). + +import json +import os +import sys +from typing import Any, Dict, Optional + +from .validate_email import validate_email, _Resolver +from .deliverability import caching_resolver +from .exceptions import EmailNotValidError + + +def main(dns_resolver: Optional[_Resolver] = None) -> None: + # The dns_resolver argument is for tests. + + # Set options from environment variables. + options: Dict[str, Any] = {} + for varname in ('ALLOW_SMTPUTF8', 'ALLOW_EMPTY_LOCAL', 'ALLOW_QUOTED_LOCAL', 'ALLOW_DOMAIN_LITERAL', + 'ALLOW_DISPLAY_NAME', + 'GLOBALLY_DELIVERABLE', 'CHECK_DELIVERABILITY', 'TEST_ENVIRONMENT'): + if varname in os.environ: + options[varname.lower()] = bool(os.environ[varname]) + for varname in ('DEFAULT_TIMEOUT',): + if varname in os.environ: + options[varname.lower()] = float(os.environ[varname]) + + if len(sys.argv) == 1: + # Validate the email addresses passed line-by-line on STDIN. + dns_resolver = dns_resolver or caching_resolver() + for line in sys.stdin: + email = line.strip() + try: + validate_email(email, dns_resolver=dns_resolver, **options) + except EmailNotValidError as e: + print(f"{email} {e}") + else: + # Validate the email address passed on the command line. 
+ email = sys.argv[1] + try: + result = validate_email(email, dns_resolver=dns_resolver, **options) + print(json.dumps(result.as_dict(), indent=2, sort_keys=True, ensure_ascii=False)) + except EmailNotValidError as e: + print(e) + + +if __name__ == "__main__": + main() diff --git a/email_validator/deliverability.py b/email_validator/deliverability.py new file mode 100644 index 0000000..6100a31 --- /dev/null +++ b/email_validator/deliverability.py @@ -0,0 +1,159 @@ +from typing import Any, List, Optional, Tuple, TypedDict + +import ipaddress + +from .exceptions import EmailUndeliverableError + +import dns.resolver +import dns.exception + + +def caching_resolver(*, timeout: Optional[int] = None, cache: Any = None, dns_resolver: Optional[dns.resolver.Resolver] = None) -> dns.resolver.Resolver: + if timeout is None: + from . import DEFAULT_TIMEOUT + timeout = DEFAULT_TIMEOUT + resolver = dns_resolver or dns.resolver.Resolver() + resolver.cache = cache or dns.resolver.LRUCache() + resolver.lifetime = timeout # timeout, in seconds + return resolver + + +DeliverabilityInfo = TypedDict("DeliverabilityInfo", { + "mx": List[Tuple[int, str]], + "mx_fallback_type": Optional[str], + "unknown-deliverability": str, +}, total=False) + + +def validate_email_deliverability(domain: str, domain_i18n: str, timeout: Optional[int] = None, dns_resolver: Optional[dns.resolver.Resolver] = None) -> DeliverabilityInfo: + # Check that the domain resolves to an MX record. If there is no MX record, + # try an A or AAAA record which is a deprecated fallback for deliverability. + # Raises an EmailUndeliverableError on failure. On success, returns a dict + # with deliverability information. + + # If no dns.resolver.Resolver was given, get dnspython's default resolver. + # Override the default resolver's timeout. This may affect other uses of + # dnspython in this process. + if dns_resolver is None: + from . 
import DEFAULT_TIMEOUT + if timeout is None: + timeout = DEFAULT_TIMEOUT + dns_resolver = dns.resolver.get_default_resolver() + dns_resolver.lifetime = timeout + elif timeout is not None: + raise ValueError("It's not valid to pass both timeout and dns_resolver.") + + deliverability_info: DeliverabilityInfo = {} + + try: + try: + # Try resolving for MX records (RFC 5321 Section 5). + response = dns_resolver.resolve(domain, "MX") + + # For reporting, put them in priority order and remove the trailing dot in the qnames. + mtas = sorted([(r.preference, str(r.exchange).rstrip('.')) for r in response]) + + # RFC 7505: Null MX (0, ".") records signify the domain does not accept email. + # Remove null MX records from the mtas list (but we've stripped trailing dots, + # so the 'exchange' is just "") so we can check if there are no non-null MX + # records remaining. + mtas = [(preference, exchange) for preference, exchange in mtas + if exchange != ""] + if len(mtas) == 0: # null MX only, if there were no MX records originally a NoAnswer exception would have occurred + raise EmailUndeliverableError(f"The domain name {domain_i18n} does not accept email.") + + deliverability_info["mx"] = mtas + deliverability_info["mx_fallback_type"] = None + + except dns.resolver.NoAnswer: + # If there was no MX record, fall back to an A or AAAA record + # (RFC 5321 Section 5). Check A first since it's more common. + + # If the A/AAAA response has no Globally Reachable IP address, + # treat the response as if it were NoAnswer, i.e., the following + # address types are not allowed fallbacks: Private-Use, Loopback, + # Link-Local, and some other obscure ranges. See + # https://www.iana.org/assignments/iana-ipv4-special-registry/iana-ipv4-special-registry.xhtml + # https://www.iana.org/assignments/iana-ipv6-special-registry/iana-ipv6-special-registry.xhtml + # (Issue #134.)
+ def is_global_addr(address: Any) -> bool: + try: + ipaddr = ipaddress.ip_address(address) + except ValueError: + return False + return ipaddr.is_global + + try: + response = dns_resolver.resolve(domain, "A") + + if not any(is_global_addr(r.address) for r in response): + raise dns.resolver.NoAnswer # fall back to AAAA + + deliverability_info["mx"] = [(0, domain)] + deliverability_info["mx_fallback_type"] = "A" + + except dns.resolver.NoAnswer: + + # If there was no A record, fall back to an AAAA record. + # (It's unclear if SMTP servers actually do this.) + try: + response = dns_resolver.resolve(domain, "AAAA") + + if not any(is_global_addr(r.address) for r in response): + raise dns.resolver.NoAnswer + + deliverability_info["mx"] = [(0, domain)] + deliverability_info["mx_fallback_type"] = "AAAA" + + except dns.resolver.NoAnswer as e: + # If there was no MX, A, or AAAA record, then mail to + # this domain is not deliverable, although the domain + # name has other records (otherwise NXDOMAIN would + # have been raised). + raise EmailUndeliverableError(f"The domain name {domain_i18n} does not accept email.") from e + + # Check for a SPF (RFC 7208) reject-all record ("v=spf1 -all") which indicates + # no emails are sent from this domain (similar to a Null MX record + # but for sending rather than receiving). In combination with the + # absence of an MX record, this is probably a good sign that the + # domain is not used for email. + try: + response = dns_resolver.resolve(domain, "TXT") + for rec in response: + value = b"".join(rec.strings) + if value.startswith(b"v=spf1 "): + if value == b"v=spf1 -all": + raise EmailUndeliverableError(f"The domain name {domain_i18n} does not send email.") + except dns.resolver.NoAnswer: + # No TXT records means there is no SPF policy, so we cannot take any action. + pass + + except dns.resolver.NXDOMAIN as e: + # The domain name does not exist --- there are no records of any sort + # for the domain name. 
+ raise EmailUndeliverableError(f"The domain name {domain_i18n} does not exist.") from e + + except dns.resolver.NoNameservers: + # All nameservers failed to answer the query. This might be a problem + # with local nameservers, maybe? We'll allow the domain to go through. + return { + "unknown-deliverability": "no_nameservers", + } + + except dns.exception.Timeout: + # A timeout could occur for various reasons, so don't treat it as a failure. + return { + "unknown-deliverability": "timeout", + } + + except EmailUndeliverableError: + # Don't let these get clobbered by the wider except block below. + raise + + except Exception as e: + # Unhandled conditions should not propagate. + raise EmailUndeliverableError( + "There was an error while checking if the domain name in the email address is deliverable: " + str(e) + ) from e + + return deliverability_info diff --git a/email_validator/exceptions.py b/email_validator/exceptions.py new file mode 100644 index 0000000..87ef13c --- /dev/null +++ b/email_validator/exceptions.py @@ -0,0 +1,13 @@ +class EmailNotValidError(ValueError): + """Parent class of all exceptions raised by this module.""" + pass + + +class EmailSyntaxError(EmailNotValidError): + """Exception raised when an email address fails validation because of its form.""" + pass + + +class EmailUndeliverableError(EmailNotValidError): + """Exception raised when an email address fails validation because its domain name does not appear deliverable.""" + pass diff --git a/email_validator/py.typed b/email_validator/py.typed new file mode 100644 index 0000000..e69de29 diff --git a/email_validator/rfc_constants.py b/email_validator/rfc_constants.py new file mode 100644 index 0000000..39d8e31 --- /dev/null +++ b/email_validator/rfc_constants.py @@ -0,0 +1,51 @@ +# These constants are defined by the email specifications. 
+ +import re + +# Based on RFC 5322 3.2.3, these characters are permitted in email +# addresses (not taking into account internationalization) separated by dots: +ATEXT = r'a-zA-Z0-9_!#\$%&\'\*\+\-/=\?\^`\{\|\}~' +ATEXT_RE = re.compile('[.' + ATEXT + ']') # ATEXT plus dots +DOT_ATOM_TEXT = re.compile('[' + ATEXT + ']+(?:\\.[' + ATEXT + r']+)*\Z') + +# RFC 6531 3.3 extends the allowed characters in internationalized +# addresses to also include three specific ranges of UTF8 defined in +# RFC 3629 section 4, which appear to be the Unicode code points from +# U+0080 to U+10FFFF. +ATEXT_INTL = ATEXT + "\u0080-\U0010FFFF" +ATEXT_INTL_DOT_RE = re.compile('[.' + ATEXT_INTL + ']') # ATEXT_INTL plus dots +DOT_ATOM_TEXT_INTL = re.compile('[' + ATEXT_INTL + ']+(?:\\.[' + ATEXT_INTL + r']+)*\Z') + +# The domain part of the email address, after IDNA (ASCII) encoding, +# must also satisfy the requirements of RFC 952/RFC 1123 2.1 which +# restrict the allowed characters of hostnames further. +ATEXT_HOSTNAME_INTL = re.compile(r"[a-zA-Z0-9\-\." + "\u0080-\U0010FFFF" + "]") +HOSTNAME_LABEL = r'(?:(?:[a-zA-Z0-9][a-zA-Z0-9\-]*)?[a-zA-Z0-9])' +DOT_ATOM_TEXT_HOSTNAME = re.compile(HOSTNAME_LABEL + r'(?:\.' + HOSTNAME_LABEL + r')*\Z') +DOMAIN_NAME_REGEX = re.compile(r"[A-Za-z]\Z") # all TLDs currently end with a letter + +# Domain literal (RFC 5322 3.4.1) +DOMAIN_LITERAL_CHARS = re.compile(r"[\u0021-\u00FA\u005E-\u007E]") + +# Quoted-string local part (RFC 5321 4.1.2, internationalized by RFC 6531 3.3) +# The permitted characters in a quoted string are the characters in the range +# 32-126, except that quotes and (literal) backslashes can only appear when escaped +# by a backslash. When internationalized, UTF-8 strings are also permitted except +# the ASCII characters that are not previously permitted (see above). 
+# QUOTED_LOCAL_PART_ADDR = re.compile(r"^\"((?:[\u0020-\u0021\u0023-\u005B\u005D-\u007E]|\\[\u0020-\u007E])*)\"@(.*)") +QTEXT_INTL = re.compile(r"[\u0020-\u007E\u0080-\U0010FFFF]") + +# Length constants +# RFC 3696 + errata 1003 + errata 1690 (https://www.rfc-editor.org/errata_search.php?rfc=3696&eid=1690) +# explains the maximum length of an email address is 254 octets. +EMAIL_MAX_LENGTH = 254 +LOCAL_PART_MAX_LENGTH = 64 +DNS_LABEL_LENGTH_LIMIT = 63 # in "octets", RFC 1035 2.3.1 +DOMAIN_MAX_LENGTH = 253 # in "octets" as transmitted, RFC 1035 2.3.4 and RFC 5321 4.5.3.1.2, and see https://stackoverflow.com/questions/32290167/what-is-the-maximum-length-of-a-dns-name + +# RFC 2142 +CASE_INSENSITIVE_MAILBOX_NAMES = [ + 'info', 'marketing', 'sales', 'support', # section 3 + 'abuse', 'noc', 'security', # section 4 + 'postmaster', 'hostmaster', 'usenet', 'news', 'webmaster', 'www', 'uucp', 'ftp', # section 5 +] diff --git a/email_validator/syntax.py b/email_validator/syntax.py new file mode 100644 index 0000000..751ce3e --- /dev/null +++ b/email_validator/syntax.py @@ -0,0 +1,783 @@ +from .exceptions import EmailSyntaxError +from .types import ValidatedEmail +from .rfc_constants import EMAIL_MAX_LENGTH, LOCAL_PART_MAX_LENGTH, DOMAIN_MAX_LENGTH, \ + DOT_ATOM_TEXT, DOT_ATOM_TEXT_INTL, ATEXT_RE, ATEXT_INTL_DOT_RE, ATEXT_HOSTNAME_INTL, QTEXT_INTL, \ + DNS_LABEL_LENGTH_LIMIT, DOT_ATOM_TEXT_HOSTNAME, DOMAIN_NAME_REGEX, DOMAIN_LITERAL_CHARS + +import re +import unicodedata +import idna # implements IDNA 2008; Python's codec is only IDNA 2003 +import ipaddress +from typing import Optional, Tuple, TypedDict, Union + + +def split_email(email: str) -> Tuple[Optional[str], str, str, bool]: + # Return the display name, unescaped local part, and domain part + # of the address, and whether the local part was quoted. 
If no + # display name was present and angle brackets do not surround + # the address, display name will be None; otherwise, it will be + # set to the display name or the empty string if there were + # angle brackets but no display name. + + # Typical email addresses have a single @-sign and no quote + # characters, but the awkward "quoted string" local part form + # (RFC 5321 4.1.2) allows @-signs and escaped quotes to appear + # in the local part if the local part is quoted. + + # A `display name ` format is also present in MIME messages + # (RFC 5322 3.4) and this format is also often recognized in + # mail UIs. It's not allowed in SMTP commands or in typical web + # login forms, but parsing it has been requested, so it's done + # here as a convenience. It's implemented in the spirit but not + # the letter of RFC 5322 3.4 because MIME messages allow newlines + # and comments as a part of the CFWS rule, but this is typically + # not allowed in mail UIs (although comment syntax was requested + # once too). + # + # Display names are either basic characters (the same basic characters + # permitted in email addresses, but periods are not allowed and spaces + # are allowed; see RFC 5322 Appendix A.1.2), or a quoted string with + # the same rules as a quoted local part. (Multiple quoted strings might + # be allowed? Unclear.) Optional space (RFC 5322 3.4 CFWS) and then the + # email address follows in angle brackets. + # + # An initial quote is ambiguous between starting a display name or + # a quoted local part --- fun. + # + # We assume the input string is already stripped of leading and + # trailing CFWS. + + def split_string_at_unquoted_special(text: str, specials: Tuple[str, ...]) -> Tuple[str, str]: + # Split the string at the first character in specials (an @-sign + # or left angle bracket) that does not occur within quotes and + # is not followed by a Unicode combining character. + # If no special character is found, raise an error.
+ inside_quote = False + escaped = False + left_part = "" + for i, c in enumerate(text): + # < plus U+0338 (Combining Long Solidus Overlay) normalizes to + # ≮ U+226E (Not Less-Than), and it would be confusing to treat + # the < as the start of "" syntax in that case. Likewise, + # if anything combines with an @ or ", we should probably not + # treat it as a special character. + if unicodedata.normalize("NFC", text[i:])[0] != c: + left_part += c + + elif inside_quote: + left_part += c + if c == '\\' and not escaped: + escaped = True + elif c == '"' and not escaped: + # The only way to exit the quote is an unescaped quote. + inside_quote = False + escaped = False + else: + escaped = False + elif c == '"': + left_part += c + inside_quote = True + elif c in specials: + # When unquoted, stop before a special character. + break + else: + left_part += c + + # No special symbol found. The special symbols always + # include an at-sign, so this always indicates a missing + # at-sign. The other symbol is optional. + if len(left_part) == len(text): + # The full-width at-sign might occur in CJK contexts. + # We can't accept it because we only accept addresses + # that are actually valid. But if this is common we + # may want to consider accepting and normalizing full- + # width characters for the other special symbols (and + # full-width dot is already accepted in internationalized + # domains) with a new option. + # See https://news.ycombinator.com/item?id=42235268. + if "＠" in text: + raise EmailSyntaxError("The email address has the \"full-width\" at-sign (＠) character instead of a regular at-sign.") + + # Check another near-homoglyph for good measure because + # homoglyphs in place of required characters could be + # very confusing. We may want to consider checking for + # homoglyphs anywhere we look for a special symbol.
+ if "﹫" in text: + raise EmailSyntaxError('The email address has the "small commercial at" character instead of a regular at-sign.') + + raise EmailSyntaxError("An email address must have an @-sign.") + + # The right part is whatever is left. + right_part = text[len(left_part):] + + return left_part, right_part + + def unquote_quoted_string(text: str) -> Tuple[str, bool]: + # Remove surrounding quotes and unescape escaped backslashes + # and quotes. Escapes are parsed liberally. I think only + # backslashes and quotes can be escaped but we'll allow anything + # to be. + quoted = False + escaped = False + value = "" + for i, c in enumerate(text): + if quoted: + if escaped: + value += c + escaped = False + elif c == '\\': + escaped = True + elif c == '"': + if i != len(text) - 1: + raise EmailSyntaxError("Extra character(s) found after close quote: " + + ", ".join(safe_character_display(c) for c in text[i + 1:])) + break + else: + value += c + elif i == 0 and c == '"': + quoted = True + else: + value += c + + return value, quoted + + # Split the string at the first unquoted @-sign or left angle bracket. + left_part, right_part = split_string_at_unquoted_special(email, ("@", "<")) + + # If the right part starts with an angle bracket, + # then the left part is a display name and the rest + # of the right part up to the final right angle bracket + # is the email address, . + if right_part.startswith("<"): + # Remove space between the display name and angle bracket. + left_part = left_part.rstrip() + + # Unquote and unescape the display name. + display_name, display_name_quoted = unquote_quoted_string(left_part) + + # Check that only basic characters are present in a + # non-quoted display name. + if not display_name_quoted: + bad_chars = { + safe_character_display(c) + for c in display_name + if (not ATEXT_RE.match(c) and c != ' ') or c == '.' 
+ } + if bad_chars: + raise EmailSyntaxError("The display name contains invalid characters when not quoted: " + ", ".join(sorted(bad_chars)) + ".") + + # Check for other unsafe characters. + check_unsafe_chars(display_name, allow_space=True) + + # Check that the right part ends with an angle bracket + # but allow spaces after it, I guess. + if ">" not in right_part: + raise EmailSyntaxError("An open angle bracket at the start of the email address has to be followed by a close angle bracket at the end.") + right_part = right_part.rstrip(" ") + if right_part[-1] != ">": + raise EmailSyntaxError("There can't be anything after the email address.") + + # Remove the initial and trailing angle brackets. + addr_spec = right_part[1:].rstrip(">") + + # Split the email address at the first unquoted @-sign. + local_part, domain_part = split_string_at_unquoted_special(addr_spec, ("@",)) + + # Otherwise there is no display name. The left part is the local + # part and the right part is the domain. + else: + display_name = None + local_part, domain_part = left_part, right_part + + if domain_part.startswith("@"): + domain_part = domain_part[1:] + + # Unquote the local part if it is quoted. + local_part, is_quoted_local_part = unquote_quoted_string(local_part) + + return display_name, local_part, domain_part, is_quoted_local_part + + +def get_length_reason(addr: str, limit: int) -> str: + """Helper function to return an error message related to invalid length.""" + diff = len(addr) - limit + suffix = "s" if diff > 1 else "" + return f"({diff} character{suffix} too many)" + + +def safe_character_display(c: str) -> str: + # Return safely displayable characters in quotes. + if c == '\\': + return f"\"{c}\"" # can't use repr because it escapes it + if unicodedata.category(c)[0] in ("L", "N", "P", "S"): + return repr(c) + + # Construct a hex string in case the unicode name doesn't exist. 
+ if ord(c) < 0xFFFF: + h = f"U+{ord(c):04x}".upper() + else: + h = f"U+{ord(c):08x}".upper() + + # Return the character name or, if it has no name, the hex string. + return unicodedata.name(c, h) + + +class LocalPartValidationResult(TypedDict): + local_part: str + ascii_local_part: Optional[str] + smtputf8: bool + + +def validate_email_local_part(local: str, allow_smtputf8: bool = True, allow_empty_local: bool = False, + quoted_local_part: bool = False) -> LocalPartValidationResult: + """Validates the syntax of the local part of an email address.""" + + if len(local) == 0: + if not allow_empty_local: + raise EmailSyntaxError("There must be something before the @-sign.") + + # The caller allows an empty local part. Useful for validating certain + # Postfix aliases. + return { + "local_part": local, + "ascii_local_part": local, + "smtputf8": False, + } + + # Check the length of the local part by counting characters. + # (RFC 5321 4.5.3.1.1) + # We're checking the number of characters here. If the local part + # is ASCII-only, then that's the same as bytes (octets). If it's + # internationalized, then the UTF-8 encoding may be longer, but + # that may not be relevant. We will check the total address length + # instead. + if len(local) > LOCAL_PART_MAX_LENGTH: + reason = get_length_reason(local, limit=LOCAL_PART_MAX_LENGTH) + raise EmailSyntaxError(f"The email address is too long before the @-sign {reason}.") + + # Check the local part against the non-internationalized regular expression. + # Most email addresses match this regex so it's probably fastest to check this first. + # (RFC 5322 3.2.3) + # All local parts matching the dot-atom rule are also valid as a quoted string + # so if it was originally quoted (quoted_local_part is True) and this regex matches, + # it's ok. + # (RFC 5321 4.1.2 / RFC 5322 3.2.4). + if DOT_ATOM_TEXT.match(local): + # It's valid. And since it's just the permitted ASCII characters, + # it's normalized and safe. 
If the local part was originally quoted, + # the quoting was unnecessary and it'll be returned as normalized to + # non-quoted form. + + # Return the local part and flag that SMTPUTF8 is not needed. + return { + "local_part": local, + "ascii_local_part": local, + "smtputf8": False, + } + + # The local part failed the basic dot-atom check. Try the extended character set + # for internationalized addresses. It's the same pattern but with additional + # characters permitted. + # RFC 6531 section 3.3. + valid: Optional[str] = None + requires_smtputf8 = False + if DOT_ATOM_TEXT_INTL.match(local): + # But international characters in the local part may not be permitted. + if not allow_smtputf8: + # Check for invalid characters against the non-internationalized + # permitted character set. + # (RFC 5322 3.2.3) + bad_chars = { + safe_character_display(c) + for c in local + if not ATEXT_RE.match(c) + } + if bad_chars: + raise EmailSyntaxError("Internationalized characters before the @-sign are not supported: " + ", ".join(sorted(bad_chars)) + ".") + + # Although the check above should always find something, fall back to this just in case. + raise EmailSyntaxError("Internationalized characters before the @-sign are not supported.") + + # It's valid. + valid = "dot-atom" + requires_smtputf8 = True + + # There are no dot-atom syntax restrictions on quoted local parts, so + # if it was originally quoted, it is probably valid. More characters + # are allowed, like @-signs, spaces, and quotes, and there are no + # restrictions on the placement of dots, as in dot-atom local parts. + elif quoted_local_part: + # Check for invalid characters in a quoted string local part. + # (RFC 5321 4.1.2. RFC 5322 lists additional permitted *obsolete* + # characters which are *not* allowed here. RFC 6531 section 3.3 + # extends the range to UTF8 strings.) 
+ bad_chars = { + safe_character_display(c) + for c in local + if not QTEXT_INTL.match(c) + } + if bad_chars: + raise EmailSyntaxError("The email address contains invalid characters in quotes before the @-sign: " + ", ".join(sorted(bad_chars)) + ".") + + # See if any characters are outside of the ASCII range. + bad_chars = { + safe_character_display(c) + for c in local + if not (32 <= ord(c) <= 126) + } + if bad_chars: + requires_smtputf8 = True + + # International characters in the local part may not be permitted. + if not allow_smtputf8: + raise EmailSyntaxError("Internationalized characters before the @-sign are not supported: " + ", ".join(sorted(bad_chars)) + ".") + + # It's valid. + valid = "quoted" + + # If the local part matches the internationalized dot-atom form or was quoted, + # perform additional checks for Unicode strings. + if valid: + # Check that the local part is a valid, safe, and sensible Unicode string. + # Some of this may be redundant with the range U+0080 to U+10FFFF that is checked + # by DOT_ATOM_TEXT_INTL and QTEXT_INTL. Other characters may be permitted by the + # email specs, but they may not be valid, safe, or sensible Unicode strings. + # See the function for rationale. + check_unsafe_chars(local, allow_space=(valid == "quoted")) + + # Try encoding to UTF-8. Failure is possible with some characters like + # surrogate code points, but those are checked above. Still, we don't + # want to have an unhandled exception later. + try: + local.encode("utf8") + except ValueError as e: + raise EmailSyntaxError("The email address contains an invalid character.") from e + + # If this address passes only by the quoted string form, re-quote it + # and backslash-escape quotes and backslashes (removing any unnecessary + # escapes). Per RFC 5321 4.1.2, "all quoted forms MUST be treated as equivalent, + # and the sending system SHOULD transmit the form that uses the minimum quoting possible." 
+ if valid == "quoted": + local = '"' + re.sub(r'(["\\])', r'\\\1', local) + '"' + + return { + "local_part": local, + "ascii_local_part": local if not requires_smtputf8 else None, + "smtputf8": requires_smtputf8, + } + + # It's not a valid local part. Let's find out why. + # (Since quoted local parts are all valid or handled above, these checks + # don't apply in those cases.) + + # Check for invalid characters. + # (RFC 5322 3.2.3, plus RFC 6531 3.3) + bad_chars = { + safe_character_display(c) + for c in local + if not ATEXT_INTL_DOT_RE.match(c) + } + if bad_chars: + raise EmailSyntaxError("The email address contains invalid characters before the @-sign: " + ", ".join(sorted(bad_chars)) + ".") + + # Check for dot errors imposed by the dot-atom rule. + # (RFC 5322 3.2.3) + check_dot_atom(local, 'An email address cannot start with a {}.', 'An email address cannot have a {} immediately before the @-sign.', is_hostname=False) + + # All of the reasons should already have been checked, but just in case + # we have a fallback message. + raise EmailSyntaxError("The email address contains invalid characters before the @-sign.") + + +def check_unsafe_chars(s: str, allow_space: bool = False) -> None: + # Check for unsafe characters or characters that would make the string + # invalid or non-sensible Unicode. + bad_chars = set() + for i, c in enumerate(s): + category = unicodedata.category(c) + if category[0] in ("L", "N", "P", "S"): + # Letters, numbers, punctuation, and symbols are permitted. + pass + elif category[0] == "M": + # A combining character in first position would combine with something + # outside of the email address if concatenated, so it is not safe. + # We also check if this occurs after the @-sign, which would not be + # sensible because it would modify the @-sign.
+ if i == 0: + bad_chars.add(c) + elif category == "Zs": + # Spaces outside of the ASCII range are not specifically disallowed in + # internationalized addresses as far as I can tell, but they violate + # the spirit of the non-internationalized specification that email + # addresses do not contain ASCII spaces when not quoted. Excluding + # ASCII spaces when not quoted is handled directly by the atom regex. + # + # In quoted-string local parts, spaces are explicitly permitted, and + # the ASCII space has category Zs, so we must allow it here, and we'll + # allow all Unicode spaces to be consistent. + if not allow_space: + bad_chars.add(c) + elif category[0] == "Z": + # The two line and paragraph separator characters (in categories Zl and Zp) + # are not specifically disallowed in internationalized addresses + # as far as I can tell, but they violate the spirit of the non-internationalized + # specification that email addresses do not contain line breaks when not quoted. + bad_chars.add(c) + elif category[0] == "C": + # Control, format, surrogate, private use, and unassigned code points (C) + # are all unsafe in various ways. Control and format characters can affect + # text rendering if the email address is concatenated with other text. + # Bidirectional format characters are unsafe, even if used properly, because + # they cause an email address to render as a different email address. + # Private use characters do not make sense for publicly deliverable + # email addresses. + bad_chars.add(c) + else: + # All categories should be handled above, but in case there is something new + # to the Unicode specification in the future, reject all other categories. 
+ bad_chars.add(c) + if bad_chars: + raise EmailSyntaxError("The email address contains unsafe characters: " + + ", ".join(safe_character_display(c) for c in sorted(bad_chars)) + ".") + + +def check_dot_atom(label: str, start_descr: str, end_descr: str, is_hostname: bool) -> None: + # RFC 5322 3.2.3 + if label.endswith("."): + raise EmailSyntaxError(end_descr.format("period")) + if label.startswith("."): + raise EmailSyntaxError(start_descr.format("period")) + if ".." in label: + raise EmailSyntaxError("An email address cannot have two periods in a row.") + + if is_hostname: + # RFC 952 + if label.endswith("-"): + raise EmailSyntaxError(end_descr.format("hyphen")) + if label.startswith("-"): + raise EmailSyntaxError(start_descr.format("hyphen")) + if ".-" in label or "-." in label: + raise EmailSyntaxError("An email address cannot have a period and a hyphen next to each other.") + + +class DomainNameValidationResult(TypedDict): + ascii_domain: str + domain: str + + +def validate_email_domain_name(domain: str, test_environment: bool = False, globally_deliverable: bool = True) -> DomainNameValidationResult: + """Validates the syntax of the domain part of an email address.""" + + # Check for invalid characters. + # (RFC 952 plus RFC 6531 section 3.3 for internationalized addresses) + bad_chars = { + safe_character_display(c) + for c in domain + if not ATEXT_HOSTNAME_INTL.match(c) + } + if bad_chars: + raise EmailSyntaxError("The part after the @-sign contains invalid characters: " + ", ".join(sorted(bad_chars)) + ".") + + # Check for unsafe characters. + # Some of this may be redundant with the range U+0080 to U+10FFFF that is checked + # by DOT_ATOM_TEXT_INTL. Other characters may be permitted by the email specs, but + # they may not be valid, safe, or sensible Unicode strings. 
+ check_unsafe_chars(domain) + + # Perform UTS-46 normalization, which includes casefolding, NFC normalization, + # and converting all label separators (the period/full stop, fullwidth full stop, + # ideographic full stop, and halfwidth ideographic full stop) to regular dots. + # It will also raise an exception if there is an invalid character in the input, + # such as "⒈" which is invalid because it would expand to include a dot and + # U+1FEF which normalizes to a backtick, which is not an allowed hostname character. + # Since several characters *are* normalized to a dot, this has to come before + # checks related to dots, like check_dot_atom which comes next. + original_domain = domain + try: + domain = idna.uts46_remap(domain, std3_rules=False, transitional=False) + except idna.IDNAError as e: + raise EmailSyntaxError(f"The part after the @-sign contains invalid characters ({e}).") from e + + # Check for invalid characters after Unicode normalization which are not caught + # by uts46_remap (see tests for examples). + bad_chars = { + safe_character_display(c) + for c in domain + if not ATEXT_HOSTNAME_INTL.match(c) + } + if bad_chars: + raise EmailSyntaxError("The part after the @-sign contains invalid characters after Unicode normalization: " + ", ".join(sorted(bad_chars)) + ".") + + # The domain part is made up of dot-separated "labels." Each label must + # have at least one character and cannot start or end with dashes, which + # means there are some surprising restrictions on periods and dashes. + # Check that before we do IDNA encoding because the IDNA library gives + # unfriendly errors for these cases, but after UTS-46 normalization because + # it can insert periods and hyphens (from fullwidth characters).
+ # (RFC 952, RFC 1123 2.1, RFC 5322 3.2.3) + check_dot_atom(domain, 'An email address cannot have a {} immediately after the @-sign.', 'An email address cannot end with a {}.', is_hostname=True) + + # Check for RFC 5890's invalid R-LDH labels, which are labels that start + # with two characters other than "xn" and two dashes. + for label in domain.split("."): + if re.match(r"(?!xn)..--", label, re.I): + raise EmailSyntaxError("An email address cannot have two letters followed by two dashes immediately after the @-sign or after a period, except Punycode.") + + if DOT_ATOM_TEXT_HOSTNAME.match(domain): + # This is a valid non-internationalized domain. + ascii_domain = domain + else: + # If international characters are present in the domain name, convert + # the domain to IDNA ASCII. If internationalized characters are present, + # the MTA must either support SMTPUTF8 or the mail client must convert the + # domain name to IDNA before submission. + # + # For ASCII-only domains, the transformation does nothing and is safe to + # apply. However, to ensure we don't rely on the idna library for basic + # syntax checks, we don't use it if it's not needed. + # + # idna.encode also checks the domain name length after encoding but it + # doesn't give a nice error, so we call the underlying idna.alabel method + # directly. idna.alabel checks label length and doesn't give great messages, + # but we can't easily go to lower level methods. + try: + ascii_domain = ".".join( + idna.alabel(label).decode("ascii") + for label in domain.split(".") + ) + except idna.IDNAError as e: + # Some errors would have already been raised by idna.uts46_remap. + raise EmailSyntaxError(f"The part after the @-sign is invalid ({e}).") from e + + # Check the syntax of the string returned by idna.encode. + # It should never fail. 
+ if not DOT_ATOM_TEXT_HOSTNAME.match(ascii_domain): + raise EmailSyntaxError("The email address contains invalid characters after the @-sign after IDNA encoding.") + + # Check the length of the domain name in bytes. + # (RFC 1035 2.3.4 and RFC 5321 4.5.3.1.2) + # We're checking the number of bytes ("octets") here, which can be much + # higher than the number of characters in internationalized domains, + # on the assumption that the domain may be transmitted without SMTPUTF8 + # as IDNA ASCII. (This is also checked by idna.encode, so this exception + # is never reached for internationalized domains.) + if len(ascii_domain) > DOMAIN_MAX_LENGTH: + if ascii_domain == original_domain: + reason = get_length_reason(ascii_domain, limit=DOMAIN_MAX_LENGTH) + raise EmailSyntaxError(f"The email address is too long after the @-sign {reason}.") + else: + diff = len(ascii_domain) - DOMAIN_MAX_LENGTH + s = "" if diff == 1 else "s" + raise EmailSyntaxError(f"The email address is too long after the @-sign ({diff} byte{s} too many after IDNA encoding).") + + # Also check the label length limit. + # (RFC 1035 2.3.1) + for label in ascii_domain.split("."): + if len(label) > DNS_LABEL_LENGTH_LIMIT: + reason = get_length_reason(label, limit=DNS_LABEL_LENGTH_LIMIT) + raise EmailSyntaxError(f"After the @-sign, periods cannot be separated by so many characters {reason}.") + + if globally_deliverable: + # All publicly deliverable addresses have domain names with at least + # one period, at least for gTLDs created since 2013 (per the ICANN Board + # New gTLD Program Committee, https://www.icann.org/en/announcements/details/new-gtld-dotless-domain-names-prohibited-30-8-2013-en). + # We'll consider the lack of a period a syntax error + # since that will match people's sense of what an email address looks + # like. We'll skip this in test environments to allow '@test' email + # addresses. + if "." 
not in ascii_domain and not (ascii_domain == "test" and test_environment): + raise EmailSyntaxError("The part after the @-sign is not valid. It should have a period.") + + # We also know that all TLDs currently end with a letter. + if not DOMAIN_NAME_REGEX.search(ascii_domain): + raise EmailSyntaxError("The part after the @-sign is not valid. It is not within a valid top-level domain.") + + # Check special-use and reserved domain names. + # Some might fail DNS-based deliverability checks, but that + # can be turned off, so we should fail them all sooner. + # See the references in __init__.py. + from . import SPECIAL_USE_DOMAIN_NAMES + for d in SPECIAL_USE_DOMAIN_NAMES: + # See the note near the definition of SPECIAL_USE_DOMAIN_NAMES. + if d == "test" and test_environment: + continue + + if ascii_domain == d or ascii_domain.endswith("." + d): + raise EmailSyntaxError("The part after the @-sign is a special-use or reserved name that cannot be used with email.") + + # We may have been given an IDNA ASCII domain to begin with. Check + # that the domain actually conforms to IDNA. It could look like IDNA + # but not be actual IDNA. For ASCII-only domains, the conversion out + # of IDNA just gives the same thing back. + # + # This gives us the canonical internationalized form of the domain, + # which we return to the caller as a part of the normalized email + # address. + try: + domain_i18n = idna.decode(ascii_domain.encode('ascii')) + except idna.IDNAError as e: + raise EmailSyntaxError(f"The part after the @-sign is not valid IDNA ({e}).") from e + + # Check that this normalized domain name has not somehow become + # an invalid domain name. All of the checks before this point + # using the idna package probably guarantee that we now have + # a valid international domain name in most respects. But it + # doesn't hurt to re-apply some tests to be sure. See the similar + # tests above. + + # Check for invalid and unsafe characters. We have no test + # case for this. 
+ bad_chars = { + safe_character_display(c) + for c in domain + if not ATEXT_HOSTNAME_INTL.match(c) + } + if bad_chars: + raise EmailSyntaxError("The part after the @-sign contains invalid characters: " + ", ".join(sorted(bad_chars)) + ".") + check_unsafe_chars(domain) + + # Check that it can be encoded back to IDNA ASCII. We have no test + # case for this. + try: + idna.encode(domain_i18n) + except idna.IDNAError as e: + raise EmailSyntaxError(f"The part after the @-sign became invalid after normalizing to international characters ({e}).") from e + + # Return the IDNA ASCII-encoded form of the domain, which is how it + # would be transmitted on the wire (except when used with SMTPUTF8 + # possibly), as well as the canonical Unicode form of the domain, + # which is better for display purposes. This should also take care + # of RFC 6532 section 3.1's suggestion to apply Unicode NFC + # normalization to addresses. + return { + "ascii_domain": ascii_domain, + "domain": domain_i18n, + } + + +def validate_email_length(addrinfo: ValidatedEmail) -> None: + # There are three forms of the email address whose length must be checked: + # + # 1) The original email address string. Since callers may continue to use + # this string, even though we recommend using the normalized form, we + # should not pass validation when the original input is not valid. This + # form is checked first because it is the original input. + # 2) The normalized email address. We perform Unicode NFC normalization of + # the local part, we normalize the domain to internationalized characters + # (if originally IDNA ASCII) which also includes Unicode normalization, + # and we may remove quotes in quoted local parts. We recommend that + # callers use this string, so it must be valid. + # 3) The email address with the IDNA ASCII representation of the domain + # name, since this string may be used with email stacks that don't + # support UTF-8. 
Since this is the least likely to be used by callers, + # it is checked last. Note that ascii_email will only be set if the + # local part is ASCII, but conceivably the caller may combine an + # internationalized local part with an ASCII domain, so we check this + # on that combination also. Since we only return the normalized local + # part, we use that (and not the unnormalized local part). + # + # In all cases, the length is checked in UTF-8 because the SMTPUTF8 + # extension to SMTP validates the length in bytes. + + addresses_to_check = [ + (addrinfo.original, None), + (addrinfo.normalized, "after normalization"), + ((addrinfo.ascii_local_part or addrinfo.local_part or "") + "@" + addrinfo.ascii_domain, "when the part after the @-sign is converted to IDNA ASCII"), + ] + + for addr, reason in addresses_to_check: + addr_len = len(addr) + addr_utf8_len = len(addr.encode("utf8")) + diff = addr_utf8_len - EMAIL_MAX_LENGTH + if diff > 0: + if reason is None and addr_len == addr_utf8_len: + # If there is no normalization or transcoding, + # we can give a simple count of the number of + # characters over the limit. + reason = get_length_reason(addr, limit=EMAIL_MAX_LENGTH) + elif reason is None: + # If there is no normalization but there is + # some transcoding to UTF-8, we can compute + # the minimum number of characters over the + # limit by dividing the number of bytes over + # the limit by the maximum number of bytes + # per character. + mbpc = max(len(c.encode("utf8")) for c in addr) + mchars = max(1, diff // mbpc) + suffix = "s" if diff > 1 else "" + if mchars == diff: + reason = f"({diff} character{suffix} too many)" + else: + reason = f"({mchars}-{diff} character{suffix} too many)" + else: + # Since there is normalization, the number of + # characters in the input that need to change is + # impossible to know.
+ suffix = "s" if diff > 1 else "" + reason += f" ({diff} byte{suffix} too many)" + raise EmailSyntaxError(f"The email address is too long {reason}.") + + +class DomainLiteralValidationResult(TypedDict): + domain_address: Union[ipaddress.IPv4Address, ipaddress.IPv6Address] + domain: str + + +def validate_email_domain_literal(domain_literal: str) -> DomainLiteralValidationResult: + # This is obscure domain-literal syntax. Parse it and return + # a compressed/normalized address. + # RFC 5321 4.1.3 and RFC 5322 3.4.1. + + addr: Union[ipaddress.IPv4Address, ipaddress.IPv6Address] + + # Try to parse the domain literal as an IPv4 address. + # There is no tag for IPv4 addresses, so we can never + # be sure if the user intends an IPv4 address. + if re.match(r"^[0-9\.]+$", domain_literal): + try: + addr = ipaddress.IPv4Address(domain_literal) + except ValueError as e: + raise EmailSyntaxError(f"The address in brackets after the @-sign is not valid: It is not an IPv4 address ({e}) or is missing an address literal tag.") from e + + # Return the IPv4Address object and the domain back unchanged. + return { + "domain_address": addr, + "domain": f"[{addr}]", + } + + # If it begins with "IPv6:" it's an IPv6 address. + if domain_literal.startswith("IPv6:"): + try: + addr = ipaddress.IPv6Address(domain_literal[5:]) + except ValueError as e: + raise EmailSyntaxError(f"The IPv6 address in brackets after the @-sign is not valid ({e}).") from e + + # Return the IPv6Address object and construct a normalized + # domain literal. + return { + "domain_address": addr, + "domain": f"[IPv6:{addr.compressed}]", + } + + # Nothing else is valid. 
+ + if ":" not in domain_literal: + raise EmailSyntaxError("The part after the @-sign in brackets is not an IPv4 address and has no address literal tag.") + + # The tag (the part before the colon) has character restrictions, + # but since it must come from a registry of tags (in which only "IPv6" is defined), + # there's no need to check the syntax of the tag. See RFC 5321 4.1.2. + + # Check for permitted ASCII characters. This actually doesn't matter + # since there will be an exception after anyway. + bad_chars = { + safe_character_display(c) + for c in domain_literal + if not DOMAIN_LITERAL_CHARS.match(c) + } + if bad_chars: + raise EmailSyntaxError("The part after the @-sign contains invalid characters in brackets: " + ", ".join(sorted(bad_chars)) + ".") + + # There are no other domain literal tags. + # https://www.iana.org/assignments/address-literal-tags/address-literal-tags.xhtml + raise EmailSyntaxError("The part after the @-sign contains an invalid address literal tag in brackets.") diff --git a/email_validator/types.py b/email_validator/types.py new file mode 100644 index 0000000..1df60ff --- /dev/null +++ b/email_validator/types.py @@ -0,0 +1,126 @@ +import warnings +from typing import Any, Dict, List, Optional, Tuple, Union + + +class ValidatedEmail: + """The validate_email function returns objects of this type holding the normalized form of the email address + and other information.""" + + """The email address that was passed to validate_email. (If passed as bytes, this will be a string.)""" + original: str + + """The normalized email address, which should always be used in preference to the original address. + The normalized address converts an IDNA ASCII domain name to Unicode, if possible, and performs + Unicode normalization on the local part and on the domain (if originally Unicode). 
It is the + concatenation of the local_part and domain attributes, separated by an @-sign.""" + normalized: str + + """The local part of the email address after Unicode normalization.""" + local_part: str + + """The domain part of the email address after Unicode normalization or conversion to + Unicode from IDNA ascii.""" + domain: str + + """If the domain part is a domain literal, the IPv4Address or IPv6Address object.""" + domain_address: object + + """If not None, a form of the email address that uses 7-bit ASCII characters only.""" + ascii_email: Optional[str] + + """If not None, the local part of the email address using 7-bit ASCII characters only.""" + ascii_local_part: Optional[str] + + """A form of the domain name that uses 7-bit ASCII characters only.""" + ascii_domain: str + + """If True, the SMTPUTF8 feature of your mail relay will be required to transmit messages + to this address. This flag is True just when ascii_local_part is missing. Otherwise it + is False.""" + smtputf8: bool + + """If a deliverability check is performed and if it succeeds, a list of (priority, domain) + tuples of MX records specified in the DNS for the domain.""" + mx: List[Tuple[int, str]] + + """If no MX records are actually specified in DNS and instead are inferred, through an obsolete + mechanism, from A or AAAA records, the value is the type of DNS record used instead (`A` or `AAAA`).""" + mx_fallback_type: Optional[str] + + """The display name in the original input text, unquoted and unescaped, or None.""" + display_name: Optional[str] + + def __repr__(self) -> str: + return f"<ValidatedEmail {self.normalized}>" + + """For backwards compatibility, support old field names.""" + def __getattr__(self, key: str) -> str: + if key == "original_email": + return self.original + if key == "email": + return self.normalized + raise AttributeError(key) + + @property + def email(self) -> str: + warnings.warn("ValidatedEmail.email is deprecated and will be removed, use ValidatedEmail.normalized instead", 
DeprecationWarning) + return self.normalized + + """For backwards compatibility, some fields are also exposed through a dict-like interface. Note + that some of the names changed when they became attributes.""" + def __getitem__(self, key: str) -> Union[Optional[str], bool, List[Tuple[int, str]]]: + warnings.warn("dict-like access to the return value of validate_email is deprecated and may not be supported in the future.", DeprecationWarning, stacklevel=2) + if key == "email": + return self.normalized + if key == "email_ascii": + return self.ascii_email + if key == "local": + return self.local_part + if key == "domain": + return self.ascii_domain + if key == "domain_i18n": + return self.domain + if key == "smtputf8": + return self.smtputf8 + if key == "mx": + return self.mx + if key == "mx-fallback": + return self.mx_fallback_type + raise KeyError() + + """Tests use this.""" + def __eq__(self, other: object) -> bool: + if not isinstance(other, ValidatedEmail): + return False + return ( + self.normalized == other.normalized + and self.local_part == other.local_part + and self.domain == other.domain + and getattr(self, 'ascii_email', None) == getattr(other, 'ascii_email', None) + and getattr(self, 'ascii_local_part', None) == getattr(other, 'ascii_local_part', None) + and getattr(self, 'ascii_domain', None) == getattr(other, 'ascii_domain', None) + and self.smtputf8 == other.smtputf8 + and repr(sorted(self.mx) if getattr(self, 'mx', None) else None) + == repr(sorted(other.mx) if getattr(other, 'mx', None) else None) + and getattr(self, 'mx_fallback_type', None) == getattr(other, 'mx_fallback_type', None) + and getattr(self, 'display_name', None) == getattr(other, 'display_name', None) + ) + + """This helps producing the README.""" + def as_constructor(self) -> str: + return "ValidatedEmail(" \ + + ",".join(f"\n {key}={repr(getattr(self, key))}" + for key in ('normalized', 'local_part', 'domain', + 'ascii_email', 'ascii_local_part', 'ascii_domain', + 'smtputf8', 'mx', 
'mx_fallback_type', + 'display_name') + if hasattr(self, key) + ) \ + + ")" + + """Convenience method for accessing ValidatedEmail as a dict""" + def as_dict(self) -> Dict[str, Any]: + d = self.__dict__ + if d.get('domain_address'): + d['domain_address'] = repr(d['domain_address']) + return d diff --git a/email_validator/validate_email.py b/email_validator/validate_email.py new file mode 100644 index 0000000..0e8f6e0 --- /dev/null +++ b/email_validator/validate_email.py @@ -0,0 +1,200 @@ +from typing import Optional, Union, TYPE_CHECKING +import unicodedata + +from .exceptions import EmailSyntaxError +from .types import ValidatedEmail +from .syntax import split_email, validate_email_local_part, validate_email_domain_name, validate_email_domain_literal, validate_email_length +from .rfc_constants import CASE_INSENSITIVE_MAILBOX_NAMES + +if TYPE_CHECKING: + import dns.resolver + _Resolver = dns.resolver.Resolver +else: + _Resolver = object + + +def validate_email( + email: Union[str, bytes], + /, # prior arguments are positional-only + *, # subsequent arguments are keyword-only + allow_smtputf8: Optional[bool] = None, + allow_empty_local: Optional[bool] = None, + allow_quoted_local: Optional[bool] = None, + allow_domain_literal: Optional[bool] = None, + allow_display_name: Optional[bool] = None, + check_deliverability: Optional[bool] = None, + test_environment: Optional[bool] = None, + globally_deliverable: Optional[bool] = None, + timeout: Optional[int] = None, + dns_resolver: Optional[_Resolver] = None +) -> ValidatedEmail: + """ + Given an email address, and some options, returns a ValidatedEmail instance + with information about the address if it is valid or, if the address is not + valid, raises an EmailNotValidError. This is the main function of the module. + """ + + # Fill in default values of arguments. + from . 
import ALLOW_SMTPUTF8, ALLOW_EMPTY_LOCAL, ALLOW_QUOTED_LOCAL, ALLOW_DOMAIN_LITERAL, ALLOW_DISPLAY_NAME, \ + GLOBALLY_DELIVERABLE, CHECK_DELIVERABILITY, TEST_ENVIRONMENT, DEFAULT_TIMEOUT + if allow_smtputf8 is None: + allow_smtputf8 = ALLOW_SMTPUTF8 + if allow_empty_local is None: + allow_empty_local = ALLOW_EMPTY_LOCAL + if allow_quoted_local is None: + allow_quoted_local = ALLOW_QUOTED_LOCAL + if allow_domain_literal is None: + allow_domain_literal = ALLOW_DOMAIN_LITERAL + if allow_display_name is None: + allow_display_name = ALLOW_DISPLAY_NAME + if check_deliverability is None: + check_deliverability = CHECK_DELIVERABILITY + if test_environment is None: + test_environment = TEST_ENVIRONMENT + if globally_deliverable is None: + globally_deliverable = GLOBALLY_DELIVERABLE + if timeout is None and dns_resolver is None: + timeout = DEFAULT_TIMEOUT + + # Allow email to be a str or bytes instance. If bytes, + # it must be ASCII because that's how the bytes work + # on the wire with SMTP. + if not isinstance(email, str): + try: + email = email.decode("ascii") + except ValueError as e: + raise EmailSyntaxError("The email address is not valid ASCII.") from e + + # Split the address into the display name (or None), the local part + # (before the @-sign), and the domain part (after the @-sign). + # Normally, there is only one @-sign. But the awkward "quoted string" + # local part form (RFC 5321 4.1.2) allows @-signs in the local + # part if the local part is quoted. + display_name, local_part, domain_part, is_quoted_local_part \ + = split_email(email) + + if display_name: + # UTS #39 3.3 Email Security Profiles for Identifiers requires + # display names (incorrectly called "quoted-string-part" there) + # to be NFC normalized. Since these are not a part of what we + # are really validating, we won't check that the input was NFC + # normalized, but we'll normalize in output. 
+ display_name = unicodedata.normalize("NFC", display_name) + + # Collect return values in this instance. + ret = ValidatedEmail() + ret.original = ((local_part if not is_quoted_local_part + else ('"' + local_part + '"')) + + "@" + domain_part) # drop the display name, if any, for email length tests at the end + ret.display_name = display_name + + # Validate the email address's local part syntax and get a normalized form. + # If the original address was quoted and the decoded local part is a valid + # unquoted local part, then we'll get back a normalized (unescaped) local + # part. + local_part_info = validate_email_local_part(local_part, + allow_smtputf8=allow_smtputf8, + allow_empty_local=allow_empty_local, + quoted_local_part=is_quoted_local_part) + ret.local_part = local_part_info["local_part"] + ret.ascii_local_part = local_part_info["ascii_local_part"] + ret.smtputf8 = local_part_info["smtputf8"] + + # RFC 6532 section 3.1 says that Unicode NFC normalization should be applied, + # so we'll return the NFC-normalized local part. Since the caller may use that + # string in place of the original string, ensure it is also valid. + # + # UTS #39 3.3 Email Security Profiles for Identifiers requires local parts + # to be NFKC normalized, which loses some information in characters that can + # be decomposed. We might want to consider applying NFKC normalization, but + # we can't make the change easily because it would break database lookups + # for any caller that put a normalized address from a previous version of + # this library. (UTS #39 seems to require that the *input* be NFKC normalized + # and has other requirements that are hard to check without additional Unicode + # data, and I don't know whether the rules really apply in the wild.) 
+ normalized_local_part = unicodedata.normalize("NFC", ret.local_part) + if normalized_local_part != ret.local_part: + try: + validate_email_local_part(normalized_local_part, + allow_smtputf8=allow_smtputf8, + allow_empty_local=allow_empty_local, + quoted_local_part=is_quoted_local_part) + except EmailSyntaxError as e: + raise EmailSyntaxError("After Unicode normalization: " + str(e)) from e + ret.local_part = normalized_local_part + + # If a quoted local part isn't allowed but is present, now raise an exception. + # This is done after any exceptions raised by validate_email_local_part so + # that mandatory checks have highest precedence. + if is_quoted_local_part and not allow_quoted_local: + raise EmailSyntaxError("Quoting the part before the @-sign is not allowed here.") + + # Some local parts are required to be case-insensitive, so we should normalize + # to lowercase. + # RFC 2142 + if ret.ascii_local_part is not None \ + and ret.ascii_local_part.lower() in CASE_INSENSITIVE_MAILBOX_NAMES \ + and ret.local_part is not None: + ret.ascii_local_part = ret.ascii_local_part.lower() + ret.local_part = ret.local_part.lower() + + # Validate the email address's domain part syntax and get a normalized form. + is_domain_literal = False + if len(domain_part) == 0: + raise EmailSyntaxError("There must be something after the @-sign.") + + elif domain_part.startswith("[") and domain_part.endswith("]"): + # Parse the address in the domain literal and get back a normalized domain. + domain_literal_info = validate_email_domain_literal(domain_part[1:-1]) + if not allow_domain_literal: + raise EmailSyntaxError("A bracketed IP address after the @-sign is not allowed here.") + ret.domain = domain_literal_info["domain"] + ret.ascii_domain = domain_literal_info["domain"] # Domain literals are always ASCII. + ret.domain_address = domain_literal_info["domain_address"] + is_domain_literal = True # Prevent deliverability checks. 
+ + else: + # Check the syntax of the domain and get back a normalized + # internationalized and ASCII form. + domain_name_info = validate_email_domain_name(domain_part, test_environment=test_environment, globally_deliverable=globally_deliverable) + ret.domain = domain_name_info["domain"] + ret.ascii_domain = domain_name_info["ascii_domain"] + + # Construct the complete normalized form. + ret.normalized = ret.local_part + "@" + ret.domain + + # If the email address has an ASCII form, add it. + if not ret.smtputf8: + if not ret.ascii_domain: + raise Exception("Missing ASCII domain.") + ret.ascii_email = (ret.ascii_local_part or "") + "@" + ret.ascii_domain + else: + ret.ascii_email = None + + # Check the length of the address. + validate_email_length(ret) + + # Check that a display name is permitted. It's the last syntax check + # because we always check against optional parsing features last. + if display_name is not None and not allow_display_name: + raise EmailSyntaxError("A display name and angle brackets around the email address are not permitted here.") + + if check_deliverability and not test_environment: + # Validate the email address's deliverability using DNS + # and update the returned ValidatedEmail object with metadata. + + if is_domain_literal: + # There is nothing to check --- skip deliverability checks. 
+ return ret + + # Lazy load `deliverability` as it is slow to import (due to dns.resolver) + from .deliverability import validate_email_deliverability + deliverability_info = validate_email_deliverability( + ret.ascii_domain, ret.domain, timeout, dns_resolver + ) + mx = deliverability_info.get("mx") + if mx is not None: + ret.mx = mx + ret.mx_fallback_type = deliverability_info.get("mx_fallback_type") + + return ret diff --git a/email_validator/version.py b/email_validator/version.py new file mode 100644 index 0000000..8a124bf --- /dev/null +++ b/email_validator/version.py @@ -0,0 +1 @@ +__version__ = "2.2.0" diff --git a/pyproject.toml b/pyproject.toml new file mode 100644 index 0000000..a92c08e --- /dev/null +++ b/pyproject.toml @@ -0,0 +1,17 @@ +[tool.mypy] +disallow_any_generics = true +disallow_subclassing_any = true + +check_untyped_defs = true +disallow_incomplete_defs = true +disallow_untyped_calls = true +disallow_untyped_decorators = true +disallow_untyped_defs = true + +warn_redundant_casts = true +warn_unused_ignores = true + +[tool.pytest.ini_options] +markers = [ + "network: marks tests as requiring Internet access", +] diff --git a/release_to_pypi.sh b/release_to_pypi.sh new file mode 100755 index 0000000..466f4f8 --- /dev/null +++ b/release_to_pypi.sh @@ -0,0 +1,6 @@ +#!/bin/bash +source env/bin/activate +pip3 install --upgrade build twine +rm -rf dist +python3 -m build +twine upload -u __token__ dist/* # username: __token__ password: pypi API token diff --git a/setup.cfg b/setup.cfg index d32921b..8ceac96 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,40 +1,40 @@ [metadata] -name = email_validator -version = 1.2.1 -description = A robust email syntax and deliverability validation library. +name = email-validator +version = attr: email_validator.version.__version__ +description = A robust email address syntax and deliverability validation library. 
long_description = file: README.md long_description_content_type = text/markdown url = https://github.com/JoshData/python-email-validator author = Joshua Tauberer author_email = jt@occams.info -license = CC0 (copyright waived) -license_file = LICENSE +license = Unlicense +license_files = LICENSE classifiers = Development Status :: 5 - Production/Stable Intended Audience :: Developers - License :: CC0 1.0 Universal (CC0 1.0) Public Domain Dedication - Programming Language :: Python :: 2 - Programming Language :: Python :: 2.7 + License :: OSI Approved :: The Unlicense (Unlicense) Programming Language :: Python :: 3 - Programming Language :: Python :: 3.7 Programming Language :: Python :: 3.8 Programming Language :: Python :: 3.9 + Programming Language :: Python :: 3.10 + Programming Language :: Python :: 3.11 + Programming Language :: Python :: 3.12 Topic :: Software Development :: Libraries :: Python Modules keywords = email address validator [options] packages = find: install_requires = - dnspython>=1.15.0 + dnspython>=2.0.0 # optional if deliverability check isn't needed idna>=2.0.0 -python_requires = >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.* +python_requires = >=3.8 + +[options.package_data] +* = py.typed [options.entry_points] console_scripts = - email_validator=email_validator:main - -[bdist_wheel] -universal = 1 + email_validator=email_validator.__main__:main [flake8] max-line-length = 120 @@ -43,3 +43,5 @@ max-line-length = 120 testpaths = tests filterwarnings = error +markers = + network: mark a test as requiring Internet access. 
diff --git a/test_requirements.txt b/test_requirements.txt index 38dab84..bea5d5a 100644 --- a/test_requirements.txt +++ b/test_requirements.txt @@ -1,26 +1,26 @@ -# This file was generated by running +# This file was generated by running: +# sudo docker run --rm -it --network=host python:3.8-slim /bin/bash # pip install dnspython idna # from setup.cfg -# pip install pytest pytest-cov coverage flake8 +# pip install pytest pytest-cov coverage flake8 mypy # pip freeze -# in a virtualenv with Python 3.6. (Some packages' latest versions -# are not compatible with Python 3.6, so we must pin versions for -# repeatable testing in earlier versions of Python.) -attrs==21.4.0 -coverage==6.2 -dnspython==2.2.1 -flake8==4.0.1 -idna==3.3 -importlib-metadata==4.2.0 -iniconfig==1.1.1 -mccabe==0.6.1 -packaging==21.3 -pluggy==1.0.0 -py==1.11.0 -pycodestyle==2.8.0 -pyflakes==2.4.0 -pyparsing==3.0.7 -pytest==7.0.1 -pytest-cov==3.0.0 -tomli==1.2.3 -typing_extensions==4.1.1 -zipp==3.6.0 +# (Some packages' latest versions may not be compatible with +# the earliest Python version we support, and some exception +# messages may depend on package versions, so we pin versions +# for reproducible testing.) +coverage==7.5.3 +dnspython==2.6.1 +exceptiongroup==1.2.1 +flake8==7.1.0 +idna==3.7 +iniconfig==2.0.0 +mccabe==0.7.0 +mypy==1.10.0 +mypy-extensions==1.0.0 +packaging==24.1 +pluggy==1.5.0 +pycodestyle==2.12.0 +pyflakes==3.2.0 +pytest==8.2.2 +pytest-cov==5.0.0 +tomli==2.0.1 +typing_extensions==4.12.2 diff --git a/tests/mocked-dns-answers.json b/tests/mocked-dns-answers.json new file mode 100644 index 0000000..12d3885 --- /dev/null +++ b/tests/mocked-dns-answers.json @@ -0,0 +1,168 @@ +[ + { + "query": { + "name": "gmail.com", + "type": "MX", + "class": "IN" + }, + "answer": [ + "10 alt1.gmail-smtp-in.l.google.com.", + "20 alt2.gmail-smtp-in.l.google.com.", + "30 alt3.gmail-smtp-in.l.google.com.", + "40 alt4.gmail-smtp-in.l.google.com.", + "5 gmail-smtp-in.l.google.com." 
+ ] + }, + { + "query": { + "name": "pages.github.com", + "type": "MX", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "pages.github.com", + "type": "A", + "class": "IN" + }, + "answer": [ + "185.199.108.153", + "185.199.109.153", + "185.199.110.153", + "185.199.111.153" + ] + }, + { + "query": { + "name": "pages.github.com", + "type": "TXT", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "xkxufoekjvjfjeodlfmdfjcu.com", + "type": "ANY", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "example.com", + "type": "MX", + "class": "IN" + }, + "answer": [ + "0 ." + ] + }, + { + "query": { + "name": "g.mail.com", + "type": "MX", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "g.mail.com", + "type": "A", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "g.mail.com", + "type": "AAAA", + "class": "IN" + }, + "answer": [ + "::1" + ] + }, + { + "query": { + "name": "nellis.af.mil", + "type": "MX", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "nellis.af.mil", + "type": "A", + "class": "IN" + }, + "answer": [ + "132.58.234.0" + ] + }, + { + "query": { + "name": "nellis.af.mil", + "type": "TXT", + "class": "IN" + }, + "answer": [ + "\"MS=ms47108184\"", + "\"v=spf1 -all\"" + ] + }, + { + "query": { + "name": "justtxt.joshdata.me", + "type": "MX", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "justtxt.joshdata.me", + "type": "A", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "justtxt.joshdata.me", + "type": "AAAA", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "mail.example", + "type": "ANY", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "mail.example.com", + "type": "ANY", + "class": "IN" + }, + "answer": [] + }, + { + "query": { + "name": "google.com", + "type": "MX", + "class": "IN" + }, + "answer": [ + "10 smtp.google.com." 
+ ] + } +] \ No newline at end of file diff --git a/tests/mocked_dns_response.py b/tests/mocked_dns_response.py new file mode 100644 index 0000000..c6db5cb --- /dev/null +++ b/tests/mocked_dns_response.py @@ -0,0 +1,127 @@ +from typing import Any, Dict, Iterator, Optional + +import dns.exception +import dns.rdataset +import dns.resolver +import json +import os.path +import pytest + +from email_validator.deliverability import caching_resolver + +# To run deliverability checks without actually making +# DNS queries, we use a caching resolver where the cache +# is pre-loaded with DNS responses. + +# When False, all DNS queries must come from the mocked +# data. When True, tests are run with live DNS queries +# and the DNS responses are saved to a file. +BUILD_MOCKED_DNS_RESPONSE_DATA = False + + +# This class implements the 'get' and 'put' methods +# expected for a dns.resolver.Resolver's cache. +class MockedDnsResponseData: + DATA_PATH = os.path.dirname(__file__) + "/mocked-dns-answers.json" + + INSTANCE = None + + @staticmethod + def create_resolver() -> dns.resolver.Resolver: + if MockedDnsResponseData.INSTANCE is None: + # Create a singleton instance of this class and load the saved DNS responses. + # Except when BUILD_MOCKED_DNS_RESPONSE_DATA is true, don't load the data. + singleton = MockedDnsResponseData() + if not BUILD_MOCKED_DNS_RESPONSE_DATA: + singleton.load() + MockedDnsResponseData.INSTANCE = singleton + + # Return a new dns.resolver.Resolver configured for caching + # using the singleton instance. + dns_resolver = dns.resolver.Resolver(configure=BUILD_MOCKED_DNS_RESPONSE_DATA) + return caching_resolver(cache=MockedDnsResponseData.INSTANCE, dns_resolver=dns_resolver) + + def __init__(self) -> None: + self.data: Dict[dns.resolver.CacheKey, Optional[MockedDnsResponseData.Ans]] = {} + + # Loads the saved DNS response data from the JSON file and + # re-structures it into dnspython classes. 
+ class Ans: # mocks the dns.resolver.Answer class + def __init__(self, rrset: dns.rdataset.Rdataset) -> None: + self.rrset = rrset + + def __iter__(self) -> Iterator[Any]: + return iter(self.rrset) + + def load(self) -> None: + with open(self.DATA_PATH) as f: + data = json.load(f) + for item in data: + key = (dns.name.from_text(item["query"]["name"] + "."), + dns.rdatatype.from_text(item["query"]["type"]), + dns.rdataclass.from_text(item["query"]["class"])) + rdatas = [ + dns.rdata.from_text(rdtype=key[1], rdclass=key[2], tok=rr) + for rr in item["answer"] + ] + if item["answer"]: + self.data[key] = MockedDnsResponseData.Ans(dns.rdataset.from_rdata_list(0, rdatas=rdatas)) + else: + self.data[key] = None + + def save(self) -> None: + # Re-structure as a list with basic data types. + data = [ + { + "query": { + "name": key[0].to_text(omit_final_dot=True), + "type": dns.rdatatype.to_text(key[1]), + "class": dns.rdataclass.to_text(key[2]), + }, + "answer": sorted([ + rr.to_text() + for rr in value + ]) + } + for key, value in self.data.items() + if value is not None + ] + with open(self.DATA_PATH, "w") as f: + json.dump(data, f, indent=True) + + def get(self, key: dns.resolver.CacheKey) -> Optional[Ans]: + # Special-case a domain to create a timeout. + if key[0].to_text() == "timeout.com.": + raise dns.exception.Timeout() # type: ignore [no-untyped-call] + + # When building the DNS response database, return + # a cache miss. + if BUILD_MOCKED_DNS_RESPONSE_DATA: + return None + + # Query the data for a matching record. + if key in self.data: + if not self.data[key]: + raise dns.resolver.NoAnswer() # type: ignore [no-untyped-call] + return self.data[key] + + # Query the data for a response to an ANY query. 
+ ANY = dns.rdatatype.from_text("ANY") + if (key[0], ANY, key[2]) in self.data and self.data[(key[0], ANY, key[2])] is None: + raise dns.resolver.NXDOMAIN() # type: ignore [no-untyped-call] + + raise ValueError(f"Saved DNS data did not contain query: {key}") + + def put(self, key: dns.resolver.CacheKey, value: Ans) -> None: + # Build the DNS data by saving the live query response. + if not BUILD_MOCKED_DNS_RESPONSE_DATA: + raise ValueError("Should not get here.") + self.data[key] = value + + +@pytest.fixture(scope="session", autouse=True) +def MockedDnsResponseDataCleanup(request: pytest.FixtureRequest) -> None: + def cleanup_func() -> None: + if BUILD_MOCKED_DNS_RESPONSE_DATA and MockedDnsResponseData.INSTANCE is not None: + MockedDnsResponseData.INSTANCE.save() + request.addfinalizer(cleanup_func) diff --git a/tests/test_deliverability.py b/tests/test_deliverability.py new file mode 100644 index 0000000..b65116b --- /dev/null +++ b/tests/test_deliverability.py @@ -0,0 +1,86 @@ +from typing import Any, Dict + +import pytest +import re + +from email_validator import EmailUndeliverableError, \ + validate_email, caching_resolver +from email_validator.deliverability import validate_email_deliverability + +from mocked_dns_response import MockedDnsResponseData, MockedDnsResponseDataCleanup # noqa: F401 + +RESOLVER = MockedDnsResponseData.create_resolver() + + +@pytest.mark.parametrize( + 'domain,expected_response', + [ + ('gmail.com', {'mx': [(5, 'gmail-smtp-in.l.google.com'), (10, 'alt1.gmail-smtp-in.l.google.com'), (20, 'alt2.gmail-smtp-in.l.google.com'), (30, 'alt3.gmail-smtp-in.l.google.com'), (40, 'alt4.gmail-smtp-in.l.google.com')], 'mx_fallback_type': None}), + ('pages.github.com', {'mx': [(0, 'pages.github.com')], 'mx_fallback_type': 'A'}), + ], +) +def test_deliverability_found(domain: str, expected_response: str) -> None: + response = validate_email_deliverability(domain, domain, dns_resolver=RESOLVER) + assert response == expected_response + + 
+@pytest.mark.parametrize( + 'domain,error', + [ + ('xkxufoekjvjfjeodlfmdfjcu.com', 'The domain name {domain} does not exist'), + ('example.com', 'The domain name {domain} does not accept email'), # Null MX record + ('g.mail.com', 'The domain name {domain} does not accept email'), # No MX record but invalid AAAA record fallback (issue #134) + ('nellis.af.mil', 'The domain name {domain} does not send email'), # No MX record, A record fallback, reject-all SPF record. + + # No MX or A/AAAA records, but some other DNS records must + # exist such that the response is NOANSWER instead of NXDOMAIN. + ('justtxt.joshdata.me', 'The domain name {domain} does not accept email'), + ], +) +def test_deliverability_fails(domain: str, error: str) -> None: + with pytest.raises(EmailUndeliverableError, match=error.format(domain=domain)): + validate_email_deliverability(domain, domain, dns_resolver=RESOLVER) + + +@pytest.mark.parametrize( + 'email_input', + [ + ('me@mail.example'), + ('me@example.com'), + ('me@mail.example.com'), + ], +) +def test_email_example_reserved_domain(email_input: str) -> None: + # Since these all fail deliverability from a static list, + # DNS deliverability checks do not arise. 
+ with pytest.raises(EmailUndeliverableError) as exc_info: + validate_email(email_input, dns_resolver=RESOLVER) + # print(f'({email_input!r}, {str(exc_info.value)!r}),') + assert re.match(r"The domain name [a-z\.]+ does not (accept email|exist)\.", str(exc_info.value)) is not None + + +def test_deliverability_dns_timeout() -> None: + response = validate_email_deliverability('timeout.com', 'timeout.com', dns_resolver=RESOLVER) + assert "mx" not in response + assert response.get("unknown-deliverability") == "timeout" + + +@pytest.mark.network +def test_caching_dns_resolver() -> None: + class TestCache: + def __init__(self) -> None: + self.cache: Dict[Any, Any] = {} + + def get(self, key: Any) -> Any: + return self.cache.get(key) + + def put(self, key: Any, value: Any) -> Any: + self.cache[key] = value + + cache = TestCache() + resolver = caching_resolver(timeout=1, cache=cache) + validate_email("test@gmail.com", dns_resolver=resolver) + assert len(cache.cache) == 1 + + validate_email("test@gmail.com", dns_resolver=resolver) + assert len(cache.cache) == 1 diff --git a/tests/test_main.py b/tests/test_main.py index f1f731d..ab8eecd 100644 --- a/tests/test_main.py +++ b/tests/test_main.py @@ -1,389 +1,47 @@ -import dns.resolver -import re import pytest -from email_validator import EmailSyntaxError, EmailUndeliverableError, \ - validate_email, validate_email_deliverability, \ - caching_resolver, ValidatedEmail -# Let's test main but rename it to be clear -from email_validator import main as validator_main - - -@pytest.mark.parametrize( - 'email_input,output', - [ - ( - 'Abc@example.tld', - ValidatedEmail( - local_part='Abc', - ascii_local_part='Abc', - smtputf8=False, - ascii_domain='example.tld', - domain='example.tld', - email='Abc@example.tld', - ascii_email='Abc@example.tld', - ), - ), - ( - 'Abc.123@test-example.com', - ValidatedEmail( - local_part='Abc.123', - ascii_local_part='Abc.123', - smtputf8=False, - ascii_domain='test-example.com', - 
domain='test-example.com', - email='Abc.123@test-example.com', - ascii_email='Abc.123@test-example.com', - ), - ), - ( - 'user+mailbox/department=shipping@example.tld', - ValidatedEmail( - local_part='user+mailbox/department=shipping', - ascii_local_part='user+mailbox/department=shipping', - smtputf8=False, - ascii_domain='example.tld', - domain='example.tld', - email='user+mailbox/department=shipping@example.tld', - ascii_email='user+mailbox/department=shipping@example.tld', - ), - ), - ( - "!#$%&'*+-/=?^_`.{|}~@example.tld", - ValidatedEmail( - local_part="!#$%&'*+-/=?^_`.{|}~", - ascii_local_part="!#$%&'*+-/=?^_`.{|}~", - smtputf8=False, - ascii_domain='example.tld', - domain='example.tld', - email="!#$%&'*+-/=?^_`.{|}~@example.tld", - ascii_email="!#$%&'*+-/=?^_`.{|}~@example.tld", - ), - ), - ( - '伊昭傑@郵件.商務', - ValidatedEmail( - local_part='伊昭傑', - smtputf8=True, - ascii_domain='xn--5nqv22n.xn--lhr59c', - domain='郵件.商務', - email='伊昭傑@郵件.商務', - ), - ), - ( - 'राम@मोहन.ईन्फो', - ValidatedEmail( - local_part='राम', - smtputf8=True, - ascii_domain='xn--l2bl7a9d.xn--o1b8dj2ki', - domain='मोहन.ईन्फो', - email='राम@मोहन.ईन्फो', - ), - ), - ( - 'юзер@екзампл.ком', - ValidatedEmail( - local_part='юзер', - smtputf8=True, - ascii_domain='xn--80ajglhfv.xn--j1aef', - domain='екзампл.ком', - email='юзер@екзампл.ком', - ), - ), - ( - 'θσερ@εχαμπλε.ψομ', - ValidatedEmail( - local_part='θσερ', - smtputf8=True, - ascii_domain='xn--mxahbxey0c.xn--xxaf0a', - domain='εχαμπλε.ψομ', - email='θσερ@εχαμπλε.ψομ', - ), - ), - ( - '葉士豪@臺網中心.tw', - ValidatedEmail( - local_part='葉士豪', - smtputf8=True, - ascii_domain='xn--fiqq24b10vi0d.tw', - domain='臺網中心.tw', - email='葉士豪@臺網中心.tw', - ), - ), - ( - 'jeff@臺網中心.tw', - ValidatedEmail( - local_part='jeff', - ascii_local_part='jeff', - smtputf8=False, - ascii_domain='xn--fiqq24b10vi0d.tw', - domain='臺網中心.tw', - email='jeff@臺網中心.tw', - ascii_email='jeff@xn--fiqq24b10vi0d.tw', - ), - ), - ( - '葉士豪@臺網中心.台灣', - ValidatedEmail( - local_part='葉士豪', - 
smtputf8=True, - ascii_domain='xn--fiqq24b10vi0d.xn--kpry57d', - domain='臺網中心.台灣', - email='葉士豪@臺網中心.台灣', - ), - ), - ( - 'jeff葉@臺網中心.tw', - ValidatedEmail( - local_part='jeff葉', - smtputf8=True, - ascii_domain='xn--fiqq24b10vi0d.tw', - domain='臺網中心.tw', - email='jeff葉@臺網中心.tw', - ), - ), - ( - 'ñoñó@example.tld', - ValidatedEmail( - local_part='ñoñó', - smtputf8=True, - ascii_domain='example.tld', - domain='example.tld', - email='ñoñó@example.tld', - ), - ), - ( - '我買@example.tld', - ValidatedEmail( - local_part='我買', - smtputf8=True, - ascii_domain='example.tld', - domain='example.tld', - email='我買@example.tld', - ), - ), - ( - '甲斐黒川日本@example.tld', - ValidatedEmail( - local_part='甲斐黒川日本', - smtputf8=True, - ascii_domain='example.tld', - domain='example.tld', - email='甲斐黒川日本@example.tld', - ), - ), - ( - 'чебурашкаящик-с-апельсинами.рф@example.tld', - ValidatedEmail( - local_part='чебурашкаящик-с-апельсинами.рф', - smtputf8=True, - ascii_domain='example.tld', - domain='example.tld', - email='чебурашкаящик-с-апельсинами.рф@example.tld', - ), - ), - ( - 'उदाहरण.परीक्ष@domain.with.idn.tld', - ValidatedEmail( - local_part='उदाहरण.परीक्ष', - smtputf8=True, - ascii_domain='domain.with.idn.tld', - domain='domain.with.idn.tld', - email='उदाहरण.परीक्ष@domain.with.idn.tld', - ), - ), - ( - 'ιωάννης@εεττ.gr', - ValidatedEmail( - local_part='ιωάννης', - smtputf8=True, - ascii_domain='xn--qxaa9ba.gr', - domain='εεττ.gr', - email='ιωάννης@εεττ.gr', - ), - ), - ], -) -def test_email_valid(email_input, output): - # print(f'({email_input!r}, {validate_email(email_input, check_deliverability=False)!r}),') - assert validate_email(email_input, check_deliverability=False) == output - - -@pytest.mark.parametrize( - 'email_input,error_msg', - [ - ('my@localhost', 'The domain name localhost is not valid. 
It should have a period.'), - ('my@.leadingdot.com', 'An email address cannot have a period immediately after the @-sign.'), - ('my@..leadingfwdot.com', 'An email address cannot have a period immediately after the @-sign.'), - ('my@..twodots.com', 'An email address cannot have a period immediately after the @-sign.'), - ('my@twodots..com', 'An email address cannot have two periods in a row.'), - ('my@baddash.-.com', - 'The domain name baddash.-.com contains invalid characters (Label must not start or end with a hyphen).'), - ('my@baddash.-a.com', - 'The domain name baddash.-a.com contains invalid characters (Label must not start or end with a hyphen).'), - ('my@baddash.b-.com', - 'The domain name baddash.b-.com contains invalid characters (Label must not start or end with a hyphen).'), - ('my@example.com\n', - 'The domain name example.com\n contains invalid characters (Codepoint U+000A at position 4 of ' - '\'com\\n\' not allowed).'), - ('my@example\n.com', - 'The domain name example\n.com contains invalid characters (Codepoint U+000A at position 8 of ' - '\'example\\n\' not allowed).'), - ('.leadingdot@domain.com', 'The email address contains invalid characters before the @-sign: FULL STOP.'), - ('..twodots@domain.com', 'The email address contains invalid characters before the @-sign: FULL STOP.'), - ('twodots..here@domain.com', 'The email address contains invalid characters before the @-sign: FULL STOP.'), - ('me@⒈wouldbeinvalid.com', - "The domain name ⒈wouldbeinvalid.com contains invalid characters (Codepoint U+2488 not allowed " - "at position 1 in '⒈wouldbeinvalid.com')."), - ('@example.com', 'There must be something before the @-sign.'), - ('\nmy@example.com', 'The email address contains invalid characters before the @-sign: \'\\n\'.'), - ('m\ny@example.com', 'The email address contains invalid characters before the @-sign: \'\\n\'.'), - ('my\n@example.com', 'The email address contains invalid characters before the @-sign: \'\\n\'.'), - 
('11111111112222222222333333333344444444445555555555666666666677777@example.com', 'The email address is too long before the @-sign (1 character too many).'), - ('111111111122222222223333333333444444444455555555556666666666777777@example.com', 'The email address is too long before the @-sign (2 characters too many).'), - ('me@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.111111111122222222223333333333444444444455555555556.com', 'The email address is too long after the @-sign.'), - ('my.long.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333333344444.info', 'The email address is too long (2 characters too many).'), - ('my.long.address@λ111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333.info', 'The email address is too long (when converted to IDNA ASCII).'), - ('my.long.address@λ111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444.info', 'The email address is too long (at least 1 character too many).'), - ('my.λong.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.111111111122222222223333333333444.info', 'The email address is too long (when encoded in bytes).'), - 
('my.λong.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444.info', 'The email address is too long (at least 1 character too many).'), - ], -) -def test_email_invalid_syntax(email_input, error_msg): - # Since these all have syntax errors, deliverability - # checks do not arise. - with pytest.raises(EmailSyntaxError) as exc_info: - validate_email(email_input) - # print(f'({email_input!r}, {str(exc_info.value)!r}),') - assert str(exc_info.value) == error_msg - - -@pytest.mark.parametrize( - 'email_input', - [ - ('me@anything.arpa'), - ('me@valid.invalid'), - ('me@link.local'), - ('me@host.localhost'), - ('me@onion.onion.onion'), - ('me@test.test.test'), - ], -) -def test_email_invalid_reserved_domain(email_input): - # Since these all fail deliverabiltiy from a static list, - # DNS deliverability checks do not arise. - with pytest.raises(EmailUndeliverableError) as exc_info: - validate_email(email_input) - # print(f'({email_input!r}, {str(exc_info.value)!r}),') - assert "is a special-use or reserved name" in str(exc_info.value) - - -@pytest.mark.parametrize( - 'email_input', - [ - ('me@mail.example'), - ('me@example.com'), - ('me@mail.example.com'), - ], -) -def test_email_example_reserved_domain(email_input): - # Since these all fail deliverabiltiy from a static list, - # DNS deliverability checks do not arise. 
- with pytest.raises(EmailUndeliverableError) as exc_info: - validate_email(email_input) - # print(f'({email_input!r}, {str(exc_info.value)!r}),') - assert re.match(r"The domain name [a-z\.]+ does not (accept email|exist)\.", str(exc_info.value)) is not None +from email_validator import validate_email, EmailSyntaxError +# Let's test main but rename it to be clear +from email_validator.__main__ import main as validator_command_line_tool -@pytest.mark.parametrize( - 'email_input', - [ - ('white space@test'), - ('\n@test'), - ('\u2005@test'), # four-per-em space (Zs) - ('\u009C@test'), # string terminator (Cc) - ('\u200B@test'), # zero-width space (Cf) - ('\u202Dforward-\u202Ereversed@test'), # BIDI (Cf) - ('\uD800@test'), # surrogate (Cs) - ('\uE000@test'), # private use (Co) - ('\uFDEF@test'), # unassigned (Cn) - ], -) -def test_email_unsafe_character(email_input): - # Check for various unsafe characters: - with pytest.raises(EmailSyntaxError) as exc_info: - validate_email(email_input, test_environment=True) - assert "invalid character" in str(exc_info.value) - +from mocked_dns_response import MockedDnsResponseData, MockedDnsResponseDataCleanup # noqa: F401 -def test_email_test_domain_name_in_test_environment(): - validate_email("anything@test", test_environment=True) - validate_email("anything@mycompany.test", test_environment=True) +RESOLVER = MockedDnsResponseData.create_resolver() -def test_dict_accessor(): +def test_dict_accessor() -> None: input_email = "testaddr@example.tld" valid_email = validate_email(input_email, check_deliverability=False) assert isinstance(valid_email.as_dict(), dict) - assert valid_email.as_dict()["original_email"] == input_email - - -def test_deliverability_found(): - response = validate_email_deliverability('gmail.com', 'gmail.com') - assert response.keys() == {'mx', 'mx-fallback'} - assert response['mx-fallback'] is None - assert len(response['mx']) > 1 - assert len(response['mx'][0]) == 2 - assert isinstance(response['mx'][0][0], 
int) - assert response['mx'][0][1].endswith('.com') - - -def test_deliverability_fails(): - # No MX record. - domain = 'xkxufoekjvjfjeodlfmdfjcu.com' - with pytest.raises(EmailUndeliverableError, match='The domain name {} does not exist'.format(domain)): - validate_email_deliverability(domain, domain) - - # Null MX record. - domain = 'example.com' - with pytest.raises(EmailUndeliverableError, match='The domain name {} does not accept email'.format(domain)): - validate_email_deliverability(domain, domain) - + assert valid_email.as_dict()["original"] == input_email -def test_deliverability_dns_timeout(): - validate_email_deliverability.TEST_CHECK_TIMEOUT = True - response = validate_email_deliverability('gmail.com', 'gmail.com') - assert "mx" not in response - assert response.get("unknown-deliverability") == "timeout" - validate_email('test@gmail.com') - del validate_email_deliverability.TEST_CHECK_TIMEOUT - -def test_main_single_good_input(monkeypatch, capsys): +def test_main_single_good_input(monkeypatch: pytest.MonkeyPatch, capsys: pytest.CaptureFixture[str]) -> None: import json test_email = "google@google.com" monkeypatch.setattr('sys.argv', ['email_validator', test_email]) - validator_main() + validator_command_line_tool(dns_resolver=RESOLVER) stdout, _ = capsys.readouterr() output = json.loads(str(stdout)) assert isinstance(output, dict) - assert validate_email(test_email).original_email == output["original_email"] + assert validate_email(test_email, dns_resolver=RESOLVER).original == output["original"] -def test_main_single_bad_input(monkeypatch, capsys): +def test_main_single_bad_input(monkeypatch: pytest.MonkeyPatch, capsys: pytest.CaptureFixture[str]) -> None: bad_email = 'test@..com' monkeypatch.setattr('sys.argv', ['email_validator', bad_email]) - validator_main() + validator_command_line_tool(dns_resolver=RESOLVER) stdout, _ = capsys.readouterr() assert stdout == 'An email address cannot have a period immediately after the @-sign.\n' -def 
test_main_multi_input(monkeypatch, capsys): +def test_main_multi_input(monkeypatch: pytest.MonkeyPatch, capsys: pytest.CaptureFixture[str]) -> None: import io test_cases = ["google1@google.com", "google2@google.com", "test@.com", "test3@.com"] test_input = io.StringIO("\n".join(test_cases)) monkeypatch.setattr('sys.stdin', test_input) monkeypatch.setattr('sys.argv', ['email_validator']) - validator_main() + validator_command_line_tool(dns_resolver=RESOLVER) stdout, _ = capsys.readouterr() assert test_cases[0] not in stdout assert test_cases[1] not in stdout @@ -391,59 +49,19 @@ def test_main_multi_input(monkeypatch, capsys): assert test_cases[3] in stdout -def test_main_input_shim(monkeypatch, capsys): - import json - monkeypatch.setattr('sys.version_info', (2, 7)) - test_email = b"google@google.com" - monkeypatch.setattr('sys.argv', ['email_validator', test_email]) - validator_main() - stdout, _ = capsys.readouterr() - output = json.loads(str(stdout)) - assert isinstance(output, dict) - assert validate_email(test_email).original_email == output["original_email"] - - -def test_main_output_shim(monkeypatch, capsys): - monkeypatch.setattr('sys.version_info', (2, 7)) - test_email = b"test@.com" - monkeypatch.setattr('sys.argv', ['email_validator', test_email]) - validator_main() - stdout, _ = capsys.readouterr() - - # This looks bad but it has to do with the way python 2.7 prints vs py3 - # The \n is part of the print statement, not part of the string, which is what the b'...' is - # Since we're mocking py 2.7 here instead of actually using 2.7, this was the closest I could get - assert stdout == "b'An email address cannot have a period immediately after the @-sign.'\n" - - -def test_validate_email__with_caching_resolver(): - # unittest.mock.patch("dns.resolver.LRUCache.get") doesn't - # work --- it causes get to always return an empty list. - # So we'll mock our own way. 
- class MockedCache: - get_called = False - put_called = False - - def get(self, key): - self.get_called = True - return None +def test_bytes_input() -> None: + input_email = b"testaddr@example.tld" + valid_email = validate_email(input_email, check_deliverability=False) + assert isinstance(valid_email.as_dict(), dict) + assert valid_email.as_dict()["normalized"] == input_email.decode("utf8") - def put(self, key, value): - self.put_called = True + input_email = "testaddr中example.tld".encode("utf32") + with pytest.raises(EmailSyntaxError): + validate_email(input_email, check_deliverability=False) - # Test with caching_resolver helper method. - mocked_cache = MockedCache() - dns_resolver = caching_resolver(cache=mocked_cache) - validate_email("test@gmail.com", dns_resolver=dns_resolver) - assert mocked_cache.put_called - validate_email("test@gmail.com", dns_resolver=dns_resolver) - assert mocked_cache.get_called - # Test with dns.resolver.Resolver instance. - dns_resolver = dns.resolver.Resolver() - dns_resolver.lifetime = 10 - dns_resolver.cache = MockedCache() - validate_email("test@gmail.com", dns_resolver=dns_resolver) - assert mocked_cache.put_called - validate_email("test@gmail.com", dns_resolver=dns_resolver) - assert mocked_cache.get_called +def test_deprecation() -> None: + input_email = b"testaddr@example.tld" + valid_email = validate_email(input_email, check_deliverability=False) + with pytest.deprecated_call(): + assert valid_email.email is not None diff --git a/tests/test_syntax.py b/tests/test_syntax.py new file mode 100644 index 0000000..853cc5e --- /dev/null +++ b/tests/test_syntax.py @@ -0,0 +1,774 @@ +from typing import Any + +import pytest + +from email_validator import EmailSyntaxError, \ + validate_email, \ + ValidatedEmail + + +def MakeValidatedEmail(**kwargs: Any) -> ValidatedEmail: + ret = ValidatedEmail() + for k, v in kwargs.items(): + setattr(ret, k, v) + return ret + + +@pytest.mark.parametrize( + 'email_input,output', + [ + ( + 
'Abc@example.tld', + MakeValidatedEmail( + local_part='Abc', + ascii_local_part='Abc', + smtputf8=False, + ascii_domain='example.tld', + domain='example.tld', + normalized='Abc@example.tld', + ascii_email='Abc@example.tld', + ), + ), + ( + 'Abc.123@test-example.com', + MakeValidatedEmail( + local_part='Abc.123', + ascii_local_part='Abc.123', + smtputf8=False, + ascii_domain='test-example.com', + domain='test-example.com', + normalized='Abc.123@test-example.com', + ascii_email='Abc.123@test-example.com', + ), + ), + ( + 'user+mailbox/department=shipping@example.tld', + MakeValidatedEmail( + local_part='user+mailbox/department=shipping', + ascii_local_part='user+mailbox/department=shipping', + smtputf8=False, + ascii_domain='example.tld', + domain='example.tld', + normalized='user+mailbox/department=shipping@example.tld', + ascii_email='user+mailbox/department=shipping@example.tld', + ), + ), + ( + "!#$%&'*+-/=?^_`.{|}~@example.tld", + MakeValidatedEmail( + local_part="!#$%&'*+-/=?^_`.{|}~", + ascii_local_part="!#$%&'*+-/=?^_`.{|}~", + smtputf8=False, + ascii_domain='example.tld', + domain='example.tld', + normalized="!#$%&'*+-/=?^_`.{|}~@example.tld", + ascii_email="!#$%&'*+-/=?^_`.{|}~@example.tld", + ), + ), + ( + 'jeff@臺網中心.tw', + MakeValidatedEmail( + local_part='jeff', + ascii_local_part='jeff', + smtputf8=False, + ascii_domain='xn--fiqq24b10vi0d.tw', + domain='臺網中心.tw', + normalized='jeff@臺網中心.tw', + ascii_email='jeff@xn--fiqq24b10vi0d.tw', + ), + ), + ( + '"quoted local part"@example.org', + MakeValidatedEmail( + local_part='"quoted local part"', + ascii_local_part='"quoted local part"', + smtputf8=False, + ascii_domain='example.org', + domain='example.org', + normalized='"quoted local part"@example.org', + ascii_email='"quoted local part"@example.org' + ), + ), + ( + '"de-quoted.local.part"@example.org', + MakeValidatedEmail( + local_part='de-quoted.local.part', + ascii_local_part='de-quoted.local.part', + smtputf8=False, + ascii_domain='example.org', + 
domain='example.org', + normalized='de-quoted.local.part@example.org', + ascii_email='de-quoted.local.part@example.org' + ), + ), + ( + 'MyName <me@example.org>', + MakeValidatedEmail( + local_part='me', + ascii_local_part='me', + smtputf8=False, + ascii_domain='example.org', + domain='example.org', + normalized='me@example.org', + ascii_email='me@example.org', + display_name="MyName" + ), + ), + ( + 'My Name <me@example.org>', + MakeValidatedEmail( + local_part='me', + ascii_local_part='me', + smtputf8=False, + ascii_domain='example.org', + domain='example.org', + normalized='me@example.org', + ascii_email='me@example.org', + display_name="My Name" + ), + ), + ( + r'"My.\"Na\\me\".Is" <"me \" \\ me"@example.org>', + MakeValidatedEmail( + local_part=r'"me \" \\ me"', + ascii_local_part=r'"me \" \\ me"', + smtputf8=False, + ascii_domain='example.org', + domain='example.org', + normalized=r'"me \" \\ me"@example.org', + ascii_email=r'"me \" \\ me"@example.org', + display_name='My."Na\\me".Is' + ), + ), + ], +) +def test_email_valid(email_input: str, output: ValidatedEmail) -> None: + # These addresses do not require SMTPUTF8. See test_email_valid_intl_local_part + # for addresses that are valid but require SMTPUTF8. Check that it passes with + # allow_smtputf8 both on and off. + emailinfo = validate_email(email_input, check_deliverability=False, allow_smtputf8=False, + allow_quoted_local=True, allow_display_name=True) + + assert emailinfo == output + assert validate_email(email_input, check_deliverability=False, allow_smtputf8=True, + allow_quoted_local=True, allow_display_name=True) == output + + # Check that the old `email` attribute to access the normalized form still works + # if the DeprecationWarning is suppressed. 
+ import warnings + with warnings.catch_warnings(): + warnings.filterwarnings("ignore", category=DeprecationWarning) + assert emailinfo.email == emailinfo.normalized + + +@pytest.mark.parametrize( + 'email_input,output', + [ + ( + '伊昭傑@郵件.商務', + MakeValidatedEmail( + local_part='伊昭傑', + smtputf8=True, + ascii_domain='xn--5nqv22n.xn--lhr59c', + domain='郵件.商務', + normalized='伊昭傑@郵件.商務', + ), + ), + ( + 'राम@मोहन.ईन्फो', + MakeValidatedEmail( + local_part='राम', + smtputf8=True, + ascii_domain='xn--l2bl7a9d.xn--o1b8dj2ki', + domain='मोहन.ईन्फो', + normalized='राम@मोहन.ईन्फो', + ), + ), + ( + 'юзер@екзампл.ком', + MakeValidatedEmail( + local_part='юзер', + smtputf8=True, + ascii_domain='xn--80ajglhfv.xn--j1aef', + domain='екзампл.ком', + normalized='юзер@екзампл.ком', + ), + ), + ( + 'θσερ@εχαμπλε.ψομ', + MakeValidatedEmail( + local_part='θσερ', + smtputf8=True, + ascii_domain='xn--mxahbxey0c.xn--xxaf0a', + domain='εχαμπλε.ψομ', + normalized='θσερ@εχαμπλε.ψομ', + ), + ), + ( + '葉士豪@臺網中心.tw', + MakeValidatedEmail( + local_part='葉士豪', + smtputf8=True, + ascii_domain='xn--fiqq24b10vi0d.tw', + domain='臺網中心.tw', + normalized='葉士豪@臺網中心.tw', + ), + ), + ( + '葉士豪@臺網中心.台灣', + MakeValidatedEmail( + local_part='葉士豪', + smtputf8=True, + ascii_domain='xn--fiqq24b10vi0d.xn--kpry57d', + domain='臺網中心.台灣', + normalized='葉士豪@臺網中心.台灣', + ), + ), + ( + 'jeff葉@臺網中心.tw', + MakeValidatedEmail( + local_part='jeff葉', + smtputf8=True, + ascii_domain='xn--fiqq24b10vi0d.tw', + domain='臺網中心.tw', + normalized='jeff葉@臺網中心.tw', + ), + ), + ( + 'ñoñó@example.tld', + MakeValidatedEmail( + local_part='ñoñó', + smtputf8=True, + ascii_domain='example.tld', + domain='example.tld', + normalized='ñoñó@example.tld', + ), + ), + ( + '我買@example.tld', + MakeValidatedEmail( + local_part='我買', + smtputf8=True, + ascii_domain='example.tld', + domain='example.tld', + normalized='我買@example.tld', + ), + ), + ( + '甲斐黒川日本@example.tld', + MakeValidatedEmail( + local_part='甲斐黒川日本', + smtputf8=True, + 
ascii_domain='example.tld', + domain='example.tld', + normalized='甲斐黒川日本@example.tld', + ), + ), + ( + 'чебурашкаящик-с-апельсинами.рф@example.tld', + MakeValidatedEmail( + local_part='чебурашкаящик-с-апельсинами.рф', + smtputf8=True, + ascii_domain='example.tld', + domain='example.tld', + normalized='чебурашкаящик-с-апельсинами.рф@example.tld', + ), + ), + ( + 'उदाहरण.परीक्ष@domain.with.idn.tld', + MakeValidatedEmail( + local_part='उदाहरण.परीक्ष', + smtputf8=True, + ascii_domain='domain.with.idn.tld', + domain='domain.with.idn.tld', + normalized='उदाहरण.परीक्ष@domain.with.idn.tld', + ), + ), + ( + 'ιωάννης@εεττ.gr', + MakeValidatedEmail( + local_part='ιωάννης', + smtputf8=True, + ascii_domain='xn--qxaa9ba.gr', + domain='εεττ.gr', + normalized='ιωάννης@εεττ.gr', + ), + ), + ( + '\"s\u0323\u0307\" ', + MakeValidatedEmail( + local_part='\u1E69', + smtputf8=True, + ascii_domain='nfc.tld', + domain='nfc.tld', + normalized='\u1E69@nfc.tld', + display_name='\u1E69' + ), + ), + ( + '@@fullwidth.at', + MakeValidatedEmail( + local_part='@', + smtputf8=True, + ascii_domain='fullwidth.at', + domain='fullwidth.at', + normalized='@@fullwidth.at', + ), + ), + ], +) +def test_email_valid_intl_local_part(email_input: str, output: ValidatedEmail) -> None: + # Check that it passes when allow_smtputf8 is True. + assert validate_email(email_input, check_deliverability=False, allow_display_name=True) == output + + # Check that it fails when allow_smtputf8 is False. 
+ with pytest.raises(EmailSyntaxError) as exc_info: + validate_email(email_input, allow_smtputf8=False, check_deliverability=False, allow_display_name=True) + assert "Internationalized characters before the @-sign are not supported: " in str(exc_info.value) + + +@pytest.mark.parametrize( + 'email_input,normalized_local_part', + [ + ('"unnecessarily.quoted.local.part"@example.com', 'unnecessarily.quoted.local.part'), + ('"quoted..local.part"@example.com', '"quoted..local.part"'), + ('"quoted.with.at@"@example.com', '"quoted.with.at@"'), + ('"quoted with space"@example.com', '"quoted with space"'), + ('"quoted.with.dquote\\""@example.com', '"quoted.with.dquote\\""'), + ('"unnecessarily.quoted.with.unicode.λ"@example.com', 'unnecessarily.quoted.with.unicode.λ'), + ('"quoted.with..unicode.λ"@example.com', '"quoted.with..unicode.λ"'), + ('"quoted.with.extraneous.\\escape"@example.com', 'quoted.with.extraneous.escape'), + ]) +def test_email_valid_only_if_quoted_local_part(email_input: str, normalized_local_part: str) -> None: + # These addresses are invalid with the default allow_quoted_local=False option. + with pytest.raises(EmailSyntaxError) as exc_info: + validate_email(email_input) + assert str(exc_info.value) == 'Quoting the part before the @-sign is not allowed here.' + + # But they are valid if quoting is allowed. + validated = validate_email(email_input, allow_quoted_local=True, check_deliverability=False) + + # Check that the normalized form correctly removed unnecessary backslash escaping + # and even the quoting if they weren't necessary. + assert validated.local_part == normalized_local_part + + +def test_domain_literal() -> None: + # Check parsing IPv4 addresses. + validated = validate_email("me@[127.0.0.1]", allow_domain_literal=True) + assert validated.domain == "[127.0.0.1]" + assert repr(validated.domain_address) == "IPv4Address('127.0.0.1')" + + # Check parsing IPv6 addresses. 
+ validated = validate_email("me@[IPv6:::1]", allow_domain_literal=True) + assert validated.domain == "[IPv6:::1]" + assert repr(validated.domain_address) == "IPv6Address('::1')" + + # Check that IPv6 addresses are normalized. + validated = validate_email("me@[IPv6:0000:0000:0000:0000:0000:0000:0000:0001]", allow_domain_literal=True) + assert validated.domain == "[IPv6:::1]" + assert repr(validated.domain_address) == "IPv6Address('::1')" + + +@pytest.mark.parametrize( + 'email_input,error_msg', + [ + ('hello.world', 'An email address must have an @-sign.'), + ('hello@world', 'The email address has the "full-width" at-sign (@) character instead of a regular at-sign.'), + ('hello﹫world', 'The email address has the "small commercial at" character instead of a regular at-sign.'), + ('my@localhost', 'The part after the @-sign is not valid. It should have a period.'), + ('my@.leadingdot.com', 'An email address cannot have a period immediately after the @-sign.'), + ('my@.leadingfwdot.com', 'An email address cannot have a period immediately after the @-sign.'), + ('my@twodots..com', 'An email address cannot have two periods in a row.'), + ('my@twofwdots...com', 'An email address cannot have two periods in a row.'), + ('my@trailingdot.com.', 'An email address cannot end with a period.'), + ('my@trailingfwdot.com.', 'An email address cannot end with a period.'), + ('me@-leadingdash', 'An email address cannot have a hyphen immediately after the @-sign.'), + ('me@-leadingdashfw', 'An email address cannot have a hyphen immediately after the @-sign.'), + ('me@trailingdash-', 'An email address cannot end with a hyphen.'), + ('me@trailingdashfw-', 'An email address cannot end with a hyphen.'), + ('my@baddash.-.com', 'An email address cannot have a period and a hyphen next to each other.'), + ('my@baddash.-a.com', 'An email address cannot have a period and a hyphen next to each other.'), + ('my@baddash.b-.com', 'An email address cannot have a period and a hyphen next to each 
other.'), + ('my@baddashfw.-.com', 'An email address cannot have a period and a hyphen next to each other.'), + ('my@baddashfw.-a.com', 'An email address cannot have a period and a hyphen next to each other.'), + ('my@baddashfw.b-.com', 'An email address cannot have a period and a hyphen next to each other.'), + ('my@example.com\n', + 'The part after the @-sign contains invalid characters: U+000A.'), + ('my@example\n.com', + 'The part after the @-sign contains invalid characters: U+000A.'), + ('me@x!', 'The part after the @-sign contains invalid characters: \'!\'.'), + ('me@x ', 'The part after the @-sign contains invalid characters: SPACE.'), + ('.leadingdot@domain.com', 'An email address cannot start with a period.'), + ('twodots..here@domain.com', 'An email address cannot have two periods in a row.'), + ('trailingdot.@domain.email', 'An email address cannot have a period immediately before the @-sign.'), + ('me@⒈wouldbeinvalid.com', + "The part after the @-sign contains invalid characters (Codepoint U+2488 not allowed " + "at position 1 in '⒈wouldbeinvalid.com')."), + ('me@\u037e.com', "The part after the @-sign contains invalid characters after Unicode normalization: ';'."), + ('me@\u1fef.com', "The part after the @-sign contains invalid characters after Unicode normalization: '`'."), + ('@example.com', 'There must be something before the @-sign.'), + ('white space@test', 'The email address contains invalid characters before the @-sign: SPACE.'), + ('test@white space', 'The part after the @-sign contains invalid characters: SPACE.'), + ('\nmy@example.com', 'The email address contains invalid characters before the @-sign: U+000A.'), + ('m\ny@example.com', 'The email address contains invalid characters before the @-sign: U+000A.'), + ('my\n@example.com', 'The email address contains invalid characters before the @-sign: U+000A.'), + ('me.\u037e@example.com', 'After Unicode normalization: The email address contains invalid characters before the @-sign: \';\'.'), + 
('test@\n', 'The part after the @-sign contains invalid characters: U+000A.'), + ('bad"quotes"@example.com', 'The email address contains invalid characters before the @-sign: \'"\'.'), + ('obsolete."quoted".atom@example.com', 'The email address contains invalid characters before the @-sign: \'"\'.'), + ('11111111112222222222333333333344444444445555555555666666666677777@example.com', 'The email address is too long before the @-sign (1 character too many).'), + ('111111111122222222223333333333444444444455555555556666666666777777@example.com', 'The email address is too long before the @-sign (2 characters too many).'), + ('\uFB2C111111122222222223333333333444444444455555555556666666666777777@example.com', 'After Unicode normalization: The email address is too long before the @-sign (2 characters too many).'), + ('me@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333333344444444445555555555.com', 'The email address is too long after the @-sign (1 character too many).'), + ('me@中1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444.com', 'The email address is too long after the @-sign (1 byte too many after IDNA encoding).'), + ('me@\uFB2C1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444.com', 'The email address is too long after the @-sign (5 bytes too many after IDNA encoding).'), + ('me@1111111111222222222233333333334444444444555555555666666666677777.com', 'After the @-sign, periods cannot be separated by so many characters 
(1 character too many).'), + ('me@11111111112222222222333333333344444444445555555556666666666777777.com', 'After the @-sign, periods cannot be separated by so many characters (2 characters too many).'), + ('me@中111111111222222222233333333334444444444555555555666666.com', 'The part after the @-sign is invalid (Label too long).'), + ('meme@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.com', 'The email address is too long (4 characters too many).'), + ('my.long.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333333344444.info', 'The email address is too long (2 characters too many).'), + ('my.long.address@λ111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444.info', 'The email address is too long (1-2 characters too many).'), + ('my.long.address@\uFB2C111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444.info', 'The email address is too long (1-3 characters too many).'), + ('my.λong.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.111111111122222222223333333333444.info', 'The email address is too long (1 character too many).'), + 
('my.λong.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444.info', 'The email address is too long (1-2 characters too many).'), + ('my.\u0073\u0323\u0307.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444.info', 'The email address is too long (1-2 characters too many).'), + ('my.\uFB2C.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333333344444.info', 'The email address is too long (1 character too many).'), + ('my.\uFB2C.address@1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333333344.info', 'The email address is too long after normalization (1 byte too many).'), + ('my.long.address@λ111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333.info', 'The email address is too long when the part after the @-sign is converted to IDNA ASCII (1 byte too many).'), + ('my.λong.address@λ111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.1111111111222222222233333333334444444444555555555.6666666666777777777788888888889999999999000000000.11111111112222222222333333.info', 'The email address is too long when the part after the @-sign is 
converted to IDNA ASCII (2 bytes too many).'), + ('me@bad-tld-1', 'The part after the @-sign is not valid. It should have a period.'), + ('me@bad.tld-2', 'The part after the @-sign is not valid. It is not within a valid top-level domain.'), + ('me@xn--0.tld', 'The part after the @-sign is not valid IDNA (Invalid A-label).'), + ('me@yy--0.tld', 'An email address cannot have two letters followed by two dashes immediately after the @-sign or after a period, except Punycode.'), + ('me@yy--0.tld', 'An email address cannot have two letters followed by two dashes immediately after the @-sign or after a period, except Punycode.'), + ('me@[127.0.0.1]', 'A bracketed IP address after the @-sign is not allowed here.'), + ('me@[127.0.0.999]', 'The address in brackets after the @-sign is not valid: It is not an IPv4 address (Octet 999 (> 255) not permitted in \'127.0.0.999\') or is missing an address literal tag.'), + ('me@[IPv6:::1]', 'A bracketed IP address after the @-sign is not allowed here.'), + ('me@[IPv6:::G]', 'The IPv6 address in brackets after the @-sign is not valid (Only hex digits permitted in \'G\' in \'::G\').'), + ('me@[tag:text]', 'The part after the @-sign contains an invalid address literal tag in brackets.'), + ('me@[untaggedtext]', 'The part after the @-sign in brackets is not an IPv4 address and has no address literal tag.'), + ('me@[tag:invalid space]', 'The part after the @-sign contains invalid characters in brackets: SPACE.'), + ('', 'A display name and angle brackets around the email address are not permitted here.'), + (' !', 'There can\'t be anything after the email address.'), + ('<\u0338me@example.com', 'The email address contains invalid characters before the @-sign: \'<\'.'), + ('DisplayName ', 'An email address cannot have a hyphen immediately after the @-sign.'), + ('DisplayName ', 'A display name and angle brackets around the email address are not permitted here.'), + ('Display Name ', 'A display name and angle brackets around the email 
address are not permitted here.'), + ('\"Display Name\" ', 'A display name and angle brackets around the email address are not permitted here.'), + ('Display.Name ', 'The display name contains invalid characters when not quoted: \'.\'.'), + ('\"Display.Name\" ', 'A display name and angle brackets around the email address are not permitted here.'), + ], +) +def test_email_invalid_syntax(email_input: str, error_msg: str) -> None: + # Since these all have syntax errors, deliverability + # checks do not arise. + with pytest.raises(EmailSyntaxError) as exc_info: + validate_email(email_input, check_deliverability=False) + assert str(exc_info.value) == error_msg + + +@pytest.mark.parametrize( + 'email_input', + [ + ('me@anything.arpa'), + ('me@valid.invalid'), + ('me@link.local'), + ('me@host.localhost'), + ('me@onion.onion.onion'), + ('me@test.test.test'), + ], +) +def test_email_invalid_reserved_domain(email_input: str) -> None: + # Since these all fail deliverability from a static list, + # DNS deliverability checks do not arise. 
+ with pytest.raises(EmailSyntaxError) as exc_info: + validate_email(email_input) + assert "is a special-use or reserved name" in str(exc_info.value) + + +@pytest.mark.parametrize( + ('s', 'expected_error'), + [ + ('\u2005', 'FOUR-PER-EM SPACE'), # four-per-em space (Zs) + ('\u2028', 'LINE SEPARATOR'), # line separator (Zl) + ('\u2029', 'PARAGRAPH SEPARATOR'), # paragraph separator (Zp) + ('\u0300', 'COMBINING GRAVE ACCENT'), # grave accent (M) + ('\u009C', 'U+009C'), # string terminator (Cc) + ('\u200B', 'ZERO WIDTH SPACE'), # zero-width space (Cf) + ('\u202Dforward-\u202Ereversed', 'LEFT-TO-RIGHT OVERRIDE, RIGHT-TO-LEFT OVERRIDE'), # BIDI (Cf) + ('\uD800', 'U+D800'), # surrogate (Cs) + ('\uE000', 'U+E000'), # private use (Co) + ('\U0010FDEF', 'U+0010FDEF'), # private use (Co) + ('\uFDEF', 'U+FDEF'), # unassigned (Cn) + ], +) +def test_email_unsafe_character(s: str, expected_error: str) -> None: + # Check for various unsafe characters that are permitted by the email + # specs but should be disallowed for being unsafe or not sensible Unicode. + + with pytest.raises(EmailSyntaxError) as exc_info: + validate_email(s + "@test", test_environment=True) + assert str(exc_info.value) == f"The email address contains unsafe characters: {expected_error}." + + with pytest.raises(EmailSyntaxError) as exc_info: + validate_email("test@" + s, test_environment=True) + assert "The email address contains unsafe characters" in str(exc_info.value) + + +@pytest.mark.parametrize( + ('email_input', 'expected_error'), + [ + ('λambdaツ@test', 'Internationalized characters before the @-sign are not supported: \'λ\', \'ツ\'.'), + ('"quoted.with..unicode.λ"@example.com', 'Internationalized characters before the @-sign are not supported: \'λ\'.'), + ], +) +def test_email_invalid_character_smtputf8_off(email_input: str, expected_error: str) -> None: + # Check that internationalized characters are rejected if allow_smtputf8=False. 
+ with pytest.raises(EmailSyntaxError) as exc_info: + validate_email(email_input, allow_smtputf8=False, test_environment=True) + assert str(exc_info.value) == expected_error + + +def test_email_empty_local() -> None: + validate_email("@test", allow_empty_local=True, test_environment=True) + + # This next one might not be desirable. + validate_email("\"\"@test", allow_empty_local=True, allow_quoted_local=True, test_environment=True) + + +def test_email_test_domain_name_in_test_environment() -> None: + validate_email("anything@test", test_environment=True) + validate_email("anything@mycompany.test", test_environment=True) + + +def test_case_insensitive_mailbox_name() -> None: + assert validate_email("POSTMASTER@test", test_environment=True).normalized == "postmaster@test" + assert validate_email("NOT-POSTMASTER@test", test_environment=True).normalized == "NOT-POSTMASTER@test" + + +# This is the pyIsEmail (https://github.com/michaelherold/pyIsEmail) test suite. +# +# The test data was extracted by: +# +# $ wget https://raw.githubusercontent.com/michaelherold/pyIsEmail/master/tests/data/tests.xml +# $ xmllint --xpath '/tests/test/address/text()' tests.xml > t1 +# $ xmllint --xpath "/tests/test[not(address='')]/diagnosis/text()" tests.xml > t2 +# +# tests = [] +# def fixup_char(c): +# if ord(c) >= 0x2400 and ord(c) <= 0x2432: +# c = chr(ord(c)-0x2400) +# return c +# for email, diagnosis in zip(open("t1"), open("t2")): +# email = email[:-1] # strip trailing \n but not more because trailing whitespace is significant +# email = "".join(fixup_char(c) for c in email).replace("&amp;", "&") +# tests.append([email, diagnosis.strip()]) +# print(repr(tests).replace("'], ['", "'],\n['")) +@pytest.mark.parametrize( + ('email_input', 'status'), + [ + ['test', 'ISEMAIL_ERR_NODOMAIN'], + ['@', 'ISEMAIL_ERR_NOLOCALPART'], + ['test@', 'ISEMAIL_ERR_NODOMAIN'], + # ['test@io', 'ISEMAIL_VALID'], # we reject domains without a dot, knowing they are not deliverable + ['@io', 'ISEMAIL_ERR_NOLOCALPART'], + 
['@iana.org', 'ISEMAIL_ERR_NOLOCALPART'], + ['test@iana.org', 'ISEMAIL_VALID'], + ['test@nominet.org.uk', 'ISEMAIL_VALID'], + ['test@about.museum', 'ISEMAIL_VALID'], + ['a@iana.org', 'ISEMAIL_VALID'], + ['test.test@iana.org', 'ISEMAIL_VALID'], + ['.test@iana.org', 'ISEMAIL_ERR_DOT_START'], + ['test.@iana.org', 'ISEMAIL_ERR_DOT_END'], + ['test..iana.org', 'ISEMAIL_ERR_CONSECUTIVEDOTS'], + ['test_exa-mple.com', 'ISEMAIL_ERR_NODOMAIN'], + ['!#$%&`*+/=?^`{|}~@iana.org', 'ISEMAIL_VALID'], + ['test\\@test@iana.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['123@iana.org', 'ISEMAIL_VALID'], + ['test@123.com', 'ISEMAIL_VALID'], + ['abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@iana.org', 'ISEMAIL_VALID'], + ['abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklmn@iana.org', 'ISEMAIL_RFC5322_LOCAL_TOOLONG'], + ['test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm.com', 'ISEMAIL_RFC5322_LABEL_TOOLONG'], + ['test@mason-dixon.com', 'ISEMAIL_VALID'], + ['test@-iana.org', 'ISEMAIL_ERR_DOMAINHYPHENSTART'], + ['test@iana-.com', 'ISEMAIL_ERR_DOMAINHYPHENEND'], + ['test@g--a.com', 'ISEMAIL_VALID'], + ['test@.iana.org', 'ISEMAIL_ERR_DOT_START'], + ['test@iana.org.', 'ISEMAIL_ERR_DOT_END'], + ['test@iana..com', 'ISEMAIL_ERR_CONSECUTIVEDOTS'], + ['abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghij', 'ISEMAIL_RFC5322_TOOLONG'], + ['a@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.hij', 'ISEMAIL_RFC5322_TOOLONG'], + 
['a@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefg.hijk', 'ISEMAIL_RFC5322_DOMAIN_TOOLONG'], + ['"test"@iana.org', 'ISEMAIL_RFC5321_QUOTEDSTRING'], + # ['""@iana.org', 'ISEMAIL_RFC5321_QUOTEDSTRING'], # we think an empty quoted string should be invalid + ['"""@iana.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['"\\a"@iana.org', 'ISEMAIL_RFC5321_QUOTEDSTRING'], + ['"\\""@iana.org', 'ISEMAIL_RFC5321_QUOTEDSTRING'], + ['"\\"@iana.org', 'ISEMAIL_ERR_UNCLOSEDQUOTEDSTR'], + ['"\\\\"@iana.org', 'ISEMAIL_RFC5321_QUOTEDSTRING'], + ['test"@iana.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['"test@iana.org', 'ISEMAIL_ERR_UNCLOSEDQUOTEDSTR'], + ['"test"test@iana.org', 'ISEMAIL_ERR_ATEXT_AFTER_QS'], + ['test"text"@iana.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['"test""test"@iana.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['"test"."test"@iana.org', 'ISEMAIL_DEPREC_LOCALPART'], + ['"test\\ test"@iana.org', 'ISEMAIL_RFC5321_QUOTEDSTRING'], + ['"test".test@iana.org', 'ISEMAIL_DEPREC_LOCALPART'], + ['"test\x00"@iana.org', 'ISEMAIL_ERR_EXPECTING_QTEXT'], + ['"test\\\x00"@iana.org', 'ISEMAIL_DEPREC_QP'], + ['"abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefghj"@iana.org', 'ISEMAIL_RFC5322_LOCAL_TOOLONG'], + ['"abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz abcdefg\\h"@iana.org', 'ISEMAIL_RFC5322_LOCAL_TOOLONG'], + ['test@[255.255.255.255]', 'ISEMAIL_RFC5321_ADDRESSLITERAL'], + ['test@a[255.255.255.255]', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['test@[255.255.255]', 'ISEMAIL_RFC5322_DOMAINLITERAL'], + ['test@[255.255.255.255.255]', 'ISEMAIL_RFC5322_DOMAINLITERAL'], + ['test@[255.255.255.256]', 'ISEMAIL_RFC5322_DOMAINLITERAL'], + ['test@[1111:2222:3333:4444:5555:6666:7777:8888]', 'ISEMAIL_RFC5322_DOMAINLITERAL'], + ['test@[IPv6:1111:2222:3333:4444:5555:6666:7777]', 
'ISEMAIL_RFC5322_IPV6_GRPCOUNT'], + ['test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888]', 'ISEMAIL_RFC5321_ADDRESSLITERAL'], + ['test@[IPv6:1111:2222:3333:4444:5555:6666:7777:8888:9999]', 'ISEMAIL_RFC5322_IPV6_GRPCOUNT'], + ['test@[IPv6:1111:2222:3333:4444:5555:6666:7777:888G]', 'ISEMAIL_RFC5322_IPV6_BADCHAR'], + ['test@[IPv6:1111:2222:3333:4444:5555:6666::8888]', 'ISEMAIL_RFC5321_IPV6DEPRECATED'], + ['test@[IPv6:1111:2222:3333:4444:5555::8888]', 'ISEMAIL_RFC5321_ADDRESSLITERAL'], + ['test@[IPv6:1111:2222:3333:4444:5555:6666::7777:8888]', 'ISEMAIL_RFC5322_IPV6_MAXGRPS'], + ['test@[IPv6::3333:4444:5555:6666:7777:8888]', 'ISEMAIL_RFC5322_IPV6_COLONSTRT'], + ['test@[IPv6:::3333:4444:5555:6666:7777:8888]', 'ISEMAIL_RFC5321_ADDRESSLITERAL'], + ['test@[IPv6:1111::4444:5555::8888]', 'ISEMAIL_RFC5322_IPV6_2X2XCOLON'], + ['test@[IPv6:::]', 'ISEMAIL_RFC5321_ADDRESSLITERAL'], + ['test@[IPv6:1111:2222:3333:4444:5555:255.255.255.255]', 'ISEMAIL_RFC5322_IPV6_GRPCOUNT'], + ['test@[IPv6:1111:2222:3333:4444:5555:6666:255.255.255.255]', 'ISEMAIL_RFC5321_ADDRESSLITERAL'], + ['test@[IPv6:1111:2222:3333:4444:5555:6666:7777:255.255.255.255]', 'ISEMAIL_RFC5322_IPV6_GRPCOUNT'], + ['test@[IPv6:1111:2222:3333:4444::255.255.255.255]', 'ISEMAIL_RFC5321_ADDRESSLITERAL'], + ['test@[IPv6:1111:2222:3333:4444:5555:6666::255.255.255.255]', 'ISEMAIL_RFC5322_IPV6_MAXGRPS'], + ['test@[IPv6:1111:2222:3333:4444:::255.255.255.255]', 'ISEMAIL_RFC5322_IPV6_2X2XCOLON'], + ['test@[IPv6::255.255.255.255]', 'ISEMAIL_RFC5322_IPV6_COLONSTRT'], + [' test @iana.org', 'ISEMAIL_DEPREC_CFWS_NEAR_AT'], + ['test@ iana .com', 'ISEMAIL_DEPREC_CFWS_NEAR_AT'], + ['test . 
test@iana.org', 'ISEMAIL_DEPREC_FWS'], + ['\r\n test@iana.org', 'ISEMAIL_CFWS_FWS'], + ['\r\n \r\n test@iana.org', 'ISEMAIL_DEPREC_FWS'], + ['(comment)test@iana.org', 'ISEMAIL_CFWS_COMMENT'], + ['((comment)test@iana.org', 'ISEMAIL_ERR_UNCLOSEDCOMMENT'], + ['(comment(comment))test@iana.org', 'ISEMAIL_CFWS_COMMENT'], + ['test@(comment)iana.org', 'ISEMAIL_DEPREC_CFWS_NEAR_AT'], + ['test(comment)test@iana.org', 'ISEMAIL_ERR_ATEXT_AFTER_CFWS'], + ['test@(comment)[255.255.255.255]', 'ISEMAIL_DEPREC_CFWS_NEAR_AT'], + ['(comment)abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghiklm@iana.org', 'ISEMAIL_CFWS_COMMENT'], + ['test@(comment)abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghikl.com', 'ISEMAIL_DEPREC_CFWS_NEAR_AT'], + ['(comment)test@abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghik.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghik.abcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstuvwxyzabcdefghijk.abcdefghijklmnopqrstu', 'ISEMAIL_CFWS_COMMENT'], + ['test@iana.org\n', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['test@xn--hxajbheg2az3al.xn--jxalpdlp', 'ISEMAIL_VALID'], + ['xn--test@iana.org', 'ISEMAIL_VALID'], + ['test@iana.org-', 'ISEMAIL_ERR_DOMAINHYPHENEND'], + ['"test@iana.org', 'ISEMAIL_ERR_UNCLOSEDQUOTEDSTR'], + ['(test@iana.org', 'ISEMAIL_ERR_UNCLOSEDCOMMENT'], + ['test@(iana.org', 'ISEMAIL_ERR_UNCLOSEDCOMMENT'], + ['test@[1.2.3.4', 'ISEMAIL_ERR_UNCLOSEDDOMLIT'], + ['"test\\"@iana.org', 'ISEMAIL_ERR_UNCLOSEDQUOTEDSTR'], + ['(comment\\)test@iana.org', 'ISEMAIL_ERR_UNCLOSEDCOMMENT'], + ['test@iana.org(comment\\)', 'ISEMAIL_ERR_UNCLOSEDCOMMENT'], + ['test@iana.org(comment\\', 'ISEMAIL_ERR_BACKSLASHEND'], + ['test@[RFC-5322-domain-literal]', 'ISEMAIL_RFC5322_DOMAINLITERAL'], + ['test@[RFC-5322]-domain-literal]', 'ISEMAIL_ERR_ATEXT_AFTER_DOMLIT'], + ['test@[RFC-5322-[domain-literal]', 'ISEMAIL_ERR_EXPECTING_DTEXT'], + ['test@[RFC-5322-\\\x07-domain-literal]', 
'ISEMAIL_RFC5322_DOMLIT_OBSDTEXT'], + ['test@[RFC-5322-\\\t-domain-literal]', 'ISEMAIL_RFC5322_DOMLIT_OBSDTEXT'], + ['test@[RFC-5322-\\]-domain-literal]', 'ISEMAIL_RFC5322_DOMLIT_OBSDTEXT'], + ['test@[RFC-5322-domain-literal\\]', 'ISEMAIL_ERR_UNCLOSEDDOMLIT'], + ['test@[RFC-5322-domain-literal\\', 'ISEMAIL_ERR_BACKSLASHEND'], + ['test@[RFC 5322 domain literal]', 'ISEMAIL_RFC5322_DOMAINLITERAL'], + ['test@[RFC-5322-domain-literal] (comment)', 'ISEMAIL_RFC5322_DOMAINLITERAL'], + ['\x7f@iana.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['test@\x7f.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['"\x7f"@iana.org', 'ISEMAIL_DEPREC_QTEXT'], + ['"\\\x7f"@iana.org', 'ISEMAIL_DEPREC_QP'], + ['(\x7f)test@iana.org', 'ISEMAIL_DEPREC_CTEXT'], + ['test@iana.org\r', 'ISEMAIL_ERR_CR_NO_LF'], + ['\rtest@iana.org', 'ISEMAIL_ERR_CR_NO_LF'], + ['"\rtest"@iana.org', 'ISEMAIL_ERR_CR_NO_LF'], + ['(\r)test@iana.org', 'ISEMAIL_ERR_CR_NO_LF'], + ['test@iana.org(\r)', 'ISEMAIL_ERR_CR_NO_LF'], + ['\ntest@iana.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['"\n"@iana.org', 'ISEMAIL_ERR_EXPECTING_QTEXT'], + ['"\\\n"@iana.org', 'ISEMAIL_DEPREC_QP'], + ['(\n)test@iana.org', 'ISEMAIL_ERR_EXPECTING_CTEXT'], + ['\x07@iana.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['test@\x07.org', 'ISEMAIL_ERR_EXPECTING_ATEXT'], + ['"\x07"@iana.org', 'ISEMAIL_DEPREC_QTEXT'], + ['"\\\x07"@iana.org', 'ISEMAIL_DEPREC_QP'], + ['(\x07)test@iana.org', 'ISEMAIL_DEPREC_CTEXT'], + ['\r\ntest@iana.org', 'ISEMAIL_ERR_FWS_CRLF_END'], + ['\r\n \r\ntest@iana.org', 'ISEMAIL_ERR_FWS_CRLF_END'], + [' \r\ntest@iana.org', 'ISEMAIL_ERR_FWS_CRLF_END'], + [' \r\n test@iana.org', 'ISEMAIL_CFWS_FWS'], + [' \r\n \r\ntest@iana.org', 'ISEMAIL_ERR_FWS_CRLF_END'], + [' \r\n\r\ntest@iana.org', 'ISEMAIL_ERR_FWS_CRLF_X2'], + [' \r\n\r\n test@iana.org', 'ISEMAIL_ERR_FWS_CRLF_X2'], + ['test@iana.org\r\n ', 'ISEMAIL_CFWS_FWS'], + ['test@iana.org\r\n \r\n ', 'ISEMAIL_DEPREC_FWS'], + ['test@iana.org\r\n', 'ISEMAIL_ERR_FWS_CRLF_END'], + ['test@iana.org\r\n \r\n', 
'ISEMAIL_ERR_FWS_CRLF_END'], + ['test@iana.org \r\n', 'ISEMAIL_ERR_FWS_CRLF_END'], + ['test@iana.org \r\n ', 'ISEMAIL_CFWS_FWS'], + ['test@iana.org \r\n \r\n', 'ISEMAIL_ERR_FWS_CRLF_END'], + ['test@iana.org \r\n\r\n', 'ISEMAIL_ERR_FWS_CRLF_X2'], + ['test@iana.org \r\n\r\n ', 'ISEMAIL_ERR_FWS_CRLF_X2'], + [' test@iana.org', 'ISEMAIL_CFWS_FWS'], + ['test@iana.org ', 'ISEMAIL_CFWS_FWS'], + ['test@[IPv6:1::2:]', 'ISEMAIL_RFC5322_IPV6_COLONEND'], + ['"test\\©"@iana.org', 'ISEMAIL_ERR_EXPECTING_QPAIR'], + ['test@iana/icann.org', 'ISEMAIL_RFC5322_DOMAIN'], + ['test.(comment)test@iana.org', 'ISEMAIL_DEPREC_COMMENT'] + ] +) +def test_pyisemail_tests(email_input: str, status: str) -> None: + if status == "ISEMAIL_VALID": + # All standard email address forms should not raise an exception + # with any set of parsing options. + validate_email(email_input, test_environment=True) + validate_email(email_input, allow_quoted_local=True, allow_domain_literal=True, test_environment=True) + + elif status == "ISEMAIL_RFC5321_QUOTEDSTRING": + # Quoted-literal local parts are only valid with an option. + with pytest.raises(EmailSyntaxError): + validate_email(email_input, test_environment=True) + validate_email(email_input, allow_quoted_local=True, test_environment=True) + + elif "_ADDRESSLITERAL" in status or status == 'ISEMAIL_RFC5321_IPV6DEPRECATED': + # Domain literals with IPv4 or IPv6 addresses are only valid with an option. + # I am not sure if the ISEMAIL_RFC5321_IPV6DEPRECATED case should be rejected: + # The Python ipaddress module accepts it. + with pytest.raises(EmailSyntaxError): + validate_email(email_input, test_environment=True) + validate_email(email_input, allow_domain_literal=True, test_environment=True) + + elif "_DOMLIT_" in status or "DOMAINLITERAL" in status or "_IPV6" in status: + # Invalid domain literals even when allow_domain_literal=True. + # The _DOMLIT_ diagnoses appear to be invalid domain literals. 
+ # The DOMAINLITERAL diagnoses appear to be valid domain literals that can't + # be parsed as an IPv4 or IPv6 address. + # The _IPV6_ diagnoses appear to represent syntactically invalid domain literals. + with pytest.raises(EmailSyntaxError): + validate_email(email_input, allow_domain_literal=True, test_environment=True) + + elif "_ERR_" in status or "_TOOLONG" in status \ + or "_CFWS_FWS" in status or "_CFWS_COMMENT" in status \ + or status == "ISEMAIL_RFC5322_DOMAIN": + # Invalid syntax, extraneous whitespace, and "(comments)" should be rejected. + # The ISEMAIL_RFC5322_DOMAIN diagnosis appears to be a syntactically invalid domain. + # These are invalid with any set of options. + with pytest.raises(EmailSyntaxError): + validate_email(email_input, test_environment=True) + validate_email(email_input, allow_quoted_local=True, allow_domain_literal=True, test_environment=True) + + elif "_DEPREC_" in status: + # Various deprecated syntax forms are valid in email addresses and are accepted by pyIsEmail, + # but we reject them even with extended options. + with pytest.raises(EmailSyntaxError): + validate_email(email_input, test_environment=True) + validate_email(email_input, allow_quoted_local=True, allow_domain_literal=True, test_environment=True) + + else: + raise ValueError(f"status {status} is not recognized")