Skip to content

export img decoders/encoders #8511

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

brian-armstrong-discord
Copy link
Contributor

This pull request changes the image codecs API so that findDecoder and findEncoder are available to users. This allows users to inspect the header of an image before decoding it so that they can decide whether they want to decode the image before they do so. For example, the image might describe itself as being 1000000 x 1000000 pixels, in which case openCV will go ahead and allocate an extremely large buffer to hold this image. This new API gives the user the ability to decide they don't want to do that.

In addition, this patch attempts to comment the API a little and make it safer to use by doing bounds checking in readData so that a too-small buffer given to readData will not yield some kind of bad write outside array bounds.

@brian-armstrong-discord
Copy link
Contributor Author

  • added string description to each decoder so that the caller can decide if they don't want to touch the header once the magic bytes have been read. for example, tiffs are quite prone to CVEs, so we might not want to touch them at all.

brian-armstrong-discord added a commit to brian-armstrong-discord/opencv that referenced this pull request Apr 7, 2017
this builds on opencv#8492 and
opencv#8511 to add a new method,
.orientation(), to ImageDecoder. The JPEG subclass now overrides this to
do jpeg-specific exif tag reading

we also now export a method, OrientationTransform, which uses this
orientation value to do the actual matrix operations
@brian-armstrong-discord
Copy link
Contributor Author

This patch resolves #8605 and #8606

@alalek
Copy link
Member

alalek commented Apr 19, 2017

New OpenCV public API should not contain complex classes with "virtual" methods. This is needed to avoid ABI (application binary interface) violations and producing hacks like this - just want to add new method.

Please follow "pImpl design pattern" instead. Example: ocl::Device
You would still have these "virtual classes", but they will go into the private scope (into some .hpp file from src folder: without exposing them public) - probably you should just reuse existed BaseImageEncoder as "internal" implementation.

Please avoid(or minimize) using of STL objects in interfaces (std::vector/std::map/std::string/etc) - this may raise ABI problems which is very hard to debug (especially for newbie C++ developers): #6198 #5227 #6165

@brian-armstrong-discord
Copy link
Contributor Author

@alalek 👍 Good stuff. I think I thought I was accomplishing the pimpl pattern by making the virtual base class and returning Ptr but this makes a lot more sense. Will make those changes soon.

@brian-armstrong-discord
Copy link
Contributor Author

@alalek What would you think about leaving ImageEncoder's write params arg as const std::vector<int>& params? imencode already takes this and it seems like it would make the most sense to have parity with that function, since the argument types are the same.

Actually, for that matter, imencode also uses std::vector<uchar>& buf for its output. So to do that with ImageEncoder here would match imencode as well.

As far as I can tell those are the only places in this patch where STL types are exposed in the ABI.

@alalek
Copy link
Member

alalek commented Apr 20, 2017

Probably we can try to use InputArray wrapper instead of "std::vector& params" (the latest changes support C++ std::array<> argument type).

void setDestination( std::vector& buf )

I believe Mat& (of 8UC1) should work in this case too.

@brian-armstrong-discord
Copy link
Contributor Author

@alalek is it possible to run the tests locally? i tried pulling down the sample data but i wasn't really sure where it needs to go or how to invoke it, so i'm dependent on the tester here to check for now :(

@brian-armstrong-discord
Copy link
Contributor Author

@alalek Ok, nevermind, I put together a small test program.

@brian-armstrong-discord
Copy link
Contributor Author

@alalek So this passes the tests, but I'm not so good at opencv-fu, and I needed to allocate a new Mat and then copy that Mat back to the destination vector in imencode once the encode finished. This is the only way I know of to maintain the backwards compat here with std::vector.

Is there a way around that that we can avoid? I have briefly tried using OutputArray but I'm not really sure how to do the operations I need to on it, including when it's of type std::vector

@brian-armstrong-discord
Copy link
Contributor Author

@alalek I folded #8547 into this patch - it wasn't a huge change, just exported the Exif stuff on top of Decoder/Encoder. It's now part of the JPEGDecoder

@alalek
Copy link
Member

alalek commented Apr 25, 2017

@brian-armstrong-discord There is some conflicts with master branch. Please rebase this patch to actual master branch.

Something like these:

BTW, We prefer to have patches with single commit to eliminate intermediate "fixup" commits from git history. Sometimes it is better to squash your commits before rebase to upstream code (git rebase -i HEAD~11 (11 is number of commits); "pick" the first commit, "f" or "s" - others)

Rebased branch for reference:

@brian-armstrong-discord brian-armstrong-discord force-pushed the export_img_decoder_encoder branch 2 times, most recently from 172aa54 to c2e329e Compare April 25, 2017 18:29
@brian-armstrong-discord
Copy link
Contributor Author

@alalek Looks like the conflicts are gone now

@brian-armstrong-discord
Copy link
Contributor Author

Hi @alalek

Have you had a chance to look at this recently?

@alalek
Copy link
Member

alalek commented May 24, 2017

Hi @brian-armstrong-discord !
I'm sorry for huge delay. I need more time to review this (to test some suggestions on public interfaces).

@Turim
Copy link
Contributor

Turim commented Jun 9, 2017

@alalek @brian-armstrong-discord I also found the idea -to get the header data- very promising. If one might ask, what's the state so far?

P.S. It doesn't seem like you need some help, but if you are - feel free to text me :)
P.P.S.

  • The stated patch intention is to provide, say, lazy evaluation for image processing. But to do so the export whole bunch of ImageEncoders/Decoders is seen as little redundand (4 me ofc) - e.g. we could have added just +1 method imreadheader() or like so. (as C-like interface extension)
  • I wonder shall or shall not Encoders be visible with only simmetric (AFAIK) reasoning
  • I have some concerns about imageDescription(). As for me, the downside is the case -what if I want to know, what the exact format of the image?-. Should I compare the strings? Or shall we introduce an enum instead?
  • Also, it seems, ctors like Decoder(const Decoder& oth, PtrDecoder::Impl) could be avoided.

@brian-armstrong-discord
Copy link
Contributor Author

Hi @alalek

Any updates in this PR? I would love to put this work behind me finally :)

Anecdotally, this patch is working well in production. I know you will likely have some more requests on the interface of it, though.

@brian-armstrong-discord
Copy link
Contributor Author

@Turim an imreadheader would, among other issues, force us to re-read the header bytes when we go to decode. it also opens the possibility of imread() and imreadheader() doing different things. this would be a brittle approach to the solution.

i'm open to using enums, but internally opencv uses strings, and it made sense to me to make this change as lightweight as possible.

@Turim
Copy link
Contributor

Turim commented Jun 23, 2017

@brian-armstrong-discord ,
Frankly speaking, I didn't see here strong reasoning against.

  • Yup, re-read() is possible. Ok, we could live even with that or provide within imreadheader() an out param, which could be used to read the image but header.
  • About possibilities. A strange argument, as for me, coz we could have had tests for the case. Besides, we could implement imread() via imreadheader() and that's it, isn't it?

i'm open to using enums, but internally opencv uses strings, and it made sense to me to make this change as lightweight as possible.

Can I ask you to be more specific? In competition 'enum vs strings' where's the pros of the string usage?
(Say, at present to the question -Which format does the image have?- I see the overhead in terms of string comparisons and have no clue -how to avoid it-).

@brian-armstrong-discord
Copy link
Contributor Author

@Turim You're more than welcome to try whatever you think will work well. I'd like to stick with the approach that's in this PR because the Decoder and Encoder classes do the same work that was being done before and offer the user a relatively clean interface.

@brian-armstrong-discord
Copy link
Contributor Author

@alalek Do you think there's any hope for this PR? I'm wondering where I should set my expectations.

@sturkmen72
Copy link
Contributor

sturkmen72 commented Nov 14, 2017

i want to share my humble thoughts about this PR
i completely agree that there should be a function or class giving image information only reading file header.
when adding this functionality we also should keep in mind some related improvements
1.catching internal errors while loading images (see #3314)
2.when saving an image keeping EXIF information or adding new EXIF tags

personally i think a new class using existing decoders and encoders internally can be a good candidate to the solution

@olalonde
Copy link

@alalek
Copy link
Member

alalek commented Nov 14, 2017

Sorry for huge delay!

Current interface is very questionable (for example, setScale(), orientation() and nextPage() methods). Such interfaces should be polished accurately:

  • minimal "required" set of parameters.
  • easily extendable interface
  • usability (construct image decoder, call readHeader(), check result and then maybe get image size via width and height - this long story we should to hide from users)

Perhaps, we could start with simple set of functions, for example:

/** Query image parameters

@sa cv::imread

Throws exception if image is not supported.

@param filename Name of file to be loaded
@param flags Flag that can take values of cv::ImreadModes
@param[out] type (optional) type of decoded cv::Mat returned by cv::imtead
@param[out] decoderName (optional) name of used image decoder for cv::imread
@returns size of image
*/
Size imquery(const cv::String& filename, int flags = IMREAD_COLOR,
                CV_OUT int* type = NULL, CV_OUT cv::String* decoderName = NULL);

I believe it is able to handle declared set of problems from the PR description.

@Turim
Copy link
Contributor

Turim commented Nov 15, 2017

@alalek , IMHO

  • nextPage() can't be ignored as just redundand for nowadays we're able to handle multipage TIFF images;
  • I hope imageHeader/width and imageHeader/height still can be a subject for a discussion;

I think the case could be mantained for the imgcodecs to be more clean. Nowadays the only what a imgcodecs' user can is try to load the image via, say, imread and check the retval. But in the prod code it's just sort of gambling with user's memory (i.e. working set) - every try to load image, say, 10000x10000 or even bigger leads to a memory consumption peak with empty retval.

Am I right that you prefer not to add struct/classes?
The another way to mantain the case could be code like that:

/**
@param filename Name of file to be loaded
@param flags Flag that can take values of cv::ImreadModes
@param[out] type (optional) type of decoded cv::Mat returned by cv::imread
@param[out] width (optional) width of decoded cv::Mat returned by cv::imread
@param[out] height (optional) width of decoded cv::Mat returned by cv::imread
*/
CV_EXPORTS_W bool imreadheader(const String& filename, int flags, 
                                CV_OUT int* type = NULL, CV_OUT int* width = NULL, CV_OUT int* height = NULL);

/**
^like the previous
*/
struct ImageSize // or pair_of_int or tuple_of_int
{
int width;
int height;
};
CV_EXPORTS_W bool imreadheadermulti(const String& filename, int flags, 
                                CV_OUT int* type = NULL, CV_OUT std::vector<ImageSize>* sizes = NULL);

As for mentioned imquery() - yes, it gives you the option to distinguish JPEG from TIFF, but the image size is an open question still - so you basically don't know if there any way to load it.

P.S. @brian-armstrong-discord I apologize for I've just mentioned yet another interface for the case. Quite unsure should I create a separate discussion for that. Feel free to provide the feedback on that.
P.P.S. About 'to throw or not to throw' - I'm pretty unsure that it has any connection to the iface. To enable throwing means just to break the backward compatibility and previously written error handling.
P.P.P.S. Yup, both imreadheader() and imquery() aren't so good regarding the perf - on imread() you eventually reread the header. I think we can't fix that without classes/output_token_parameters which could use the state. But even with current perf restrictions, it's good just to have functions like those in the imgcodecs interface.

@alalek
Copy link
Member

alalek commented Nov 15, 2017

but the image size is an open question still

In case of imquery proposal - size is return value of mentioned function.

multipage TIFF

I prefer to handle the most universal cases and I prefer not to add some rarely used hacks (like setScale() or readNext()) into widely used interfaces.
Image is a single surface of pixels (check "wiki"-like definitions).
If you want to work with multipage images - welcome to VideoCapture interface, which is initially designed to handle set of images, including cases of different frames resolutions.

Throw

Check amount of useful information between "bool" result and "exception" with message and potentially stack-trace to the reason of problem.

@Turim
Copy link
Contributor

Turim commented Nov 15, 2017

@alalek , thanks for clarifying.

size as the retval

You're correct, my mistake.

I prefer not to add some rarely used hacks

AFAIK imreadmulti() supports it and, I presume, it's the only goal.
So we're able to read multiple images, but can get the Size only for the 1st - I presume - the more clean would be to add imquerymulti() or to deprecate imreadmulti().

@brian-armstrong-discord
Copy link
Contributor Author

Should I go ahead and just abandon this? The imquery approach isn't very appetizing, and if this isn't mergeable, we might as well kill it.

@brian-armstrong-discord
Copy link
Contributor Author

Are there any plans to add something like imquery? We'd love to get off this fork, especially now that opencv has changed substantially.

@asenyaev
Copy link
Contributor

asenyaev commented Apr 7, 2021

jenkins cn please retry a build

@ocpalo
Copy link
Collaborator

ocpalo commented May 23, 2022

Hi,

I think this pull request covers my GSoC proposal :). Before the coding period started, I was doing some research and found myself here.

I am going to do a meeting with my mentor about this topic but after seeing lots of people discuss this topic, I wanted to hear about you if you are still interested. After re-read your comments, I came to this conclusion.

I think we should provide an API for reading image header-only using the approach c-like interface. If the user knows about the image data, he would not use this API to confirm. Only users would use this API who do not know about its image details. So for these users, re-reading the header should not be a problem. Because their concerns are probably about limited memory. They could sacrifice some CPU time to verify that their software is not going to die in the middle of the process.

I think it should be something like this

 int imreadheader(String const& file, int flags, <Struct Header Info>);
 int imreadheader(String const& file, int flags, <Struct Header Info>, Mat &img, int width, int height); // Read image to the img if image size within the given limits

The second function does a different thing than what it says but I think it is good to provide this API.

Please let me know what you think about this :). This pull request is opened in 2017 but is still open. So I did not bother to create a new one since there exists one on this topic. I hope this won't be a problem.

@alalek
Copy link
Member

alalek commented May 24, 2022

Having imreadheader or similar calls for reading of the image header are not enough.
Usually this call is followed by reading the actual image data.
Main point is that OpenCV and it s API should be effective and we should not read the input file twice.

@ocpalo
Copy link
Collaborator

ocpalo commented May 24, 2022

/** Query image parameters

@sa cv::imread

Throws exception if image is not supported.

@param filename Name of file to be loaded
@param flags Flag that can take values of cv::ImreadModes
@param[out] type (optional) type of decoded cv::Mat returned by cv::imtead
@param[out] decoderName (optional) name of used image decoder for cv::imread
@returns size of image
*/
Size imquery(const cv::String& filename, int flags = IMREAD_COLOR,
                CV_OUT int* type = NULL, CV_OUT cv::String* decoderName = NULL);

You suggested this in 2017. I think I can implement this and overload the imread_ method. But somehow I can't escape from these lines in imread.

    // in imread_ function
    ImageDecoder decoder;

#ifdef HAVE_GDAL
    if(flags != IMREAD_UNCHANGED && (flags & IMREAD_LOAD_GDAL) == IMREAD_LOAD_GDAL ){
        decoder = GdalDecoder().newDecoder();
    }else{
#endif
        decoder = findDecoder( filename );
#ifdef HAVE_GDAL
    }
#endif

We can overload the imread function to read image data using the result of imquery call.
We can implement the overloaded imread_ function as

bool imread(String const& filename, int type, String const& decoderName, int flags, Size size) {
        ImageDecoder decoder;

       // Add a new method to ImageDecoder to find appropiate decoder by decoderName?
#ifdef HAVE_GDAL
    if(flags != IMREAD_UNCHANGED && (flags & IMREAD_LOAD_GDAL) == IMREAD_LOAD_GDAL ){
        decoder = GdalDecoder().newDecoder();
    }else{
#endif
        decoder = findDecoder( filename );
#ifdef HAVE_GDAL
    }
#endif

    /// if no decoder was found, return nothing.
    if( !decoder ){
        return 0;
    }
    
    int scale_denom = 1;
    if( flags > IMREAD_LOAD_GDAL )
    {
        if( flags & IMREAD_REDUCED_GRAYSCALE_2 )
            scale_denom = 2;
        else if( flags & IMREAD_REDUCED_GRAYSCALE_4 )
            scale_denom = 4;
        else if( flags & IMREAD_REDUCED_GRAYSCALE_8 )
            scale_denom = 8;
    }
    
        /// set the scale_denom in the driver
    decoder->setScale( scale_denom );

    /// set the filename in the driver
    decoder->setSource( filename );
    
    mat.create( size.height, size.width, type );
    
    // read the image data
    bool success = false;
    try
    {
        success = decoder->readData(mat);
    }
    catch (const cv::Exception& e)
    {
        std::cerr << "imread_('" << filename << "'): can't read data: " << e.what() << std::endl << std::flush;
    }
    catch (...)
    {
        std::cerr << "imread_('" << filename << "'): can't read data: unknown exception" << std::endl << std::flush;
    }
    if (!success)
    {
        mat.release();
        return false;
    }

    if( decoder->setScale( scale_denom ) > 1 ) // if decoder is JpegDecoder then decoder->setScale always returns 1
    {
        resize( mat, mat, Size( size.width / scale_denom, size.height / scale_denom ), 0, 0, INTER_LINEAR_EXACT);
    }

    /// optionally rotate the data if EXIF orientation flag says so
    if (!mat.empty() && (flags & IMREAD_IGNORE_ORIENTATION) == 0 && flags != IMREAD_UNCHANGED )
    {
        ApplyExifOrientation(decoder->getExifTag(ORIENTATION), mat);
    }
    
    return true;

What do you think about it? I think the function name imreaddata would be more appropriate.

We need a new function static bool findDecoderByName(String const& decoderName) method to find the right decoder in order to use imquery result decoderName. Then we can change the findDecoder part of overloaded imread function i guess.

@brian-armstrong-discord brian-armstrong-discord deleted the export_img_decoder_encoder branch May 25, 2022 00:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants