Known Indirect Subclasses: Element, Line, TextBlock
Common interface for every entity across the hierarchy of recognized text. An entity may contain other smaller entities, or may be an atom.
Public Method Summary
abstract Rect | getBoundingBox() | Axis-aligned bounding box containing the text.
abstract List<? extends Text> | getComponents() | Smaller components that comprise this entity, if any.
abstract Point[] | getCornerPoints() | Four corner points in clockwise direction starting with top-left.
abstract String | getLanguage() | Prevailing language in the text, if any.
abstract String | getValue() | Retrieve the recognized text as a string.
Public Methods
public abstract Rect getBoundingBox ()
Axis-aligned bounding box containing the text. The bounding box may extend past the image boundary.
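Because the box may extend past the image boundary, a client that crops or draws on the source image may want to clamp it first. The following is a minimal sketch of that idea, not part of this API. `Box` is a hypothetical stand-in for `android.graphics.Rect` so that the example runs off-device, and `clampToImage` is an illustrative helper name.

```java
final class BoundingBoxClamp {
    /** Hypothetical stand-in for android.graphics.Rect (assumption, not the real class). */
    static final class Box {
        final int left, top, right, bottom;

        Box(int left, int top, int right, int bottom) {
            this.left = left;
            this.top = top;
            this.right = right;
            this.bottom = bottom;
        }

        int width() { return right - left; }
        int height() { return bottom - top; }
    }

    /**
     * Returns a copy of box limited to the image area [0, imageWidth] x [0, imageHeight].
     * A box lying entirely outside the image collapses to zero width or height.
     */
    static Box clampToImage(Box box, int imageWidth, int imageHeight) {
        int left = Math.max(0, Math.min(box.left, imageWidth));
        int top = Math.max(0, Math.min(box.top, imageHeight));
        // Never let right/bottom fall before left/top after clamping.
        int right = Math.max(left, Math.min(box.right, imageWidth));
        int bottom = Math.max(top, Math.min(box.bottom, imageHeight));
        return new Box(left, top, right, bottom);
    }
}
```

On Android, `Rect.intersect(0, 0, width, height)` achieves a similar effect in place when the two rectangles overlap.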
public abstract List<? extends Text> getComponents ()
Smaller components that comprise this entity, if any. If this entity is an atom, an empty list is returned. TextBlock is at the top of the Text hierarchy. A TextBlock contains Line objects, which in turn contain Elements. Elements are atoms. We may decide to add character-level objects in later versions.
For example, a client could draw bounding boxes for recognized text in different colors for paragraphs, lines, and words by recursively traversing down the tree with this method.
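The traversal described above can be sketched as a depth-first walk that tracks the current depth (0 for the top-level entity, 1 for its components, and so on), which a client could map to a drawing color. This is an illustrative sketch only. `Node` and `SimpleNode` are hypothetical stand-ins that mirror `getValue()` and `getComponents()` so the example runs without the Android library, and the sample text is made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class TextTreeWalk {
    /** Hypothetical stand-in exposing the two Text methods used here. */
    interface Node {
        String getValue();
        List<? extends Node> getComponents();
    }

    /** Simple in-memory node; an atom is a node created with no components. */
    static final class SimpleNode implements Node {
        private final String value;
        private final List<SimpleNode> components;

        SimpleNode(String value, SimpleNode... components) {
            this.value = value;
            this.components = Arrays.asList(components);
        }

        public String getValue() { return value; }
        public List<SimpleNode> getComponents() { return components; }
    }

    /** Visits node and all of its descendants depth-first, recording "depth:value". */
    static void walk(Node node, int depth, List<String> out) {
        out.add(depth + ":" + node.getValue());
        // Atoms return an empty list, so the recursion stops on its own.
        for (Node child : node.getComponents()) {
            walk(child, depth + 1, out);
        }
    }

    /** Builds a made-up block with two lines and walks it. */
    static List<String> demo() {
        SimpleNode block = new SimpleNode("block",
                new SimpleNode("Hello world", new SimpleNode("Hello"), new SimpleNode("world")),
                new SimpleNode("Bye", new SimpleNode("Bye")));
        List<String> out = new ArrayList<>();
        walk(block, 0, out);
        return out;
    }
}
```

In a real client, the body of `walk` would draw `getBoundingBox()` in a color chosen from `depth` instead of appending to a list.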
public abstract Point[] getCornerPoints ()
Four corner points in clockwise direction starting with top-left. Because of possible perspective distortion, the region is not necessarily a rectangle. Parts of the region may lie outside the image.
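Since the four corners need not form a rectangle, a client may want to test for that case or derive an enclosing axis-aligned box itself. The sketch below shows one way to do both. `Pt` is a hypothetical stand-in for `android.graphics.Point`, and the helper names are illustrative. Nothing here is claimed to be how `getBoundingBox()` is computed internally.

```java
final class CornerBounds {
    /** Hypothetical stand-in for android.graphics.Point (assumption, not the real class). */
    static final class Pt {
        final int x, y;

        Pt(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Returns {left, top, right, bottom} of the smallest axis-aligned box enclosing the corners. */
    static int[] boundsOf(Pt[] corners) {
        if (corners == null || corners.length == 0) {
            throw new IllegalArgumentException("no corner points");
        }
        int left = corners[0].x, right = corners[0].x;
        int top = corners[0].y, bottom = corners[0].y;
        for (Pt p : corners) {
            left = Math.min(left, p.x);
            right = Math.max(right, p.x);
            top = Math.min(top, p.y);
            bottom = Math.max(bottom, p.y);
        }
        return new int[] {left, top, right, bottom};
    }

    /** True when corners ordered top-left, top-right, bottom-right, bottom-left form an axis-aligned rectangle. */
    static boolean isAxisAlignedRect(Pt[] c) {
        return c.length == 4
                && c[0].y == c[1].y   // top edge is horizontal
                && c[1].x == c[2].x   // right edge is vertical
                && c[2].y == c[3].y   // bottom edge is horizontal
                && c[3].x == c[0].x;  // left edge is vertical
    }
}
```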
public abstract String getLanguage ()
Prevailing language in the text, if any. The language is given as a BCP 47 tag (e.g. "en" or "sr-Latn-BA"), or "und" if the language could not be determined.
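A client that needs a `java.util.Locale` can parse the returned tag with `Locale.forLanguageTag`, treating "und" as "no language". The sketch below assumes that; the helper name `toLocaleOrNull` is illustrative, and the null and empty checks are defensive assumptions rather than documented behavior of this method.

```java
import java.util.Locale;

final class LanguageTagDemo {
    /**
     * Converts a BCP 47 tag to a Locale.
     * Returns null for "und" (undetermined), and also for null or empty input,
     * which is handled defensively here as an assumption.
     */
    static Locale toLocaleOrNull(String tag) {
        if (tag == null || tag.isEmpty() || "und".equals(tag)) {
            return null;
        }
        return Locale.forLanguageTag(tag);
    }
}
```

For "sr-Latn-BA", the resulting Locale reports language "sr", script "Latn", and country "BA".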
public abstract String getValue ()
Retrieve the recognized text as a string. The text is returned in reading order for the language. For Latin-script languages, this is top-to-bottom within a TextBlock and left-to-right within a Line.
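To illustrate what Latin reading order means, the sketch below sorts positioned fragments left-to-right within each line and lines top-to-bottom, then joins the text. It is not the implementation of `getValue()`. `Fragment` is a hypothetical type, and the separators (a space between fragments, a newline between lines) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReadingOrderSketch {
    /** Hypothetical positioned fragment: text plus the top-left corner of its box. */
    static final class Fragment {
        final String value;
        final int left, top;

        Fragment(String value, int left, int top) {
            this.value = value;
            this.left = left;
            this.top = top;
        }
    }

    /** Orders fragments left-to-right within lines and lines top-to-bottom, then joins them. */
    static String latinReadingOrder(List<List<Fragment>> lines) {
        List<List<Fragment>> sorted = new ArrayList<>();
        for (List<Fragment> line : lines) {
            if (line.isEmpty()) {
                continue; // nothing to place for an empty line
            }
            List<Fragment> copy = new ArrayList<>(line);
            copy.sort(Comparator.comparingInt((Fragment f) -> f.left));
            sorted.add(copy);
        }
        // Order lines by the top coordinate of their first (leftmost) fragment.
        sorted.sort(Comparator.comparingInt((List<Fragment> l) -> l.get(0).top));

        StringBuilder text = new StringBuilder();
        for (int i = 0; i < sorted.size(); i++) {
            if (i > 0) {
                text.append('\n'); // assumed line separator
            }
            List<Fragment> line = sorted.get(i);
            for (int j = 0; j < line.size(); j++) {
                if (j > 0) {
                    text.append(' '); // assumed fragment separator
                }
                text.append(line.get(j).value);
            }
        }
        return text.toString();
    }
}
```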