Vision
Upload a file to MediaCatch Vision API and get the results
Upload a file to MediaCatch Vision API.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`fpath` | `str` | File path. | required |
`type` | `Literal['ocr', 'face']` | Type of inference to run on the file. | required |
`url` | `str` | URL to the vision API. Defaults to 'https://api.mediacatch.io/vision'. | `'https://api.mediacatch.io/vision'` |
`api_key` | `str` | API key for the vision API. Defaults to None. | `None` |
`fps` | `int` | Frames per second for video processing. Defaults to 1. | `None` |
`tolerance` | `int` | Tolerance for text detection. Defaults to 10. | `None` |
`min_bbox_iou` | `float` | Minimum bounding box intersection over union for merging text detections. Defaults to 0.5. | `None` |
`min_levenshtein_ratio` | `float` | Minimum Levenshtein ratio for merging text detections (see https://rapidfuzz.github.io/Levenshtein/levenshtein.html#ratio). Defaults to 0.75. | `None` |
`moving_threshold` | `int` | If the center of a merged text detection moves more than this many pixels, it is considered moving text. Defaults to 50. | `None` |
`max_text_length` | `int` | If the text length is less than this value, `max_text_confidence` is used as the confidence threshold. Defaults to 3. | `None` |
`min_text_confidence` | `float` | Confidence threshold for text detection when the text length is greater than `max_text_length`. Defaults to 0.5. | `None` |
`max_text_confidence` | `float` | Confidence threshold for text detection when the text length is less than `max_text_length`. Defaults to 0.8. | `None` |
`max_height_width_ratio` | `float` | Discard a detection if its height/width ratio is greater than this value. Defaults to 2.0. | `None` |
`get_detection_histogram` | `bool` | If true, get a histogram of the detection. Defaults to False. | `None` |
`detection_histogram_bins` | `int` | Number of bins for the histogram calculation. Defaults to 8. | `None` |
`max_height_difference_ratio` | `float` | Maximum allowed difference in height between two text boxes for them to be merged. Defaults to 0.5. | `None` |
`max_horizontal_distance_ratio` | `float` | Determines whether two boxes are close enough horizontally to be considered part of the same text line. Defaults to 0.9. | `None` |
`get_frame_index` | `bool` | If true, get the frame index. Defaults to None. | `None` |
`get_bbox` | `bool` | If true, get the bounding box. Defaults to None. | `None` |
`face_recognition` | `bool` | If true, run face recognition. Defaults to None. | `None` |
`face_age` | `bool` | If true, get face age. Defaults to None. | `None` |
`face_gender` | `bool` | If true, get face gender. Defaults to None. | `None` |
`face_expression` | `bool` | If true, get face expression. Defaults to None. | `None` |
`face_ethnicity` | `bool` | If true, get face ethnicity. Defaults to None. | `None` |
`max_retries` | `int` | Maximum number of retries. Defaults to 5. | `5` |
`delay` | `float` | Delay between retries. Defaults to 10.0. | `10.0` |
`verbose` | `bool` | If True, print log messages. Defaults to True. | `True` |
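To make two of the OCR parameters above more concrete, here is a minimal sketch of what `min_bbox_iou` and the `max_text_length` / `min_text_confidence` / `max_text_confidence` trio describe. It is not the service's implementation: the helper names `bbox_iou` and `keep_text`, the `(x1, y1, x2, y2)` box layout, the use of `>=` for the confidence check, and the handling of a text whose length equals `max_text_length` are all assumptions made for illustration.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def bbox_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_text(
    text: str,
    confidence: float,
    max_text_length: int = 3,
    min_text_confidence: float = 0.5,
    max_text_confidence: float = 0.8,
) -> bool:
    """Apply the stricter threshold to short texts, the looser one otherwise.

    Assumption: a text whose length equals max_text_length is treated as
    "long"; the parameter descriptions do not specify this case.
    """
    if len(text) < max_text_length:
        threshold = max_text_confidence
    else:
        threshold = min_text_confidence
    return confidence >= threshold


# Two boxes that overlap by half of each box's area.
iou = bbox_iou((0, 0, 10, 10), (5, 0, 15, 10))
print(f"IoU = {iou:.3f}, merge candidate at min_bbox_iou=0.5: {iou >= 0.5}")
print(keep_text("ab", 0.7), keep_text("hello", 0.7))
```

Under these assumptions, short strings such as two-character fragments need a higher confidence to survive, which is a common way to suppress spurious detections.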
Returns:
Name | Type | Description |
---|---|---|
`str` | `str` | File ID. |
Wait for result from a URL.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`file_id` | `str` | The file ID to get the result from. | required |
`url` | `str` | The URL to get the result from. | `'https://api.mediacatch.io/vision'` |
`timeout` | `int` | Timeout for waiting in seconds. Defaults to 3600. | `3600` |
`delay` | `int` | Delay between each request. Defaults to 10. | `10` |
`verbose` | `bool` | If True, print log messages. Defaults to True. | `True` |
Returns:
Type | Description |
---|---|
`dict[str, Any] \| None` | Dictionary with the result from the URL, or None if the request failed. |
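The `timeout` and `delay` parameters describe a polling loop: ask for the result, wait `delay` seconds, and give up with `None` once `timeout` seconds have passed. A minimal sketch of that behaviour follows. The names `poll_for_result` and `fake_fetch` are hypothetical, the fetch step is injected as a callable so no real request is made, and the actual endpoint and response shape are not shown here.

```python
import time
from typing import Any, Callable, Dict, Optional


def poll_for_result(
    fetch: Callable[[], Optional[Dict[str, Any]]],
    timeout: float = 3600,
    delay: float = 10,
    verbose: bool = True,
) -> Optional[Dict[str, Any]]:
    """Call `fetch` every `delay` seconds until it returns a dict or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        # Stop if another wait would take us to or past the deadline.
        if time.monotonic() + delay >= deadline:
            if verbose:
                print("timed out waiting for result")
            return None
        time.sleep(delay)


# Hypothetical stand-in for the HTTP request: the result is ready on the third call.
_responses = iter([None, None, {"status": "done"}])


def fake_fetch() -> Optional[Dict[str, Any]]:
    return next(_responses)


result = poll_for_result(fake_fetch, timeout=5, delay=0.01, verbose=False)
print(result)
```

Using `time.monotonic()` for the deadline avoids surprises if the system clock is adjusted while waiting.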