Vision

Upload a file to the MediaCatch Vision API and get the results

Upload a file to the MediaCatch Vision API.

Parameters:

Name Type Description Default
fpath str

File path.

required
type Literal['ocr', 'face']

Type of inference to run on the file.

required
url str

URL to the vision API. Defaults to 'https://api.mediacatch.io/vision'.

'https://api.mediacatch.io/vision'
api_key str

API key for the vision API. Defaults to None.

None
fps int

Frames per second for video processing. Defaults to 1.

None
tolerance int

Tolerance for text detection. Defaults to 10.

None
min_bbox_iou float

Minimum bounding box intersection over union (IoU) for merging text detections. Defaults to 0.5.

None
min_levenshtein_ratio float

Minimum Levenshtein ratio for merging text detection (more info here: https://rapidfuzz.github.io/Levenshtein/levenshtein.html#ratio). Defaults to 0.75.

None
moving_threshold int

If the center of a merged text detection moves more pixels than this threshold, the detection is considered moving text. Defaults to 50.

None
max_text_length int

If the text length is less than this value, max_text_confidence is used as the confidence threshold. Defaults to 3.

None
min_text_confidence float

Confidence threshold for text detection (if text length is greater than max_text_length). Defaults to 0.5.

None
max_text_confidence float

Confidence threshold for text detection (if text length is less than max_text_length). Defaults to 0.8.

None
max_height_width_ratio float

Discard detection if height/width ratio is greater than this value. Defaults to 2.0.

None
get_detection_histogram bool

If true, get histogram of detection. Defaults to False.

None
detection_histogram_bins int

Number of bins for histogram calculation. Defaults to 8.

None
max_height_difference_ratio float

Maximum allowed difference in height between two text boxes for them to be merged. Defaults to 0.5.

None
max_horizontal_distance_ratio float

Threshold for deciding whether two boxes are close enough horizontally to be considered part of the same text line. Defaults to 0.9.

None
get_frame_index bool

If true, get frame index. Defaults to None.

None
get_bbox bool

If true, get bounding box. Defaults to None.

None
face_recognition bool

If true, run face recognition. Defaults to None.

None
face_age bool

If true, get face age. Defaults to None.

None
face_gender bool

If true, get face gender. Defaults to None.

None
face_expression bool

If true, get face expression. Defaults to None.

None
face_ethnicity bool

If true, get face ethnicity. Defaults to None.

None
max_retries int

Maximum number of retries. Defaults to 5.

5
delay float

Delay between retries. Defaults to 10.0.

10.0
verbose bool

If True, print log messages. Defaults to True.

True
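
The text-detection parameters above describe a length-dependent confidence threshold, a shape filter, and an IoU-based merge criterion. The following is a minimal sketch of how such rules could be applied locally; it is not the service's implementation. The function names `bbox_iou` and `keep_detection`, the `(x1, y1, x2, y2)` box layout, and the use of `>=` when comparing confidence are all assumptions made for illustration. The documentation does not say which threshold applies when the text length equals max_text_length; the sketch uses min_text_confidence in that case.

```python
from typing import Tuple

# Assumed box layout: (x1, y1, x2, y2) in pixels. The API's real format may differ.
Box = Tuple[float, float, float, float]


def bbox_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def keep_detection(
    text: str,
    confidence: float,
    box: Box,
    max_text_length: int = 3,
    min_text_confidence: float = 0.5,
    max_text_confidence: float = 0.8,
    max_height_width_ratio: float = 2.0,
) -> bool:
    """Hypothetical filter mirroring the documented parameter descriptions."""
    width = box[2] - box[0]
    height = box[3] - box[1]
    # Discard degenerate boxes and boxes that are too tall for their width.
    if width <= 0 or height / width > max_height_width_ratio:
        return False
    # Short text must clear the stricter threshold; longer text the looser one.
    if len(text) < max_text_length:
        threshold = max_text_confidence
    else:
        threshold = min_text_confidence
    return confidence >= threshold
```

With the documented defaults, a two-character detection at confidence 0.7 would be dropped by this sketch, while a five-character detection at the same confidence would be kept. Two detections whose `bbox_iou` reaches min_bbox_iou would be candidates for merging.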

Returns:

Name Type Description
str str

File ID.
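
The max_retries and delay parameters suggest a simple retry loop around the upload request. A minimal sketch of that pattern follows, assuming that max_retries counts total attempts and that any exception triggers a retry; the helper name `call_with_retries` is hypothetical and not part of the MediaCatch client.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(
    action: Callable[[], T],
    max_retries: int = 5,
    delay: float = 10.0,
) -> T:
    """Run `action`, retrying on any exception up to `max_retries` attempts."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, max_retries + 1):
        try:
            return action()
        except Exception as exc:  # assumption: every failure is retryable
            last_error = exc
            # Sleep only when another attempt will follow.
            if attempt < max_retries:
                time.sleep(delay)
    raise RuntimeError(f"gave up after {max_retries} attempts") from last_error
```

In practice `action` would wrap the HTTP request that uploads the file and returns the file ID; a real client would likely retry only on transient errors such as timeouts.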

Wait for result from a URL.

Parameters:

Name Type Description Default
file_id str

The file ID to get the result from.

required
url str

The URL to get the result from.

'https://api.mediacatch.io/vision'
timeout int

Timeout for waiting in seconds. Defaults to 3600.

3600
delay int

Delay between each request. Defaults to 10.

10
verbose bool

If True, print log messages. Defaults to True.

True

Returns:

Type Description
dict[str, Any] | None

Dictionary with the result from the URL, or None if the request failed.
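
The timeout and delay parameters describe a polling loop: request the result, sleep, and repeat until a result arrives or the timeout elapses. The sketch below illustrates that pattern under stated assumptions. `poll_until_ready` and its `fetch` callback are hypothetical names; `fetch` stands in for whatever request retrieves the result for a file ID and is assumed to return None while the result is not ready.

```python
import time
from typing import Any, Callable, Dict, Optional


def poll_until_ready(
    fetch: Callable[[], Optional[Dict[str, Any]]],
    timeout: float = 3600,
    delay: float = 10,
) -> Optional[Dict[str, Any]]:
    """Call `fetch` every `delay` seconds until it returns a result or `timeout` passes."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result is not None:
            return result
        # Stop if the next attempt would start after the deadline.
        if time.monotonic() + delay > deadline:
            return None
        time.sleep(delay)
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout correct even if the system clock is adjusted while waiting.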