AssetManagerClient¶

class scenebox.clients.AssetManagerClient(asset_type='sessions', asset_manager_url=None, metadata_only=False, auth_token=None, cache_disabled=False, is_non_standard_asset=False, kv_store=None, num_threads=20)¶

Asset manager client for fetching/putting assets and metadata.

Parameters:

asset_type (str) – The type of asset.
asset_manager_url (Optional[str]) – The URL of the asset manager.
metadata_only (bool) – Whether to only return metadata.
auth_token (Optional[str]) – Authentication token of the user; must be provided.
cache_disabled (bool) – Disable caching; defaults to False.
is_non_standard_asset (bool) – Whether the asset is a non-standard asset. If True, ignore asset-type checks.
kv_store (Optional[KVStoreTemplate]) – Key-value store used to cache bytes; defaults to None.
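
As a minimal sketch of wiring up the constructor arguments above, the helper below reads the token from an environment variable and assembles the keyword arguments. The helper name `client_kwargs` and the variable name `SCENEBOX_AUTH_TOKEN` are assumptions made for illustration; only the parameter names come from the signature above.

```python
import os
from typing import Any, Dict


def client_kwargs(asset_type: str = "images",
                  env_var: str = "SCENEBOX_AUTH_TOKEN") -> Dict[str, Any]:
    """Build keyword arguments for AssetManagerClient (hypothetical helper)."""
    # env_var is an assumed name, not a scenebox convention.
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} to a valid auth token")
    return {
        "asset_type": asset_type,
        "auth_token": token,
        "metadata_only": False,
        "cache_disabled": False,
    }


# Usage, assuming the scenebox package is installed:
# from scenebox.clients import AssetManagerClient
# client = AssetManagerClient(**client_kwargs("images"))
```

Keeping the token out of source code is a design choice of this sketch, not a requirement stated by the client.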

Methods

`add_auxiliary_metadata`	Add auxiliary metadata to an asset.
`assets_iterator`	Retrieve an iterator for Assets with a search query.
`average`	Average a specified field over assets that satisfy a query.
`cardinality`	Count the distinct values of a field over assets that satisfy a query.
`clear_cache`	Clear the asset manager’s Redis cache by organization.
`copy`	Copy an asset.
`count`	Count the number of assets satisfying a query.
`delete`	Delete an asset.
`delete_with_list`	Delete assets specified with a list of asset IDs.
`delete_with_query`	Delete assets specified with a search query.
`download_data_in_batch`	Download asset files using a search query or ids.
`exists`	Check if an asset exists.
`exists_multiple`	Check if assets exist.
`get_bytes`	Get the data bytes of an asset.
`get_bytes_from_kv_store`	Get the data bytes from kv_store if available.
`get_bytes_in_batch`	Get the data bytes for a list of assets.
`get_metadata`	Fetch an asset’s metadata.
`get_metadata_in_batch`	Fetch the metadata of multiple assets.
`get_url`	Get the public URL of an asset.
`get_url_in_batch`	Get the public URLs of a list of assets.
`put_directory`	Convert a directory to a zipfile and upload.
`put_file`	Register and upload a file with the asset manager.
`remove_key_from_index`	Remove a key from index and its mapping (Admin only).
`search_assets`	Retrieve asset IDs with a search query.
`search_assets_large`	Retrieve all asset IDs matching a search query.
`search_meta`	Retrieve asset metadata with a search query.
`search_meta_large`	Retrieve all asset metadata matching a search query.
`sum`	Sum a specified field over assets that satisfy a query.
`summary_meta`	Get a metadata summary.
`update_aux_metadata`	Update an asset’s auxiliary metadata.
`update_by_query`	Update a field on assets that satisfy a query by adding or removing a value.
`update_metadata`	Update an asset’s metadata.
`update_metadata_batch`	Update the metadata of multiple assets.
`upsert_metadata`	Upsert an asset’s metadata.
`wait_for_batch_update_task`	Waits for update by query task completion and returns the status.
`with_asset_state`	Set the asset state, for use in chaining.

delete(id, wait_for_deletion=True, raise_for_failure=False)¶

Delete an asset.

Parameters:

id – The ID of the asset to delete.
wait_for_deletion – If True, polls until the specified asset no longer exists. Otherwise, returns immediately (even if the asset is still in the process of being deleted).
raise_for_failure – If True, raises an error when the deletion fails.

Return type:

dict
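
A minimal sketch of combining `exists` and `delete`, assuming `client` is an already-constructed AssetManagerClient. The helper name `delete_if_exists` is hypothetical; it uses only the parameters documented above.

```python
from typing import Any, Optional


def delete_if_exists(client: Any, asset_id: str) -> Optional[dict]:
    """Delete asset_id if it exists; return delete()'s response, else None.

    Illustrative helper, not part of scenebox.
    """
    if not client.exists(asset_id):
        return None
    # wait_for_deletion=True polls until the asset no longer exists.
    return client.delete(asset_id, wait_for_deletion=True, raise_for_failure=True)
```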

put_directory(directory_path, metadata, temp_dir=PosixPath('/tmp'))¶

Convert a directory to a zipfile and upload.

Return type:

str

put_file(file_object, id=None, folder=None, owner_organization_id=None, content_type=None, content_encoding=None, content_size=None, add_to_redis_cache=False, retry=False)¶

Register and upload a file with the asset manager.

Parameters:

file_object (BytesIO or bytes) – The bytes object to upload.
id (str) – The uid of the file. If not provided, it is set automatically.
owner_organization_id (str) – The ID of the organization that owns the file.
wait_for_completion (bool) – Whether to wait for the upload to finish before returning.
retry – If True, retries the call.

Returns:

object_access for the uploaded file.

Return type:

ObjectAccess

get_url(id, expiration=43200)¶

Get the public URL of an asset.

Parameters:

id – The ID of the asset to get the URL of.
expiration – URL expiration time in seconds (default: 43200, i.e. 12 hours).

Returns:

The URL of the specified asset.

Return type:

str
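
Since `expiration` is given in seconds, a small wrapper can make the intended lifetime explicit. This is a sketch under the assumption that `client` is an AssetManagerClient; `url_valid_for` is a hypothetical helper name.

```python
from datetime import timedelta
from typing import Any


def url_valid_for(client: Any, asset_id: str, lifetime: timedelta) -> str:
    """Request a URL for asset_id that expires after `lifetime` (illustrative helper)."""
    seconds = int(lifetime.total_seconds())
    if seconds <= 0:
        raise ValueError("lifetime must be positive")
    # timedelta(hours=12) corresponds to the documented default of 43200 seconds.
    return client.get_url(asset_id, expiration=seconds)
```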

get_bytes_from_kv_store(id)¶

Get the data bytes from kv_store if available.

Parameters:

id – The ID of the asset to get the bytes of.

Return type:

Optional[bytes]

get_bytes(id, add_to_diskcache=False, add_to_redis_cache=True, url=None, retry=False)¶

Get the data bytes of an asset.

Parameters:

id – The ID of the asset to get the bytes of.
add_to_diskcache – bool. Default is False. Flag to add retrieved asset to diskcache.
add_to_redis_cache – bool. Default is True. Flag to add retrieved asset to redis cache.
url – Optional string. Url of the asset.
retry – If True, retries the call for total_tries set in retry decorator

Returns:

The data bytes of the specified asset.

Return type:

bytes

get_bytes_in_batch(ids, add_to_redis_cache=True, add_to_diskcache=False)¶

Get the data bytes for a list of assets.

Parameters:

ids – The IDs of the assets to get the data bytes of.
add_to_redis_cache – If True, adds the retrieved assets to the Redis cache.
add_to_diskcache – If True, adds the retrieved assets to the disk cache.

Returns:

The fetched bytes in a dictionary. Keys are the input ids. Values are the corresponding data bytes.

Return type:

dict[str, bytes]
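
The returned dictionary can be consumed directly. The sketch below reports the size of each fetched asset; it assumes `client` is an AssetManagerClient and that each value supports `len()`, as data bytes would. `asset_sizes` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def asset_sizes(client: Any, ids: List[str]) -> Dict[str, int]:
    """Map each requested asset ID to the length of its data in bytes (illustrative)."""
    # Keys of the result are the input ids; values are the corresponding data bytes.
    fetched = client.get_bytes_in_batch(ids, add_to_redis_cache=True, add_to_diskcache=False)
    return {asset_id: len(data) for asset_id, data in fetched.items()}
```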

get_url_in_batch(ids)¶

Get the public URLs of a list of assets.

Parameters:

ids – The IDs of the assets to get the URLs of.

Returns:

The URLs of the specified assets. Keys are the input IDs. Values are the corresponding URLs.

Return type:

dict[str, str]

exists(id)¶

Check if an asset exists.

Parameters:

id – The ID of the asset to check the existence of.

Returns:

True if the named asset exists. Otherwise, False.

Return type:

bool

exists_multiple(ids)¶

Check if assets exist.

Parameters:

ids – List of the IDs of assets to check the existence of.

Returns:

True if the named assets exist. Otherwise, False.

Return type:

bool

copy(id, new_id)¶

Copy an asset.

Parameters:

id – The ID of the asset to copy.
new_id – The ID to give to the created asset copy.

count(search=None)¶

Count the number of assets satisfying a query.

Parameters:

search – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.

Returns:

The number of assets that fulfil search.

Return type:

int

sum(field_name, is_nested=False, search=None)¶

Sum a specified field over assets that satisfy a query.

Find the aggregate sum of field_name over all the assets specified by search.

Parameters:

field_name – The metadata field to sum over the filtered assets.
is_nested – Whether the field is of nested type.
search – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.

Returns:

The aggregate sum of the specified field name over the filtered assets.

Return type:

float

average(field_name, is_nested=False, search=None)¶

Average a specified field over assets that satisfy a query.

Find the aggregate average of field_name over all the assets specified by search.

Parameters:

field_name – The metadata field to average over the filtered assets.
is_nested – Whether the field is of nested type.
search – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.

Returns:

The aggregate average of the specified field name over the filtered assets.

Return type:

float

cardinality(field_name, string_field=True, is_nested=False, search=None)¶

Count the distinct values of a field over assets that satisfy a query.

Parameters:

field_name – The field to count the distinct values of.
is_nested – Whether the field is of nested type.
search – Search criteria for the assets.

Returns:

Count of distinct values.

Return type:

float

get_metadata(id, retry=False)¶

Fetch an asset’s metadata.

Parameters:

id – The ID of the asset to receive the metadata of.
retry – If True, retries the call for total_tries set in retry decorator

Returns:

The metadata of the specified asset.

Return type:

dict

get_metadata_in_batch(ids, source_selected_fields=None)¶

Fetch the metadata of multiple assets.

Parameters:

ids – The IDs of the assets to receive the metadata of.
source_selected_fields – List of desired metadata fields. By default, all fields are retrieved.

Returns:

A dict mapping each asset ID to its metadata, or to None if the asset is not found.

Return type:

Dict[str, Optional[dict]]
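
Because missing assets map to None, callers usually want to separate hits from misses. A minimal sketch, assuming `client` is an AssetManagerClient; `split_found_missing` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def split_found_missing(client: Any, ids: List[str]) -> Tuple[Dict[str, dict], List[str]]:
    """Return (metadata of found assets, IDs whose metadata came back as None)."""
    result = client.get_metadata_in_batch(ids)
    found = {asset_id: meta for asset_id, meta in result.items() if meta is not None}
    missing = [asset_id for asset_id, meta in result.items() if meta is None]
    return found, missing
```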

update_metadata(id, metadata, buffered_write=False, replace_sets=False, geo_field=None, shape_group_field=None, nested_fields=None, retry=False)¶

Update an asset’s metadata.

Parameters:

id – The ID of the asset to update the metadata of.
metadata – The metadata to update the existing metadata with.
buffered_write – If True, ingests the metadata in a buffered fashion.
replace_sets – If True, replaces existing metadata with metadata. Otherwise, attempts to append metadata to the existing metadata.
geo_field – Geolocation field
shape_group_field – Shape group field (example: UMAP)
nested_fields – Nested fields (example: ["annotations_meta"])
retry – If True, retries the call for total_tries set in retry decorator

Return type:

dict

update_aux_metadata(id, metadata, geo_field=None, shape_group_field=None, nested_fields=None)¶

Update an asset’s auxiliary metadata.

Parameters:

id – The ID of the asset to update the metadata of.
metadata – The metadata to update the existing metadata with.
geo_field – Geolocation field
shape_group_field – Shape group field (example: UMAP)
nested_fields – Nested fields (example: ["annotations_meta"])

Return type:

dict

update_metadata_batch(ids, metadata, buffered_write=False, replace_sets=False, geo_field=None, shape_group_field=None, nested_fields=None)¶

Update the metadata of multiple assets.

Parameters:

ids – The IDs of the assets to update the metadata of.
metadata – The metadata to update the existing metadata with.
buffered_write – If True, ingests the metadata in a buffered fashion.
replace_sets – If True, replaces existing metadata with metadata. Otherwise, attempts to append metadata to the existing metadata.
geo_field – Geolocation field
shape_group_field – Shape group field (example: UMAP)
nested_fields – Nested fields (example: ["annotations_meta"])
retry – If True, retries the call for total_tries set in retry decorator

Returns:

Update status for each of the chunks the ids and metadata were broken into.

Return type:

dict

upsert_metadata(id, metadata, geo_field=None, shape_group_field=None, nested_fields=None, buffered_write=False)¶

Upsert an asset’s metadata.

Inserts new fields of metadata if they do not already exist. Otherwise, updates the existing field if it does exist.

Parameters:

id – The ID of the asset to upsert the metadata of.
metadata – The metadata to upsert the existing metadata with.
geo_field – Geolocation field
shape_group_field – Shape group field (example: UMAP)
nested_fields – Nested fields (example: ["annotations_meta"])
buffered_write – If True, ingests metadata in a buffered fashion.

Return type:

dict

update_by_query(update_field, update_value, operation, limit=None, query=None)¶

Update a field on assets that satisfy a query by adding or removing a value.

Parameters:

update_field – The field to update.
update_value – The update value.
operation – The operation to perform: add or remove.
limit – Limits the number of docs in the query result; <limit> docs are randomly selected. If not provided, all documents satisfying the query will be added to set.
query – Search query to limit the docs to update.

Returns:

The update task id(s). You can check the status of a task using the jobs/query_task endpoint. Once the task is complete, you may need to clear the cache for the associated assets to view the latest changes.

Return type:

list
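
A sketch of starting an update and then waiting on every returned task with `wait_for_batch_update_task`, assuming `client` is an AssetManagerClient. The helper name `update_and_wait` is hypothetical, and the status values are passed through unchanged because their format is not documented here.

```python
from typing import Any, Dict, Optional


def update_and_wait(client: Any, update_field: str, update_value: Any,
                    operation: str, query: Optional[dict] = None,
                    timeout: int = 1200) -> Dict[str, Any]:
    """Run update_by_query and block until each task finishes (illustrative helper)."""
    if operation not in ("add", "remove"):
        raise ValueError("operation must be 'add' or 'remove'")
    task_ids = client.update_by_query(
        update_field=update_field,
        update_value=update_value,
        operation=operation,
        query=query,
    )
    # Map each task ID to whatever status the client reports for it.
    return {
        task_id: client.wait_for_batch_update_task(task_id, timeout=timeout)
        for task_id in task_ids
    }
```

After the tasks complete, `clear_cache` may be needed before the changes become visible, as noted above.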

assets_iterator(query=None, filters=None, sort_field='id', sort_order='asc')¶

Retrieve an iterator for Assets with a search query.

Parameters:

query – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.
filters – Filter names and values to append to query. Dict keys represent the existing filter names, and dict values are the filter values.
sort_field – The field to sort the results by.
sort_order – Specifies the sorting order: asc or desc.

Returns:

Iterator for assets

Return type:

Iterator[Asset]

search_assets(query=None, filters=None, size=50, offset=0, sort_field=None, sort_order=None, search_after=None, scan=False)¶

Retrieve asset IDs with a search query.

Returns the top size matching hits. If a return of more than 10000 hits is desired, please use AssetManagerClient.search_assets_large().

Parameters:

query – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.
filters – Filter names and values to append to query. Dict keys represent the existing filter names, and dict values are the filter values.
size – Specifies the Elasticsearch search size. The maximum number of hits to return. Has no effect if scan is True.
offset – Specifies the Elasticsearch search offset. The number of hits to skip. Has no effect if scan is True.
sort_field – The field to sort the results by.
sort_order – Specifies the Elasticsearch string sorting order.
search_after – Used for retrieving all assets. See search_assets_large
scan – If True, uses the Elasticsearch scan capability. Otherwise, uses the Elasticsearch search API.

Returns:

A list of the IDs of the assets fulfilling the search query.

Return type:

List[str]
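
A sketch of paging through results with `size` and `offset`, assuming `client` is an AssetManagerClient and that both parameters take effect in the default (scan=False) mode. The helper name `iter_asset_ids` is hypothetical; sorting by `id` in `asc` order mirrors the defaults of `assets_iterator` and is used here to keep page boundaries stable.

```python
from typing import Any, Iterator, Optional

MAX_HITS = 10000  # beyond this, the docs recommend search_assets_large()


def iter_asset_ids(client: Any, query: Optional[dict] = None,
                   page_size: int = 50) -> Iterator[str]:
    """Yield asset IDs page by page, stopping at MAX_HITS (illustrative helper)."""
    offset = 0
    while offset < MAX_HITS:
        size = min(page_size, MAX_HITS - offset)
        page = client.search_assets(query=query, size=size, offset=offset,
                                    sort_field="id", sort_order="asc")
        if not page:
            return
        yield from page
        if len(page) < size:
            return  # a short page means there are no more hits
        offset += len(page)
```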

search_assets_large(query=None, filters=None)¶

Retrieve all asset IDs matching a search query.

Return all hits matching a search query. If a return of fewer than 10000 hits is desired, please use AssetManagerClient.search_assets() or SceneEngineClient.search_assets().

Parameters:

query – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.
filters – Filter names and values to append to query. Dict keys represent the existing filter names, and dict values are the filter values.

Returns:

A list of the IDs of the assets fulfilling the search query.

Return type:

List[str]
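
One way to choose between the two search methods is to count first. This is a sketch assuming `client` is an AssetManagerClient and that the same dictionary is accepted as `search` by `count` and as `query` by the search methods; `fetch_all_ids` is a hypothetical helper name.

```python
from typing import Any, List, Optional

LARGE_THRESHOLD = 10000  # threshold stated in the method descriptions above


def fetch_all_ids(client: Any, query: Optional[dict] = None) -> List[str]:
    """Return every matching asset ID, picking the method by hit count (illustrative)."""
    total = client.count(search=query)
    if total == 0:
        return []
    if total > LARGE_THRESHOLD:
        return client.search_assets_large(query=query)
    return client.search_assets(query=query, size=total)
```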

remove_key_from_index(key, wait_for_completion=False)¶

Remove a key from index and its mapping (Admin only).

Parameters:

key – Key to be removed from index and its mapping
wait_for_completion – If True, polls until job is complete. Otherwise, continues execution and does not raise an error if the job does not complete.

Returns:

The id of the Job that carries out the key removal.

Return type:

str

delete_with_query(query=None, filters=None, wait_for_completion=False)¶

Delete assets specified with a search query.

Parameters:

query – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.
filters – Filter names and values to append to query. Dict keys represent the existing filter names, and dict values are the filter values.
wait_for_completion – If True, polls until job is complete. Otherwise, continues execution and does not raise an error if the job does not complete.

Returns:

The id of the Job that carries out the asset deletion.

Return type:

str

delete_with_list(assets_list=None, wait_for_completion=False)¶

Delete assets specified with a list of asset IDs.

Parameters:

assets_list – Asset IDs of the assets to delete.
wait_for_completion – If True, polls until job is complete. Otherwise, continues execution and does not raise an error if the job does not complete.

Returns:

The id of the Job that carries out the asset deletion.

Return type:

str

search_meta(query=None, size=50, filters=None, offset=0, sort_field=None, sort_order=None, scan=False, compress=False)¶

Retrieve asset metadata with a search query.

Returns the top size matching hits. If a return of more than 10000 hits is desired, please use AssetManagerClient.search_meta_large().

Parameters:

query – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.
filters – Filter names and values to append to query. Dict keys represent the existing filter names, and dict values are the filter values.
size – Specifies the Elasticsearch search size. The maximum number of hits to return. Has no effect if scan is True.
offset – Specifies the Elasticsearch search offset. The number of hits to skip. Has no effect if scan is True.
sort_field – The field to sort the results by.
sort_order – Specifies the Elasticsearch string sorting order.
scan – If True, uses the Elasticsearch scan capability. Otherwise, uses the Elasticsearch search API.
compress – Boolean. If set to True, a gzip compressed list of metadata is returned. Typically used in cases where the metadata returned is over 20MB.

Returns:

A list of the metadata dicts of each asset that fulfills the search query.

Return type:

List[dict]

search_meta_large(query=None, filters=None, compress=False)¶

Retrieve all asset metadata matching a search query.

Return all hits matching a search query. If a return of fewer than 10000 hits is desired, please use AssetManagerClient.search_meta() or SceneEngineClient.search_meta().

Parameters:

query – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.
filters – Filter names and values to append to query. Dict keys represent the existing filter names, and dict values are the filter values.
compress – Boolean. If set to True, a gzip compressed list of metadata is returned. Typically used in cases where the metadata returned is over 20MB.

Returns:

A list of the metadata dicts of each asset that fulfills the search query.

Return type:

List[dict]

summary_meta(summary_request)¶

Get a metadata summary.

Parameters:

summary_request – Dict of summary settings of the following form:

{
"search" (dict): <Query to locate the data subset of interest>,
"dimensions" (Optional[List[str]]): <Dimensions of summary (term aggregations)>,
"nested_dimensions" (Optional[List[str]]): <Nested dimensions of summary (term aggregations)>,
"custom_buckets_numeric" (Optional[List[dict]]): <Buckets for range aggregation>,
"custom_buckets_time" (Optional[List[dict]]): <Buckets for date_range aggregation>,
"aggregate_field" (Optional[str]): <Field for metric aggregations>,
"is_nested" (Optional[bool]): <Whether the field is of nested type>,
"aggregation_type" (Optional[str]): <Type of metric aggregation (e.g. avg, sum)>,
"max_size_for_agg" (Optional[str]): <Maximum number of buckets to return (default is 100)>
}

Returns:

Metadata summary according to summary_request.

Return type:

dict
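
A sketch of assembling a `summary_request` from the documented keys. The helper name `build_summary_request` is hypothetical, and leaving unset optional keys out of the dict is an assumption based on their being marked Optional. The field names in the usage comment are placeholders.

```python
from typing import Any, Dict, List, Optional


def build_summary_request(search: dict,
                          dimensions: Optional[List[str]] = None,
                          nested_dimensions: Optional[List[str]] = None,
                          aggregate_field: Optional[str] = None,
                          aggregation_type: Optional[str] = None,
                          is_nested: Optional[bool] = None) -> Dict[str, Any]:
    """Build a summary_request dict, skipping options that were not provided."""
    optional = {
        "dimensions": dimensions,
        "nested_dimensions": nested_dimensions,
        "aggregate_field": aggregate_field,
        "aggregation_type": aggregation_type,
        "is_nested": is_nested,
    }
    request: Dict[str, Any] = {"search": search}
    request.update({key: value for key, value in optional.items() if value is not None})
    return request


# Usage with placeholder field names, assuming `client` is an AssetManagerClient:
# summary = client.summary_meta(build_summary_request({}, dimensions=["some_field"]))
```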

with_asset_state(asset_type, metadata_only=False)¶

Set the asset state, for use in chaining.

E.g. client.with_asset_state("images", True).search_files({})

download_data_in_batch(destination_dir, ids=None, query=None, download_metadata=True)¶

Download asset files using a search query or ids.

Parameters:

destination_dir – The existing local directory to download the files to.
ids – The IDs of the assets to download.
query – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.
download_metadata – If True, downloads asset metadata. Otherwise, does not download asset metadata.

Returns:

Maps asset IDs obtained with query to local filepath and metadata. Highest level keys represent asset IDs. Lower level keys “filepath” and “metadata” map to local filepath and metadata for a specific ID.

Return type:

dict
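
The nested result can be flattened for logging or further processing. A sketch assuming the structure described above, with the "metadata" key possibly absent when download_metadata is False (an assumption); `summarize_download` is a hypothetical helper name.

```python
from typing import Any, Dict, List, Tuple


def summarize_download(result: Dict[str, Dict[str, Any]]) -> List[Tuple[str, str, bool]]:
    """Return (asset ID, local filepath, has metadata) rows sorted by asset ID."""
    rows = []
    for asset_id, entry in sorted(result.items()):
        has_metadata = entry.get("metadata") is not None
        rows.append((asset_id, entry["filepath"], has_metadata))
    return rows


# Usage, assuming `client` is an AssetManagerClient and the directory exists:
# rows = summarize_download(client.download_data_in_batch("downloads", ids=["some_id"]))
```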

add_auxiliary_metadata(id=None, ids=None, search=None, filters=None, **kwargs)¶

Add auxiliary metadata to an asset.

Parameters:

id – An asset ID to add the auxiliary metadata to.
ids – A list of asset IDs to add the auxiliary metadata to.
search – Query to locate the data subset of interest. Filters through existing assets according to the dictionary passed.
filters – Filter names and values to append to search. Dict keys represent the existing filter names, and dict values are the filter values.
**kwargs – Maps custom metadata labels to any value. Keys represent the auxiliary metadata labels. Values represent the auxiliary metadata values.

clear_cache(all_organizations=False, partitions=None)¶

Clear the asset manager’s Redis cache by organization.

Parameters:

all_organizations – If True, the Redis cache for all organizations is cleared. Requires Super Admin privilege.
partitions – List of partitions (like images, sets, annotations) whose cache to clear. If not set, all are cleared.

Returns:

ACK_OK_RESPONSE on success.

wait_for_batch_update_task(query_task_id, timeout=1200)¶

Waits for update by query task completion and returns the status.