Entity recognition from video content


Post by Reddi1 »

The system can also identify entities that occur at a particular point in time in the video. These points in time can refer to specific sections, such as minutes 10 to 15, or to fixed timestamps in the video.

For example, if user 176 provides a query at fifteen minutes into the video, entity extractor 132 may extract entities that appear in the video at the fifteen-minute mark or in a range of time that includes the fifteen-minute mark. The range of time may be a predetermined (or preset) range of time (e.g., two-minute range, twenty-second range, etc.). In some implementations, entity extractor 132 extracts timestamps associated with the entities that may indicate when the entities appear in the multimedia content.
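To make the timestamp handling concrete, here is a minimal sketch of filtering extracted entities to a preset window around the query time. The ExtractedEntity record, its field names, and the sample data are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class ExtractedEntity:
    name: str
    entity_type: str   # e.g. "person", "car"
    timestamp: float   # seconds into the video where the entity appears

def entities_near(entities: list[ExtractedEntity],
                  query_time: float,
                  window: float = 120.0) -> list[ExtractedEntity]:
    """Return entities whose timestamps fall inside a preset window
    (default: a two-minute range) centered on the query time."""
    half = window / 2
    return [e for e in entities
            if query_time - half <= e.timestamp <= query_time + half]

# Example: a query issued at the fifteen-minute mark (900 seconds).
catalog = [
    ExtractedEntity("Johnny Depp", "person", 905.0),
    ExtractedEntity("Aston Martin DB5", "car", 880.0),
    ExtractedEntity("Opening narrator", "person", 30.0),
]
print(entities_near(catalog, query_time=900.0))
# -> the two entities near the 15-minute mark; the narrator at 0:30 is excluded
```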

Entities appearing in the video can be evaluated against timestamps using a scoring engine: the closer an entity is to the relevant timestamp, the more likely it is to play a role in the scene.

In some implementations, scoring and ranking engine 138 may score one or more extracted entities based on a time associated with a query initially provided by user 176, as well as timestamps and properties of the extracted entities. For example, scoring and ranking engine 138 may review timestamp information that may be associated with one or more extracted entities.
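The patent does not spell out a scoring formula, but a simple linear decay over the time window is one plausible reading of "closer means more relevant." This sketch reuses the ExtractedEntity records from above; temporal_score and rank_entities are hypothetical names:

```python
def temporal_score(entity_time: float, query_time: float,
                   half_window: float = 60.0) -> float:
    """Score an entity by proximity of its timestamp to the query time:
    1.0 at an exact match, decaying linearly to 0.0 at the window edge
    (one minute either side, matching the two-minute range above)."""
    distance = abs(entity_time - query_time)
    return max(0.0, 1.0 - distance / half_window)

def rank_entities(entities, query_time):
    # Sort candidates so the entity nearest the query time comes first,
    # mirroring the idea that timestamp proximity implies scene relevance.
    return sorted(entities,
                  key=lambda e: temporal_score(e.timestamp, query_time),
                  reverse=True)

print([e.name for e in rank_entities(catalog, 900.0)])
# -> ['Johnny Depp', 'Aston Martin DB5', 'Opening narrator']
```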

The patent describes the features that Google can use to identify entities related to the video. These can be:

File name
Comments and remarks from users
Video metadata

Comments and remarks from users are recorded on a social data server. A sketch of how these signals could be pooled appears below.
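One hedged way to picture the pooling is a simple vote count: every signal that mentions a known entity name counts toward that entity, so entities supported by more signals rank higher. The function, the vote-counting approach, and the sample data are assumptions for illustration only:

```python
import collections

def candidate_entities(file_name: str,
                       metadata: dict,
                       user_comments: list[str],
                       known_entities: set[str]) -> collections.Counter:
    """Count how often each known entity name is mentioned across the
    signals the patent lists: file name, user comments, and metadata."""
    counts = collections.Counter()
    texts = [file_name, *user_comments, *metadata.values()]
    for raw in texts:
        # Normalize underscores so file names like "johnny_depp" match.
        text = str(raw).lower().replace("_", " ")
        for entity in known_entities:
            if entity.lower() in text:
                counts[entity] += 1
    return counts

votes = candidate_entities(
    file_name="johnny_depp_interview_2024.mp4",
    metadata={"title": "Johnny Depp on his new film", "channel": "MovieTalk"},
    user_comments=["Depp looks great here", "What car is that at 15:00?"],
    known_entities={"Johnny Depp"},
)
print(votes)  # Counter({'Johnny Depp': 2}) — matched in file name and metadata
```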

What is special about the methods presented is that a user does not have to specify the context in a search query for an adequate answer to be given. This means that they can refer to a video scene without context and simply ask "who is that person?" or "what kind of car is that?" without having to refer directly to the video they are currently watching.

In some scenarios, the query may not include any context. For example, when viewing the video a user may want to know a name of a person appearing in the video when the person appears in the video. In this example, user 176 may provide the query “Who is this person?” The query may be provided by user 176 when the person appears in the video. In another example, the user may want to identify a model of a car appearing in the video. In this example, user 176 may provide the query “Which car is this?” or “Which car?” The queries may also include queries such as “What is the price of this car?” “Where is this car made?” “Who is the person driving this car?” “Show me other videos that have this car,” etc. In these queries, the user has not provided context, such as the name of the video, characteristics of the car or person, or any other context.
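A toy sketch of how such a context-free query might be resolved, assuming the query wording hints at an entity type and reusing the catalog from the first sketch; the phrase table and function name are invented for illustration:

```python
# Map the wording of a context-free query to the entity type it asks about.
QUERY_TYPE_HINTS = {
    "person": ("who is this person", "who is that"),
    "car": ("which car", "what kind of car", "what car"),
}

def resolve_contextless_query(query: str, entities, query_time: float):
    """Pick the entity type implied by the query wording, then return the
    candidate of that type closest to the moment the query was issued."""
    q = query.lower()
    wanted = next((etype for etype, phrases in QUERY_TYPE_HINTS.items()
                   if any(p in q for p in phrases)), None)
    candidates = [e for e in entities if e.entity_type == wanted]
    if not candidates:
        return None
    return min(candidates, key=lambda e: abs(e.timestamp - query_time))

best = resolve_contextless_query("Who is this person?", catalog, 900.0)
print(best.name if best else "no match")  # -> Johnny Depp
```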

The question asked can be supplemented with other, similar questions about the respective entity, or the search query can be rewritten. Evaluation criteria for follow-up questions and/or rewriting can include co-occurrences, the entity type assigned to an entity, the number of search results, or previously submitted search queries.

This also makes it possible to answer questions in which the entity is not named, such as "What other films does he appear in?" Depending on the entity identified in the video scene, the rewritten query could then be "What other films does Johnny Depp appear in?"
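As a toy illustration of that substitution step (the patent leaves the mechanics open), a naive pronoun replacement might look like this:

```python
import re

def rewrite_query(query: str, entity_name: str) -> str:
    """Replace a context-free pronoun with the name of the entity
    identified in the current video scene."""
    return re.sub(r"\b(he|she|they|this person|that person)\b",
                  entity_name, query, count=1, flags=re.IGNORECASE)

print(rewrite_query("What other films does he appear in?", "Johnny Depp"))
# -> "What other films does Johnny Depp appear in?"
```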