A system and method for performing multilingual multimodal summarization for multimodal input
