Expert Systems With Applications 123 (2019) 246-255

Contents lists available at ScienceDirect

Expert Systems With Applications

ELSEVIER

journal homepage: www.elsevier.com/locate/eswa

Review

Identifying turning points in animated cartoons


Chang Liu*, Mark Last, Armin Shmilovici

Department of Software and Information Systems Engineering, Ben-Gurion University, Beer Sheva, Israel

ARTICLE INFO

Article history:

Received 4 October 2018

Revised 31 December 2018

Accepted 2 January 2019

Available online 15 January 2019

Keywords:

Story's turning points

Story elements detection

Story understanding

Video analytics

ABSTRACT

Detecting key story elements such as the protagonist, opponent, desire, turning points, battle, and victory is essential for various applications involving narrative works, including content retrieval and content recommendation systems. The task of automatically identifying story elements is challenging because of its complexity and subjectivity, and currently there are no available algorithms for this task. In this paper, we focus on identifying turning points in the story of a cartoon movie. The proposed methodology extends the novel two-clocks theory, originally validated on scripts of theatre plays, to video stories. The assumption behind the two-clocks theory is that the perception of time is different when some special event happens to a certain agent (e.g., time flows slower for a patient and quicker for a tourist). The story timeline is monitored with two clocks: an event clock, which measures the regular time flow of the story, and a weighted clock, which measures the timing of the story events. We have conducted an experiment on 28 episodes of a cartoon series and achieved promising results: 78.6% precision for turning point identification and 100% precision for key scene detection. The proposed approach is the first step towards the development of intelligent systems for automated understanding of stories in narrative works such as cinema movies and even amateur videos uploaded to the Internet.

1. Introduction

With the widespread accessibility of the web and the development of video production technologies, people are exposed to a massive amount of video. Among these videos, many narrative works (e.g., movies, TV series, cartoons) are made of shots and scenes that present the plot of a story. Automated understanding of the stories told by such videos, through analysis of the video content and structure, can benefit multiple tasks, including video retrieval, video recommendation, and video annotation. This kind of analysis can also be used for educating students in film production or for delivering preferred video content to users. It is natural for a human to perform video story analytics by watching the videos. However, the manual approach is time-consuming and does not scale to massive amounts of online video, as it involves repetitive efforts such as assigning annotation tags to the videos (Gomez-Uribe & Hunt, 2016; Soares & Viana, 2015). Therefore, automatic computational methods are being developed for video content and story analytics. Unlike many of these methods, which concentrate on detecting detailed visual elements such as objects and actions or on describing short video clips with simple sentences, we aim at developing methods that identify the key elements of a story in a video, such as the “hero” (the protagonist), the turning points, and the roles of the characters, and that explain how these elements advance the story. This paper focuses on detecting one key element: the turning points.

* Corresponding author.

E-mail addresses: liuc@post.bgu.ac.il (C. Liu), mlast@bgu.ac.il (M. Last), armin@bgu.ac.il (A. Shmilovici)

https://doi.org/10.1016/j.eswa.2019.01.003

According to the widely used three-act “Paradigm”¹, a conceptual scheme for scriptwriting and story-writing, a good story is composed of three main acts, each of which plays a different role in the story (i.e., the set-up, the confrontation, and the resolution) (Field, 2007), as shown in Fig. 1. Within the three-act structure there are different elements, such as climaxes, the midpoint, the beginning, the inciting incident, second thoughts, obstacles, disaster, wrap-up, and the end, which together construct the main framework of a story. With the ultimate goal of understanding stories in videos, we start by identifying important story elements. In this paper, we build upon the innovative two-clocks theory of Lotker (2016), which aims at detecting one key event in a narrative work (e.g., a movie script or a theater play), to identify multiple turning points in cartoon stories. This is the first application of the two-clocks theory to video analytics, and through a series of experiments on 28 episodes of the cartoon series The Flintstones (Season 1), we demonstrate its strength in identifying turning points of cartoon stories.

¹ The most notable contribution of the leading American screenwriter Sydney Alvin Field. The three-act structure was proposed in his first book, Screenplay: The Foundations of Screenwriting (Dell Publishing, 1979), and became popular among writers and Hollywood film producers as a guideline and quality measure.


[Fig. 1. Three-Act Structure: Act One (set-up), Act Two (confrontation), Act Three (resolution).]


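In this paper, we present a prototype expert system for key scene detection in movies. The system builds on the integration of research on human time perception by psychologists (Block & Grondin, 2014) with screenwriting guidelines (Field, 2007). Based on these two sources of knowledge, we propose expert rules for detecting turning points and key scenes in movies.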

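Specifically, we extend Lotker's two-clocks theory to video analytics and present evaluation results on the first season of the cartoon series The Flintstones (28 episodes, each about 24 minutes long). Following Lotker (2016), we conducted experiments, in our case to identify multiple turning points in the story of each episode. The rest of the paper is organized as follows: Section 2 introduces two major trends in understanding video content as well as Lotker's two-clocks theory; Section 3 focuses on the differences between our method and Lotker's original method and presents the experimental design; Section 4 presents and discusses the evaluation results; and Section 5 outlines further steps toward the automated understanding of video stories.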

2. Related work

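Most work in video understanding is based on computer vision algorithms, which perform well on basic video understanding tasks such as recognizing actions in video clips (Peng & Schmid, 2016; Saha, Singh, Sapienza, Torr, & Cuzzolin, 2016; Sigurdsson, Divvala, Farhadi, & Gupta, 2016) and generating captions for videos (Kaufman, Levi, Hassner, & Wolf, 2016; Rohrbach et al., 2017; Torabi, Pal, Larochelle, & Courville, 2015; Venugopalan et al., 2015). Most of these algorithms focus on analyzing short video clips (less than 30 seconds long), which makes them well suited to extracting detailed (or low-level) information such as “drinking” or “walking”, but poor at understanding high-level events in those videos, such as “enjoying a party” or “going home”. This may be due to the choice of low-level features, such as optical flow in video frames (Varol, Laptev, & Schmid, 2017), VGG features (Sigurdsson et al., 2016), or raw RGB video frames (Venugopalan et al., 2015). Besides these low-level visual features, some works also incorporate audio features (e.g., spectrograms) for video understanding (Evangelopoulos et al., 2013; Li, Abu-El-Haija, Varadarajan, & Natsev, 2018). Such visual and audio features are insufficient for recognizing high-level activities, which require additional information such as emotions or intentions. Moreover, high-quality labeled data are lacking. Neural networks are typically trained with supervised learning on paired data (clips paired with actions, or clips paired with captions) to recognize actions and to generate captions, respectively. Understanding high-level events, however, requires complex information such as emotions (liking or disliking) or intentions (staying or leaving), which is hard to label. The large gap between state-of-the-art computer vision algorithms and story analysis seems difficult to bridge, so novel approaches to understanding video stories are needed.

As a complement to computer vision algorithms, features from multiple modalities in videos have been used for tasks such as movie summarization (Evangelopoulos et al., 2013), recommendation (Bougiatiotis & Giannakopoulos, 2018), and scene detection (Baraldi, Grana, & Cucchiara, 2017; Zhu & Liu, 2009). For textual feature extraction, typical methods such as word counts (Baraldi et al., 2017), bag of words, topic modeling (Bougiatiotis & Giannakopoulos, 2018), and textual saliency (Evangelopoulos et al., 2013) are commonly used, and the resulting features are then fused with features extracted from other modalities. Many efforts have been made to improve the quality of this process. Although the additional information improves performance, the connection between the added features and story development remains weak. In our work, we approach video story understanding from a different angle, namely by extracting knowledge that represents the story structure, which is a step toward expert systems that can understand stories as humans do.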

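According to John Truby's 22 steps of scriptwriting², a good story is driven by the interaction between the protagonist and his opponent (Truby, 2008). Whenever a story has characters, there is a social network among them. For video narrative works such as movies, the idea of a movie character network is closer to human intuition than computer-vision-based algorithms. Weng, Chu, and Wu (2007) made the first attempt to build a character network for a movie based on how often the characters appear in movie scenes, and used this network to detect the sub-stories of the movie. Later, Tran and colleagues published a series of works on constructing movie character networks based on the characters' appearance times and co-occurrences, named CoCharNet (Tran, Hwang, Lee, & Jung, 2016; Tran & Jung, 2015), and on applying CoCharNet to movie summarization (Tran, Hwang, Lee, & Jung, 2017). More recently, they addressed several story analysis problems, including finding turning points (Lee & Jung, 2018). They proposed the concept of an emotional character network, which models not only the characters' appearance times but also their emotional relationships (modeled via sentiment analysis). These relationships reflect changes in the tension of the story, which is modeled as a function of movie time, and the turning points of a movie story are identified from the derivative of the tension with respect to movie time. They also discovered communities among movie characters (a character community is a group of characters with close or similar relationships to the protagonist, such as the protagonist's friends and family) and defined a story model that uses the emotional character network to reflect the structure of movie stories.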

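As another attempt to identify key events in narrative works, Lotker (2016) proposed a theory that models a well-known psychological concept, the perception of time, as different clocks. In general, time flows differently in different situations. For example, when a person goes through a life-threatening experience such as a bungee jump, time passes much more slowly for that person than for an observer. Accordingly, two clocks are modeled: one measures fixed time (or stage time), and the other measures the time of each event. Lotker claims that key events in a narrative work (e.g., a movie) can be detected by identifying the “clock drift” between the two clocks. In Lotker (2016), the theory was shown to be effective on the scripts of three Shakespeare plays. Here, we make the first attempt to extend it to video analytics.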

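Compared with the previous attempt to identify turning points through changes in the emotional character network (Lee & Jung, 2018), our approach is much simpler and does not rely on facial recognition of the characters, which can be challenging under the lighting conditions common in many movies.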

² John Truby is an American screenwriter, director, and screenwriting teacher, best known for his 22-step theory of screenwriting.

3. Methodology

3.1 The two-clocks theory

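Definition 1. Let $D = \{(d_1, \ldots, d_n) \mid d_i \in \mathbb{Z},\ d_{i+1} - d_i = 1\}$ be the dialogue sequence of a narrative work: a finite sequence of dialogue lines, numbered by line, that starts at $d_1 = 1$ and ends at $d_n = N$, where $N$ is the total number of lines in the dialogue.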

Definition 2. Define the event clock as $C_e: [d_1, d_n] \to [e_1, e_n]$ such that $C_e(i) = e_i = d_i$; that is, the event clock is the sequence of dialogue line IDs.

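Definition 3. Define the weighted clock as $C_w: [d_1, d_n] \to [w_1, w_n]$ such that

$$C_w(i) = w_i = \sum_{d_j \in D,\; j \le i} s(d_j)$$

where $s(d_j)$ is the number of words in line $d_j$; that is, the weighted clock accumulates the word counts of all lines up to and including line $d_i$.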
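Definitions 1 to 3 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: it assumes the weighted clock is the running word count up to line $d_i$, and that words are whitespace-separated tokens (the tokenizer is not specified in the source). The function name `build_clocks` is hypothetical.

```python
from typing import List, Tuple


def build_clocks(lines: List[str]) -> Tuple[List[int], List[int]]:
    """Build the event clock C_e and the weighted clock C_w for a dialogue.

    lines: the dialogue lines d_1..d_N in story order, one string per line.
    Returns (event_clock, weighted_clock), both of length N.
    """
    # Event clock (Definition 2): C_e(i) = d_i, the 1-based line ID.
    event_clock = list(range(1, len(lines) + 1))

    # Weighted clock (Definition 3), ASSUMED cumulative: C_w(i) is the sum of
    # s(d_j) for j <= i, where s(d_j) is the word count of line d_j.
    # Whitespace splitting is an assumption; no tokenizer is given.
    weighted_clock = []
    total = 0
    for line in lines:
        total += len(line.split())
        weighted_clock.append(total)
    return event_clock, weighted_clock
```

For example, `build_clocks(["Hello there.", "Where are you going tonight?", "Out."])` returns `([1, 2, 3], [2, 7, 8])`: the event clock ticks once per line, while the weighted clock advances faster during wordy lines.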

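Lotker claims that key scenes can be detected by finding the largest gap between these two clocks, i.e., the “clock drift”. The gap function is defined as: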

$$\beta = \arg\max_{d_i} \left( NC_w - NC_e \right) \qquad (1)$$

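where $NC_w$ and $NC_e$ are the normalized weighted clock ($C_w$) and event clock ($C_e$), respectively (min-max normalization is used), and $\beta$ is the clock drift. Note that after normalization both clocks lie in the unit interval $[0, 1]$, so the gap function in fact finds the global maximum of the clock-difference function $NC_w - NC_e$. We extend the two-clocks theory to the detection of key scenes in cartoon movies; our modifications to the original method are described in Section 3.2.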
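Min-max normalization and the gap function of Eq. (1) can be sketched as follows. The names `min_max_normalize` and `clock_drift` are hypothetical, and returning the first line on ties is an arbitrary choice that the source does not specify.

```python
from typing import List, Sequence


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values linearly onto the unit interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant clock: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def clock_drift(event_clock: Sequence[float],
                weighted_clock: Sequence[float]) -> int:
    """Return the 1-based line index beta that maximizes NC_w - NC_e (Eq. 1)."""
    nc_e = min_max_normalize(event_clock)
    nc_w = min_max_normalize(weighted_clock)
    gaps = [w - e for w, e in zip(nc_w, nc_e)]
    # max() over indices keeps the first index when several gaps tie.
    best = max(range(len(gaps)), key=lambda i: gaps[i])
    return best + 1
```

For instance, with an event clock `[1, 2, 3, 4, 5]` and a weighted clock `[5, 9, 10, 11, 13]`, the normalized difference is `[0, 0.25, 0.125, 0, 0]`, so `clock_drift` returns line 2. Because both clocks are non-decreasing, min-max normalization pins them to 0 at the first line and 1 at the last, so the drift is always located somewhere in between.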

3.2 Applying the two-clocks theory to animated cartoons

Compared with Lotker (2016), our work makes the following changes:

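1. We use movie subtitles as input instead of scripts.

2. We consider both the maximum and the minimum of the difference between the two clocks, instead of only the maximum.

3. We search for multiple turning points in a movie instead of a single one.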

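Compared with a typical theater play, which has a small number of long scenes (long dialogues), a typical movie has a large number of short scenes (short dialogues), which makes the movie's story more complex and introduces more dialogue. We use movie subtitles rather than scripts as input for three reasons. First, the scripts of many movies and other videos are not publicly available, whereas subtitles are easier to find (and can even be generated automatically by speech recognition tools). Second, human viewers do not read the script while watching a movie or cartoon, and the script may not match the actual movie exactly, so subtitles are an important and reliable source of information for understanding the movie story. Third, we see our work as a first step toward understanding videos that are not scripted at all.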
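The source does not state which subtitle format is used. Assuming SubRip (`.srt`) files, where each cue consists of a counter line, a timing line such as `00:00:04,000 --> 00:00:06,000`, and one or more text lines, a minimal sketch for extracting the spoken text of each cue might look like this (`parse_srt` is a hypothetical helper; formatting tags such as `<i>` are not stripped):

```python
import re
from typing import List

# A SubRip timing line, e.g. "00:00:04,000 --> 00:00:06,000" (',' or '.').
TIMING = re.compile(
    r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")


def parse_srt(text: str) -> List[str]:
    """Return the text of each subtitle cue, in screen order."""
    cues = []
    # Normalize Windows line endings, then split cues on blank lines.
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        # Skip the cue counter; everything after the timing line is text.
        for k, line in enumerate(lines):
            if TIMING.match(line.strip()):
                body = [b.strip() for b in lines[k + 1:] if b.strip()]
                if body:
                    # Join multi-line cues into a single string.
                    cues.append(" ".join(body))
                break
    return cues
```

A cue whose text wraps over two lines (e.g. "Where are you going" / "tonight?") comes back as one string, which keeps later word counts independent of how the subtitle file wraps its text.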

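However, subtitle files usually do not indicate the speaker of each line (i.e., one cannot tell who said what by reading the subtitles alone), whereas constructing the two clocks requires distinguishing the lines spoken by different characters. To address this, the movie could be preprocessed with an automatic speaker recognition tool; in our experiments, we resolved this issue manually.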
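One plausible reading of this step, which is an assumption rather than something the source spells out, is that a dialogue line $d_i$ corresponds to one speaker turn, so consecutive subtitle cues from the same speaker are merged into a single line before the clocks are built. The sketch below assumes speaker labels are already attached to each cue (manually, as in the experiments, or by some speaker recognition tool); `merge_speaker_turns` is a hypothetical name.

```python
from typing import List, Tuple


def merge_speaker_turns(cues: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Merge consecutive subtitle cues from the same speaker into one line.

    cues: (speaker, text) pairs in screen order; speaker labels are ASSUMED
    to be supplied externally (e.g. by manual annotation).
    """
    turns: List[Tuple[str, str]] = []
    for speaker, text in cues:
        if turns and turns[-1][0] == speaker:
            # The same speaker keeps talking: extend the current dialogue line.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            # A new speaker starts a new dialogue line.
            turns.append((speaker, text))
    return turns
```

Under this reading, the texts of the merged turns would then serve as the dialogue lines from which the event and weighted clocks are computed.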

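Based on the above definitions of the two-clocks theory and the processed subtitle files, we apply the two-clocks theory to the cartoon episodes. We create the two clocks according to Definitions 2 and 3 and normalize them to the unit interval [0, 1]. Unlike Lotker (2016), we detect three clock drifts in each cartoon episode by finding three local extrema of the clock-difference function $NC_w - NC_e$. The intuition behind this choice is that most stories have more than one turning point in the plot. According to John Truby's 22 steps of scriptwriting, a good story should include three revelations and decisions that drive the whole story (Truby, 2008), and these revelations and decisions can be regarded as the turning points of the story. Furthermore, we hypothesize that not only the maxima but also the minima of the difference correspond to turning points. To test this hypothesis, we modify the gap function to

$$\beta = \arg\max_{d_i} \left| NC_w - NC_e \right|$$

in order to detect three “clock drifts” between the two clocks.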
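The excerpt breaks off before the selection procedure is fully specified, so the following is only a sketch under stated assumptions: it takes the interior local maxima and minima of $NC_w - NC_e$, ranks them by absolute drift $|NC_w - NC_e|$, keeps the top $k = 3$, and reports them in story order. Smoothing, a minimum spacing between turning points, and tie-breaking are not described in the source and are left out; `top_clock_drifts` is a hypothetical name.

```python
from typing import List, Sequence


def _normalize(values: Sequence[float]) -> List[float]:
    """Min-max normalization onto [0, 1]; a constant clock maps to zeros."""
    lo, hi = min(values), max(values)
    return [0.0] * len(values) if hi == lo else [(v - lo) / (hi - lo) for v in values]


def top_clock_drifts(event_clock: Sequence[float],
                     weighted_clock: Sequence[float], k: int = 3) -> List[int]:
    """Return up to k 1-based line indices of local extrema of NC_w - NC_e,
    chosen by largest |NC_w - NC_e| and then sorted in story order."""
    nc_e, nc_w = _normalize(event_clock), _normalize(weighted_clock)
    diff = [w - e for w, e in zip(nc_w, nc_e)]
    candidates = []
    for i in range(1, len(diff) - 1):
        # Strict on the left, non-strict on the right: a plateau yields one point.
        is_max = diff[i] > diff[i - 1] and diff[i] >= diff[i + 1]
        is_min = diff[i] < diff[i - 1] and diff[i] <= diff[i + 1]
        if is_max or is_min:
            candidates.append(i)
    # Maxima and minima compete on absolute drift (stable sort keeps ties in order).
    candidates.sort(key=lambda i: abs(diff[i]), reverse=True)
    return sorted(i + 1 for i in candidates[:k])
```

For example, with an event clock of lines 1 to 9 and a weighted clock `[1, 5, 6, 7, 8, 9, 13, 16, 17]`, the difference has local maxima at lines 2 and 8 and a local minimum at line 6, so `top_clock_drifts` returns `[2, 6, 8]`. A line where the weighted clock runs behind the event clock (a minimum) is treated as a candidate turning point, just like a line where it runs ahead.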
