3
Data-driven Suggestions for Portrait Posing Hongbo Fu Xiaoguang Han Quoc Huy Phan The School of Creative Media, City University of Hong Kong Refinement Mode Exploration Mode ... ... (a) (d) (b) (e) (c) (f) ... ... Figure 1: With respect to the current pose of a subject being photographed (a) or a set of already-captured portraits (d), our technique automatically provides a ranked list of pose suggestions, which can serve as either visual guidance (b) or stimulate creativity (e) for portrait photographers. Our tool makes even amateur photographers create aesthetically pleasing portraits with great diversity (c and f). 1 Introduction Next to lighting, posing is the most challenging aspect of por- trait photography. A commonly adopted solution is to learn by example, which is beneficial for both trained photographers and novice users, especially when subjects have no clue about how to pose themselves. A collection of portrait images by professionals (e.g., [Perkins 2009]) provides a resource for photographers seek- ing inspiration for their own work. Such handful posing references (e.g., Posing App) have also been made available to smartphone platforms, which offer the unique possibility of directly overlaying camera view with a reference pose as visual guidance. Using a collection of reference poses, however, is effective only if a desired reference pose can be quickly identified. For example, to beautify a given pose, a reference pose that is the most similar to the given pose is needed. A traditional way to search for a target pose is to let users manually browse the collection, which might be orga- nized by category for faster access. It is obvious that the scalability of such completely manual browsing solutions is poor whereas the collection often keeps being expanded continuously given the limit- less variations of pose. In fact there are literally thousands of ways to pose a subject [Smith 2004]. In this work we intend to provide more intuitive ways for photog- raphers to get desired poses from a possibly large collection of reference poses. We present data-driven suggestions (Figure 1), which can serve as guidance and stimulate creativity, correspond- ing to two main operating modes in our system: refinement mode and exploration mode, respectively. The suggestions are automat- ically generated by real-time pose retrieval, using the current pose (i.e., for refinement mode) or a set of poses in existing portraits (i.e., for exploration mode) as a query. This is enabled by adopt- ing consumption-level depth cameras (i.e., the Microsoft Kinect in our system as shown in Figure 1 (left)), which allow real-time and reasonably accurate estimation of the current pose. We will also show our preliminary results on context-based retrieval of reference poses relevant to the estimated scene semantics (see the accompa- nying video). Our work takes the first step of using consumer-level depth sensors towards more intelligent cameras for computational photography. Our preliminary experiments indicate that our tool is easy to use and useful for unskilled photographers. 2 Related Work Data-driven suggestions for creativity support have been an active research topic very recently. Focusing on open-ended and creative discovery, various creativity support tools have been introduced for example in the context of freehand drawing [Lee et al. 2011] or 3D modeling [Chaudhuri and Koltun 2010]. Such ideas have also been applied to assisting non-professionals for taking more professional photos by making optimal composition suggestions [Cheng et al. 2010; Yin et al. 2012]. Our data-driven posing guide adds a new tool into the toolbox for creativity support. The problem of pose retrieval, i.e., finding the most similar human poses to a given query pose, has been extensively studied in the fields of computer vision and graphics. The previous works can be mainly categorized into two groups: retrieving 2D poses of humans in images or videos (e.g., [Jammalamadaka et al. 2012]), and re- trieving 3D poses from human motion data (e.g., [Choi et al. 2012]). The former often needs to address the problem of pose estimation while the latter typically employs temporal pose information, which is lacking in our problem. Our work is based on the success of pose retrieval in previous works, and focuses on a completely new appli- cation and its specific user interface design. Pose matching games [Yeh et al. 2006; Projects 2013] that require a player to match a specific target pose are popular. Such games are easily enabled by consumer-level depth cameras like the Microsoft Kinect, which however mainly serve as the eyes of a computer, i.e., more like webcams instead of cameras for photographers. Kinect has also been used to suggest desired manga poses based on the player’s kinematics [Nara et al. 2013]. Finally it might be inter- esting to integrate this work with the technique of parametric body reshaping [Zhou et al. 2010] to achieve visually more pleasing re-

Data-driven Suggestions for Portrait Posingsweb.cityu.edu.hk/hongbofu/projects/PoseSuggestions_SigA13/pose...Data-driven Suggestions for Portrait Posing Hongbo Fu Xiaoguang Han Quoc

Embed Size (px)

Citation preview

Data-driven Suggestions for Portrait Posing

Hongbo Fu Xiaoguang Han Quoc Huy PhanThe School of Creative Media, City University of Hong Kong

Re�n

emen

t Mod

eEx

plor

atio

n M

ode

... ...

(a)

(d)

(b)

(e)

(c)

(f )......

Figure 1: With respect to the current pose of a subject being photographed (a) or a set of already-captured portraits (d), our techniqueautomatically provides a ranked list of pose suggestions, which can serve as either visual guidance (b) or stimulate creativity (e) for portraitphotographers. Our tool makes even amateur photographers create aesthetically pleasing portraits with great diversity (c and f).

1 Introduction

Next to lighting, posing is the most challenging aspect of por-trait photography. A commonly adopted solution is to learn byexample, which is beneficial for both trained photographers andnovice users, especially when subjects have no clue about how topose themselves. A collection of portrait images by professionals(e.g., [Perkins 2009]) provides a resource for photographers seek-ing inspiration for their own work. Such handful posing references(e.g., Posing App) have also been made available to smartphoneplatforms, which offer the unique possibility of directly overlayingcamera view with a reference pose as visual guidance.

Using a collection of reference poses, however, is effective only ifa desired reference pose can be quickly identified. For example, tobeautify a given pose, a reference pose that is the most similar to thegiven pose is needed. A traditional way to search for a target poseis to let users manually browse the collection, which might be orga-nized by category for faster access. It is obvious that the scalabilityof such completely manual browsing solutions is poor whereas thecollection often keeps being expanded continuously given the limit-less variations of pose. In fact there are literally thousands of waysto pose a subject [Smith 2004].

In this work we intend to provide more intuitive ways for photog-raphers to get desired poses from a possibly large collection ofreference poses. We present data-driven suggestions (Figure 1),which can serve as guidance and stimulate creativity, correspond-ing to two main operating modes in our system: refinement modeand exploration mode, respectively. The suggestions are automat-ically generated by real-time pose retrieval, using the current pose(i.e., for refinement mode) or a set of poses in existing portraits(i.e., for exploration mode) as a query. This is enabled by adopt-ing consumption-level depth cameras (i.e., the Microsoft Kinect inour system as shown in Figure 1 (left)), which allow real-time andreasonably accurate estimation of the current pose. We will alsoshow our preliminary results on context-based retrieval of referenceposes relevant to the estimated scene semantics (see the accompa-

nying video). Our work takes the first step of using consumer-leveldepth sensors towards more intelligent cameras for computationalphotography. Our preliminary experiments indicate that our tool iseasy to use and useful for unskilled photographers.

2 Related Work

Data-driven suggestions for creativity support have been an activeresearch topic very recently. Focusing on open-ended and creativediscovery, various creativity support tools have been introduced forexample in the context of freehand drawing [Lee et al. 2011] or 3Dmodeling [Chaudhuri and Koltun 2010]. Such ideas have also beenapplied to assisting non-professionals for taking more professionalphotos by making optimal composition suggestions [Cheng et al.2010; Yin et al. 2012]. Our data-driven posing guide adds a newtool into the toolbox for creativity support.

The problem of pose retrieval, i.e., finding the most similar humanposes to a given query pose, has been extensively studied in thefields of computer vision and graphics. The previous works can bemainly categorized into two groups: retrieving 2D poses of humansin images or videos (e.g., [Jammalamadaka et al. 2012]), and re-trieving 3D poses from human motion data (e.g., [Choi et al. 2012]).The former often needs to address the problem of pose estimationwhile the latter typically employs temporal pose information, whichis lacking in our problem. Our work is based on the success of poseretrieval in previous works, and focuses on a completely new appli-cation and its specific user interface design.

Pose matching games [Yeh et al. 2006; Projects 2013] that requirea player to match a specific target pose are popular. Such games areeasily enabled by consumer-level depth cameras like the MicrosoftKinect, which however mainly serve as the eyes of a computer, i.e.,more like webcams instead of cameras for photographers. Kinecthas also been used to suggest desired manga poses based on theplayer’s kinematics [Nara et al. 2013]. Finally it might be inter-esting to integrate this work with the technique of parametric bodyreshaping [Zhou et al. 2010] to achieve visually more pleasing re-

(a)(a)

(b) (c)

Figure 2: Main user interface. (a) Input view of the Kinect, con-taining a low-resolution RGB image overlaid with the tracked skele-ton. (b) Input view of a high-resolution camera, which is roughlyaligned with the Kinect (Figure 1 (left)). (c) Suggested poses withrespect to the current pose for pose refinement.

sults.

3 Methodology

We first introduce our system from the user’s perspective. Our dis-cussions below focus on photographing individual subjects. Posesfor men and women usually have their own characteristics. There-fore, for every new subject the photographer needs to specify thegender of the subject, unless the photographer is interested in ap-plying female (male) poses to male (female) subjects. To capturea new picture with our tool, the photographer chooses one of twomutually exclusive suggestion modes: refinement mode and explo-ration mode, which will be explained in detail below. See Figure 2for a screen shot of our main user interface in the refinement mode.

Refinement Mode. This mode allows the photographer to fine-tune the current pose of the subject with the help of one or moresuggested reference poses (Figure 1 (a-c)). It is extremely beneficialwhen either the photographer or the subject has only a rough ideaof a desired pose. To start with, the subject poses himself or herself,or is directed by the photographer to a rough pose in mind (Figure 1(a)), possibly without any reference pose as a visual guide. Next thephotographer may manually click the “search” button to get a set ofsuggested poses with respect to the current pose, or wait for oursystem to automatically return the relevant suggestions (Figure 1(b)). The suggested poses are similar to the current pose but aregenerally aesthetically more pleasing.

Exploration Mode. It can be difficult to remain creative, espe-cially for unskilled photographers. This mode serves as a creativitystimulation tool and is designed for photographers who are bored atthe existing poses but are running out of pose ideas. To inspire thephotographer, our tool provides him or her with a set of new poses(Figure 1 (e)) that are very dissimilar to the already taken poses(Figure 1 (d)). The current pose of the subject is less important inthis mode (cf. the refinement mode). Before the photographer trig-gers the search process, he or she has to specify a directory wherethe existing portraits taken by our tool are stored. Otherwise, theposes already taken in the current session, each of which has itsassociated skeleton from the Kinect, is used as a query to find inter-esting poses. The newly taken pictures will be automatically addedas part of a query for the next suggestion search (see the accompa-nying video).

No matter which mode is adopted, the photographer will be givena small set of suggestions, which may be either ignored or usedby the photographer. To use the suggested poses, the photographer

Inpu

tR

efin

edR

efer

ence

(a) (b) (c) (d) (e)

Figure 3: Representative results produced in the refinement mode.Our system allows photographers to easily refine the poses of sub-jects being photographed using the automatically suggestions asreferences. See the supplemental materials for more results.

picks one reference pose appealing to him or her the most. Next thephotographer uses the reference pose to guide the subject towardsto a desired pose, which is usually a variant of the reference pose.Finally the photographer hits the software shutter button to take oneshot. Please refer to the accompanying video for a live demo.

Context Option. Our system also supports a context-based searchoption, which can be used together with either the refinement orexploration model. Once the context option is enabled, our systemreturns only the suggestions that match the current context. Forexample, if a wall context is detected, only poses interacting with awall will be suggested.

Implementation. Now we give an overview of our system froma technical point of view. We formulate our problem as a pose-retrieval problem. We manually maintain a large collection of pro-fessional portraits and represent the underlying pose in each por-trait by a standard skeleton which is compatible with that used bythe Kinect. Inspired by the work of [Jammalamadaka et al. 2012],we take an angle-based pose representation, which associates eachbone with a pair of in-plane angle and out-of-plane angles, resultingin a location- and size-invariant representation. In the refinementmode, we take the input pose as a query and retrieve the top-Nmatched poses from the dataset. The returned list of suggestionsmight be very similar to each other. Our system allows the photog-rapher to control the degree of diversity of suggestions by adopt-ing the Maximal Marginal Relevance criterion, which is commonlyused for a similar purpose in information retrieval [Chaudhuri andKoltun 2010]. In the exploration mode, we derive a formulationwhich prefers new poses that are very dissimilar to any of the exist-ing poses and meanwhile can be achieved by modifying the currentpose with ease. Please refer to the supplemental material for thetechnical details on individual components in our system, includ-ing the formulation of distance metric for pose similarity and theimplementation details for context-based pose retrieval.

4 Social Impact

Subjects in photography often heavily rely on the experience of thephotographer to guide them to flattering poses. Even experienced

(a) Refinement Mode (b) Exploration Mode

Figure 4: The exploration mode allows photographers to easily explore new poses (b) that are very different from the existing poses (a).The portraits in (b) were created one by one (from left to right) with the newly created ones being added into the set of “existing” poses forderiving new suggestions. See the supplemental materials for the complete set of results.

models still need the photographer to fine-tune their poses. How-ever, posing might take a photographer a few years of trial and errorto master, given numerous high-level guidelines to follow and prac-tice. Yet most people who take pictures everyday often have norelevant training and thus often face at least two challenges: howto turn an awkward arrangement of body parts to a graceful one,and how to avoid getting stuck in creativity. We believe that theproposed system benefits not only professional photographers butalso everyday users, who are interested in improving their portraitposing skills.

We have done a pilot study, for which we recruited 10 participants,3 men and 7 women. None of them had received any professionaltraining in photography and thus knew little about how to posethemselves gracefully. The participants came in pairs and each ofthem served as a photographer and a model in turn, i.e., alternat-ing the roles of model and photographer. We thus had 10 unskilledphotographers in total. To examine the refinement mode, the pho-tographer was asked to guide the paired model to achieve 5 differ-ent poses in the photographer’s mind. To test the exploration mode,the photographer was required to capture another 5 poses in mind,which this time should be as diverse as possible and as dissimilaras possible to the previous 5 poses. Through a questionnaire-basedsurvey at the end of the study, the participants confirmed that thesuggested poses were useful (average score of 4.2 in a range where1 is “strongly disagree” and 5 is “strongly agree”) and portrait pos-ing with suggestions is easy to use (average score of 3.9). Figures 3and 4 show the representative results in the refinement and explo-ration modes, respectively. Please refer to the supplemental materi-als for more results.

5 Future Work

First, we plan to conduct a user study to quantitatively evaluatethe effectiveness of our tool by comparing the portraits taken withand without data-driven suggestions. Second, our current systemfocuses on photographing individual models. The proposed ideascould be easily extended to group portrait photography, where ex-tra information such as the number of subjects and their relativeheight could be potentially incorporated for offering more construc-tive suggestions. Third, the recent technology from Extreme Realityenables real-time, software-only full body 3D motion capture, us-ing a single standard 2D camera. It would be interesting to adoptsuch technologies and port our system to the popular smartphoneplatforms, making data-driven posing suggestions available to ev-

eryday users. Lastly, this work focuses on searching for desiredposes or human positions given a specific point of view. Anotherclosely related problem is the selection of the best view given aspecific 3D arrangement of body parts and would be interesting toexplore in the future.

Acknowledgements

We thank the reviewers for their constructive comments, the userstudy participants (namely, Victor Dibia, Audrey Samson, GuoleiZhang, Pengfei Xu, Qiaochu Mei, Qingkun Su, Xiaoyan Shen,Wanqi Li, Xiaozhu Zhang, Jiayue Yu, Xiaoyong Shen, Siyu Li) fortheir time, and Michael Brown for video narration. We are grate-ful to Flickr for allowing us to download so many images. Thiswork was substantially supported by the Seed Grant from the CityUniversity of Hong Kong (No. 7003058).

References

CHAUDHURI, S., AND KOLTUN, V. 2010. Data-driven suggestions for creativitysupport in 3d modeling. ACM Trans. Graph. 29, 6, 183:1–183:10.

CHENG, B., NI, B., YAN, S., AND TIAN, Q. 2010. Learning to photograph. In MM’10, ACM, 291–300.

CHOI, M. G., YANG, K., IGARASHI, T., MITANI, J., AND LEE, J. 2012. Retrievaland visualization of human motion data via stick figures. Computer Graphics Fo-rum 21, 7, 2057–2065.

JAMMALAMADAKA, N., ZISSERMAN, A., EICHNER, M., FERRARI, V., AND JAWA-HAR, C. 2012. Video retrieval by mimicking poses. In ICMR, ACM, 34.

LEE, Y. J., ZITNICK, C. L., AND COHEN, M. F. 2011. Shadowdraw: real-time userguidance for freehand drawing. ACM Trans. Graph. 30, 27:1–27:10.

NARA, Y., FUJIMURA, W., KOIDE, Y., KUNITOMI, G., AND SHIRAI, A. 2013.Kinemotion: context controllable emotional motion analysis method for interactivecartoon generator. In ACM SIGGRAPH 2013 Posters.

PERKINS, M. 2009. 500 Poses for Photographing Women: A Visual Sourcebook forPortrait Photographers. Amherst Media, Incorporated.

PROJECTS, L., 2013. Sculpture lens - strike a pose. Cleveland Museum of Art.SMITH, J. 2004. Posing for Portrait Photography: A Head-to-Toe Guide. Amherst

Media, Incorporated.YEH, J.-S., WEN, C.-L., CHIANG, J.-Y., WANG, L.-K., HUAN, T.-H., CHEN,

D.-Y., LIN, L.-F., CHUANG, Y.-Y., YU, M.-Y., CHEN, B.-Y., ET AL. 2006.Bodyrevolution: Hand-shadow illusions and 3d DDR based on efficient model re-trieval. In ACM SIGGRAPH 2006 Emerging technologies, 13.

YIN, W., MEI, T., AND CHEN, C. W. 2012. Crowdsourced learning to photographvia mobile devices. In ICME ’12, 812–817.

ZHOU, S., FU, H., LIU, L., COHEN-OR, D., AND HAN, X. 2010. Parametricreshaping of human bodies in images. ACM Transactions on Computer Graphics29, 4, Article No. 126.