

Data-driven Visual Similarity for Cross-domain Image Matching
A. Shrivastava et al.
ACM Transactions on Graphics (TOG), Proceedings of ACM SIGGRAPH Asia 2011, Volume 30, Issue 6, December 2011
Within the text retrieval community, the tf-idf normalization [Baeza-Yates and Ribeiro-Neto 1999] used in bag-of-words approaches shares the same goals as our work: trying to re-weight the different features (words in text, or "visual words" in images [Sivic and Zisserman 2003]) based on their relative frequency. The main difference is that in tf-idf each word is re-weighted independently of all the others, whereas our method takes the interactions between all of the features into account.

Most closely related to ours are approaches that try to learn the statistical structure of natural images from large unlabeled image sets, as a way to define a better visual similarity. In the context of image retrieval, Hoiem et al. [2004] estimate the unconditional probability density of images off-line and use it in a Bayesian framework to find close matches; Tieu and Viola [2004] use boosting at query time to discriminatively learn query-specific features. However, these systems require multiple positive query images and/or user guidance, whereas most visual matching tasks that we are interested in need to work automatically and with only a single input image. Fortunately, recent work in visual recognition has shown that it is possible to train a discriminative classifier using a single positive instance and a large body of negatives [Wolf et al. 2009; Malisiewicz et al. 2011], provided that the negatives do not contain any images similar to the positive instance. In this work, we adapt this idea to image retrieval, where one cannot guarantee that the "negative set" will not contain images similar to the query (on the contrary, it most probably will!).
What we show is that, surprisingly, this assumption can be relaxed without adversely impacting performance.

2 Approach

The problem considered in this paper is the following: how to compute a visual similarity between images that is more consistent with human expectations. One way to attack this is by designing a new, more powerful image representation. However, we believe that existing representations are already sufficiently powerful, and that the main difficulty is in developing the right similarity/distance function, one which can "pick" which parts of the representation are most important for matching. In our view, there are two requirements for a good visual similarity function: 1) It has to focus on the content of the image (the "what"), rather than the style (the "how"); e.g., the images in Figure 1 should exhibit high visual similarity despite large pixel-wise differences. 2) It should be scene-dependent; that is, each image should have its own unique similarity function that depends on its global content. This is important since the same local feature can represent vastly different visual content, depending on what else is depicted in the image.

2.1 Data-driven Uniqueness

The visual similarity function that we propose is based on the idea of "data-driven uniqueness". We hypothesize that what humans find important or salient about an image is somehow related to how unusual or unique it is. If we could re-weight the different elements of an image based on how unique they are, the resulting similarity function would, we argue, answer the requirements of the previous section. However, estimating the uniqueness of a visual signal is not at all an easy task, since it requires a very detailed model of our entire visual world; only then can we know if something is truly unique. Therefore, we instead propose to compute uniqueness in a data-driven way, against a very large dataset of randomly selected images.

The basic idea behind our approach is that the features of an image that exhibit high "uniqueness" will also be the features that would best discriminate this image (the positive sample) against the rest of the data (the negative samples). That is, we are able to map the highly complex question of visual similarity into a fairly standard problem in discriminative learning. Given some suitable way of representing an image as a vector of features, the result of the discriminative learning is a set of weights on these features that provide for the best discrimination. We can then use these same weights to compute visual similarity. Given the learned, query-dependent weight vector w_q, the visual similarity between a query image I_q and any other image/sub-image I_i can be defined simply as:

    S(I_q, I_i) = w_q^T x_i    (1)

where x_i is I_i's extracted feature vector.
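To make the scoring step of Equation (1) concrete, here is a minimal NumPy sketch (an illustration, not the authors' implementation); it assumes the query-specific weights w_q have already been learned and that every database image or sub-image has been converted to a fixed-length feature vector:

```python
import numpy as np

def visual_similarity(w_q, x_i):
    """Equation (1): score an image/sub-image with features x_i
    under the query's learned weight vector w_q."""
    return float(np.dot(w_q, x_i))

def rank_database(w_q, database_features):
    """Score every candidate (rows of database_features) and return
    indices ordered from most to least visually similar."""
    scores = database_features @ w_q      # one dot product per candidate
    return np.argsort(-scores), scores
```

Because matching reduces to a single dot product per candidate window, the learned weights can be applied to a very large number of sub-images cheaply; the expensive part is learning w_q itself (Section 2.2).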
To learn the feature weight vector which best discriminates an image from a large "background" dataset, we employ the linear Support Vector Machine (SVM) framework. We set up the learning problem following [Malisiewicz et al. 2011], which has demonstrated that a linear SVM can generalize even with a single positive example, provided that a very large amount of negative data is available to "constrain" the solution. However, whereas in [Malisiewicz et al. 2011] the negatives are guaranteed not to be members of the positive class (that is why they are called negatives), here this is not the case. The "negatives" are just a dataset of images randomly sampled from a large Flickr collection, and there is no guarantee that some of them might not be very similar to the "positive" query image. Interestingly, in practice this does not seem to hurt the SVM, suggesting that this is yet another new application where the SVM formalism can be successfully applied (a sketch of this setup appears at the end of this section).

The procedure described above should work with any sufficiently powerful image feature representation. For the majority of our experiments in this paper, we have picked the Histogram of Oriented Gradients (HOG) template descriptor [Dalal and Triggs 2005], due to its good performance for a variety of tasks, its speed, robustness, adaptability to sliding-window search, and popularity in the community. We also show how our learning framework can be used with the Dense-SIFT (D-SIFT) template descriptor in Section 2.4.

To visualize how the SVM captures the notion of data-driven uniqueness, we performed a series of experiments with simple, synthetic data. In the first experiment, we use simple synthetic figures (a combination of circles and rectangles) as visual structures on the query image side. Our negative world consists of just rectangles of multiple sizes and aspect ratios. If everything works right, using the SVM-learned weights should downplay the features (gradients in the HOG representation) generated from the rectangle and increase the weights of features generated by the circle, since they are more unique. We use the HOG visualization introduced by [Dalal and Triggs 2005], which displays the learned weight vector as a gradient distribution image. As Figure 4(a) shows, our approach indeed suppresses the gradients generated by the rectangle.

One of the key requirements of our approach is that it should be able to extract visually important regions even when the images are from different visual domains. We consider this case in our next experiment, shown in Figure 4(b). Here the set of negatives includes two domains: black-on-white rectangles and white-on-black rectangles. By having the negative set include both domains, our approach should downplay any domain-dependent idiosyncrasies from the point of view of both the query and target domains. Indeed, as Figure 4(b) shows, our approach is again able to extract the unique structures corresponding to circles while downplaying the gradients generated due to rectangles, in a domain-independent way. We can also observe this effect on real images. The Venice bridge painting shown in Figure 5 initially has high gradients for building boundaries, the bridge and the boats. However, since similar building boundaries are quite common, they occur a lot in the randomly sampled negative images and hence their weights are reduced.

Figure 4: Synthetic example of learning data-driven "uniqueness". In each case, our learned similarity measure boosts the gradients belonging to the circle because they are more unique with respect to a synthetic world of rectangle images.

Figure 5: Learning data-driven uniqueness: our approach down-weighs the gradients on the buildings since they are not as rare as the circular gradients from the bridge.
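As a rough illustration of this learning setup (the paper itself uses LIBSVM with hard-negative mining, described in Section 2.2), the query-specific weights could be obtained from any linear SVM solver. The sketch below uses scikit-learn's LinearSVC, with a class weight standing in for the asymmetry between the few positives (the query plus its perturbed copies) and the many negatives; the C = 1/lambda mapping is only approximate:

```python
import numpy as np
from sklearn.svm import LinearSVC

def learn_query_weights(positives, negatives, reg_lambda=100.0):
    """Learn w_q by discriminating the query's feature vector(s)
    (rows of `positives`) against a large matrix of negatives."""
    X = np.vstack([positives, negatives])
    y = np.array([1] * len(positives) + [-1] * len(negatives))
    svm = LinearSVC(C=1.0 / reg_lambda,   # rough stand-in for lambda = 100
                    loss="hinge", dual=True,
                    class_weight={1: len(negatives) / len(positives),
                                  -1: 1.0})
    svm.fit(X, y)
    return svm.coef_.ravel()              # this is w_q in Equation (1)
```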
2.2 Algorithm Description

We set up the learning problem using a single positive and a very large negative set of samples, similar to [Malisiewicz et al. 2011]. Each query image (I_q) is represented with a rigid grid-like HOG feature template (x_q). We perform binning with sizing heuristics which attempt to limit the dimensionality of x_q to roughly 1-5K, which amounts to 150 cells for the HOG template. To add robustness to small errors due to image misalignment, we create a set of extra positive data-points, P, by applying small transformations (shift, scale and aspect ratio) to the query image I_q, and generating x for each sample. Therefore, the SVM classifier is learned using I_q and P as positive samples, and a set N containing millions of sub-images (extracted from 10,000 randomly selected Flickr images) as negatives. Learning the weight vector w_q amounts to minimizing the following convex objective function:

    w_q = argmin_w  Σ_{x ∈ P ∪ {x_q}} h(w^T x) + Σ_{x ∈ N} h(−w^T x) + λ ||w||²    (2)

We use LIBSVM [Chang and Lin 2011] for learning w_q, with a common regularization parameter λ = 100 and the standard hinge loss function h(a) = max(0, 1−a). The hinge loss allows us to use the hard-negative mining approach [Dalal and Triggs 2005] to cope with millions of negative windows, because the solution only depends on a small set of negative support vectors. In hard-negative mining, one first trains an initial classifier using a small set of training examples, and then uses the trained classifier to search the full training set exhaustively for false positives (hard examples). Once a sufficient number of hard negatives are found in the training set, one can alternate between learning w_q given a current set of hard-negative examples, and mining additional negative examples using the current w_q, as in [Dalal and Triggs 2005]. For all experiments in this paper, we use 10 iterations of the hard-mining procedure, with each iteration requiring more time than the previous one because it becomes harder to find hard negatives as the classifier improves. Empirically, we found that more than 10 iterations did not provide enough improvement to justify the run-time cost.
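A schematic of the hard-negative mining loop might look as follows; `train_svm` (solving Equation (2) on the current cache) and `score` (applying the current weights to every negative window) are hypothetical helpers standing in for the LIBSVM calls:

```python
import numpy as np

def mine_hard_negatives(positives, negative_pool, train_svm, score,
                        n_iters=10, cache_size=5000):
    """Alternate between training on a small negative cache and scanning
    the full pool for windows inside the hinge margin (hard negatives),
    as in [Dalal and Triggs 2005]; the paper runs 10 iterations."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(negative_pool), size=cache_size, replace=False)
    cache = negative_pool[idx]
    w = train_svm(positives, cache)
    for _ in range(n_iters - 1):
        scores = score(w, negative_pool)
        hard = negative_pool[scores > -1.0]   # violating or inside the margin
        if len(hard) == 0:                    # everything correctly separated
            break
        cache = np.vstack([cache, hard])
        w = train_svm(positives, cache)       # retrain on the enlarged cache
    return w
```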
The standard sliding-window setup [Dalal and Triggs 2005] is used to evaluate all the sub-windows of each image. For this, the trained classifier is convolved with the HOG feature pyramid at multiple scales for each image in the database. The number of pyramid levels controls the size of possible detected windows in the image. We use simple non-maxima suppression to remove highly-overlapping redundant matches. While the use of sub-window search is expensive, we argue that it is crucial to good image matching, for the following reasons. First, it allows us to see millions of negative examples during training from a relatively small number of images (10,000). But more importantly, as argued by [Hoiem et al. 2004], sub-window search is likely to dramatically increase the number of good matches over traditional full-image retrieval techniques.

2.3 Relationship to Saliency

We found that our notion of data-driven uniqueness works surprisingly well as a proxy for predicting saliency (where people look), a topic of considerable interest to computer graphics. We ran our algorithm on the human gaze dataset from [Judd et al. 2009], using a naive mapping from learned HOG weights to predicted pixel saliency: spatially summing these weights, followed by normalization (see the sketch below). Figure 6 compares our saliency prediction against standard saliency methods (summarized in [Judd et al. 2009]). While our score of 74% (mean area under ROC curve) is below [Judd et al. 2009], who are the top performers at 78% (without center prior), we beat most classic saliency methods, such as Itti et al. [2000], who only obtained 62%. After incorporating a simple Gaussian center prior, our score rises to 81.9%, which is very close to the 83.8% of [Judd et al. 2009].

Figure 6: The concept of data-driven uniqueness can also be used as a proxy to predict saliency for an image. Our approach performs better than individual features (such as Itti et al. and Torralba/Rosenholtz; see [Judd et al. 2009]) and is comparable to [Judd et al. 2009]. (Mean area under ROC curve: [Judd et al. 2009] 0.838; our approach with center prior 0.8185; center prior alone 0.797; [Judd et al. 2009] without center prior 0.78; our approach 0.7304; [Itti and Koch 2000] 0.62; chance 0.5.)
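The naive weights-to-saliency mapping mentioned above can be sketched as follows, assuming the learned weights have been reshaped back onto the HOG grid (rows x cols x orientation bins); per-cell weights are summed over orientations, upsampled to pixel resolution, normalized, and optionally combined with a Gaussian center prior (the 0.3 bandwidth here is an arbitrary choice for illustration):

```python
import numpy as np

def saliency_from_hog_weights(w_grid, image_shape, center_prior=False):
    """Naive saliency proxy: spatially sum the positive learned HOG
    weights, upsample to image size, and normalize to [0, 1]."""
    cell_map = np.maximum(w_grid, 0.0).sum(axis=2)    # sum over orientations
    h, w = image_shape
    rows = np.linspace(0, cell_map.shape[0] - 1, h).astype(int)
    cols = np.linspace(0, cell_map.shape[1] - 1, w).astype(int)
    sal = cell_map[np.ix_(rows, cols)]                # nearest-neighbor upsample
    if center_prior:                                  # simple Gaussian center prior
        ys, xs = np.mgrid[0:h, 0:w]
        sal = sal * np.exp(-(((ys - h / 2) / (0.3 * h)) ** 2
                             + ((xs - w / 2) / (0.3 * w)) ** 2))
    return sal / (sal.max() + 1e-8)
```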
2.4 Other Features

Our framework should be able to work with any rigid grid-like image representation where the template captures the feature distribution in some form of histogram of high-enough dimensionality. We performed preliminary experiments using the dense SIFT (D-SIFT) template descriptor (similar to [Lazebnik et al. 2009]) within our framework for the task of Sketch-to-Image Matching (Section 3.2). The query sketch (I_q) was represented with a feature template (x_q) of D-SIFT, and the sizing heuristics (Section 2.2) produced 35 cells for the template (128 dimensions per cell). Figure 10 demonstrates the results of these preliminary experiments, where our learning framework improves the performance of the D-SIFT baseline (without learning), indicating that our algorithm can be adapted to a different feature representation.

3 Experimental Validation

To demonstrate our approach, we performed a number of image matching experiments on different image datasets, comparing against the following popular baseline methods.

Tiny Images: Following [Torralba et al. 2008], we re-size all images to 32x32, stack them into 3072-D vectors, and compare them using Euclidean distance.

GIST: We represent images with the GIST [Oliva and Torralba 2006] descriptor, and compare them with the Euclidean distance.

BoW: We compute a Bag-of-Words representation for each image using vector-quantized SIFT descriptors [Lowe 2004] and compare the visual word histograms (with tf-idf normalization) following [Sivic and Zisserman 2003].

Spatial Pyramid: For each image, we compute the spatial pyramid [Lazebnik et al. 2009] representation with 3 pyramid levels, using Dense-SIFT descriptors of 16x16 pixel patches computed over a grid with a spacing of 8 pixels. We used a vocabulary of 200 visual words. The descriptors are compared using the histogram intersection pyramid matching kernels described in [Lazebnik et al. 2009].

Normalized-HOG (N-HOG): We represent each image using the same HOG descriptor as our approach, but instead of learning a query-specific weight vector, we match images directly in a nearest-neighbor fashion. We experimented with different similarity metrics and found a simple normalized HOG (N-HOG) to give the best performance. The N-HOG weight vector is defined as a zero-centered version of the query's HOG features, w_q = x_q − mean(x_q). Matching is performed using Equation 1, replacing the learned weight vector with the N-HOG weight vector (see the sketch below).
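The N-HOG baseline is simple enough to state directly in code; a sketch, reusing the Equation (1) scoring from above:

```python
import numpy as np

def n_hog_weights(x_q):
    """N-HOG baseline: a zero-centered copy of the query's HOG features,
    used in place of the learned w_q in Equation (1)."""
    return x_q - x_q.mean()

def n_hog_rank(x_q, database_features):
    """Nearest-neighbor style matching with the N-HOG weight vector."""
    scores = database_features @ n_hog_weights(x_q)
    return np.argsort(-scores)
```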
In addition, we also compare our algorithm to Google's recently released Search-by-Image feature. It should be noted that the retrieval dataset used by Google is orders of magnitude larger than the tens of thousands of images typically used in our datasets, so this comparison is not quite fair. But while Google's algorithm shows reasonable performance in retrieving landmark images with similar illumination, season and viewpoint, it does not seem to adapt well to photos taken under different lighting conditions or photos from different visual domains, such as sketches and paintings (see Figure 9).

3.1 Image-to-Image Matching

While image retrieval is not the goal of this paper, the CBIR community has produced a lot of good datasets that we can use for evaluation. Here we consider the instance retrieval setting, using the INRIA Holidays dataset introduced by Jegou et al. [2008] and one million random distractor Flickr images from [Hays and Efros 2007] to evaluate performance. The goal is to measure the quality of the top matching images when the exact instances are present in the retrieval dataset. For evaluation, we follow [Jegou et al. 2008] and measure the quality of rankings as the true positive rate from the list of top k = 100 matches as a function of increasing dataset size (see the sketch at the end of this section). Since the average number of true positives is very small for the Holidays dataset, we also perform the evaluation with smaller k. We compare our approach against the GIST, Tiny Images and Spatial Pyramid baselines described in Section 3, on 50 random Holidays query images, and evaluate the top 5 and 100 matches for the same dataset sizes used in [Jegou et al. 2008].

Table 1: Instance retrieval in the Holidays dataset + Flickr1M. We report the mean true positive rate from the top-k image matches as a function of increasing dataset size (averaged across a set of 50 Holidays query images).

Top-5
Dataset size      1,490     11,490    101,490   1,001,490
GIST              0.0106    0.0106    0.0106    0.0106
Tiny Images       0.0106    0.0106    0.0106    0.0106
Spatial Pyramid   0.3417    0.3063    0.24      0.1967
Our Approach      0.6588    0.6393    0.5890    0.5836

Top-100
Dataset size      1,490     11,490    101,490   1,001,490
GIST              0.1921    0.1417    0.1417    0.1417
Tiny Images       0.0713    0.0518    0.0518    0.0518
Spatial Pyramid   0.4888    0.415     0.3448    0.2792
Our Approach      0.6874    0.6874    0.6619    0.6150

Table 1 demonstrates the robustness of our algorithm to added distractor images: the true positive rate only drops from 69% to 62% when we add 1M distractors (which is of similar order as in [Jegou et al. 2008]), outperforming the state-of-the-art spatial pyramid matching [Lazebnik et al. 2009]. It is important to note that even after drastically reducing the ranks under consideration from the top 100 to just the top 5, our rate of true positives drops by only 3% (which attests to the quality of our rankings). For a dataset of one million images and a short-list of 100, [Jegou et al. 2008] return 62% true positives, which is only slightly better than our results; however, their algorithm is designed for instance recognition, whereas our approach is applicable to a broad range of cross-domain visual tasks.

3.2 Sketch-to-Image Matching

Matching sketches to images is a difficult cross-domain visual similarity task. While most current approaches use specialized methods tailored to sketches, here we apply exactly the same procedure as before, without any changes. We collected a dataset of 50 sketches (25 cars and 25 bicycles) to be used as queries (our dataset includes both amateur sketches from the internet as well as freehand sketches collected from non-expert users). The sketches were used to query into the PASCAL VOC dataset [Everingham et al. 2007], which is handy for evaluation since all the car and bicycle instances have been labeled. Figure 8 (top) shows some example queries and the corresponding top retrieval results for our approach and the baselines. It can be seen that our approach not only outperforms all of the baselines, but also tends to retrieve images showing the target object in a similar pose and viewpoint as the query sketch.

For quantitative evaluation, we compared how many car and bicycle images were retrieved in the top-K matches for car and bicycle sketches, respectively. We used the bounded mean Average Precision (mAP) metric used by [Jegou et al. 2008]. (Maximum recall is bounded by the number of images being retrieved: for example, if we consider only the top-150 matches, the maximum number of true positives would be 150 images.) We evaluated the performance of our approach (using HOG and D-SIFT) as a function of dataset size and compared it with the multiple baselines, showing the robustness of our approach to the presence of distractors. For each query, we start with all images of the target class from the dataset, increase the dataset size by adding 1000 and then 5000 images, and finally use the entire PASCAL VOC 2007 dataset. Figures 10(a) and (b) show mAP as a function of dataset size for cars and bicycles, respectively. For the top 150 matches, we achieve a mAP of 67% for cars and 54% for bicycles (for Learnt-HOG). We also ran our algorithm on the Sketch-Based Image Retrieval (SBIR) Benchmark Dataset [Eitz et al. 2010]. For the top 20 similar images ranked by users, we retrieve 51% of images as top-20 matches, compared to 63% using the sketch-specific method of [Eitz et al. 2010].

Figure 10: Sketch-to-Image evaluation. We match car/bicycle sketches to sub-images in the PASCAL VOC 2007 dataset and measure performance as the number of distractors increases.

3.3 Painting-to-Image Matching

As another cross-domain image matching evaluation, we measured the performance of our system on matching paintings to images. Retrieving images similar to paintings is an extremely difficult problem because of the presence of strong local gradients due to brush strokes (even in regions such as sky). For this experiment, we collected a dataset of 50 paintings of outdoor scenes in a diverse set of painting styles and geographical locations. The retrieval set was sub-sampled from the 6.4M GPS-tagged Flickr images of [Hays and Efros 2008]. For each query, we created a set of 5,000 images randomly sampled within a 50 mile radius of each painting's location (to make sure to catch the most meaningful distractors) and 5,000 random images. Qualitative examples can be seen in Figure 7.

Figure 7: Qualitative comparison of our approach against baselines for Sketch-to-Image and Painting-to-Image matching.

Figure 8: A few more qualitative examples of top matches for sketch and painting queries.

Figure 9: Qualitative comparison of our approach with Google's "Search-by-Image" feature. While our approach is robust to illumination changes and performs well across different visual domains, Google image search fails completely when the exact matches are not in the database.
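The ranking measure reported in Table 1 reduces to a short computation; a sketch, assuming `ranking` is the ordered list of retrieved image ids for a query and `relevant` is the set of its ground-truth instances:

```python
def top_k_true_positive_rate(ranking, relevant, k=100):
    """Fraction of a query's ground-truth instances recovered within
    the top-k matches; averaging this over queries gives the numbers
    in Table 1. Recall is bounded above by k / len(relevant)."""
    hits = sum(1 for img_id in ranking[:k] if img_id in relevant)
    return hits / len(relevant)
```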
4 Applications

Our data-driven visual similarity measure can be used to improve many existing matching-based applications, as well as facilitate new ones. We briefly discuss a few of them here.

4.1 Better Scene Matching for Scene Completion

Data-driven scene completion was introduced by [Hays and Efros 2007]. However, their scene matching approach (using the GIST descriptor) is not always able to find the best matches automatically. Their solution is to present the user with the top 20 matches and let them find the best one to be used for completion. Here we propose to use our approach to automate scene completion, removing the user from the loop. To evaluate the approach, we used the 78 query images from the scene completion test set of [Hays and Efros 2007], along with the top 160 results retrieved by them. We use our algorithm to re-rank these 160 images and evaluate both the quality of scene matches and scene completions against [Hays and Efros 2007].

Figure 11 shows a qualitative result for the top match using our approach as compared to the top match from the GIST+ features used by [Hays and Efros 2007]. To compute quantitative results, we performed two small user studies. In the first study, for each query image, participants were presented with the best scene match using our approach, [Hays and Efros 2007] and Tiny Images [Torralba et al. 2008], and were asked to select the closest scene match out of the three options. In the second study, participants were presented with automatically completed scenes using the top matches for all three algorithms and were asked to select the most convincing/compelling completion. The order of presentation of queries, as well as the order of the three options, was randomized. Overall, for the first task of scene matching, the participants preferred our approach in 51.4% of cases, as opposed to 27.6% for [Hays and Efros 2007] and 21% for Tiny Images. For the task of automatic scene completion, our approach was found to be more convincing in 47.3% of cases, as compared to 27.5% for [Hays and Efros 2007] and 25.2% for Tiny Images. The standard deviation of user responses for most of the queries was surprisingly low.

Figure 11: Qualitative examples of scene completion using our approach and [Hays and Efros 2007].

4.2 Internet Re-photography

We were inspired by the recent work on computational re-photography [Bae et al. 2010], which allows photographers to take modern photos that match a given historical photograph. However, the approach is quite time-consuming, requiring the photographer to go "on location" to rephotograph a particular scene. What if, instead of rephotographing ourselves, we could simply find the right modern photograph online? This seemed like a perfect case for our cross-domain visual matching, since old and new photographs look quite different and would not be matched well by existing approaches. We again use the 6.4M geo-tagged Flickr images of [Hays and Efros 2007], and given an old photograph as a query, we use our method to find its top matches from a pre-filtered set of 5,000 images closest to the old photograph's location (usually at least the city or region is known). Once we have an ordered set of image matches, the user can choose one of the top five matches to generate the best old/new collage. Re-photography examples can be seen in Figure 12.

Figure 12: Internet Re-photography. Given an old photograph, we harness the power of large Internet datasets to find visually similar images. For each query we show the top 4 matches; we manually select one of the top matches and create a manual image alignment.

4.3 Painting2GPS

Wouldn't it be useful if one could automatically determine from which location a particular painting was painted? Matching paintings to real photos from a large GPS-tagged collection allows us to estimate the GPS coordinates of the input painting, similar to the approach of [Hays and Efros 2008]. We call this application Painting2GPS. We use painting-to-image matching as described in Section 3.3 and then find the GPS distribution using the algorithm in [Hays and Efros 2008]. Qualitative Painting2GPS examples overlaid onto Google Maps can be seen in Figure 13.

Figure 13: Painting2GPS qualitative examples. In these two painting examples (Tower Bridge in London and the Sydney Opera House), we display the estimated GPS location of the painting as a density map overlaid onto Google Maps, and the top matching image.

4.4 Visual Scene Exploration

Having a robust visual similarity opens the door to interesting ways of exploring and reasoning about large visual data. In particular, one can construct a visual memex graph (using the terminology from [Malisiewicz and Efros 2009]), whose nodes are images/sub-images, and whose edges are various types of associations, such as visual similarity, context, etc. By visually browsing this memex graph, one can explore the dataset in a way that makes explicit the ways in which the data is interconnected. Such graph browsing visualizations have been proposed for several types of visual data, such as photos of a 3D scene [Snavely et al. 2008], large collections of outdoor scenes [Kaneva et al. 2010], and faces [Kemelmacher-Shlizerman et al. 2011]. Here we show how our visual similarity can be used to align photos of a scene and construct a movie. Given a set of 200 images automatically downloaded from Flickr using keyword search (e.g., "Medici Fountain Paris"), we compute an all-to-all matrix of visual similarities that represents our visual memex graph (see the sketch below). Note that because we are using scanning-window matching on the detection side, a zoomed-in scene detail can still match to a wide-angle shot, as seen in Figure 14 (top). Other side-information can also be added to the graph, such as the relative zoom factor, or similarity in season and illumination (computed from photo time stamps). One can now interactively browse through the graph, or create a visual memex movie showing a particular path through the data, as shown in Figure 14 (bottom) and in the supplementary video.

Figure 14: Visual scene exploration. (Top): Given an input image, we show the top matches, aligned by the retrieved sub-window. The last image shows the average of the top 20 matches. (Bottom): A visualization of the memex-graph tour through the photos of the Medici fountain.
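The all-to-all similarity matrix behind the memex graph follows directly from Equation (1); a sketch, where `learn_weights` is the hypothetical query-specific training routine of Section 2.2, and the greedy tour is one simple (illustrative, not the paper's) way to extract a path for a memex "movie". Note the matrix is not symmetric, since each image scores the others under its own learned weights:

```python
import numpy as np

def memex_graph(features, learn_weights):
    """Node i is an image; edge weight (i, j) is image j's features
    scored under image i's learned weight vector."""
    n = len(features)
    sim = np.zeros((n, n))
    for i in range(n):
        w_i = learn_weights(features[i])   # query-specific weights for node i
        sim[i] = features @ w_i            # Equation (1) against every node
    np.fill_diagonal(sim, -np.inf)         # ignore trivial self-matches
    return sim

def greedy_tour(sim, start=0, length=10):
    """Follow the strongest remaining edge at each step to produce
    a path through the graph."""
    sim = sim.copy()
    path, current = [start], start
    sim[:, start] = -np.inf                # never revisit a node
    for _ in range(length - 1):
        nxt = int(np.argmax(sim[current]))
        path.append(nxt)
        sim[:, nxt] = -np.inf
        current = nxt
    return path
```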
Other side-informalion approach, [Hays and Efros 2007] and tiny-images [Torralba et al can also be added to the graph, such as the relative zoom factor, or 2008. Participants were asked to select the closest scene match similarity in season and illumination(computed from photo time out of the three options. In the second study. participants were pre stamps). One can now interactively browse through the graph, or sented with automatically completed scenes using the top matches create a visual memex movie showing a particular path from the for all three algorithms and were asked to select the most convinc data, as shown on Figure 14(bottom ), and in supplementary video ing/compelling completion. The order of presentation of queries as 5 Limitations and Future Work well as the order of the three options were randomized. Overall, for the first task of scene matching, the participants preferred our ap- The two main failure modes of our approach are illustrated on Fig proach in 51. 4%cases as opposed 27.6%c for [Hays and Efros 2007 ure 15. In the first example dell), we fail to find a good match due lo and 21% for Tiny-Images For the task of automatic scene comple- the relatively small size of our dataset(10, 000 images)compared to tion,our approach was found to be more convincing in 47.3% cases Google's billions of indexed images. In the second example(right) as compared to 27.5% for [Hays and Efros 2007] and 25.2% for the query scene is so cluttered that it is difficult for any algorithm Tiny-Images. The standard-deviation of user responses for most of to decide which parts of the scene-the car, the people on sidewalk, the queries were surprisingly low the building in the background-it should focus on. Addressing this CM Transactions on Graphics, Vol 30, No 6, Article 154, Publication date: December 2011 154 A. Shrivastava et al Input Image Top Matches Scene Completions Hays et al Our Approach Hays et aL. Our Approach Figure 11: Qualitative examples of scene completion using our approach andlHays and Efros 2007/ Paris(1940 Top Matches Manual Alignment Boston(1900) Top Matches Manual Alignment Figure 12: Internet Re-photography. Given an old photograph, we harness the power of large Internet data sets to find visually similar images. For each query we show the top 4 matches, and manually select one of the top matches and create a manual image alignment Input Painting Estimated Geo-location Input Painting Estimated Geo-location display estimated GPS location of the painting as a density map overlaid onto Google- map, and the top matching image Opera House), we Figure 13: Painting2GPS Qualitative Examples. In these two painting examples Tower Bridge in London and the sydr Average Image of top-20 NN Figure 14: Visual Scene exploration (Top): Given an input image, we show the top matches, aligned by the retrieved sub-window. The last image shows the average of top 20 matches(Bottom): A visualization of the memex-graph tour through the photos of the Medici fountain issue will likely require deeper level of image understanding than Claudio conforti, Eddie wong, Edson Campos, Prof. Hall Groat Il, Kath is currently available leen Brodeur, Moira Munro, Matt Wyatt, Keith Hornblower. Don Ama Speed remains the central limitation of the proposed approach dio(Scrambled Eggz Productions), The Stephen Wilthisire gallery,www since it requires training an SVM(with hard-negalive mining)al Caydaypaint. ccm, The Art Renewal Center and bundesarchiv query time. 
References

BAE, S., AGARWALA, A., AND DURAND, F. 2010. Computational rephotography. ACM Trans. Graph. 29 (July), 24:1-24:15.
BAEZA-YATES, R. A., AND RIBEIRO-NETO, B. 1999. Modern Information Retrieval. Addison-Wesley Longman Publishing.
BOIMAN, O., AND IRANI, M. 2007. Detecting irregularities in images and in video. IJCV.
BUADES, A., COLL, B., AND MOREL, J.-M. 2005. A non-local algorithm for image denoising. In CVPR.
CHANG, C.-C., AND LIN, C.-J. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology.
CHEN, T., CHENG, M.-M., TAN, P., SHAMIR, A., AND HU, S.-M. 2009. Sketch2Photo: internet image montage. ACM Trans. Graph. 28.
CHONG, H., GORTLER, S., AND ZICKLER, T. 2008. A perception-based color space for illumination-invariant image processing. In Proceedings of SIGGRAPH.
DALAL, N., AND TRIGGS, B. 2005. Histograms of oriented gradients for human detection. In CVPR.
DALE, K., JOHNSON, M. K., SUNKAVALLI, K., MATUSIK, W., AND PFISTER, H. 2009. Image restoration using online photo collections. In ICCV.
DATTA, R., JOSHI, D., LI, J., AND WANG, J. Z. 2008. Image retrieval: Ideas, influences, and trends of the new age. ACM Comput. Surv.
EFROS, A. A., AND FREEMAN, W. T. 2001. Image quilting for texture synthesis and transfer. In SIGGRAPH.
EITZ, M., HILDEBRAND, K., BOUBEKEUR, T., AND ALEXA, M. 2010. Sketch-based image retrieval: benchmark and bag-of-features descriptors. IEEE TVCG.
EVERINGHAM, M., GOOL, L. V., WILLIAMS, C. K. I., WINN, J., AND ZISSERMAN, A. 2007. The PASCAL Visual Object Classes Challenge.
FREEMAN, W. T., JONES, T. R., AND PASZTOR, E. C. 2002. Example-based super-resolution. IEEE Computer Graphics and Applications.
HACOHEN, Y., FATTAL, R., AND LISCHINSKI, D. 2010. Image upsampling via texture hallucination. In ICCP.
HAYS, J., AND EFROS, A. A. 2007. Scene completion using millions of photographs. ACM Transactions on Graphics (SIGGRAPH).
HAYS, J., AND EFROS, A. A. 2008. im2gps: estimating geographic information from a single image. In CVPR.
HERTZMANN, A., JACOBS, C., OLIVER, N., CURLESS, B., AND SALESIN, D. 2001. Image analogies. In SIGGRAPH.
HOIEM, D., SUKTHANKAR, R., SCHNEIDERMAN, H., AND HUSTON, L. 2004. Object-based image retrieval using the statistical structure of images. In CVPR.
ITTI, L., AND KOCH, C. 2000. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research.
JEGOU, H., DOUZE, M., AND SCHMID, C. 2008. Hamming embedding and weak geometric consistency for large scale image search. In ECCV.
JOHNSON, M. K., DALE, K., AVIDAN, S., PFISTER, H., FREEMAN, W. T., AND MATUSIK, W. 2010. CG2Real: Improving the realism of computer generated images using a large collection of photographs. IEEE TVCG.
JUDD, T., EHINGER, K., DURAND, F., AND TORRALBA, A. 2009. Learning to predict where humans look. In ICCV.
KANEVA, B., SIVIC, J., TORRALBA, A., AVIDAN, S., AND FREEMAN, W. T. 2010. Infinite images: Creating and exploring a large photorealistic virtual space. Proceedings of the IEEE.
KEMELMACHER-SHLIZERMAN, I., SHECHTMAN, E., GARG, R., AND SEITZ, S. M. 2011. Exploring photobios. In SIGGRAPH.
LAZEBNIK, S., SCHMID, C., AND PONCE, J. 2009. Spatial pyramid matching. In Object Categorization: Computer and Human Vision Perspectives. Cambridge University Press.
LOWE, D. 2004. Distinctive image features from scale-invariant keypoints. IJCV.
MALISIEWICZ, T., AND EFROS, A. A. 2009. Beyond categories: The visual memex model for reasoning about object relationships. In NIPS.
MALISIEWICZ, T., GUPTA, A., AND EFROS, A. A. 2011. Ensemble of exemplar-SVMs for object detection and beyond. In ICCV.
OLIVA, A., AND TORRALBA, A. 2006. Building the gist of a scene: the role of global image features in recognition. Progress in Brain Research.
RUSSELL, B. C., SIVIC, J., PONCE, J., AND DESSALES, H. 2011. Automatic alignment of paintings and photographs depicting a 3D scene. In 3D Representation and Recognition (3dRR).
SCHODL, A., SZELISKI, R., SALESIN, D. H., AND ESSA, I. 2000. Video textures. In SIGGRAPH.
SHECHTMAN, E., AND IRANI, M. 2007. Matching local self-similarities across images and videos. In CVPR.
SIVIC, J., AND ZISSERMAN, A. 2003. Video Google: A text retrieval approach to object matching in videos. In ICCV.
SNAVELY, N., GARG, R., SEITZ, S. M., AND SZELISKI, R. 2008. Finding paths through the world's photos. ACM Transactions on Graphics.
TIEU, K., AND VIOLA, P. 2004. Boosting image retrieval. IJCV.
TORRALBA, A., FERGUS, R., AND FREEMAN, W. T. 2008. 80 million tiny images: a large database for non-parametric object and scene recognition. IEEE PAMI.
WEXLER, Y., SHECHTMAN, E., AND IRANI, M. 2007. Space-time completion of video. IEEE PAMI.
WHYTE, O., SIVIC, J., AND ZISSERMAN, A. 2009. Get out of my picture! Internet-based inpainting. In BMVC.
WOLF, L., HASSNER, T., AND TAIGMAN, Y. 2009. The one-shot similarity kernel. In ICCV.
