Research Reviews (15)

[Paper Review] You Only Look Once: Unified, Real-Time Object Detection Paper Details Title: You Only Look Once: Unified, Real-Time Object Detection Authors: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi Conference: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Year of Publication: 2016 Link: https://arxiv.org/abs/1506.02640 Key Focus: YOLO reframes object detection as a single regression problem, predicting bounding boxes and class probabi.. 2025. 2. 5.
[Paper Review] InstructPix2Pix: Learning to Follow Image Editing Instructions Paper Details Title: InstructPix2Pix: Learning to Follow Image Editing Instructions Authors: Tim Brooks, Aleksander Holynski, Alexei A. Efros Conference: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2023 Year of Publication: 2023 Link: https://arxiv.org/abs/2211.09800 / https://www.timothybrooks.com/instruct-pix2pix Key Focus: This paper introduces InstructPix2Pix, a condition.. 2025. 1. 27.
[Paper Review] Attention is All You Need Paper Details Title: Attention is All You Need Authors: Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin Conference: NeurIPS 2017 (31st Conference on Neural Information Processing Systems) Year of Publication: 2017 Link: https://arxiv.org/abs/1706.03762 Key Focus: The Transformer replaces traditional RNN and CNN-based sequence mod.. 2025. 1. 24.
[Paper Review] V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation Paper Details Title: V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation Authors: Fausto Milletari, Nassir Navab, Seyed-Ahmad Ahmadi Conference: Medical Image Computing and Computer-Assisted Intervention (MICCAI) Year of Publication: 2016 Link: https://arxiv.org/abs/1606.04797 Key Focus: This paper introduces V-Net, a fully convolutional neural network designed for 3D .. 2025. 1. 10.
[Paper Review] Interpolating between Images with Diffusion Models Paper Details Title: Interpolating between Images with Diffusion Models Authors: Clinton J. Wang, Polina Golland Institution: MIT CSAIL Year of Publication: 2023 Link: https://arxiv.org/pdf/2307.12560 Key Focus: This paper addresses the challenge of image interpolation, which is underexplored in current image generation pipelines. The authors propose a novel method using latent diffusion models to .. 2024. 12. 30.
[Paper Review] Flamingo: A Visual Language Model for Few-Shot Learning Paper Details Title: Flamingo: A Visual Language Model for Few-Shot Learning Authors: Jean-Baptiste Alayrac*, Jeff Donahue*, Pauline Luc*, Antoine Miech*, Iain Barr†, Yana Hasson†, Karel Lenc†, Arthur Mensch†, Katie Millican†, Malcolm Reynolds†, Roman Ring†, Eliza Rutherford†, Serkan Cabi, Tengda Han, Zhitao Gong, Sina Samangooei, Marianne Monteiro, Jacob Menick, Sebastian Borgeaud, Andrew Brock.. 2024. 12. 23.
[Paper Review] Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks Paper Details Title: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks Authors: Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela Institution: Facebook AI Research, University College London, New York University Year of Publication: 2021 Link: https://ar.. 2024. 12. 16.
[Paper Review] U-Net: Convolutional Networks for Biomedical Image Segmentation Paper Details Title: U-Net: Convolutional Networks for Biomedical Image Segmentation Authors: Olaf Ronneberger, Philipp Fischer, Thomas Brox Conference: Medical Image Computing and Computer-Assisted Intervention (MICCAI) 2015 Year of Publication: 2015 Link: https://arxiv.org/abs/1505.04597 Key Focus: This paper presents U-Net, a convolutional neural network designed for biomedical image segmentati.. 2024. 11. 18.
[Paper Review] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Paper Details Title: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova Conference: NAACL 2019 Year of Publication: 2019 Link: https://arxiv.org/abs/1810.04805 Key Focus: This paper introduces BERT (Bidirectional Encoder Representations from Transformers), a novel language representation model that pr.. 2024. 11. 11.