vLLM Paper: Efficient Memory Management for Large Language Model Serving with PagedAttention
Paper link: Efficient Memory Management for Large Language Model Serving with PagedAttention (arxiv.org)

Ⅰ Summary (Introduction)
Background: increasing throughput and me..