- Paper summarized in the space of a single PPT slide
- CNN (1 convolutional layer) + pre-trained word vectors (word2vec) ⇒ sentence classification
- little hyperparameter tuning + static vectors (‘universal’ feature extractors) ⇒ excellent results
- task-specific vectors + fine-tuning ⇒ performance gains
- task-specific & static vectors ⇒ state of the art improved on 4 out of 7 tasks (including sentiment analysis, question classification)
=========================
< model hyperparameters >
- activation function: ReLU (rectified linear units)
- filter windows (h) = 3, 4, 5
- feature maps = 100 per filter window size
- dropout rate (p) = 0.5
- L2 norm constraint (s) = 3
- mini-batch size = 50
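
A minimal sketch of how the hyperparameters above fit together, assuming a plain-Python toy setup rather than the paper's actual code. It shows one filter sliding over a sentence with ReLU and max-over-time pooling, the 3 x 100 = 300 pooled features, and the rescaling of a weight vector whose L2 norm exceeds s. The function names and the list-of-lists data layout are hypothetical, chosen only for illustration.

```python
import math
import random

WINDOW_SIZES = (3, 4, 5)   # filter windows h, from the notes
NUM_FEATURE_MAPS = 100     # feature maps per window size
L2_CONSTRAINT = 3.0        # s, upper bound on the L2 norm of a weight vector


def relu(x):
    return max(0.0, x)


def conv_max_over_time(sentence, filt, bias=0.0):
    """Slide one filter over the sentence and keep the largest activation.

    sentence: list of n word vectors, each of dimension k
    filt:     list of h vectors of dimension k (one filter with window size h)
    """
    h = len(filt)
    n = len(sentence)
    if n < h:
        raise ValueError("sentence shorter than the filter window; pad it first")
    activations = []
    for i in range(n - h + 1):
        total = bias
        for j in range(h):
            total += sum(w * x for w, x in zip(filt[j], sentence[i + j]))
        activations.append(relu(total))
    return max(activations)


def sentence_features(sentence, filters_by_window):
    """Concatenate one pooled feature per filter, over all window sizes."""
    return [
        conv_max_over_time(sentence, filt)
        for h in sorted(filters_by_window)
        for filt in filters_by_window[h]
    ]


def rescale_to_l2_constraint(w, s=L2_CONSTRAINT):
    """Rescale w so that its L2 norm equals s whenever it exceeds s."""
    norm = math.sqrt(sum(v * v for v in w))
    if norm > s:
        return [v * s / norm for v in w]
    return list(w)


if __name__ == "__main__":
    rng = random.Random(0)
    k = 4  # toy embedding dimension (assumption; word2vec vectors are much larger)
    sentence = [[rng.uniform(-1, 1) for _ in range(k)] for _ in range(7)]
    filters = {
        h: [[[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(h)]
            for _ in range(NUM_FEATURE_MAPS)]
        for h in WINDOW_SIZES
    }
    print(len(sentence_features(sentence, filters)))
```

Sentences shorter than the largest window would need padding before this runs; the sketch raises an error instead of guessing a padding scheme.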
=========================
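- CNN-rand (random initialization): does not perform well
- CNN-static (pre-trained vectors kept fixed): performs remarkably well
- CNN-non-static (pre-trained vectors fine-tuned): further improvements
- CNN-multichannel (one static channel + one fine-tuned channel): mixed results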
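
The four variants differ only in how the word vectors are initialized and whether they are updated during training. The sketch below encodes that as a small table, assuming a hypothetical `Variant` record and a plain gradient step used purely for illustration (not the optimizer used in the paper).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    pretrained: bool   # initialized from word2vec (True) or randomly (False)
    trainable: tuple   # per channel: are the word vectors updated in training?


VARIANTS = {
    "CNN-rand": Variant("CNN-rand", pretrained=False, trainable=(True,)),
    "CNN-static": Variant("CNN-static", pretrained=True, trainable=(False,)),
    "CNN-non-static": Variant("CNN-non-static", pretrained=True, trainable=(True,)),
    "CNN-multichannel": Variant("CNN-multichannel", pretrained=True,
                                trainable=(False, True)),
}


def apply_gradient_step(variant, channels, grads, lr=0.1):
    """Update only the trainable channels; frozen channels are copied unchanged.

    channels, grads: one flat list of floats per channel (toy layout).
    """
    updated = []
    for vecs, g, trainable in zip(channels, grads, variant.trainable):
        if trainable:
            updated.append([v - lr * dv for v, dv in zip(vecs, g)])
        else:
            updated.append(list(vecs))
    return updated


if __name__ == "__main__":
    mc = VARIANTS["CNN-multichannel"]
    print(apply_gradient_step(mc, [[1.0, 2.0], [1.0, 2.0]],
                              [[1.0, 1.0], [1.0, 1.0]], lr=0.5))
```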
=========================
- dropout: r is a ‘masking’ vector of Bernoulli random variables, each with probability p of being 1.
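
A minimal sketch of the masking vector in use, assuming the usual dropout formulation on the penultimate layer: during training the output is y = w · (z ∘ r) + b, and at inference the mask is dropped and the weights are scaled by p. The function names are hypothetical, and p is taken here as the probability of a mask entry being 1, as stated above (with p = 0.5 the keep and drop probabilities coincide).

```python
import random


def bernoulli_mask(m, p, rng):
    """Return m independent 0/1 entries, each equal to 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(m)]


def train_output(w, z, b, mask):
    """Training-time output unit: y = w . (z * mask) + b (element-wise mask)."""
    return sum(wi * zi * ri for wi, zi, ri in zip(w, z, mask)) + b


def inference_output(w, z, b, p):
    """Inference-time output unit: no mask, learned weights scaled by p."""
    return sum(p * wi * zi for wi, zi in zip(w, z)) + b


if __name__ == "__main__":
    rng = random.Random(0)
    z = [0.2, 1.5, 0.0, 0.7]    # toy penultimate-layer features
    w = [0.5, -1.0, 2.0, 0.3]   # toy output weights
    r = bernoulli_mask(len(z), p=0.5, rng=rng)
    print(r, train_output(w, z, 0.1, r), inference_output(w, z, 0.1, 0.5))
```

Scaling by p at inference keeps the expected pre-activation equal to its training-time expectation, since each masked term contributes w_i z_i with probability p.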