Task-oriented dialogue systems are designed to help users achieve predefined goals or tasks, such as reserving a restaurant or answering a navigation inquiry. They often follow a pipeline approach that employs separate modules for natural language understanding, dialogue action decision making, and response generation. Conventional task-oriented dialogue systems train these modules independently, which can lead to error propagation when downstream modules do not receive the full dialogue context. To address this limitation of the conventional pipeline, recent work has explored large pretrained models in the sequence-to-sequence setting for end-to-end task-oriented dialogue systems [1,2]. Despite these efforts, several challenges remain, including generating coherent and consistent responses, mitigating inappropriate responses, better strategies for few-shot learning, learning new knowledge or dialogue skills, and better evaluation metrics. This project aims to investigate approaches for alleviating these challenges.
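To make the pipeline-vs-end-to-end contrast concrete, the following is a minimal illustrative sketch (not taken from the cited work) of how an end-to-end sequence-to-sequence system can expose the full dialogue context to the model by linearizing the dialogue history and tracked belief state into a single input sequence; the tag names and slot format here are assumptions for illustration only.

```python
# Illustrative sketch: linearizing the full dialogue context for a
# seq2seq model, so no downstream module is cut off from earlier turns.
# The <user>/<system>/<belief> tags and slot=value format are
# hypothetical conventions, not a specific system's actual format.

def linearize_context(history, belief_state):
    """Flatten dialogue turns and the tracked belief state into one
    sequence suitable as input to a sequence-to-sequence model.

    history: list of (speaker, utterance) pairs in turn order.
    belief_state: dict mapping slot names to values.
    """
    turns = " ".join(
        f"<{speaker}> {utterance}" for speaker, utterance in history
    )
    slots = " ".join(f"{k}={v}" for k, v in sorted(belief_state.items()))
    return f"{turns} <belief> {slots}"

history = [
    ("user", "I need a cheap restaurant in the centre."),
    ("system", "Any cuisine preference?"),
    ("user", "Italian, please."),
]
belief = {"price": "cheap", "area": "centre", "food": "italian"}
print(linearize_context(history, belief))
```

Because the model conditions on this single flattened sequence, errors are not silently compounded across separately trained modules; the generation step always sees every prior turn and the current belief state.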