In this study, we propose an approach to analyzing and characterizing dance performances that recasts a visual recognition problem as a text-based classification and description task. Our method extracts key motion features from dance videos and converts them into textual representations, which are then used to predict and describe the dance style depicted in each video. By shifting from conventional computer vision pipelines to text retrieval and classification, we improve both the interpretability of the predictions and the contextual understanding of the performances. Our experiments show a classification accuracy above 86%, demonstrating that the approach is effective at both predicting dance styles and generating informative descriptions. Because each prediction is grounded in an explicit textual representation, the method also yields transparent explanations of its outcomes, offering insight into the distinctive characteristics of different dance forms. The implementation code and dataset are publicly available on GitHub to support further research and applications in this field.
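
To make the recasting concrete, the following is a minimal sketch of the pipeline outlined above, not the paper's actual implementation: the feature names, the textual template, the two example styles, and the tiny training set are all hypothetical, and a standard TF-IDF plus logistic-regression text classifier stands in for whatever model the released code uses.

```python
# Minimal sketch: motion features -> textual representation -> text classifier.
# All feature names, thresholds, styles, and training strings below are
# hypothetical illustrations, not the paper's actual extractor or dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features_to_text(tempo_bpm: float, arm_height: str, footwork: str) -> str:
    """Convert extracted motion features into a textual representation."""
    tempo = "fast tempo" if tempo_bpm > 120 else "slow tempo"
    return f"{tempo}, {arm_height} arm positions, {footwork} footwork"

# Hypothetical textual feature strings for a few labeled clips.
train_texts = [
    features_to_text(180, "low", "rapid shuffling"),   # e.g. swing
    features_to_text(60, "high", "gliding"),           # e.g. waltz
    features_to_text(170, "low", "rapid shuffling"),
    features_to_text(70, "high", "gliding"),
]
train_labels = ["swing", "waltz", "swing", "waltz"]

# Text classification step: TF-IDF features plus logistic regression.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(train_texts, train_labels)

# Classify a new clip from its textual representation; the same string
# doubles as a human-readable explanation of the prediction.
query = features_to_text(165, "low", "rapid shuffling")
print(query, "->", clf.predict([query])[0])
```

Because the classifier's input is readable text rather than raw pixels, the representation itself serves as the transparent explanation referred to above: the same string that drives the prediction can be shown to a user as the rationale.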