loss = 0.6246786828041077 perplexity = tensor(1.8676)
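
The two numbers above are consistent with perplexity being the exponential of the loss. The sketch below checks that relationship, assuming the loss is a mean cross-entropy measured in nats (the output does not state this). The variable names are illustrative, and the standard-library `math.exp` stands in for whatever tensor operation produced the original `tensor(1.8676)`.

```python
import math

# Loss value copied from the output above; assumed to be mean cross-entropy in nats.
loss = 0.6246786828041077

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(loss)

print(f"perplexity = {perplexity:.4f}")  # perplexity = 1.8676
```

If the loss were instead reported in bits, the base would be 2 rather than e, and the two printed values would no longer agree.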