phamson02
add word segmentation before tokenization
c1d85a2