Tuesday, 17 September 2019

How do I tokenize each file placed in a directory?

I have created a directory containing 5 files. I want to load them one by one and preprocess each one separately. What I actually want is for my code to loop over every document in the directory: get the document, split it into tokens, and save those tokens in a separate list, so that the tokens of different files are never mixed. I have loaded the files, but I can't figure out how to tokenize each document separately.

I have loaded the files, but I can't work out how to tokenize each file separately, one by one.

    import os

    path = '/home/ali/PycharmProjects/goldstandard/goldstandard'
    for filename in os.listdir(path):
        # os.listdir() returns bare file names, so join each one with the directory
        full_path = os.path.join(path, filename)
        with open(full_path, "r", encoding="utf-8", errors="ignore") as f:
            # Read each line of the file
            for line in f:
                print(line.rstrip())
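
For reference, here is a minimal sketch of the behaviour described above: one token list per file, never mixed. It is only one possible approach. The function name `tokenize_directory` and the rule that a token is a run of word characters (`\w+`) are assumptions made for illustration; a library tokenizer such as NLTK's `word_tokenize` could be swapped in where the regular expression is applied.

```python
import os
import re


def tokenize_directory(path):
    """Return a dict mapping each file name in `path` to its own token list."""
    tokens_by_file = {}
    for filename in sorted(os.listdir(path)):
        full_path = os.path.join(path, filename)
        if not os.path.isfile(full_path):
            continue  # skip sub-directories
        with open(full_path, "r", encoding="utf-8", errors="ignore") as f:
            text = f.read()
        # Assumption: a token is any run of word characters.
        # Each file gets a fresh list, so tokens are never mixed.
        tokens_by_file[filename] = re.findall(r"\w+", text)
    return tokens_by_file
```

Calling `tokenize_directory(path)` with the `path` from the snippet above would return one entry per file, and the tokens of a given document can then be looked up by its file name.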



