I have a directory containing 5 files. I want to load them one by one and preprocess each one separately. What I actually want is for my code to loop over every document in the directory, split that document into tokens, and save those tokens in a separate list, so that the tokens of different files are not mixed together.
I have loaded the files, but I can't work out how to tokenize each document separately.
import os

path = '/home/ali/PycharmProjects/goldstandard/goldstandard'
for filename in os.listdir(path):
    # os.listdir() returns bare names, so join each one with the directory
    with open(os.path.join(path, filename), "r", encoding="utf-8", errors="ignore") as f:
        # Read each line of the file
        for line in f:
            print(line.rstrip())
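One possible way to keep each file's tokens apart is to build a dictionary keyed by filename, so every document gets its own list. This is only a minimal sketch: the function name `tokenize_directory` and the variable `tokens_by_file` are made up for illustration, and plain whitespace splitting with `str.split()` is assumed to be good enough as a tokenizer. If NLTK is installed, `nltk.word_tokenize` could probably be swapped in for the `split()` call.

```python
import os


def tokenize_directory(path):
    """Return a dict mapping each filename to its own list of tokens."""
    tokens_by_file = {}  # hypothetical name: one separate token list per file
    for filename in sorted(os.listdir(path)):
        full_path = os.path.join(path, filename)
        if not os.path.isfile(full_path):
            continue  # skip subdirectories
        with open(full_path, "r", encoding="utf-8", errors="ignore") as f:
            # Assumption: whitespace tokenization; split() with no
            # arguments splits on any run of spaces, tabs, or newlines
            tokens_by_file[filename] = f.read().split()
    return tokens_by_file
```

Calling `tokenize_directory(path)` with the directory from the question should give one entry per file, and the tokens of a single document can then be looked up by its filename. A list of lists would work as well if the filenames are not needed.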