Python pandas: remove rows based on a given value in a cell?

I have the following script, which removes rows where the date in the column DATE OF OPERATION starts with 202110. I understood that a space in a column name is not allowed, so the script also replaces each space with _ and then, after the rows are removed, puts the spaces back. For some reason I couldn't attach the csv here, so please see the example below:

Column1 | DATE OF OPERATION | NAVp | Units
dsasa | 20211201 | 24324
dsasa | 20211022 | 23232
sd | 20211022 | 232
sd | 20211022 | 3 | -2802.6667


The code is below, and the error I'm getting is: KeyError: 'DATE_OF_OPERATION'

Could you advise what the cause of the error is? Thank you

import os
import glob
import pandas as pd
from pathlib import Path
source_files = sorted(Path(r'/Users/maciejgrzeszczuk/Downloads/').glob('*.csv'))

for file in source_files:
    df = pd.read_csv(file)
    df.columns = df.columns.str.replace(' ', '_')
    df = df[~df['DATE_OF_OPERATION'].astype(str).str.startswith('202110')]
    df.columns = df.columns.str.replace('_', ' ')
    name, ext = file.name.split('.')
    df.to_csv(f'{name}.{ext}', index=0)

Best Answer

  • pf
    Answer ✓

    Hi @grzeszczukmaciek ,

    This is a pure Python question (and not an issue related to our APIs), but let's try to propose a solution.

    Spaces are allowed in column names, so the two rename steps are not needed and your code could be simplified to:

    for file in source_files:
        df = pd.read_csv(file)
        # Keep only the rows whose date does not start with 202110
        df = df[~df['DATE OF OPERATION'].astype(str).str.startswith('202110')]
        name, ext = file.name.split('.')
        df.to_csv(f'{name}.{ext}', index=False)

    On my side, it worked with the following file.csv content:

    DATE OF OPERATION,ASK,BID,TRADE PRICE
    20210901,10,12,11
    20211001,11,13,12
    20211101,12,14,13
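
    As for the KeyError itself: it means that at least one of the DataFrames read in the loop had no column with exactly that name. I can't tell from here which file that was. My guesses (assumptions, not something I could verify) are another CSV in the same folder without that column, a different delimiter, or stray whitespace around the header. Below is a minimal sketch that reads the sample content above from memory and reports the parsed column names instead of raising. The helper name drop_rows_by_prefix and its parameters are hypothetical, made up for this illustration.

```python
import io

import pandas as pd

# Sample content from above, held in memory instead of a file on disk.
SAMPLE_CSV = """DATE OF OPERATION,ASK,BID,TRADE PRICE
20210901,10,12,11
20211001,11,13,12
20211101,12,14,13
"""


def drop_rows_by_prefix(df, column="DATE OF OPERATION", prefix="202110"):
    """Hypothetical helper: drop rows whose `column` value starts with `prefix`."""
    # Strip whitespace around header names (one assumed cause of the KeyError).
    df = df.rename(columns=lambda c: str(c).strip())
    if column not in df.columns:
        # Show what was actually parsed instead of raising KeyError.
        print(f"column {column!r} not found; columns are {list(df.columns)}")
        return df
    mask = df[column].astype(str).str.startswith(prefix)
    return df[~mask]


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
result = drop_rows_by_prefix(df)
print(result)
```

    In your loop you could call the same helper on each df right after pd.read_csv(file); the printed column list should then show which file does not match.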