I'm attempting to parse a large Excel file with numerous sheets. Each sheet has a column that contains the following data (three sheets = three columns):
ReceivedEmail    OpenedEmail    ClickedURL
aaaa@aaa.com     gggg@aaa.com   aaaa@aaa.com
bbbb@aaa.com     dddd@aaa.com   rrrr@aaa.com
cccc@aaa.com     rrrr@aaa.com
dddd@aaa.com     aaaa@aaa.com
eeee@aaa.com     oooo@aaa.com
ffff@aaa.com
gggg@aaa.com
rrrr@aaa.com
qqqq@aaa.com
oooo@aaa.com
What I'm looking for is a single table that keeps the first column of sheet one, the one with all the information about ReceivedEmail (persons we mass e-mailed). Instead of repeating the emails in the successive sheets' first columns, I want to utilize list comprehension to determine whether OpenedEmail is present in ReceivedEmail and output 1 otherwise.
Here's what I did so far:
import pandas as pd
xl = pd.ExcelFile(path_to_file)
xl.sheet_names
['ReceivedEmail', 'OpenedEmail', 'ClickedURL']
df = xl.parse(sheet_name=xl.sheet_names[0], header=None)
df.rename(columns={df.columns[0]:xl.sheet_names[0]}, inplace=True);
df.columns[0]
['ReceivedEmail']
# then I created a buffer dataframe to check next columns
df_buffer = xl.parse(sheet_name=xl.sheet_names[1], header=None)
df_buffer.rename(columns={df_buffer.columns[0]:xl.sheet_names[1]}, inplace=True);
But then when I run list comprehension like this:
df[df_buffer.columns[0]] = [1 if x in df[df.columns[0]] else 0 for x in df_buffer[df_buffer.columns[0]]]
I get an error:
ValueError: Length of values does not match length of index
Any clue how to solve this error or handle the problem in a smart way? I am doing manually to see if it works, then I could do a looping later, but I am stuck with the error.
End result should be:
ReceivedEmail    OpenedEmail    ClickedURL
aaaa@aaa.com         1              1
bbbb@aaa.com         0              0      
cccc@aaa.com         0              0     
dddd@aaa.com         1              0
eeee@aaa.com         0              0    
ffff@aaa.com         0              0  
gggg@aaa.com         1              0
rrrr@aaa.com         1              1
qqqq@aaa.com         0              0
oooo@aaa.com         1              0