1. This site uses cookies. By continuing to use this site, you are agreeing to our use of cookies. Learn More.

Efficient way to assign values from another column pandas df

Discussion in 'Programming/Internet' started by Maxibon, Sep 12, 2018.

  1. Maxibon

    Maxibon Guest

    I'm trying to create a more efficient script that creates a new column based off values in another column. At the moment I'm selecting one value at a time instead of all of them.

    For the df below I'm selecting each value in Location. However, I want to run the script on all unique values.

    The script itself assigns a string that increases based off Location and Day. This is determined by every 3 unique values. So, for each value in Location, a new string gets assigned to the first 3 unique values in Day. A full description is provided after the intended output.

    import pandas as pd
    import numpy as np

    d = ({
    'Day' : ['Mon','Tues','Wed','Wed','Thurs','Thurs','Fri','Mon','Sat','Fri','Sun'],
    'Location' : ['Home','Home','Away','Home','Away','Home','Home','Home','Home','Away','Home'],
    })

    df = pd.DataFrame(data=d)

    #Select value
    mask = df['Location'] == 'Home'
    df1 = df[mask].drop_duplicates('Day')
    d = dict(zip(df1['Day'], np.arange(len(df1)) // 3 + 1))

    df.loc[mask, 'Assign'] = df.loc[mask, 'Day'].map(d)


    At the moment I'm selecting each value, e.g. mask = df['Location'] == 'Home'. Instead, I want to do it on all values. e.g. mask = df['Location'] == All unique values.

    Would a loop be better that runs the script on each value in ['Location']. Or something that finds all unique values to run it on?

    Intended Output:

    Day Location Assign
    0 Mon Home C1
    1 Tues Home C1
    2 Wed Away C2
    3 Wed Home C1
    4 Thurs Away C2
    5 Thurs Home C3
    6 Fri Home C3
    7 Mon Home C1
    8 Sat Home C3
    9 Fri Away C2
    10 Sun Home C4


    Reference for why the output should look this:

    Index 0: C1 is assigned as Location['Home'] and Day['Mon'] are the first values

    Index 1: C1 is assigned as Location['Home'] is repeated and Day['Tues'] is the second unique value

    Index 2: Location['Away'] is new so C2 is assigned. The first unique value is Day['Wed']

    Index 3: C1 is assigned as Location['Home'] is the repeated and Day['Wed'] is the third unique value

    Index 4: C2 is assigned as Location['Away'] is repeated and Day['Thurs'] is the second unique value

    Index 5: While Location['Home'] is repeated, Day['Thurs'] is the fourth unique value. Therefore C3 is assigned

    Index 6: C3 is assigned as Location['Home'] is repeated and Day['Fri'] is the second(5th) unique value.

    Index 7: Location['Home'] and Day['Mon'] are both repeated so C1 is assigned

    Index 8: C3 is assigned as Location['Home'] is repeated and Day['Sat'] is the third(6th) unique value

    Index 9: C2 is assigned as Location['Away'] is repeated and Day['Fri'] is the third unique value

    Index 10: While Location['Home'] is repeated, Day['Sun'] is the fourth(7th) unique value. Therefore C4 is assigned

    Do this make sense?

    Update

    I will remove this afterwards but by using @Naga Kiran's code the output is pretty close. Some discrepancies are index 4 and 7. Index 4 should be C2 and index 7 should be C1.

    Unique_group = ['Mon','Tues','Wed']

    df['Group'] = df['Day'].apply(lambda x:1 if x in Unique_group else 2)
    df['Assign'] = np.zeros(len(df))
    # Assigning the ditionary values for output from numeric
    vals = dict([(i,'C'+str(i)) for i in range(len(df))])

    for i in range(1,len(df)+1,1):
    # Slicing the Dataframe line by line
    df1 = df[:i]
    # Incorporating the conditions of Group and Location
    df1 = df1[(df1.Location == df1.Location.loc[i-1]) & (df1.Group == df1.Group.loc[i-1]) ]
    # Writing the 'Assign' value for the first line of sliced df
    if len(df1)==1:
    df.loc[i-1,'Assign'] = df[:i].Assign.max()+1
    # Writing the 'Assign value based on previous values if it has contiuos 2 values of same group
    elif (df1.Assign.value_counts()[df1.Assign.max()] <3):
    df.loc[i-1,'Assign'] = df1.Assign.max()
    # Writing 'Assign' value for new group
    else:
    df.loc[i-1,'Assign'] = df[:i]['Assign'].max()+1
    df.Assign = df.Assign.map(vals)

    Login To add answer/comment
     

Share This Page