Customer Value Analysis

news/2024/10/7 14:28:47


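I. Objectives and requirements
1. Learn the basics of processing data with the numpy and pandas libraries.
2. Learn how to extract features from customer records with the RFM analysis model.
3. Learn how to standardize feature data.
4. Learn how to run the K-Means clustering algorithm with Sklearn and how to evaluate the result.
5. Learn how to visualize the analysis with matplotlib together with pandas.
II. Experiment content
1. Use pandas and related Python libraries to preprocess the data, compute the three features R, F, and M, and save the processed file.
2. Use pandas and related Python libraries to standardize the data.
3. Build a clustering model with Sklearn on the RFM features, cluster the customers by value, and evaluate the clustering result.
4. Visualize the clustering result with pandas and matplotlib.
III. Experiment steps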

1. Data preprocessing

(1) Import the required packages

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import re
from sklearn.cluster import KMeans
from datetime import datetime

(2) Read the file

datafile="/data/bigfiles/data2.csv"
data = pd.read_csv(datafile)

(3) Inspect basic statistics of the data

data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2832 entries, 0 to 2831
Data columns (total 54 columns):
买家会员名           2660 non-null object
买家实际支付积分        2660 non-null float64
买家实际支付金额        2660 non-null float64
买家应付货款          2660 non-null float64
买家应付邮费          2660 non-null float64
买家支付宝账号         2658 non-null object
买家支付积分          2660 non-null float64
买家服务费           2660 non-null object
买家留言            163 non-null object
修改后的sku         0 non-null float64
修改后的收货地址        358 non-null object
分阶段订单信息         0 non-null float64
卖家服务费           2660 non-null float64
发票抬头            0 non-null float64
含应开票给个人的个人红包    0 non-null float64
天猫卡券抵扣          0 non-null float64
定金排名            0 non-null float64
宝贝总数量           2660 non-null float64
宝贝标题            2397 non-null object
宝贝种类            2660 non-null float64
店铺Id            1581 non-null float64
店铺名称            2660 non-null object
异常信息            0 non-null float64
总金额             2660 non-null float64
打款商家金额          2660 non-null object
支付单号            1560 non-null object
支付详情            1560 non-null object
收货人姓名           2660 non-null object
收货地址            2660 non-null object
新零售交易类型         2660 non-null object
新零售发货门店id       0 non-null float64
新零售发货门店名称       0 non-null float64
新零售导购门店id       0 non-null float64
新零售导购门店名称       0 non-null float64
是否上传合同照片        2660 non-null object
是否上传小票          2660 non-null object
是否代付            2660 non-null object
是否手机订单          1838 non-null object
是否是O2O交易        0 non-null float64
物流公司            1425 non-null object
物流单号            1425 non-null object
特权订金订单id        0 non-null float64
确认收货时间          1876 non-null object
联系手机            2659 non-null object
联系电话            130 non-null object
订单付款时间          2148 non-null object
订单关闭原因          2660 non-null object
订单创建时间          2660 non-null object
订单备注            695 non-null object
订单状态            2660 non-null object
运送方式            2660 non-null object
返点积分            2660 non-null float64
退款金额            2660 non-null float64
数据采集时间          2660 non-null object
dtypes: float64(25), object(29)
memory usage: 1.2+ MB
len(data)
2832
data.describe()
       买家实际支付积分  买家实际支付金额  买家应付货款  买家应付邮费  买家支付积分  修改后的sku  分阶段订单信息  卖家服务费  发票抬头  含应开票给个人的个人红包  ...  异常信息  总金额  新零售发货门店id  新零售发货门店名称  新零售导购门店id  新零售导购门店名称  是否是O2O交易  特权订金订单id  返点积分  退款金额
count  2660.0  2660.000000  2660.000000  2660.000000  2660.0  0.0  0.0  2660.0  0.0  0.0  ...  0.0  2660.000000  0.0  0.0  0.0  0.0  0.0  0.0  2660.0  2660.000000
mean   0.0  155.113094  181.193241  1.257519  0.0  NaN  NaN  0.0  NaN  NaN  ...  NaN  182.450759  NaN  NaN  NaN  NaN  NaN  NaN  0.0  10.436218
std    0.0  350.332509  366.871965  4.408725  0.0  NaN  NaN  0.0  NaN  NaN  ...  NaN  366.806966  NaN  NaN  NaN  NaN  NaN  NaN  0.0  131.244263
min    0.0  0.000000  0.100000  0.000000  0.0  NaN  NaN  0.0  NaN  NaN  ...  NaN  0.100000  NaN  NaN  NaN  NaN  NaN  NaN  0.0  0.000000
25%    0.0  43.890000  50.860000  0.000000  0.0  NaN  NaN  0.0  NaN  NaN  ...  NaN  51.870000  NaN  NaN  NaN  NaN  NaN  NaN  0.0  0.000000
50%    0.0  62.860000  89.700000  0.000000  0.0  NaN  NaN  0.0  NaN  NaN  ...  NaN  90.130000  NaN  NaN  NaN  NaN  NaN  NaN  0.0  0.000000
75%    0.0  199.000000  268.000000  0.000000  0.0  NaN  NaN  0.0  NaN  NaN  ...  NaN  268.000000  NaN  NaN  NaN  NaN  NaN  NaN  0.0  0.000000
max    0.0  13246.800000  13246.800000  55.000000  0.0  NaN  NaN  0.0  NaN  NaN  ...  NaN  13246.800000  NaN  NaN  NaN  NaN  NaN  NaN  0.0  3950.730000

8 rows × 25 columns

(4) Select the attribute columns

data.columns
Index(['买家会员名', '买家实际支付积分', '买家实际支付金额', '买家应付货款', '买家应付邮费', '买家支付宝账号','买家支付积分', '买家服务费', '买家留言', '修改后的sku', '修改后的收货地址', '分阶段订单信息', '卖家服务费','发票抬头', '含应开票给个人的个人红包', '天猫卡券抵扣', '定金排名', '宝贝总数量', '宝贝标题 ', '宝贝种类 ','店铺Id', '店铺名称', '异常信息', '总金额', '打款商家金额', '支付单号', '支付详情', '收货人姓名','收货地址', '新零售交易类型', '新零售发货门店id', '新零售发货门店名称', '新零售导购门店id', '新零售导购门店名称','是否上传合同照片', '是否上传小票', '是否代付', '是否手机订单', '是否是O2O交易', '物流公司', '物流单号 ','特权订金订单id', '确认收货时间', '联系手机', '联系电话 ', '订单付款时间', '订单关闭原因', '订单创建时间','订单备注', '订单状态', '运送方式', '返点积分', '退款金额', '数据采集时间'],dtype='object')
data.订单状态.unique()
array(['买家已付款,等待卖家发货', '等待买家付款', '卖家已发货,等待买家确认', '交易关闭', '交易成功', nan],dtype=object)
data = data[data.订单状态 == '交易成功']
data
      买家会员名  买家实际支付积分  买家实际支付金额  买家应付货款  买家应付邮费  买家支付宝账号  买家支付积分  买家服务费  买家留言  修改后的sku  ...  联系电话  订单付款时间  订单关闭原因  订单创建时间  订单备注  订单状态  运送方式  返点积分  退款金额  数据采集时间
25    gang_2015  0.0  143.64  143.64  0.0  18104860223  0.0  0元  NaN  NaN  ...  NaN  2018/1/27  订单未关闭  2018-01-27 09:57:23  NaN  交易成功  快递  0.0  0.0  2018/12/31
26    tb6683844_2011  0.0  55.86  55.86  0.0  17743451991  0.0  0元  NaN  NaN  ...  NaN  2018/1/26  订单未关闭  2018-01-26 22:55:46  NaN  交易成功  快递  0.0  0.0  2018/12/31
30    dlzslv  0.0  90.72  90.72  0.0  zs-lv@sohu.com  0.0  0元  NaN  NaN  ...  NaN  2018/1/26  订单未关闭  2018-01-26 13:37:22  NaN  交易成功  快递  0.0  0.0  2018/12/31
31    劳什子2010  0.0  48.86  48.86  0.0  tangzhai2010@163.com  0.0  0元  NaN  NaN  ...  NaN  2018/1/26  订单未关闭  2018-01-26 10:12:18  v6  交易成功  快递  0.0  0.0  2018/12/31
32    李氏江江48  0.0  103.74  103.74  0.0  849694657@qq.com  0.0  0元  NaN  NaN  ...  NaN  2018/1/26  订单未关闭  2018-01-26 06:48:35  NaN  交易成功  快递  0.0  0.0  2018/12/31
...   ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...  ...
2655  旋光精灵  0.0  999.00  999.00  0.0  20323624@qq.com  0.0  0元  NaN  NaN  ...  NaN  2017/1/4  订单未关闭  2017/1/4 15:05  NaN  交易成功  快递  0.0  0.0  2018/12/31
2656  leryang  0.0  268.00  268.00  0.0  9722165@163.com  0.0  0元  NaN  NaN  ...  NaN  2017/1/3  订单未关闭  2017/1/3 16:51  '中通快递:728773317678  交易成功  虚拟物品  0.0  0.0  2018/12/31
2657  leryang  0.0  134.00  134.00  0.0  9722165@163.com  0.0  0元  NaN  NaN  ...  NaN  2017/1/3  订单未关闭  2017/1/3 16:51  NaN  交易成功  虚拟物品  0.0  0.0  2018/12/31
2658  crazy283  0.0  268.00  268.00  0.0  crazy355@126.com  0.0  0元  NaN  NaN  ...  NaN  2017/1/3  订单未关闭  2017/1/3 16:01  '中通:728773317331 【月月 01-04 08:58】  交易成功  虚拟物品  0.0  0.0  2018/12/31
2659  zhangyang52058  0.0  63.70  48.70  15.0  13693516433  0.0  0元  NaN  NaN  ...  NaN  2017/1/2  订单未关闭  2017/1/2 23:28  NaN  交易成功  快递  0.0  0.0  2018/12/31

1876 rows × 54 columns

# Keep only the columns we need:
# buyer ID, amount paid to the seller, and order payment time
data=data.filter(items=['买家会员名','打款商家金额','订单付款时间'])

(5) Handle abnormal data

# Count the missing values in each column
datas=data.isnull().sum()
datas
买家会员名     0
打款商家金额    0
订单付款时间    0
dtype: int64
# Find rows that are complete duplicates
result=data.duplicated()
df=data[result]
df
买家会员名打款商家金额订单付款时间
71qufan_xiao100.00元2018/1/20
119kangfengtj55.86元2018/1/16
207waterli200555.86元2018/1/3
211时尚乐器268.00元2018/1/3
584猪头luing254.00元2018/6/22
............
2308bill163com200.00元2017/6/21
2320南山熊00340268.00元2017/6/13
2354铭铭猪是的念倒201.00元2017/6/1
2446夜沉晨201.00元2017/4/15
2533chenzh3664951201.00元2017/3/14

103 rows × 3 columns

# Drop the fully duplicated rows
data=data.drop_duplicates()
# Drop unpaid rows (amount paid to the seller is 0.00元)
data=data[data['打款商家金额']!='0.00元'].copy()
# Parse the payment date
data['订单付款时间'] = data.订单付款时间.map(lambda x: datetime.strptime(x, '%Y/%m/%d'))
# Strip the currency unit 元 and convert the amount to float
data.打款商家金额 = data.打款商家金额.map(lambda x: float(re.sub('元','',x)))
# Per buyer: total amount, most recent payment date, and number of payments
data =data.groupby("买家会员名").agg({"打款商家金额":"sum","订单付款时间":"max","买家会员名":"count"})
data = data.rename(columns = {'打款商家金额':'总金额','买家会员名':'付款次数'})
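
The cleaning and aggregation steps can be tried on a small table first. The sketch below is illustrative only: the three sample rows and buyer names are made up, and it uses pandas named aggregation, which is an alternative to the dict form in the text that gives the output columns their final names without a separate `rename`.

```python
import pandas as pd

# Made-up sample rows with the same three columns as the filtered data.
orders = pd.DataFrame({
    "买家会员名": ["alice", "alice", "bob"],
    "打款商家金额": ["100.00元", "55.86元", "268.00元"],
    "订单付款时间": ["2018/1/20", "2018/6/22", "2017/3/14"],
})

# Strip the trailing currency unit and convert the amount to float.
orders["打款商家金额"] = (
    orders["打款商家金额"].str.replace("元", "", regex=False).astype(float)
)
# Parse the slash-separated dates.
orders["订单付款时间"] = pd.to_datetime(orders["订单付款时间"], format="%Y/%m/%d")

# One row per buyer: total amount, most recent payment, number of payments.
per_buyer = orders.groupby("买家会员名").agg(
    总金额=("打款商家金额", "sum"),
    订单付款时间=("订单付款时间", "max"),
    付款次数=("打款商家金额", "size"),
)
print(per_buyer)
```

Vectorized string and date parsing does the same work as the `map` calls with `re.sub` and `strptime`, and is usually faster on large tables.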

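(6) Compute R and name the column (standardization is applied in step (8))

# Compute R
# Data collection date minus the date of the last payment
exdata_date=datetime(2018,12,31)
# Start of the observation period, used in step (7)
start_date=datetime(2017,1,2)
data['R(最后一次消费时间)']=exdata_date-data['订单付款时间']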
data
                总金额      订单付款时间  付款次数  R(最后一次消费时间)
买家会员名
00牛哥哥00       402.00  2017-02-06     2  693 days
020luo           74.70  2017-11-18     1  408 days
0587xueguangju  268.00  2017-04-14     1  626 days
0o秋天de童话     411.50  2018-10-09     2   83 days
0残缺0           48.86  2018-01-19     1  346 days
...                ...         ...   ...       ...
黑河市2013        47.88  2018-01-11     1  354 days
黑瑾瞳           158.44  2018-07-26     2  158 days
鼠标右键点        51.87  2018-12-12     1   19 days
龙星宇1018       198.00  2017-11-17     1  409 days
龙魂爱上凤灵      43.86  2017-12-13     1  383 days

1483 rows × 4 columns
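
The R column above holds Timedelta values such as `693 days`. If plain integers are easier to work with, one option (not used in the original experiment) is the `.dt.days` accessor. The sketch below reuses two dates from the table; the variable names are illustrative.

```python
import pandas as pd

snapshot = pd.Timestamp(2018, 12, 31)  # data collection date used in the text
last_paid = pd.Series(
    pd.to_datetime(["2017-02-06", "2018-12-12"]),
    index=["00牛哥哥00", "鼠标右键点"],
)

# Subtraction yields Timedelta values; .dt.days converts them to integers.
recency_days = (snapshot - last_paid).dt.days
print(recency_days)
```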

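(7) Compute F and name the column (standardization is applied in step (8))

from math import ceil
# Time from the start date to each customer's last payment
period_day=data['订单付款时间']-start_date
# Empty list that will hold the number of months for each customer
period_month=[]
for i in period_day:
    period_month.append(ceil(i.days/30))
# First printout of the month counts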
print(period_month)
[2, 11, 4, 22, 13, 3, 9, 15, 8, 7, 17, 12, 24, 23, 24, 17, 22, 17, 17, 5, 24, 13, 18, 18, 11, 24, 13, 13, 9, 22, 8, 22, 22, 11, 11, 12, 15, 13, 6, 20, 17, 13, 13, 22, 8, 15, 4, 24, 11, 10, 24, 13, 12, 18, 13, 15, 13, 13, 13, 9, 12, 23, 11, 12, 24, 24, 23, 24, 10, 17, 11, 24, 24, 6, 22, 24, 19, 8, 12, 18, 12, 2, 19, 25, 6, 6, 10, 17, 12, 10, 5, 25, 15, 12, 9, 18, 8, 7, 18, 23, 18, 8, 22, 9, 3, 17, 3, 9, 7, 5, 3, 10, 9, 20, 12, 11, 24, 23, 18, 17, 23, 1, 15, 8, 9, 4, 24, 22, 13, 20, 22, 11, 15, 10, 15, 22, 11, 5, 12, 12, 19, 1, 13, 6, 9, 9, 15, 19, 19, 19, 9, 10, 17, 15, 17, 5, 24, 10, 9, 3, 23, 22, 13, 15, 15, 12, 24, 11, 9, 15, 22, 11, 8, 22, 12, 12, 22, 6, 22, 11, 18, 8, 22, 2, 4, 13, 23, 23, 23, 23, 15, 9, 23, 24, 23, 24, 9, 13, 7, 23, 12, 8, 10, 12, 23, 22, 10, 10, 23, 9, 19, 3, 15, 13, 12, 13, 13, 15, 10, 17, 9, 15, 13, 13, 15, 17, 12, 13, 13, 19, 11, 17, 3, 3, 18, 12, 13, 15, 15, 19, 9, 15, 10, 8, 13, 12, 22, 17, 17, 15, 5, 12, 15, 23, 18, 13, 17, 24, 11, 22, 13, 5, 14, 5, 5, 13, 15, 11, 7, 11, 24, 9, 7, 13, 13, 17, 15, 6, 14, 18, 23, 11, 24, 19, 8, 25, 11, 17, 13, 7, 23, 13, 22, 15, 24, 3, 22, 7, 17, 7, 19, 24, 12, 12, 22, 1, 12, 17, 12, 24, 10, 17, 6, 19, 15, 12, 18, 15, 12, 19, 19, 19, 23, 24, 17, 3, 13, 11, 12, 12, 6, 13, 13, 6, 18, 18, 20, 19, 4, 1, 18, 11, 17, 13, 7, 8, 18, 19, 12, 23, 13, 23, 10, 23, 24, 11, 10, 15, 19, 19, 11, 17, 2, 21, 13, 22, 15, 3, 13, 24, 20, 15, 17, 11, 1, 7, 4, 12, 12, 12, 17, 12, 18, 3, 22, 23, 8, 23, 18, 10, 17, 13, 12, 23, 13, 7, 24, 23, 21, 18, 10, 24, 7, 18, 23, 5, 22, 8, 11, 13, 7, 8, 9, 7, 13, 18, 15, 9, 8, 5, 3, 7, 15, 15, 5, 20, 22, 25, 9, 19, 15, 24, 24, 14, 11, 13, 4, 13, 19, 2, 7, 13, 24, 8, 12, 11, 12, 11, 11, 12, 10, 3, 17, 3, 11, 7, 17, 6, 12, 11, 8, 12, 11, 15, 4, 17, 22, 3, 11, 13, 19, 3, 18, 12, 20, 13, 2, 10, 12, 12, 13, 2, 21, 24, 12, 24, 23, 15, 17, 13, 17, 15, 22, 25, 24, 2, 3, 3, 11, 11, 9, 18, 13, 22, 7, 17, 11, 12, 24, 5, 19, 8, 9, 10, 23, 12, 19, 24, 12, 24, 17, 24, 17, 24, 22, 19, 13, 19, 22, 21, 22, 
7, 12, 17, 1, 23, 24, 11, 22, 3, 13, 12, 21, 19, 8, 21, 18, 6, 6, 24, 23, 23, 19, 23, 10, 13, 7, 22, 5, 12, 19, 23, 24, 19, 17, 23, 4, 12, 12, 3, 9, 13, 13, 1, 15, 11, 11, 22, 8, 12, 18, 3, 15, 13, 13, 18, 9, 17, 2, 17, 18, 13, 23, 4, 13, 15, 6, 22, 4, 13, 3, 24, 5, 12, 7, 24, 14, 15, 15, 9, 15, 3, 4, 7, 18, 13, 11, 24, 24, 9, 23, 13, 22, 15, 12, 7, 6, 24, 19, 20, 12, 9, 2, 14, 18, 4, 10, 24, 3, 18, 22, 12, 20, 8, 11, 13, 21, 23, 13, 21, 9, 23, 13, 19, 24, 18, 23, 12, 1, 8, 23, 4, 5, 24, 2, 17, 23, 2, 24, 15, 13, 10, 13, 6, 15, 12, 21, 1, 9, 6, 9, 24, 11, 23, 8, 19, 19, 10, 9, 7, 8, 12, 18, 7, 17, 3, 10, 22, 3, 3, 1, 13, 11, 12, 18, 15, 19, 22, 17, 8, 24, 24, 4, 5, 22, 14, 13, 5, 6, 19, 12, 22, 23, 8, 11, 11, 15, 24, 13, 15, 10, 12, 12, 22, 12, 24, 2, 7, 12, 23, 10, 18, 12, 12, 5, 24, 12, 8, 15, 17, 24, 13, 17, 18, 9, 24, 9, 23, 14, 18, 8, 4, 10, 7, 21, 19, 17, 23, 15, 22, 22, 18, 23, 18, 18, 22, 23, 8, 23, 7, 6, 22, 4, 12, 19, 24, 18, 8, 15, 1, 23, 11, 11, 17, 12, 15, 15, 19, 8, 0, 18, 11, 25, 22, 18, 7, 19, 2, 4, 24, 13, 9, 19, 19, 10, 11, 19, 13, 2, 12, 24, 19, 17, 12, 15, 17, 25, 23, 18, 12, 12, 15, 21, 14, 15, 22, 6, 24, 15, 19, 18, 15, 24, 23, 23, 6, 7, 7, 2, 8, 9, 5, 12, 8, 7, 2, 10, 8, 14, 13, 15, 18, 3, 10, 23, 6, 8, 24, 22, 15, 2, 2, 2, 2, 10, 13, 3, 9, 13, 22, 5, 23, 8, 18, 12, 8, 13, 22, 3, 11, 13, 22, 19, 13, 2, 2, 19, 25, 18, 25, 17, 15, 9, 17, 18, 13, 24, 8, 23, 13, 13, 15, 8, 12, 19, 13, 13, 4, 22, 10, 21, 13, 24, 8, 8, 9, 11, 22, 1, 6, 15, 15, 13, 13, 22, 25, 19, 23, 12, 8, 13, 12, 24, 4, 15, 19, 10, 7, 24, 4, 17, 12, 17, 24, 13, 24, 13, 18, 22, 12, 2, 15, 12, 19, 11, 1, 23, 13, 24, 15, 13, 13, 23, 22, 13, 15, 8, 15, 12, 13, 13, 22, 11, 4, 19, 12, 18, 6, 3, 23, 4, 8, 12, 13, 7, 23, 12, 17, 22, 22, 15, 24, 11, 22, 8, 18, 22, 13, 15, 9, 7, 24, 23, 10, 5, 23, 1, 23, 9, 10, 15, 10, 25, 25, 13, 15, 14, 12, 18, 13, 11, 8, 18, 11, 12, 15, 22, 10, 25, 2, 6, 17, 14, 15, 14, 11, 12, 13, 11, 8, 24, 15, 23, 17, 10, 23, 23, 10, 17, 4, 7, 13, 3, 14, 12, 22, 19, 
23, 25, 15, 21, 22, 12, 10, 1, 5, 21, 13, 15, 22, 22, 8, 18, 12, 1, 13, 23, 2, 18, 12, 4, 7, 13, 13, 24, 14, 12, 13, 13, 15, 4, 24, 13, 25, 12, 15, 24, 4, 4, 7, 18, 2, 12, 22, 15, 11, 23, 15, 15, 13, 23, 17, 1, 23, 13, 21, 19, 8, 13, 11, 15, 18, 24, 17, 22, 24, 10, 15, 18, 10, 5, 3, 17, 15, 17, 17, 23, 12, 24, 14, 12, 10, 23, 15, 12, 13, 1, 8, 17, 13, 2, 5, 19, 25, 12, 15, 13, 13, 24, 12, 17, 8, 15, 22, 2, 18, 12, 17, 18, 17, 18, 8, 12, 22, 8, 15, 19, 20, 12, 5, 17, 22, 12, 24, 7, 8, 13, 12, 7, 11, 8, 19, 15, 23, 12, 18, 19, 5, 19, 24, 19, 18, 2, 4, 7, 17, 19, 15, 8, 13, 15, 12, 23, 13, 24, 3, 11, 17, 15, 22, 15, 15, 22, 20, 24, 13, 5, 1, 19, 14, 5, 15, 18, 24, 24, 11, 22, 15, 3, 4, 9, 13, 3, 3, 23, 19, 19, 22, 17, 18, 18, 18, 7, 13, 24, 13, 5, 17, 24, 22, 24, 15, 24, 23, 5, 12, 22, 22, 19, 15, 12, 23, 24, 19, 19, 13, 15, 18, 15, 12, 3, 18, 15, 19, 3, 17, 24, 9, 8, 22, 8, 17, 8, 15, 25, 11, 19, 18, 15, 23, 14, 19, 18, 12, 19, 2, 19, 9, 14, 22, 24, 12, 14, 3, 4, 21, 19, 17, 21, 3, 9, 23, 23, 24, 15, 13, 11, 10, 12, 9, 18, 22, 24, 16, 7, 4, 24, 3, 12, 24, 18, 12, 13, 19, 18, 8, 2, 8, 9, 6, 17, 19, 2, 12, 7, 23, 17, 20, 13, 12, 24, 5, 18, 9, 13, 24, 9, 13, 18, 23, 24, 18, 22, 13, 6, 12, 9, 15, 5, 9, 13, 19, 19, 23, 3, 10, 19, 15, 3, 15, 25, 5, 12, 3, 10, 10, 13, 23, 1, 13, 22, 17, 17, 15, 8, 20, 22, 3, 5, 24, 11, 18, 17, 5, 13, 15, 24, 24, 23, 10, 23, 13, 13, 22, 22, 18, 7, 3, 10, 18, 9, 22, 2, 8, 24, 8, 3, 13, 13, 24, 12, 12, 23, 17, 23, 10, 8, 18, 22, 18, 11, 15, 15, 13, 17, 12, 25, 22, 7, 23, 24, 23, 19, 13, 23, 18, 13, 13, 13, 19, 24, 11, 12]
# Replace zero month counts with 1 to avoid dividing by zero
for i in range(0,len(period_month)):
    if period_month[i]==0:
        period_month[i]=1
# Second printout of the month counts
print(period_month)
[2, 11, 4, 22, 13, 3, 9, 15, 8, 7, 17, 12, 24, 23, 24, 17, 22, 17, 17, 5, 24, 13, 18, 18, 11, 24, 13, 13, 9, 22, 8, 22, 22, 11, 11, 12, 15, 13, 6, 20, 17, 13, 13, 22, 8, 15, 4, 24, 11, 10, 24, 13, 12, 18, 13, 15, 13, 13, 13, 9, 12, 23, 11, 12, 24, 24, 23, 24, 10, 17, 11, 24, 24, 6, 22, 24, 19, 8, 12, 18, 12, 2, 19, 25, 6, 6, 10, 17, 12, 10, 5, 25, 15, 12, 9, 18, 8, 7, 18, 23, 18, 8, 22, 9, 3, 17, 3, 9, 7, 5, 3, 10, 9, 20, 12, 11, 24, 23, 18, 17, 23, 1, 15, 8, 9, 4, 24, 22, 13, 20, 22, 11, 15, 10, 15, 22, 11, 5, 12, 12, 19, 1, 13, 6, 9, 9, 15, 19, 19, 19, 9, 10, 17, 15, 17, 5, 24, 10, 9, 3, 23, 22, 13, 15, 15, 12, 24, 11, 9, 15, 22, 11, 8, 22, 12, 12, 22, 6, 22, 11, 18, 8, 22, 2, 4, 13, 23, 23, 23, 23, 15, 9, 23, 24, 23, 24, 9, 13, 7, 23, 12, 8, 10, 12, 23, 22, 10, 10, 23, 9, 19, 3, 15, 13, 12, 13, 13, 15, 10, 17, 9, 15, 13, 13, 15, 17, 12, 13, 13, 19, 11, 17, 3, 3, 18, 12, 13, 15, 15, 19, 9, 15, 10, 8, 13, 12, 22, 17, 17, 15, 5, 12, 15, 23, 18, 13, 17, 24, 11, 22, 13, 5, 14, 5, 5, 13, 15, 11, 7, 11, 24, 9, 7, 13, 13, 17, 15, 6, 14, 18, 23, 11, 24, 19, 8, 25, 11, 17, 13, 7, 23, 13, 22, 15, 24, 3, 22, 7, 17, 7, 19, 24, 12, 12, 22, 1, 12, 17, 12, 24, 10, 17, 6, 19, 15, 12, 18, 15, 12, 19, 19, 19, 23, 24, 17, 3, 13, 11, 12, 12, 6, 13, 13, 6, 18, 18, 20, 19, 4, 1, 18, 11, 17, 13, 7, 8, 18, 19, 12, 23, 13, 23, 10, 23, 24, 11, 10, 15, 19, 19, 11, 17, 2, 21, 13, 22, 15, 3, 13, 24, 20, 15, 17, 11, 1, 7, 4, 12, 12, 12, 17, 12, 18, 3, 22, 23, 8, 23, 18, 10, 17, 13, 12, 23, 13, 7, 24, 23, 21, 18, 10, 24, 7, 18, 23, 5, 22, 8, 11, 13, 7, 8, 9, 7, 13, 18, 15, 9, 8, 5, 3, 7, 15, 15, 5, 20, 22, 25, 9, 19, 15, 24, 24, 14, 11, 13, 4, 13, 19, 2, 7, 13, 24, 8, 12, 11, 12, 11, 11, 12, 10, 3, 17, 3, 11, 7, 17, 6, 12, 11, 8, 12, 11, 15, 4, 17, 22, 3, 11, 13, 19, 3, 18, 12, 20, 13, 2, 10, 12, 12, 13, 2, 21, 24, 12, 24, 23, 15, 17, 13, 17, 15, 22, 25, 24, 2, 3, 3, 11, 11, 9, 18, 13, 22, 7, 17, 11, 12, 24, 5, 19, 8, 9, 10, 23, 12, 19, 24, 12, 24, 17, 24, 17, 24, 22, 19, 13, 19, 22, 21, 22, 
7, 12, 17, 1, 23, 24, 11, 22, 3, 13, 12, 21, 19, 8, 21, 18, 6, 6, 24, 23, 23, 19, 23, 10, 13, 7, 22, 5, 12, 19, 23, 24, 19, 17, 23, 4, 12, 12, 3, 9, 13, 13, 1, 15, 11, 11, 22, 8, 12, 18, 3, 15, 13, 13, 18, 9, 17, 2, 17, 18, 13, 23, 4, 13, 15, 6, 22, 4, 13, 3, 24, 5, 12, 7, 24, 14, 15, 15, 9, 15, 3, 4, 7, 18, 13, 11, 24, 24, 9, 23, 13, 22, 15, 12, 7, 6, 24, 19, 20, 12, 9, 2, 14, 18, 4, 10, 24, 3, 18, 22, 12, 20, 8, 11, 13, 21, 23, 13, 21, 9, 23, 13, 19, 24, 18, 23, 12, 1, 8, 23, 4, 5, 24, 2, 17, 23, 2, 24, 15, 13, 10, 13, 6, 15, 12, 21, 1, 9, 6, 9, 24, 11, 23, 8, 19, 19, 10, 9, 7, 8, 12, 18, 7, 17, 3, 10, 22, 3, 3, 1, 13, 11, 12, 18, 15, 19, 22, 17, 8, 24, 24, 4, 5, 22, 14, 13, 5, 6, 19, 12, 22, 23, 8, 11, 11, 15, 24, 13, 15, 10, 12, 12, 22, 12, 24, 2, 7, 12, 23, 10, 18, 12, 12, 5, 24, 12, 8, 15, 17, 24, 13, 17, 18, 9, 24, 9, 23, 14, 18, 8, 4, 10, 7, 21, 19, 17, 23, 15, 22, 22, 18, 23, 18, 18, 22, 23, 8, 23, 7, 6, 22, 4, 12, 19, 24, 18, 8, 15, 1, 23, 11, 11, 17, 12, 15, 15, 19, 8, 1, 18, 11, 25, 22, 18, 7, 19, 2, 4, 24, 13, 9, 19, 19, 10, 11, 19, 13, 2, 12, 24, 19, 17, 12, 15, 17, 25, 23, 18, 12, 12, 15, 21, 14, 15, 22, 6, 24, 15, 19, 18, 15, 24, 23, 23, 6, 7, 7, 2, 8, 9, 5, 12, 8, 7, 2, 10, 8, 14, 13, 15, 18, 3, 10, 23, 6, 8, 24, 22, 15, 2, 2, 2, 2, 10, 13, 3, 9, 13, 22, 5, 23, 8, 18, 12, 8, 13, 22, 3, 11, 13, 22, 19, 13, 2, 2, 19, 25, 18, 25, 17, 15, 9, 17, 18, 13, 24, 8, 23, 13, 13, 15, 8, 12, 19, 13, 13, 4, 22, 10, 21, 13, 24, 8, 8, 9, 11, 22, 1, 6, 15, 15, 13, 13, 22, 25, 19, 23, 12, 8, 13, 12, 24, 4, 15, 19, 10, 7, 24, 4, 17, 12, 17, 24, 13, 24, 13, 18, 22, 12, 2, 15, 12, 19, 11, 1, 23, 13, 24, 15, 13, 13, 23, 22, 13, 15, 8, 15, 12, 13, 13, 22, 11, 4, 19, 12, 18, 6, 3, 23, 4, 8, 12, 13, 7, 23, 12, 17, 22, 22, 15, 24, 11, 22, 8, 18, 22, 13, 15, 9, 7, 24, 23, 10, 5, 23, 1, 23, 9, 10, 15, 10, 25, 25, 13, 15, 14, 12, 18, 13, 11, 8, 18, 11, 12, 15, 22, 10, 25, 2, 6, 17, 14, 15, 14, 11, 12, 13, 11, 8, 24, 15, 23, 17, 10, 23, 23, 10, 17, 4, 7, 13, 3, 14, 12, 22, 19, 
23, 25, 15, 21, 22, 12, 10, 1, 5, 21, 13, 15, 22, 22, 8, 18, 12, 1, 13, 23, 2, 18, 12, 4, 7, 13, 13, 24, 14, 12, 13, 13, 15, 4, 24, 13, 25, 12, 15, 24, 4, 4, 7, 18, 2, 12, 22, 15, 11, 23, 15, 15, 13, 23, 17, 1, 23, 13, 21, 19, 8, 13, 11, 15, 18, 24, 17, 22, 24, 10, 15, 18, 10, 5, 3, 17, 15, 17, 17, 23, 12, 24, 14, 12, 10, 23, 15, 12, 13, 1, 8, 17, 13, 2, 5, 19, 25, 12, 15, 13, 13, 24, 12, 17, 8, 15, 22, 2, 18, 12, 17, 18, 17, 18, 8, 12, 22, 8, 15, 19, 20, 12, 5, 17, 22, 12, 24, 7, 8, 13, 12, 7, 11, 8, 19, 15, 23, 12, 18, 19, 5, 19, 24, 19, 18, 2, 4, 7, 17, 19, 15, 8, 13, 15, 12, 23, 13, 24, 3, 11, 17, 15, 22, 15, 15, 22, 20, 24, 13, 5, 1, 19, 14, 5, 15, 18, 24, 24, 11, 22, 15, 3, 4, 9, 13, 3, 3, 23, 19, 19, 22, 17, 18, 18, 18, 7, 13, 24, 13, 5, 17, 24, 22, 24, 15, 24, 23, 5, 12, 22, 22, 19, 15, 12, 23, 24, 19, 19, 13, 15, 18, 15, 12, 3, 18, 15, 19, 3, 17, 24, 9, 8, 22, 8, 17, 8, 15, 25, 11, 19, 18, 15, 23, 14, 19, 18, 12, 19, 2, 19, 9, 14, 22, 24, 12, 14, 3, 4, 21, 19, 17, 21, 3, 9, 23, 23, 24, 15, 13, 11, 10, 12, 9, 18, 22, 24, 16, 7, 4, 24, 3, 12, 24, 18, 12, 13, 19, 18, 8, 2, 8, 9, 6, 17, 19, 2, 12, 7, 23, 17, 20, 13, 12, 24, 5, 18, 9, 13, 24, 9, 13, 18, 23, 24, 18, 22, 13, 6, 12, 9, 15, 5, 9, 13, 19, 19, 23, 3, 10, 19, 15, 3, 15, 25, 5, 12, 3, 10, 10, 13, 23, 1, 13, 22, 17, 17, 15, 8, 20, 22, 3, 5, 24, 11, 18, 17, 5, 13, 15, 24, 24, 23, 10, 23, 13, 13, 22, 22, 18, 7, 3, 10, 18, 9, 22, 2, 8, 24, 8, 3, 13, 13, 24, 12, 12, 23, 17, 23, 10, 8, 18, 22, 18, 11, 15, 15, 13, 17, 12, 25, 22, 7, 23, 24, 23, 19, 13, 23, 18, 13, 13, 13, 19, 24, 11, 12]
# Compute F: number of payments per month
data['F(月平消费次数)']=data['付款次数']/period_month
data
                总金额      订单付款时间  付款次数  R(最后一次消费时间)  F(月平消费次数)
买家会员名
00牛哥哥00       402.00  2017-02-06     2  693 days  1.000000
020luo           74.70  2017-11-18     1  408 days  0.090909
0587xueguangju  268.00  2017-04-14     1  626 days  0.250000
0o秋天de童话     411.50  2018-10-09     2   83 days  0.090909
0残缺0           48.86  2018-01-19     1  346 days  0.076923
...                ...         ...   ...       ...       ...
黑河市2013        47.88  2018-01-11     1  354 days  0.076923
黑瑾瞳           158.44  2018-07-26     2  158 days  0.105263
鼠标右键点        51.87  2018-12-12     1   19 days  0.041667
龙星宇1018       198.00  2017-11-17     1  409 days  0.090909
龙魂爱上凤灵      43.86  2017-12-13     1  383 days  0.083333

1483 rows × 5 columns
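
The month count behind F can be written as a single function. This is a minimal sketch of the same rule as in the text (30-day periods, rounded up, with zero replaced by 1); the function name `months_elapsed` is made up for illustration.

```python
from datetime import datetime
from math import ceil

def months_elapsed(last_paid, start=datetime(2017, 1, 2)):
    """Number of 30-day periods between start and last_paid, never below 1."""
    days = (last_paid - start).days
    return max(1, ceil(days / 30))

print(months_elapsed(datetime(2017, 2, 6)))    # 35 days after the start date
print(months_elapsed(datetime(2017, 11, 18)))  # 320 days after the start date
print(months_elapsed(datetime(2017, 1, 2)))    # 0 days, raised to 1
```

Taking `max(1, ...)` inside the function removes the need for a second pass over the list to replace zeros.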

(8) Compute M and standardize the data

data['m(月平均消费金额)']=data['总金额']/period_month
data
                总金额      订单付款时间  付款次数  R(最后一次消费时间)  F(月平消费次数)  m(月平均消费金额)
买家会员名
00牛哥哥00       402.00  2017-02-06     2  693 days  1.000000  201.000000
020luo           74.70  2017-11-18     1  408 days  0.090909    6.790909
0587xueguangju  268.00  2017-04-14     1  626 days  0.250000   67.000000
0o秋天de童话     411.50  2018-10-09     2   83 days  0.090909   18.704545
0残缺0           48.86  2018-01-19     1  346 days  0.076923    3.758462
...                ...         ...   ...       ...       ...         ...
黑河市2013        47.88  2018-01-11     1  354 days  0.076923    3.683077
黑瑾瞳           158.44  2018-07-26     2  158 days  0.105263    8.338947
鼠标右键点        51.87  2018-12-12     1   19 days  0.041667    2.161250
龙星宇1018       198.00  2017-11-17     1  409 days  0.090909   18.000000
龙魂爱上凤灵      43.86  2017-12-13     1  383 days  0.083333    3.655000

1483 rows × 6 columns

# Select the three features to standardize
cdata=data[['R(最后一次消费时间)','F(月平消费次数)','m(月平均消费金额)']]
# Keep the buyer names as the index
cdata.index = data.index
cdata
                R(最后一次消费时间)  F(月平消费次数)  m(月平均消费金额)
买家会员名
00牛哥哥00       693 days  1.000000  201.000000
020luo          408 days  0.090909    6.790909
0587xueguangju  626 days  0.250000   67.000000
0o秋天de童话      83 days  0.090909   18.704545
0残缺0          346 days  0.076923    3.758462
...                  ...       ...         ...
黑河市2013       354 days  0.076923    3.683077
黑瑾瞳           158 days  0.105263    8.338947
鼠标右键点         19 days  0.041667    2.161250
龙星宇1018       409 days  0.090909   18.000000
龙魂爱上凤灵      383 days  0.083333    3.655000

1483 rows × 3 columns

z_cdata=(cdata-cdata.mean())/cdata.std()
# Rename the columns
z_cdata.columns=['R(标准化)','F(标准化)','m(标准化)']
z_cdata
                R(标准化)   F(标准化)   m(标准化)
买家会员名
00牛哥哥00        1.926851   4.432309   2.781167
020luo           0.469973  -0.211456  -0.304766
0587xueguangju   1.584357   0.601203   0.651941
0o秋天de童话     -1.191378  -0.211456  -0.115461
0残缺0           0.153038  -0.282899  -0.352951
...                   ...        ...        ...
黑河市2013        0.193933  -0.282899  -0.354149
黑瑾瞳           -0.807990  -0.138134  -0.280168
鼠标右键点       -1.518537  -0.462993  -0.378330
龙星宇1018        0.475085  -0.211456  -0.126656
龙魂爱上凤灵      0.342177  -0.250154  -0.354595

1483 rows × 3 columns
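
The standardization above is a z-score: subtract the column mean and divide by the column standard deviation. pandas' `std()` uses the sample standard deviation (denominator n - 1) by default, which is also what `statistics.stdev` computes. The sketch below applies the same formula to the first five R values from the table; the results differ from the table because the table standardizes over all 1483 customers.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize values to zero mean and unit sample standard deviation."""
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation, n - 1 in the denominator
    return [(v - mu) / sigma for v in values]

recency = [693, 408, 626, 83, 346]  # first five R values from the table, in days
z = z_scores(recency)
print([round(v, 3) for v in z])
```

Standardizing matters for K-Means because the algorithm uses Euclidean distance, so a feature measured in hundreds of days would otherwise dominate one measured in fractions of a payment per month.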

(9) Save the preprocessed file

data.to_csv('/data/bigfiles/client.csv')

2. Data analysis

(1) Read the preprocessed file

data=pd.read_csv('/data/bigfiles/client.csv')

(2) Choose k with the elbow method (shown as a plot)


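# SSE records, for each k, the sum of squared Euclidean distances from the samples to their nearest cluster center
SSE=[]
# Cluster into 1 to 8 groups
for k in range(1,9):
    estimator =KMeans(n_clusters=k)
    estimator.fit(z_cdata)
    # Sum of squared distances from the samples to their nearest cluster center
    SSE.append(estimator.inertia_)
# x-axis values
X=range(1,9)
# Use a font that can render the Chinese title
plt.rcParams['font.sans-serif']=['SimHei']
# Draw the plot
plt.plot(X,SSE,'o-')
plt.xlabel('k')
plt.ylabel('SSE')
plt.title("肘部图")
plt.show()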

[Figure: elbow plot (肘部图) of SSE against k]
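
The quantity plotted in the elbow chart, `inertia_`, is the sum of squared distances from each sample to its nearest cluster center. The sketch below computes that sum by hand on four made-up 2-D points, to show why it drops sharply once the number of centers matches the structure of the data; the helper name `sse` is illustrative.

```python
def sse(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    total = 0.0
    for p in points:
        total += min(
            sum((a - b) ** 2 for a, b in zip(p, c))
            for c in centers
        )
    return total

# Two tight pairs of points, far apart from each other.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

print(sse(points, [(5.0, 1.0)]))               # one center in the middle
print(sse(points, [(0.0, 1.0), (10.0, 1.0)]))  # one center per pair
```

Going from one center to two cuts the SSE from 104 to 4 here; adding more centers can only shave off the little that remains, which is the bend the elbow method looks for.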



(3) Build the KMeans model

# Cluster analysis
kmodel=KMeans(n_clusters=4,n_init=4,max_iter=100,random_state = 0)
kmodel.fit(z_cdata)
KMeans(max_iter=100, n_clusters=4, n_init=4, random_state=0)

(4) Output the centroid of each cluster

# Cluster label assigned to each row
kmodel.labels_
# Coordinates of the cluster centers
kmodel.cluster_centers_
array([[ 1.57670505,  1.17239812,  0.98112868],
       [-1.03013307, -0.37085365, -0.28728299],
       [ 0.43389504, -0.13530733, -0.18934963],
       [ 1.70098269,  4.71659247,  5.44718135]])
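
K-Means assigns a sample to the cluster whose center is closest in Euclidean distance. As a rough illustration, the sketch below takes the four centers printed above and finds the nearest one for two standardized rows from the earlier table; the helper `nearest_center` is made up for this example and is not part of Sklearn.

```python
# Cluster centers as printed by kmodel.cluster_centers_ (columns: R, F, M).
centers = [
    (1.57670505, 1.17239812, 0.98112868),
    (-1.03013307, -0.37085365, -0.28728299),
    (0.43389504, -0.13530733, -0.18934963),
    (1.70098269, 4.71659247, 5.44718135),
]

def nearest_center(point, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return dists.index(min(dists))

# Standardized (R, F, M) rows taken from the z_cdata table.
print(nearest_center((-1.191378, -0.211456, -0.115461), centers))  # 0o秋天de童话
print(nearest_center((1.926851, 4.432309, 2.781167), centers))     # 00牛哥哥00
```

Read this way, center 1 (all three values below average) describes recent but low-frequency, low-spend customers, while center 3 describes a small group with very high frequency and spend.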

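(5) Save the customer type file

# Number of customers in each cluster
r1=pd.Series(kmodel.labels_).value_counts()
r2=pd.DataFrame(kmodel.cluster_centers_)
# Join the cluster centers with the cluster sizes
result=pd.concat([r2,r1],axis=1)
# Rename the columns (the last column, 类别, holds the number of customers in the cluster)
result.columns=['R','F','M']+['类别']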
result
           R          F          M  类别
0   1.576705   1.172398   0.981129  157
1  -1.030133  -0.370854  -0.287283  587
2   0.433895  -0.135307  -0.189350  712
3   1.700983   4.716592   5.447181   27
# Attach labels_ to z_cdata and to data
KM_data=pd.concat([z_cdata,pd.Series(kmodel.labels_,index=z_cdata.index)],axis=1)
data1=pd.concat([data,pd.Series(kmodel.labels_,index=data.index)],axis=1)
# Rename the columns
data1.columns=list(data.columns)+['类别']
KM_data.columns=['R','F','M']+['类别']
KM_data.head()
# Add the buyer name as a column next to the cluster label
KM_data['买家会员名']=KM_data.index

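3. Data visualization (plot the standardized R, F, and M of each customer type)

# Mean of each feature per cluster (numeric columns only, so the buyer name column is skipped)
kmeans_analysis =KM_data.groupby(KM_data['类别']).mean(numeric_only=True)
# Rename the columns
kmeans_analysis.columns=['R','F','M']
kmeans_analysis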
             R          F          M
类别
0     1.580417   1.183741   0.988757
1    -1.030133  -0.370854  -0.287283
2     0.436287  -0.134135  -0.187744
3     1.700983   4.716592   5.447181
# Draw the bar chart
kmeans_analysis.plot(kind ='bar',rot=0,yticks=range(-1,9))
# Finish the chart: title, tick labels, grid, and axis label
plt.title("聚类结果统计柱状图")
plt.xticks(range(0,4),['第0类','第1类','第2类','第3类'])
plt.grid(axis='y',color='grey',linestyle='--',alpha=0.5)
plt.ylabel("R,F,M 3个指标均值")
plt.savefig("聚类结果统计柱状图",dpi=128)

[Figure: bar chart (聚类结果统计柱状图) of the mean standardized R, F, and M for clusters 0 to 3]

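4. Analysis and evaluation

Summary:
In this experiment we processed order data with numpy and pandas and used the RFM model to extract features from customer records. We preprocessed the data, computed the three features R, F, and M, saved the result, and standardized the features.
We then built a K-Means model with Sklearn on the RFM features, chose k with the elbow method, clustered the customers by value, and plotted the per-cluster feature means with pandas and matplotlib.
The experiment gave us a better understanding of customer value analysis and practice with the data processing and analysis methods it relies on, which is a solid basis for later data analysis work.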
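
The objectives mention evaluating the clustering. Besides the elbow plot, one commonly used score is the silhouette coefficient, available in Sklearn as `silhouette_score`. It was not computed in the original experiment; the sketch below only shows the call on synthetic data that stands in for `z_cdata`, so the numbers say nothing about the real customers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Synthetic stand-in for z_cdata: two well-separated groups in 3 dimensions.
X = np.vstack([
    rng.normal(loc=-2.0, scale=0.3, size=(50, 3)),
    rng.normal(loc=2.0, scale=0.3, size=(50, 3)),
])

# Silhouette needs at least 2 clusters, so start from k = 2.
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

Values close to 1 indicate compact, well-separated clusters. On the real data, replacing `X` with `z_cdata` would give a second opinion on the choice of k = 4.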

