【SQL】相同ip集的用户对

zjk 发布于 2024-03-29 107 次阅读


问题

现在有一张用户登陆日志表,该表包括user_id,ip,log_time三个字段,现在需要找出共同使用ip数量超过3个(含)的所有用户对。比如下面的示例数据,101和102用户共同使用的ip为4个,101和103用户共同使用的ip为3个,102和103用户共同使用的ip为3个。

user_id       ip        log_time
101	192.168.10.101	2022-05-10 11:00:00	
101	192.168.10.101	2022-05-10 11:01:00	
101	192.168.10.102	2022-05-10 11:02:00	
101	192.168.10.103	2022-05-10 11:03:00	
101	192.168.10.104	2022-05-10 11:04:00	
102	192.168.10.101	2022-05-10 11:04:30	
102	192.168.10.102	2022-05-10 11:05:00	
102	192.168.10.103	2022-05-10 11:06:00	
102	192.168.10.104	2022-05-10 11:07:00	
103	192.168.10.102	2022-05-10 11:08:00	
103	192.168.10.103	2022-05-10 11:08:00	
103	192.168.10.104	2022-05-10 11:10:00	
104	192.168.10.103	2022-05-10 11:11:00	
104	192.168.10.104	2022-05-10 11:12:00	
105	192.168.10.105	2022-05-10 11:13:00

解答

类似相同好友问题,关键点是使用自连接。先对用户+ip去重之后,使用ip进行自关联,得到的t1.user_id和t2.user_id的关系就是有同一个ip。按这个ip进行聚合,出现的就是t1和t2的用户相同的ip集合,判断这个集合大小即可。

with user_login as (
select 101 as user_id ,'192.168.10.101' as ip ,'2022-05-10 11:00:00' as log_time
union all                                                            
select 101 as user_id ,'192.168.10.101' as ip ,'2022-05-10 11:01:00' as log_time
union all                                                            
select 101 as user_id ,'192.168.10.102' as ip ,'2022-05-10 11:02:00' as log_time
union all                                                            
select 101 as user_id ,'192.168.10.103' as ip ,'2022-05-10 11:03:00' as log_time
union all                                                            
select 101 as user_id ,'192.168.10.104' as ip ,'2022-05-10 11:04:00' as log_time
union all                                                            
select 102 as user_id ,'192.168.10.101' as ip ,'2022-05-10 11:04:30' as log_time
union all                                                            
select 102 as user_id ,'192.168.10.102' as ip ,'2022-05-10 11:05:00' as log_time
union all                                                            
select 102 as user_id ,'192.168.10.103' as ip ,'2022-05-10 11:06:00' as log_time
union all                                                            
select 102 as user_id ,'192.168.10.104' as ip ,'2022-05-10 11:07:00' as log_time
union all                                                            
select 103 as user_id ,'192.168.10.102' as ip ,'2022-05-10 11:08:00' as log_time
union all                                                            
select 103 as user_id ,'192.168.10.103' as ip ,'2022-05-10 11:08:00' as log_time
union all                                                            
select 103 as user_id ,'192.168.10.104' as ip ,'2022-05-10 11:10:00' as log_time
union all                                                            
select 104 as user_id ,'192.168.10.103' as ip ,'2022-05-10 11:11:00' as log_time
union all                                                            
select 104 as user_id ,'192.168.10.104' as ip ,'2022-05-10 11:12:00' as log_time
union all                                                            
select 105 as user_id ,'192.168.10.105' as ip ,'2022-05-10 11:13:00' as log_time
),
tmp as
(
    select 
    user_id
    ,ip
    from user_login --实际换成自己的表或上面的样例数据
    group by user_id,ip  --同一个ip同一用户可能多次登录,先去重
)

select
t1.user_id
,t2.user_id
,count(t1.ip) as ip_cnt
from tmp t1
inner join tmp t2
on t1.ip=t2.ip  --通过ip自关联
where t1.user_id<t2.user_id  --因为存在101对102,102对101的情况,保留一种即可
group by t1.user_id,t2.user_id
having ip_cnt>=3  --保留用户对ip数量超过3个的(含)