前提
書籍:ビックデータ分析・活用のためのSQLレシピ 5.2.1.2のコードをTRYしていますが
AnalysisExceptionエラーが発生しています。
実現したいこと
エラーの原因を解消したい。
発生している問題・エラーメッセージ
--------------------------------------------------------------------------- AnalysisException Traceback (most recent call last) Cell In [40], line 1 ----> 1 spark.sql(""" 2 WITH mst_users_with_year_month AS ( 3 SELECT 4 *, 5 substr(register_date, 1, 7) AS year_month 6 FROM test) 7 SELECT 8 year_month, 9 COUNT(distinct user_id) AS register_count, 10 LAG(COUNT(DISTINCT user_id)) OVER(ORDER BY year_month ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS last_month_count, 11 1 * COUNT(distinct user_id) / LAG(COUNT(DISTINCT user_id)) 12 OVER(ORDER BY year_month ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS month_over_month_ratio 13 FROM mst_users_with_year_month 14 GROUP BY year_month 15 """).show() File /opt/homebrew/Caskroom/miniforge/base/envs/arm_pyspark_env/lib/python3.8/site-packages/pyspark/sql/session.py:1034, in SparkSession.sql(self, sqlQuery, **kwargs) 1032 sqlQuery = formatter.format(sqlQuery, **kwargs) 1033 try: -> 1034 return DataFrame(self._jsparkSession.sql(sqlQuery), self) 1035 finally: 1036 if len(kwargs) > 0: File /opt/homebrew/Caskroom/miniforge/base/envs/arm_pyspark_env/lib/python3.8/site-packages/py4j/java_gateway.py:1321, in JavaMember.__call__(self, *args) 1315 command = proto.CALL_COMMAND_NAME +\ 1316 self.command_header +\ 1317 args_command +\ 1318 proto.END_COMMAND_PART 1320 answer = self.gateway_client.send_command(command) -> 1321 return_value = get_return_value( 1322 answer, self.gateway_client, self.target_id, self.name) 1324 for temp_arg in temp_args: 1325 temp_arg._detach() File /opt/homebrew/Caskroom/miniforge/base/envs/arm_pyspark_env/lib/python3.8/site-packages/pyspark/sql/utils.py:196, in capture_sql_exception.<locals>.deco(*a, **kw) 192 converted = convert_exception(e.java_exception) 193 if not isinstance(converted, UnknownException): 194 # Hide where the exception came from that shows a non-Pythonic 195 # JVM exception message. --> 196 raise converted from None 197 else: 198 raise AnalysisException: Cannot specify window frame for lag function
該当のソースコード
SparkSQL
※SQL文のみ spark.sql(""" WITH mst_users_with_year_month AS ( SELECT *, substr(register_date, 1, 7) AS year_month FROM test) SELECT year_month, COUNT(distinct user_id) AS register_count, LAG(COUNT(DISTINCT user_id)) OVER(ORDER BY year_month ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS last_month_count, 1 * COUNT(distinct user_id) / LAG(COUNT(DISTINCT user_id)) OVER(ORDER BY year_month ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS month_over_month_ratio FROM mst_users_with_year_month GROUP BY year_month """).show()
試したこと
LAG(COUNT(DISTINCT user_id)) OVER(ORDER BY year_month ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS last_month_count,
1 * COUNT(distinct user_id) / LAG(COUNT(DISTINCT user_id))
OVER(ORDER BY year_month ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS month_over_month_ratioを除外すればコードを動かすことはできた。
補足情報(FW/ツールのバージョンなど)
テーブル情報
DROP TABLE IF EXISTS mst_users; CREATE TABLE mst_users( user_id varchar(255) , sex varchar(255) , birth_date varchar(255) , register_date varchar(255) , register_device varchar(255) , withdraw_date varchar(255) ); INSERT INTO mst_users VALUES ('U001', 'M', '1977-06-17', '2016-10-01', 'pc' , NULL ) , ('U002', 'F', '1953-06-12', '2016-10-01', 'sp' , '2016-10-10') , ('U003', 'M', '1965-01-06', '2016-10-01', 'pc' , NULL ) , ('U004', 'F', '1954-05-21', '2016-10-05', 'pc' , NULL ) , ('U005', 'M', '1987-11-23', '2016-10-05', 'sp' , NULL ) , ('U006', 'F', '1950-01-21', '2016-10-10', 'pc' , '2016-10-10') , ('U007', 'F', '1950-07-18', '2016-10-10', 'app', NULL ) , ('U008', 'F', '2006-12-09', '2016-10-10', 'sp' , NULL ) , ('U009', 'M', '2004-10-23', '2016-10-15', 'pc' , NULL ) , ('U010', 'F', '1987-03-18', '2016-10-16', 'pc' , NULL ) , ('U011', 'F', '1993-10-21', '2016-10-18', 'pc' , NULL ) , ('U012', 'M', '1993-12-22', '2016-10-18', 'app', NULL ) , ('U013', 'M', '1988-02-09', '2016-10-20', 'app', NULL ) , ('U014', 'F', '1994-04-07', '2016-10-25', 'sp' , NULL ) , ('U015', 'F', '1994-03-01', '2016-11-01', 'app', NULL ) , ('U016', 'F', '1991-09-02', '2016-11-01', 'pc' , NULL ) , ('U017', 'F', '1972-05-21', '2016-11-01', 'app', NULL ) , ('U018', 'M', '2009-10-12', '2016-11-01', 'app', NULL ) , ('U019', 'M', '1957-05-18', '2016-11-01', 'pc' , NULL ) , ('U020', 'F', '1954-04-17', '2016-11-03', 'app', NULL ) , ('U021', 'M', '2002-08-14', '2016-11-03', 'sp' , NULL ) , ('U022', 'M', '1979-12-09', '2016-11-03', 'app', NULL ) , ('U023', 'M', '1992-01-12', '2016-11-04', 'sp' , NULL ) , ('U024', 'F', '1962-10-16', '2016-11-05', 'app', NULL ) , ('U025', 'F', '1958-06-26', '2016-11-05', 'app', NULL ) , ('U026', 'M', '1969-02-21', '2016-11-10', 'sp' , NULL ) , ('U027', 'F', '2001-07-10', '2016-11-10', 'pc' , NULL ) , ('U028', 'M', '1976-05-26', '2016-11-15', 'app', NULL ) , ('U029', 'M', '1964-04-06', '2016-11-28', 'pc' , NULL ) , ('U030', 'M', '1959-10-07', '2016-11-28', 'sp' , NULL ) ;
0 コメント