PySpark: drop a column if it exists

When working with PySpark DataFrames you frequently need to remove columns or rows, and often you want the operation to succeed even when a column may not exist. Specifically, we'll discuss how to drop rows and columns unconditionally, how to drop them based on a condition, and how to check for a column's existence first. The basics first: drop() is a transformation, so it returns a new DataFrame after dropping the rows or records from the current one rather than modifying it in place.

For row-level cleanup, a DataFrame (or Dataset) has a variable na, an instance of the DataFrameNaFunctions class, so null-handling methods are invoked as df.na.drop(). Consider an example DataFrame in which the type, city, and population columns contain nulls and most rows have NULL values except the record with id=4. Calling na.drop("any") removes every row that has a null in any column, leaving only that record. You can get the same effect for a single column with a filter, dataframe.where(dataframe.column.isNotNull()), which keeps only the rows where that column is populated. Alternatively, instead of dropping anything, you can explicitly name the columns you want to keep and select them, e.g. keep = [a.id, a.julian_date, a.user_id, b.quan_created_money, ...].
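A minimal sketch of these basics (the sample data and column names are illustrative assumptions; note that getOrCreate() itself follows the "if exists" pattern, returning an existing SparkSession if there is one and creating it otherwise):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("drop-example").getOrCreate()

# Hypothetical sample data: every row has a null somewhere except id=4
data = [(1, None, None), (2, "NY", None), (3, None, 100), (4, "LA", 250)]
df = spark.createDataFrame(data, ["id", "city", "population"])

# drop() is a transformation: df is untouched, df2 is a new DataFrame
df2 = df.drop("population")

# na.drop("any") removes every row that has a null in any column
clean = df.na.drop("any")        # only the id=4 row survives

# The equivalent single-column filter
clean_city = df.where(df.city.isNotNull())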
Dropping several columns works the same way. Modern PySpark accepts multiple names in one call, e.g. columns_to_drop = ['id', 'id_copy'] followed by df.drop(*columns_to_drop); only very old releases limited drop() to a single column name per call. A more flexible variant filters the names with a list comprehension. For example, the idea behind a banned_columns list is to drop any columns whose names start with "basket" or "cricket", plus any column containing the word "ball" anywhere in its name, and to form a new DataFrame out of the remaining columns. An easy way to do this is with select(): df.columns gives you all column names as a plain Python list, so you can compute the names to keep and select those. One pitfall: a comprehension that tests if i in df does not do what you expect; membership must be tested against df.columns, not the DataFrame itself. Finally, when a join leaves two columns with the same name (say, two id columns of type bigint and you want to delete one), resolve the ambiguity by renaming your id column before the join and dropping it afterwards using the keep list.
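A sketch of the banned-columns pattern (the prefixes and substring below are the assumptions from the example above; df is any existing DataFrame):

banned_prefixes = ("basket", "cricket")   # drop names starting with these
banned_substring = "ball"                 # drop names containing this anywhere

keep = [
    c for c in df.columns
    if not c.startswith(banned_prefixes) and banned_substring not in c
]
new_df = df.select(*keep)

# Equivalent with drop(): unpack the names to remove
new_df = df.drop(*[c for c in df.columns if c not in keep])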
What about dropping a column that may not exist? Since version 1.4 of Spark there has been a drop(col) function usable on a PySpark DataFrame, and when you pass the column name as a string, dropping a name that is absent is simply ignored: the call does not fail and has no effect. An error only arises if you force resolution first, for example by referencing the missing column as a Column object (df["missing"]) rather than by name. Because df.columns returns all column names as a list, the most explicit approach is to check membership before dropping. In one set of informal tests, filtering a candidates list, candidates = ['row_num', 'start_date', 'end_date', 'symbol'], against df.columns was at least as fast as the alternatives. This matters with irregular input, for instance when a JSON file sometimes lacks keys you try to fetch, such as ResponseType. You can also drop or select columns by slicing the column list: slice = data.columns[a:b]; data.select(slice).show(). And, as before, it is sometimes more convenient to reverse the operation and select only the desired columns you wish to keep in the resulting DataFrame, either by naming them explicitly or by excluding specific ones via a list comprehension.
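A sketch of both variants (the candidates list mirrors the informal test above; df is assumed to exist):

candidates = ["row_num", "start_date", "end_date", "symbol"]

# Variant 1: rely on the built-in behavior -- absent string names are ignored
df = df.drop(*candidates)

# Variant 2: be explicit and drop only the names the DataFrame actually has
existing = [c for c in candidates if c in df.columns]
df = df.drop(*existing)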
Existence checks have a few subtleties. Comparisons against df.columns are case sensitive, so to check case-insensitively, convert both the column name you want to check and all of the DataFrame's column names to upper case before comparing. Also, df.columns does not return columns nested inside a struct; if you have a DataFrame with nested struct columns, you can check whether a nested column exists by rendering the schema as a string with df.schema.simpleString() and searching that. The same question exists at other levels: Catalog.tableExists(tableName: str, dbName: Optional[str] = None) -> bool reports whether a table exists, and a small helper can walk your expected columns and, where one is missing, add it with None or a relevant datatype value so downstream code can rely on it. One practical note: when you build a keep-list programmatically, passing select() the column names as strings is the simplest form and avoids resolution problems with Column objects taken from another DataFrame.
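A sketch of these checks (the helper names has_column and has_nested_column are hypothetical, as are the column names):

def has_column(df, col_name):
    # Case-insensitive check against the top-level column names
    return col_name.upper() in [c.upper() for c in df.columns]

def has_nested_column(df, field_name):
    # df.columns omits nested struct fields, but the schema string
    # (e.g. 'struct<id:bigint,addr:struct<city:string>>') includes them;
    # a substring test is crude but workable
    return field_name in df.schema.simpleString()

if has_column(df, "CITY"):     # matches a column actually named 'city'
    df = df.drop("city")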
Maybe a little bit off topic, but the same idea works in Scala: make an Array of the column names from your oldDataFrame that you want to drop ("colExclude") and pass it to drop. Back in PySpark, row-level dropping is more configurable than it first appears. The full signature is drop(how='any', thresh=None, subset=None), so the method has three optional arguments for eliminating NULL values from single, any, all, or a chosen subset of columns: how='any' drops a row if any inspected column is null, how='all' drops it only when every inspected column is null, thresh keeps rows with at least that many non-null values, and subset restricts the check to particular columns. By default, drop() without arguments removes all rows that have null values in any column; dropna() behaves the same way. (Separately, pyspark.sql.functions.exists(col, f), new in version 3.1.0, tests whether a predicate holds for an element of an array column; despite the name, it is about array elements, not column existence.)
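A sketch of the row-level options (the subset column names are illustrative assumptions):

no_nulls  = df.na.drop()                    # how='any' is the default
all_nulls = df.na.drop(how="all")           # drop only fully-null rows
thresh2   = df.na.drop(thresh=2)            # keep rows with >= 2 non-null values
subset_ck = df.na.drop(how="any", subset=["city", "population"])

# dropna() is the same operation spelled as a DataFrame method
same = df.dropna(how="any", subset=["city", "population"])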
To summarize the column side: the most elegant way to drop columns is pyspark.sql.DataFrame.drop, which returns a new DataFrame with the specified columns removed; if a specified column does not exist, this is a no-op, meaning the operation won't fail and will have no effect at all. (For comparison, pandas' DataFrame.drop takes an errors parameter, {'ignore', 'raise'} with default 'raise', where 'ignore' suppresses the error for missing labels; PySpark's no-op behavior gives you the 'ignore' semantics by default.) If the number of columns you want to drop is greater than the number you want to keep, it makes more sense to perform a selection instead. And avoid approaches built on collect() for this kind of bookkeeping: collect() sends the complete dataset to the driver, which is a big computing effort.

The "if exists" question also comes up in SQL DDL, where a table name may be optionally qualified with a database name. ALTER TABLE ... DROP PARTITION supports an IF EXISTS clause, and note that one can use a typed literal (e.g., date'2019-01-02') in the partition spec; some of these statements are only supported with v2 tables. ALTER TABLE RECOVER PARTITIONS recovers all the partitions in the directory of a table and updates the Hive metastore, while the partition rename command clears the caches of all table dependents while keeping them as cached. If the table is cached, commands such as ALTER TABLE .. SET LOCATION clear the cached data of the table and all its dependents that refer to it; the cache will be lazily filled the next time the table or its dependents are accessed. Some restrictions hold regardless: you cannot drop or alter a primary key column or a column that participates in the table partitioning clause, and in systems with projections you cannot drop the first column of any projection sort order or columns that participate in a projection segmentation expression.
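A sketch of the SQL-side equivalents issued through spark.sql (the table name sales and partition column dt are hypothetical; spark.catalog.tableExists requires PySpark 3.3+):

# Drop a partition only if it is present; note the typed date literal
spark.sql("ALTER TABLE sales DROP IF EXISTS PARTITION (dt = date'2019-01-02')")

# Re-register partitions found under the table's directory in the metastore
spark.sql("ALTER TABLE sales RECOVER PARTITIONS")

# Guard other table-level DDL with an existence check first
if not spark.catalog.tableExists("sales", dbName="default"):
    print("Table 'sales' does not exist; skipping DDL")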
A harder variant of the problem: the check conditions are not static but are read from an external file and generated on the fly, so they may reference columns that the actual DataFrame does not have, which causes errors. The fix is the same membership test applied up front: intersect the externally supplied names with df.columns before building any expressions. This defensive pattern is equally useful in pipelines such as a Change Data Capture implementation in PySpark, where the incoming schema can drift between runs.
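A sketch of that up-front validation (the file name, its format, and df are assumptions):

# Hypothetical: names of columns to check arrive from an external file
with open("check_columns.txt") as f:
    requested = [line.strip() for line in f if line.strip()]

# Intersect with the DataFrame's real columns before building expressions
available = [c for c in requested if c in df.columns]
missing = set(requested) - set(available)
if missing:
    print(f"Skipping checks for absent columns: {sorted(missing)}")

# Apply the checks only over columns that exist
checked = df.na.drop(subset=available) if available else df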
In short: with string names, DataFrame.drop already gives you "drop if exists" semantics; df.columns (plus df.schema.simpleString() for nested fields) lets you test existence explicitly; na.drop()/dropna() handle the row-level cases; and IF EXISTS covers the SQL DDL side. All of these are transformations, so each returns a new DataFrame and leaves the original intact.
