I have a dataset (an Excel file, 1.4 GB) with 8 sheets, each containing 900,000 rows.
I want to convert its string columns to numeric values, and then normalize and scale the data.
Hi Ziyad R. Al Ashhab,
I think you should first tell us what kind of strings your data contains.
For example, there is a column that contains:
http://www.yahoo.com/mail
http://www.yahoo.com/123
http://www.yahoo.com/mail/yahoo
Hi Ziyad R. Al Ashhab,
You could try the pandas package for Python. It is easy to use and makes transforming your data straightforward. You can convert your string columns to categories, and then to numbers.
Here is some example code:
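import pandas as pd
# Load one sheet that has been exported from Excel to CSV
df = pd.read_csv("your_excel_file_in_CSV_format.csv")
# Replace each string column with its integer category codes
df["your_string_col_name1"] = pd.Categorical(df["your_string_col_name1"]).codes
df["your_string_col_name2"] = pd.Categorical(df["your_string_col_name2"]).codes
...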
Here is a dummy example based on your data:
df = pd.DataFrame({"col1": [1, 2, 3, 4, 5], "col2": [22.5, -3.45, 5.12, 48.00, 33.33], "col3": ["http://www.yahoo.com/123", "http://www.yahoo.com/mail", "http://www.yahoo.com/mail/yahoo", "http://www.yahoo.com/123", "http://www.yahoo.com/mail"]})
print(df)
df["col3"] = pd.Categorical(df["col3"]).codes
print(df)
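If you later need to translate the numbers back into the original strings, it may help to keep the code-to-string mapping around. Here is a minimal sketch of that idea; the `urls` series and the `mapping` name are only illustrative, built from the URLs quoted in this thread, and are not part of your real data.

```python
import pandas as pd

# Illustrative column using the URLs quoted in this thread (not real data).
urls = pd.Series(
    [
        "http://www.yahoo.com/123",
        "http://www.yahoo.com/mail",
        "http://www.yahoo.com/mail/yahoo",
        "http://www.yahoo.com/123",
        "http://www.yahoo.com/mail",
    ],
    name="col3",
)

cat = pd.Categorical(urls)

# One integer code per row; categories are sorted, so codes follow that order.
codes = cat.codes

# Lookup table from integer code back to the original string.
mapping = dict(enumerate(cat.categories))

print(codes.tolist())
print(mapping)
```

Note that the codes are just labels: their numeric order comes from sorting the strings and carries no other meaning.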
I hope I have been able to help you with this task.
@Francisco
Many thanks for your reply 👍⚘
Francisco M. Garcia-Moreno
Many thanks...
What about normalization/scaling, please?
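One possible way to approach this with pandas alone is sketched below, reusing the `col2` values from the dummy example above. It shows two common options, min-max normalization and z-score standardization; the new column names `col2_minmax` and `col2_zscore` are assumptions made for illustration, and which method suits your data depends on what you plan to do with it afterwards.

```python
import pandas as pd

# Dummy numeric column taken from the example earlier in this thread.
df = pd.DataFrame({"col2": [22.5, -3.45, 5.12, 48.00, 33.33]})

col = df["col2"]

# Min-max normalization: rescale values into the range [0, 1].
df["col2_minmax"] = (col - col.min()) / (col.max() - col.min())

# Z-score standardization: zero mean and unit standard deviation
# (ddof=0 uses the population standard deviation).
df["col2_zscore"] = (col - col.mean()) / col.std(ddof=0)

print(df)
```

Scaling usually makes sense for genuinely numeric columns. Integer codes produced from strings are arbitrary labels, so scaling them does not give them a meaningful order.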