6 Pandas Operations You Should Not Miss

Data Science

Pandas is used mainly for reading, cleaning, and extracting insights from data. We will look at advanced uses of Pandas that are very important to a data scientist. These operations are used to analyze data and manipulate it if required, and they form part of the steps performed before building any machine learning model.

  1. Summarizing Data
  2. Concatenation
  3. Merge and Join
  4. Grouping
  5. Pivot Table
  6. Reshaping multi-index DataFrame

We will be using the very famous Titanic dataset to explore the functionality of Pandas. Let's quickly import NumPy and Pandas, and load the Titanic dataset from Seaborn.

import numpy as np
import pandas as pd
import seaborn as sns
df = sns.load_dataset('titanic')
df.head()

Summarizing Data

The very first thing any data scientist would like to know is the statistics of the entire data. With the help of the Pandas .describe() method, we can see summary stats for each feature. Notice that the stats are given only for numerical columns, which is the default behavior; we can also ask the describe function to include categorical columns with the parameter 'include' set to 'all' (include='all').

df.describe()
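
If we also want the categorical columns summarized, a minimal sketch using the standard include parameter of describe:

df.describe(include='all')  # adds count/unique/top/freq for non-numeric columns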

Another method is .info(). It gives the metadata of a dataset. We can see the size of the dataset, the dtype of each column, and the count of non-null values in each column.

df.info()

Concatenation

Concatenation of two DataFrames is very straightforward, thanks to the Pandas method concat(). Let us take a small section of our Titanic data with the help of vector indexing. Vector indexing is a way to specify, as lists, the row and column names/integers we would like to index, in any order.

smallData = df.loc[[1,7,21,10], ['sex','age','fare','who','class']]
smallData

Also, I have created a dataset with matching columns to explain concatenation.

newData = pd.DataFrame({'sex': ['female', 'male', 'male'],
                        'age': [25, 49, 35],
                        'fare': [89.22, 70.653, 30.666],
                        'who': ['child', 'women', 'man'],
                        'class': ['First', 'First', 'First']})

By default, the concatenation happens row-wise. Let's see how the new dataset looks when we concatenate the two DataFrames.

pd.concat([smallData, newData])

What if we want to concatenate ignoring the index? Just set the ignore_index parameter to True.

pd.concat([newData, smallData], ignore_index=True)

If we wish to concatenate along the columns, we just have to change the axis parameter to 1.

pd.concat([newData, smallData], axis=1)

Left table - smallData, Right table - newData

Notice the changes? As soon as we concatenated column-wise, Pandas arranged the data in the order of the row indices. Rows 0 and 2 are missing from smallData but present in newData, so they are inserted in sequential order. Row 1 is present in both DataFrames, and the columns of the first DataFrame we passed to concat appear first. Also, the missing data is represented as NaN.
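
To see why the rows line up this way, a quick check of the two indices (using the DataFrames defined above):

print(smallData.index)  # rows 1, 7, 21, 10 taken from the Titanic data
print(newData.index)    # default integer index 0, 1, 2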

We can also perform concatenation in SQL join fashion. Let's create a new DataFrame 'newData' that has a few columns in common with smallData, but not all.

newData = pd.DataFrame({'fare':[89.22,70.653,30.666,100],
'who':['child', 'women', 'man', 'women'],
'class':['First','First','First','Second'],
'adult_male': [True, False, True, False]})
newData

If you are familiar with SQL join operations, you will notice that .concat() performs an outer join by default. Missing values for unmatched columns are filled with NaN.

pd.concat([smallData, newData])

We can control the type of join operation with the 'join' parameter. Let's perform an inner join, which keeps only the columns common to both DataFrames.

pd.concat([smallData, newData], join='inner')

Merge and Join

Pandas provides us a dedicated and more efficient method, .merge(), to perform in-memory join operations. The merge method implements a subset of relational algebra, the same operations that underlie SQL joins.

I will move away from our Titanic dataset for this section only, to make join operations easier to understand with less complex data.

There are different types of join operations:

  1. One-to-one
  2. Many-to-one
  3. Many-to-many

The classic data used to explain joins in SQL is the employee dataset. Let's create the DataFrames.

df1 = pd.DataFrame({'employee_name': ['Tasha', 'Linda', 'Olliver', 'Jack'],
                    'department': ['Engineering', 'Accounting', 'HR', 'HR']})
df2 = pd.DataFrame({'employee_name': ['Linda', 'Tasha', 'Jack', 'Olliver'],
                    'salary': [35000, 20500, 90000, 68000]})
Left table - df1, Right table - df2

One-to-one

A one-to-one merge is very similar to column-wise concatenation. To combine 'df1' and 'df2' we use the .merge() method. Merge is capable of recognizing the common column in the datasets and uses it as the key, in our case the column 'employee_name'. Also, the names are not in the same order. Let's see how merge does the work for us, ignoring the indices.

df3 = pd.merge(df1,df2)
df3

Many-to-one

Many-to-one is a type of join in which one of the two key columns has duplicate values. Suppose we have a supervisor for each department and there are many employees in each department; hence, many employees map to one supervisor.

df4 = pd.DataFrame({'department': ['Engineering', 'Accounting', 'HR'],
                    'supervisor': ['Jonas', 'Martha', 'Martin']})
print('--------df4---------\n', df4)
print('-------merged--------')
pd.merge(df3, df4)

Many-to-many

This is the case where the key column in both datasets has duplicate values. Suppose many skills are mapped to each department; then the resulting DataFrame will have duplicate entries.

df5 = pd.DataFrame({'department': ['Engineering', 'Engineering', 'Accounting',
                                   'Accounting', 'HR', 'HR'],
                    'skills': ['Coding', 'Soft skills', 'Math', 'Excel',
                               'Organizing', 'Decision making']})
print('--------df5---------\n', df5)
print('\n-------merged--------')
pd.merge(df3, df5)

Merge on uncommon column names and values

Uncommon column names

Many times merging is not that simple, since the data we receive will not be so clean. We saw how merge does all the work provided we have one common column. What if we have no common columns at all, or there is more than one common column? Pandas gives us the flexibility to explicitly specify the columns that act as the key in each DataFrame.

Suppose we change the 'employee_name' column to 'name' in 'df2'. Let's see how the datasets look and how to explicitly tell merge which columns are the keys.

df2 = pd.DataFrame({'name': ['Linda', 'Tasha', 'Jack', 'Olliver'],
                    'salary': [35000, 20500, 90000, 68000]})
print('--------df1---------\n', df1)
print('--------df2---------\n', df2)
print('\n-------merged--------')
pd.merge(df1, df2, left_on='employee_name', right_on='name')

The parameter 'left_on' specifies the key column of the first DataFrame and 'right_on' the key column of the second. Remember, the value of 'left_on' should match a column of the first DataFrame you passed and 'right_on' a column of the second. Notice that we get a redundant 'name' column; we can drop it if it is not needed, as shown below.
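
A minimal sketch of dropping the redundant key column after the merge (reusing the df1 and df2 defined above):

merged = pd.merge(df1, df2, left_on='employee_name', right_on='name')
merged = merged.drop(columns='name')  # remove the duplicated key column
merged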

Uncommon values

Previously we saw that all the employee names present in one dataset were also present in the other. What if some names are missing?

df1 = pd.DataFrame({'employee_name': ['Tasha', 'Linda', 'Olliver', 'Jack'],
                    'department': ['Engineering', 'Accounting', 'HR', 'HR']})
df2 = pd.DataFrame({'employee_name': ['Linda', 'Mary'],
                    'salary': [35000, 20500]})
print('--------df1---------\n', df1)
print('--------df2---------\n', df2)
print('\n-------merged--------\n')
pd.merge(df1, df2)

By default, merge applies an inner join, meaning the join is performed only on common values, which is not always the preferred way since it can cause data loss. The type of join can be controlled using the parameter 'how'. We can perform a left join or a right join to avoid losing rows. The missing values will be represented as NaN by Pandas.

print('-------left join--------\n',pd.merge(df1, df2, how='left'))
print('\n-------right join--------\n',pd.merge(df1,df2,how='right'))
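
A full outer join keeps every key from both DataFrames; a minimal sketch with the same df1 and df2:

pd.merge(df1, df2, how='outer')  # union of employee names; unmatched values become NaN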

GroupBy

GroupBy is a very flexible abstraction; we can think of it as a collection of DataFrames. It allows us to do many different powerful operations. In simple words, it groups the entire dataset by the values of the column we specify and allows us to perform operations on each group to extract insights.

Let's come back to our Titanic dataset.

Suppose we would like to see how many male and female passengers survived.

print(df.groupby('sex'))
df.groupby('sex').sum()

Notice that printing the groupby without performing any operation gives a GroupBy object. Since there are only two unique values in the column 'sex', we see a summation of every other column grouped by male and female. More insightful would be to get the percentage. We will take only the 'survived' column of the groupby result above, sum it, and calculate percentages.

data = df.groupby('sex')['survived'].sum()
print('% of male survivors', (data['male'] / (data['male'] + data['female'])) * 100)
print('% of female survivors', (data['female'] / (data['male'] + data['female'])) * 100)

Output:
% of male survivors 31.87134502923976
% of female survivors 68.12865497076024

Under the hood, the GroupBy function performs three operations: split-apply-combine.

  1. Split - breaking the DataFrame into groups based on the specified key.
  2. Apply - computing the function we wish, such as an aggregation, transformation, or filter, on each group.
  3. Combine - merging the output into a single DataFrame (see the sketch below).
Figure: the split-apply-combine process, courtesy of the Python Data Science Handbook by Jake VanderPlas
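
To make the three steps concrete, here is a minimal sketch that reproduces df.groupby('sex')['survived'].sum() by hand, using the Titanic DataFrame loaded above:

result = {}
for key in df['sex'].unique():               # split: one subset per unique key
    subset = df[df['sex'] == key]
    result[key] = subset['survived'].sum()   # apply: aggregate each subset
pd.Series(result, name='survived')           # combine: collect into one object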

Some of the more powerful operations that can be performed on a groupby are:

  1. Aggregate
  2. Filter
  3. Transform
  4. Apply

Let’s see each one with an example.

Aggregate

The aggregate function allows us to perform more than one aggregation at a time. We need to pass the list of required aggregations as a parameter to the .aggregate() function.

df.groupby('sex')['survived'].aggregate(['sum', np.mean,'median'])

Filter

The filter function allows us to drop data based on a group property. Suppose we want to keep only the data where, when grouped by 'survived', the standard deviation of 'fare' is greater than a threshold value, say 50.

df.groupby('survived').filter(lambda x: x['fare'].std() > 50)

Since the standard deviation of ‘fare’ is greater than 50 only for values of ‘survived’ equal to 1, we can see data only where ‘survived’ is 1.

Transform

Transform returns a transformed version of the entire data. The best example to explain this is centering the dataset. Centering the data is nothing but subtracting from each value the mean of its respective column.

df.groupby('survived').transform(lambda x: x - x.mean())

Apply

Apply is very flexible, unlike filter and transform; the only criterion is that it takes a DataFrame and returns a Pandas object or a scalar. We have the flexibility to do anything we wish inside the function.

def func(x):
    # normalize 'fare' within each group so that each group's fares sum to 1
    x['fare'] = x['fare'] / x['fare'].sum()
    return x

df.groupby('survived').apply(func)

Pivot Tables

Previously, in GroupBy, we saw how 'sex' affected survival: the survival rate of females is much higher than that of males. Suppose we would also like to see how 'pclass' affected survival, with both 'sex' and 'pclass' side by side. Using GroupBy we would do something like this.

df.groupby(['sex', 'pclass'])['survived'].aggregate('mean').unstack()

This is more insightful; we can easily see that passengers in the third-class section of the Titanic were less likely to survive.

This type of operation is very common in analysis. Hence, Pandas provides the function .pivot_table(), which does the same thing with more flexibility and less complexity.

df.pivot_table('survived', index='sex', columns='pclass')

The result of the pivot_table function is a DataFrame, unlike groupby, which returns a GroupBy object. We can perform all the usual DataFrame operations on it.
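
For example, a small sketch (reusing the pivot_table call from above) that treats the result like any other DataFrame:

pt = df.pivot_table('survived', index='sex', columns='pclass')
pt.round(2)          # an ordinary DataFrame method
pt.loc['female', 1]  # ordinary label-based indexing: survival rate of first-class women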

We can also add a third dimension to our result. Suppose we want to see how 'age' affected the survival rate along with 'sex' and 'pclass'. Let's divide 'age' into groups: 0–18 (child/teenager), 18–40 (adult), and 41–80 (older).

age = pd.cut(df['age'], [0, 18, 40, 80])
pivotTable = df.pivot_table('survived', ['sex', age], 'class')
pivotTable

Interestingly, female children and teenagers in second class have a 100% survival rate. This is the kind of power the Pandas pivot table has.

Reshaping Multi-index DataFrame

To see a multi-index DataFrame from a different view, we reshape it. Stack and unstack are the two methods that accomplish this.

unstack()

Unstacking is the process of converting a row index into a column index. The pivot table we created previously is multi-indexed row-wise. We can move the innermost row index (the age groups) into the innermost column index.

pivotTable = pivotTable.unstack()
pivotTable

We can also convert the outermost row index (sex) into the innermost column index by using the parameter 'level'.

pivotTable = pivotTable.unstack(level=0)
pivotTable

stack()

Stacking is exactly the inverse of unstacking. We can convert a column index of a multi-index DataFrame into a row index. Here the innermost column index, 'sex', is converted to the innermost row index. The result is slightly different from the original DataFrame because we previously unstacked with level=0.

pivotTable.stack()

These functions and methods are very helpful for understanding data, for further manipulation, and for building a predictive model. We can also plot graphs to get visual insights, as in the sketch below.
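
A minimal plotting sketch, assuming matplotlib is installed alongside the seaborn import from the beginning of the article:

import matplotlib.pyplot as plt

survival = df.pivot_table('survived', index='sex', columns='class')
survival.plot(kind='bar')  # survival rate by sex, one bar per passenger class
plt.ylabel('survival rate')
plt.show()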

Translated from: https://medium.com/towards-artificial-intelligence/6-pandas-operations-you-should-not-miss-d531736c6574
