Bokeh Concise Tutorial
Bokeh - Filtering Data
Often you want to plot only the part of a dataset that satisfies certain conditions rather than the entire dataset. A CDSView object, defined in the bokeh.models module, represents a subset of a ColumnDataSource obtained by applying one or more filters to it.
IndexFilter is the simplest type of filter. You specify the indices of only those rows of the dataset that you want to use when plotting the figure.
The following example demonstrates the use of an IndexFilter to set up a CDSView. The resulting figure shows a line glyph through the full x and y data series of the ColumnDataSource. A view object is obtained by applying an index filter to the source, and the view is used to plot circle glyphs only at the rows selected by the IndexFilter.
Example
# Uses the CDSView(source=..., filters=[...]) signature of Bokeh releases before 3.0.
from bokeh.models import ColumnDataSource, CDSView, IndexFilter
from bokeh.plotting import figure, show

source = ColumnDataSource(data = dict(x = list(range(1,11)), y = list(range(2,22,2))))
# Keep only rows 0, 2, 4 and 6 of the source.
view = CDSView(source = source, filters = [IndexFilter([0, 2, 4, 6])])
fig = figure(title = 'Line Plot example', x_axis_label = 'x', y_axis_label = 'y')
# Circles are drawn only for the filtered rows; the line uses every row.
fig.circle(x = "x", y = "y", size = 10, source = source, view = view, legend = 'filtered')
fig.line(source.data['x'], source.data['y'], legend = 'unfiltered')
show(fig)
Output
[Figure: indexfilter]
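To make the row selection concrete, here is a minimal plain-Python sketch of which rows the index list [0, 2, 4, 6] picks out of the same x and y series. It models the selection only and does not call Bokeh; the helper name select_rows is hypothetical and introduced just for this illustration.

```python
# Plain-Python model of the rows an index filter selects; no Bokeh calls involved.
data = {"x": list(range(1, 11)), "y": list(range(2, 22, 2))}
indices = [0, 2, 4, 6]


def select_rows(columns, row_indices):
    """Return a copy of `columns` keeping only the rows at `row_indices`."""
    return {
        name: [values[i] for i in row_indices]
        for name, values in columns.items()
    }


filtered = select_rows(data, indices)
print(filtered["x"])  # [1, 3, 5, 7]
print(filtered["y"])  # [2, 6, 10, 14]
```

These four (x, y) pairs are the points where the circle glyphs appear in the figure, while the line glyph still passes through all ten points.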
To select only those rows of the data source that satisfy a Boolean condition, apply a BooleanFilter.
A typical Bokeh installation includes a number of sample datasets in the sampledata directory. The following example uses the unemployment1948 dataset, provided as unemployment1948.csv, which stores the yearly unemployment percentage in the USA since 1948. To plot only the years from 1980 onwards, a CDSView object is obtained by applying a BooleanFilter to the data source.
from bokeh.models import ColumnDataSource, CDSView, BooleanFilter
from bokeh.plotting import figure, show
from bokeh.sampledata.unemployment1948 import data

source = ColumnDataSource(data)
# One boolean per row: True for 1980 and later, False otherwise.
booleans = [int(year) >= 1980 for year in source.data['Year']]
print(booleans)
view1 = CDSView(source = source, filters = [BooleanFilter(booleans)])
p = figure(title = "Unemployment data", x_range = (1980,2020), x_axis_label = 'Year', y_axis_label = 'Percentage')
# Only the rows that pass the filter are drawn.
p.line(x = 'Year', y = 'Annual', source = source, view = view1, color = 'red', line_width = 2)
show(p)
Output
[Figure: booleanfilter]
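The boolean list passed to BooleanFilter needs one entry per row of the data source. The following sketch builds such a mask in plain Python and shows the equivalent list of row indices. The short years list is a hypothetical stand-in for source.data['Year'], not the real dataset.

```python
# Hypothetical stand-in for source.data['Year']; the real column comes from
# the unemployment1948 sample dataset.
years = [1978, 1979, 1980, 1981, 1982]

# The comparison already yields a bool, so no conditional expression is needed.
booleans = [int(year) >= 1980 for year in years]

# The same selection expressed as row indices, the form an IndexFilter takes.
indices = [i for i, keep in enumerate(booleans) if keep]

print(booleans)  # [False, False, True, True, True]
print(indices)   # [2, 3, 4]
```

Either form describes the same subset of rows. A boolean mask is usually more convenient when the selection follows from a condition on a column, as it does here.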
For more flexibility, Bokeh provides the CustomJSFilter class, which filters the data source with a user-defined JavaScript function.
The example below uses the same USA unemployment data and defines a CustomJSFilter to plot the unemployment figures for 1980 and later.
from bokeh.models import ColumnDataSource, CDSView, CustomJSFilter
from bokeh.plotting import figure, show
from bokeh.sampledata.unemployment1948 import data

source = ColumnDataSource(data)
# The JavaScript body returns one boolean per row of the source.
custom_filter = CustomJSFilter(code = '''
    var indices = [];
    for (var i = 0; i < source.get_length(); i++){
        if (parseInt(source.data['Year'][i]) >= 1980){
            indices.push(true);
        } else {
            indices.push(false);
        }
    }
    return indices;
''')
view1 = CDSView(source = source, filters = [custom_filter])
p = figure(title = "Unemployment data", x_range = (1980,2020), x_axis_label = 'Year', y_axis_label = 'Percentage')
p.line(x = 'Year', y = 'Annual', source = source, view = view1, color = 'red', line_width = 2)
show(p)
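The opening paragraph mentions that a view can apply one or more filters. As far as I understand the filters list used in these examples, a row is kept only if it passes every filter, so the result is the intersection of the individual selections. The sketch below models that behaviour in plain Python. The function combine and the sample values are hypothetical, and this is not Bokeh's own implementation.

```python
def combine(n_rows, index_filter=None, boolean_filter=None):
    """Return the sorted row indices that pass every supplied filter."""
    selected = set(range(n_rows))
    if index_filter is not None:
        # An index filter keeps only the listed rows.
        selected &= set(index_filter)
    if boolean_filter is not None:
        # A boolean filter keeps the rows whose mask entry is True.
        selected &= {i for i, keep in enumerate(boolean_filter) if keep}
    return sorted(selected)


# Hypothetical sample values, not the real unemployment dataset.
years = [1978, 1979, 1980, 1981, 1982, 1983]
even_rows = [0, 2, 4]
recent = [year >= 1980 for year in years]

print(combine(len(years), even_rows, recent))  # [2, 4]
```

Rows 2 and 4 are the only ones that are both in the index list and dated 1980 or later, so they alone would be drawn.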