Python PyPI

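PyPI (Python Package Index) is the software repository for Python.

pip is the Python package manager. By default it downloads packages from pypi.python.org, the official PyPI index.

Installing packages with pip often fails with errors such as "failed to connect to pypi.python.org". To make package management more reliable, consider replacing pypi.python.org with a PyPI mirror hosted in China.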

PyPI mirror list

Mirror             Location
pypi.python.org    San Francisco, California, US
pypi.douban.com    Beijing, Beijing, CN
pypi.fcio.net      Oberhausen, Nordrhein-Westfalen, DE

Replacing the PyPI mirror on Linux (Debian)

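Create the per-user pip configuration file (the ~/.pip directory may not exist yet):

mkdir -p ~/.pip
touch ~/.pip/pip.conf

and put the following in it: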

[global]
index-url = https://pypi.douban.com/simple
format = columns
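You can also generate the same file programmatically. Below is a minimal sketch that uses the standard-library configparser. `write_pip_conf` is a hypothetical helper, and the path and index URL are the ones used above.

```python
import configparser
from pathlib import Path


def write_pip_conf(conf_path: Path, index_url: str) -> None:
    """Write [global] index-url and format to conf_path, keeping other settings."""
    config = configparser.ConfigParser()
    if conf_path.exists():
        config.read(conf_path)  # preserve anything already in the file
    if not config.has_section("global"):
        config.add_section("global")
    config.set("global", "index-url", index_url)
    config.set("global", "format", "columns")
    conf_path.parent.mkdir(parents=True, exist_ok=True)  # same effect as mkdir -p
    with conf_path.open("w") as fh:
        config.write(fh)
```

For example, call `write_pip_conf(Path.home() / ".pip" / "pip.conf", "https://pypi.douban.com/simple")`. Running `pip config list` afterwards should then show the new global.index-url.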


Lisp

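From its very beginning, Lisp embodied nine ideas. Some of them we now take for granted, some have only just appeared in other high-level languages, and two are still unique to Lisp. Listed roughly in order of how widely they have been adopted, the nine ideas are: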

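  1. Conditionals. Everyone takes them for granted today, but Fortran I had no conditional construct; it had only a goto modeled on the underlying machine instruction.
  2. A function type. In Lisp, functions are a data type just like integers or strings. They have a literal representation, can be stored in variables, and can be passed as arguments. They can do everything a data type should be able to do.
  3. Recursion. Lisp was the first high-level language to support recursive functions.
  4. Dynamic typing. In Lisp, every variable is effectively a pointer. Values have types, but variables do not, and copying a variable copies the pointer, not the data it points to.
  5. Garbage collection.
  6. Programs composed of expressions. A Lisp program is a collection of expression trees, each of which returns a value. This contrasts sharply with Fortran and most later languages, whose programs are made of both expressions and statements. The distinction was natural in Fortran I because statements could not be nested: you needed expressions to compute mathematical values, but there was no point in making anything else return a value, because nothing could be waiting to use it. Once languages gained block structure, this limitation disappeared, but by then it was too late. The distinction between expressions and statements had become entrenched, spreading from Fortran into Algol and from there into the descendants of both.
  7. A symbol type. A symbol is effectively a pointer to a string stored in a hash table, so two symbols can be compared for equality by comparing their pointers instead of comparing the strings character by character.
  8. A notation for code built from trees of symbols and constants.
  9. The whole language is always available. Lisp makes no real distinction between read time, compile time, and run time. You can compile or run code while reading, read or run code while compiling, and read or compile code at run time. Running code at read time lets users reprogram Lisp's syntax. Running code at compile time is the basis of Lisp macros. Compiling at run time is what lets Lisp serve as an extension language in programs such as Emacs. Reading at run time lets programs communicate using s-expressions, an idea recently "reinvented" as XML.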

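When Lisp first appeared, these ideas were far removed from other programming languages, whose design was dictated largely by the hardware of the late 1950s. Over time, as one popular language succeeded another, language design has gradually moved toward Lisp. Ideas 1 through 5 are now widely accepted. Idea 6 is starting to appear in mainstream languages. Python has a form of idea 7, though there seems to be no dedicated syntax for it.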
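One plausible reading of Python's "form of idea 7" is string interning. That is an interpretation, not something stated in the text above. In the sketch below, `sys.intern` returns a single canonical object per distinct string, so interned strings can be compared by identity (`is`, effectively a pointer comparison) instead of character by character. `same_symbol` is a hypothetical helper.

```python
import sys


def same_symbol(a: str, b: str) -> bool:
    """Compare two names the way Lisp compares symbols: by identity, not by characters."""
    return sys.intern(a) is sys.intern(b)


# Build equal strings at run time so the compiler cannot merge them into one constant.
left = "".join(["make", "-", "list"])
right = "".join(["make", "-", "list"])

print(left == right)                          # True: character-by-character comparison
print(sys.intern(left) is sys.intern(right))  # True: both map to one interned object
print(same_symbol("car", "cdr"))              # False
```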

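Idea 8 may be the most interesting of all. Ideas 8 and 9 became part of Lisp only by accident: they were not part of McCarthy's original design but were added by Steve Russell on his own initiative. They have made Lisp look strange ever since, yet they are also its most distinctive features. Lisp looks strange not because its syntax is strange but because it has essentially no syntax: programs are written directly as parse trees. In other languages these trees are built behind the scenes by the parser, whereas Lisp uses them as the written form of its expressions. They are made of lists, and the list is Lisp's fundamental data structure.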

Expressing a language in its own data structures is a very powerful feature. Together, ideas 8 and 9 mean that you can write programs that write programs.
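Python is not homoiconic, but its standard `ast` module offers a rough analogue of ideas 8 and 9. Source code can be parsed into a tree of ordinary objects, rewritten by a program, and then compiled and run at run time. In the minimal sketch below, the `SwapAddForSub` transformer and the `rewrite_and_eval` helper are hypothetical names.

```python
import ast


class SwapAddForSub(ast.NodeTransformer):
    """Rewrite every `a + b` node in the tree into `a - b`."""

    def visit_BinOp(self, node: ast.BinOp) -> ast.BinOp:
        self.generic_visit(node)  # rewrite nested sub-expressions first
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def rewrite_and_eval(source: str, env: dict):
    tree = ast.parse(source, mode="eval")  # code as a tree of BinOp/Name/Constant nodes
    tree = ast.fix_missing_locations(SwapAddForSub().visit(tree))
    code = compile(tree, "<rewritten>", "eval")  # compile the modified tree at run time
    return eval(code, env)


print(ast.dump(ast.parse("x * 2 + 1", mode="eval").body))
print(rewrite_and_eval("x * 2 + 1", {"x": 10}))  # evaluates x * 2 - 1, i.e. 19
```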

Django database routing

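Configuring which database a Django ORM model uses.

Django supports multiple databases. However, when you define a model with the Django ORM, e.g. class User(Model), class Meta has no option for choosing the model's database. Unless a database is selected explicitly with .using() on each query, the model always uses the default database (alias default).

Django's ConnectionRouter solves this mapping between models and databases by consulting the routers listed in the DATABASE_ROUTERS setting.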

Implementing a DB router

db_router.py

import logging

from django.conf import settings

_logger = logging.getLogger('django')


class DatabaseRouter(object):
    """
    Database router that sends each model to the database named by its
    `_database` attribute, falling back to 'default'.
    """

    DEFAULT_DB = 'default'

    def _db(self, model, **hints):
        db = getattr(model, '_database', None)
        if not db:
            return self.DEFAULT_DB

        if db in settings.DATABASES:
            return db
        # Unknown alias: log it and fall back instead of failing the query.
        _logger.warning('database %s does not exist in settings.DATABASES', db)
        return self.DEFAULT_DB

    def db_for_read(self, model, **hints):
        return self._db(model, **hints)

    def db_for_write(self, model, **hints):
        return self._db(model, **hints)

Configuring the DB routers (in settings.py)

DATABASE_ROUTERS = ['db_router.DatabaseRouter']

Assigning a database to a model

class User(Model):
    _database = 'user_db'

    class Meta:
        db_table = 'user'
    ...
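The `_database` value only takes effect if an alias with that name exists in `settings.DATABASES`. Otherwise `_db()` logs a warning and falls back to `default`. Below is a sketch of matching settings, assuming SQLite for both aliases. The database file names are hypothetical.

```python
# settings.py (excerpt): a sketch assuming SQLite for both database aliases.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "default.sqlite3",  # hypothetical file name
    },
    "user_db": {  # must match User._database
        "ENGINE": "django.db.backends.sqlite3",
        "NAME": "user.sqlite3",  # hypothetical file name
    },
}
```

The tables for the second database still have to be created, for example with `python manage.py migrate --database=user_db`. Because this router defines no allow_migrate() method, every app's migrations are applied to that database.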

How it works

Django manages database routing through the ConnectionRouter class.

django/db/utils.py

class ConnectionRouter(object):

    @cached_property
    def routers(self):
        if self._routers is None:
            self._routers = settings.DATABASE_ROUTERS
        routers = []
        for r in self._routers:
            if isinstance(r, six.string_types):
                router = import_string(r)()
            else:
                router = r
            routers.append(router)
        return routers

    def _router_func(action):
        def _route_db(self, model, **hints):
            chosen_db = None
            for router in self.routers:
                try:
                    method = getattr(router, action)
                except AttributeError:
                    # If the router doesn't have a method, skip to the next one.
                    pass
                else:
                    chosen_db = method(model, **hints)
                    if chosen_db:
                        return chosen_db
            instance = hints.get('instance')
            if instance is not None and instance._state.db:
                return instance._state.db
            return DEFAULT_DB_ALIAS
        return _route_db

    db_for_read = _router_func('db_for_read')
    db_for_write = _router_func('db_for_write')
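You can reproduce the order in which `_route_db` resolves a database without Django installed. In the minimal sketch below, `route_db`, the router classes, and the stand-in `User`/`Order` models are hypothetical, and `SimpleNamespace` only fakes the `instance._state.db` attribute. Each router is asked in turn, and a router that lacks the method or returns a falsy value is skipped. The fallbacks are the instance's current database and then `default`.

```python
from types import SimpleNamespace

DEFAULT_DB_ALIAS = "default"


def route_db(routers, action, model, **hints):
    """Mimic ConnectionRouter._route_db: the first router with a truthy answer wins."""
    for router in routers:
        method = getattr(router, action, None)
        if method is None:
            continue  # router does not implement this action; ask the next one
        chosen_db = method(model, **hints)
        if chosen_db:
            return chosen_db
    instance = hints.get("instance")
    if instance is not None and instance._state.db:
        return instance._state.db  # keep an object on the database it came from
    return DEFAULT_DB_ALIAS


class NoOpinionRouter:
    def db_for_read(self, model, **hints):
        return None  # defer to the next router


class UserRouter:
    def db_for_read(self, model, **hints):
        return getattr(model, "_database", None)


class User:
    _database = "user_db"


class Order:
    pass


routers = [NoOpinionRouter(), UserRouter()]
loaded = SimpleNamespace(_state=SimpleNamespace(db="replica"))

print(route_db(routers, "db_for_read", User))    # user_db
print(route_db(routers, "db_for_read", Order))   # default
print(route_db(routers, "db_for_write", Order))  # default: no router implements it
print(route_db(routers, "db_for_write", Order, instance=loaded))  # replica
```

Because the first truthy answer wins, the order of entries in DATABASE_ROUTERS matters. A router with no opinion should return None.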

Router initialization (a module-level singleton in django/db/__init__.py)

router = ConnectionRouter()

Where the router is used

django/db/models/query.py

class QuerySet(object):

    @property
    def db(self):
        "Return the database that will be used if this query is executed now"
        if self._for_write:
            return self._db or router.db_for_write(self.model, **self._hints)
        return self._db or router.db_for_read(self.model, **self._hints)

Python WSGI

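WSGI stands for Web Server Gateway Interface.

WSGI was designed to replace CGI. Under CGI, a new process (for example, a Python interpreter) is created for every request and exits once the request has been handled. If an application receives thousands of requests, spawning that many interpreter processes can quickly bring the server down.

Its goal is to provide a common API standard between web servers and web frameworks, improving interoperability between them and giving them a uniform calling convention.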

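WSGI applications

By the WSGI definition, an application is a callable object that takes exactly two arguments:

  • a dictionary containing the server's environment variables
  • a callable that the application calls with the HTTP status and the HTTP response headers to start the response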
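def simple_wsgi_app(environment, start_response):
    status = '200 OK'
    # Headers are a list of (name, value) tuples.
    headers = [('Content-Type', 'text/plain')]
    start_response(status, headers)
    # Under Python 3 (PEP 3333) the body must be an iterable of bytes.
    return [b'Hello world!']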

environment contains CGI-style environment variables such as HTTP_HOST, HTTP_USER_AGENT, SERVER_PROTOCOL, and so on.

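start_response() is a callable that the application must call before it returns the response body. It sets the status line and headers of the response that is finally sent back to the client.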
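You can exercise the calling convention without a real server. The sketch below uses only `wsgiref` from the standard library. `setup_testing_defaults` fills in a minimal environ, and `validator` wraps the app to assert that both sides follow PEP 3333. `call_app` is a hypothetical helper, and the app is repeated (returning bytes) so the block runs on its own.

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator


def simple_wsgi_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello world!"]


def call_app(app):
    """Call a WSGI app once, roughly as a server would; return (status, headers, body)."""
    environ = {}
    setup_testing_defaults(environ)  # REQUEST_METHOD, wsgi.input, wsgi.errors, ...
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"], captured["headers"] = status, headers
        return lambda data: None  # the legacy write() callable, unused here

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()  # a server must call close() if the iterable has one
    return captured["status"], captured["headers"], body


status, headers, body = call_app(validator(simple_wsgi_app))
print(status, headers, body)
```

If the app returned a str body or malformed headers, `validator` would raise an AssertionError instead.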

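Definition of start_response in werkzeug (an inner function; headers_set, headers_sent and write come from the enclosing scope):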

def start_response(status, response_headers, exc_info=None):
    if exc_info:
        try:
            if headers_sent:
                reraise(*exc_info)
        finally:
            exc_info = None
    elif headers_set:
        raise AssertionError('Headers already set')
    headers_set[:] = [status, response_headers]
    return write

WSGI server

import io

def run_wsgi_app(app, environment):
    # Python 3: the response is bytes, so buffer it in a BytesIO.
    body = io.BytesIO()

    def start_response(status, headers, exc_info=None):
        # Status and headers are native strings, encoded as latin-1 (PEP 3333).
        body.write(('Status: %s\r\n' % status).encode('latin-1'))
        for header in headers:
            body.write(('%s: %s\r\n' % header).encode('latin-1'))
        return body.write

    iterable = app(environment, start_response)
    try:
        body.write(b'\r\n' + b'\r\n'.join(iterable) + b'\r\n')
    finally:
        if hasattr(iterable, 'close') and callable(iterable.close):
            iterable.close()
    return body.getvalue()

Flask's WSGI application and server implementation

Flask WSGI app

# Flask:wsgi_app

def wsgi_app(self, environ, start_response):
    """The actual WSGI application. This is not implemented in
    `__call__` so that middlewares can be applied without losing a
    reference to the class. So instead of doing this::

        app = MyMiddleware(app)

    It's a better idea to do this instead::

        app.wsgi_app = MyMiddleware(app.wsgi_app)

    Then you still have the original application object around and
    can continue to call methods on it.

    .. versionchanged:: 0.7
       The behavior of the before and after request callbacks was changed
       under error conditions and a new callback was added that will
       always execute at the end of the request, independent on if an
       error occurred or not. See :ref:`callbacks-and-errors`.

    :param environ: a WSGI environment
    :param start_response: a callable accepting a status code,
                           a list of headers and an optional
                           exception context to start the response
    """
    ctx = self.request_context(environ)
    ctx.push()
    error = None
    try:
        try:
            response = self.full_dispatch_request()
        except Exception as e:
            error = e
            response = self.handle_exception(e)
        return response(environ, start_response)
    finally:
        if self.should_ignore_error(error):
            error = None
        ctx.auto_pop(error)

# Flask:__call__

def __call__(self, environ, start_response):
    """Shortcut for :attr:`wsgi_app`."""
    return self.wsgi_app(environ, start_response)

# BaseResponse:__call__

def __call__(self, environ, start_response):
    """Process this response as WSGI application.

    :param environ: the WSGI environment.
    :param start_response: the response callable provided by the WSGI
                           server.
    :return: an application iterator
    """
    app_iter, status, headers = self.get_wsgi_response(environ)
    start_response(status, headers)
    return app_iter
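The docstring recommends wrapping `app.wsgi_app` instead of `app`. Below is a minimal sketch of such a middleware in plain WSGI terms. `HeaderMiddleware`, `hello_app` and the `X-Handled-By` header are hypothetical names, and no Flask installation is needed to run it. The middleware substitutes its own `start_response` so it can append a header before delegating to the wrapped application.

```python
class HeaderMiddleware:
    """WSGI middleware that appends one response header to every response."""

    def __init__(self, wsgi_app, name, value):
        self.wsgi_app = wsgi_app  # the wrapped WSGI callable
        self.header = (name, value)

    def __call__(self, environ, start_response):
        def custom_start_response(status, headers, exc_info=None):
            headers = list(headers) + [self.header]
            # Forward exc_info so the server's error handling still works.
            return start_response(status, headers, exc_info)

        return self.wsgi_app(environ, custom_start_response)


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


wrapped = HeaderMiddleware(hello_app, "X-Handled-By", "middleware")

seen = {}


def fake_start_response(status, headers, exc_info=None):
    seen["status"], seen["headers"] = status, headers


body = b"".join(wrapped({}, fake_start_response))
print(seen["headers"])  # [('Content-Type', 'text/plain'), ('X-Handled-By', 'middleware')]
print(body)             # b'hello'
```

With Flask, you would install it as `app.wsgi_app = HeaderMiddleware(app.wsgi_app, "X-Handled-By", "middleware")`, which leaves `app` itself and its methods untouched.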

Flask WSGI server (werkzeug's WSGIRequestHandler.run_wsgi)

def run_wsgi(self):
    if self.headers.get('Expect', '').lower().strip() == '100-continue':
        self.wfile.write(b'HTTP/1.1 100 Continue\r\n\r\n')

    self.environ = environ = self.make_environ()
    headers_set = []
    headers_sent = []

    def write(data):
        assert headers_set, 'write() before start_response'
        if not headers_sent:
            status, response_headers = headers_sent[:] = headers_set
            try:
                code, msg = status.split(None, 1)
            except ValueError:
                code, msg = status, ""
            self.send_response(int(code), msg)
            header_keys = set()
            for key, value in response_headers:
                self.send_header(key, value)
                key = key.lower()
                header_keys.add(key)
            if 'content-length' not in header_keys:
                self.close_connection = True
                self.send_header('Connection', 'close')
            if 'server' not in header_keys:
                self.send_header('Server', self.version_string())
            if 'date' not in header_keys:
                self.send_header('Date', self.date_time_string())
            self.end_headers()

        assert isinstance(data, bytes), 'applications must write bytes'
        self.wfile.write(data)
        self.wfile.flush()

    def start_response(status, response_headers, exc_info=None):
        if exc_info:
            try:
                if headers_sent:
                    reraise(*exc_info)
            finally:
                exc_info = None
        elif headers_set:
            raise AssertionError('Headers already set')
        headers_set[:] = [status, response_headers]
        return write

    def execute(app):
        application_iter = app(environ, start_response)
        try:
            for data in application_iter:
                write(data)
            if not headers_sent:
                write(b'')
        finally:
            if hasattr(application_iter, 'close'):
                application_iter.close()
            application_iter = None

    try:
        execute(self.server.app)
    except (socket.error, socket.timeout) as e:
        self.connection_dropped(e, environ)
    except Exception:
        if self.server.passthrough_errors:
            raise
        from werkzeug.debug.tbtools import get_current_traceback
        traceback = get_current_traceback(ignore_system_exceptions=True)
        try:
            # if we haven't yet sent the headers but they are set
            # we roll back to be able to set them again.
            if not headers_sent:
                del headers_set[:]
            execute(InternalServerError())
        except Exception:
            pass
        self.server.log('error', 'Error on request:\n%s',
                        traceback.plaintext)


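Flask signals

Flask does not implement signals itself; it uses blinker to define, connect, and dispatch them.

How Flask uses blinker

Flask has no signal machinery of its own and relies on blinker. If blinker is not installed, it falls back to a placeholder Namespace whose signals are _FakeSignal objects: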

signals_available = False
try:
    from blinker import Namespace
    signals_available = True
except ImportError:
    class Namespace(object):
        def signal(self, name, doc=None):
            return _FakeSignal(name, doc)

Signals provided by Flask (flask/signals.py)

_signals = Namespace()

# Core signals. For usage examples grep the source code or consult
# the API documentation in docs/api.rst as well as docs/signals.rst
template_rendered = _signals.signal('template-rendered')
before_render_template = _signals.signal('before-render-template')
request_started = _signals.signal('request-started')
request_finished = _signals.signal('request-finished')
request_tearing_down = _signals.signal('request-tearing-down')
got_request_exception = _signals.signal('got-request-exception')
appcontext_tearing_down = _signals.signal('appcontext-tearing-down')
appcontext_pushed = _signals.signal('appcontext-pushed')
appcontext_popped = _signals.signal('appcontext-popped')
message_flashed = _signals.signal('message-flashed')

Blinker signal

blinker supports the following features:

  • a global registry of named signals
  • anonymous signals
  • custom name registries
  • permanently or temporarily connected receivers
  • automatically disconnected receivers via weak referencing
  • sending arbitrary data payloads
  • collecting return values from signal receivers
  • thread safety

Blinker signal sample

TODO

Blinker signal implementation

TODO

Registration

Weak references

Thread safety

Problems

Connected receivers are held by weak reference

def register_signal_handlers(app):
    import logging
    from flask.signals import request_finished
    _logger = logging.getLogger('api.debug')

    def log_response(sender, response, **options):
        print(response)
    request_finished.connect(log_response, app)

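The code above does not work as intended. By default, blinker's connect() holds receivers only by weak reference (weak=True). log_response is a local function, so once register_signal_handlers returns, nothing else refers to it. It is garbage-collected and silently disconnected, and it is never called when request_finished fires. The fix is to keep a strong reference to the receiver (for example, define it at module level) or to connect it with request_finished.connect(log_response, app, weak=False).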
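You can reproduce the failure without Flask or blinker. In the minimal sketch below, `WeakSignal` is a hypothetical toy class that, like blinker's default `weak=True`, stores only `weakref.ref` objects to its receivers.

```python
import gc
import weakref


class WeakSignal:
    """Toy signal that keeps only weak references to its receivers."""

    def __init__(self):
        self._receivers = []

    def connect(self, receiver):
        self._receivers.append(weakref.ref(receiver))

    def send(self, sender):
        results = []
        for ref in self._receivers:
            receiver = ref()  # None once the function has been garbage-collected
            if receiver is not None:
                results.append(receiver(sender))
        return results


request_finished = WeakSignal()


def register_signal_handlers():
    def log_response(sender):
        return "logged %s" % sender

    request_finished.connect(log_response)
    # log_response goes out of scope here; only the weak reference remains.


register_signal_handlers()
gc.collect()  # CPython frees it at once via refcounting; collect() makes it explicit
print(request_finished.send("app"))  # []: the local receiver is gone


def module_level_receiver(sender):
    return "module-level %s" % sender


request_finished.connect(module_level_receiver)  # the module keeps it alive
print(request_finished.send("app"))  # ['module-level app']
```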


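Nginx autoindex: garbled Chinese file names

When ngx_http_autoindex_module is used to list local files, file names containing Chinese characters are displayed as garbled text.

The nginx configuration is: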

location /static/ {
    alias $static/;
    autoindex on;
    autoindex_exact_size on;
    autoindex_localtime on;
}

Inspection showed that the Content-Type header of the HTML page generated by nginx autoindex does not include a charset.
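You can show with the standard library alone why a missing charset produces garbled names. In the sketch below, `header_charset` is a hypothetical helper that uses `email.message.Message.get_content_charset()` to read the charset parameter of a Content-Type value. Latin-1 stands in for whatever fallback encoding the browser picks; the real fallback depends on the browser's locale and settings.

```python
from email.message import Message


def header_charset(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


print(header_charset("text/html"))                 # None: the browser has to guess
print(header_charset("text/html; charset=utf-8"))  # utf-8

name = "中文目录"
raw = name.encode("utf-8")       # the bytes nginx actually sends for the file name
garbled = raw.decode("latin-1")  # what a Latin-1 guess would display
print(garbled)
print(garbled.encode("latin-1").decode("utf-8") == name)  # True: the bytes were intact
```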

Digging into the nginx source

src/http/modules/ngx_http_autoindex_module.c

switch (format) {

case NGX_HTTP_AUTOINDEX_JSON:
    ngx_str_set(&r->headers_out.content_type, "application/json");
    break;

case NGX_HTTP_AUTOINDEX_JSONP:
    ngx_str_set(&r->headers_out.content_type, "application/javascript");
    break;

case NGX_HTTP_AUTOINDEX_XML:
    ngx_str_set(&r->headers_out.content_type, "text/xml");
    ngx_str_set(&r->headers_out.charset, "utf-8");
    break;

default: /* NGX_HTTP_AUTOINDEX_HTML */
    ngx_str_set(&r->headers_out.content_type, "text/html");
    break;
}

This confirms that when autoindex_format is html, no charset is added to the HTTP Content-Type header.

Modifying the source code

default: /* NGX_HTTP_AUTOINDEX_HTML */
    ngx_str_set(&r->headers_out.content_type, "text/html");
    ngx_str_set(&r->headers_out.charset, "utf-8");
    break;

Recompiling nginx

$ ./configure --prefix=/opt/nginx --with-http_ssl_module
$ make && make install

Restarting nginx

Force-refresh the page. OK, problem solved.

An official read-only mirror of http://hg.nginx.org/nginx/ which is updated hourly. Pull requests on GitHub cannot be accepted and will be automatically closed. The proper way to submit changes to nginx is via the nginx development mailing list, see http://nginx.org/en/docs/contributing_changes.html http://nginx.org/

nginx does not accept GitHub pull requests. Fuck


McFarland

McFarland movie lines.

We fly like black birds through the orange groves.

When we run, we own the earth. The land is ours, we speak the birds’ language. Not immigrants no more, Not stupid Mexicans.

When we run, our spirits fly. We speak to the gods.

When we run, we are the gods.

Hacker ethics

Hacker Ethic

  1. Access to computers, and anything that might teach you something about the way the world works, should be unlimited and total. Always yield to the hands-on imperative.
  2. All information should be free.
  3. Mistrust authority; promote decentralization.
  4. Hackers should be judged by their hacking, not bogus criteria such as degrees, age, race or position.
  5. You can create art and beauty on a computer.
  6. Computers can change your life for the better.


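The core values of the hacker ethic are sharing, openness, democracy, free access to computers, and progress.

In defense of the word "hacker": hackers are highly intelligent, curious, exceptionally capable, god-tier 10x programmers. Hackers created Unix and Linux, founded Microsoft, Google, and Facebook, and wrote The Art of Computer Programming, among much else. Like architects, hackers create the world.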

write the code and change the world.

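Django signals

Signals decouple the subsystems of a complex system. When one subsystem's state changes, a signal notifies the subsystems that depend on it so they can update their own state, keeping the overall state consistent.

Unlike Flask, which uses blinker directly, Django implements its own signal mechanism.

How it works

Defining a signal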

def __init__(self, providing_args=None, use_caching=False):
    """
    Create a new signal.

    providing_args
        A list of the arguments this signal can pass along in a send() call.
    """
    self.receivers = []
    if providing_args is None:
        providing_args = []
    self.providing_args = set(providing_args)
    self.lock = threading.Lock()
    self.use_caching = use_caching
    # For convenience we create empty caches even if they are not used.
    # A note about caching: if use_caching is defined, then for each
    # distinct sender we cache the receivers that sender has in
    # 'sender_receivers_cache'. The cache is cleaned when .connect() or
    # .disconnect() is called and populated on send().
    self.sender_receivers_cache = weakref.WeakKeyDictionary() if use_caching else {}
    self._dead_receivers = False

Connecting a receiver

def connect(self, receiver, sender=None, weak=True, dispatch_uid=None):
    from django.conf import settings

    # If DEBUG is on, check that we got a good receiver
    if settings.configured and settings.DEBUG:
        assert callable(receiver), "Signal receivers must be callable."

        # Check for **kwargs
        if not func_accepts_kwargs(receiver):
            raise ValueError("Signal receivers must accept keyword arguments (**kwargs).")

    if dispatch_uid:
        lookup_key = (dispatch_uid, _make_id(sender))
    else:
        lookup_key = (_make_id(receiver), _make_id(sender))

    if weak:
        ref = weakref.ref
        receiver_object = receiver
        # Check for bound methods
        if hasattr(receiver, '__self__') and hasattr(receiver, '__func__'):
            ref = WeakMethod
            receiver_object = receiver.__self__
        if sys.version_info >= (3, 4):
            receiver = ref(receiver)
            weakref.finalize(receiver_object, self._remove_receiver)
        else:
            receiver = ref(receiver, self._remove_receiver)

    with self.lock:
        self._clear_dead_receivers()
        for r_key, _ in self.receivers:
            if r_key == lookup_key:
                break
        else:
            self.receivers.append((lookup_key, receiver))
        self.sender_receivers_cache.clear()

Sending a signal

def send(self, sender, **named):
    responses = []
    if not self.receivers or self.sender_receivers_cache.get(sender) is NO_RECEIVERS:
        return responses

    for receiver in self._live_receivers(sender):
        response = receiver(signal=self, sender=sender, **named)
        responses.append((receiver, response))
    return responses

Disconnecting a receiver

def disconnect(self, receiver=None, sender=None, weak=True, dispatch_uid=None):
    if dispatch_uid:
        lookup_key = (dispatch_uid, _make_id(sender))
    else:
        lookup_key = (_make_id(receiver), _make_id(sender))

    disconnected = False
    with self.lock:
        self._clear_dead_receivers()
        for index in range(len(self.receivers)):
            (r_key, _) = self.receivers[index]
            if r_key == lookup_key:
                disconnected = True
                del self.receivers[index]
                break
        self.sender_receivers_cache.clear()
    return disconnected

Signal sequence diagram

TODO

weakref

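To avoid calling receivers whose objects have already been freed, a signal holds only weak references to its receivers. When a receiver object is garbage-collected, the weak-reference callback (_remove_receiver) just sets _dead_receivers to True. The dead entries are purged later by _clear_dead_receivers(), which connect(), disconnect(), and the receiver lookup performed by send() call under the lock.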
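The bound-method branch in connect() exists because `obj.method` creates a new bound-method object on every access, so a plain weak reference to it dies at once. The minimal sketch below uses only the standard library. `Receiver` and the `state` dict are hypothetical, and `mark_dead` merely imitates `_remove_receiver` by setting a flag. The `None` results rely on CPython freeing objects as soon as their reference count drops to zero.

```python
import gc
import weakref


class Receiver:
    def on_signal(self, sender, **kwargs):
        return "handled %s" % sender


obj = Receiver()

plain_ref = weakref.ref(obj.on_signal)          # points at a temporary bound method
method_ref = weakref.WeakMethod(obj.on_signal)  # tracks the instance and the function

print(plain_ref())           # None in CPython: the bound method was freed at once
print(method_ref()("app"))   # handled app

state = {"dead_receivers": False}


def mark_dead():
    state["dead_receivers"] = True  # analogous to setting Signal._dead_receivers


weakref.finalize(obj, mark_dead)  # run mark_dead when obj is collected
del obj
gc.collect()
print(method_ref())             # None: the instance is gone
print(state["dead_receivers"])  # True
```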

thread lock

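A threading.Lock (self.lock) guards every change to the receiver list, so connect() and disconnect() are thread-safe.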

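The observer design pattern

Define a one-to-many dependency between objects so that when one object changes state, all of its dependents are notified and updated automatically.

Partitioning a system into a collection of cooperating classes has a side effect: consistency between related objects must be maintained. We do not want to achieve consistency by tightly coupling the classes, because that reduces their reusability.

The Observer pattern describes how to establish these relationships. Its key objects are the subject and the observer. A subject may have any number of observers, and all of them are notified whenever the subject's state changes. In response to the notification, each observer queries the subject to synchronize its own state with the subject's state.

This kind of interaction is also known as publish-subscribe. The subject is the publisher of notifications, and any number of observers can subscribe to receive them.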
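Django's Signal is an instance of this pattern: the sender plays the subject and the receivers are observers. Below is a simplified sketch of a signal that uses strong references only, with no caching and no weakref cleanup. `Thermostat`, `display` and `alarm` are hypothetical. The lookup key imitates Django's `(receiver id or dispatch_uid, sender id)`, so duplicate connections are ignored and a receiver can be restricted to one sender.

```python
import threading


class Signal:
    """Minimal publish-subscribe signal with a lock around the receiver list."""

    def __init__(self):
        self.receivers = []  # list of ((receiver_key, sender_key), receiver)
        self.lock = threading.Lock()

    def connect(self, receiver, sender=None, dispatch_uid=None):
        key = (dispatch_uid or id(receiver), id(sender))
        with self.lock:
            if all(k != key for k, _ in self.receivers):  # ignore duplicates
                self.receivers.append((key, receiver))

    def disconnect(self, receiver=None, sender=None, dispatch_uid=None):
        key = (dispatch_uid or id(receiver), id(sender))
        with self.lock:
            for index, (k, _) in enumerate(self.receivers):
                if k == key:
                    del self.receivers[index]
                    return True
        return False

    def send(self, sender, **named):
        with self.lock:  # copy under the lock, call receivers outside it
            receivers = [r for (_, sender_key), r in self.receivers
                         if sender_key in (id(None), id(sender))]
        return [(r, r(signal=self, sender=sender, **named)) for r in receivers]


class Thermostat:  # the subject
    def __init__(self):
        self.temperature = 20
        self.changed = Signal()

    def set_temperature(self, value):
        self.temperature = value
        self.changed.send(sender=self, value=value)


log = []


def display(signal, sender, value, **kwargs):  # observer for any sender
    log.append("display shows %d" % value)


def alarm(signal, sender, value, **kwargs):  # observer for one thermostat
    if value > 30:
        log.append("alarm!")


t = Thermostat()
t.changed.connect(display)
t.changed.connect(display)  # duplicate connection, ignored
t.changed.connect(alarm, sender=t)
t.set_temperature(35)
print(log)  # ['display shows 35', 'alarm!']
```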

/django/dispatch/dispatcher.py
