Transcoding in Python (Transkodieren in Python), lesson, page 724148
https://www.purl.org/stefan_ram/pub/transkodieren_python (permalink) is the canonical URI of this page.
Stefan Ram

Transcoding in Python

main.py

import base64

text = 'plain text'
key = "key"
l = len( key )
y = bytes()

# XOR each character with the key character at the same position (the key repeats)
for i, c in enumerate( text ):
    k = key[ i % l ]
    d = chr( ord( c )^ord( k )).encode()
    y += d

print( text := base64.b64encode( y ))

main.py

import base64

# b'GwkYAgtZHwABHw==' is the output of the previous program for 'plain text' and the key "key"
text = base64.b64decode( b'GwkYAgtZHwABHw==' ).decode( "utf_8" )
key = "key"
l = len( key )
y = bytes()

# XOR again with the same key to restore the plain text
for i, c in enumerate( text ):
    k = key[ i % l ]
    d = chr( ord( c )^ord( k )).encode()
    y += d

__import__( "sys" ).stdout.flush()
__import__( "sys" ).stdout.buffer.write(y)
__import__( "sys" ).stdout.flush()

transcript
plain text
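The two programs above can be folded into one round trip. The following is only a sketch: the function names xor_bytes, encode and decode are made up for this illustration, and the XOR is applied to the UTF-8 octets instead of the code points, which gives the same result as the programs above for plain ASCII text and keys.

```python
import base64
from itertools import cycle

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every octet of data with the key, repeating the key as needed."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))

def encode(text: str, key: str) -> bytes:
    """UTF-8 encode, XOR with the key, then wrap the result in base64."""
    return base64.b64encode(xor_bytes(text.encode('utf-8'), key.encode('utf-8')))

def decode(token: bytes, key: str) -> str:
    """Undo encode(): base64-decode, XOR with the same key, UTF-8 decode."""
    return xor_bytes(base64.b64decode(token), key.encode('utf-8')).decode('utf-8')

token = encode('plain text', 'key')
print(token)
print(decode(token, 'key'))
```

XOR with a short repeating key only obscures the text, it is not a secure cipher.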

print( ( ''.join( [ f'{x:02x} ' for x in b"source" ])))

print( ( ''.join( [ f'{x:02x} ' for x in "source".encode( 'ascii' ) ])))

print( ( ''.join( [ f'{x:02x} ' for x in __import__( "urllib.parse" ).parse.quote( "source", encoding='iso8859-1' ).encode( 'iso8859-1' ) ])))

bytes.fromhex( '41 42 43' )

bytes.fromhex( '414243' )

main.py
print( __import__( "quopri" ).decodestring( '=9a' ))
transcript

b'\x9a'

main.py
print( *"Hi, I'm plain ASCII!".encode( 'iso8859-1' ))
transcript
72 105 44 32 73 39 109 32 112 108 97 105 110 32 65 83 67 73 73 33

main.py
print( ''.join( [ format( b, '08b' )for b in "Hi, I'm plain ASCII!".encode( 'US-ASCII' )]))
transcript
0100100001101001001011000010000001001001001001110110110100100000011100000110110001100001011010010110111000100000010000010101001101000011010010010100100100100001
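The bit string can be turned back into text by cutting it into groups of eight bits. A minimal sketch, assuming the bit string was produced as above (eight bits per octet, no separators); the variable names are chosen for this illustration only.

```python
text = "Hi, I'm plain ASCII!"

# Forward direction, as in the program above.
bits = ''.join(format(b, '08b') for b in text.encode('ascii'))

# Reverse direction: every slice of eight bits becomes one octet again.
octets = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
restored = octets.decode('ascii')
print(restored)
```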
main.py

from urllib.parse import unquote

print( unquote( 'Z6%2BH%2B%2Ff5OpQ%3D', encoding='iso8859-1' ))

transcript
Z6+H+/f5OpQ=
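The unquoted text looks like base64, so a second decoding step may follow. The sketch below assumes that this is the case for the sample string; the variable names are illustrative.

```python
import base64
from urllib.parse import quote, unquote

quoted = 'Z6%2BH%2B%2Ff5OpQ%3D'
b64_text = unquote(quoted)            # percent-decoding gives the base64 text
raw = base64.b64decode(b64_text)      # the octets behind the base64 text
print(len(raw))

# Way back: base64-encode, then percent-encode the result.
# safe='' makes quote() escape '/' as well, which it leaves alone by default.
requoted = quote(base64.b64encode(raw).decode('ascii'), safe='')
print(requoted)
```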

Quoted Printable

Header (decoding UTF-8/base64)


b64decode expects a bytes-like object (or an ASCII string) and returns a bytes object

main.py

print()

__import__( "sys" ).stdout.buffer.write\
( __import__( 'base64' ).b64decode( '>=?UTF-8?B?UmU6IGl0J3MgYWJvdXQgIOKApmluZw==?='[ 10: ]))

print()

transcript
Re: it's about  …ing
main.py

print()

__import__( "sys" ).stdout.buffer.write\
( __import__( 'base64' ).b64decode( '=?UTF-8?B?SGFhcnfDpHNjaGU=?='[ 10: ]))

print()

transcript

Haarwäsche


main.py

print()

__import__( "sys" ).stdout.buffer.write\
( __import__( 'base64' ).b64decode( '=?UTF-8?B?bWVhbmluZyBvZiDigJxkaWdlc3TigJ0=?='[ 9: ]))

print()

transcript
meaning of “digest”
main.py

print()

__import__( "sys" ).stdout.buffer.write\
( __import__( 'base64' ).b64decode( 'RnJvbSB0aGlzIHllYXIsIHRoZSBudW1iZXIgb2Yg5pWZ6IKy5ryi5a2XIGhhcyBiZWVuIGluY3JlYXNlZCBmcm9tDQoxLDAwNiB0byAxLDAyNi4gSSBoYXZlIGp1c3QgdXBkYXRlZCB0aGUgZGF0YWJhc2UgZHJpdmluZyB0aGUgS2FuamlkaWMNCnZlcnNpb25zIHRvIHJlZmxlY3QgdGhlIGNoYW5nZXMuDQoNCjIwIGFkZGl0aW9uYWwga2FuamkgaGF2ZSBiZWVuIGFkZGVkIHRvIHRoZSA0dGggZ3JhZGUgc2V0Og0K6Iyo44CB5aqb44CB5bKh44CB5r2f44CB5bKQ44CB54aK44CB6aaZ44CB5L2Q44CB5Z+844CB5bSO44CB5ruL44CB6bm/44CB57iE44CB5LqV44CB5rKW44CB5qCD44CB5aWI44CB5qKo44CB6Ziq44CB6ZicDQooYXMgY2FuIGJlIHNlZW4sIG1hbnkgb2NjdXIgaW4gdGhlIG5hbWVzIG9mIHByZWZlY3R1cmVzLikNCg0KMzIga2FuamkgaGF2ZSBiZWVuIG1vdmVkIGJldHdlZW4gZ3JhZGVzOg0KLSDog4MgYW5kIOiFuCBoYXZlIGJlZW4gbW92ZWQgZnJvbSBncmFkZSA0IHRvIGdyYWRlIDYuDQotIOWBnCDlj7Ig5ZGKIOWWnCDlm7Ig5Z6LIOWggiDlo6sg5b6XIOaVkSDmrbQg5q66IOavkiDnsokg57SAIOiEiCDoiKog6LGhIOiyryDosrsg6LOeIGhhdmUNCmJlZW4gbW92ZWQgZnJvbSBncmFkZSA0IHRvIGdyYWRlIDUNCi0g5L+1IOWIuCDmgakg5om/IOaVtSDoiIwg6YCAIOmKrSDpoJAgaGF2ZSBiZWVuIG1vdmVkIGZyb20gZ3JhZGUgNSB0byBncmFkZSA2Lg0KDQpKaW0NCg0KW1NlZTogaHR0cHM6Ly9qYS53aWtpcGVkaWEub3JnL3dpa2kvJUU2JTk1JTk5JUU4JTgyJUIyJUU2JUJDJUEyJUU1JUFEJTk3XQ=='))

print()

transcript
From this year, the number of 教育漢字 has been increased from
1,006 to 1,026. I have just updated the database driving the Kanjidic
versions to reflect the changes.

20 additional kanji have been added to the 4th grade set:
茨、媛、岡、潟、岐、熊、香、佐、埼、崎、滋、鹿、縄、井、沖、栃、奈、梨、阪、阜
(as can be seen, many occur in the names of prefectures.)

32 kanji have been moved between grades:
- 胃 and 腸 have been moved from grade 4 to grade 6.
- 停 史 告 喜 囲 型 堂 士 得 救 歴 殺 毒 粉 紀 脈 航 象 貯 費 賞 have
been moved from grade 4 to grade 5
- 俵 券 恩 承 敵 舌 退 銭 預 have been moved from grade 5 to grade 6.

Jim

[See: https://ja.wikipedia.org/wiki/%E6%95%99%E8%82%B2%E6%BC%A2%E5%AD%97]
main.py

print()

__import__( "sys" ).stdout.buffer.write\
( __import__( 'base64' ).b64decode( '=?UTF-8?B?UmU6IFvKjl0gZXQgW8myXSBlbiBmcmFuw6dhaXM=?='[ 10: ]))

print()

transcript
Re: [ʎ] et [ɲ] en français
....
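The programs above cut off the "=?UTF-8?B?" prefix by hand. As an alternative, the standard library module email.header can parse such encoded words, including the charset and the B or Q marker. This is only a sketch; the helper name decode_subject is an assumption made for this example.

```python
from email.header import decode_header, make_header

def decode_subject(raw: str) -> str:
    """Decode the encoded words of a header value into a str."""
    return str(make_header(decode_header(raw)))

print(decode_subject('=?UTF-8?B?SGFhcnfDpHNjaGU='  '?='))   # base64 ("B") encoded word
print(decode_subject('=?UTF-8?Q?=C2=A3?='))                # quoted-printable ("Q") encoded word
```

decode_header returns the raw octets together with the charset name, and make_header combines them into a Header object whose str() is the decoded text.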
main.py

import quopri

# Content-Transfer-Encoding: quoted-printable

# Content-Type: text/plain; charset="UTF-8"

print( quopri.decodestring('=C2=A3').decode('utf-8') )

transcript
£
main.py

import quopri

# Content-Transfer-Encoding: quoted-printable

# Content-Type: text/plain; charset="UTF-8"

print( quopri.decodestring('=?UTF-8?Q?...='[ 10: ]).decode('utf-8') )

transcript

Encoding to qp(UTF-8)

main.py

import quopri

print( quopri.encodestring(chr(168).encode('utf-8')) )

transcript
b'=C2=A8'

Decoding qp(UTF-8)

main.py

__import__( "sys" ).stdout.buffer.write\
( __import__( 'quopri' ).decodestring( \

'''
>When you print the variable =E2=80=9Ca=E2=80=9D it appears as True, but the program is it is not getting in the if a=3D=3DTrue:
'''
))

transcript

>When you print the variable “a” it appears as True, but the program is it is not getting in the if a==True:
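Encoding and decoding can be checked against each other with a round trip. A minimal sketch with a made-up sample line; note that "=" itself has to be written as "=3D" in quoted-printable.

```python
import quopri

line = 'if a==True: “a”'

# str -> UTF-8 octets -> quoted-printable (still a bytes object, but pure ASCII)
encoded = quopri.encodestring(line.encode('utf-8'))
print(encoded)

# quoted-printable -> UTF-8 octets -> str
decoded = quopri.decodestring(encoded).decode('utf-8')
print(decoded)
```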

Header (decoding iso-8859-1/base64)

print( __import__( 'base64' ).b64decode( source_str ).decode( 'iso-8859-1' ))

Subject: Re: =?UTF-8?B?SG93IHRvIHJlcGxhY2Ugc3BhY2UgaW4gYSBzdHJpbmcgd2l0aCBcbg==?=

main.py
print( __import__( 'base64' ).b64decode( '=SG93IHRvIHJlcGxhY2Ugc3BhY2UgaW4gYSBzdHJpbmcgd2l0aCBcbg==?=' ).decode( 'utf-8' ))
transcript
How to replace space in a string with \n

Hex codes

main.py
print( ''.join( [ f'{x:02x} ' for x in b"example" ]))
transcript
65 78 61 6d 70 6c 65 
main.py

print( ''.join( [ f'{x:02x} ' for x in b"example" ]))

print( bytes.fromhex( ''.join( [ f'{x:02x} ' for x in b"example" ])))

print( bytes.fromhex( ''.join( [ f'{x:02x} ' for x in "äöüß".encode( 'utf-8' ) ])))

transcript

65 78 61 6d 70 6c 65

b'example'

b'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\x9f'
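The join over an f-string can also be written with bytes.hex. The sketch below assumes Python 3.8 or newer, because the separator argument of bytes.hex was added in that version.

```python
data = 'äöüß'.encode('utf-8')

spaced = data.hex(' ')     # separator argument: Python 3.8+ (an assumption about the interpreter)
print(spaced)

# bytes.fromhex skips the spaces, so the spaced form can be read back directly.
print(bytes.fromhex(spaced).decode('utf-8'))
```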

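Transcoding

Transcoding from ISO 8859-1 to UTF-8

# city is a bytes object; cp1252 matches ISO 8859-1 except in the range 0x80 to 0x9f, use 'iso-8859-1' for strict ISO 8859-1 input
city.decode('cp1252').encode('utf-8')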
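Transcoding always goes through str: decode the octets with the source encoding, then encode with the target encoding. A small sketch; the helper name transcode and the sample octets are assumptions for this illustration.

```python
def transcode(data: bytes, source: str, target: str = 'utf-8') -> bytes:
    """Decode the octets with the source encoding, then encode with the target."""
    return data.decode(source).encode(target)

latin1 = b'Z\xfcrich'                       # 'Zürich' in ISO 8859-1
print(transcode(latin1, 'iso-8859-1'))

# The choice of source encoding matters for the octets 0x80 to 0x9f:
# 0x80 is a control character in ISO 8859-1, but the euro sign in cp1252.
print(transcode(b'\x80', 'iso-8859-1'))
print(transcode(b'\x80', 'cp1252'))
```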

HTML

import html

s = html.unescape( s )

main.py

# encoding: utf-8

print( __import__( 'html.entities' ).entities.codepoint2name[ ord( 'é' )])

transcript

eacute

main.py

'année'.encode(encoding='ascii',errors='xmlcharrefreplace').decode('ascii')

transcript

'ann&#233;e'
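Both directions can be combined: xmlcharrefreplace produces numeric character references, and html.unescape resolves numeric as well as named references. A minimal sketch using only the standard library; the variable names are illustrative.

```python
import html
from html.entities import name2codepoint

text = 'année'

# Non-ASCII characters become numeric character references.
numeric = text.encode('ascii', errors='xmlcharrefreplace').decode('ascii')
print(numeric)

# html.unescape resolves numeric and named references alike.
print(html.unescape(numeric))
print(html.unescape('ann&eacute;e'))

# name2codepoint is the reverse table of codepoint2name.
print(chr(name2codepoint['eacute']))
```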

Miscellaneous

main.py

script = '''
Import os
Test

def Test():
print(“test”)
os.system(“pause”)
'''

exec\
( script.replace( '“', '"' ).\
replace( '”', '"' ).\
replace( 'I', 'i' ).\
replace( 'Test\n', '' ).\
replace( 'pr', ' pr' ).\
replace( 'os.', ' os.' ) + \
'\nTest()' )

transcript

Press any key to continue . . .

test

main.py

# coding=cp1252

# coding of the source file specified above for u umlaut (todo: convert VBA to use UTF-8)

print( chr( 0x0041 ))

print( f"0x{ord( 'ü' ):04x}" )

print( bytes.fromhex('0040, 0041, 0042, 0043'.replace( ',', '' )).decode( 'utf-8' ))

print( list( b'ABC' ))

print( b'ABC'.hex() )

transcript

A

0x00fc

\0@\0A\0B\0C  (each \0 stands for a NUL character, U+0000)

[65, 66, 67]

414243

main.py

# print( [ chr(int(__import__('ast').literal_eval('0x'+x.strip()))) for x in "231E, 231F, 231C, 231D".split(',') ])

print( flush=True, end='' )

__import__( "sys" ).stdout.buffer.write\
( chr( 0x231E ).encode( 'utf-8' ))

print( flush=True, end='' )

transcript
⌞
main.py

print( __import__( "sysconfig" ).get_python_version() )

print( __import__( "sys" ).version.split()[ 0 ])

print( 'quoting the documentation (3.6 - 3.8): ' )

print( 'chr(i)' )

print( 'Return the string representing a character whose Unicode code point is the integer i.' )

# print( chr( 0x231E )) # Unicode Character 'BOTTOM LEFT CORNER' (U+231E)

print( flush=True, end='' )

__import__( "sys" ).stdout.buffer.write( chr( 0x231E ).encode( 'utf-8' ))

print( flush=True, end='' )

print( 'Test' )

print( flush=True, end='' )

__import__( "sys" ).stdout.buffer.write( chr( 0x231E ).encode( 'utf-8' ))

print( flush=True, end='' )

print( 'Test' )

print( flush=True, end='' )

__import__( "sys" ).stdout.buffer.write( chr( 0x231E ).encode( 'utf-8' ))

print( flush=True, end='' )

print( 'Test' )

print( flush=True, end='' )

__import__( "sys" ).stdout.buffer.write( chr( 0x231E ).encode( 'utf-8' ))

print( flush=True, end='' )

print( 'Test' )

transcript

3.7

3.7.0

quoting the documentation (3.6 - 3.8):

chr(i)

Return the string representing a character whose Unicode code point is the integer i.

⌞Test

⌞Test

⌞Test

⌞Test

main.py

print( bin( 100 ))

print( 0b1100100 )

print( hex( 100 ))

print( 0x64 )

print( oct( 100 ))

print( 0o144 )

print( chr( 65 ))

print( ord( 'A' ))

transcript

0b1100100
100
0x64
100
0o144
100
A
65
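bin, hex and oct always add a prefix. When the prefix is unwanted, or when a string has to be read back into a number, format and int can be used. A short sketch with illustrative variable names:

```python
n = 100

binary = format(n, 'b')      # like bin( n ) without the '0b' prefix
padded = format(n, '08b')    # padded with zeros to eight digits
hexa = format(n, 'x')
octal = format(n, 'o')
print(binary, padded, hexa, octal)

# int() with an explicit base reads the digit strings back.
print(int(binary, 2), int(hexa, 16), int(octal, 8))

# Base 0 lets int() pick the base from the prefix.
print(int('0b1100100', 0), int('0x64', 0), int('0o144', 0))
```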

Notes

Newsgroups: comp.lang.python,comp.lang.javascript

Here's a small comparative tutorial about URI encoding

and how to use it for repairing mis-encoded strings.

Code examples are given for both JavaScript ("JS") and

Python ("py").

Section 1: Preparations

In Python, we need two imports:

py> from urllib.parse import quote

py> from urllib.parse import unquote

Section 2: One-pass operations

convert a character to its ISO 8859-1 representation and

then represent this in URI notation

JS> escape( 'ä' )

JS> "%E4"

py> quote( 'ä', encoding='iso8859-1' )

py> '%E4'

convert a character to its UTF-8 representation and then

represent this in URI notation

JS> encodeURI( 'ä' )

JS> "%C3%A4"

py> quote( 'ä', encoding='utf-8' )

py> '%C3%A4'

get the character from URI notation assuming it was

encoded using its ISO 8859-1 representation

JS> unescape( '%E4' )

JS> "ä"

py> unquote( '%E4', encoding='iso8859-1' )

py> 'ä'

get the character from URI notation assuming it was

encoded using its UTF-8 representation

JS> decodeURIComponent( '%C3%A4' )

JS> "ä"

py> unquote( '%C3%A4', encoding='utf-8' )

py> 'ä'

Section 3: Repairing a misencoded character sequence

Sometimes people decode UTF-8 as if it were ISO 8859-1.

This results in ugly strings like »Ã¤« (for »ä«).

JS> unescape( encodeURI( 'ä' ))

JS> "Ã¤"

py> unquote( quote( 'ä', encoding='utf-8' ), encoding='iso8859-1' )

py> 'Ã¤'

But with JavaScript or Python we can repair such strings!

We first URI-encode them to get a kind of octet sequence.

JS> escape( 'Ã¤' )

JS> "%C3%A4"

py> quote( 'Ã¤', encoding='iso8859-1' )

py> '%C3%A4'

Now we can decode this octet sequence using the correct

encoding!

JS> decodeURIComponent( escape( 'Ã¤' ))

JS> "ä"

py> unquote( quote( 'Ã¤', encoding='iso8859-1' ), encoding='utf-8' )

py> 'ä'
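In Python the detour through URI notation is not strictly needed, because str.encode already yields the octet sequence. The following sketch shows the same repair with encode and decode only and compares it with the quote/unquote route; the variable names are chosen for this example.

```python
from urllib.parse import quote, unquote

# Produce the broken string: UTF-8 octets wrongly read as ISO 8859-1.
broken = 'ä'.encode('utf-8').decode('iso-8859-1')
print(broken)

# Repair: get the original octets back, then decode them as UTF-8.
repaired = broken.encode('iso-8859-1').decode('utf-8')
print(repaired)

# The same repair via URI notation, as in the tutorial.
via_uri = unquote(quote(broken, encoding='iso8859-1'), encoding='utf-8')
print(via_uri)
```

The repair only works when the wrong decoding lost no information, which holds for ISO 8859-1 since every octet maps to exactly one character.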

Summary

Both Python and JavaScript can URI-encode ISO 8859-1
characters using either ISO 8859-1 or UTF-8, and both can
decode them again. The details of the function calls
differ somewhat. Naming the encoding explicitly in
the Python calls makes them more orthogonal (and more readable)
than the JavaScript function names, which do not mention the encodings.

PS:

In JavaScript, there is a difference between "encodeURI" and

"encodeURIComponent" that might be represented in Python as:

encodeURI( s ) --> urllib.parse.quote(s, safe='~@#$&()*!+=:;,.?/\'');

encodeURIComponent( s ) --> urllib.parse.quote(s, safe='~()*!.\'')

.
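The two mappings from the PS can be wrapped in small functions. This is a sketch under the assumption that the safe lists given above are the intended ones; the names encode_uri and encode_uri_component and the sample URL are made up for this illustration and are not part of urllib.

```python
from urllib.parse import quote

def encode_uri(s: str) -> str:
    """Rough counterpart of JavaScript's encodeURI (safe list taken from the PS above)."""
    return quote(s, safe="~@#$&()*!+=:;,.?/'")

def encode_uri_component(s: str) -> str:
    """Rough counterpart of JavaScript's encodeURIComponent (safe list taken from the PS above)."""
    return quote(s, safe="~()*!.'")

url = 'https://example.com/a b?q=ä&x=1'
print(encode_uri(url))
print(encode_uri_component(url))
```

The first function keeps the characters that structure a URI, the second escapes them too, so it suits single components such as a query value.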

The author of this text is Stefan Ram. All rights reserved. This page is a publication by Stefan Ram.